Interactive comment on “The STRatospheric Estimation Algorithm from Mainz (STREAM): Estimating stratospheric NO2 from nadir-viewing satellites by weighted convolution”

This paper presents a good and thorough description of the STREAM algorithm for separating stratospheric and tropospheric NO2 in satellite retrievals. STREAM is a logical next step in the series of increasingly sophisticated RSM- and MRSM-type STS algorithms (e.g., Richter and Burrows, 2002; Valks et al., 2011; Bucsela et al., 2013). To estimate the stratosphere, the authors use measurement-based a priori pollution amounts and cloud conditions (both coverage and height), and then make an iterative correction for tropospheric contamination that may have been missed in the previous estimate. They do extensive comparisons of STREAM relative to other algorithms as applied to OMI, GOME-2 and SCIAMACHY as well as looking at results from an in-


Introduction
Beginning with the launch of the Global Ozone Monitoring Experiment (GOME) on the ERS-2 satellite in 1995 (Burrows et al., 1999), several instruments (SCIAMACHY, OMI, GOME-2; see Table 1 for acronyms and references) have performed spectrally resolved measurements of sunlight reflected by the Earth's surface and atmosphere. With differential optical absorption spectroscopy (DOAS) (Platt and Stutz, 2008), the column densities (denoted as "columns" henceforth) of numerous important atmospheric absorbers, amongst others nitrogen dioxide (NO2), can be determined by their characteristic spectral "fingerprints".
Nitrogen oxides (NOx = NO2 + NO) play a key role in the chemistry of both the stratosphere and the troposphere. Stratospheric NOx has been a research topic for several decades, in particular due to its role in ozone and halogen chemistry.
Satellite measurements provide long-term global information on spatiotemporal patterns of stratospheric NO2 (e.g., Wenig et al., 2004; Dirksen et al., 2011). During the last decades, the analysis of tropospheric trace gases from nadir-viewing satellite instruments has moved more and more into focus, supported by the availability of longer time series and improved spatial resolution. Tropospheric NO2 columns derived from satellite measurements are nowadays widely used by the scientific community to deduce spatial patterns, source type and strength, and trends of NOx emissions from fossil fuel combustion, biomass burning, soils, and lightning.
Overviews of the wide range of scientific applications of satellite-based tropospheric NO2 products are given in, e.g., Martin (2008) or Monks and Beirle (2011).
The retrieval of tropospheric NO2 columns from total column measurements requires the estimation and removal of the stratospheric column, a procedure we refer to as "stratosphere-troposphere separation" (STS) as in Bucsela et al. (2006).
One of the first STS algorithms is the reference sector method (RSM), which estimates the global stratospheric NO2 fields from measurements over the remote Pacific (Richter and Burrows, 2002; Martin et al., 2002; Beirle et al., 2003), based on the assumptions of (a) longitudinal homogeneity of stratospheric NO2 and (b) negligible tropospheric contribution over the reference region in the Pacific. This procedure is quite simple, transparent, and robust. A further side effect is that any systematic bias in the NO2 columns, which might be introduced by the instrument (e.g., degradation or spectral interference caused by the diffusor plate used for measurements of the solar reference; Richter and Burrows, 2002) or sub-optimal spectral analysis (van Geffen et al., 2015; Marchenko et al., 2015), is classified as stratospheric signal and thereby removed from the tropospheric column.
The RSM has been applied by different groups to different satellite instruments and generally performs well. However, the resulting tropospheric NO2 columns are affected by systematic biases caused by the following simplifying assumptions.
a. The tropospheric background column in the Pacific is very low (compared to columns over regions exposed to significant NOx sources) but not zero. Neglecting the tropospheric background results in tropospheric columns that are biased low by some 10¹⁴ molec cm⁻² (Martin et al., 2002; Valks et al., 2011; Hilboll et al., 2013). Some algorithms explicitly correct for this tropospheric background: Martin et al. (2002) perform a correction based on GEOS-CHEM, while Valks et al. (2011) assume a constant background of 0.1 × 10¹⁵ molec cm⁻². Other algorithms prefer to stick to the tropospheric "excess" columns, which are slightly biased low but do not need any model input (Richter and Burrows, 2002; Bucsela et al., 2006).
b. The assumption of longitudinal homogeneity is generally reasonable, at least in temporal means when small-scale stratospheric dynamic features cancel out. However, large longitudinal variations can occur, in particular close to the polar vortex, as already discussed by Richter and Burrows (2002), Martin et al. (2002), and Boersma et al. (2004). Thus, tropospheric columns derived by RSM can be off by more than 10¹⁵ molec cm⁻² in winter poleward of 50° latitude, thereby affecting scientific interpretations of tropospheric columns over North America or northern Europe. Note that also at low latitudes, systematic artifacts related to longitudinal inhomogeneities show up in tropospheric columns resulting from RSM, in particular over the Indian Ocean.
To overcome the artifacts caused by the assumption of longitudinal homogeneity, several modifications of the RSM have been proposed in recent years, while the basic approach of using nadir measurements over clean regions for STS has been retained. We refer to this group of algorithms as "modified RSM" (MRSM). MRSMs generally define a "pollution mask" of regions with potentially non-negligible tropospheric columns. Measurements over these regions are skipped within the stratospheric estimation. Thus, in order to define stratospheric columns over the masked areas, interpolation is required. For this purpose, Leue et al. (2001) and Wenig et al. (2004) applied "normalized convolution" (Knutsson and Westin, 1993), an efficient algorithm which combines interpolation and smoothing. Bucsela et al. (2006) realized interpolation by fitting harmonics (wave-2) over the "clean" areas. Valks et al. (2011) applied a zonal boxcar filter of 30° width.
All of these algorithms applied a rather conservative masking approach for potentially polluted pixels. Continents were masked out almost completely. At northern midlatitudes, the masked area is often even larger than the area used for the stratospheric estimation, and over the Eurasian continent the MRSM algorithms lack any supporting measurement points over about 10 000 km. This can lead to significant errors during interpolation. In particular the wave fitting approach can lead to large biases (Dirksen et al., 2011). Leue et al. (2001) estimated the stratospheric fields based on clouded measurements over the ocean and subsequent interpolation. The focus on clouded observations provides a direct stratospheric measurement, as the tropospheric column is mostly shielded; thus, no further correction of the tropospheric background should be needed. However, clouded pixels possibly contain NOx produced by lightning (e.g., Beirle et al., 2006). Therefore, Wenig et al. (2004) changed the Heidelberg STS algorithm (Leue et al., 2001) by switching from clouded to cloud-free observations as input for the stratospheric estimate.¹
¹ This aspect will be discussed in detail in Sect. 5.4.
Recently, Bucsela et al. (2013) proposed an MRSM which defines "unpolluted" pixels not with a fixed mask but according to the a priori expected tropospheric contribution to the total column for each individual satellite observation. This is determined from radiative transfer calculations based on a monthly mean NO2 profile from a chemical transport model (CTM) and the actual cloud conditions. This procedure results in additional supporting points over continents in cases of clouds shielding the tropospheric column and thereby largely reduces potential interpolation artifacts.
Apart from (modified) reference sector methods, there are two further completely different approaches used for STS, which are based on (a) independent measurements or (b) CTMs.
a. Coincident, but independent, stratospheric measurements are available for SCIAMACHY (Bovensmann et al., 1999). It was operated in alternating nadir/limb geometry, such that the stratospheric air masses sensed in nadir were scanned in limb shortly before ("limb-nadir matching", LNM). This unique instrumental setup allowed for a direct stratospheric correction, although systematic offsets between limb and nadir measurements still had to be corrected empirically. STS by LNM was successfully applied for NO2 (Sioris et al., 2004; Sierk et al., 2006; Beirle et al., 2010a; Hilboll et al., 2013) and ozone (Ebojie et al., 2014). However, such direct coincident measurements of total columns (nadir) and stratospheric concentration profiles (limb) are not available for other satellite instruments. Merging measurements from different sensors always faces the problem of spatiotemporal mismatching, requiring interpolation and photochemical corrections (compare Belmonte Rivas et al., 2014), and thus cannot be easily used for consistent long-term operational retrievals.
b. Stratospheric NO2 concentrations provided by CTMs can be used directly for STS after empirical correction of systematic offsets between satellite and model columns, e.g., by matching both over the Pacific (Richter et al., 2005; Hilboll et al., 2013). A more sophisticated way to incorporate CTMs in STS is data assimilation (Eskes et al., 2003; Dirksen et al., 2011), in which modeled 3-D distributions of NO2 are regularly updated such that the modeled stratospheric column is in close agreement with the satellite measurement when the tropospheric contribution (as forecasted by the CTM) is low.
In 2016, the ESA's Sentinel 5 precursor (S5p) satellite (Ingmann et al., 2012) will be launched, carrying the TROPOspheric Monitoring Instrument (TROPOMI) (Veefkind et al., 2012). The operational ("prototype") tropospheric column product of NO2 from TROPOMI will be derived by an STS using data assimilation (van Geffen et al., 2014), based on the expertise of the Koninklijk Nederlands Meteorologisch Instituut (KNMI) as demonstrated by a 20-year record of tropospheric columns from different satellite sensors provided by the Tropospheric Emission Monitoring Internet Service (TEMIS, www.temis.nl; Boersma and Eskes, 2004; Boersma et al., 2011).
Within the S5p level 2 project, for each prototype product a "verification" product was developed in order to verify the prototype algorithms, detect possible shortcomings, and reveal potential improvements. The TROPOMI verification algorithm for NO2 STS, the STRatospheric Estimation Algorithm from Mainz (STREAM), was developed at the Max Planck Institute for Chemistry (MPI-C), Mainz. It is an MRSM, requiring no further model input, and can thus be considered as a complementary approach to data assimilation.
STREAM does not apply a strict discrimination of "clean" vs. "polluted" satellite pixels. Instead, weighting factors are defined for each satellite pixel determining its impact on the stratospheric estimate (similar to data assimilation). In particular, clouded observations are weighted high, as they provide direct measurements of the stratospheric field. This approach dampens the small but systematically high bias of stratospheric columns estimated from total column measurements and the resulting low bias of tropospheric columns.
The paper is organized in the following way. In Sect. 2, the STREAM algorithm is described in detail. Section 3 provides information on the satellite and model data sets used in this study. Section 4 analyses the performance of STREAM and its sensitivity to input parameters based on both actual satellite measurements and synthetic data. In Sect. 5, the STREAM results are discussed in comparison to other STS algorithms, including the TROPOMI prototype algorithm. A general discussion on the challenges and uncertainties of STREAM in particular, and STS in general, is given, followed by conclusions (Sect. 6). Several additional images and tables are provided in the Supplement and referenced by a prefix "S".

Methods
STREAM is in the tradition of MRSM algorithms that estimate the stratospheric field directly from satellite measurements for which the tropospheric contribution is considered to be negligible. For this purpose, measurements over remote regions with negligible tropospheric sources, as well as cloudy measurements, are used. In contrast to other MRSMs, however, no strict pollution mask is applied. Instead, weighting factors are used.
STREAM consists basically of two steps:
1. A set of weighting factors is calculated for each satellite pixel: a "pollution weight" that reduces the contribution of potentially polluted pixels, a "cloud weight" that increases the contribution of cloudy observations, and the "tropospheric residue (TR) weight" that adjusts the total weight in case of exceptionally large or negative TRs. The product of these weighting factors determines to what extent the associated NO2 total columns contribute to the estimated stratospheric field (Sect. 2.2).
Before describing the details of the STREAM algorithm, however, we first define the investigated quantities and abbreviations used hereafter, as summarized in Table 2.

NO2 column densities and units
With differential optical absorption spectroscopy (DOAS; Platt and Stutz, 2008), so-called slant column densities (SCDs) S, i.e., concentrations integrated along the mean light path, are derived. SCDs are converted into VCDs (vertical column densities, i.e., vertically integrated concentrations) V via the air-mass factor (AMF) A: V = S/A. The AMF A depends on radiative transfer (determined by wavelength, atmospheric absorbers, viewing geometry, surface albedo, clouds, and aerosols) and the trace gas profile. For the stratospheric column of NO2, A is basically determined by viewing geometry.

Total vertical column V*
We define V* as the "total" vertical column, given by the SCD S divided by the stratospheric AMF A_strat:

$$V^* = \frac{S}{A_\mathrm{strat}}. \tag{2}$$

The application of the stratospheric AMF basically removes the dependencies of S on viewing angles. Over clean regions with negligible tropospheric columns, V* represents the actual total VCD and can be used for the estimation of stratospheric fields. In case of tropospheric pollution, however, V* underestimates the actual total VCD, as the AMF is generally smaller in the troposphere than in the stratosphere (see also next section). These situations are, to the best of our knowledge, excluded from the stratospheric estimate by the definition of appropriate weighting factors (see Sect. 2.2).

Stratospheric vertical column and tropospheric residue
STREAM yields an estimate for the stratospheric VCD V_strat based on the assumption that V* can be considered as a proxy for V_strat in "clean" regions and over cloudy scenes.
In order to evaluate the performance of the stratospheric estimation, we define the TR as the difference of total and stratospheric VCDs (based on a stratospheric AMF):

$$T^* = V^* - V_\mathrm{strat}. \tag{3}$$

Tropospheric VCDs (TVCDs), which are the final product of NO2 retrievals used for further tropospheric research, are connected to T* via the ratio of stratospheric and tropospheric AMF:

$$V_\mathrm{trop} = \frac{A_\mathrm{strat}}{A_\mathrm{trop}}\, T^*. \tag{4}$$

For cloud-free satellite pixels, the ratio A_strat/A_trop typically ranges from about 1 above clean oceans at low and midlatitudes to ≈ 2-3 above moderately polluted regions and up to > 4 at high latitudes and over strong NOx sources, where NO2 profiles peak close to the ground, causing low A_trop (see Fig. S1).
In this study, we focus on the tropospheric residue T* instead of V_trop for several reasons.
1. As only stratospheric AMFs are applied, biases in the stratospheric estimation can directly be related (factor −1) to the respective biases in T*.
2. The comparison of TRs among different algorithms instead of TVCDs isolates the effect of the different STS and excludes differences in tropospheric AMFs (which are beyond the scope of this study).
3. T* can be determined, and is of high interest for the evaluation of STS performance, also for clouded scenes with very low tropospheric AMFs.
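The relations between slant column, V*, T*, and the tropospheric VCD can be illustrated with a small worked example. This is only a sketch: the function names and the numerical values of the hypothetical scene are assumptions chosen for illustration, and the "true" stratospheric column is taken as known, which is not the case in a real retrieval.

```python
def total_column_star(scd, amf_strat):
    """V* = S / A_strat: total column computed with the stratospheric AMF only."""
    return scd / amf_strat


def tropospheric_residue(v_star, v_strat):
    """T* = V* - V_strat."""
    return v_star - v_strat


def tropospheric_vcd(t_star, amf_strat, amf_trop):
    """V_trop = (A_strat / A_trop) * T*."""
    return t_star * amf_strat / amf_trop


# Hypothetical moderately polluted scene (columns in CDU, AMFs dimensionless).
v_strat_true, v_trop_true = 3.0, 4.0
amf_strat, amf_trop = 2.5, 1.0  # ratio 2.5, within the "2-3" range quoted above

# The slant column contains both contributions, each seen with its own AMF.
scd = v_strat_true * amf_strat + v_trop_true * amf_trop

v_star = total_column_star(scd, amf_strat)             # about 4.6 CDU
t_star = tropospheric_residue(v_star, v_strat_true)    # about 1.6 CDU
v_trop = tropospheric_vcd(t_star, amf_strat, amf_trop)  # recovers about 4.0 CDU
```

In this example V* is well below the true total column of 7 CDU, which is the underestimation over polluted scenes described above, while applying the AMF ratio to T* recovers the tropospheric column.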

Version
The description given in this paper and the definition of a priori settings refer to STREAM version v0.92.

Definition of weighting factors
MRSMs usually flag satellite pixels as either clean or (potentially) polluted and skip the latter for the stratospheric estimation. In STREAM, instead, weighting factors for individual satellite pixels determine how strongly they are considered in the stratospheric estimation. Satellite measurements which are expected to have low/high tropospheric contribution are assigned a high/low weighting factor, respectively.

Pollution weight
In order to estimate the stratospheric NO2 field from total column measurements, only "clean" measurements where the tropospheric column is negligible should be considered.
In cases of very high total columns (V* > 10 CDU), which clearly exceed the domain of stratospheric columns, a tropospheric contribution is obvious, and these measurements are excluded by assigning them a weighting factor of 0.
In most cases, however, the tropospheric contribution to the total column is not that easy to determine. We thus define a pollution weight w_pol based on our a priori knowledge about the mean spatial distribution of tropospheric NO2, reflecting potentially polluted regions. For this purpose, we make use of the multiannual mean tropospheric NO2 column as derived from SCIAMACHY (Beirle and Wagner, 2012). Based on this climatology, a "pollution proxy" P is defined as a function of latitude ϑ and longitude ϕ. P indicates the regions affected by tropospheric pollution plus a "safety margin" in order to account for possible advection, while it is undefined for remote unpolluted regions. Details on the definition of P are given in the Supplement (Sect. S2.2.1), and P is displayed in Fig. S2d. The pollution weight w_pol is then defined as a decreasing function of P (Eq. 5) where P is defined, and w_pol = 1 elsewhere. Hence, the higher the pollution proxy P, the lower the weighting factor and the less the measurement contributes to the stratospheric estimate. Equation (5) is displayed in Fig. 1a, and the resulting map for w_pol is shown in Fig. 2a. Large continental regions are assigned a weight ≤ 0.1. Strongly polluted regions like the USA, Europe, or China have weights of 0.01 down to below 0.001. Note that the additional application of the tropospheric residue weight (Sect. 2.2.3) further decreases the weight of satellite measurements containing high tropospheric pollution.

Cloud weight
In addition to measurements over remote regions free of tropospheric sources, clouded satellite measurements, where the tropospheric column is shielded, also provide a good proxy for the stratospheric column. Thus, the factor w_cld is used to increase the weight of clouded satellite pixels. This is achieved by the following definition:

$$w_\mathrm{cld} := 10^{\,2\, w_c\, w_p} \tag{6a}$$

with

$$w_c := c^4 \tag{6b}$$

and

$$w_p := \exp\left[-\frac{1}{2}\left(\frac{p_\mathrm{cld} - p_\mathrm{ref}}{\varsigma}\right)^4\right]. \tag{6c}$$

w_c reflects the dependency on the cloud radiance fraction (CRF) c. Due to the exponent of 4, only pixels with large cloud radiance fraction obtain a high weighting factor and contribute strongly to the stratospheric estimation. w_p describes the dependency on cloud pressure (CP) p_cld. It is defined as a modified Gaussian (with exponent 4 instead of 2, making it flat-topped) centered at p_ref = 500 hPa with a width ς = 150 hPa; i.e., only cloudy measurements at medium altitudes are assigned a high weighting factor, while high clouds (potentially contaminated by lightning NOx) as well as low clouds (where tropospheric pollution might still be visible) are excluded.
As both w_c and w_p yield values in the range from 0 to 1, the factor of 2 in the exponent of Eq. (6a) sets the maximum value of w_cld to 10², which would compensate for pollution weights down to 10⁻².
The dependencies of w_cld on CRF and CP, as defined in Eq. (6), are displayed in Fig. 1b and c, respectively. The spatial pattern of w_cld is shown as an example for OMI CP and CRF on 1 January 2005 in Fig. 2b. w_cld reaches values up to 100 in several parts of the world, including regions which were pre-classified as potentially polluted, thus competing with a low w_pol (Fig. 2a).
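The behaviour of the cloud weight can be explored with a few lines of code. The sketch below is not the STREAM implementation: it assumes that the two terms enter as a product in the exponent, w_cld = 10^(2 · w_c · w_p), which is consistent with the stated maximum of 10², and that the flat-topped pressure term carries the usual Gaussian factor of 1/2. Function names are illustrative.

```python
import math

P_REF_HPA = 500.0  # centre of the cloud-pressure window (value from the text)
SIGMA_HPA = 150.0  # width of the cloud-pressure window (value from the text)


def w_c(crf):
    """Cloud radiance fraction term, c**4 (Eq. 6b)."""
    return crf ** 4


def w_p(cloud_pressure_hpa):
    """Flat-topped 'modified Gaussian' in cloud pressure.

    The exponent of 4 is from the text; the factor 0.5 is an assumption.
    """
    z = (cloud_pressure_hpa - P_REF_HPA) / SIGMA_HPA
    return math.exp(-0.5 * z ** 4)


def w_cld(crf, cloud_pressure_hpa):
    """Cloud weight, assumed here to be 10**(2 * w_c * w_p)."""
    return 10.0 ** (2.0 * w_c(crf) * w_p(cloud_pressure_hpa))


# Fully clouded pixel at mid-level: the maximum weight.
full_mid_cloud = w_cld(1.0, 500.0)
# Half-clouded pixel at the same height: only a modest increase over 1.
half_mid_cloud = w_cld(0.5, 500.0)
# Fully clouded, but low cloud near 950 hPa: practically no increase.
full_low_cloud = w_cld(1.0, 950.0)
```

Under these assumptions, the steep c⁴ dependence means that a cloud radiance fraction of 0.5 raises the weight by only about a third, whereas a fully clouded mid-level scene reaches the maximum of 100.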

Tropospheric residue weight
STREAM yields global fields of stratospheric VCDs V_strat, explained in detail below (Sect. 2.3), which allow us to calculate tropospheric residues T* according to Eq. (3). While the "true" tropospheric fields are not known, the resulting T* can still be used in order to evaluate the STS performance and improve the stratospheric estimate in a second iteration, whenever T* clearly indicates an under- or overestimation of V_strat.
- A high value of T* likely indicates tropospheric pollution, in particular over potentially polluted regions. The respective satellite pixels should not be used for the stratospheric estimation.
- As negative columns are nonphysical, T* < 0 indicates that the stratospheric field has been overestimated. This happens when the weighted convolution with neighboring pixels with high total columns causes the estimated stratosphere to be even higher than the local total columns. In order to avoid this effect, the respective local total columns should be assigned a higher weighting factor such that they contribute more strongly to the stratospheric estimate.
We thus define a further weighting factor w_TR, which weights down/up the pixels associated with a large positive/negative TR, respectively. It turned out, however, that the stratospheric estimate is very sensitive to the definition of w_TR, and a simple definition based on the TR of individual satellite pixels can easily result in systematic artifacts. This results from T* being defined as the difference of V* and V_strat (Eq. 3), i.e., two quantities of the same order of magnitude with non-negligible errors. Thus, the resulting statistical distribution of T* inevitably includes negative values. These negative values caused by statistical fluctuations must not be excluded from the probability density function in order to keep the mean unbiased, but they should also not be used as a trigger for weighting up the respective measurement within the stratospheric estimation. Thus, w_TR should only be applied to significant and systematic deviations of T* from 0. This is achieved by the following settings.
1. In contrast to w_cld, which is defined for each individual satellite measurement, w_TR is defined based on the TRs averaged over 1° × 1° grid pixels; i.e., first the values of T* within one grid pixel are averaged, reducing statistical noise, before w_TR is calculated, and the resulting weight is then assigned to all satellite measurements within the grid pixel.
2. w_TR is only applied when the absolute value of the mean grid box T* exceeds a threshold of 0.5 CDU, which is typically larger than the spectral fitting error (Eq. 7).
3. w_TR is only applied for grid pixels where the adjacent grid pixels exceed the threshold as well. By this additional condition it is guaranteed that a single outlier in the satellite measurements cannot trigger w_TR, as every satellite measurement is assigned to exactly one grid pixel (see Sect. 2.3).
4. w_TR < 1, which is meant to decrease the weight of polluted pixels, is only applied over potentially polluted regions with w_pol < 1. Without this additional condition, patterns of erroneously enhanced TR caused by stratospheric dynamics would even be amplified by w_TR.
w_TR could in principle be tuned in multiple iterations. In STREAM v0.92, only one iteration is performed, as a second iteration turned out to have marginal effect (see Sect. S4.2.5).
The dependency of w_TR on TR (grid pixel average), as defined in Eq. (7), is displayed in Fig. 1d, and the resulting map for w_TR on 1 January 2005 is shown in Fig. 2c. After the initial stratospheric estimate, STREAM yields high values for T* over parts of the USA, Europe, central Africa, and China, resulting in low w_TR. Observations over these regions are already associated with a low pollution weight. However, due to the additional application of w_TR, the net weight is lowered further by orders of magnitude, and the respective satellite pixels will hardly contribute to the stratospheric estimate in the next iteration, even in the case of high w_cld.
In the initial STREAM run, the resulting TR is systematically < 0 over east Canada and Greenland, caused by the asymmetric polar vortex. Over the Labrador Sea, initial values for T* are systematically below −0.5 CDU, i.e., beyond the threshold, and thus trigger a high w_TR, so that the respective observations of low total VCDs contribute strongly to the stratospheric estimate in the next iteration.
Note that, due to the threshold of 0.5 CDU (criterion 2), w_TR cannot correct small biases such as the expected low bias in TR caused by estimating the stratospheric column from total column measurements.
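The four criteria for applying w_TR can be read as a decision rule on the gridded mean T*. The sketch below only decides where w_TR would be triggered and in which direction; it does not compute the weight itself. Several details are assumptions made for illustration and are not taken from STREAM: "adjacent" is interpreted as the four direct neighbours, all of which must exceed the threshold with the same sign, longitude wraps around, and neighbours beyond the first or last latitude row are ignored.

```python
THRESHOLD_CDU = 0.5  # criterion 2, value from the text


def _sign(t):
    """+1 / -1 if the grid-mean T* exceeds the threshold, else 0."""
    if t > THRESHOLD_CDU:
        return 1
    if t < -THRESHOLD_CDU:
        return -1
    return 0


def tr_trigger(t_mean, polluted):
    """Flag grid pixels where w_TR would apply.

    t_mean[j][i]  : grid-box mean T* in CDU (criterion 1: averaging done upstream)
    polluted[j][i]: True where w_pol < 1
    Returns +1 (weight down), -1 (weight up) or 0 (w_TR not applied).
    """
    n_lat, n_lon = len(t_mean), len(t_mean[0])
    flags = [[0] * n_lon for _ in range(n_lat)]
    for j in range(n_lat):
        for i in range(n_lon):
            s = _sign(t_mean[j][i])
            if s == 0:
                continue  # criterion 2: below threshold
            # Assumed neighbourhood: 4 direct neighbours, periodic in longitude.
            neighbours = [(j, (i - 1) % n_lon), (j, (i + 1) % n_lon)]
            if j > 0:
                neighbours.append((j - 1, i))
            if j < n_lat - 1:
                neighbours.append((j + 1, i))
            if any(_sign(t_mean[jj][ii]) != s for jj, ii in neighbours):
                continue  # criterion 3: isolated exceedance, ignore
            if s == 1 and not polluted[j][i]:
                continue  # criterion 4: weight down only where w_pol < 1
            flags[j][i] = s
    return flags


def cross(value, n_lat=3, n_lon=5):
    """Hypothetical test field: a plus-shaped patch of `value` around (1, 2)."""
    grid = [[0.0] * n_lon for _ in range(n_lat)]
    for j, i in [(1, 2), (0, 2), (2, 2), (1, 1), (1, 3)]:
        grid[j][i] = value
    return grid


everywhere = [[True] * 5 for _ in range(3)]
nowhere = [[False] * 5 for _ in range(3)]

down = tr_trigger(cross(1.0), everywhere)   # centre pixel flagged +1
kept = tr_trigger(cross(1.0), nowhere)      # not flagged: region is unpolluted
up = tr_trigger(cross(-1.0), nowhere)       # centre pixel flagged -1
```

With this reading, a single grid pixel above the threshold never triggers w_TR, and positive residues over regions with w_pol = 1 are left untouched.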

Total weight
The total weight of each satellite pixel is defined as the product of the individual weighting factors:

$$w := w_\mathrm{pol} \cdot w_\mathrm{cld} \cdot w_\mathrm{TR}$$

(i.e., the logarithms as shown in Fig. 2a-c are simply added, resulting in Fig. 2d). The a priori pollution weight can still be recognized in the global pattern but is significantly modified by w_TR (further reducing the overall weight over, e.g., the USA and China) and w_cld, which competes with the pollution weights < 1. In some regions (e.g., west of the Great Lakes, Scotland, or the Himalayas) the cloud weight shifts the initially low w_pol to a net weight > 1.
The concept of the combination of different weighting factors is easily extendible by further weights, e.g., based on fire or flash counts in order to account for NOx emissions from biomass burning or lightning.
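A short numerical sketch shows how the factors compete. The function name and the example values are illustrative assumptions; only the product rule itself is taken from the text.

```python
import math


def total_weight(w_pol, w_cld, w_tr=1.0, *extra):
    """Product of the individual weighting factors.

    `extra` stands for optional further weights (e.g. fire- or flash-based),
    which is a hypothetical extension and not part of the described version.
    """
    w = w_pol * w_cld * w_tr
    for factor in extra:
        w *= factor
    return w


# Hypothetical pixel over a polluted region (w_pol = 0.01) that is fully
# covered by a mid-level cloud (w_cld = 100): the two factors cancel.
w = total_weight(0.01, 100.0)

# Equivalent view in log space, as in the map representation: logs add up.
log_w = math.log10(0.01) + math.log10(100.0) + math.log10(1.0)
```

Working in log space makes it easy to see which factor dominates a given pixel, since each factor contributes an additive offset in orders of magnitude.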

Weighted convolution
Global daily maps of the stratospheric column are derived by applying "weighted convolution", i.e., a spatial convolution which takes the individual weights for each satellite pixel into account. This approach is an extension of the "normalized convolution" presented in Knutsson and Westin (1993). Weighted convolution smooths and interpolates the stratospheric field at the same time. A similar approach was used by Leue et al. (2001), who applied the fitting errors of NO2 SCDs as single weights.
The algorithm is implemented as follows.
- A lat/long grid is defined with 1° resolution. Each satellite pixel is sorted into the matching grid pixel according to its center coordinates. At the jth latitudinal/ith longitudinal grid position, there are K satellite pixels with the total columns V_ijk (k = 1 ... K) and the weights w_ijk. We define

$$C_{ij} := \sum_{k=1}^{K} w_{ijk}\, V_{ijk}$$

and

$$W_{ij} := \sum_{k=1}^{K} w_{ijk}.$$

In the case of measurement gaps (i.e., K = 0), both C_ij and W_ij are set to 0. The weighted mean VCD for each grid pixel is then given as

$$V_{ij} = \frac{C_{ij}}{W_{ij}}$$

for K > 0 and undefined for K = 0 (gaps).
- A convolution kernel (CK) G is defined (see below). Spatial convolution is applied to both C and W (taking the dateline into account appropriately, i.e., i = 1 and i = 360 are adjacent grid pixels):

$$\overline{C} = G \otimes C, \qquad \overline{W} = G \otimes W.$$

- The smoothed stratospheric VCD for each grid pixel as derived from weighted convolution is then given as

$$V_{\mathrm{strat},ij} = \frac{\overline{C}_{ij}}{\overline{W}_{ij}}.$$

We illustrate this procedure for a simple 1-D example in the Supplement (Sect. S2.3 and Fig. S3).
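The steps listed above can be sketched for a single latitude band, i.e., a periodic 1-D ring of longitude cells. This is a simplified illustration and not the STREAM code: it uses a truncated 1-D Gaussian instead of the 2-D kernel, the helper names are made up, and the weights and columns in the example are hypothetical.

```python
import math


def gaussian_kernel(sigma, half_width):
    """Unnormalised, symmetric Gaussian on offsets -half_width..half_width.

    Normalisation is unnecessary because it cancels in the final ratio.
    """
    return [math.exp(-0.5 * (d / sigma) ** 2)
            for d in range(-half_width, half_width + 1)]


def circular_convolve(values, kernel):
    """Convolve a periodic sequence; the first and last cells are neighbours.

    The kernel is assumed symmetric, so no flipping is needed.
    """
    n, half = len(values), len(kernel) // 2
    return [sum(kernel[j] * values[(i + j - half) % n]
                for j in range(len(kernel)))
            for i in range(n)]


def weighted_convolution(columns, weights, kernel):
    """columns[i] / weights[i]: per-pixel V* and w in grid cell i ([] = gap)."""
    c = [sum(w * v for v, w in zip(vs, ws)) for vs, ws in zip(columns, weights)]
    w_sum = [sum(ws) for ws in weights]
    c_smooth = circular_convolve(c, kernel)
    w_smooth = circular_convolve(w_sum, kernel)
    # Cells with no weighted support within kernel reach stay undefined.
    return [cs / ws if ws > 0 else float("nan")
            for cs, ws in zip(c_smooth, w_smooth)]


# Four longitude cells: clean pixels at 3 CDU, one polluted pixel (9 CDU)
# carrying a very low weight, and one cell without any measurement.
columns = [[3.0], [3.0, 9.0], [], [3.0]]
weights = [[1.0], [1.0, 0.001], [], [1.0]]
v_strat = weighted_convolution(columns, weights, gaussian_kernel(1.0, 1))
```

In this example the gap is filled from its neighbours and the polluted pixel barely shifts the estimate, so all four cells end up within a few thousandths of 3 CDU.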
The degree of smoothing is determined by the definition of the CK G, which is defined as a 2-D Gaussian in STREAM v0.92 with the longitudinal/latitudinal variances σ_ϕ² and σ_ϑ², respectively. Generally, information on the stratospheric column over polluted regions should be taken from clean measurements at the same latitude. Thus, σ_ϕ has to be sufficiently large, while σ_ϑ has to be low, as gradients in the latitudinal dimension should be mostly conserved. For high latitudes, however, the longitudinal extent of the CK has to be small enough as well in order to be able to resolve the strong gradients caused by the polar vortex.
In order to meet these requirements, we implement the convolution in the following way:
1. Two CKs are defined in order to meet the different requirements for polar vs. equatorial regions (see Fig. S4). Note that the difference of the CKs, which are defined on a regular degree grid, is even more drastic in kilometer space.
3. The final stratospheric VCD is defined as the weighted mean of both, depending on latitude ϑ. By this method, spatial smoothing is wide enough at the equator (needed to interpolate, e.g., the stratosphere over central Africa) but small enough at the polar vortex.
In the latitudinal direction, this procedure can cause small, but systematic, biases when stratospheric NO2 shows significant latitudinal gradients on scales of σ_ϑ or smaller. To overcome this, STREAM provides the (default) option to run the weighted convolution on "latitude-corrected" VCDs; i.e., the mean dependency of V* on latitude is (1) determined (again over the Pacific), (2) subtracted from all individual V_ijk, and (3) added again to the stratospheric estimate from weighted convolution. By this procedure, latitudinal gradients are largely removed for the convolution (but not from the final stratospheric fields), and the systematic biases vanish (as shown in Sect. S2.3).
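The effect of the latitude correction can be demonstrated with a toy smoother. In the sketch below, a simple three-point meridional mean stands in for the weighted convolution, and the latitudinal reference profile is assumed to be known exactly; both are simplifications for illustration, and all names are hypothetical.

```python
def meridional_mean(grid):
    """Toy smoother: average each cell with its latitudinal neighbours.

    Stands in for the weighted convolution; edge rows are clamped.
    """
    n = len(grid)
    out = []
    for j in range(n):
        rows = [grid[max(j - 1, 0)], grid[j], grid[min(j + 1, n - 1)]]
        out.append([sum(col) / 3.0 for col in zip(*rows)])
    return out


def latitude_corrected_estimate(v_star, lat_profile, smooth):
    """Subtract the latitudinal profile, smooth the anomalies, add it back.

    v_star[j][i]  : gridded columns, row j = latitude band
    lat_profile[j]: mean column of the reference region in latitude band j
    smooth        : callable mapping a 2-D grid to a smoothed 2-D grid
    """
    anomalies = [[v - lat_profile[j] for v in row]
                 for j, row in enumerate(v_star)]
    smoothed = smooth(anomalies)
    return [[a + lat_profile[j] for a in row]
            for j, row in enumerate(smoothed)]


# Hypothetical field that depends on latitude only, with a curved profile.
profile = [1.0, 2.0, 4.0, 8.0]
field = [[p, p, p] for p in profile]

direct = meridional_mean(field)                                    # biased
corrected = latitude_corrected_estimate(field, profile, meridional_mean)
```

Smoothing the raw field distorts the curved latitudinal profile, whereas smoothing the anomalies leaves it intact, because the gradient never enters the smoother.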

Data processing
STREAM estimates stratospheric fields and tropospheric residues for individual orbits, using NO2 measurements of the dayside of the orbit. Note that the effect of changes of local time on stratospheric NO2 across orbit is generally low (see Sect. S2.4) and is thus neglected within STREAM. For each orbit under investigation, the orbit itself plus the seven previous and subsequent orbits (corresponding to about ±12 h in time, or ±180° in space (longitude), for the investigated satellite instruments in polar sun-synchronous orbits) are used for the calculation of V*, weighting factors, and thus V_strat via weighted convolution. For the daily means presented in this study, all orbits where the orbit start date matches the day of interest are averaged.
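The orbit selection can be expressed as a small helper. This is a sketch under the assumption that orbits are identified by consecutive integer indices; the function name and its parameters are hypothetical. The same helper also covers a variant that uses only past orbits, such as the near-real-time mode described next.

```python
def orbits_for_estimate(orbit_index, n_before=7, n_after=7):
    """Indices of the orbits entering the weighted convolution.

    Defaults follow the text: the target orbit plus seven previous and
    seven subsequent orbits. Indices below 0 are dropped.
    """
    first = max(orbit_index - n_before, 0)
    return list(range(first, orbit_index + n_after + 1))


# Default mode: 15 orbits centred on orbit 100.
default_window = orbits_for_estimate(100)
# Past-only variant: the target orbit plus the 14 preceding orbits.
past_only_window = orbits_for_estimate(100, n_before=14, n_after=0)
```

Both windows contain 15 orbits, so the amount of input data is the same and only its position in time relative to the target orbit differs.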
Alternatively, STREAM can be run in "near-real time" (NRT) mode, in which the 14 past, but no future, orbits are included in the weighted convolution. We discuss the performance of STREAM NRT for the example of GOME-2 in Sect. 5.2.
STREAM v0.92, implemented as a MATLAB script at MPI-C, requires about 10 s for processing one orbit of OMI data on a normal desktop computer (3.4 GHz). Time-consuming steps are, in about equal parts, the sorting of the satellite pixels onto the global grid (V_ijk) and the convolution process, while the time needed for the calculation of weighting factors is negligible.
Several UV/vis satellite instruments provide column measurements of atmospheric NO2. Table 1 summarizes the characteristics of the instruments and provides references to the data products used in this study, from which the total NO2 SCD, the stratospheric AMF, and the cloud fraction/cloud top height are taken as input for STREAM. Below we provide details on the satellite characteristics and the data sets used in this study, starting with OMI (as STREAM was optimized for OMI within TROPOMI verification) and GOME-2, followed by older instruments with particular challenges such as poor spatial coverage (SCIAMACHY) or resolution (GOME).

OMI
In this study we mainly focus on OMI for two reasons.
1. OMI provides daily coverage with small ground pixels. While this already results in a high number of available satellite pixels per day (> 10⁶), the number of clouded pixels matching the requirements to cause a high w_cld is also high (more than 10⁵ pixels have a w_cld > 5).
2. STREAM is the STS verification algorithm for TROPOMI. Algorithm testing within TROPOMI verification and comparisons to the TROPOMI prototype algorithm are performed based on actual OMI measurements.
STREAM basically requires V* (= S/A_strat) as input. For OMI, we use the level 2 "OMNO2" data product (version 3) provided by NASA (Bucsela et al., 2013) and labeled as "Standard Product 2" (SP2) therein, which provides de-striped NO2 SCDs and stratospheric AMFs.² In addition, quality proxies are used to exclude dubious measurements (like those affected by the "row anomaly"³). Information on CRF and CP, which is needed for the calculation of w_cld, is also provided by the OMNO2 v003 HDF files, based on the "improved OMI O2-O2 cloud algorithm" OMCLDO2 (Bucsela et al., 2013).
The NASA v003 product also involves an STS algorithm based on an MRSM. The resulting tropospheric residues of STREAM and NASA v003 are compared and discussed in detail in Sect. 5.1.2.
In addition to the NASA product, we also extract the DOMINO (version 2) level 2 data as provided by TEMIS, for two purposes.
2. DOMINO provides TM4 model profiles of NO 2 (needed for the calculation of DOMINO tropospheric AMFs).Here, we use the TM4 data in order to construct synthetic total columns of NO 2 for performance tests of STREAM (see Sect. 3.3).
Both OMI products are based on the same spectroscopic analysis; i.e., both start with the same NO 2 SCD. Note that this SCD is biased high by about 1 CDU due to shortcomings in the spectral retrieval (see van Geffen et al. (2015) and references therein). Recent algorithm refinements have removed this bias (van Geffen et al., 2015; Marchenko et al., 2015), but updated NASA or TEMIS products are not available yet. However, such an overall bias will be interpreted as a stratospheric feature by STREAM and thus does not affect its performance (the same holds for the operational NASA and TEMIS STS algorithms). Still, the resulting TRs are expected to decrease slightly as the bias decreases for larger SCDs (Marchenko et al., 2015, Fig. 3 therein).

GOME-2
The GOME-2 instruments on the Metop-A and B satellites provide a time series of almost 10 years with the perspective of continuation until 2025 due to the upcoming instrument on Metop-C. GOME-2 provides a good spatial coverage with moderate satellite ground pixel size. We applied STREAM to total NO 2 columns from the operational product (GDP 4.7), as provided by DLR in the framework of the Ozone Satellite Application Facilities (O3M SAF), for Metop-A.
The operational product uses an MRSM for STS (Valks et al., 2011, 2015) as well. We compare the results of STREAM and the GDP 4.7 algorithm in Sect. 5.2.

SCIAMACHY
STREAM was applied to the SCIAMACHY VCDs retrieved at MPI-C Mainz (Beirle et al., 2010a; Beirle and Wagner, 2012). While OMI provides daily global coverage, the coverage of SCIAMACHY is rather poor (only about one-sixth of the Earth per day), and ground pixels are larger than for OMI (except for swath edges). Consequently, the number of total (about 60 000) and cloudy (about 4000) pixels per day is also much lower than for OMI. Thus, SCIAMACHY can be considered an extreme test case for the performance of STREAM.
One reason for the poor spatial coverage of SCIAMACHY is the measurement mode alternating between nadir and limb geometry. This, however, provides the unique SCIAMACHY feature of a direct measurement of the stratospheric column. We thus compare the TR resulting from STREAM to the MPI-C SCIAMACHY product based on LNM (Beirle et al., 2010a), using the MPI-C retrieval scheme for NO 2 concentration profiles from limb measurements (Kühl et al., 2008) (Sect. 5.3).

GOME
GOME was the first nadir-viewing spectrometer in the UV/vis spectral range with a spectral resolution enabling DOAS analyses. Due to large ground pixel size (320 km across track), only a low number of (total as well as clouded) satellite pixels per day is available. We nevertheless included GOME in this analysis in order to investigate to what extent STREAM can be applied within homogenized retrievals for multiple satellite instruments, as planned within the QA4ECV (Quality Assurance for Essential Climate Variables) project. We apply STREAM to the VCDs provided by TEMIS (Boersma and Eskes, 2004) and compare the resulting TRs to a simple RSM (Sect. 5.4).

Model data
For comparisons, and for the calculation of synthetic total columns for performance tests of STREAM, we make use of stratospheric NO 2 as provided by the ECHAM5/MESSy Atmospheric Chemistry (EMAC) model, which is a modular global climate and chemistry simulation system (Jöckel et al., 2006, 2010, 2016).
We use the results from simulation RC1SD-base-10a of the ESCiMo (Earth System Chemistry integrated Modelling) project as detailed by Jöckel et al. (2016).Here, only basic information on this specific simulation is summarized.
The model results were obtained with ECHAM5 version 5.3.02 (Roeckner et al., 2006) and MESSy version 2.51 at T42L90MA resolution, i.e., with a spherical truncation of T42, corresponding to a quadratic Gaussian grid of approx. 2.8° by 2.8° in latitude and longitude, and 90 vertical hybrid pressure levels up to 0.01 hPa. The dynamics of the general circulation model was nudged by Newtonian relaxation towards ERA-Interim reanalysis data (Dee et al., 2011).
Simulation RC1SD-base-10a was selected from among the various ESCiMo simulations for several reasons: a. it has been nudged to reproduce the "observed" synoptic situations; b. its stratospheric resolution is, with 65 levels, finer compared to other simulations from the ESCiMo project; c. the simulated total column and tropospheric partial column ozone compare well with observations (Jöckel et al., 2016); and d. the precursor emissions from the land transport sector are most realistic in comparison to other simulations.
In conclusion, this simulation represents the state of the art in terms of numerical simulation of the atmospheric chemistry. Moreover, the applied nudging technique allows a direct comparison with observational data, since the simulated meteorological situation corresponds to the observed.
Specifically for this study, the submodel SORBIT (Jöckel et al., 2010) was used to extract NO 2 mixing ratios along the sun-synchronous orbit of the Aura satellite, thus matching the local time of OMI observations. Stratospheric VCDs were calculated by vertical integration of the modeled NO 2 mixing ratios between the tropopause height (as diagnosed according to the WMO definition based on lapse rate equatorwards of 30° north/south and as iso-surface of 3.5 PVU potential vorticity poleward of 30° latitude) and the top of the atmosphere.
In this study, we make use of the modeled stratospheric columns for two purposes.
1. We perform a simple model-based STS for comparison.
To remove systematic biases between satellite measurements and EMAC, a latitude-dependent offset is determined in the Pacific and corrected for globally, similarly to Richter et al. (2005) and Hilboll et al. (2013). We refer to this EMAC-based STS as STS EMAC and apply it to OMI data (Sect. 5.1.3).
2. Stratospheric VCDs from EMAC are used to construct a synthetic data set of total NO 2 VCDs for performance tests of STREAM (see next section).

Synthetic VCD
We test the performance of STREAM on synthetic VCDs, which allows a quantitative comparison of the estimated TR to the a priori "truth". The input to STREAM, i.e., the synthetic total column V *, is constructed from modeled stratospheric VCDs (EMAC) and tropospheric VCDs (TM4), combined with measured cloud properties and the respective tropospheric AMFs from OMI as provided in the DOMINO NO 2 product.
Synthetic TRs are given as T * = V trop ×A trop /A strat (compare Eq. 4).Synthetic total columns V * are then calculated as V strat + T * (Eq. 3) and fed into STREAM.The resulting fields of stratospheric VCDs and the respective TRs can then be compared to the a priori "truth".Synthetic V strat , TVCD, and T * are displayed in Fig. S7 for 2 selected days.
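As an illustration of this construction, the following minimal Python sketch evaluates T * = V trop × A trop / A strat and V * = V strat + T * for a single pixel, and the error of an estimated TR against the a priori "truth". It is not the STREAM code (which is a MATLAB script); the function names and the example numbers are hypothetical.

```python
def synthetic_total_column(v_strat, v_trop, a_trop, a_strat):
    """Return (T*, V*) for one pixel; all VCDs in the same unit (e.g. CDU)."""
    if a_strat <= 0:
        raise ValueError("stratospheric AMF must be positive")
    t_star = v_trop * a_trop / a_strat  # synthetic tropospheric residue
    v_star = v_strat + t_star           # synthetic total column fed to the STS
    return t_star, v_star


def residue_error(v_star, v_strat_est, t_star_true):
    """Error of the estimated TR; equals true minus estimated stratospheric VCD."""
    t_star_est = v_star - v_strat_est
    return t_star_est - t_star_true


# Illustrative numbers only (not taken from the paper).
t_true, v_star = synthetic_total_column(v_strat=3.0, v_trop=2.0,
                                        a_trop=1.0, a_strat=2.5)
error = residue_error(v_star, v_strat_est=3.1, t_star_true=t_true)
```

A stratospheric estimate that is too high by 0.1 CDU thus yields a TR that is too low by the same amount.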

Algorithm performance
In this section we analyze the performance of STREAM.As the true stratospheric VCD is not known, the error of any STS algorithm is not easily accessible.Still, the STS performance can be evaluated based on the properties of the resulting TR: in remote regions without substantial NO x emissions, T * should generally be low but still positive (about 0.1 CDU; Valks et al., 2011).Also the variability of T * over both space and time should be low in regions free of tropospheric sources.
Below, we investigate the characteristics of T * from STREAM (Sect. 4.1) and its dependency on a priori settings (Sect. 4.2) for OMI measurements. In addition, the error of T * is quantified based on synthetic data (Sect. 4.3). Application of STREAM to other satellite instruments and the comparisons between STREAM and other STS algorithms are provided in Sect. 5.

Performance of STREAM for OMI compared to RSM
Figure 3 displays the OMI daily mean VCD V * (top) as well as the respective stratospheric field from RSM (second row) and STREAM (third row) for 1 January (left) and 1 July 2005 (right), respectively.The overall latitudinal as well as longitudinal dependencies are clearly reflected in the stratospheric fields, while small-scale stratospheric features are lost by the spatial convolution.Figures 4 and 5 display the resulting TRs, respectively, for both daily (top) and monthly (bottom) means.Figure 6 summarizes the daily and monthly statistical properties of TR, i.e., the median as well as 10th/90th and 25th/75th percentiles (light/dark bars) for different regions (see Fig. S8 for an illustrative sketch of the meaning of the percentile bars, as well as the definition of regions).
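The regional statistics summarized in such percentile bars can be sketched as follows. This is only an illustration of the quantities named above (median, 10th/90th and 25th/75th percentiles); the linear interpolation between ranks is an assumption, as the percentile convention used for Fig. 6 is not stated here, and the function names are hypothetical.

```python
def percentile(sorted_vals, p):
    """Percentile p (0..100) of an ascending list, linear interpolation."""
    if not sorted_vals:
        raise ValueError("no data")
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def residue_statistics(t_star_values):
    """Median and 10/25/75/90th percentiles of the TRs of one region."""
    vals = sorted(t_star_values)
    return {p: percentile(vals, p) for p in (10, 25, 50, 75, 90)}
```

The spread of T * in a region can then be expressed, for instance, as the difference between the 90th and the 10th percentile.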
Overall, spatial patterns of TR are similar for RSM and STREAM, in particular the enhanced values reflecting tropospheric pollution over, e.g., the USA, central Africa, or China. However, RSM reveals several artifacts of both enhanced as well as systematically negative TR as a consequence of the simple assumption of zonal invariability of stratospheric NO 2 . For instance, on 1 January 2005, VCDs over northern Canada are lowered due to the polar vortex (Fig. 3 top left). Consequently, the simple RSM results in negative T * RSM down to −0.7 CDU (Fig. 4). In contrast, T * RSM over northeastern Russia is quite high (> 0.5 CDU). This pattern is slightly reduced but still present in the monthly mean (see the statistics of T * RSM for high latitudes in Fig. 6). This artifact is largely reduced by STREAM (Fig. 5 top left). The spread of T * at high latitudes is more than 3 times lower than that of T * RSM (Fig. 6). Also for July, systematic structures showing up in T * RSM (in polar regions, but also in the Indian Ocean at 30-60° S) are largely reduced in STREAM.
Over the Pacific, T * RSM is, by construction, 0 on average. T * from STREAM is systematically higher by about 0.1 CDU (Fig. 6). This results from the emphasis of clouded pixels used for STREAM, which directly reflect the stratospheric rather than the total VCD. This additional advantage of STREAM over RSM is further discussed below (Sect. 5.6).
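The behavior of a simple RSM described above can be made explicit with a short sketch: the stratospheric column of each latitude band is taken as the mean V * in a Pacific reference sector and assumed to be zonally invariant. The sector bounds and all names below are assumptions chosen for illustration, not the settings used in this study.

```python
def rsm_residue(v_star, lons, pacific=(-180.0, -140.0)):
    """Simple reference sector method.

    v_star: list of rows (one per latitude band), each a list over lons.
    Returns the tropospheric residues on the same grid.
    """
    residues = []
    for row in v_star:
        # Reference-sector pixels of this latitude band.
        ref = [v for v, lon in zip(row, lons)
               if pacific[0] <= lon <= pacific[1]]
        if not ref:
            raise ValueError("no reference-sector pixels in this band")
        v_strat = sum(ref) / len(ref)  # zonally invariant estimate
        residues.append([v - v_strat for v in row])
    return residues
```

By construction, the mean residue in the reference sector is 0, and any longitudinal variation of the stratosphere is transferred to the TR.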
As both RSM and STREAM generally assume stratospheric patterns of NO 2 to be smooth, i.e., do not resolve longitudinal variations at all (RSM) or on scales < σ ϕ (STREAM), the small-scale variations in daily total VCDs (Fig. 3 top) are transferred to the TR, resulting in "patchy" daily TRs ranging from about −0.1 up to +0.4 CDU in remote regions (10th-90th percentiles).In the monthly means, however, these patchy structures have mostly vanished (both for RSM and STREAM), as the spatial patterns of different days at variable locations cancel each other out.The remaining systematic patterns in monthly mean T * have generally larger spatial scales and are within 0 up to +0.25 CDU in remote regions.
On 1 July, a band of enhanced V * shows up around 20-30° S, where (a) V * is higher in the Indian Ocean compared to the Pacific and (b) the structure of enhanced V * is "tilted" in the Pacific (see Fig. 3 top right); i.e., the RSM assumption of zonal invariance is not fulfilled. Consequently, the RSM results in extended horizontal structures ("stripes") of low/high-biased T * over South America and the Indian Ocean, respectively, ranging from −0.5 up to almost 1.0 CDU (Fig. 4 top right). Again, temporal averaging reduces the amplitude, but systematic patterns of about ±0.4 CDU remain in the monthly mean (Fig. 4 bottom right).
As STREAM also assumes a weak variation of V strat with longitude, in particular at low latitudes, the artifacts in T * are very similar to those of T * RSM at 20-30° S. Note that this artifact is particularly strong in July 2005 (as compared to 2010; see Sect. 5.1.4).
In Sect. 5, TRs from STREAM are investigated for other satellite instruments and compared to other STS algorithms, and the advantages and limitations of STREAM are discussed further.
[...] definition and convolution settings. We performed runs of STREAM with one-by-one modifications of each parameter and compared the results to the baseline setting. Overall, the effects of a priori settings on T * have been found to be rather small (of the order of 0.1 CDU), and the STREAM results are thus robust with respect to the parameters chosen in v0.92. Below, we summarize the main findings of the performed sensitivity studies. Figures and details are provided in the Supplement.

Impact of cloud weight
The cloud weight w cld was varied (a) by setting it to 1 (i.e., not accounting for cloud properties at all), (b) by increasing w cld by a factor of 10 for clouded pixels, (c) by including high-altitude clouds in the calculation of w cld, and (d) by including low-altitude clouds in the calculation of w cld.
a. When no w cld is applied, the tropospheric estimate over the Pacific is ≈ 0, as for the classical RSM, instead of about 0.1 CDU for the baseline. This difference corresponds to the order of the tropospheric background of NO 2 . Over potentially polluted regions, however, the difference to the baseline is larger (0.2 CDU). Here, the stratospheric estimate is additionally biased high due to missing supporting points over continents.
b. The "high w cld" scenario is achieved by modifying Eq. (6a) from w cld := 10^(2 × w c × w p) to w cld := 10^(3 × w c × w p); i.e., w cld is increased by a factor of 10 for cloudy pixels of mid-altitude but stays unchanged for cloud-free pixels. In this scenario, measurements over clouds by far dominate the stratospheric estimate, yielding lower V strat, and thus higher T *, compared to the baseline. However, the difference is very small (< 0.05 CDU). In addition, the variability of T * is generally slightly higher in case of a 10-fold increased w cld.
c. When high-altitude clouds are included in the calculation of w cld, the resulting TR hardly changes at all, indicating that the impact of lightning NO x on NO 2 satellite observations is generally small.
d. The inclusion of low-altitude clouds has almost no effect as well, as expected over clean regions. Over potentially polluted regions, however, it is expected that low-altitude clouds result in increased total columns V * as soon as there is significant NO 2 above or within the cloud, causing high tropospheric AMFs. Consequently, V strat is expected to be biased high, and T * biased low over potentially polluted regions, when low clouds are included in the calculation of w cld. This effect was indeed found, but the absolute change is rather small (< 0.1 CDU in winter, almost 0 in summer). This weak dependency on the inclusion of low-altitude clouds probably results from the conservative definition of w pol, which is already very low over regularly polluted regions.
Following the argument that cloudy observations provide a direct measurement of the stratospheric column, a higher cloud weight would be expected to be more favorable and to result in higher tropospheric background over the Pacific. This is indeed observed for OMI. For other satellite instruments, however, results are somewhat contradictory (see Sects. 5.3, 5.4, and 6). Thus, the definition of w cld in Eq. (6) is kept as a compromise in order to have common algorithm settings across different satellite platforms.
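The baseline and "high w cld" scenarios of Eq. (6a) differ only in the factor in the exponent, which the following sketch makes explicit. The factors w c and w p are treated here as given inputs that are assumed to lie between 0 and 1; their definitions are not reproduced, and the function and parameter names are hypothetical.

```python
def cloud_weight(w_c, w_p, exponent_scale=2.0):
    """Cloud weight w_cld = 10 ** (exponent_scale * w_c * w_p).

    exponent_scale = 2 corresponds to the baseline form of Eq. (6a),
    exponent_scale = 3 to the "high w cld" scenario.
    w_c and w_p are assumed to lie in [0, 1].
    """
    if not (0.0 <= w_c <= 1.0 and 0.0 <= w_p <= 1.0):
        raise ValueError("w_c and w_p are expected in [0, 1]")
    return 10.0 ** (exponent_scale * w_c * w_p)


baseline_cloudy = cloud_weight(1.0, 1.0)                  # fully weighted pixel
high_cloudy = cloud_weight(1.0, 1.0, exponent_scale=3.0)  # 10 times larger
cloud_free = cloud_weight(0.0, 1.0, exponent_scale=3.0)   # unchanged, equals 1
```

For w c × w p = 1 the weight increases by a factor of 10, while pixels with w c × w p = 0 keep a weight of 1 in both scenarios.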

Impact of convolution
In STREAM, two different CK are applied, yielding two stratospheric estimates, and the final V strat is calculated as weighted mean of both (see Sect. 2.3 and Eqs. 15 and 16). We tested the impact of the choice for CK by applying both the polar ("narrow") and equatorial ("wide") CK globally. The narrow CK, and thus the potential range of influence of satellite pixels with high weights, is limited to about 2 × σ ϕ = 20° in longitude. This potentially leads to biases over continents caused by spatial interpolation. Thus, the resulting T * is (too) low over central Africa. Overall, median T * over potentially polluted regions is lower compared to the baseline settings by about 0.1 CDU.
For wide CK, however, the longitudinal gradients at high latitudes are not resolved anymore.Consequently, the spatial variability of daily T * at high latitudes is increased by a factor of 2. We conclude that our choice of the combined CK for high and low latitudes is a good compromise for realizing weighted convolution.
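The effect of the kernel width can be explored with a one-dimensional sketch of a weighted convolution along one latitude circle, in which each pixel contributes in proportion to its weight and to a Gaussian kernel in longitude. This is a simplified illustration under stated assumptions (one dimension, periodic in longitude, Gaussian kernel truncated at three standard deviations); it does not reproduce the two-dimensional CK of STREAM nor the combination of both estimates in Eqs. (15) and (16), and all names are hypothetical.

```python
import math


def gaussian_kernel(sigma_deg, step_deg, half_width_sigmas=3.0):
    """Discrete Gaussian kernel on a regular longitude grid."""
    n = int(half_width_sigmas * sigma_deg / step_deg)
    return [math.exp(-0.5 * (k * step_deg / sigma_deg) ** 2)
            for k in range(-n, n + 1)]


def weighted_convolution(values, weights, kernel):
    """Periodic weighted mean sum(K*w*V) / sum(K*w) for every grid cell."""
    n = len(values)
    half = len(kernel) // 2
    out = []
    for i in range(n):
        num = den = 0.0
        for k, g in enumerate(kernel):
            j = (i + k - half) % n  # wrap around the date line
            num += g * weights[j] * values[j]
            den += g * weights[j]
        out.append(num / den if den > 0 else float("nan"))
    return out


# 10 degree grid; sigma = 10 degrees corresponds to the narrow CK above.
narrow = gaussian_kernel(sigma_deg=10.0, step_deg=10.0)
```

A pixel with weight 0 (e.g., a polluted pixel) then does not influence the estimate, and its own stratospheric value is interpolated from its neighbors; a wider kernel draws on more distant supporting points at the cost of longitudinal resolution.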

Impact of latitude correction
When the initial correction of the latitudinal dependency of V * over the Pacific is omitted, the resulting TR reveals global stripes with negative values around the equator and maxima (≈ 0.5 CDU) at about 30° N/S, both in winter and summer.

Impact of the number of considered orbits
In STREAM baseline settings, for each orbit, stratospheric estimation is based on the previous and subsequent seven orbits, corresponding to full global coverage for OMI.Switching this parameter to either ±14 or ±3 orbits has almost no impact on the resulting TR.
In case of NRT application of STREAM, no subsequent orbits are available, and the previous 14 orbits have to be considered.This setup also results in essentially the same T * statistics (compare Sect. 5.2).
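The two orbit selections can be written down compactly. The sketch below assumes that orbits are identified by consecutive integer numbers; the function and parameter names are hypothetical.

```python
def orbit_window(current, n_side=7, nrt=False):
    """Orbit numbers entering the stratospheric estimate of one orbit.

    Baseline: n_side previous and n_side subsequent orbits plus the
    current one. NRT: the current orbit plus 2 * n_side previous orbits.
    """
    if nrt:
        return list(range(current - 2 * n_side, current + 1))
    return list(range(current - n_side, current + n_side + 1))
```

Both modes use the same number of orbits (15 for the baseline setting of seven); only their position relative to the current orbit differs.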

Impact of tropospheric residue weight
In STREAM v0.92, one iteration for w TR is applied. When w TR is omitted, the spread of T * slightly increases for high latitudes. A second iteration does not yield a further improvement. Lowering the threshold in Eq. (7) from 0.5 to 0.3 CDU results in a slightly lower spread of T * at high latitudes in summer.

Impact of pollution weight
The impact of pollution weight is investigated by multiplying w pol (where different from 1, compare Fig. 2a) by 0.1 ("low w pol ") or 10 ("high w pol ").In the first case, the resulting pollution weight over most continents is below 0.01, while in the second case it is increased to 1 (meaning that w pol is switched off) except for industrialized pollution hotspots.
In remote regions, the change of pollution weight has almost no impact.In potentially polluted regions, the impact is only moderate as well.Low w pol does not differ much from the baseline, as the latter already assigns rather low weighting factors to potentially polluted pixels; a further decrease by factor 0.1 thus does not change much.
Only for high w pol can a significant change of TR be seen; in this case, the inclusion of more partly polluted observations causes a high bias in the stratospheric estimate and the resulting TRs are biased low by almost 0.1 CDU in winter.

Performance for synthetic data
In order to estimate the uncertainties of the STREAM stratospheric estimate (and thus tropospheric residues), we apply the algorithm to synthetic input data, as defined in Sect. 3.3, for which the "true" stratospheric fields and TR are known. Again, a simple RSM is applied as well for comparison. Figure 7 displays the statistics of the error of T *, i.e., the difference ΔT * between estimated and a priori TR, which equals the difference between the true and the estimated stratospheric VCD, for different regions. The spatial patterns of ΔT * are shown in the Supplement (Fig. S20).
Over the Pacific, RSM results in TR biased low by 0.1 CDU.With STREAM, the bias is reduced (0.05 CDU) but not completely removed.On 1 January 2005, T * shows a variability of almost 0.4 CDU (10th to 90th percentile) for both algorithms.This is mainly caused by the small-scale structures of stratospheric NO 2 in EMAC over the Pacific, in particular at southern latitudes (see Fig. S7), which are resolved by neither STREAM nor RSM.The respective spatial variability of the monthly mean, however, is much lower (about 0.1 CDU).
Again, the simple RSM results in large biases and high variability of T * at high latitudes, which are largely reduced by STREAM.
Overall, the agreement of a priori and estimated T * from STREAM is very good, in particular for monthly means.Remaining systematic biases are about −0.1 CDU over potentially polluted regions; i.e., resulting TRs are slightly underestimated, as expected due to the general approach of using total column measurements as proxy for the stratospheric estimation.
The application of STREAM to synthetic data thus provides a valuable estimate of the algorithm's accuracy. One might think that using the synthetic data for optimizing the definition of weighting factors is the next step forward. However, we refrain from doing so due to some contradictory results for different instruments. Concretely, the remaining bias in TR for synthetic data of about 0.1 CDU could be further reduced by increasing w cld. This, however, has adverse effects on SCIAMACHY and GOME results (see Sects. 5.3 and 5.4).

Comparison to other algorithms and discussion
In this section, we apply STREAM to different satellite instruments, compare the results to various existing STS algorithms, and discuss the challenges, limitations, and uncertainties of STS in general and STREAM in detail.

OMI
As shown in the previous section, STREAM as applied to OMI data generally shows a good performance (Figs. 5 and 6). The systematic artifacts of a simple RSM, such as the large variability of T * at high latitudes, are largely removed by STREAM. In addition, the application of w cld emphasizes cloudy observations which directly reflect the stratospheric column. Mean T * over the Pacific is thus not 0 anymore as in RSM, and an additional correction for the tropospheric background is not required in STREAM.
The sensitivity of STREAM to a priori parameters has been found to be small. Remaining monthly mean TRs in clean regions and their variability are of the order of 0.1 CDU.
Below, we compare the OMI results for 2005 to other algorithms, i.e., the operational DOMINO (Sect. 5.1.1) and NASA (Sect. 5.1.2) data products as well as a simple model-based correction using EMAC (Sect. 5.1.3). Figure 8 summarizes the statistics of regional T * from the different algorithms. Note that only coincident measurements where all four data products exist are included in Fig. 8 in order to allow for a meaningful comparison; in particular, high latitudes in hemispheric winter are skipped, as DOMINO data are not provided for SZA > 80°. Thus, the statistics for STREAM slightly differ from those shown in Fig. 6.

Comparison to DOMINO
STREAM is part of the TROPOMI verification activities. The operational TROPOMI ("prototype") algorithm for STS of NO 2 (van Geffen et al., 2014) was developed by KNMI, based on the DOMINO data processor for OMI (Boersma et al., 2007, 2011). The STS therein is done by assimilating the satellite measurements in the CTM TM4 (Dirksen et al., 2011).
For TROPOMI verification, we compare STREAM results for OMI to the respective DOMINO product as shown in Fig. S21. On a daily basis, "patchy" patterns of enhanced as well as negative TR show up over remote regions (Fig. S21), which result from the dynamical features already present in total VCDs (Fig. 3) combined with the respective dynamics prognosed by the model; spatial mismatch of these patterns can easily cause biases of the estimated TRs in both directions. Interestingly, some patterns look even reversed as compared to STREAM (Fig. 5), for instance southeast of South Africa (around 50° S, 50° E). In the monthly means, these patches again are mostly canceled out. Mean regional TRs (Fig. 8) are very similar between STREAM and DOMINO. However, the variability of T * is slightly higher for DOMINO, in particular at high latitudes, as well as in the Pacific and in remote regions in July.
Figure 9 displays the differences of the monthly mean TR between DOMINO and STREAM for January and July 2005. Overall, the differences are quite small (below ±0.1 CDU for 65 % of the world between 60° S and 60° N). Nonetheless, the monthly means reveal systematic regional deviations of more than ±0.3 CDU (for less than 3 % of the world).
In January, TRs over East Asia at high latitudes are systematically higher for STREAM.This is probably related to an underestimation by DOMINO, as the DOMINO TRs are very low and partly negative in this region.Over North America, TRs from STREAM are higher than from DOMINO at the east coast but vice versa over western Canada.In both cases, the lower TR is slightly negative, indicating an overestimation of V strat from DOMINO/STREAM at the east/west coast, respectively.
In July, the TR reveals a "stripe"-like structure at about 30° S, as already discussed in Sect. 4.1. In DOMINO, similar bands of enhanced tropospheric residue are found around 30° S, in particular in the Indian Ocean. As the amplitude and width of these bands are different for STREAM and DOMINO, this feature is most striking in the difference map; TRs around 30° S are generally higher for DOMINO.
DOMINO reveals some patches of systematically enhanced TRs that are not observed by STREAM and thus show up in the difference map as well (west of the USA, west of the Sahara, Himalaya).Reasons for these regionally enhanced TRs (and thus low-biased stratosphere) have to be investigated in future studies.

Comparison to NASA
The official OMI NO 2 product provided by NASA uses an MRSM for STS as well, as described in Bucsela et al. (2013).Daily and monthly maps of TR from NASA (OMNO2 v003/SP2) are shown in Fig. S22.
The NASA STS corrects for the tropospheric background based on a "fixed model estimate" (Bucsela et al., 2013).Consequently, TRs are about 0.1 to 0.3 CDU over clean regions throughout the world.
TRs from NASA are impressively smooth even on a daily basis.This results from the STS algorithm which, over clean regions, interprets the difference between the total column and the (small) modeled tropospheric column as stratospheric column whenever the quotient of the modeled tropospheric slant column and stratospheric AMF (matching our definition of T * ) goes below a threshold of 0.3 CDU.Thus, at southern high latitudes in July (completely classified as unpolluted in the NASA algorithm), the TR is almost 0 ± 0, i.e., shows no variability at all (compare Fig. 8) just by construction, as all the variability present in the total column was assigned to the stratospheric column (compare Fig. 3).
While this is probably a reasonable procedure over completely clean regions, we would like to point out the following.
1. The smoothness of NASA TR over oceans is not surprising, as it is reached by construction. In particular, the smooth patterns of TR over oceans allow no conclusion on the NASA STS performance over polluted continental regions, where TRs are based on interpolated stratospheric fields, just as in STREAM.
2. The NASA procedure of assigning the total column variability in clean regions completely to the stratospheric estimate also removes any cloud dependency of the TR, which affects applications such as profile retrievals by cloud slicing (e.g., Belmonte Rivas et al., 2014).
3. The NASA procedure runs the risk of labeling episodical NO 2 transport events over oceans (Zien et al., 2014) as stratospheric pattern.Bucsela et al. (2013) perform an automatic "hotspot" identification and elimination scheme to avoid this.Nonetheless, on 1 January, a NO 2 transport event can be seen in the total VCD east of Canada (Fig. 3 top left) which is similar to the "meteorological bomb" described in Stohl et al. (2003).This event is clearly visible in T * from STREAM (Fig. 5 top left) but only weakly in T * from NASA (Fig. S22 top left).The reason for this discrepancy is that the local enhancement of NO 2 is partly classified as a stratospheric feature in the NASA product, as illustrated in Fig. S23 (left).
Figure 10 displays the differences of the monthly mean TR for January and July 2005. Again, overall agreement is very good: in January, both products agree within 0.1 CDU for 69 % of the Earth and within 0.3 CDU anywhere. In July, agreement within 0.1/0.3 CDU is found for 64 %/94 % of the Earth (for latitudes below 60°), respectively. Again, the band at 30° S sticks out in the difference map as discussed above. Highest deviations of up to 0.5 CDU, however, are observed over the Sahara. Within the NASA STS, the Sahara is masked out completely, as the high albedo and low cloud fractions result in high tropospheric AMFs, such that even low tropospheric VCDs could contribute significantly to the total column. In STREAM, however, large parts of the Sahara are treated as unpolluted and are assigned with w = 1. A close check of the stratospheric estimates from STREAM and NASA over the Sahara reveals that the large deviation probably results from both a high-biased V strat by STREAM and a low-biased V strat by NASA (see Fig. S23 right).
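Agreement fractions of this kind can be computed from a gridded difference map as sketched below. The sketch assumes a regular latitude-longitude grid and weights each grid cell by the cosine of its latitude, so that the result is an area fraction; whether the percentages quoted above are area-weighted in this way is an assumption, and the function name is hypothetical.

```python
import math


def area_fraction_within(diff, lats, threshold, max_abs_lat=60.0):
    """Area fraction of grid cells with |difference| <= threshold.

    diff: list of rows (one per latitude in lats), values in CDU,
          None for missing data.
    Cells poleward of max_abs_lat are ignored.
    """
    inside = total = 0.0
    for row, lat in zip(diff, lats):
        if abs(lat) > max_abs_lat:
            continue
        w = math.cos(math.radians(lat))  # relative area of a grid cell
        for d in row:
            if d is None:
                continue
            total += w
            if abs(d) <= threshold:
                inside += w
    if total == 0.0:
        raise ValueError("no valid grid cells")
    return inside / total
```

Called with thresholds of 0.1 and 0.3 CDU on a monthly mean difference map, this yields the kind of percentages given above.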

Comparison to STS EMAC
We have used the stratospheric 3-D mixing ratios provided by EMAC in order to perform a simple model-based STS, similar to Hilboll et al. (2013). First, the latitude-dependent offset between EMAC and OMI VCDs is estimated over the Pacific (when a multiplicative adjustment is performed, results hardly change). Second, the offset-corrected stratospheric NO 2 VCDs are used for global STS. No additional correction for the tropospheric background is performed, such that the mean TR over the Pacific is 0 by construction. Daily and monthly maps of TR from STS EMAC are shown in Fig. S24. Daily maps reveal patches of TR from −0.3 CDU up to 0.4 CDU resulting from mismatches in actual and modeled stratospheric dynamics. In the monthly mean, these fluctuations largely cancel out. Overall, variability (10th-90th percentiles) of T * in remote regions was found to be about 0.3-0.4 CDU, similar to that for DOMINO.
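The two steps of this model-based STS can be sketched as follows for gridded data. The sketch assumes an additive offset per latitude band, determined as the mean difference between measured and modeled columns in a Pacific sector; the sector bounds and all names are hypothetical and not taken from the actual implementation.

```python
def model_based_residue(v_star, v_model, lons, pacific=(-180.0, -140.0)):
    """Model-based STS with a latitude-dependent additive offset.

    v_star, v_model: lists of rows (one per latitude band) over lons,
    holding measured total and modeled stratospheric VCDs.
    """
    residues = []
    for obs_row, mod_row in zip(v_star, v_model):
        # Step 1: offset between measurement and model over the Pacific.
        diffs = [o - m for o, m, lon in zip(obs_row, mod_row, lons)
                 if pacific[0] <= lon <= pacific[1]]
        if not diffs:
            raise ValueError("no Pacific pixels in this latitude band")
        offset = sum(diffs) / len(diffs)
        # Step 2: offset-corrected model column is the stratospheric estimate.
        residues.append([o - (m + offset)
                         for o, m in zip(obs_row, mod_row)])
    return residues
```

As no tropospheric background is added, the mean residue over the Pacific sector is 0 by construction, in line with the description above.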
Figure 11 displays the differences of the monthly mean TR for January and July 2005. The overall negative values over ocean are a result of the neglect of the tropospheric background in STS EMAC. Besides this, the most striking features are
1. positive deviations (i.e., TR from STS EMAC being higher than from STREAM) over North America and Eurasia in January (up to 0.45 CDU, north from 35 [...]
The systematic deviations north from 35° N (1 and 2) are caused by the longitudinal dependency of stratospheric NO 2 from EMAC which differs from the pattern in total column (see Fig. 3). In detail, stratospheric NO 2 over Siberia is quite low in EMAC, resulting in high-biased TR (similar as for RSM) and indicating that the mean longitudinal dependency of stratospheric NO 2 is not fully reproduced by EMAC. Deviations in July over Sahara and southern Asia (3), however, are at least partly caused by a low bias of T * from STREAM as discussed in the previous section.
Overall, deviations are moderate, and STS EMAC still improves the statistics of TR for high latitudes as compared to a simple RSM.It thus might be considered as a simple alternative STS with the advantage that it can be expected to work with the same performance for any satellite instrument, independent of spatiotemporal coverage.

OMI after row anomaly
In 2005, OMI measurements were performed with good instrumental performance, providing daily global coverage. This has changed since summer 2007, when radiance measurements of poor quality regularly occurred at particular cross-track positions ("row anomaly"). We thus also tested STREAM on OMI data after the onset of the row anomaly: Figs. S25 and S26 show T * for 2010. While the daily maps reveal gaps due to the exclusion of measurements affected by the row anomaly, the monthly mean patterns as well as the statistical properties are comparable to the results for 2005. The row anomaly thus does not impact the performance of STREAM (or DOMINO or NASA retrievals).

GOME-2
STREAM has been applied to GOME-2 (Metop-A) data for the year 2010.The resulting daily and monthly mean maps are shown in Fig. S27.Again, statistical properties are summarized in Fig. 12.
The overall performance of STREAM, i.e., median and variability range of TR, is generally similar to that found for OMI. However, while OMI TRs are about 0.1 CDU over the Pacific, lower values (0.05 CDU) are found for GOME-2. This might be related to differences of cloud statistics due to pixel size, in particular a lower number of fully clouded pixels for GOME-2, as well as differences in local time, cloud products, or systematic spectral interferences caused by clouds in either retrieval algorithm.
On 1 July 2010, GOME-2 was operated in narrow swath mode, causing poor global coverage. This, however, does not affect STREAM performance.
On 15 January 2010, STREAM results in extraordinarily high TR over the ocean (Fig. S28), which turned out to be caused by a solar eclipse (Espenak and Anderson, 2008).Removing the affected orbit results in normal performance for this day.We recommend that screening of solar eclipses be done automatically (as done for OMI) before running any STS algorithm.

Comparison to NRT mode
STREAM is foreseen to be implemented in an update of the operational GOME-2 data processor as operated in the framework of the O3M SAF. This requires a slight modification of STREAM in order to work on NRT data.
In STREAM v0.92, the stratospheric fields are estimated for each orbit based on the total column measurements, including seven previous and seven subsequent orbits. In NRT, however, no subsequent orbits are available. Thus, STREAM has to be operated on the current plus 14 previous orbits instead.
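The two orbit windows can be sketched as follows. This is a minimal illustration, assuming that orbits are numbered consecutively; the function name `orbit_window` and its parameters are hypothetical and not part of STREAM.

```python
def orbit_window(current: int, mode: str = "baseline", half_width: int = 7) -> list[int]:
    """Return the orbit numbers contributing to the stratospheric estimate.

    Assumption: orbits carry consecutive integer numbers. `half_width=7`
    mirrors the seven previous and seven subsequent orbits named in the text.
    """
    if mode == "baseline":
        # Symmetric window: 7 previous + current + 7 subsequent orbits.
        return list(range(current - half_width, current + half_width + 1))
    if mode == "nrt":
        # No subsequent orbits exist yet: current + 14 previous orbits.
        return list(range(current - 2 * half_width, current + 1))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(orbit_window(100, "baseline"))
    print(orbit_window(100, "nrt"))
```

Both windows contain 15 orbits, so the amount of input data per stratospheric estimate is unchanged; only its position relative to the current orbit shifts.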
We ran STREAM in NRT mode. The resulting maps are shown in Fig. S28, and the statistics of TR are included in Fig. 12. The deviations between baseline and NRT are marginal. Thus, STREAM can be operated in NRT with stable performance.

Comparison to operational product (GDP 4.7)
In the current operational data processor (GDP 4.7), STS for NO 2 is done by an MRSM as described in Valks et al. (2011, 2015). Basically, polluted regions (defined by monthly mean TVCDs from the MOZART-2 model being larger than 1 CDU) are masked out. Global stratospheric fields are derived by low-pass filtering in zonal direction by a 30° boxcar filter.
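A masked zonal boxcar of this kind might be sketched as below for a single latitude band on a regular longitude grid. The function name, the grid layout, and the treatment of masked cells (they are simply left out of the running mean) are assumptions for illustration; the actual GDP 4.7 implementation is described in Valks et al. (2011) and may differ.

```python
import numpy as np


def masked_zonal_boxcar(total, model_tvcd, width_deg=30.0, threshold=1.0):
    """Estimate the stratospheric column along one latitude band.

    total      : total columns on a regular, global longitude grid (CDU)
    model_tvcd : model tropospheric columns on the same grid (CDU)
    Cells with model_tvcd > threshold are masked; the remaining cells are
    averaged with a periodic boxcar of roughly `width_deg` degrees.
    """
    total = np.asarray(total, dtype=float)
    keep = (np.asarray(model_tvcd, dtype=float) <= threshold).astype(float)

    cell_deg = 360.0 / total.size
    half = int(round(width_deg / cell_deg / 2))
    kernel = np.ones(2 * half + 1)  # odd length, so the width is approximate

    def zonal_sum(x):
        # Periodic padding, because longitude wraps around the globe.
        return np.convolve(np.pad(x, half, mode="wrap"), kernel, mode="valid")

    num = zonal_sum(total * keep)
    den = zonal_sum(keep)
    with np.errstate(divide="ignore", invalid="ignore"):
        # NaN where the whole boxcar is masked (assumption: no gap filling).
        return np.where(den > 0, num / den, np.nan)
```

Subtracting the returned field from `total` gives the tropospheric residue for the band. Any moderately polluted cell that stays below the threshold enters the running mean, which is the mechanism behind the low bias of GDP TR discussed below.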
Figure S30 displays daily and monthly mean maps of T * in January and July 2010. The respective regional statistics are included in Fig. 12.
Overall, TRs from GDP are relatively low. Over the Pacific, mean T * is close to 0 in January, despite the applied tropospheric background correction of 0.1 CDU. Over potentially polluted regions, median TR from GDP is systematically lower (by 0.2 CDU in July) than from STREAM, and almost a quarter of all TRs are even negative.
Figure 13 displays the differences of the monthly mean TR from GDP 4.7 and STREAM for January and July 2010, again pointing out the systematically lower values of GDP TR over continents in July. The systematic low bias of GDP TR probably results from moderately polluted pixels over regions labeled as "unpolluted", which still might imply MOZART-2 TVCDs of up to 1 CDU. These measurements cause a high bias of the estimated stratospheric field around polluted regions; by the subsequent low-pass filtering, this high bias is passed over the polluted regions and results in low-biased TR. Further investigations are needed to find out why this effect is stronger in July than in January.

SCIAMACHY
We have applied STREAM to SCIAMACHY VCDs from the MPI-C NO 2 retrieval (Beirle et al., 2010a). The resulting daily and monthly mean maps for 2010 are shown in Fig. S31. Regional statistics are provided in Fig. 14, compared again to the simple RSM and, additionally, to the results of LNM.
Though SCIAMACHY provides poorer daily spatial coverage, STREAM overall still works well. Again, a clear reduction of the variability of T * is found at high latitudes as compared to RSM. Over the Pacific, mean TR from STREAM is higher than for the RSM (= 0) but, similar to GOME-2, not as high as for OMI. Again, this could be related to the low number of cloudy satellite pixels and spec-
ple RSM. Figure 16 displays the regional TR statistics for GOME in January and July 1999. The respective maps are provided in the Supplement (Fig. S34).
Overall, STREAM yields reasonable results for GOME as well. However, some systematic biases are observed: over the Pacific, TRs from STREAM were found to be negative, which can only be explained when the measured columns for cloudy pixels are higher than for cloud-free pixels; over potentially polluted regions, T * from STREAM is systematically lower than from RSM (by 0.2 CDU in July). This might be a consequence of the applied cloud weight, which obviously has different effects on GOME than on OMI.
This explanation would be consistent with previous findings: while Leue et al. (2001) base the STS on cloudy pixels, Wenig et al. (2004) switched the Heidelberg STS to cloud-free pixels after noticing that GOME columns are higher instead of lower over clouds. Wenig et al. (2004) relate this to the contribution of lightning NO x. However, as (a) the impact of lightning NO x on satellite observations is generally small (Beirle et al., 2010b) and (b) lightning activity over the remote Pacific used for the RSM is very weak, we rather suspect that a different effect is responsible for this finding, most probably related to the specific instrumental features of GOME (Burrows et al., 1999), in particular the dichroic mirrors causing polarization-dependent spectral structures. It might thus be worth re-checking the DOAS analysis of NO 2 for GOME for spectral interferences related to clouds. A second possible effect, which might in particular contribute to the large discrepancy over polluted regions, is that cloud properties are averaged over the large GOME ground pixel; i.e., in an extreme case, low and high cloud layers, which would both be skipped in w cld if resolved by the satellite pixel, might yield, on average, an effective cloud height with a high w cld. Any tropospheric pollution within (or directly above) the low cloud layer would then bias high the stratospheric estimate and bias low the TR.
Future instruments

TROPOMI
TROPOMI (Veefkind et al., 2012) on S5p will be launched in 2016. Instrumental setup and spatial coverage are similar to OMI, but TROPOMI will provide a better spatial resolution of 7 × 7 km 2 at nadir.
STREAM was developed as a verification algorithm for TROPOMI STS and was tested and compared to the TROPOMI prototype algorithm based on OMI measurements (see above). Though no TROPOMI measurements are available yet, it can be expected that the performance of STREAM on TROPOMI will be even better than for OMI, because, due to the better spatial resolution, more individual satellite pixels are available and among them a higher fraction of clouded pixels. Thus, more sampling points over potentially polluted regions will be available, further decreasing interpolation errors.

Sentinel 4 (S4)
The satellite instruments investigated so far are all operated in low, sun-synchronous orbits, providing global coverage at fixed local time. In the near future, a new generation of spectrometers in geostationary orbits will be launched by different space agencies. Over Europe, S4 (Ingmann et al., 2012) will be the first mission providing a spectrally resolving UV/vis instrument on a geostationary satellite. The spatial coverage is focussed on Europe. Thus, no "clean" reference regions are regularly available. STREAM might overcome this problem by using clouded observations where the tropospheric pollution is effectively shielded.
We simply evaluate the expected performance of STREAM on S4 measurements by clipping OMI measurements to the area covered by S4 (as given in Courrèges-Lacoste et al., 2010). The STREAM settings are identical to v0.92, except for the a priori removal of the overall latitude dependency in the reference sector, as no Pacific measurements are available for S4. Figure 17 displays the resulting TR (top) and the difference of TR between clipped and global OMI data (bottom) for January 2005. Though tropospheric pollution over Europe and the Middle East is evident, i.e., an extended clean reference region is actually not available, STREAM is still capable of yielding an accurate stratospheric estimate. Only at the northern and southern borders are systematic biases observed, which can be caused by the overall latitudinal dependency of the stratospheric VCD and border effects of the weighted convolution and can probably be reduced by dedicated optimization of the algorithm for S4. The situation will probably improve for real S4 measurements due to the higher number of clouded pixels in S4 compared to OMI. Thus, this first check is highly encouraging to further investigate the applicability of STREAM to S4 and possible improvements.

Advantages and limitations of STREAM
STREAM was successfully applied to various satellite measurements with a wide range of spatial resolution and coverage. STREAM is an MRSM and does not need any model input. It can thus be considered as a complementary approach to data assimilation, as chosen for the TROPOMI prototype algorithm.
As (M)RSMs usually estimate the stratospheric column based on total column measurements over clean regions, they generally miss the (small) tropospheric background of the order of some 0.1 CDU. Several (M)RSMs explicitly correct for this effect based on a priori tropospheric background columns (Martin et al., 2002; Valks et al., 2011; Bucsela et al., 2013). In case of STREAM, however, cloudy pixels, which allow a direct measurement of the actual stratospheric column (except for the small tropospheric column above the cloud), are emphasized. Thus, an additional tropospheric background correction should be unnecessary. Accordingly, in case of OMI, TRs from STREAM are about 0.1 CDU over clean regions, similar to TRs from DOMINO and NASA. This is close to the a priori value chosen by Valks et al. (2011) but below the values given in Martin et al. (2002) (about 0.15-0.3 CDU, assuming a tropospheric AMF of 2) and Hilboll et al. (2013) (0.1 up to > 0.6 CDU).
In case of other satellite instruments, however, the TR over the Pacific was found to be lower (GOME-2 and SCIAMACHY) or even negative (GOME-1). The latter can only be explained by cloudy measurements being systematically higher than cloud-free measurements. Further investigations are needed to explain this discrepancy between OMI and GOME-1/2/SCIAMACHY and find how it is related to differences in the cloud products and/or the spectral analysis of NO 2.
STREAM assumes stratospheric NO 2 fields having low zonal variability, in particular at low latitudes. This is reflected by the choice of a wide convolution kernel at the equator. STREAM is thus not capable of resolving diurnal small-scale patterns caused by stratospheric dynamics. These patterns, however, largely cancel out in monthly means.
Whenever actual stratospheric fields do not match the a priori assumption of zonal smoothness, e.g., in case of "tilted" structures or actual large-scale zonal gradients like differences in the stratospheric column over Pacific and Indian ocean, the TR resulting from STREAM can show artificial "stripes". Further investigations might lead to additional sophisticated algorithm steps to remove these artifacts. However, care has to be taken that the benefit really outweighs the drawbacks (added complexity) and that no other artifacts/biases are introduced.
The dependencies of TR on STREAM parameter settings have been found to be low (of the order of 0.1 CDU). The application of STREAM on synthetic data results in deviations to the a priori truth of the same order. These deviations are systematic, i.e., the stratospheric patterns estimated by STREAM are slightly biased high, which can be expected, as they are based on total column measurements, which are always higher than the stratospheric column.
Overall, STREAM uncertainty is well within the general uncertainties of STS (see next section). Note that systematic changes of the NO 2 columns of the same order of 0.1 CDU can also result from changes of the settings for the DOAS analysis, like fit interval, inclusion of additional absorbers in the analysis, or the treatment of rotational and vibrational Raman scattering, creating overall biases as well as spatial patterns, e.g., over oligotrophic oceans (E. Peters, personal communication, 2016).

General uncertainties and challenges of stratosphere-troposphere separation
The uncertainty of STS can often not be directly quantified, as the "true" stratospheric 4-D concentration fields are not known. One approach to assess the STS performance is the
With respect to the final NO 2 TVCD product, which is higher than TR by the ratio of stratospheric and tropospheric AMFs (Eq. 4), uncertainties of this order are completely negligible over polluted regions such as the US east coast, central Europe, or eastern China. Nonetheless, a regional bias > 0.2 CDU (e.g., over Russia in January) can contribute significantly to the relative uncertainty of TVCDs aside the pollution hotspots. Thus, the uncertainty of STS has to be kept in mind in studies focusing on NO x emissions from, e.g., biomass burning or soil emissions over regions like Siberia, the Sahel, or Australia.
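The scaling from TR to TVCD, and why the same STS bias matters more away from hotspots, can be illustrated with a few lines of arithmetic. The relation TVCD = TR × A_strat / A_trop follows the statement above; the AMF values and TVCD levels below are purely illustrative assumptions, not results from this study, and the function names are hypothetical.

```python
def tvcd_from_tr(tr: float, amf_strat: float, amf_trop: float) -> float:
    """Scale a tropospheric residue (CDU) to a tropospheric VCD (CDU)."""
    return tr * amf_strat / amf_trop


def relative_sts_error(tr_bias: float, tvcd: float,
                       amf_strat: float, amf_trop: float) -> float:
    """Fraction of a given TVCD that a TR bias from the STS corresponds to."""
    return tvcd_from_tr(tr_bias, amf_strat, amf_trop) / tvcd


if __name__ == "__main__":
    AMF_STRAT = 2.5  # hypothetical stratospheric AMF
    AMF_TROP = 1.0   # hypothetical tropospheric AMF
    TR_BIAS = 0.2    # CDU, the regional bias level quoted in the text

    # Illustrative TVCD levels (CDU) for a hotspot and a moderately polluted region.
    for label, tvcd in (("hotspot", 10.0), ("moderate", 1.0)):
        share = relative_sts_error(TR_BIAS, tvcd, AMF_STRAT, AMF_TROP)
        print(f"{label}: STS bias is {share:.0%} of the TVCD")
```

With these assumed numbers the same TR bias is a minor term over the hotspot but a large fraction of the column over the moderately polluted region, which is the point made in the paragraph above.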

Other trace gases
STREAM was developed as an STS algorithm for NO 2. However, several other trace gas satellite retrievals face problems which are similar to STS from an algorithmic point of view, i.e., that a small-scale tropospheric signal has to be separated from a smooth background (e.g., caused by stratospheric columns or, in particular in case of trace gases with low optical depth, shortcomings of the spectral analysis, introducing artificial dependencies on, for example, SZA or ozone columns). Thus, the concept of weighted convolution could be used within the satellite retrievals of, for example, SO 2, BrO, HCHO, or CHOCHO, with appropriately chosen and optimized weighting factors.

Conclusions
The separation of the stratospheric and tropospheric column is a key step in the retrieval of NO 2 TVCDs from total column satellite measurements. As coincident direct measurements of the stratospheric column are usually not available (except for SCIAMACHY), current STS algorithms either use CTMs (directly or via data assimilation) or follow a modified reference sector method (MRSM) approach, where the stratospheric columns are basically estimated from total column measurements over clean regions.
We have developed the MRSM STREAM. Weighting factors determine how far individual satellite pixels contribute to the stratospheric estimate. Over potentially polluted regions (according to an NO 2 climatology), weights are lowered, whereas measurements over mid-altitude clouds are assigned a high weighting factor. Global stratospheric fields are derived by weighted convolution and subtracted from total columns to yield tropospheric residues (TRs). In a second iteration, weighting factors are modified based on the TR: high TR indicates tropospheric pollution, and the respective satellite pixels are assigned a lower weight. For systematically negative TR, however, weighting factors are increased. The concept of multiplicative weights can easily be extended by additional factors, e.g., based on fire counts in order to explicitly exclude biomass burning events.
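The two-pass scheme summarized above can be sketched in one dimension (a single latitude band, periodic in longitude). This is a simplified illustration under stated assumptions, not the STREAM implementation: the Gaussian kernel, the function names, and in particular the step-wise TR weight (0.1 above a threshold, 2 below its negative) are hypothetical stand-ins for the weighting functions defined in Sect. 2.

```python
import numpy as np


def gaussian_kernel(sigma_cells: float) -> np.ndarray:
    """Symmetric Gaussian kernel truncated at 3 sigma (assumed kernel shape)."""
    half = int(np.ceil(3 * sigma_cells))
    x = np.arange(-half, half + 1)
    return np.exp(-0.5 * (x / sigma_cells) ** 2)


def weighted_convolution(v_total, weights, kernel):
    """Normalized weighted convolution: sum(k * w * V) / sum(k * w).

    Weights must be strictly positive so that the denominator never vanishes.
    """
    half = kernel.size // 2

    def smooth(x):
        # Periodic padding, because longitude wraps around the globe.
        return np.convolve(np.pad(x, half, mode="wrap"), kernel, mode="valid")

    return smooth(weights * v_total) / smooth(weights)


def two_pass_estimate(v_total, w_pol, w_cld, kernel, tr_threshold=0.5):
    """Return (stratospheric estimate, tropospheric residue) after two passes."""
    # First pass: multiplicative a priori weights (pollution and cloud).
    w = w_pol * w_cld
    tr = v_total - weighted_convolution(v_total, w, kernel)

    # Second pass: hypothetical TR weight, lowered for high TR and raised
    # for clearly negative TR, multiplied onto the first-pass weights.
    w_tr = np.where(tr > tr_threshold, 0.1,
                    np.where(tr < -tr_threshold, 2.0, 1.0))
    v_strat = weighted_convolution(v_total, w * w_tr, kernel)
    return v_strat, v_total - v_strat
```

Because the weights are multiplicative, a further factor (for example one derived from fire counts) would enter simply as another term in the product `w_pol * w_cld`.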
STREAM results are robust with respect to variations of the algorithm settings and parameters. With the baseline settings, the errors of STREAM on a synthetic data set have been found to be below 0.1 CDU on average. STREAM was successfully applied to satellite measurements from GOME 1/2, SCIAMACHY, and OMI. The resulting TRs over clean regions and their variability have been found to be low. However, systematic "stripes" can appear in STREAM TR when the basic assumption that the stratospheric column varies smoothly with longitude is not fulfilled, e.g., in case of "tilted" stratospheric patterns.
The emphasis on clouded observations, which provide a direct measurement of the stratospheric rather than the total column, should supersede an additional correction for the tropospheric background, which successfully worked for OMI but less so for GOME and SCIAMACHY. This might be related to differences in pixel size or local overpass time, both potentially affecting cloud statistics, or differences in the cloud algorithms. However, the detailed reasons are not yet fully understood and require further investigations.
STREAM, which was developed as a TROPOMI verification algorithm, was optimized for OMI measurements. Within an O3M SAF visiting scientist project, it was also applied to GOME-2, and STREAM is foreseen to be implemented in an upcoming GDP update.
Results from STREAM were compared to the TROPOMI prototype algorithm, as represented by the DOMINO v2 product, in which STS is implemented by data assimilation. Differences between monthly mean TRs from STREAM and DOMINO are found to be low (almost 0 on average with regional patterns up to about ±0.1-0.2 CDU). A comparison to other state-of-the-art STS schemes yields deviations of similar order.
The impact of STS is thus generally negligible for TVCDs over heavily polluted regions. However, the remaining uncertainties still contribute significantly to the total error of TVCDs over moderately polluted regions and have to be kept in mind for emission estimates of area sources of NO x such as soil emissions or biomass burning.

Data availability
STREAM has been tested on NO 2 retrievals from different satellite instruments as listed in Table 1. The input data sets are publicly accessible; the respective links to the data sets are included in the references provided in Table 1.

Information about the Supplement
Additional images, tables, and text are provided in the Supplement. All references to tables and figures in the Supplement are indicated by a prefix "S". For readability, the Supplement is structured analogously to the paper; i.e., additional material to Sect. 2.3 can be found in Sect. S2.3 of the Supplement.
The Supplement related to this article is available online at doi:10.5194/amt-9-2753-2016-supplement.

Figure 1. Definition of weighting factors (a) w pol as a function of the pollution proxy P (Eq. 5), (b) w cld as a function of the cloud radiance fraction (Eq. 6) for a cloud pressure of 500 hPa, (c) w cld as a function of the cloud pressure (Eq. 6) for a cloud radiance fraction of 1, and (d) w TR as a function of the tropospheric residue (Eq. 7).

Figure 2. Maps of the weighting factors for 1 January 2005 for OMI: (a) pollution weight w pol, (b) cloud weight w cld, (c) tropospheric residue weight w TR, and (d) product of all weighting factors (Eq. 8).

Figure 3. Total OMI VCD V * (top) and the resulting stratospheric estimate V strat from RSM (second row) and STREAM (third row) for 1 January (left) and 1 July (right) 2005. Resulting V strat from other algorithms are included as well for comparison (see Sect. 5).
STREAM determines the stratospheric NO 2 VCD V strat based on weighting factors as described in Sect. 2. The resulting TRs thus depend on the weighting factor

Figure 4. OMI tropospheric residues T * based on RSM for January (left) and July (right) 2005 for the first day of the month (top) and the monthly mean (bottom).

Figure 6. Regional statistics of OMI tropospheric residues T * from RSM and STREAM for January (top) and July (bottom) 2005. Light and dark bars reflect the 10-90th and 25-75th percentiles, respectively. The median is indicated in white. Narrow bars show the statistics for the first day of the month, wide bars those of the monthly means (see also Fig. S8 left for illustration). The regions are defined in Fig. S8 right. "High latitudes" refer to the respective hemispheric winter only.

Figure 7. Regional statistics of the error of T * from STREAM, i.e., the difference of estimated and a priori TR (based on synthetic total columns as defined in Sect. 3.3).

Figure 8. Regional statistics of OMI tropospheric residues T * from different STS algorithms for January (top) and July (bottom) 2005. Note that the values for STREAM slightly differ from Fig. 6, as here only coincident satellite pixels of STREAM, DOMINO, and NASA are included.

Figure 10. Monthly mean difference of tropospheric residues T * from NASA and STREAM for OMI measurements in January (top) and July (bottom) 2005.

Figure 11. Monthly mean difference of tropospheric residues T * from STS EMAC and STREAM for OMI measurements in January (top) and July (bottom) 2005.

Figure 12. Regional statistics of GOME-2 tropospheric residues T * from different algorithms for January (top) and July (bottom) 2010. Conventions as in Fig. 6.

Figure 16. Regional statistics of GOME tropospheric residues T * from different algorithms for January (top) and July (bottom) 1999. Conventions as in Fig. 6.

Figure 17. Performance of STREAM on "S4 data" (i.e., OMI measurements clipped to the area covered by S4) for January 2005. The top panel displays the resulting TR, the bottom panel shows the difference to the TR resulting from full OMI data as shown in Fig. 5. The area covered by S4 in winter has been taken from Courrèges-Lacoste et al. (2010).

usage of synthetic data, as in Sect. 4.3. In addition, the TR can be used to evaluate the plausibility of the stratospheric estimate and to derive realistic uncertainties:
- Negative TRs are nonphysical. Thus, the occurrence of on average negative T * (exceeding the values/frequencies explainable by noise) clearly indicates a positive bias of the estimated stratosphere.
- Tropospheric background columns over regions free of NO x sources are expected to have low spatiotemporal variability. Thus, the observed variability of T * over clean regions serves as proxy of the uncertainty (precision) of the STS.
From different algorithms (MRSMs as well as model-based methods), typical variabilities of T * over remote regions are about 0.5 CDU for daily means and about 0.2-0.3 CDU for monthly means. For a simple RSM, much higher values (≈ 1 CDU) are found at high latitudes.
Systematic biases (accuracy) of STS can be estimated from the intercomparison of TRs from different algorithms. Figure 18 displays the standard deviation of monthly mean TR from the algorithms shown in Fig. 8 and discussed in Sect. 5.1, i.e., two different, independent MRSM approaches (STREAM and NASA) as well as two STS based on models, a simple one (STS EMAC) and a complex data assimilation setup (DOMINO). Note that the upper range of the color bar was lowered to 0.3 CDU. Overall, the standard deviation of TR from different STS is low (typically < 0.1 CDU and < 0.2 CDU for most parts of the world). It is thus consistent with the uncertainty estimates of stratospheric columns given in literature (Boersma et al. (2011): 0.15-0.25 CDU (SCD); Valks et al. (2011): 0.15-0.3 CDU (VCD); Bucsela et al. (2013): 0.2 CDU (VCD)) and with the magnitude of systematic deviations found in the study on synthetic data (Sect. 4.3).
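The inter-algorithm spread shown in Fig. 18 amounts to a per-grid-cell standard deviation across the monthly mean TR maps. A minimal sketch follows, assuming that all maps are already on a common grid and that the population standard deviation (NumPy's default, `ddof=0`) is used; the text does not specify this choice, and the function name is hypothetical.

```python
import numpy as np


def inter_algorithm_spread(tr_maps: dict) -> np.ndarray:
    """Per-cell standard deviation of monthly mean TR across algorithms.

    tr_maps maps an algorithm name to a 2-D array (lat x lon, CDU).
    NaN marks cells without data and is ignored for the affected algorithm.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in tr_maps.values()])
    return np.nanstd(stack, axis=0)  # ddof=0 is an assumption
```

A call such as `inter_algorithm_spread({"STREAM": a, "DOMINO": b, "NASA": c, "STS_EMAC": d})` returns one map with the same shape as the inputs.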

Figure 18. Standard deviation of monthly mean T * from different algorithms (STREAM, DOMINO, NASA, and STS EMAC) for January (top) and July (bottom) 2005 (OMI).

Table 1. UV/vis satellite instruments compared or discussed in this study.
a On Metop-A. A second GOME-2 instrument was launched in 2012 on Metop-B, and a third is planned to be launched on Metop-C in 2018.
b Switched to 40 × 40 km 2 for GOME-2/Metop-A in Metop-A and Metop-B tandem operation.
c At nadir.
d Reduced coverage after row anomaly in 2007; see http://projects.knmi.nl/omi/research/product/rowanomaly-background.php.
e Geostationary orbit: hourly coverage over Europe.
in the Supplement displays monthly mean ratios

Table 2. Terms and abbreviations related to STREAM used in this study.