Quantification and parameterization of non-linearity effects by higher order sensitivity terms in scattered light Differential Optical Absorption Spectroscopy

Abstract. We address the application of Differential Optical Absorption Spectroscopy (DOAS) of scattered light observations in the presence of strong absorbers (in particular ozone), for which the absorption optical depth is a non-linear function of the trace gas concentration. This is the case, because Beer-Lambert law generally does not hold for scattered light measurements due to many light paths contributing to the measurement. While in many cases linear approximation can be made, for scenarios with strong absorptions non-linear effects cannot always be neglected. This is especially the case for observation geometries, 5 for which the light contributing to the measurement is crossing the atmosphere under spatially well separated paths differing strongly in length and location, like e.g. in limb geometry. In these cases, often full retrieval algorithms are applied to address the non-linearities requiring iterative forward modelling of absorption spectra involving time consuming wavelength by wavelength radiative transfer modelling. In this study, we propose to describe the non-linear effects by additional sensitivity parameters that can be used e.g. to build up a look up table. Together with widely used box air mass factors (effective light 10 paths) describing the linear response to the increase in the trace gas amount, the higher order sensitivity parameters eliminate the need for repeating the radiative transfer modelling when modifying the absorption scenario even in presence of a strong absorption background. While the higher order absorption structures can be described as separate fit parameters in the spectral analysis (so called DOAS fit), in practice their quantitative evaluation requires good measurement quality (typically better than that available from current measurements). Therefore, we introduce an iterative retrieval algorithm correcting for the higher or15 der absorption structures not yet considered in the DOAS fit as well as the absorption dependence on temperature and scattering processes.

The article is structured as follows: In Sect. 2 we introduce an approximation for the radiative transfer equation considering higher order OD terms while Sect. 3 generalize DOAS considering these terms. Section 4 introduces an approach to approximate the effective light paths of any order as function of wavelength and trace gas absorption, which is practical to quickly calculate absorption spectra and the first order effective light paths. Sect. 5 discusses the application of the higher order OD terms in the retrieval algorithms. First, the application of Taylor series DOAS (Puķīte et al., 2010) is studied, and the method 5 is extended by considering the higher order OD terms of different trace gases. In addition, an iterative two step approach is described and investigated in sensitivity studies. Sect. 6 investigates the application of the new retrieval scheme for the retrieval of BrO vertical concentration profiles from satellite limb measurements by SCIAMACHY (Bovensmann et al., 1999) and compares the retrieved profiles with correlative balloon measurements (Dorf et al., 2006(Dorf et al., , 2008Rozanov et al., 2011). Finally, Sect. 7 draws some conclusions. The intensity (in our study the Sun normalized radiance) as observed by a detector depends on the paths the photons took in the atmosphere until reaching the detector, each with a certain probability to take place. The intensity is basically the mean of this probability. The path integral formulation of the radiative transfer equation (compare Marquard et al., 2000) in a discrete summation form is expressed: The probability w T i of the path i to take place depends on its geometry and the optical properties along it (e.g. a path with many scattering events or with high trace gas absorption along it will have low probability). The paths not originating from the Sun disk solid angle nor matching the instrument's aperture have zero weight in this definition. For the reason of simplicity, the limit expression and summation boundaries along i will be skipped in the following.

20
One can modify the probability or weight w T i describing it as a product of the weight without absorption w i and the absorption probability along the path i (this is possible because Beer-Lambert law holds along one single path). The equation becomes: l ij describes the pathlength of the ith trajectory through the box j (if the box is not crossed by a path, the length for the path 25 there is zero). β kj is the absorption coefficient of absorber k in box j, i.e.
where c kj and σ kj are number density and absorption cross-section of the absorber, respectively.
Since the Beer-Lambert law does not hold for a multiple light path ensemble, only its approximation can be applied for scattered light observations. Standard DOAS applications use its approximation up to the first order. However, for strong absorption scenarios the first order approximation is usually not applicable, which motivates us to investigate the use of higher order terms.

5
Expanding the OD τ (i.e. the logarithm of the intensity of Eq. 2) in a Taylor series (similarly as it is done by Deutschmann, 2014) with respect to the absorption coefficients β jk up to the 3rd order one obtains: Here, τ 0 , referred to in the following as scattering term, is the contribution to the OD that can be explained by scattering processes only. It equals the logarithm of the intensity I 0 which would be obtained for the atmosphere without any absorbers present (note that I 0 in this notation is not the solar irradiance). The rest, i.e. the difference between the observed optical depth τ and the scattering term is the OD caused by absorption. τ 1 , τ 2 , τ 3 approximate the absorption OD of the first, second and third order contributions to the observed absorption OD by k absorbers present in the atmosphere. We refer to them in the following 15 as first, second and third order OD terms, respectively. The first order OD terms can be also called linear OD terms while the second and third order (and even larger order) OD terms are generally referred to as higher order OD terms in the following.
The terms of a respective order are always a sum of the OD contributions of different trace gases. While the linear OD term is a sum of the first order or linear kth absorber OD contributions, the higher order OD terms contain contributions arising from the same trace gas (if k = K for the 2nd order, or k = K =k for the 3rd order OD terms, we call these contributions 20 self-correlative OD terms of absorber k) and from the combination of different trace gases (if k = K for the 2nd order, or any k, K ,k is not equal to another one, we call these contributions cross-correlative OD terms of e.g. absorbers k and K).
Note that the absorbers k, K andk belong to the same atmospheric scenario, also the boxes j, J andJ belong to one scenario spanning the whole atmosphere, therefore the summation order is arbitrary.
L j is the effective light path (of the first order) through box j describing the sensitivity of the measurement to a certain 25 atmospheric volume, in this case for an atmosphere without absorbers: Normalized by the box dimensions (usually by the height) a dimensionless functional is obtained: the well known box AMF (e.g. Wagner et al., 2007): However, the higher order OD terms of Eq. 4 introduce the new sensitivity functionals: L 2jJ and L 3jJJ . The second order effective light paths are: and the third order effective light paths are: They are weighted products of the individual light paths through two or three boxes, respectively. In general, any order effective light paths can be defined, if the Taylor series expansion of Eq. 2 is extended to higher orders. The M th order term 10 for any M boxes is: Note that similar to the effective light paths L j , the higher order effective light paths are equal for any trace gas considered in a certain atmospheric scenario.
3 Generalization of DOAS for scattered light observations 15 In the next sub-sections the validity of different approximations (linear or higher order terms) in DOAS is assessed.

Linear approximation in the standard DOAS
In order to get a DOAS like equation (as in Platt and Stutz, 2008) for scattered light observations, one basically needs to consider the Taylor series expansion of the logarithm of the intensity in Eq. 4 up to the first order as a function of wavelength within a certain wavelength interval (fit window): 20 τ (λ) = −ln(I(λ)) = τ 0 (λ) + τ 1 (λ) In the following we skip explicitely mentioning the dependence of wavelength due to the reason of readability.
The wavelength dependency of the scattering term τ 0 is usually approximated by a low order polynomial, i.e.: where λ is wavelength, P indicates the polynomial order. a p the coefficients of the different polynomial orders obtained by the fit.
τ 1 in Eq. 10 approximates the absorption OD. Considering wavelength dependent absorption cross-sections but neglecting 5 their spatial variation (assuming them being constant with temperature and pressure), the linear OD term τ 1 can be rewritten: with the slant column density (SCD) S k in the DOAS literature referred to as the concentration of absorber k integrated along the effective light path:

10
Since the Taylor series expansion is performed at the absorber concentration of zero, the SCD in the formula represents the effective light path for an atmosphere without absorption. Here it is important to note that for strong absorbers the retrieved SCDs as well as optical depths are systematically underestimated by the linear model, because longer light paths contributing to the measurement become less probable (see also discussion later). This effect can be addressed in a simple way by calculating effective light paths including a background absorption scenario (like it is done in Puķīte et al., 2006;Kühl et al., 2008) 15 when inverting SCDs into spatial concentration distributions. This is however only useful for minor absorbers because the relationship between the OD and their concentration is still almost linear. Applied to the spatial interpretation only, this simple approach, however, does still not consider spectral effects, i.e. systematic biases in the fitted SCDs due to ignoring higher order absorption structures.
Both problems were empirically addressed by a Taylor series approach (Puķīte et al., 2010) by including first order depen-20 dencies of the SCDs on trace gas absorption and scattering as additional fit parameters. They showed that an application of this approach for a strong absorber (ozone) in the UV largely decreased the fit residual structures. However, at the same time a systematic bias for weak absorbers still remained (see Figs. 12 and 13 therein). Another possibility to address the problem of scattered ligth observations with a linear method is the so called AMF modified DOAS (Richter, 1997;Diebel et al., 1995) where the product of a wavelength dependent AMF and cross-section is used instead of a simple cross-section. This method, 25 however, relies on the knowledge or assumption of atmospheric properties like e.g. trace gas and aerosol profiles.

Higher order DOAS
Instead of limiting the DOAS fit to the linear approximation (standard DOAS), also the higher order terms of Eq. 4 can generally be considered.
Like for the approximation of the wavelength dependence of τ 0 in the linear case (Eq. 11), also the wavelength dependence of the effective light paths in Eq. 4 can be approximated by polynomial functions: Here λ p , their products with the cross-sections, as well their product with the products of the cross-sections are fit parameters describing the wavelength dependency of the OD due to scattering. The retrieved quantities from the fit are the polynomial coefficients a p , SCD coefficients S pk that parametrize the wavelength dependency due to the scattering, as well as the higher 10 order SCD coefficients S pkK and S pkKk .
The retrieved SCD coefficients are associated with the trace gas concentration by: The equation allows to investigate the relationship between the SCD and the spatial distribution of an absorber at any wavelength. Additionally it is worth to notice that the SCDs correspond to an atmosphere without absorption, so theoretically 15 it is possible to calculate the effective light paths L j for such an atmosphere. This eliminates the need for a-priori profile assumtions in the calculation of effective light paths and avoids possible subsequent iterative steps. For the second and the third order SCD coefficients the relations can be written in a similar manner: From these two equations it can be basically concluded that additional information for the spatial evaluation can be derived by fitting higher order terms, because different order effective light paths have different spatial distributions as discussed later.
Note however that in this manuscript the content of this additional information is not explicitly investigated because a) a non-linear inversion algorithm would be needed, and b) the high correlations between the derived SCDs of different orders, especially when they correspond to the same trace gas. For accurate fit results, in general a very good measurement quality 5 (e.g. signal to noise ratio) would be needed.

Terms for second order approximation
In order to account for the effects caused by strong absorption scenarios, let us consider the higher order Taylor series terms of the expansion in Eq. 4. The considerations are generally valid for combinations between any absorbers, both strong or weak.
However, the impact of combinations between weak absorbers only is expected to be negligible.

10
The second order Taylor series term for any two absorbers X and Y can be split in altogether three terms: two for the respective absorbers (important for strong absorbers only) and one for an interdependency between the absorbers (can be important also for combinations between a strong and a weak absorber).
3.3.1 Self-correlative term 15 The single trace gas term for any of the trace gases is: Including the squared cross-section in the DOAS fit (Puķīte et al., 2010), the second order contribution of the absorber is accounted for.

Cross-correlative term 20
The second order Taylor series expansion term for the interdependency between two absorbers X and Y is: the linear DOAS model will be largest for weak absorbers since the ratio between the OD associated with the cross-correlative term and the OD attributed to the trace gas will be larger for a weak absorber: As an example, the Taylor series terms of different orders for a scenario with ozone (strong absorber), NO 2 and BrO (weak absorbers) characteristic for Kiruna in March (see Table 4.1 in Puķīte et al., 2010, for details)  and the inter-correlative terms between ozone and the minor absorbers with respect to their first order terms can bee seen. For ozone, also the 3rd order self correlative term shows rather high values.
The lines in Fig. 1, panels b and d, show the sum of the absorption OD terms up to a certain order while the dots indicate the true total absorption OD (logarithmic ratio between intensities simulated by the RTM without and with absorption). The overestimation of the absorption is clearly seen if only the first order (linear terms) are considered. Second and third order 15 terms improve the characterisation converging towards the truth.

Second order effective light paths and box AMFs
The second order effective light paths (and higher order terms as well), similarly as the classical first order effective light paths, depend on the specific composition of light paths for the observation geometry and are the same for any trace gas. I.e. the effective light paths only quantify the effect to the observed OD, an absorber will have if it is 'added' to the atmosphere. The 20 second order box AMF for boxes j and J are: L 2jJ (Eq. 7) and A 2jJ characterize how dependent the sensitivity is between two different places in the atmosphere, i.e.
accounts for all those light paths which cross both selected boxes. If however both or any of the boxes are not crossed, the individual contribution of the light path is zero. In the case of complete correlation, which is the case when there exist only one 25 light path (like for direct light observations), the second order effective light paths are equal to the product of the corresponding first order effective light paths: The same is of course valid also with respect to the box AMFs. In this case contributions to the observed intensity by the second order terms like in Eq. 18 (and even higher order terms) become zero and the DOAS equation becomes linear.
Let us analyse the difference between the second order effective light paths and the products of the corresponding (first order) effective light paths in more detail. This difference can in principle be used as a measure for the complexity of the observation geometry. Indeed, this difference in mathematical terms is the covariance between light paths through the two boxes. For the 5 same box (if j = J), the difference (i.e. the variance) will always be ≥ 0 because the mean of the squared light paths is larger than the squared mean of the light paths if their lengths and/or weights are not equal. For different boxes (j = J) the difference can be both positive or negative but not smaller than the negative product of the effective light paths. A negative difference will appear when the two boxes are more likely to be crossed by different than by similar light paths (e.g. different parallel light paths from the Sun towards the line of sight of the instrument). In practice the difference will be larger if boxes j and J 10 are located close to each other (with maximum if j = J) and decrease and even can become negative with increasing distance between the boxes j and J. Negative values are more characteristic for 2D or 3D simulations of the atmospheric radiative transfer where it is more likely that different light paths cross a different set of boxes; for the 1D atmosphere considered here mostly positive values occur.
For illustration 1st order effective light paths and their product between different boxes are compared with 2nd order effective 15 light paths for the same geometry in Fig. 2. Similar to first order effective light paths, also for the 2nd order effective light paths, the highest sensitivity is found around the tangent point. Interestingly there also the largest differences between the second order effective light paths and the products of the 1st order effective light paths are observed. This finding is caused by the increased scattering probability around the tangent point, which causes photons to change the direction. This also means that higher order contributions to the OD increase for absorbers located around the tangent point. The different sensitivity distributions of the 20 different order terms could in principle allow to obtain additional information about the spatial distribution of the absorber. But this possibility is not further exploited in this study.

Third order contribution
Continuing analysing the Taylor series expansion of Eq. 1 one can realize that the sensitivity at a certain region can depend not just on the sensitivity at another region but also on a third (Eq. 4) or fourth etc. region (there can be also higher order 25 self-correlation terms for the same region as well). A third order impact of the absorption of one trace gas, between two or even three trace gases can be written: Again, the corresponding products of cross-sections can be included in the DOAS fit as fit parameters. From Fig. 1a,c one can see, that for the chosen scenario, even the 3rd order self-correlative ozone term might be important to be considered in the fit.
Third order box AMFs are: Similar as the third order effective light paths L 3jJJ (Eq. 8) the third order box AMFs quantify the dependency of the sensitivity between three locations in atmosphere, i.e. they account for all light paths which cross the given three boxes. If however any of the boxes is not crossed, the individual contribution of the light path is zero.

10
In this section we discuss the applicability of fundamental concepts of DOAS for scattered light observations. We show that these fundamental concepts are not valid in a strict sense.
While the basic consequence of these findings is a possible bias of the results of the retrieval algorithms, in practice these aspects are specifically relevant only for scenarios with strong non-linearities due to absorption. For most of the current DOAS applications these considerations are not relevant since they are performed in the (almost) linear range. 15 The fundamental assumption by the standard DOAS is that it is possible to distinguish between absorption features of different trace gases. I.e. the total atmospheric absorption τ T (being the logarithm of the ratio between a measurement without and with absorption) can be written as the sum of the ODs of the individual absorbers: In order to improve the fit results sometimes differential cross-sections are applied to better differentiate between narrow 20 and broad band spectral features, but it is always assumed that each trace gas has its own 'fingerprint' not mixed with another absorber.
Looking at the higher order terms discussed before, it becomes clear that this assumption (Eq. 28) does not hold for scattered light observations, i.e. there are 'fingerprints' that cannot be attributed to one trace gas alone but are products of several absorbers. 25 It can in particular be shown that the classical definition used for OD, SCD and AMF are affected. The AMF definition used by e.g. Perliski and Solomon (1993) is based on the estimation of the intensity with and without an absorber.
Expressing both intensities according to Eq. 2 we obtain the OD related to absorber X as: where I −X means intensity without absorber X and β Xj is the absorption coefficient of the selected absorber. One might wish to investigate the contribution of each absorber to τ T in Eq. 28 separately. However looking at Eq. 28 one can immideately realize that for two absorbers X and Y : since: I.e. the OD of two absorbers (or any number of absorbers) is not equal to the sum of the individual ODs of these absorbers, since the cross-correlation between different absorbers is not considered. Consequently also the definitions of the AMF (Perliski and Solomon, 1993): and SCDs as: 15 are fundamentally inaccurate (V X is the vertical column density (VCD) of absorber X). Hence not only standard DOAS but also 'extended' applications like e.g. the AMF modified DOAS (Richter, 1997) are affected. The result in Fig. 3 is confirmed when the OD according to the classical definition is expressed including higher order terms.
The exact intensities with and without absorbers in Eq. 29 are replaced with the derivation according to Eq. 4. Considering 5 terms up to the second order, the OD is: or assuming spatially constant cross-sections: The classical OD thus consists of the absorption terms attributed to the absorber X (the first two terms on the right side of the equation) and the cross-correlative terms between X and all remaining absorbers. Summing the ODs for the individual absorbers, the cross-correlative terms are considered twice as much as in the OD calculation for the whole absorption together while the self correlative terms are considered correctly. 15 Applying Eq. 33, SCD is expressed Alternatively, according to the light path integral (LPI) definition, the SCD is defined as concentration integrated along the effective light path (e.g. Wagner et al., 2007). In the discrete notation it can be written as: with L T the effective light path obtained for the given absorption scenario: The corresponding definition of the OD would be the integration of the absorption coefficient along the effective light path: Please note that τ lX = τ cX (as defined in Eq. 29) indicating that even for a scenario with only one absorber present in the atmosphere (where the classical definition in Eq. 29 is valid), LPI definition (Eq. 39) is not accurate.
Also for the LPI definition the optical quantities can be expressed in terms of the Taylor series expansion. Expanding effective 5 light paths in Eq. 38 in Taylor series on absorption (for more details see Sect. 4.2) up to the second order, the SCD in Eq. 37 becomes: Please notice that the summation over k here also contains the self-correlative term, i.e. X ∈ k. In a similar way the OD can be approximated: While the application of the LPI definition has been limited for retrievals of weak absorbers (Marquard et al., 2000), Eq. 41 allows to quantify the disagreement for strong absorbers which arises due to overestimating the self-correlative term by two times in comparison to the classical definition (Eq. 29, see its approximation in Eqs. 34 and 35). The disagreement in principle can now be addressed in a quantitative way. The cross-correlative terms in the LPI definition are considered in a similar way 15 as in the classical formula.
In summary, both the classical and LPI definitions of the OD are not applicable to describe the total OD as the sum of the ODs of individual absorbers. In order to obtain an agreement between the sum of the approximations of the ODs τ k of individual absorbers and the Taylor series approximation of the total OD in the form of Eq. 28, one can use arbitrary factors a X,k instead to assign part of a cross-correlative OD to absorber X and factors a k,X to assign another part of it to absorbers k, 20 so that factors: for any k and with a X,k=X = 0.5. The OD definition for absorber X can then be written: Therefore the approximation of the OD in Eq. 43 can be used to define more appropriate quantities for SCD and AMF in Eqs. 32 and 33 for applications e.g. in forward modelling for profile retrievals (e.g. satellite limb or DOAS observations) like in AMF modified DOAS. The selection of factors a X,k is arbitrary and could depend on empirical considerations like the correlation properties between different fit parameters. E.g. for an absorber with more pronounced differential structures one could assign a larger factor, to account for the fact that cross-correlative structures can become assigned to the strong absorber 5 in the fit. In other words, the (self-correlative) absorption structure would largely correlate with the cross-correlative structure.

Variability of effective light paths on absorption and on wavelength
The effective light paths or box AMFs definded earlier (Eqs. 5,6,7,8,9,22,27) are calculated at a single wavelength and at no absorption (or at a given background scenario) while the DOAS fit is performed over a selected wavelength range, over which the effective light paths change both due to variability of the trace gas absorption as well as the atmospheric scattering 10 properties. Therefore it can be worth to parametrize the different effective light paths to account for such variations in the retrieval process without the need to perform RTM simulations for every wavelength within the fit window if atmospheric trace gas composition changes.

Wavelength dependency due to scattering
If the Taylor series expansion of Eq. 2 is made for and effective light paths are calculated at zero absorption, the wavelength

Dependency on absorption
By performing the Taylor series expansion with respect to absorption (considering the dependency of the weights w on absorption), effective light paths at an arbitrary absorption background can be expressed as a function of higher order effective light paths and the trace gas distributions. Assuming spatially constant cross-sections, the first order effective light path (as in Eq. 5) 25 for box J considering terms up to the third order can be written: Here trace gas concentrations c k and c K with k and K belonging to the same set are arbitrary background distributions of trace gases.
Such a parametrisation is practical, since it can be used for any wavelength chosen for the procedure described in Sect. 4.1 and for any trace gas cross-section as well as for any trace gas profile. This allows to adjust the effective light paths with small calculation effort, which can be useful for iterative Gauss-Newton-like retrievals, for which a linearisation of the forward model 5 is used and is updated after each iteration. An iterative algorithm can be implemented without the need to redo (much slower) RTM simulations. With small modification (see next subsection) also the spatial variability of cross-sections can be accounted for. at different wavelengths in the middle visible spectral range (520 -570 nm). The largest discrepancies occur for altitudes below 15 the TH, which usually only have a small effect on the measured signal. The largest contribution to the measurement is from and above the TH. At TH they exceed 2-3% only at 570 nm where the absorption OD is extremely strong (around 0.9, see Fig. 1, b), above the TH the discrepancy strongly decreases and converges towards zero few km above.

Spatial variability of cross-sections
If the true cross-section σ Tkj at a certain box j differs from the originally considered cross-section σ k , the impact on the 20 absorption will be the ratio of both R kj = σ Tkj /σ k . In this way one can apply modifications by scaling the concentration of absorber k by the cross-section ratio in the respective formulas. E.g. the SCD in Eq. 13 can be rewritten in order to account for the spatial variability of the cross-section and/or the cross-section variation on wavelength: In a similar way, i.e. by considering the the cross-section ratio, one can include modifications also in all other formulas 25 containing products of cross-section and concentration.
5 Application in retrieval algorithms

Implications for Taylor series DOAS
In comparison to standard DOAS, Taylor series modified DOAS (Puķīte et al., 2010) suggested adding higher order terms for strong absorbers (e.g. ozone) directly in the fit. Also terms to account for the wavelength dependency can be fitted. In this way the linear equation structure between the logarithm of the intensity and the absorption parameters is kept and the 5 retrieval problem can be solved by the usual least squares technique. As a result, fit residuals highly improved and the retrieval of minor absorbers became more stable. However as seen in e.g. Fig. 12 in Appendix C in Puķīte et al. (2010), higher order impacts of strong absorbers could not completely account for the variation of the OD of weak absorbers (in that case BrO) with wavelength. Even considering the variation of the BrO SCD with wavelength only hardly improved the agreement. The remaining problem is that also the cross-correlative terms between the weak absorber (BrO, NO 2 ) and the strong absorber 10 (ozone) must be considered. This can be done by adding the cross-section product of both in the fit. Although easy to implement, several reasons complicate the use of these terms for a non-iterative retrieval algorithm. They include: (1)  interpretation (e.g. for the retrieval of spatial distributions) requires a non-linear algorithm.
The OD for two absorbers considering terms up to the second order can be written.
Standard DOAS accounts only for the terms 1 and 2. Taylor series DOAS as discussed in Puķīte et al. (2010) added also term 3 (X≡O 3 ). (Terms considering the wavelength dependency caused by scattering are also added but are not shown here).
Term 4 can be neglected for weak absorbers, while term 5 shows a rather strong impact if it is ignored (see e.g. Fig. 12 bottom panel in Appendix C in Puķīte et al., 2010). However, adding it, assuming it as fully contributing to the OD of the weak absorber leads to errors by overestimating the impact of the cross-correlative terms (Sect. 3.5). The effect is not negligible with 25 respect to the OD of the weak absorber (see Eq. 21). If however the correlative term is added to the fit but not assumed to contribute to the OD of the weak absorber, the fit coefficient of term 2 of the weak absorber is basically an approximation for the SCD, which would be measured if the strong absorber X would not be present in the atmosphere. So one could in principle retrieve profile information from the retrieved SCD by performing forward modelling for an atmosphere with no absorption. In practice, however, this first order term has a rather high fit error because of the correlation with the inter correlative term and consequently the profile retrieval will give rather poor results (the retrieval error is high).
As an example, a DOAS evaluation for a simulated limb spectrum at TH=19.83 km by applying the Taylor series approach 5 is shown in Fig. 6 for the wavelength range 518-570 nm, where the ozone absorption is especially strong. The fit parameters are listed in Table 1. To improve the condition of the forward model matrix, the polynomial parameters are orthogonalised.
The same is done for the ozone parameters; they are all labeled as O 3 in the figure. 10 mio light path trajectories are used to simulate the spectrum reducing the noise below 2E-4 allowing a good distinction also for the second order cross-correlative term between ozone and NO 2 . Fig. 7a compares the fitted ODs summed according to their attribution to certain trace gases 10 or scattering in Table 1 with the corresponding values calculated according to the formulas in the right column of the table.
Good agreement is obtained. It can be seen that the OD of the second order cross-correlative term between ozone and NO 2 is approximately 10% of the OD of the first order NO 2 term -a potential error source if neglected. Additionally, the effect of spectral noise on the retrieval is investigated by adding random Gaussian noise to the simulated spectrum. The noise level was selected in a way that the fit residual increased ∼10 times (i.e. to 2E-3) in comparison to the residual for the original 15 simulation. Fig. 7b, however, reveals even higher variation (up to 5E-3) in the fitted NO 2 and cross-correlative terms. The variation can be explained by the large covariance between both terms as shown in Fig. 8 where the statistical properties (mean and standard deviation) of the fitted NO 2 ODs are plotted in comparison with the calculated OD from the effective light paths.
For the statistical analysis the spectrum with one thousand different noise spectra added is evaluated. It can be seen that (1) the random noise does not introduce obvious systematic error structures in the fit result and (2) the standard deviations of the 20 two fit terms are very similar in magnitude, but the standard deviation of their sum is reduced almost three times. In practice, therefore, it might be necessary to utilize both terms together for further retrieval steps (e.g. spatial evaluation).
To illustrate the consequences if the cross-correlative term between ozone NO 2 is not considered, Fig. 9 compares the resulting NO 2 OD when the cross-correlative term is either included or excluded from the fit with the simulated OD according to the classical formula (Eq. 29). Also the respective second order approximation for the classical definition calculated from 25 the effective light paths, i.e. the sum of the first order NO 2 and the cross-correlative term between ozone and NO 2 (compare to Eq. 35) are plotted. Again a good agreement between the simulated classical OD and the approximated OD can be seen (around 1%). It is also found that the agreement with the evaluation, where both the first order NO 2 and the cross-correlative terms are included is varying only slowly with wavelength and is less than 2%. In addition to the higher order effects, the variation might also be caused by broadband (i.e. scattering) features, the RTM behaviour (e.g. Monte Carlo (random) noise in the simulated 30 spectrum), as well as numerical effects in the fitting algorithm. If however the 1st order NO 2 term is left alone in the fit, the relative difference has a similar shape as the ozone cross-section with the amplitude one order of magnitude larger reaching 15%.

Suggested algorithms
Due to the high cross-correlation between the different order terms, the retrieval quality, among others, depends on the way how this correlation is accounted for. There are several possibilities including: (1) A quasi full retrieval approach. The absorption OD is calculated for each considered spectral point according to the higher order Taylor series expansion, and the forward model is linearised at a certain a-priori absorption scenario first. Then an 5 iterative least squares fitting technique is applied improving the knowledge about the trace gas distributions and adjusting the forward model according to the improved knowledge using the effective light paths of different orders. This method must be the most accurate but also the most time consuming as the calculations are performed for every spectral point and the inversion is performed for all spectral points and measurement geometries together. But it will be still faster than the classical full retrieval approach because no new full RTM calculations are necessary for each iterative step.

10
(2) A quasi AMF modified DOAS. Other than the method by Richter (1997) where full RTM simulations are performed for every wavelength, here AMFs are calculated utilising pre-calculated effective light paths of different orders. Also the limitation of the classical AMF definition for a multiple absorber scenario can be considered, e.g. by applying the OD definition according to Eq. 43 when calculating AMFs e.g. by Eq. 32. AMFs for the retrieved trace gases are calculated for an a-priori profile and are applied in a DOAS fit for each measured spectrum separately. Then the trace gas profile is retrieved and the procedure is 15 repeated adjusting the AMFs, e.g. with respect to the spatial variation of the cross-sections. This method is also quite accurate since it considers effects between different absorbers in the approximation of the AMFs. Similary as in the previous method, also here, in this case -AMF, calculations for every spectral point are required, but the spatial retrieval is done separately from the spectral analysis (which is used to retrieve the SCDs, the product of the fitted VCD and AMF), so the inversion requires less computational resources.

20
(3) A two step approach. In the first step, a DOAS fit is performed including all relevant higher order terms, especially also the inter-correlative terms. In the second step (spatial retrieval) a retrieval procedure is performed at a selected wavelength (e.g. a wavelengths in the center of the spectral region or where the differential absorption structures are largest, or the absorption effects smallest; with a compromise between all these aspects). Further there are two possibilities: (a) to use only the first order terms which describe the absorption as there would be no influence of absorption by the strong absorber (the absorption is 25 sorted out by the cross-correlative term) (b) to perform a non-linear spatial retrieval including the cross-correlative terms of the absorber of interest and the strong absorber. The possible correlation between the different order terms of the same absorber, however, needs to be approached iteratively and at the end a balance between (a) and (b) should be found. This method would be faster than the first two, because the effective light path calculations are limited to one or few spectral points. At the same time, the algorithm would be less accurate due to the correlations between the first order and second order terms in the DOAS 30 fit, or, in other words, it requires better measurement quality.

An iterative two step approach
While the implementation of the algorithms mentioned in Sect. 5.2 is not a subject of this study, for the following sections a fourth retrieval approach is suggested and implemented. Here the effective light paths up to the 2nd order are applied to retrieve vertical profiles. The principle is similar to the two step approach in Puķīte et al. (2010). First, a DOAS fit is performed to the measured spectra, second, the spatial information is obtained by inverting the fit results to vertical profiles. The substantial 5 change here is the implementation of an iterative process where the retrieved profiles of the relevant trace gases are used to recalculate the absorption spectra. In the subsequent iterations, the measured spectrum is corrected by the calculated spectra, the DOAS fit is performed for the difference, and the retrieved profile is adjusted by eventually arising corrections (a Gauss-Newton approach is implemented).

Spectral evaluation 10
First, a DOAS fit is performed for the simulated spectra similarly as it is done by the non-iterative algorithm described in Puķīte et al. (2010). Afterwards, the obtained SCDs S f are used to calculate the OD contribution for absorber X: where σ gX are the fitted parameters of trace gas X. Normally they are the corresponding cross-sections of the trace gas. For ozone, also the additional Taylor series terms are considered as in Puķīte et al. (2010). The calculation of the OD by Eq. 47 15 is performed at the selected wavelength at which the spatial evaluation is later performed. A wavelength of 342 nm (located between strong ozone absorption bands while close to the most pronounced BrO absorption band) is used for the UV fit window of 337-357 nm. For the visible fit window (519-570 nm), the selected wavelength is 545 nm. The trace gases X considered in the calculation and in the profile retrieval (see next subsection) are BrO, NO 2 , O 3 and NO 2 , O 3 , for the respective fit windows.

Spatial evaluation 20
Effective light paths of the first and second order pre-calculated for zero absorption are used to calculate first order effective light paths for the profiles retrieved in the previous iteration (for the initial iteration for the a-priori data) at the selected wavelength according to Eq. 44. In order to allow the usage for the selected wavelength, they are approximated by a fit with broad band wavelength function (see Sect. 4.1).
The profile is retrieved according to the Gauss-Newton optimal estimation scheme (Rodgers, 2000): starting with the a-priori information as a first guess, i.e. c Xi = c Xa . ∆τ X is the difference between τ f X and τ Xa for the retrieval of the first guess. τ Xa , which is applied only for the initial inversion, is the absorption OD calculated according to the approximation of the classical definition (Eq. 34) for the a-priori scenario at the selected wavelength. The elements of 20 the weighting function matrix K Xi are the products of the first order effective light paths and the cross-sections. Weighting functions and ODs are used instead of effective light paths and SCDs in order to enable the possibility to account for the temperature dependency of the cross-sections (important when the retrieval is applied on real measurements). E Xe is the error covariance matrix with the DOAS fit variances as its diagonal elements, the off-diagonal elements are zero. E Xa is the a-priori covariance matrix. The applied retrieval settings for the sensitivity studies are summarized in Table 3.

Iteration process
In further iterations, absorption spectra based on the first and second order absorption terms according to Eq. 4 are calculated from the profiles retrieved in the previous iteration for all wavelengths within the fit window considering the broadband approximation and possible temperature dependencies of the cross-sections. Then the DOAS evaluation is performed again for each measured spectrum corrected by subtracting the calculated absorption spectrum (referred to as corrected spectrum in the 10 following). The spatial retrieval is performed again, the difference ∆τ in Eq. 48) is now the newly obtained τ f X . Since the retrieval is converging fast, a fixed small number of iterations (2-3) is selected.

Sensitivity studies of the retrieved profiles
We apply the iterative two step algorithm to absorption spectra simulated at different THs to investigate its ability to reproduce the true profiles used for the simulated spectra and compare it with the linear algorithm where (1) no iteration for spectral 15 correction is applied and (2) a linear forward relationship between the a-priori and OD is assumed. I.e. while the spatial retrieval is performed similarly as the initial step of the iterative algorithm applying Eq. 48, τ Xa (as introduced in Sect. 5.3.2) is calculated according to the light path integral definition in Eq. 41.

Comparison of the iterative two step approach with the linear two step approach
Figs. 10 and 11 illustrate the effectiveness of the iterative two step approach to consider spectral features of the involved 20 absorbers in both the UV and visible spectral range. In both cases the ODs of the minor absorbers (as derived from the fit to the corrected spectra) are reduced to zero (for the given simulation precision) when the iterative approach is applied.
It means that the forward model considers all spectral features correlated with the fitted terms of these trace gases. Here especially the wavelength dependency and cross-correlation between different absorbers is accounted for. For ozone some remaining absorption can still be seen comparable with the magnitude of the contribution of its third order terms (compare 25 with Fig. 1). Also an almost complete removal of systematic residual structures is observed for the UV fit window; the rest is most likely attributed to higher order effects or simulation errors. For the visible spectral range, where the residual is dominated by simulation noise (although the same photon trajectory number is simulated by RTM as in the UV), small changes can be observed especially at longer wavelengths with higher ozone absorption.
The obtained vertical concentration profiles are plotted together with the true profiles in Fig. 12  were simulated for the true scenario (blue lines), and another, for which the effective light paths are calculated for the a-priori profiles from the first and second order absorption terms simulated at zero absorption background (green lines). For the iterative two step retrieval, results for the initial retrieval step (red lines) and the final results after iterative process (solid magenta) is shown. For this retrieval additional result are shown, for which the cross-correlative terms are skipped in the calculation of the correction spectra (dashed magenta line), as discussed in detail in Sect. 5.4.2. The top panel shows results for the retrieval 5 in the UV, the bottom for the retrieval in the visible. To minimise possible effects due to spatial under-sampling and a-priori constraints, also true profiles smoothed by the averaging kernels (according to Connor et al., 1994) are included in the figure.
Even plots show the relative difference between the retrieved profiles and the smoothed true profile. It should be noted, that the differences to the true (non-smoothed) profiles could in principle be caused by a contribution of the a-priori profiles to the retrieved profiles (if the effect of the a-priori on the retrieval is large). However, here the measurement response (i.e. the 10 sum of the rows of the averaging kernels) is close to unity above the altitude of ∼12-16 km (depending on involved trace gases) indicating that this effect is small. Hence the differences mentioned above are mainly caused by the smoothing alone.
The ozone profile retrieved in the UV (see two plots on the top, left) shows an overestimation of less than 4% for all retrieval methods, indicating that there is only a small non-linearity effect on the retrieval of the ozone profile itself. This finding can be explained by the fact that (a) the spatial retrieval is performed at a wavelength outside the strong ozone absorption band, (b) 15 the non-linearities are largely accounted for by the fitted Taylor series ozone terms and (c) by a good representation of OD at the selected wavelength by the fitted ozone OD. Interestingly, the iterative retrievals (even without the necessity for iterations) show good agreement (better than 1%) for altitudes between 17 and 33 km, while the linear retrievals show discrepancies of up to 2% for practically all altitudes. For altitudes below 15 km, however, the iterative retrieval shows larger discrepancies of ∼3%. The application of effective light paths simulated for the true profiles or approximated for the a-priori profiles does not 20 give significantly different results: the results agree within ∼1% range with respect to each other.
In the visible spectral range, much larger ozone absorption occurs increasing gradually with wavelength. Thus it is not possible to select a wavelength with low ozone absorption for the spatial evaluation. Therefore an overestimation of the retrieved profiles (see the two plots on the left in the bottom panel) by the linear retrieval of up to 16% is observed because the effective light path integral formalism for the OD used in the retrieval set-up overestimates the higher order contributions to the 25 absorption OD (see Sect. 3.5). At the same time, in the initial step of the iterative approach the classical OD definition (Eq. 34) is used. Thus, higher order self-correlative terms are correctly approximated in τ Xa in Eq. 48 (while the contribution of the cross-correlative terms with weak absorbers can be ignored). Good agreement (within 1%) is achieved above the ozone peak between 20 and 28 km altitude. Below and above this altitude range the concentration is under-and overestimated, respectively, by up to 4%. The iteration process leads to an improvement also for these altitudes reaching an overall agreement within 1%.

30
For weak absorbers like NO 2 , the higher order self-correlative terms are negligible, explaining that almost no difference between the linear retrieval approach and the initial step of the iterative approach (see upper third and fourth plot) is found (because both OD definitions consider cross-correlative terms similarly). The difference with respect to the truth of up to 25% for altitudes just below the ozone peak in the UV could be caused by the incorrect accounting for the cross-correlative effect with ozone (either caused by the lower ozone a-priori profile in comparison with the truth or the incorrect OD definition, see discussion in Sect. 3.5). Since, however, the retrieval utilising the effective light paths simulated for the true profiles agrees with the retrieval utilising the effective light paths approximated for the a-priori profiles, the discrepancy is not caused by the difference between the true and a-priori ozone profile. The most likely reason for the disagreement is the broad band variation of the NO 2 OD with wavelength (due to scattering), since other than for ozone, no variation of the NO 2 OD due to dependence of scattering with wavelength is assumed in the DOAS fit. And, in contrast to BrO, the NO 2 absorption features are distributed 5 rather equally along the fit window, while the selected wavelength for the spatial retrieval is more at the beginning of the fit window. The profile obtained after the iteration process, however, shows good agreement (better than ∼1%) with the truth for altitudes above 16 km. Deviations below this altitude for the retrieved NO 2 profiles are caused by the a-priori profile according to our investigation. The results for the visible (bottom, middle plots) reveal disagreement with the truth for the linear retrievals when the effective light paths are calculated for the a-priori profiles. While above the ozone profile peak an overestimation 10 of up to 6% is found, below the ozone profile peak an underestimation of up to 8% occurs. Much better agreement with the truth is obtained for the linear retrieval with effective light paths calculated for the true profiles and the iterative retrieval (better than 2-3% for altitudes above 20 km). Below this altitude, an overestimation of up to 10% for the linear and 8% for the iterative retrieval is obtained, which does not depend on the a-priori and is potentially caused by the cross-correlation of the fit parameters or even higher order Taylor series terms not considered in the retrieval. Due to the almost perfect agreement 15 between the iterative retrieval and the linear retrieval (when the effective light paths are calculated for the true profiles) the assumption of an influence of possibly incorrect cross-correlative terms in the OD definition can be excluded. Similarly as in the UV, the NO 2 profiles below 14 km are affected by the a-priori.
The BrO retrieval in the UV (right two plots on the top) shows similar findings as the NO 2 retrieval in this spectral range: an agreement between the linear retrievals and the initial step of the iterative retrieval is found, but with a smaller discrepancy 20 to the truth. This finding is explained by the selection of the wavelength for the spatial evaluation close to the strongest BrO absorption band: the BrO profiles are slightly underestimated by up to 4% for altitudes up to 30 km. The iterative retrieval improves the agreement to 2-3% for altitudes below 25 km, while at higher altitudes the agreement is similar to the linear approaches.

Effect of cross-correlative terms between different trace gases 25
As stressed in Sect. 3.3.2 and demonstrated by the application of the Taylor series approach (Sect. 5.1) the consideration of cross-correlative higher order terms is especially important for the retrieval of the minor absorbers. We investigate their importance in the application of the iterative algorithm by performing an iterative retrieval without considering the crosscorrelative terms in the calculation of the correction spectra. The resulting profiles (see magenta lines in Fig. 12) confirm the aforementioned arguments and results: substantial discrepancies with respect to the true profiles are observed for both minor 30 absorbers (BrO and NO 2 ) while they are negligible for ozone. The discrepancies are largest for the NO 2 results in the visible spectral range (more than 20% around 20 km altitude) due to the larger ozone absorption, while in the UV the difference to the truth does not exceed 5%.

Dependency on the wavelength selection for the spatial retrieval
We investigate the stability of the iterative retrieval if the spatial retrieval is performed at different wavelengths. Fig. 13 displays the relative difference with respect to the true profiles when the spatial evaluation step of the retrieval is performed at different wavelengths within the fit window. While for the linear retrieval the discrepancies strongly depend on the selected wavelength, the iterative retrieval shows similar results for all wavelengths. For the linear retrieval, the discrepancies for the ozone profile 5 are larger for wavelengths at the strongest ozone absorption bands in the UV as well as smaller at shorter wavelengths in the visible where smaller ozone absorptions occur. For NO 2 , the agreement is better at longer wavelengths in the UV but even then some discrepancies are left. In the visible, the profile is overestimated at higher altitudes (above 20-25 km depending on wavelength) increasing with wavelength, while it is underestimated at lower altitudes. For BrO, however, good agreement for the linear retrieval is obtained for the lower wavelength part of fit window (where the strongest BrO absorption peak occurs) 10 but close to the ozone ozone absorption bands. In short, the application of the linear approach requires detailed sensitivity studies to find the best retrieval wavelength and the agreement may even depend on the involved fit parameters. At the same time, the iterative approach shows similar performance at all wavelengths for all involved trace gases.

Effect of including the Taylor series terms in the fit
For strongly non-linear problems, and if the a-priori is too far away from the true profiles, the results during the iterative retrieval might converge towards profiles that differ from the truth. Applying the standard DOAS fit (i.e. skipping higher order terms), much larger discrepancies are found even after the iterative process ( and (2) consequently the retrieved profiles get systematic biased, which affects the calculated correction spectra. The retrieved profiles by different approaches are shown in Fig. 17. While the significantly disagreeing results of the linear retrievals are corrected to a large extent by the iterative approach, the uncertainties in the UV are by a factor of 2-4 larger compared with the retrieval including the additional ozone terms. The iterative retrieval in the visible spectral range, which is characterized by much stronger absolute but smaller differential ozone absorption, also resuls in larger uncertainties (by by a factor 2-4) above 5 the ozone profile peak, while the ozone concentration is disagreeing by 7% and the NO 2 concentration by more that 30% at 20 km altitude.

Application to SCIAMACHY measurements
In the following we apply the two step iterative algorithm introduced in section 5.3 for the retrieval of trace gas profiles from SCIAMACHY limb measurements. We investigate the response to different retrieval settings (like the cross-section 10 dependency on temperature, or the consideration of the cross-correlative terms). We also compare the retrieved profiles with

Retrieval settings
The retrieval of vertical profiles from SCIAMACHY measurements is performed in a similar way as for the studies with simulated spectra above. The retrieval scheme is based on the linear two step retrieval algorithm developed in our group (Puķīte  Table 3. Besides the iterative retrieval scheme, the parameters considered are similar as in Puķīte et al. (2010), with some modifications as indicated in Table 4. The essential modifications in the spectral fit include:

5
(1) Use of the more recent temperature dependent ozone cross-sections from Serdyuchenko et al. (2013); we include crosssections at 193K, 213K and 233K in the spectral fit.
(2) The measured spectra are gridded to a tangent height grid of 3 km.
(3) Spectra at the same tangent height (across track) are averaged.
(4) A Sun spectrum (A0 spectrum in the SCIAMACHY dataset) is used as a reference spectrum to improve the signal to 10 noise ratio in the fit. In order to correct for possible offsets in the retrieved quantities caused by unexplained spectral structures in the Sun spectrum, the absorber optical depths are corrected (by subtraction) by the optical depths retrieved at high TH.
(5) The spectral shift is linearised and the shift between the reference and the analysed spectra is fitted as additional fit parameter according to the method in Beirle et al. (2013) (during the iterative process, the wavelength grid is adjusted to account for non-linearities in the shift) 15 (6) To account for the spectrally varying polarisation sensitivity of the SCIAMACHY instrument eta and zeta spectra are orthogonalised and the spectra corresponding to the largest eigenvalues are considered in the fit.
The settings for the spatial evaluation and the calculation of the correction spectra are similar as for the sensitivity studies, but in addition cross-sections at all available temperatures are considered for the calculation of correction spectra and effective light paths.

20
The most important modifications in the spatial retrieval step include the following (see Table 4 for more details): 1) The vertical profiles are now obtained on a 1 km retrieval grid.
2) Instead of the slant column density the optical depth is used for the inversion allowing the consideration of the temperature dependence of the cross-sections.  Although the modifications are discussed in the scope of the iterative algorithm, the retrieval with the linear approch used in the comparison plots in the next subsection also includes all respective modifications while omitting the iterative process.
Consequently the results are not the same as presented in Puķīte et al. (2010) or Rozanov et al. (2011).

Results for BrO
We retrieve vertical concentration profiles by the iterative algorithm and the linear algorithm and analyse the effect of considering the temperature dependence of the cross-sections or the second order cross-correlative terms. We compare the vertical profiles retrieved from SCIAMACHY measurements with collocated LPMA/DOAS balloon observations (Dorf et al., 2006(Dorf et al., , 2008Puķīte et al., 2010;Rozanov et al., 2011). For a detailed description of the balloon DOAS instrument, the retrieval al-

15
An agreement between SCIAMACHY and the balloon profiles (within the error bars) can be seen for most of the profiles and altitudes, which fulfil the collocation criterion (indicated by the yellow shading). The profiles obtained by the linear retrieval are systematically lower than presented in the retrieval comparison by Rozanov et al. (2011) where they were found systematically too high in comparison to retrievals by other groups. This decrease compared to the previous version can largely be explained by the use of another wavelength (342 nm) for the spatial retrieval. Also the different treatment of the Fraunhofer 20 reference might contribute to the change (we found a systematic bias caused by TH reference spectrum applied in the previous algorithm). The profiles from the iterative retrieval without yet considering the temperature dependence of the cross-sections shows an increase compared to the linear retrieval. The difference is increasing with altitude (from around 0% below 20 km to 10% around 30 km). The tendency is comparable to the findings from the simulated spectra where also a positive offset is obtained for the altitude range around 20 km between the linear and iterative retrieval. However, for the real measurements the 25 difference is by ∼5% smaller. The consideration of the temperature dependence of the cross-sections leads to lower values in comparison to the retrieval assuming a constant temperature. A seasonal variation is found similar to the findings by Rozanov et al. (2011). They investigated the effect of the temperature dependent BrO cross-sections and found a typical effect of 10% with a strong seasonal variation with altitude. Keeping in mind that in the current study also the temperature dependence of the ozone cross-section is or is not considered as well, a larger effect (up to 15 to 20%) for high latitudes in spring is obtained 30 caused by interference with the temperature dependent ozone absorption features. Exclusion of the cross-correlative terms in the correction spectra (which is subtracted from the measured spectra in the upcoming iterations of the iterative retrieval) shows only an effect of few percent which is in accord with the simulation studies. in principle, the higher order cross-correlative structures can be considered in the DOAS fit as additional fit parameters, to obtain meaningful results good signal to noise ratio is required to minimise the cross-correlation between the fitted parameters.
To avoid the need to apply a full retrieval algorithm requiring time-consuming, iterative wavelength by wavelength forward modelling of absorption spectra, we introduce an iterative two step retrieval algorithm separating the spectroscopic and spatial retrieval steps. From the profiles retrieved in a previous step corrections for the measured spectra are calculated. After they 15 are applied to the measured spectra, the next iteration is performed. Usually, convergence is found after 2 to 3 iterations. The correction spectra are calculated based on the a-priori profiles or the profiles derived in the respective iteration. The calculation is based on RTM calculations at only a few selected wavelengths for an atmosphere with zero (or background) absorption), which strongly reduces the calculation time. Also higher order absorption effects and the temperature dependence of the cross sections is considered. The quantitative definition of higher order absorption effects can be obtained by expanding the radiative 20 transfer equation into higher order Taylor series. Higher order effective light paths are calculated as weighted mean products of the light paths through several boxes. By approximating the wavelength dependency of the first and higher order effective light paths derived from pre-calculated RTM simulations at several wavelengths by a broad band function, we calculate absorption spectra and first order effective light paths for any atmospheric trace gas composition and for any wavelength within the fit range thus eliminating the need for RTM simulations during the retrieval process. We find a strong influence by second 25 order absorption structures for retrievals from measurements in limb geometry, which often exceed the contributions by minor absorbers (ODs ∼10 −3 -10 −1 ). Even third order terms show potential significance: for atmospheric scenarios typical for Arctic spring (with the stratospheric ozone concentration larger than the yearly average), we found an OD for the third order ozone absorption term of up to 10 −2 at the absorption peak at ∼333 nm and exceeding 10 −4 even at ∼344 nm, while also in the visible (at 550 nm) it is ∼10 −3 . Also cross-correlative terms between ozone and other trace gases can have significant ODs 30 and thus have the potential to cause retrieval errors if not considered by retrieval. Since higher order contributions to the OD structures are proportional to products of the trace gas concentrations, these terms will increase or decrease proportionally to changes in the respective products, e.g. second order self-correlative terms are proportional to the squared concentration. By applying the iterative two step algorithm for simulated measurements, the residual structures caused by absorption are reduced: Sensitivity studies reveal good agreement within 1% for retrieved ozone profiles with respect to the truth. This good agreement is obtained for most of the altitudes for retrievals both in the UV and the visible spectral range even if the effective light paths up to just the second order are considered. The agreement practically does not depend on the selected wavelength for the spatial evaluation. Also for the minor absorbers NO 2 and BrO the retrieved profiles are wavelength independent with an agreement within a few (1-3) percent for altitudes above 17-20 km. At the same time, linear retrieval algorithms (without an 5 iterative process) without complete correction for higher order effects show partly significant discrepancies and the retrieved values are strongly depending on the wavelength selected for the spatial evaluation. Good agreement is also found between the retrieval results when the second order approximation for the effective light paths or their direct calculation by RTM are used. The pre-calculation of higher order sensitivity parameters by RTM allows to enhance the performance speed of iterative retrieval algorithms. The consideration of only second order absorption terms in the simulation of the correction spectra in the 10 iterative two step approach accounts for most of the non-linear absorption effects reducing the systematic residual structures to the order of ∼10 −4 for scenarios even with strong ozone absorption. An agreement within a few percent for simulated trace gas profiles typical for SCIAMACHY limb geometry is obtained. Also the application of the iterative two step algorithm on real SCIAMACHY measurements confirms the findings of the sensitivity studies based on synthetic spectra and shows good agreement with the collocated balloon measurements. The algorithm profits from the additional possibility to correct the  15 and the third order effective light paths for boxes j, J,J are: where w i,is is the weight and l i,isj , l i,isJ , l i,isJ are the light paths for the i s th scattering event of the ith trajectory. i, i s here corresponds to i in the notation of Eqs. 7, 8 while the notation of j, J,J is the same. Also here light paths not crossing some or all two or three involved boxes are considered by a length product being zero, with their weight contributing to the sum in 20 the denominator.        Table 1 in comparison with a calculation from the simulated terms at zero absorption (formulae in the right column in the table). b: the same as panel a but the retrieval is performed with random Gaussian noise added to the simulated spectra. The noise leads to an increase of the fit residual by 10 times when compared to the fit in Fig. 6. The results are shown for 10 different random noise spectra.