Retrievals from GOMOS stellar occultation measurements using characterization of modeling errors

In this paper, we discuss the development of the inversion algorithm for the GOMOS (Global Ozone Monitoring by Occultation of Star) instrument on board the Envisat satellite. The proposed algorithm takes accurately into account the wavelength-dependent modeling errors, which are mainly due to the incomplete scintillation correction in the stratosphere. The special attention is paid to numerical efficiency of the algorithm. The developed method is tested on a large data set and its advantages are demonstrated. Its main advantage is a proper characterization of the uncertainties of the retrieved profiles of atmospheric constituents, which is of high importance for data assimilation, trend analyses and validation.


Introduction
The scintillations of stars observed through the Earth atmosphere in occultation experiments are caused by interaction of stellar light with air density irregularities generated mainly by small-vertical-scale gravity waves and turbulence. It was shown (Polyakov et al., 2001;Sofieva et al., 2009) that the scintillation is a nuisance for reconstructing chemical composition of the atmosphere from spectral stellar occultation measurements. Due to their random nature, scintillations do not produce any bias in the statistics of an ensemble of reconstructed profiles. They only result in fluctuations in re-Correspondence to: V. F. Sofieva (viktoria.sofieva@fmi.fi) trieved profiles of atmospheric constituents. In case of GO-MOS measurements on board Envisat, the perturbations in the stellar flux caused by scintillations are corrected by using additional scintillation measurements by the fast photometer operating in the low-absorption wavelength region 646-698 nm, λ red =672 nm (Dalaudier et al., 2001;Sofieva et al., 2009). The scintillation correction aims at reduction of rms fluctuations in retrieved profiles due to scintillation. The applied scintillation correction assumes that light rays of different color pass through the same air density vertical structures, thus the signal perturbations at different wavelengths are identical after appropriate shifting and stretching caused by chromatic refraction. This hypothesis is always satisfied in vertical (in orbital plane) occultations and it is true for scintillations generated by anisotropic irregularities, practically for all obliquities. However, this hypothesis may be violated in oblique (i.e., off orbital plane) occultations if isotropic turbulence is well developed. Sofieva et al. (2009) have shown that the applied scintillation correction removes the significant part of perturbations in the recorded stellar flux caused by scintillations, but it is not able to remove scintillations that are generated by isotropic irregularities of air density (mainly turbulence).
The remaining perturbations due to incomplete scintillation correction are not negligible. In oblique occultations of bright stars, the modeling errors at ∼20-45 km (mainly caused by scintillation) can be comparable or even exceed the instrumental noise by a factor of 2÷3 (Sofieva et al., 2009, Fig. 9). Neglecting modeling errors obviously results in underestimated uncertainties of the retrieved profiles.
The atmospheric transmission spectra obtained after dividing the stellar spectra observed through the Earth atmosphere by the reference spectrum, recorded above the atmosphere, contain spectral features of absorption and scattering by gases and particles. Each GOMOS UV-VIS transmission spectrum contains 1416 spectral values in the wavelength range 250-675 nm, and one stellar occultation comprises 70-100 spectra at different tangent altitudes in the range of ∼10-140 km. Vertical profiles of ozone, NO 2 , NO 3 and aerosol optical depth are retrieved from the UV-VIS spectrometer measurements. Since aerosol extinction spectrum is not known a priori, a second-degree polynomial model is used for the description of the aerosol extinction. The aerosol number density and two parameters that determine the wavelength dependence of aerosol extinction spectra are retrieved from GOMOS data. Due to non-orthogonality of cross-sections of Rayleigh scattering by air with the considered polynomial model of aerosol extinction, the air density is not retrieved from UV-VIS measurements by GOMOS. It is taken from ECMWF analysis data corresponding to occultation locations.
The GOMOS inversion from UV-VIS spectral measurements is split into two steps (Kyrölä et al., 1993. First, atmospheric transmission spectra T ext (λ,h) λ being wavelength), which are corrected for scintillation and refractive attenuation (Dalaudier et al., 2001;Sofieva et al., 2009), are inverted into horizontal column densities N for gases and optical thickness for aerosols, for every ray perigee (tangent) altitude h (spectral inversion). Then, for every constituent, the collection of the horizontal column densities at successive tangent heights is inverted to vertical density profiles (vertical inversion).
The GOMOS spectral inversion problem can be written in the form: where T ext are measured transmittances after the correction of refractive effects at altitude h, N are horizontal column densities at this altitude, is the matrix of effective crosssections, and ε represents the error term (noise and modeling errors). The spectral inversion is performed for each tangent altitude separately (i.e., independently of other tangent altitudes). It relies on the standard maximum likelihood method, which is equivalent to minimization of the χ 2 statistics under the assumption of a Gaussian distribution of the measurement errors: where T ext is a vector of observed transmission spectra, T mod is a vector of modeled transmittances, and C is the covariance matrix of transmission errors. No a priori information is used in the spectral inversion. The minimization of χ 2 is performed using the Levenberg-Marquardt algorithm (Press et al., 1992). If we assume that the transmission errors are due to measurement noise only, i.e., C=C noise , the inversion (2) is very fast, as C noise is diagonal. The idea of an accounting the modeling errors is very simple: the covariance matrix of the transmission errors C can be presented as a sum of two matrices (provided errors are Gaussian): where the diagonal matrix C noise corresponds to the measurement noise and C mod corresponds to the modeling error. The incomplete scintillation correction is the dominating source of modeling errors in the stratosphere. Other modeling errors are discussed in . The scintillation correction errors are not correlated at different tangent altitudes, thus allowing the representation (3). The scintillation correction errors result in wavelength-dependent perturbations in the transmission spectra, therefore C mod is essentially non-diagonal. Hereafter, we will refer to the inclusion of modeling errors as to the "full covariance matrix (FCM)" inversion. The main problems associated with this method are defining the covariance matrix of modeling errors and the efficient numerical solution of the minimization problem (2), as the straightforward inversion of the non-diagonal matrix C (of a relatively large size) reduces significantly the numerical efficiency of the retrieval. The FCM method is applied within the spectral inversion. The second step of the GO-MOS inversion, the vertical inversion, remains unchanged. The paper is organized as follows. Section 2 briefly describes the characterization of modeling error, which is presented in detail in (Sofieva et al., 2009). A numerically efficient implementation of the "full covariance matrix" inversion method is presented in Sect. 3. Section 4 assesses the FCM method and demonstrates its advantages. A discussion and a summary conclude the paper.

Parameterization of the modeling errors
As discussed by Sofieva et al. (2009), the main source of GOMOS modeling errors in the stratosphere (at altitudes ∼20-50 km) is the incomplete scintillation correction. In this section, we briefly describe the parameterization of the scintillation correction error. A more detailed discussion of this parameterization, which includes a justification based on the theory of isotropic scintillation (i.e., scintillations generated by isotropic irregularities of air density) and illustrations based on statistical analyses of GOMOS residuals, can be found in (Sofieva et al., 2009).
In the proposed parameterization, the modeling (scintillation correction) error is assumed to be a Gaussian random variable with zero mean and covariance matrix C mod : where indices i and j denote the spectrometer pixels corresponding to wavelengths λ i and λ j , σ is the amplitude and B is the correlation function of off-diagonal elements. The correlation function B is approximated by Here J 0 is the Bessel function of zero order and ξ is the ratio of the chromatic separation of rays corresponding to wavelengths λ i and λ j , ch (λ i ,λ j )sinα, to the diffractive Fresnel In Eq. (6), α is obliquity of an occultation, i.e. the angle between the direction of line of sight motion and the local vertical at the ray perigee point; α=0 • in vertical (in orbital plane) occultations, α >0 • in oblique (off orbital plane) occultations, with the limit α=90 • in case of purely horizontal occultations; ch (λ i ,λ j ) is the vertical chromatic shift of rays corresponding to wavelengths λ i and λ j (Dalaudier et al., 2001). For an illustration of these parameters, see Fig. 4 in (Sofieva et al., 2009). An example of the correlation function of the spectrometer pixels B(λ i , λ j ) at 30 km is shown in (Sofieva et al., 2009, Fig. 7). The amplitude of the scintillation correction error can be approximated as: where σ iso (z,λ,α) is rms of isotropic scintillations (relative fluctuations of intensity) in spectrometer channels, and the term 1 − b ph sp B(λ, λ red ) takes into account the influence of scintillation correction procedure. In (7), σ iso (z,λ,α) is parameterized as: Here σ 0 (z) is the "standard" profile of isotropic scintillation variance in the spectrometer channels, which was estimated using red photometer data (λ red =672 nm) from all occultations of Canopus in 2003 with obliquity α ∼50 • by the method explained in (Sofieva et al., 2007), and ρ 0 (z) is the average air density profile in the considered data set. The factors in (8) give the dependence of σ iso on wavelength λ, obliquity α (via dependence of ray velocity v in the phase screen (Dalaudier et al., 2001) on α) and the mean air density ρ(z). In (7), b ph sp is the ratio of isotropic scintillation variances of smoothed red photometer and spectrometer signals for λ red =672 nm, which is parameterized as: where ph ch is the vertical chromatic shift for wavelength 672 ± 25 nm, corresponding to the width of the red photometer optical filter. B(λ,λ red ) is the correlation coefficient between the smoothed signal of the red photometer and spectrometer channels, which is defined in the same way as the correlation of spectrometer channels, Eq. (5). The altitude and wavelength dependence of the amplitude of the scintillation correction error is illustrated by Fig. 8 in (Sofieva et al., 2009).
The proposed parameterization adjusts "automatically" the correlation and the mean amplitude of isotropic scintillations in spectrometer channels to the different measurement conditions (obliquity, altitude).

A numerically efficient implementation of the "full covariance matrix" inversion method
In this section, we discuss numerically efficient implementing the "full covariance matrix" inversion method.
Since the correlation of off-diagonal elements depends on altitude, the covariance matrix of modeling errors has to be evaluated for each altitude. This evaluation can be optimized as follows. The main parameter used in the parameterization of scintillation correction error is the ratio of the chromatic separation ξ of rays corresponding to wavelengths λ i and λ j to the Fresnel scale ρ F . The chromatic shift ch (λ i ,λ j ) is proportional to the difference in standard refractivity c(λ i )− c(λ j ) (Dalaudier et al., 2001): where q is refractive attenuation (Dalaudier et al., 2001), D is a distance from the ray perigee point to the observation plane, ε is a refractive angle, and λ 0 is a reference wavelength (for GOMOS, λ 0 =500 nm is chosen). Therefore, neglecting the chromatic dependence of q, q(λ,z) ≈ q(λ 0 ,z), ξ can be factorized into the altitude-dependent and wavelengthdependent terms: (λ i λ j ) 1/4 . Only the factor K(z) needs to be calculated separately for each tangent altitude, while X(λ,λ ) can be pre-computed for each wavelength pair.
Once the "full" covariance matrix C=C noise +C mod has been calculated, it is used in Levenberg-Marquardt iterations. At each step, the iteration involves solving the linearized normal equations, which have the form: Here N j is the current iterate of horizontal column densities with N = N j +1 -N j , J is the Jacobian corresponding to T mod (N) evaluated at N j , and γ is the parameter chosen at each step in order to ensure convergence (Press et al., 1992).
The most straightforward way to implement the modeling error is to compute the inverse of C and substitute it into (12). However, this approach is numerically inefficient, and it cannot fully exploit the sparsity of C. The sparsity of C depends on altitude and on obliquity: above ∼42 km and below ∼20 km, the number of zero elements exceeds 90% (it is advantageous also to truncate very small elements to zero, in order to enhance sparsity of C). The non-linear optimization can be made efficient using numerical methods in linear algebra. One approach is described below.
The covariance matrix C is positively definite and symmetric, therefore it allows the Cholesky decomposition C=LL T . This can be used in iterations: we define vectors u and W as and find them at each step. The normal equation becomes: and the objective function equals The main advantage of this approach is its numerical efficiency: computing the factorization is considerably faster than computing the inverse matrix, and the Cholesky factor L has almost the same sparsity as C. Consequently, the computations required in the iteration can be implemented with sparse linear algebra. Since L is lower triangular, finding W and u from (13) is not slower than multiplying by C −1 . Compared to the straightforward method using an explicit inverse matrix (which cannot be considered as a feasible option, of course), the Cholesky-based approach can decrease the time spent in the main computations (inverting or factorizing the covariance matrix and performing the Levenberg-Marquardt iteration) by up to 80%.

Assessment of FCM inversion
The FCM inversion was first implemented and tested using the GOMLAB modeling environment developed at Finnish Meteorological Institute, and it was implemented later into the GOMOS prototype processor GOPR v. 7.0 (see also Sect. 5).
The effect of including modeling errors into the spectral inversion is illustrated in Fig. 1, first for the individual occultation of Sirius at the orbit 7673. It is known that the quality of inversion can be characterized with the normalized χ 2 statistics χ 2 norm = χ 2 M−P , where χ 2 is defined by (2), M is the number of measurements and P is the number of retrieved parameters. If the model describes well the experimental data and the measurement errors are properly defined, χ 2 norm ≈ 1. If the modeling errors are ignored (i.e., the covariance matrix is taken C=C noise ), χ 2 norm can significantly exceed the value 1 in oblique occultations of bright stars; this indicates underestimation of the uncertainty of the retrieved densities. This is the case for the occultation shown in Fig. 1 (blue line). Inclusion of modeling errors reduces dramatically the χ 2 norm values (Fig. 1, red line). The second immediate observation is that the retrieved horizontal column density profiles are smoother for the "full covariance matrix" inversion, especially those of NO 2 and NO 3 , thus indicating more robust retrievals.
The features observed for the considered occultation of Sirius are common for oblique occultations of bright stars. Figure 2 shows the χ 2 norm values for four sets of successive night-time oblique occultations of the brightest stars Sirius and Canopus. The information about these data sets is collected in Table 1. The GOMOS successive occultations of a certain star are located at approximately the same latitude (maximum 14 occultations per day), and they are carried out at approximately the same local time. The values of χ 2 norm dramatically decrease when modeling errors are taken into account (FCM method). They become close to the ideal value χ 2 norm = 1. This indicates that the parameterization of scintillation correction errors proposed in (Sofieva et al., 2009) properly describes the main source for modeling errors at altitudes ∼20-50 km.
For characterization of smoothness of the retrieved horizontal column density profiles, we consider the profile fluctuations δN = N − N . The mean profile N is obtained by smoothing the original profiles with a rectangular window with the cut-off scale of 3 km. The altitude range was divided into segments of 5 km overlapping by 1 km. For each set of occultations, we computed rms of line density fluctuations δN inside each altitude segment. Rms fluctuations of the retrieved horizontal column density profiles are presented in Fig. 3, for retrievals without modeling error accounted in the inversion (blue lines) and for FCM inversion (red lines). The horizontal column density profiles become significantly smoother if the modeling errors are included into the inversion. The smoothing effect is associated with the correlation of measurement uncertainties, which is inherently taken into account in the FCM inversion (note that this is not regularization). The large unrealistic fluctuations in NO 2 and NO 3 profiles due to incomplete scintillation correction were noticed already in the first GOMOS data (as seen, for example, in Fig. 1, blue lines). In the current operational GOMOS retrievals (the IPF 5.0 processor), a variant of DOAS (differential optical absorption spectroscopy) method, so called Global DOAS Iterative method (Hauchecorne et al., 2005;Kyrölä et al., 2010) is applied after the spectral inversion for NO 2 and NO 3 . The application of Global DOAS iterations allows removal unphysical fluctuations in NO 2 and NO 3 profiles. The smoothness of profiles from operational retrievals is shown by green lines in Fig. 3. As observed, the smoothness of NO 2 and NO 3 profiles in the "full covariance matrix" inversion is comparable to that obtained with the Global DOAS Iterative inversion. This highlights the importance Atmos. Meas. Tech., 3, 1019-1027   of the correct characterization of modeling errors in the retrievals.
The GOMOS spectral inversion is followed by the vertical inversion aimed at reconstruction of local density profiles from the collection of horizontal column densities Sofieva et al., 2004). The vertical inversion is linear; it is slightly noise-amplifying (i.e., the fluctuations existing in horizontal column density profile are enhanced after the vertical inversion) ). An improved smoothness of the horizontal column density profiles obtained with FCM method is very advantageous therefore. In GOMOS processing, the Tikhonov-type regularization is applied in the vertical inversion for its stabilization Sofieva et al., 2004). The impact of FCM inversion on local density profiles is nearly the same as for profiles of horizontal column densities: the profiles become smoother, especially those of NO 2 and NO 3 . The effect of including modeling errors on local density profiles is illustrated in Fig. 4. This figure shows the retrieved vertical profiles for the occultation of Sirius whose horizontal column densities are presented in Fig. 1 and discussed above. Reduced fluctuations in local density profiles are originated from the increased smoothness of horizontal column density profiles, as expected by the error propagation in the linear inversion.  The proximity of χ 2 norm to 1 in the FCM inversion indicates that the characterization of the modeling errors is close to reality. At the same time, this ensures that the estimated accuracy of the retrieved profiles is close to reality. Figure 10 in (Sofieva et al., 2009) shows the uncertainty estimates for the retrieved GOMOS ozone profiles, which include the characterization of scintillation correction errors. In this paper, we discuss the changes in error estimates due to including the modeling errors in the inversion. Figure 5 shows error estimates for local densities of O 3 , NO 2 , NO 3 and aerosols with ignored and included modeling errors in the inversion. If the modeling errors are ignored, the uncertainty of the retrieved profiles can be significantly underestimated for oblique occultations of bright stars. For such occultations, the underestimation in uncertainty can achieve up to 1-2% for ozone, 2% for NO 2 , 40% for NO 3 and 15% for high-altitude aerosols (for aerosols below 20-25 km, the underestimation is significantly smaller, up to 7%). The altitude range of increased uncertainty corresponds to the range where the adverse influence of scintillation correction error is maximal. This range depends on constituent due to the influence of the wavelength-dependent scintillation correction error on spectral features of different constituents.
For dimmer stars, the percentage of modeling errors in the total error budget is smaller, therefore the underestimation of the resulting uncertainty of the retrieved profiles is less remarkable. For vertical occultations, where the scintillation correction works nearly perfectly, including the modeling errors via FCM is not important (however, such occultations constitute a small percent of the GOMOS data).

Discussion and summary
We have discussed inclusion of the modeling errors into the GOMOS inversion. For GOMOS, the main source of modeling errors at altitudes ∼20-50 km is the incomplete scintillation correction. The parameterization of scintillation correction errors proposed in (Sofieva et al., 2009), which adjusts "automatically" the magnitude and correlation of the modeling errors to different occultation geometry, allows accurate quantifying modeling errors in this altitude range. While accounting the modeling errors, the inversion procedure based on non-linear minimization has to be optimized. The numerical efficiency is of high importance in the processing of satellite data, due to their large amount.
Our paper presents the algorithm for numerically efficient processing of GOMOS data, with modeling errors taken into  (Table 1), which are indicated in the figure.  (Table 1), which are indicated in the figure.
account. It is based on using the Cholesky factorization and the subsequent change of variables. The "full covariance matrix" inversion was successfully implemented and tested on a large data set. Compared to the inversion without modeling errors, the following changes in inversion statistics and results are observed: (1) Dramatic changes in normalized χ 2 values: from χ 2 norm ∼20-50 in oblique occultations of bright stars to χ 2 norm ∼1-2, at altitudes 20-50 km. This ensures that the applied parameterization of scintillation correction errors proposed in (Sofieva et al., 2009) adequately describes the main source of modeling errors for altitudes ∼20-50 km. At the same time, this allows us to expect that the uncertainties of retrieved profiles are characterized properly.
(2) The horizontal column density profiles are smoother if the "full covariance matrix" inversion is applied, especially for NO 2 and NO 3 . This feature is advantageous for the GOMOS processing, as the subsequent vertical inversion is slightly noise-amplifying. The smoothness of NO 2 and NO 3 profiles becomes comparable to that obtained with the Global DOAS Iterative inversion (the current operational GOMOS processing).
(3) Significant increase of error bars (compared to the case of neglecting modeling errors) for oblique occultations of very bright stars is observed.
The main advantage of the "full covariance matrix" inversion is the proper characterization of the uncertainty of retrieved profiles. This feature is of high importance,   (Table 1) were used for the analysis. especially for validation, data assimilation and for time series analyses. The inversion method described in our paper is already implemented in next version of the GOMOS prototype processor (GOPR v. 7), and it is planned to be used for future reprocessing of GOMOS data.