Glyoxal retrieval from the Ozone Monitoring Instrument

We present an algorithm for the retrieval of glyoxal from backscattered solar radiation, and apply it to spectra measured by the Ozone Monitoring Instrument (OMI). The algorithm is based on direct spectrum fitting, and adopts a two-step fitting routine to account for liquid water absorption. Previous studies have shown that glyoxal retrieval algorithms are highly sensitive to the position of the spectral fit window. This dependence was systematically tested on real and simulated OMI spectra. We find that a combination of errors resulting from uncertainties in reference cross sections and spectral features associated with the Ring effect is consistent with the fit-window dependence observed in real spectra. This implies an optimal fitting window of 435–461 nm, consistent with previous satellite glyoxal retrievals. The results from the retrieval of simulated spectra also support previous findings that have suggested that glyoxal is sensitive to NO₂ cross section temperature. The retrieval window limits of the liquid water retrieval are also tested. A retrieval window of 385–470 nm reduces interference with strong spectral features associated with sand. We show that cross-track dependent offsets (stripes) present in OMI can be corrected using offsets derived from retrieved slant columns over the Sahara, and apply the correction to OMI data. Glyoxal columns are on average lower than those of previous studies, likely owing to the choice of reference sector for offset correction. OMI VCDs are lower compared to other satellites over the tropics and Asia during the monsoon season, suggesting that the new retrieval is biogenic and anthropogenic

Here we present a glyoxal retrieval for the Ozone Monitoring Instrument (OMI) with optimized retrieval settings. OMI offers superior spatial resolution and temporal coverage compared to existing satellite instruments (GOME-2 and SCIAMACHY) used to retrieve glyoxal. However, retrieving glyoxal from OMI spectra is more challenging than from the aforementioned instruments due to its lower spectral resolution and smaller signal-to-noise ratio. We use simulated OMI spectra to test retrieval accuracy and apply a systematic approach (Vogel et al., 2013) to optimize the glyoxal fit window. OMI uses a 2-D CCD array, in contrast to the linear photodiode array detectors used in GOME-2 and SCIAMACHY. This makes retrieved slant columns subject to cross-track biases. We present a simple method to correct for these glyoxal offsets.

Methods
OMI was launched on the NASA Aura satellite in sun-synchronous orbit in July 2004, with an equatorial crossing time of 13:42 LT. It is a CCD spectrometer measuring backscattered solar radiation with a 13 km × 24 km nadir resolution and daily global coverage. Its spectral range is 270–500 nm divided over three channels, allowing the retrieval of both HCHO and glyoxal. Glyoxal is retrieved in the visible channel (full spectral range 350–500 nm). The visible channel CCD is divided into 60 across-track positions, with an average spectral sampling distance of 0.21 nm and average spectral resolution of 0.63 nm (full width at half maximum).
Glyoxal vertical column densities (VCDs, molecules cm⁻²) are determined using a two-step approach widely employed for optically thin trace gas retrievals in the UV-Visible spectral region. In the first step, modelled spectra are directly fitted to observed OMI radiances to determine slant column densities (SCDs) that represent the integrated glyoxal number density through the mean photon path from the Sun to the instrument. In the second step, the SCDs are translated to VCDs using air mass factors (AMFs) computed using a radiative transfer model (Palmer et al., 2001).

Fitting glyoxal slant columns
Glyoxal SCDs are determined using the direct spectrum fitting approach described by Chance (1998). Here the state vector $x \in \mathbb{R}^n$, consisting of a set of variables impacting the observed radiance (including the glyoxal SCD), is estimated from a set of observed radiance values at a number of discrete wavelengths ($\lambda$). Let $y \in \mathbb{R}^m$ be the vector of these discrete radiance values. Assuming that the noise variance in the measured spectrum is the same for all wavelengths, the optimal estimate for the state $\hat{x}$ is found by minimising the least squares difference between the observed radiance and a model spectrum $F(x, b)$, a function of the state vector and a set of unoptimized parameters $b$:

$$\hat{x} = \arg\min_{x} \lVert y - F(x, b) \rVert^2 \quad (1)$$

The modeled spectrum ($F(\lambda)$) consists of a solar source term $I_0(\lambda)$ that is then modified by trace gas absorption $\tau(\lambda)$, a common mode spectrum $R(\lambda)$ constructed by averaging a set of spectrum fit residuals, and scaling and baseline polynomials ($P_{sc}(\lambda)$ and $P_{bl}(\lambda)$ respectively), intended to account for broadband spectral features:

$$F(\lambda) = I_0(\lambda) \exp(-\tau(\lambda)) + R(\lambda) P_{sc}(\lambda) + P_{bl}(\lambda) \quad (2)$$

The source spectrum ($I_0(\lambda)$) is derived from the monthly running mean of a set of daily solar irradiance spectra measured by OMI during the end of its orbit ($b_{sol}(\lambda)$). Due to the satellite's orbital motion relative to the sun, the solar irradiance spectra are Doppler shifted relative to the earth observations. To account for this, both the solar and earthshine grids are calibrated using a high resolution solar reference spectrum (Chance and Kurucz, 2010). Since the measured spectra are not fully Nyquist sampled, direct interpolation of the measured solar spectrum to the earthshine grid introduces aliasing. To account for this, an additional term $b_u(\lambda)$, derived from the difference between a fully sampled and an undersampled solar reference spectrum, is included in the fit (Chance et al., 2005).
Finally, an inelastic Raman scattering source term ($b_r(\lambda)$), which accounts for "filling in" of the solar lines from O₂ and N₂ rotational transitions, is included as described in Chance and Spurr (1997). The source spectrum is attenuated by trace gas absorption. The total optical depth ($\tau(\lambda)$) is the sum of the contributions from each absorber:

$$\tau(\lambda) = \sum_j x_j \, b_j(\lambda) \quad (4)$$
Here, $x_j$ and $b_j(\lambda)$ are the SCD and reference cross section (RCS) of species $j$ respectively. Table 1 summarises the RCSs included in the fitting procedure. The two strongest glyoxal absorption bands lie in the 430–460 nm spectral region, as shown in Fig. 1. In addition to glyoxal, absorption due to ozone (O₃), nitrogen dioxide (NO₂), water vapor (H₂O) and the oxygen collision complex (O₂–O₂) contributes significantly to the total optical depth, and these species are therefore included in the fitting process. Previous work has shown that surface extinction from liquid water is significant over clear surface waters, where the mean photon path through the ocean is long (Vrekoussis et al., 2009). Lerot et al. (2010) found that the cross correlation between the glyoxal and liquid water RCSs within the glyoxal fit region is too high for simultaneous fitting. Our OMI retrieval adopts the Lerot et al. (2010) approach of including pre-fitted optical depths from a separate liquid water retrieval that takes advantage of the broad spectral features of liquid water outside the glyoxal fit window.

All RCSs are degraded to the OMI instrument resolution through convolution (denoted $\otimes$) with the measured instrument transfer function $\Gamma(\lambda)$ (Dirksen et al., 2006), and then splined to the instrument wavelength grid. As a measured solar spectrum is used in the fitting process, the convolution of the source ($I_0$) and absorption ($\tau$) expressions is done separately. Thus if $I_0^{hr}(\lambda)$ and $\tau^{hr}(\lambda)$ denote the solar spectrum and total optical depth at infinitely high resolution, the first term of Eq. (3) is given by:

$$I_0(\lambda) \exp(-\tau(\lambda)) = \left(\Gamma \otimes I_0^{hr}\right)(\lambda) \, \exp\left(-\left(\Gamma \otimes \tau^{hr}\right)(\lambda)\right) \quad (5)$$

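However, in reality the instrument distorts the spectrum after trace gas absorption. Thus the convolution must be applied last in the true expression:

$$\left(\Gamma \otimes \left[I_0^{hr} \exp\left(-\tau^{hr}\right)\right]\right)(\lambda) \quad (6)$$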
The difference between Eqs. (5) and (6) is referred to as the solar $I_0$ effect (Aliwell et al., 2002). For glyoxal, correcting for the $I_0$ effect is important, as the $I_0$ effects of strongly absorbing interfering species are comparable in magnitude to observed glyoxal optical depths. For small optical depths the RCS of species $j$ can be corrected using a high resolution solar reference spectrum ($I_{sol}(\lambda)$), assuming a small reference column density (Eq. 7). The correction is insensitive to reference column densities for all interfering species considered over ranges typically observed in the atmosphere. Here, reference column densities for each species were chosen so that the optical depth used in Eq. (7) is approximately 10⁻³. This magnitude is small enough to be in the range where the exponential in Eq. (7) is approximately linear, and thus should be a good approximation for the $I_0$ effects for all shallow optical depths.

A common mode spectrum $R(\lambda)$, constructed by averaging the fit residuals of spectra between 30° N and 30° S, is included in the final spectrum fit. $R(\lambda)$ is intended to account for systematic residuals uncorrelated with the RCSs, for instance those due to errors in the specification of the instrument transfer function. The scaling and baseline polynomials account for broadband spectral effects, including Rayleigh and Mie scattering, wavelength dependent surface reflection and instrument offsets.
Here $\bar{\lambda}$ is the mean wavelength over the fitting window. The choice of the appropriate polynomial orders ($n_{sc}$ and $n_{bl}$) impacts retrieval accuracy. Lower order polynomials may not fully account for the broadband spectral features not physically modelled, whereas a polynomial of too high an order may increase error through overfitting. Here we set $n_{sc} = 3$ and $n_{bl} = 1$. This choice was made by performing a set of sensitivity tests systematically varying $n_{sc}$ and $n_{bl}$ over a subset of OMI orbits. Polynomial orders lower than these induced latitudinally dependent biases, and higher orders resulted in SCDs similar to those obtained with the selected orders.
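
As an illustration of the quantities defined in this section, the following minimal Python sketch evaluates the attenuated source term $I_0(\lambda)\exp(-\tau(\lambda))$, with $\tau$ taken as the sum of column-weighted cross sections, and a polynomial centred on the mean window wavelength. The function names, argument layout and coefficient ordering are assumptions made for illustration only; this is not the operational implementation, and it deliberately stops short of combining the terms into the full model spectrum.

```python
import math


def attenuated_source(i0, columns, xsecs):
    """Return I0(lambda) * exp(-tau(lambda)) on a common wavelength grid.

    i0      : source spectrum, one value per wavelength
    columns : slant columns x_j, one per absorber (hypothetical ordering)
    xsecs   : cross sections b_j, one list per absorber, same grid as i0
    """
    out = []
    for i, source in enumerate(i0):
        # Total optical depth: sum of each absorber's contribution.
        tau = sum(x * b[i] for x, b in zip(columns, xsecs))
        out.append(source * math.exp(-tau))
    return out


def centered_poly(coeffs, lam, lam_mean):
    """Evaluate sum_k coeffs[k] * (lam - lam_mean)**k (lowest order first)."""
    return sum(c * (lam - lam_mean) ** k for k, c in enumerate(coeffs))
```

For example, `centered_poly([1.0, 2.0], 3.0, 1.0)` evaluates a first-order baseline polynomial two wavelength units above the window mean. Centring on the mean wavelength keeps the polynomial coefficients of comparable magnitude across orders, which generally helps the conditioning of the fit.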

Determination of glyoxal vertical column densities
The spectrum fitting algorithm described in the previous section returns a slant column measurement of glyoxal ($x_{glyoxal} \equiv \Omega_s$). A more geophysically relevant quantity is the vertical column density ($\Omega_v$), defined as the number density integrated through the height of the atmosphere. The ratio of these quantities is called the Air Mass Factor ($A$):

$$A = \frac{\Omega_s}{\Omega_v} \quad (10)$$
For optically thin absorbers including glyoxal, radiative transfer simulations required to determine $A$ can be decoupled from the profile of the trace gas being measured (Palmer et al., 2001):

$$A = \int_0^{\infty} W(z) \, S(z) \, dz \quad (11)$$
$W(z)$ are called scattering weights, and represent the number of times the radiation reaching the satellite has traversed the layer $[z, z + dz]$. Here, $W(z)$ for each OMI observation is interpolated from a lookup table calculated with the VLIDORT v2.4RT radiative transfer model, taking into account instrument viewing geometry, cloud fraction and height, surface height, and reflectance (González Abad et al., 2014). We use data from the OMI O₂–O₂ cloud retrieval algorithm and the seasonally dependent OMI Lambertian equivalent surface reflectance database from Kleipool et al. (2008) as inputs for the lookup table. Although aerosols are not explicitly accounted for, their impact on the scattering weights is partially accounted for through the cloud retrieval algorithm.
$S(z)$ in Eq. (11) is the vertical shape factor, representing the normalized glyoxal profile ($n(z)$):

$$S(z) = \frac{n(z)}{\Omega_v} \quad (12)$$

The profile is taken from a GEOS-Chem simulation driven by GEOS-5 assimilated meteorology with a resolution of 2° × 2.5° and 47 vertical levels. We use a modified version of the simulation described in Fu et al. (2008), with significant updates to NMVOC chemistry (Miller et al., 2012).
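
The air mass factor calculation described in this section can be sketched in a few lines of Python. The sketch assumes that $A$ is the vertical integral of the scattering weights multiplied by the normalized profile, approximated here with the trapezoidal rule on a discrete altitude grid. The function names and the layer grid are hypothetical; the operational retrieval interpolates $W(z)$ from a lookup table rather than receiving it as a list.

```python
def trapezoid(z, y):
    """Trapezoidal integral of y over the altitude grid z."""
    return sum(0.5 * (y[i] + y[i + 1]) * (z[i + 1] - z[i])
               for i in range(len(z) - 1))


def air_mass_factor(z, weights, number_density):
    """Approximate A = integral of W(z) * S(z) dz, with S(z) = n(z) / column."""
    column = trapezoid(z, number_density)
    weighted = [w * n for w, n in zip(weights, number_density)]
    return trapezoid(z, weighted) / column


def slant_to_vertical(scd, amf):
    """Convert a slant column density to a vertical column density."""
    return scd / amf
```

A useful sanity check is that constant scattering weights return an air mass factor equal to that constant, whatever the profile shape.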

Error estimation
Errors in the retrieved glyoxal slant column density are estimated following the methods of Rodgers (2000). The difference between the retrieved ($\hat{x}$) and true state ($x$) of the atmosphere arises from a combination of parameter errors ($b - \hat{b}$), noise in the measured spectrum ($\varepsilon$) and forward model errors ($\Delta f$):

$$\hat{x} - x = G_y K_b (b - \hat{b}) + G_y \varepsilon + G_y \Delta f \quad (13)$$
$K_b$ is the forward model Jacobian with respect to the model parameters and $G_y$ is the sensitivity of the retrieved state to changes in the observed radiance. OMI spectra are fitted with the Gauss-Newton based ELSUNC least squares algorithm (Lindström and Wedin, 1988), which additionally uses a truncated QR method far from the solution. Near fit convergence, $G_y$ is approximated by the sensitivity derived from a Gauss-Newton iteration, which is the pseudoinverse of the forward model Jacobian ($K_x = \partial F / \partial x$):

$$G_y = \left(K_x^T K_x\right)^{-1} K_x^T \quad (14)$$

Parameter errors arise from variables that are not optimized in the fitting process. The main source of parameter error in the glyoxal retrieval is uncertainty in the RCSs. The Jacobian for $b_i$ is defined as $K_b^i = \partial F / \partial b_i$. The full parameter Jacobian matrix corresponding to Eq. (15) is given by:

$$K_b = \left[K_b^1, K_b^2, \ldots\right] \quad (16)$$

As each RCS used in the retrieval derives from independent laboratory measurements, it is reasonable to assume that the RCS error of one species is uncorrelated with the RCSs of the others. Thus the full error covariance matrix of the parameters has a block diagonal structure:

$$S_b = \mathrm{diag}\left(S_b^1, S_b^2, \ldots\right) \quad (17)$$
Here, $S_b^i$ is the error covariance matrix corresponding to the RCS of the $i$th species. The $S_b^i$ are approximated as diagonal, and are constructed using the relative errors reported in Table 1. The covariance matrix describing how parameter errors impact errors in the fitted variables ($S_x^b$) can now be found by propagating the parameter errors in Eq. (13) using Eqs. (14), (16) and (17). The covariance matrix for the measurement error is estimated from the root mean square of the fit residuals ($\varepsilon_{rms}$) adjusted by the number of statistical degrees of freedom. No correlation in noise signal between measurement pixels is assumed, so the noise covariance matrix is diagonal.
Here $m$ is the number of points in the spectrum and $n$ the number of fit variables, which together set the number of statistical degrees of freedom. The random error component of the fit variable covariance matrix follows by propagating the noise covariance through $G_y$.

Inference of the forward model error term in Eq. (13) is complicated by an incomplete knowledge of the true atmospheric state, and by how the state maps to the observed spectra. The polynomials included in the fitting process are only approximations for the true physics (scattering, reflectance and instrument effects that cause radiometric offsets). Although no forward model error estimate is included in the retrieval, the sensitivity to forward model error can be assessed by testing the retrieval algorithm against model spectra, where the true atmospheric state is known. This will be done in the next section.
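
To make the fitting and random error estimate concrete, the sketch below fits a single slant column with plain Gauss-Newton iterations and then scales the residual variance into a column uncertainty. It is a toy under stated assumptions: one absorber, no polynomials or common mode, a noise variance taken as the residual sum of squares divided by $m - n$ (one common reading of "adjusted by the number of statistical degrees of freedom"), and no truncated QR safeguard. It is not the ELSUNC algorithm, and all names are hypothetical.

```python
import math


def fit_single_absorber(radiance, i0, xsec, x0=0.0, n_iter=20):
    """Gauss-Newton fit of F = i0 * exp(-x * xsec) for one slant column x.

    Returns the fitted column and its 1-sigma random error.
    """
    def model_and_jacobian(x):
        model = [s * math.exp(-x * c) for s, c in zip(i0, xsec)]
        jac = [-c * f for c, f in zip(xsec, model)]  # dF/dx per wavelength
        return model, jac

    x = x0
    for _ in range(n_iter):
        model, jac = model_and_jacobian(x)
        resid = [y - f for y, f in zip(radiance, model)]
        # Pseudoinverse step for one unknown: (J^T J)^-1 J^T r
        x += sum(j * r for j, r in zip(jac, resid)) / sum(j * j for j in jac)

    model, jac = model_and_jacobian(x)
    resid = [y - f for y, f in zip(radiance, model)]
    m, n = len(radiance), 1
    noise_var = sum(r * r for r in resid) / (m - n)  # assumed adjustment
    x_var = noise_var / sum(j * j for j in jac)      # G_y S_eps G_y^T
    return x, math.sqrt(x_var)


# Synthetic check with a known column (values are illustrative only).
xsec = [1e-19, 3e-19, 5e-19, 3e-19, 1e-19]
i0 = [1.0] * len(xsec)
x_true = 5.0e14
radiance = [s * math.exp(-x_true * c) for s, c in zip(i0, xsec)]
x_hat, sigma = fit_single_absorber(radiance, i0, xsec)
```

With noise-free synthetic radiances the fitted column should match the input column closely and the estimated random error should be negligible; adding noise to `radiance` increases `sigma` accordingly.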

Retrieval optimisation
The choice of the retrieval spectral window is an important determinant of retrieval accuracy. Figure 2 … as τ ≈ 1.2 × 10⁻³ and τ ≈ 6 × 10⁻³ respectively. The simulated slant optical depths of NO₂ and O₃ exhibit a strong dependence on solar/instrument viewing geometry, which can be attributed to strong stratospheric absorption. In the following sections, we will use this case study orbit to evaluate the sensitivity of our retrieval to the settings of the forward model, and the position of the fit window. We start by using simulated spectra to guide the initial design of the retrieval. We additionally test the sensitivity of the retrieval algorithms for the pre-fitted liquid water absorption and glyoxal using real OMI spectra.

Observing system simulation experiments
The model described in Eq. (2) is only a semi-physical approximation of the true spectrum. We therefore test its performance relative to a model closer to the true physics through an Observing System Simulation Experiment (OSSE). The approach is summarised in Fig. 3. For each OMI track, GEOS-Chem chemical and meteorological profiles were sampled for the instrument viewing geometry and the results averaged onto a 2° latitude grid for computational expediency. The version of GEOS-Chem used here does not simulate stratospheric chemistry, so O₃ and NO₂ zonal climatologies derived from the OMI total column ozone retrieval (Liu et al., 2010) and a stratospheric model (McLinden et al., 2000) were included above the model tropopause. Clear sky synthetic spectra were modeled with VLIDORT on a 0.01 nm grid, using the viewing geometry of OMI. … Liu et al. (2007) and Vandaele et al. (2003) respectively. Simulated spectra are convolved with a 0.65 nm FWHM Gaussian approximating the OMI instrument transfer function, and then sampled onto the OMI radiance wavelength grid. The observed solar spectrum is simulated by convolving the high resolution solar reference with the same Gaussian, followed by sampling to the OMI solar irradiance grid. Finally, the retrieval algorithm is applied and the results compared to the "true" state (i.e. GEOS-Chem).

The RCSs of O₃ and NO₂ exhibit temperature dependencies that could induce errors in the retrieval if not properly accounted for. Alvarado et al. (2013) found significant improvements in their spectrum fit residuals over heavily polluted regions by incorporating two independent NO₂ RCSs at different temperatures into their glyoxal retrievals. We first tested the impact of four different RCS temperature choices on the retrieval using a 435–460 nm window. Figure 4 shows the difference between the retrieved and true slant column densities. The preliminary version of the OMI glyoxal retrieval used a NO₂ RCS temperature of 220 K (Chance, 2006).
Figure 4 shows that using this RCS temperature induces a significant positive global bias in the retrieval, as well as a local bias between 0–15° S over the region with strong pyrogenic emissions. We also tested a 240 K RCS temperature, which is closer to the average temperature of the environment of photons absorbed by NO₂. Although this reduces the global bias, the pyrogenic hotspot remains. Including two independent NO₂ RCSs at different temperatures (230 K and 290 K) removes the 0–15° S bias whilst slightly increasing the overall bias, likely due to the added cross correlation caused by fitting the second RCS. This is consistent with the reductions observed by Alvarado et al. (2013). We therefore include two NO₂ RCSs at different temperatures in the operational retrieval to avoid interferences from boundary layer NO₂. Including an additional O₃ RCS (243 K) does not improve the retrieval owing to the small temperature dependence of O₃ between 400–500 nm.
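
The instrument simulation step of the OSSE (convolving a high resolution spectrum with a 0.65 nm FWHM Gaussian and sampling it onto a coarser grid) can be sketched as follows. This minimal Python version assumes a uniformly spaced high resolution grid and normalises the discrete Gaussian weights at each output wavelength; edge effects near the ends of the high resolution grid are ignored, and the function name is hypothetical.

```python
import math


def gaussian_convolve(hr_lams, hr_spec, out_lams, fwhm=0.65):
    """Convolve a high resolution spectrum with a Gaussian slit function
    and sample the result at the wavelengths in out_lams (all in nm)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    sampled = []
    for lam0 in out_lams:
        weights = [math.exp(-0.5 * ((lam - lam0) / sigma) ** 2)
                   for lam in hr_lams]
        total = sum(w * s for w, s in zip(weights, hr_spec))
        sampled.append(total / sum(weights))
    return sampled
```

Normalising by the sum of the weights means a spectrally flat input is returned unchanged, which is a convenient check before applying the function to structured spectra.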
The sensitivity of the retrieval to window position was tested following Vogel et al. (2013), by systematically quantifying OSSE retrieval error as a function of lower and upper wavelength limits. Figure 5 shows the mean bias between the retrieved and true glyoxal slant columns, as well as the slope of the linear regression of the retrieved vs. "true" slant columns. Retrievals for most window choices have a mean bias of less than 5 × 10¹³ molecules cm⁻², except when the window truncates the strongest glyoxal band. The window region centered around 445–463 nm performs optimally, as shown by the lowest mean bias and regression slope closest to 1. This corresponds to the strongest glyoxal absorption band. Extending the window down to 435 nm to include the second strongest glyoxal band slightly increases the mean bias and slope. Given the relatively low retrieval bias for most windows, the results of the OSSE indicate that the spectrum model ($F(\lambda)$) is capable of accounting for the physical effects simulated by the OSSE. These include:

Stripe correction
Trace gas retrievals from 2-D CCD instruments such as OMI suffer from systematic cross-track biases, which appear as stripes when viewed in the along-track direction. This has been attributed to the cross-track variability of the measured solar irradiances (Veihelmann and Kleipool, 2006). For OMI, this variability arises due to a combination of noise in the measured solar spectra, transient dark current signals and the angular and wavelength dependence of the diffuser used for irradiance measurements. The impact of these variations is significant for glyoxal due to its relatively weak absorption. We investigated how solar spectrum variation impacts the OMI retrieval by adding noise to the solar spectra used in the retrieval OSSE (Fig. 6). A noise realisation was added to each across-track solar spectrum. We chose a signal-to-noise ratio of 3000, to be roughly consistent with the noise level expected for averaging a month of OMI solar spectra. Striping is apparent in the slant columns retrieved from the synthetic spectra (Fig. 6b), with cross-track biases reaching as high as 1.5 × 10¹⁵ molecules cm⁻². Figure 6b also shows that the magnitudes of the stripes are constant with latitude. Thus determining the stripe offsets at one location should be sufficient for correcting the stripes at all locations.

The Sahara is a convenient region to determine the cross-track stripe offsets. Glyoxal concentrations in this region are expected to be negligible, with VCDs simulated by GEOS-Chem below 1.5 × 10¹³ molecules cm⁻² all year. In addition, spectra over the Sahara have a high signal-to-noise ratio due to high surface reflectivity. Figure 6c shows the mean glyoxal SCD retrieved in the Saharan region defined by the limits 20–30 …. Since there is essentially no glyoxal, these represent the stripe offsets due to noise in the solar spectrum employed in the retrieval. Subtracting these offsets from Fig. 6b produces the stripe-corrected results in Fig. 6d. For reference, the synthetic data retrieved with a noise-free solar spectrum are shown in Fig. 6a. The stripe-corrected retrieval is virtually indistinguishable from the noise-free case. We thus conclude that we can correct for stripes using this simple background subtraction approach, provided that the stripes originate purely from solar irradiance spectrum noise.

This should generally be true, except for radiance/irradiance spectra impacted by so-called random telegraph signals (RTS) caused by particle hits on the CCD. These lead to prolonged changes in dark current, which manifest as spikes in the observed spectra. To reduce the impact of RTS, we remove pixels that have been flagged as RTS in the level 1-B product (Kleipool, 2005). We identify additional spikes by comparing the residual difference between the modeled and measured spectra. Pixels in spectra whose residuals are more than 3 standard deviations from the mean are flagged as RTS. Spectra with these additionally flagged pixels are then refitted upon removal of the flagged pixels.
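
The residual-based spike screening described above can be sketched in Python as follows. The sketch assumes the population standard deviation of the residuals within a single spectrum is used as the reference scatter; the text does not specify this detail, and the function name and default threshold argument are hypothetical.

```python
import statistics


def flag_spikes(residuals, n_sigma=3.0):
    """Return a list of booleans marking residuals that lie more than
    n_sigma standard deviations from the mean residual."""
    mean = statistics.mean(residuals)
    spread = statistics.pstdev(residuals)  # assumption: population std
    if spread == 0.0:
        return [False] * len(residuals)
    return [abs(r - mean) > n_sigma * spread for r in residuals]


# One obvious spike among otherwise flat residuals.
flags = flag_spikes([0.0] * 20 + [10.0])
```

Pixels flagged in this way would be dropped and the spectrum refitted. Note that for short spectra a single outlier inflates the standard deviation itself, so the threshold becomes harder to exceed as the number of pixels decreases.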
A particular stripe offset correction should apply for all spectra retrieved using the same OMI solar spectrum. Since the operational retrieval uses a 30 day running mean, the stripe patterns should vary smoothly in time. For the real spectra, we must also consider how random noise in the radiance spectra propagates to random error in the stripe offsets. In principle this can be reduced by averaging retrievals over the normalisation region. We therefore create a time dependent offset for each track by taking a 5 day running mean of all retrieved slant columns for each track in the Saharan normalisation region. The 5 day window was chosen because this was the minimum window width required to reduce the uncertainty in the stripe offsets below 1 × 10¹⁴ molecules cm⁻². The associated stripe patterns for the month of July 2006 and their uncertainties are shown in Fig. 7. The stripes determined from the real spectra are comparable in magnitude to those in the OSSE. We also see that the stripe pattern over the month time frame is fairly constant. Thus the 5 day averaging window appears small enough to capture the temporal variability of the stripe offsets.
To correct the stripes for a particular orbit, the derived stripe offset nearest in time is subtracted. Figure 8 shows the SCDs retrieved with real spectra from o10430 and an orbit taken on the same day over India (o10427) before and after the stripe correction is applied. Since the random uncertainties of the fits for individual spectra are large (O(10¹⁵) molecules cm⁻²), a 30 point running mean is applied to each track to aid visualisation of the stripe patterns. Figure 8 shows that the magnitudes of the stripes are significantly reduced upon applying the stripe offset correction. The correction performs similarly for both orbits, further evidence that the stripe patterns arise due to the common solar spectra employed in the retrieval. Thus the stripe correction offsets derived over the Sahara should apply globally.
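
The background subtraction approach can be summarised with a short Python sketch. It assumes slant columns are stored as rows of along-track pixels, each holding one value per cross-track position, and that offsets are indexed by day. These data layouts and function names are hypothetical and chosen only to illustrate the three steps: average the reference-sector retrievals per track, pick the offset nearest in time, and subtract it.

```python
def stripe_offsets(reference_scds):
    """Mean slant column per cross-track position over the reference sector.

    reference_scds: list of along-track rows, each a list of SCDs
    (one per cross-track position).
    """
    n_rows = len(reference_scds)
    n_tracks = len(reference_scds[0])
    return [sum(row[t] for row in reference_scds) / n_rows
            for t in range(n_tracks)]


def nearest_offsets(offsets_by_day, day):
    """Select the offset vector whose day is closest to the orbit's day."""
    closest = min(offsets_by_day, key=lambda d: abs(d - day))
    return offsets_by_day[closest]


def destripe(orbit_scds, offsets):
    """Subtract the per-track offsets from every along-track row."""
    return [[value - off for value, off in zip(row, offsets)]
            for row in orbit_scds]
```

In practice `reference_scds` would pool all retrievals within the running-mean window over the normalisation region, so that random noise in individual fits averages down before the offsets are applied.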

Liquid water pre-fit
Retrieved glyoxal slant columns over clear oceanic waters are systematically negative when absorption from liquid water is not considered, due to anti-correlation between glyoxal and liquid water in the glyoxal fit window. Lerot et al. (2010) designed a two-step retrieval procedure to correct for the impact of liquid water absorption, whereby liquid water is first derived in a larger fit window, and then held constant in the smaller glyoxal fit window. We adopt the same approach for the OMI retrieval. In the retrieval of liquid water absorption, we additionally fit O₃, NO₂, and the O₂–O₂ collision complex. The liquid water retrieval uses a 1st order baseline and 5th order scaling polynomial. The higher order polynomial choice was needed to account for the impacts of surface reflectance over the broader fit window.

The sensitivity of the liquid water retrieval to window position was tested over two regions using real OMI spectra. The region over the Sahara used for the stripe correction for o10430 was selected to test for a potential interference between liquid water absorption and surface reflectance from sand (Richter et al., 2011), which could induce errors in deriving the glyoxal background from this region. Sensitivity tests were also performed on an orbit taken on the same day (o10423), over the Pacific Ocean between 0–30° N. This region contained significant liquid water absorption. Figure 9 shows the mean retrieved liquid water optical depth, total parameter error due to RCS uncertainty, and the spectrum fit residuals (adjusted for statistical degrees of freedom) for the two regions. Above 480 nm, retrievals over the Sahara become strongly negative. This is likely an artifact of the strong spectral dependence of reflectance from sandy surfaces, which contains a pronounced feature at approximately 480 nm (Richter et al., 2011). Optical depths over the clear ocean region are a maximum for a window setting of 397.5–470 nm, whilst the retrieval that minimises the spectrum fit RMS is for 410–467.5 nm. However, for both of these window settings, retrievals over land are strongly negative. These biases may be explained by the large parameter uncertainty, mostly due to the high uncertainty in the liquid water cross section. Extension of the lower window limit below 400 nm leads to a sharp reduction in parameter error, which occurs due to the strong increase of liquid water absorption below this wavelength (see Fig. 1). Retrieved liquid water optical depths over the Sahara in this spectral region are close to zero, suggesting that incorporation of the strong shoulder of the liquid water RCS below 400 nm acts to significantly reduce the cross correlation with the surface reflectance signal.
For the liquid water retrieval, we set the window interval at 385–470 nm. The retrieval window choice for liquid water was guided by not wanting to fit any unwanted surface reflectance signals that could be corrected by the polynomials in the smaller glyoxal window. Since sand has a strong spectral feature at 480 nm, we do not consider upper window limits above this wavelength. In addition, the incorporation of the water absorption shoulder below 400 nm reduced the negative retrieval bias over sandy surfaces. Figure 10 shows the spatial distribution of liquid water absorption for July 2006 for the operational retrieval. Liquid water absorption peaks at the centers of ocean gyres. These regions are areas of low biological activity with very low turbidities, allowing long effective light paths through the ocean surface layer.

Glyoxal retrieval
In this section we test the sensitivity of the retrieval to fit window selection using real OMI spectra. All retrievals were performed using the pre-fitted liquid water optical depths described in the previous section. For each retrieval window, we also retrieved a set of orbits within a 5 day window of o10430 to determine the stripe offsets. The resulting mean SCDs as a function of window position are shown in Fig. 11. We show the mean SCD for all retrievals over land in o10430, as well as for the Pacific Ocean region tested for the liquid water retrieval. For the land case, the mean SCD is positive for lower wavelength bounds between 428 and 436 nm. For the Pacific Ocean sector, the region in fit window space containing positive SCD values shrinks, with negative mean SCDs retrieved for upper wavelength limits above 461 nm. This difference relative to the o10430 case is likely due to interference from liquid water absorption.

Significant differences exist between the SCD patterns seen in the real OMI spectra and the biases in the OSSE. The mean SCD in the real spectra decreases when the lower wavelength limit is extended below 435 nm, where the window encompasses a strong Fraunhofer line due to hydrogen (434.047 nm), followed by two more lines associated with iron and calcium (430.790 and 430.774 nm respectively). The SCD decrease as the strong solar lines are included in the fit window could be a result of imperfect corrections for inelastic scattering, which was not simulated in the OSSE. The SCDs retrieved from OMI data are strongly negative using the 445–460 nm window that was optimal for the OSSE. This could be a result of RCS uncertainties, which will have a larger impact on smaller fit windows. Errors due to RCS uncertainty are estimated using the first term in Eq. (13). Figure 12 shows the mean error on retrieved glyoxal induced by uncertainties in the RCSs of NO₂ and H₂O.
Above 435 nm the mean estimated error from the NO₂ RCS increases rapidly, providing evidence that the negative SCDs retrieved for the shorter windows are impacted by RCS uncertainties.

For the operational retrieval, glyoxal is retrieved using a fit window set between 435 and 461 nm. The lower limit was selected to avoid the potential interference with the Ring effect. The upper wavelength limit was chosen as a balance between avoiding interference from the liquid water spectrum (favoring smaller windows) and reducing parameter error (favoring larger windows). This wavelength region is similar to that used in previous studies (Lerot et al., 2010; Vrekoussis et al., 2010). The resulting glyoxal SCDs for July 2006 using the operational fit window are shown in Fig. 13. Retrieved SCDs remain negative over areas with strong liquid water absorption, even after the inclusion of the pre-fitted liquid water optical depths. Figure 14 plots the gridded SCDs in Fig. 13 against the liquid water optical depths in Fig. 10. There is a clear negative trend in glyoxal with increasing optical depth, with glyoxal columns scaling approximately linearly with liquid water optical depth. This behaviour is consistent with how errors in the liquid water RCS would impact the glyoxal SCD in the retrieval. These errors are expected to be large because the reported uncertainties in the liquid water RCS are large (6–14 %), and the resolution of the RCS is 5 nm, far coarser than the 0.65 nm FWHM of OMI.

These VCDs are slightly higher than those observed by GOME-2 and SCIAMACHY, which tend to peak at ∼ 3 × 10¹⁴ molecules cm⁻² (Vrekoussis et al., 2009; Lerot et al., 2010). Since these regions are predominantly composed of evergreen coniferous trees, these VCDs may be related to monoterpene emissions.

OMI shows persistently high glyoxal columns in the range 3–6 × 10¹⁴ molecules cm⁻² over India and China.
Elevated glyoxal levels are also seen by GOME-2 and SCIAMACHY over these regions, with significantly elevated VCDs observed over most of Asia during summer (Vrekoussis et al., 2009; Lerot et al., 2010). For OMI this seasonality is weaker except over NE China. Here OMI columns tend to be lower on average by ∼2 × 10¹⁴ molecules cm⁻² compared to the other instruments. The VCDs from other retrievals may be greater due to the interference associated with boundary layer NO₂. The spatial pattern and timing of the broad summer glyoxal maximum over most of India and China follow observed water vapor patterns tied to the Indian and East Asian monsoons (Wang et al., 2014). Similarly, glyoxal columns are higher in the GOME-2 and SCIAMACHY retrievals over tropical ocean regions with elevated water vapor columns, which could indicate an interference from water vapor. Since all retrievals use water vapor RCSs calculated from the HITRAN database, differences likely arise from the choice of temperature and pressure used to derive the RCS. The sensitivity of our [...]

Average VCDs over the Sahara for GOME-2 and SCIAMACHY range between 1 and 2 × 10¹⁴ molecules cm⁻² (Vrekoussis et al., 2009; Lerot et al., 2010). [...] values are close to zero since the Sahara is used as a reference sector in the stripe-correction algorithm. Acetylene represents the only known long-lived source of glyoxal (Fu et al., 2008), with a global mean lifetime of 12 days (Xiao et al., 2007). The resulting background glyoxal columns calculated in GEOS-Chem are of the order of 10¹³ molecules cm⁻². GEOS-Chem acetylene fields in the upper troposphere are in good agreement with observations (González Abad et al., 2011; Xiao et al., 2007), suggesting that these low background VCDs are reasonable. The high background values in SCIAMACHY and GOME-2 likely reflect the choice of offset correction region.
In the case of GOME-2, the retrieval background is corrected using a Pacific Ocean reference sector (Lerot et al., 2010). Since interference from liquid water is likely not fully accounted for, owing to the large uncertainties in its RCS, determining offsets in regions with significant liquid water absorption may positively bias the offset correction applied.

Figure 16 shows the ratio of glyoxal to HCHO VCDs (R_GF) computed for OMI for July to August 2007, using OMI HCHO retrievals from González Abad et al. [...] OMI and surface measurements; however, these were derived from multi-year averages. It is therefore likely that the R_GF values are representative of multiple sources. For instance, the higher R_GF that Vrekoussis et al. (2010) attributed to biogenic regions in Africa likely arises from a combination of pyrogenic and biogenic emissions.

Results and discussion
In the Southeast United States, Vrekoussis et al. (2010) observe R_GF < 0.04. This is consistent with the other observations, assuming that isoprene is the dominant year-round source of glyoxal and HCHO in the Southeast United States.
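The ratio R_GF discussed above is a cell-by-cell quotient of two gridded VCD fields. The following is a minimal sketch of that calculation, not the processing used for Fig. 16. The function name, the `min_hcho` screening threshold, and the input values are assumptions introduced for illustration only.

```python
import math


def glyoxal_hcho_ratio(glyoxal_vcd, hcho_vcd, min_hcho=1.0e15):
    """Return R_GF = glyoxal VCD / HCHO VCD for each grid cell.

    Cells with a missing value, or with HCHO below `min_hcho`
    (a hypothetical screening threshold, molecules cm^-2), are set
    to NaN so that noise-dominated ratios are not interpreted.
    """
    ratios = []
    for gly, hcho in zip(glyoxal_vcd, hcho_vcd):
        if math.isnan(gly) or math.isnan(hcho) or hcho < min_hcho:
            ratios.append(math.nan)
        else:
            ratios.append(gly / hcho)
    return ratios


# Illustrative columns only (molecules cm^-2); these are not OMI results.
glyoxal = [3.0e14, 5.0e14, 1.0e14]
hcho = [1.0e16, 1.0e16, 5.0e14]
r_gf = glyoxal_hcho_ratio(glyoxal, hcho)
```

In this toy input the third cell is screened out because its HCHO column falls below the assumed threshold. In practice a threshold of this kind would need to be chosen from the retrieval uncertainties of both species.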

Conclusions
We have developed a glyoxal retrieval for the Ozone Monitoring Instrument (OMI) aboard the NASA Aura satellite. The new retrieval takes advantage of the higher spatiotemporal resolution of OMI (13 × 24 km² nadir pixels, daily global coverage) relative to previous satellite sensors used to retrieve glyoxal. We began by testing the retrieval algorithm against simulated OMI spectra. The results show that retrievals that include only one (stratospheric) NO₂ RCS cannot sufficiently account for its temperature dependence. Omitting a second NO₂ RCS at higher temperature leads to an overestimation of glyoxal over regions with high levels of boundary layer NOₓ. The OSSE results are consistent with those derived from real spectra (Alvarado et al., 2013).
We then used the synthetic spectra to inform the design of a new cross-track bias correction for OMI glyoxal. It was shown that determining the cross-track bias over the Sahara is sufficient for determining the correction for all orbits that use the same solar spectrum. The method was applied to real OMI spectra and significantly reduced the magnitude of the cross-track biases.
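The cross-track correction summarised above can be illustrated with a short sketch. It assumes that the true glyoxal slant column over the reference sector is negligible, so that the SCD retrieved there at each cross-track position estimates the stripe offset. The use of the median as the estimator, the function names, and the numbers are assumptions for illustration, not the operational implementation.

```python
from statistics import median


def cross_track_offsets(reference_scds):
    """Estimate one offset per cross-track position.

    `reference_scds` is a list of swath lines over the reference
    sector; each line holds one SCD per cross-track position.
    The median is an assumed estimator, chosen here for robustness.
    """
    n_positions = len(reference_scds[0])
    return [
        median(line[i] for line in reference_scds)
        for i in range(n_positions)
    ]


def destripe(scds, offsets):
    """Subtract the per-position offsets from every swath line."""
    return [[value - off for value, off in zip(line, offsets)] for line in scds]


# Illustrative SCDs in units of 1e14 molecules cm^-2 (not OMI data).
sahara = [
    [1.0, -2.0, 0.5],
    [1.2, -1.8, 0.3],
    [0.8, -2.2, 0.7],
]
orbit = [
    [4.0, 1.0, 3.5],
    [2.0, -1.0, 1.5],
]
offsets = cross_track_offsets(sahara)
corrected = destripe(orbit, offsets)
```

In this example the three cross-track positions of each corrected swath line agree, which is the intended effect of removing a position-dependent offset. The same offsets would be reused only for orbits that share the solar spectrum from which they were derived.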
The sensitivity of the liquid water retrieval to fit window position was tested by systematically varying the lower and upper wavelength limits of the retrieval window. We found that upper window limits above 480 nm lead to strongly negative optical depths over desert regions. We attribute this to a strong absorption feature near 480 nm from sandy surfaces. We determined that the interference with sand could be minimised by using a fit window of 385–470 nm, which avoids the sand spectral feature and incorporates a strong feature in the liquid water spectrum below 400 nm. The sensitivity of the glyoxal retrieval to fit window position was also tested. We found that retrieved SCDs systematically decrease for lower window limits below 435 nm, and speculate that this is due to interference from the Ring effect. We compared SCDs retrieved over land and over a remote ocean region with strong liquid water absorption to show that liquid water interferes strongly for upper window limits above 461 nm. We estimated retrieval errors caused by errors in the RCSs of H₂O and NO₂ to show that errors induced by these cross sections increase significantly for lower window limits above 435 nm. We determined the optimal window to be 435–461 nm, similar to the windows used in previous satellite retrievals.
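The systematic variation of window limits described above amounts to a grid search over (lower limit, upper limit) pairs, with one retrieval per pair. The sketch below shows only that bookkeeping. The callable `toy_mean_scd` is a hypothetical stand-in for a full spectral fit and carries no physics; its shape, the `min_width` parameter, and the grid values are assumptions chosen so the example runs.

```python
def sweep_fit_windows(retrieve_mean_scd, lower_limits, upper_limits, min_width=10.0):
    """Run one retrieval per fit window and collect the mean SCD.

    `retrieve_mean_scd(lo, hi)` must return the mean SCD for the
    window [lo, hi] in nm. Windows narrower than `min_width`
    (an assumed cut-off) are skipped.
    """
    results = {}
    for lo in lower_limits:
        for hi in upper_limits:
            if hi - lo < min_width:
                continue
            results[(lo, hi)] = retrieve_mean_scd(lo, hi)
    return results


def toy_mean_scd(lo, hi):
    """Hypothetical stand-in for a retrieval (arbitrary units).

    It lowers the SCD for lower limits below 435 nm and upper
    limits above 461 nm, purely to give the sweep something to map.
    """
    scd = 1.0
    if lo < 435.0:
        scd -= 0.5 * (435.0 - lo)
    if hi > 461.0:
        scd -= 0.5 * (hi - 461.0)
    return scd


lowers = [430.0, 435.0, 440.0]
uppers = [455.0, 461.0, 465.0]
grid = sweep_fit_windows(toy_mean_scd, lowers, uppers)

# Windows for which the (toy) mean SCD is positive.
positive = sorted(window for window, scd in grid.items() if scd > 0)
```

With a real retrieval in place of the stand-in, the resulting dictionary corresponds to the kind of window-position map shown in Fig. 11, and the same loop could be reused for the liquid water window test.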
(b) Slant column densities (molecules cm⁻²) retrieved from synthetic spectra using solar spectra with a signal-to-noise ratio of 3000.