Tomographic retrieval approach for mesoscale gravity wave observations by the PREMIER Infrared Limb-Sounder

PREMIER is one of three candidates for ESA’s 7th Earth Explorer mission that are currently undergoing feasibility studies. The main mission objective of PREMIER is to quantify processes controlling atmospheric composition in the mid/upper troposphere and lower stratosphere, a region of particular importance for climate change. To achieve this objective, PREMIER will employ the first satellite Fourier transform infrared limb-imager with a 2-D detector array combined with a millimetre-wave limb-sounder. The infrared limb-imager can be operated in a high spatial resolution mode (“dynamics mode”) for observations of smallscale structures in atmospheric temperatures and trace gas fields with unprecedented 3-D sampling (0.5 km in the vertical direction, 50 km along track, 25 km across track). In this paper, a fast tomographic retrieval scheme is presented, which is designed to fully exploit the high-resolution radiance observations of the dynamics mode. Based on a detailed analysis of the “observational filter”, we show that the dynamics mode provides unique information on global distributions of gravity waves (GW). The achievable vertical resolution for GW observations has values between the vertical sampling (0.5 km) of the dynamics mode and the vertical field of view (about 0.75 km). The horizontal across track resolution corresponds to the horizontal across track sampling of 25 km. Since the achievable along track horizontal resolution is about 70 km, the dynamics mode will provide GW limb-observations with a horizontal resolution comparable to nadir sounders. Compared to previous observations, PREMIER will therefore considerably extend the range of detectable GWs in terms of horizontal and vertical wavelength. Correspondence to: J. Ungermann (j.ungermann@fz-juelich.de)


Introduction
Numerous previous satellite missions have used the infrared limb-emission sounding technique for atmospheric remote sensing. Global observation of atmospheric infrared limbemissions represents a reliable technique to obtain vertically resolved profile data of temperature, a variety of trace gases, aerosols, and clouds simultaneously, at daytime and at nighttime. First global infrared limb-emission observations of an extensive number of atmospheric trace species were made by LIMS (Limb Infrared Monitor of the Stratosphere; Gille and Russel III, 1984) and SAMS (Stratospheric and Mesospheric Sounder; Drummond et al., 1980) aboard the Nimbus 7 satellite. Trace gas fields obtained from these sensors and followon instruments such as the CRISTA instrument (Riese et al., 1999) greatly contributed to our understanding of the threedimensional composition, structure and large-scale dynamics of the middle atmosphere. Early 2002 ESA launched Envisat, a polar-orbiting Earth observation satellite, which included the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) instrument (Fischer et al., 2008) for infrared limb-emission observations with high spectral resolution. MIPAS significantly expanded the number of detectable species and the corresponding height range compared to previous space missions.
All previous instruments have used telescopes with scanning mirrors or one-dimensional detector arrays to obtain profile information of atmospheric trace species. While these previous instruments were rather restricted in terms of spatial resolution (and in particular horizontal sampling), the PRE-MIER mission (Process Exploration through Measurements of Infrared and millimetre-wave Emitted Radiation) (ESA, 2008) employs an InfraRed Limb-Sounder (hereafter IRLS) that builds on recent developments in infrared detector array technology providing the basis for an instrument capable of achieving the high three-dimensional (3-D) spatial resolution Published by Copernicus Publications on behalf of the European Geosciences Union. necessary to sense the upper troposphere/lower stratosphere (UTLS) region (Riese et al., 2005;Friedl-Vallon et al., 2006).
During the PREMIER mission the IRLS can be operated in two different modes. The one discussed in this paper, the "dynamics mode", is designed for unprecedented 3-D spatial sampling and is thereby optimised to resolve small and mesoscale atmospheric temperature and trace gas structures (e.g. Adams et al., 2009). The sampling is 0.5 km in the vertical direction, 25 km across track (in a 320 km swath), and 50 km in the along track direction. The across track sampling corresponds to the achievable across track resolution. To fully exploit the benefits of the dense sampling in the vertical and along track direction in terms of resolution, we present a new tomographic retrieval scheme. Tomographic retrievals for limb sounding measurements were first explored by a variety of authors for different purposes and instruments (e.g. Solomon et al., 1984;Gurevich, 1995). Practical implementations for the large-scale retrieval of atmospheric constituents from satellite measurements were first produced by Carlotti et al. (2001) for MIPAS and Livesey and Read (2000) for MLS (Microwave Limb Sounder). By using a very fast and efficient forward model (Hoffmann et al., 2008;Hoffmann and Alexander, 2009) we are able to explore the approach at smallest spatial scales that were not accessed by previous studies (e.g. Carlotti et al., 2006;Steck et al., 2005). In particular, our analysis demonstrates the capability of the PREMIER IRLS to provide unique global information on mesoscale gravity waves (GWs).
Global observations of GWs are a major objective of the PREMIER mission (ESA, 2008). GWs represent an important coupling mechanism for the middle atmosphere (Fritts and Alexander, 2003). They contribute about 50% to the driving of the quasi-biennial oscillation (Dunkerton, 1997), are the major forcing mechanism of the summer branch of the Brewer-Dobson circulation (Alexander and Rosenlof, 2003), and contribute 30 to 50% to the predicted increase of the Brewer-Dobson circulation due to climate change (McLandress and Shepherd, 2009). GWs are also the main driver of mesospheric winds and cause the cold summer mesopause (McLandress, 1998). Currently, uncertainties in the global representation of GWs and their effects on the circulation represent a major limitation for the predictive capabilities of chemistry climate models.
The great potential of GW observations from space has been demonstrated in several studies (e.g. Eckermann and Preusse, 1999;Preusse et al., 2002). However, the spatial resolution of previous global observations was insufficient for the determination of direction-resolved momentum flux (Ern et al., 2004) and so far only absolute values of momentum flux have been deduced from CRISTA (Ern et al., 2004(Ern et al., , 2006 and HIRDLS (Alexander et al., 2008;Wright et al., 2010). In addition, there is a case study of momentum flux from an exceptional long-vertical wavelength mountain wave event over South Georgia Island using AIRS data . The combination of high-resolution 3-D temperature measurements and good altitude coverage offered by PREMIER will allow for simultaneous determination of GW temperature amplitudes and associated horizontal and vertical wavelength (wave-vector). Preusse et al. (2009) showed that global 3-D observations by limb-imagers such as PREMIER provide the best perspective to obtain global information of GW momentum flux.
For the interpretation of the results, it is crucial to determine accurately the observational filter (Alexander, 1998) of the limb-imaging technique employed by the PRE-MIER IRLS. The observational filter describes the reproduction of GW amplitudes by the retrieval scheme as function of vertical and horizontal wavelength. A precise characterisation of the observational filter is also essential for comparisons with other measurement techniques (Preusse et al., 2000;Hertzog et al., 2008) as well as for comparisons with GW resolving models or results obtained from GW parameterisation schemes (Ern et al., 2004(Ern et al., , 2005.
In Sect. 2, we briefly introduce the dynamics mode of the PREMIER IRLS. We present our simulation setup in Sect. 3 followed by the forward model in Sect. 4 and finally the tomographic retrieval approach in Sect. 5. The results regarding achievable resolution and GW observational filter are presented in Sect. 6.

The PREMIER Infrared Limb Sounder
The PREMIER IRLS essentially combines a Fourier transform infrared spectrometer with two two-dimensional (2-D) detector arrays measuring in the spectral regions from 770 to 980 cm −1 (band A) and from 1070 to 1650 cm −1 (band B), respectively. Each 2-D detector array will consist of about 100 × 100 pixels, providing about 10 000 simultaneous limbviews within the altitude range from ≈3 to 55 km and with a swath width of 320 km simultaneously. To increase the signal-to-noise ratio, individual array pixels will be co-added to obtain sampling patterns of specific scientific measurement modes.
Limb sounding exploits the radiation thermally emitted in the atmosphere along the line-of-sight (LOS) of the instrument, which is directed towards the limb of the Earth's atmosphere. The point of the LOS closest to the surface is called the tangent point. Under optically-thin conditions, the tangent point of the LOS generally determines the point in the atmosphere that contributes most to the measured radiation. The radiance of a single measurement typically originates mainly from an air volume with about 500 km extent along the LOS (half width of horizontal weighting function). The vertical extent is mainly determined by the special viewing geometry of the limb observation and the vertical field-ofview (FOV), which has about 750 m full-width at half-max at the tangent point. See Sect. 4.2 for a more detailed discussion of the two-dimensional radiative transfer weighting functions.  (Remedios et al., 2007) and 30 km tangent altitude. The black curve indicates total radiance. The other curves indicate radiance for single emitter atmospheres (see legend). The noise level of the observations will be about 10 −5 W/(m 2 sr cm −1 ). The black arrow indicates the CO 2 Q-branch analysed in the retrievals.
The main advantage of the proposed instrument concept is its great flexibility, i.e. the trade-off between spatial and spectral resolution can be adapted to the scientific needs. While currently only two modes are specified, the hardware poses few restrictions should a certain scientific question require a different trade-off between spatial and spectral sampling. In this paper we will concentrate on the IRLS dynamics mode. It is designed to measure a subset of atmospheric variables with very high spatial sampling but moderate spectral sampling. Spectral sampling requirements are relaxed to 1.25 cm −1 from its maximum envisioned sampling of 0.2 cm −1 , which is employed in the other major operating mode. As an example, simulated spectra for the IRLS dynamics mode for mid-latitude atmospheric conditions and 30 km tangent altitude are shown in Fig. 1. Even at this reduced spectral resolution, infrared emission features of important atmospheric constituents (e.g. CO 2 , H 2 O, O 3 , HNO 3 , CH 4 , or N 2 O) are clearly visible.  2. Illustration of the PREMIER IRLS dynamics mode measurement grid. Symbols indicate the locations of tangent points of individual measurements. The green coloured spheres represent the measurements of a single image of PREMIER. Each blue orb represents a full profile and thereby the horizontal component of the measurement grid of ten consecutive images. In our simulations, we generally use only measurements of the central track, of which ten consecutive vertical measurement profiles are shown coloured in red. As the tangent points of a single profile are spaced at vertical intervals of 500 m, the orbs representing it are merged into a vertical line. For the outermost measurement tracks, the tangent-points lie generally not within the idealized 2-D measurement track, but are displaced from it by up to 2.2 km in the across-track direction.
The vertical coverage of the IRLS dynamics mode observations is 48 km with a lower boundary that increases from 3 km at the poles to 7 km at the equator to account for the latitudinal variation of the tropopause height. Vertical sampling is 500 m. Limb images will be measured rearward to the flight direction of the satellite. Dense sampling along track is achieved by high measurement speed, i.e. about 7.5 s per image in the dynamics mode, corresponding to ≈50 km along track distance. The 2-D detector array will allow for 13 parallel measurement tracks. The across track sampling is 25 km and the swath width is 320 km. The three-dimensional measurement grid of the PREMIER IRLS dynamics mode is illustrated in Fig. 2. The green spheres represent the tangent points of all measurements of a single image. Each blue orb represents a single measurement profile of preceding and following images, providing a visual clue for the horizontal measurement density. Please note that tangent points of a single image are not placed exactly vertically above one another as higher tangent points are located closer to the satellite; in fact, consecutive images are spaced so closely that the topmost tangent points of one image are placed vertically above the lowermost tangent points of the previous image.
In this study we are interested in determining the achievable resolution of the dynamics mode with a focus on resolving GWs. As the across track resolution directly corresponds to the across track sampling, it suffices to examine a single track and determine its resolution in vertical and along track direction. Further, we examine how well monochromatic GW waves can be reproduced by PREMIER measurements and tomographic retrieval techniques in accordance with one of the major scientific mission objectives.

Simulation setup
We examine how well tomographic temperature retrievals using simulated PREMIER IRLS dynamics mode measurements are able to reproduce GW parameters like amplitude, horizontal and vertical wavelength, as well as phase shifts.
To this end, we carry out linear and non-linear retrieval endto-end tests. We first simulate the radiance observations the PREMIER IRLS would make for a given monochromatic GW field. Second, we use the simulated observations to perform a tomographic temperature retrieval. Finally, retrieved temperatures are compared with the original field. We do not add noise or other instrument effects to the simulations so as not to obscure the principal capabilities of the approach; however, we do provide error estimates for retrieval noise. Please note that this setup is highly simplified compared to what an inversion scheme for actual PREMIER IRLS measurements would look like. In practice, additional values like pressure, contaminant trace gasses, etc. would need to be retrieved in addition to temperature. The set of simulated measurements is designed according to the PREMIER specification as described in Sect. 2. For our analysis of the GW observational filter, we only use limb profiles of the central track of the 2-D array (depicted as red orbs in Fig. 2). In total 101 consecutive profiles along track with 91 measured altitudes each are used per simulation. The tangent point altitudes of one such profile range from 10 to 55 km spaced at intervals of ≈500 m. This is a simplification of the vertical range of the PREMIER IRLS, which covers 48 km but varies in height between the equator and the poles. Figure 3 demonstrates how the line-of-sight of these dense measurements overlap one another in the along track direction. The same air volume is viewed from multiple angles, which suggests a tomographic retrieval problem.
To accurately represent a wide spectrum of GWs with different horizontal and vertical wavelength, we define a crosssection of the atmosphere 6000 km long and 82 km high. This accommodates the full LOS of 91 consecutive vertical measurement profiles of the central track constituting the measurement grid. For the initial calculation of simulated satellite measurements, this cross-section was finely gridded with a horizontal sampling of only 5 km and a vertical sampling of 50 m. For the retrieval of the baseline setup, we use a coarser sampling: the vertical sampling of the atmospheric retrieval grid corresponds to the measurement grid, i.e. 500 m between 10 km and 55 km height and 2 km outside this range. The horizontal sampling is set to 12.5 km (four times finer than the measurement grid) to avoid representation errors (see Sect. 6.2).
For tracks other than the central one, the tangent points of consecutive vertical measurement profiles do not lie within a plane. We therefore examined the effect of the angle of 2.6 • between the LOS of the outermost tracks and the along track direction using both a 3-D atmosphere and a 2-D planeparallel approximation and found the error of the approximation to be in the order of the instrument noise. In effect, the results presented below for the central track are assumed to be representative for all tracks of the PREMIER IRLS.
We use typical cloudless mid-latitude atmospheric profiles from Remedios et al. (2007) as background for our input atmosphere and as the "a priori" retrieval state. In practice, measurements contaminated by clouds will likely be filtered out, affecting the local retrieval result with a reduced resolution and higher a priori influence. This input atmosphere contains temperature, pressure, aerosol, CO 2 , Ozone and several other trace gasses relevant for the 12.6 µm channel. Onto the temperature of this background and a priori atmosphere, waves with varying horizontal and vertical wavelength with a constant amplitude of 5 K are modulated to form the input atmospheres for our simulations; we refer to the latter atmospheres as "true" atmospheric state. Ideally, the retrieval would perfectly reproduce the true state.

Fast forward model for the PREMIER IRLS
A fast radiative transfer model is essential to solve large inverse problems in atmospheric remote sensing. We adapted the C++ based Juelich Rapid Spectral Simulation Code Version 2 (JURASSIC2) for the PREMIER IRLS tomographic temperature retrievals presented in this paper. An older version of JURASSIC (Hoffmann, 2006) was previously used as forward model for retrievals for several satellite-and air-borne remote sensing experiments (e.g. Hoffmann et al., 2008. It was applied to dedicated GW studies for AIRS aboard NASA's Aqua satellite Hoffmann and Alexander, 2009). JURASSIC2 is an advancement of JURASSIC including a sophisticated instrument model, various powerful inversion algorithms and several specific code optimisations for this study, which are outlined in Appendix A.
JURASSIC2 computes the radiative transfer based on the emissivity growth approximation (EGA) (Weinreb and Neuendorffer, 1973;Gordley and Russel, 1981;Marshall et al., 1994). Compared with conventional line-by-line calculations this approach is about a factor 1000 faster, since radiative transfer is calculated based on precalculated spectrally averaged values of emissivity stored in lookup tables. The emissivity look-up-tables for JURASSIC2 are prepared by means of exact line-by-line calculations utilising the Reference Forward Model (RFM) (Dudhia et al., 2002). In most stratospheric applications the EGA typically has an accuracy of 1 to 2% or better and no further means were taken here to improve forward model accuracy for the simulation study. Processing real measurements at a later stage may require improvements, e.g. by means of regression techniques (e.g. Francis et al., 2006).
JURASSIC2 provides a flexible handling of different types of observation geometries, e.g. nadir, sub-limb, limb, or zenith viewing for different types of atmospheric input data, e.g. single vertical profiles (1-D), sets of profiles along a satellite track (2-D), and regular or irregular 3-D model data. This is achieved by providing different interpolation functions for atmospheric data as well as a flexible 3-D raytracing routine (Hase and Höpfner, 1999).
For this project we apply a 2-D scheme with the grid described in Sect. 3. To sample sufficiently the atmospheric inhomogeneities along the ray paths the ray-tracing step length was set to a value of 0.4 km for generating the simulated measurements and 4 km for the actual retrieval. For pressure and aerosol extinction we apply a logarithmic vertical interpolation and a linear horizontal interpolation. Interpolation for temperature and trace gas volume mixing ratios is bi-linear.
Each measurement is affected by the vertical field-of-view of the instrument. A single radiative transfer calculation of JURASSIC2 however simulates an infinitesimal thin beam of radiation, also called a "pencil beam". To account for the effect of the FOV, we perform pencil beam calculations with a finer vertical sampling than that of the actual measurements. A single measurement is then simulated by convolving the results of the finely spaced pencil beam calculations with a sensitivity function describing the FOV of the instrument. We found a sampling twice as dense as the original measurement spacing to be sufficiently accurate, i.e. the tangent points of the pencil beams are vertically spaced at intervals of ≈250 m. Including pencil beams above and below the uppermost and lowermost measurement, we calculate 193 pencil beams for a single vertical measurement profile of the PRE-MIER IRLS consisting of 91 measurements.

Channel selection and weighting functions
For the temperature retrieval studies presented in this paper we analyse radiance emissions of the CO 2 Q-branch at 12.6 µm, which was successfully used by Riese et al. (1997) to retrieve stratospheric and mesospheric temperature distributions from satellite observations. Retrieving temperature from radiance measurements at individual spectral grid points of the PREMIER IRLS is technically feasible due to the rather low noise equivalent spectral radiance of the proposed instrument.
The temperature weighting functions presented here are computed by means of numerical perturbation. Figure 4 shows several weighting functions for the selected radiance channel at 12.6 µm (790.75 cm −1 ). To analyse the temperature weighting functions and provide a fast analytical representation for other studies, we developed a semi-empirical model. Preusse et al. (2002) provides an analytical model under the assumption of an optically transparent atmosphere. Our simulations indicate that in the 12.6 µm CO 2 -branch transmittance gets as low as 0.6 and cannot be neglected. We therefore modified the model of Preusse et al. (2002) by adding a term compensating for transmittance. Using the same nomenclature as in Preusse et al. (2002), the sensitivity of the measured signal on temperature at given altitude z (measured from the ground) and the distance along the ray path s (measured from the tangent point) is obtained from a Gaussian, (1) Here K 0 denotes the maximum sensitivity. Note that the absolute values of K 0 presented below depend on the atmospheric grid sampling. The sensitivity will change when the grid sampling varies because the individual perturbations refer to smaller or larger air volumes. The standard deviations ϕ z and ϕ s measure the vertical and horizontal extent of the weighting function, respectively. The horizontal shift µ was introduced to deal with atmospheres which are not optically thin, as occurs in our setup. As a kind of first-order effect, the point of maximum sensitivity will shift towards the instrument. In the exponential function, z t denotes the tangent altitude. The term −s 2 /2R E , with Earth radius R E , determines the apparent curvature of the ray paths with respect to the ground-base altitude coordinate system (Preusse et al., 2002). We fitted the semi-empirical model to the temperature weighting functions shown in Fig. 4 to obtain statistics of the variables K 0 , µ, ϕ s , and ϕ z . From 10 to 55 km altitude the maximum sensitivity K 0 decreases from 4×10 −6 to 2×10 −7 (W/(m 2 sr cm −1 ))/K, the horizontal shift −µ towards the instrument decreases from 140 to 40 km, the horizontal fullwidth at half-max along the line-of-sight 2 √ 2ln2ϕ s decreases from 575 to 475 km and the vertical full-width at half-max 2 √ 2ln2ϕ z is nearly constant at 0.95 km. The semiempirical model provides a reasonable description of the true weighting functions as the accuracy of the fits is about 2 to 3% of K 0 .

Inversion technique
Deriving atmospheric temperature values from (tomographic) infrared spectral measurements presents a nonlinear, ill-posed inversion problem. To obtain a unique and physically reasonable solution the problem needs to be regularised.
To perform the inversion, the JURASSIC2 retrieval processor implements a number of standard methods, of which the maximum a posteriori estimation (MAP) was selected (Rodgers, 1976). This method interprets measurements and atmospheric states statistically and determines an optimal weighted mean between an a priori atmospheric state and the measurements.
Let n ∈ N be the number of atmospheric state vector elements and m ∈ N the number of measurements. Simulations of the JURASSIC2 forward model can then be represented mathematically as the application of a (non-linear) function F : R n →R m . F'(x) is the Jacobian matrix of the forward model F at atmospheric state x ∈ R n , also known as Kernel matrix. The Jacobian matrix consists of the weighting functions of the measurements. With x a ∈ R n being the a priori state of the atmosphere, S a ∈ R n×n a covariance matrix for x a , y ∈ R m the vector of measurements and S ∈ R m×m the covariance matrix of the measurement error, effectively the cost function J (x) with needs to be minimised to obtain the MAP solution. The minimisation is performed using the iterative Gauss-Newton algorithm. The resulting iteration formula can be written in two forms, one that requires the inverse of the covariance matrices and the solution to a linear equation system with n unknowns in each iteration (n-form), or one that requires solely the covariance matrices and the solution to a linear equation system with m unknowns in each iteration (m-form) (Rodgers, 2000). The choice of the more efficient form for our problem is critical since the linear algebra determines the memory consumption and thereby the maximal problem size. In our simulations n is generally four times as large as m, so we chose the m-form to reduce memory consumption and processing time.
As an initial guess or starting point x 0 ∈ R n we selected the true atmospheric state to reduce simulation time. As this choice may sidestep potential convergence issues, it was confirmed for selected representative non-linear retrievals that they also converge to the same solution from other starting points, e.g. a polar atmosphere. Using a different starting point approximately doubles the number of required iteration steps. The covariance matrix S is assumed to be diagonal and takes into account noise (10 −5 W/(m 2 sr cm −1 )) as well as the quasi-statistical part of the forward model errors (which is estimated to be 0.3% of radiance (Hoffmann, 2006), that is, it ranges from 0.3×10 −4 W/(m 2 sr cm −1 ) for the lowermost tangent heights and 0.8 × 10 −6 W/(m 2 sr cm −1 ) for the uppermost tangent heights). The a priori covariance matrix S a is set up according to a first-order autoregressive model with v i being the height of an atmospheric grid point and h i its horizontal coordinate given as along track displacement in kilometres. The correlation lengths are used as tuning parameters for the regularisation. The vertical correlation length c v is chosen to be 0.5 km and the horizontal correlation length c h is chosen to be 200 km (see Sect. 6.3 regarding the choice of value for these parameters). The standard deviations σ i are taken from the reference atmosphere of Remedios et al. (2007) and vary between 10 K and 14 K depending on height.
Only temperature values at grid points between 10 km and 65 km are retrieved, the others are fixed to the a priori atmosphere. Thereby an additional 10 km are retrieved compared to the uppermost tangent point height of 55 km. As the vertical grid becomes more sparse above 55 km, this corresponds

Fig. 5.
Example of a retrieval end-to-end test used for generating the observational filter. The instrument flies from right to left and measures radiance coming from the right. A wave perturbation with 10 km vertical wavelength and 320 km horizontal wavelength tilted towards the instrument is super-imposed onto the a priori atmosphere depicted in (a). The difference between the disturbed atmosphere and the a priori is shown in (b). The difference between the retrieval result and the a priori -that is the retrieved wave perturbation -is depicted in (c) and the difference between the result of the retrieval and the true state is depicted in (d). The tangent points of the satellite measurements are shown as white circles. Please note that the horizontal retrieval grid is four times finer than the measurement grid. For better representation of the investigated structure only kilometres 2000 to 3000 of the simulated 6000 kilometres are shown.
to 91 grid points with a 0.5 km spacing and 5 grid points above with a 2 km spacing per profile. Starting from the initial guess x 0 , each iteration of the employed Gauss-Newton algorithm generates a new atmospheric state x i+1 ∈ R n : converging onto the optimal estimatex ∈ R n . The approach outlined above is traditionally applied to retrieve single vertical profiles of atmospheric data, but it is just as well-suited for 2-D tomographic retrieval problems, the only difference being the interpretation of the state vector x i .
Please note that we do not apply the approach of Steck et al. (2005), where the full 2-D state vector is updated sequentially by using in each step the radiance information provided by one additional limb-scan, but perform a simultaneous update using all available measurement information in a single step. Figure 5 shows exemplarily such an end-to-end test. The unperturbed atmosphere common to all tests is depicted in Fig. 5a. An temperature perturbation of the input atmosphere is depicted in Fig. 5b. The tangent points of measurements are shown as white circles. The outcome of the retrieval is shown in Fig. 5c. One can clearly see that the wave structure is qualitatively well reproduced between 10 and 55 km height. In the area between 55 km and 65 km, the wave structure degrades into the unperturbed a priori atmosphere due to a lack of measurement information. The amplitude of the retrieved wave sometimes slightly surpasses ±5 K, which is shown as white areas in the wave troughs and crests. Figure 5d depicts the difference between the input atmosphere and the result; the error is overall less than 0.5 K between 20 and 50 km height.  6. Plot of the averaging kernel of a single atmospheric state vector element located in the middle of the atmosphere at 40 km height. Each coloured box represents the influence of the atmospheric grid point of the true atmosphere on a single sample of the retrieval result. Its shape is typical for averaging kernels of elements between 15 km and 50 km height.

Measurement contribution and resolution
The averaging kernel matrix A ∈ R n×n of the final estimatê x characterises the retrieval. One row of the averaging kernel matrix of the baseline setup with no modulated wave structure is depicted in Fig. 6. As each element in the row describes the influence of one point of the true atmosphere on the result, it can be reordered two-dimensionally according to their horizontal and vertical coordinates. The figure is representative for rows belonging to measurements with tangent points located between 15 km and 50 km height. Aside from a strong centred peak, one can see small "ripples" in the vicinity of the peak resembling the shape of the weighting functions (compare with Fig. 4). The asymmetry is caused by the asymmetry of the weighting functions.
Using the averaging kernels it is possible to deduce measurement contribution and resolution of the retrieval. Typical values are plotted in Figs. 7 and 8. The contribution of measurement information to the retrieval results is estimated by summing the elements of each row of the averaging kernel matrix. A value of 1 indicates a high contribution of measurement information while a value of 0 indicated a high contribution of a priori information. The tangent points of measurements lie within the region of 10 to 55 km height and 500 to 5500 km horizontal distance, which closely corresponds to the region where the measurement contribution in Fig. 7 is between 0.95 and 1.05. Outside this region retrieval results are dominated by a priori information. As we Fig. 7. Shown is the measurement contribution for a baseline retrieval with no modulated wave structure (i.e. the atmosphere of Fig. 5a). Tangent points of measurements lie between ≈10 km and 55 km vertically and ≈500 km and 5500 km horizontally. are mainly interested in results with high measurement content and to simplify data processing, only data points with a measurement contribution between 0.95 and 1.05 are evaluated further.
The averaging kernels can also be used to estimate vertical and horizontal resolution by mapping each row of the averaging kernel matrix associated with an atmospheric sample to the two-dimensional structure it represents, as depicted in Fig. 6. In this form, the full-width at half-max of the vertical and horizontal direction can be calculated (e.g. Steck et al., 2005;von Clarmann et al., 2009). Figure 8 shows exemplarily the estimated horizontal and vertical resolution. As can be seen in Fig. 6, the averaging kernels generally contain numerous very small elements that describe the influence of atmospheric sample points that are not directly above or beside the atmospheric sample point in question; as these are disregarded when calculating the resolution, this method provides an approximate measure of resolution.
The vertical resolution in the central area is slightly better than the assumed vertical FOV, which has a full-width at half-max of 750 m; however, without the effects of the a priori vertical correlation a vertical resolution of nearly 500 m, corresponding to the vertical sampling of measurements, is achievable (see Sect. 6.3). This shows that the achievable resolution is not necessarily limited by the FOV. The horizontal resolution of about 70 km comes close to the measurement grid sampling (50 km) and is a significant improvement compared to the 475 to 575 km full-width at half-max along the line-of-sight of the individual weighting functions. Combined with the across track sampling and resolution of 25 km, PREMIER IRLS would achieve a horizontal resolution comparable to nadir sounders.
The simulations indicate that spatially over-sampled measurements (with respect to the weighting functions) can be separated and provide high spatial resolution in the vertical and horizontal domain.  . 8. Shown are the vertical (a) and horizontal (b) resolution for a baseline retrieval with no modulated wave structure (i.e. the atmosphere of Fig. 5a). Tangent points of measurements lie between ≈10 km and 55 km vertically and ≈500 km and 5500 km horizontally.

Retrieval noise
The effect of noise on the retrieval can be estimated based on the effect of perturbations of the measurements on the retrieval result.
The method of optimal estimation does not only provide an optimal estimatex but also a corresponding covariance matrix of the estimate. However, as our a priori covariance matrix for the atmospheric state is not derived from a climatology, but chosen ad-hoc, the covariance matrix of the retrieval resultx should not be interpreted statistically. Instead, we use the retrieval noise for diagnosis, i.e. the effect of measurement noise on the retrieval result. The gain matrix G ∈ R n×m , which can be interpreted as linear pseudoinverse of F and maps measurements into the atmospheric state space, is defined as It can be used to estimate the retrieval noise covariance matrix: The retrieval noise is the most important error for gravity wave analysis as the effect of other, systematic errors can be more readily handled by detrending (e.g. Ern et al., 2004). It will be analysed and compared with GW amplitudes in Sect. 6.3.

Gravity wave observational filter
The observational filter defines how a wave perturbation of given horizontal and vertical wavelength is reproduced by a retrieval. The retrieval may modify the wave amplitude, wavelength, and phase in a complicated way. As the amplitude is the most important feature of the wave with respect to its energy, the assessment presented here focuses on how well the amplitude of a given wave is reproduced.

Deduction from end-to-end simulations
To determine the observational filter, end-to-end simulations must be performed for all waves of interest. To deduce the amplitude of the wave in the retrieved atmosphere, all atmospheric grid points between 15 and 50 km and along track 1500 to 4500 km are fitted against the modulated wave with amplitude as the only free parameter using a least-square fit. Please note that this does not compensate for eventual shifts in phase as is generally present in 1-D retrievals ; any phase shift present would reduce the sensitivity (even to zero for a phase shift of 180 • ). However, tests including the phase shift in the fit indicated that no phase shift was present for tomographic retrievals. According to Shannon's sampling theorem, it is impossible to perfectly reproduce waves with wavelength δ using a sampling distance longer than δ/2; for this reason we tested only waves from a vertical wavelength of 1 km upwards and from a horizontal wavelength of 100 km upwards. Detecting GW signatures from actually measured profiles will require a more sophisticated process, as it is generally neither known if a GW is present in the data, nor what wave length it may have, nor will it generally be monochromatic. Figure 9a shows the filter for the baseline setup described in Sect. 3; it corresponds qualitatively with the shape of analytically derived sensitivity of Preusse et al. (2002), albeit most notably the region of good sensitivity is moved well towards shorter horizontal wavelengths. Due to the much denser atmospheric sampling and the tomographic approach, much smaller waves become visible. The finest horizontal wavelength resolvable is about 100 km for a vertical wavelength around 5 km and the wave being tilted towards the instrument. This is a significant improvement compared to the roughly 250 km horizontal wavelength of previous sensitivity studies involving the CRISTA instrument (Preusse et al., 2002). Please note that the resolvable horizontal wavelength is much smaller than the horizontal full-width at half-max of the weighting function, which is roughly 500 km. Waves  of a vertical wavelength of 2 km can be retrieved well, but the sensitivity drops quickly for shorter vertical wavelengths, as the measurement grid is not fine enough to properly resolve them. Around two kilometre vertical wavelength and long horizontal wavelength, the retrieved amplitude becomes slightly larger than 1; this is caused by an insufficient vertical sampling of the atmospheric retrieval grid, similar to the horizontal sampling issue discussed below in Sect. 6.2 but less pronounced. For waves with large vertical wavelength, the sensitivity becomes smaller depending strongly on the horizontal wavelength. But for waves with 300 km horizontal wavelength and longer, a sensitivity of at least 0.5 is given for all examined vertical wavelength longer than 2 km. Sensitivity rises to 1 for all waves with horizontal wavelength longer than ≈500 km (not shown).
Varying the phase of the modulated wave had no significant influence on the sensitivity filter derived by the 2-D retrieval and is therefore not depicted. Small differences are present for waves with vertical wavelength below 2 km where the resolvability strongly depends on the relation between wave peaks and sampling grid. Changing the orientation of the waves from being tilted towards the instrument to being tilted away from it has a significant influence as is shown in Fig. 9b. The sensitivity to waves with a horizontal wavelength below 150 km is lost, but this is compensated by an increased sensitivity to waves with a larger vertical wavelength and an overall smaller transition region, in which waves become distorted. As the atmosphere is not completely transparent, the sensitivity of the ray to temperature changes is not only a function of height, but also of the horizontally travelled distance, which can be clearly seen in Fig. 4. As the sensitivity to waves largely depends on the relation between ray path and wave trough and crest (Preusse et al., 2002) this is to be expected and needs to be taken into account when GW parameters are deduced from retrieved temperatures. The slope of the tangent point locations does slightly influence the observational filter, but is not the cause for the observed asymmetry.

Sensitivity to retrieval grid sampling
The horizontal sampling of the retrieval grid has a significant influence on the retrieval results as shown in Fig. 10.
Before settling on a horizontal sampling distance of 12.5 km for the retrieval grid, several other sampling grids were tested ranging from 6.75 km to 400 km. It was found that the larger the sampling distance became, the more pronounced the amplification of wave amplitudes for short GW lengths became. This amplification can be seen in Fig. 10, which depicts the filter for a "natural" retrieval grid being identical with the measurement grid. Apparently, at least for GW with short wavelengths, the natural sampling is not necessarily the best choice.
The amplification of retrieved waves is most likely caused by the bi-linear interpolation of the temperature between the grid points. The fine structure of wave perturbations with short wavelength cannot be sufficiently represented by linear interpolation, which the retrieval compensates for by raising the wave amplitudes. Therefore, either a different atmospheric representation more suited to represent wave structures is needed or the grid sampling needs to be refined to sufficiently capture the variations of the atmospheric state. Here we employ the second approach. If the retrieval grid sampling drops below the measurement grid sampling, the problem becomes underdetermined and needs to be regularised appropriately. Simulations have shown that resolution, measurement content and retrieval noise are not noticeably impacted by the refinement of the atmospheric grid.
For the examined waves, the artificial amplitude magnification is greatly diminished with a horizontal grid distance of 25 km and negligible at 12.5 km. The sampling distance of 12.5 km was chosen to obtain exactness of the results.
As mentioned in Sect. 6.1, a similar effect occurs for waves with small vertical wavelength, which increases the retrieved amplitude by a factor of up to 1.2 for waves with about 2 km vertical wavelength. As the effect is much less pronounced and therefore easier to correct, we decided not to increase the vertical retrieval grid sampling for this study.

Tradeoff between retrieval noise and resolution
The vertical and horizontal correlation length of the a priori covariance plays the most important role in the MAP regularisation applied here. Different values were examined before the values of 0.5 km and 200 km were chosen to provide a satisfactory trade-off between resolution and retrieval noise. In the Bayesian scheme applied in this study the strength of the regularisation, i.e. smoothing via correlation lengths in the a priori covariance matrix model, does not depend on the actual measurements. As the noise error indicates, the effect of noise on the retrieval result is sufficiently small that the forward model can be assumed to be nearly linear so that the Jacobian matrix will vary only negligible. This means that the averaging kernel stays nearly identical if noise is added to the measurements and in turn the results of this section are also representative for actual measurements affected by noise even though they were derived using unperturbed measurements. Figure 11 shows the effect of the vertical correlation length on vertical resolution and retrieval noise. To decrease the retrieval noise to ≈0.5 K for most of the vertical range, while still retaining a good vertical resolution, a correlation length of 0.5 km was finally chosen. The choice of vertical correlation parameter also influences the GW observational filter. As expected, a large correlation length slightly reduces the region where sensitivity is very close to one. We further found that increasing the vertical correlation length beyond 2 km leads to unwanted oscillations in the averaging kernels, which confirms our choice of a much smaller value.
The retrieval noise is not influenced significantly by the horizontal correlation factors we examined (12.5 to 400 km) and therefore not depicted. The main effect of the horizontal correlation is to stabilise the underdetermined retrievals. Reducing the horizontal correlation length from 200 km to 100 km improves the horizontal resolution from 75 km to ≈70 km, but the non-linear retrievals do not converge for some wave structures with very small vertical and horizontal wavelength. Reducing the horizontal correlation further, the horizontal resolution derived from the averaging kernel matrix goes as low as 50 km for a horizontal correlation length of 12.5 km, i.e. is close to the horizontal measurement sampling grid at the cost of ever increasing convergence issues.
To avoid any convergence problems, a correlation length of 200 km was chosen. The GW observational filter behaves as expected when varying the horizontal correlation length, as waves become increasingly dampened for larger correlation lengths. However, this barely affects the region of the GW observational filter close to one but influences mostly the transition region. GWs can be detected if their amplitude exceeds retrieval noise. Therefore, it is important that the retrieval noise of the retrieval is smaller than the amplitude of GW that shall be observed. According to the sampling theorem, with a vertical sampling grid of 500 m one cannot resolve waves with a vertical wavelength shorter than 1 km. Practically, a slightly larger vertical wavelength than 1 km is required, meaning that we expect to consistently resolve waves from about 2 km vertical wavelength upward, which is confirmed by the GW observational filter. The maximal amplitude of actual GWs of that vertical wavelength can be determined according to the temperature amplitude saturation limit (e.g. Preusse et al., 2008). This limitT max due to static instability is proportional to the vertical wavelength: Using N=0.02 s −1 and a background temperature T of 250 K, this gives a saturation amplitude of greater than 3.2 K for waves with vertical wavelength greater than or equal to 2 km at about 40 km height. Comparing this value to the 0.5 K of retrieval noise for that height range shows that such GWs are easily detectable also in the presence of noise. As the temperature amplitude saturation limit increases with vertical wavelength, any wave with vertical wavelength longer than 2 km is therefore also detectable with the given noise level.

Deducing the gravity wave observational filter from the averaging kernel matrix
A more efficient way to derive the GW observational filter is to analyse the averaging kernel matrix of a retrieval for a climatological atmosphere of the region of interest.
Assuming the non-linearities of the forward model to be negligible and thereby the Jacobian matrix to be constant across the space spanned by the a priori atmosphere and all examined wave perturbations, the averaging kernel matrix A will be identical for all wave structures and the unperturbed atmosphere. Neglecting measurement errors, it follows that with x δ being the modulated wave structure. This shows that the averaging kernel matrix directly maps the true wave perturbations x δ onto the retrieved wave structure. Figure 12 shows the GW observational filter derived in this manner. Just as for the non-linear end-to-end simulations, only the part of thex−x a vector is used to derive the sensitivity that has a good measurement contribution and lies well within the region covered by tangent points of measurements. The perturbations of up to 5 K are small enough so that the assumption about linearity of the forward model sufficiently holds and the filter derived in this manner replicates closely the structure of the filter derived from non-linear end-to-end tests.
This method is therefore an efficient way to quickly evaluate the effect of changes in the setup before proceeding to the much more computationally intensive non-linear end-to-end  simulations. The latter are still required, as the linear approximation cannot be used to examine, e.g. the convergence of the non-linear retrieval for different atmospheric states.

Comparison of 2-D and 1-D retrievals
A 2-D retrieval reproduces wave structures with better sensitivity and less distortions than 1-D retrievals (which assume a homogeneously stratified atmosphere).
For example, Fig. 13 shows side-by-side the retrieval results for a wave of 320 km horizontal wavelength and 30 km vertical wave length. Figure 13a was generated by a series of 1-D retrievals, each using a single vertical measurement profile of the simulated measurements as input. The resulting temperature profiles were assembled to a 2-D structure by adding each retrieved temperature profile at the mean horizontal location of the tangent points of the involved measurements. Comparing the retrieved wave structure with the true wave structure (which is depicted as the black contour plot in the figure), it becomes apparent that the phase of the wave structure of the 1-D retrieval is subject to a phase shift of nearly 180 • ; the amplitude does not surpass 2.5 K and is (a) (b) Fig. 13. This shows the retrieval result for (a) a series of conventional 1-D retrievals on the left and (b) a 2-D retrieval in the middle for a wave with 30 km vertical and 320 km horizontal wavelength. Please note the different color mapping. The climatological background profile was subtracted from the result. A contour plot for the difference between the true modulated wave structure and the a priori has been included as black lines to visualize the phase shift of the 1-D retrieval. thereby much smaller than the expected 5 K. In addition, the amplitude has also a strong artificial height-dependence. The vertical wavelength is also slightly reduced. On the other hand, the 2-D retrieval shows a more pronounced wave structure and fits well to the true wave (again shown as black contour plot) in the region between 25 km and 55 km; the 2-D retrieval deviates less than 2 K from the true atmospheric state whereas the 1-D retrieval is subject to deviations of more than 6 K (the latter is partly caused by the 180 • phase shift). As depicted in Fig. 9, this retrieval has an amplitude dampening factor of 0.9.
The GW observational filter deduced using 1-D retrievals is shown in Fig. 14. In contrast to Fig. 9, also the phase of the wave was fitted (not depicted), otherwise the sensitivity would be even worse. It is apparent that fewer wave patterns can be reliably retrieved. Especially waves with large vertical wavelength are affected. This comparison shows that tomographic retrievals are superior for the purpose of GW detection.

Summary and conclusions
We performed various non-linear end-to-end simulations using simulated measurements of the PREMIER IRLS with different input atmospheres consisting of a cloudless climatological background atmosphere with modulated wave perturbations. In this way, we determined the gravity wave (GW) observational filter that shows the amplitude with which a given wave perturbation is reproduced by our tomographic retrieval setup. It was shown that the new, innovative PREMIER IRLS using the dense measurement grid of the dynamics mode combined with tomographic retrievals can resolve much finer GW structures than previously possible. GWs with a horizontal wavelength down to 100 km may be resolved depending on the vertical wavelength.
We examined different retrieval grids and found that using a coarse retrieval grid introduces an amplification of GWs with short wavelength. It is therefore essential to use a retrieval grid that is sufficiently fine for the purpose of representing examined and existing atmospheric structure. Otherwise, retrieval results may be tainted by artifacts.
By tweaking the correlation length of the a priori covariance matrix, it is possible to reduce the retrieval noise at the cost of a reduced resolution. We also found that convergence issues arise when the horizontal correlation length is reduced to or below 100 km. We examined various horizontal and vertical correlation lengths and settled for a horizontal correlation length of 200 km and a vertical correlation length of 0.5 km. With these figures, the expected retrieval noise is well below the temperature amplitude saturation limit of resolvable GWs.
A linear approximation of the observation filter derived from the averaging kernel matrix is a helpful tool when examining different setups. One may generate the averaging kernel matrix for a climatological atmosphere and use this to calculate inexpensively an approximation of the observation filter for a wide range of GWs. The observational filter derived in this manner resembles closely the observational filter derived by more accurate end-to-end simulations including the distinction between GWs tilted towards or away from the satellite.
The observational filter derived by conventional 1-D retrievals has a severely reduced sensitivity compared to the observational filter derived by our tomographic retrieval technique. Comparing the results for selected wave perturbations, we found the result of the tomographic retrieval to be much closer to the true state than the result of the 1-D retrieval. Further, the tomographic retrieval is not subject to the massive shift in phase that affects 1-D retrievals.
The PREMIER IRLS should provide 3-D temperature measurements with sufficient altitude coverage and resolution in the stratosphere to allow for simultaneous determination of GW temperature amplitudes and the associated horizontal and vertical wavelength. Therefore, the dynamics mode of the PREMIER satellite instrument will be well suited for examining the dynamic structure of the upper troposphere and stratosphere. The presented tomographic retrieval scheme resolves the dense 3-D measurement of the PREMIER IRLS dynamics mode with a comparable 3-D resolution and is thereby well suited for the retrieval of temperature data for the analysis of GWs. The across track resolution of PREMIER IRLS corresponds to the across track sampling of 25 km. Further, it is possible to retrieve temperature data with a vertical resolution of 750 m; depending on the retrieval setup even a vertical resolution of 500 m is achievable, surpassing the full-height-at-half-maximum of the vertical fieldof-view (750 m). In addition, one can nearly fully exploit the dense measurement grid of 50 km horizontal distance with an achievable horizontal along track resolution of ≈70 km, which is unprecedented for limb sounders and approaches the horizontal resolution of nadir sounders.

Code optimisation and performance
Conventional 1-D retrievals often depend on the assumption of horizontal homogeneity to represent the atmosphere by a single profile. For tomographic retrievals this assumption is discarded, requiring a much larger number of variables to properly represent the atmosphere. Combined with an increased number of measurements, the retrieval process becomes technically more demanding. To enable many such retrieval simulations, the JURASSIC2 forward model and the retrieval processor were optimised both for modern imager instruments and tomographic retrievals.
The first change improved the calculation speed of the Jacobian matrices involved in the retrieval, which JURASSIC2 calculates using a finite differences approach. For tomographic retrievals, a single ray path crosses only a minor fraction of points of the full atmospheric grid, which results in rather sparse Jacobian matrices compared to 1-D retrievals. For the specific PREMIER setup only between one and three percent of the entries of the Jacobian matrix are non-zero. By generating a data structure that tracks which points of the atmosphere influence which pencil beams, it is possible to simply skip the calculation of pencil beams unaffected by the current perturbation. In effect, zeros within the Jacobian matrix cost next to no processing time. Further, generally, only part of the pencil beams influencing a single changed measurement need to be recalculated, resulting in an overall speedup of ≈200 for the type of problems considered here.
The second change involved reworking the linear algebra of the retrieval processor. For conventional 1-D retrievals, the processing time spent in the matrix multiplications and the solving of linear equation systems is negligible compared to the calculation of the Jacobian matrices. For increased problem sizes, it becomes much more important, beginning to noticeably increase the processing time and to dominate the memory consumption. Using more efficient algorithms to solve the linear equation systems and the Automatically Tuned Linear Algebra Software (ATLAS) BLAS implementation by Whaley et al. (2001), could halve the previous memory consumption and significantly reduce the processing time.
For our simulations, a cluster of 16 computer nodes is used, each fitted with 64 Gb RAM and two AMD Quad-Core Opteron 2380 CPUs. To retrieve the baseline setup (19 493 pencil beams combined to 9191 measurements for retrieving 46 080 atmospheric state vector elements) with a single thread, roughly seven hours are needed. A simple OpenMP parallelisation of the embarrassingly parallel forward model together with the multi-threaded ATLAS backend allows a faster execution on a multi-core host. Figure A1 shows the speedup realised with the OpenMP parallelisation. From the curve can be deduced that about 93% of the retrieval code are parallelised.