Interactive comment on “An intercomparison of stratospheric gravity wave potential energy densities from METOP GPS-radio occultation measurements and ECMWF model data”

The paper by Rapp et al. addresses stratospheric gravity wave (GW) activity derived from different data sets: GPS-RO temperature profiling by GRAS onboard Metop A/B, ECMWF operational analysis, ERA-Interim reanalysis, and Rayleigh lidar measurements at specific locations. The central subject of this study is the intercomparison of GW potential energy densities derived from ‘dry’ and ‘wet’ GPS-RO retrievals, ECMWF IFS and ERA-Interim. Generally, good agreement is found between the GW potential energy distributions derived from these data sets, which is not totally surprising given that the ECMWF model assimilates GPS-RO ‘dry’ measurements with a high level of

Abstract. Temperature profiles based on radio occultation (RO) measurements with the operational European METOP satellites are used to derive monthly mean global distributions of stratospheric (20-40 km) gravity wave (GW) potential energy densities (E_P) for the period July 2014-December 2016. In order to test whether the sampling and data quality of this data set are sufficient for scientific analysis, we investigate to what degree the METOP observations agree quantitatively with ECMWF operational analysis (IFS data) and reanalysis (ERA-Interim) data. A systematic comparison between corresponding monthly mean temperature fields determined for a latitude-longitude-altitude grid of 5° by 10° by 1 km is carried out. This yields very low systematic differences between RO and model data below 30 km (i.e., median temperature differences are between −0.2 and +0.3 K), which increase with height to yield median differences of +1.0 K at 34 km and +2.2 K at 40 km. Comparing E_P values for three selected locations at which ground-based lidar measurements are also available yields excellent agreement between RO and IFS data below 35 km. ERA-Interim underestimates E_P under conditions of strong local mountain wave forcing over northern Scandinavia, which is apparently not resolved by the model. Above 35 km, RO values are consistently much larger than model values, which is likely caused by the model sponge layer, which damps small-scale fluctuations above ∼ 32 km altitude. Another reason is the well-known significant increase of noise in RO measurements above 35 km. The comparison between RO and lidar data reveals very good qualitative agreement in terms of the seasonal variation of E_P, but RO values are consistently smaller than lidar values by about a factor of 2. This discrepancy is likely caused by the very different sampling characteristics of RO and lidar observations. Direct comparison of the global data set of RO and model E_P fields shows large correlation coefficients (0.4-1.0) with a general degradation with increasing altitude. Concerning absolute differences between observed and modeled E_P values, the median difference is relatively small at all altitudes (but increasing with altitude), with an exception between 20 and 25 km, where the median difference between RO and model data is increased and the corresponding variability is also found to be very large. The reason for this is identified as an artifact of the E_P algorithm: it erroneously interprets the pronounced climatological feature of the tropical tropopause inversion layer (TTIL) as GW activity, hence yielding very large E_P values in this area and also large differences between model and observations. This is because the RO data show a more pronounced TTIL than IFS and ERA-Interim. We suggest a correction for this effect based on an estimate of this "artificial" E_P using monthly mean zonal mean temperature profiles. This correction may be recommended for application to data sets that can only be analyzed using a vertical background determination method, such as the METOP data with relatively scarce sampling statistics. However, if the sampling statistics allow, our analysis also shows that in general a horizontal background determination is advantageous in that it better avoids contributions to E_P that are not caused by gravity waves.

Introduction
It has long been known that momentum and energy transport by gravity waves (henceforth abbreviated as GWs) is of major importance for the mean thermal and dynamical state of the middle atmosphere (Lindzen, 1981; Holton and Alexander, 2000). Being mainly excited in the troposphere by flow over terrain, by convection, or by spontaneous emission, GWs may propagate both vertically and horizontally over large distances to deposit their momentum and energy far away from their source upon instability or transience (e.g., Fritts and Alexander, 2003; Sato et al., 2009, 2012; Preusse et al., 2009; Bölöni et al., 2016). Thus, GWs are an important mechanism that couples the middle and upper atmosphere to the troposphere (e.g., Lübken et al., 2010, and references therein). In addition, it has recently been shown that GWs also couple the middle atmosphere downward to the troposphere (Kidston et al., 2015, and references therein). With minimum horizontal scales as small as 10 km, GWs must still be parameterized in global climate models with typical horizontal resolutions of a few hundred kilometers. Hence, the development of physics-based parameterizations of GWs and their effect on the mean flow has been identified as a major research focus in the climate research community (Shepherd, 2014).
Given this large importance of GWs, it is not surprising that efforts have been undertaken to try to characterize GW sources, their propagation, and their dissipation and wave-mean-flow interaction with complementary experimental, theoretical, and numerical techniques (see, e.g., Fritts and Alexander, 2003; Plougonven and Zhang, 2014; Fritts et al., 2016; Wagner et al., 2017; Sutherland, 2010; Nappo, 2012, for recent reviews, overview papers, and text books). Ground-based remote sensing with lidars and radars and in situ observations with balloons, research aircraft, and sounding rockets are critically important for process studies. However, global satellite observations are needed to determine dominant tropospheric source regions and processes as well as global propagation pathways and the resulting gravity wave drag imposed on the mean flow to constrain GW parameterizations for climate and weather prediction models (Alexander et al., 2010; Geller et al., 2013). Since the pioneering work by Fetzer and Gille (1994), Wu and Waters (1996), and Eckermann and Preusse (1999) there have been many attempts to characterize the global distribution of gravity wave activity using such different remote-sensing techniques as limb sounders (e.g., Ern et al., 2004, 2011; Preusse et al., 2009; Zhang et al., 2012) and nadir sounders (e.g., Hoffmann et al., 2016; Ern et al., 2017), as well as GPS-based radio occultation (RO) measurements (e.g., Tsuda et al., 2000; Hei et al., 2008; Schmidt et al., 2008, 2016; Fröhlich et al., 2007; Hindley et al., 2015; Šácha et al., 2015; Khaykin et al., 2015; Khaykin, 2016).
This paper focuses on the derivation of gravity wave potential energy densities (E_P) from GPS RO measurements on board the operational METOP-A and METOP-B satellites operated by EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites) and the subsequent systematic comparison of E_P fields with ECMWF (European Centre for Medium-Range Weather Forecasts) operational forecast and reanalysis data. This is done to answer the question of whether the sampling and data quality of the two operational METOP satellites are sufficient to characterize the global stratospheric gravity wave activity (measured in terms of E_P) on a monthly basis. Furthermore, we investigate whether the METOP observations agree quantitatively with the ECMWF model fields such that the latter can be used for the interpretation of observational results. Accordingly, this paper is organized as follows: in Sect. 2 we describe the database of METOP-A and METOP-B RO temperature data obtained between July 2014 and December 2016. In addition, we give a brief introduction to the ECMWF data sets used for comparison with the RO data. We compare both temperature data sets (RO and ECMWF data) as a baseline for the subsequent comparison of derived E_P values. In Sect. 3 we describe our approach to derive E_P, followed by Sect. 4, where we thoroughly compare RO E_P data to corresponding ECMWF data sets. Similarities and differences are discussed in Sect. 5, in which we also derive and discuss a correction for the erroneous interpretation of the tropical tropopause inversion layer (TTIL) as gravity wave activity. Finally, the major findings of this study are summarized in Sect. 6, in which suggestions for future work are also made.

METOP-A/B GPS RO data
The METOP-A and METOP-B satellites orbit the Earth in a polar low Earth orbit and are the platforms for a variety of instruments supporting the European Weather Services, including the Global Navigation Satellite System Receiver for Atmospheric Sounding (GRAS), with which GPS RO measurements are performed, delivering tropospheric humidity and tropospheric and stratospheric temperature profiles. During typical months, these two satellites record a total of ∼ 35 000-40 000 radio occultations. A typical sampling pattern in terms of the latitude and longitude distribution of the number of ROs per month is shown in Fig. 1. This sampling is determined by the orbital geometry of the METOP satellites on the one hand and the GPS satellites on the other. Figure 1 reveals that there are typically between 10 and 50 occultations per 5° latitude and 10° longitude interval, with maximum sampling at latitudes between 20 and 60° north and south and minima near the poles and at the Equator. Note that we will use a corresponding gridding of 36 × 36 grid points (i.e., 5° latitude by 10° longitude bins) throughout this entire study. The METOP RO data are provided by the Radio Occultation Meteorology Satellite Application Facility (ROM SAF) on an operational basis in near-real time and can be downloaded from www.romsaf.org. The primary measured quantity is the bending angle of the GPS radio waves as they traverse the refracting atmosphere. From bending angle profiles, corresponding refractivity profiles can be derived, from which in turn temperature profiles can be determined (Kursinski et al., 1997). The latter can be done either by assuming that the refraction is entirely due to dry air (resulting in so-called "dry" temperatures) or by accounting for tropospheric water vapor by using additional information, e.g., from operational numerical weather forecast data in the framework of a one-dimensional variational algorithm that uses ECMWF Integrated Forecast System (IFS) data as a priori information (ROM SAF, 2014a, b, and references therein). The latter approach is pursued by the ROM SAF and the corresponding temperature data are denoted "wet" temperatures. For the current study we will mainly use dry instead of wet temperatures since the latter have been derived using model output and might not be considered "pure" measurements. Nevertheless, we will also briefly consider wet temperatures and compare them to the more "original" dry ones. Note that the ROM SAF provides dry and wet temperatures from July 2014 onwards only. Hence, in this study we restrict ourselves to the period from July 2014 to December 2016, i.e., a total of 30 months of data.
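To make the 36 × 36 gridding concrete, the following minimal sketch counts occultations per 5° latitude by 10° longitude bin. It is an illustration only: the function name `count_occultations` is hypothetical, and the tangent-point positions are synthetic random values (uniform on the sphere), not the real METOP sampling pattern of Fig. 1.

```python
import numpy as np

def count_occultations(lat_deg, lon_deg):
    """Count profiles per 5 deg latitude x 10 deg longitude bin (36 x 36 grid)."""
    lat_edges = np.arange(-90.0, 90.0 + 5.0, 5.0)       # 37 edges -> 36 bins
    lon_edges = np.arange(-180.0, 180.0 + 10.0, 10.0)   # 37 edges -> 36 bins
    counts, _, _ = np.histogram2d(lat_deg, lon_deg, bins=[lat_edges, lon_edges])
    return counts.astype(int)

# Synthetic positions (assumption): about one month of occultations,
# distributed uniformly over the sphere.
rng = np.random.default_rng(0)
n = 38000
lat = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, n)))
lon = rng.uniform(-180.0, 180.0, n)

counts = count_occultations(lat, lon)
print(counts.shape, counts.sum())
```

With real data, the same bin edges could be reused to average temperature or E_P per bin and month.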
METOP temperature profiles are provided on geopotential heights, which will be used here as the vertical coordinate. The fundamental vertical resolution of the technique, Δz, is limited by diffraction as the GPS rays pass through the atmosphere and is about Δz = 1-1.4 km in the altitude range between 15 and 40 km. Over this vertical interval, the horizontal line-of-sight resolution can be estimated to be around 190-270 km due to the limb geometry of the observations (see Kursinski et al., 1997; Hindley et al., 2015, for details).

ECMWF operational analysis and reanalysis data
For comparison to the METOP RO data we use two different data sets provided by the ECMWF; one is the operational analyses from the IFS. These have a horizontal grid spacing of about 16 km (T_L1279) and were evaluated on 25 pressure levels between 1000 and 1 hPa, which we converted to geopotential heights and interpolated onto a regular vertical grid with 1 km spacing. We note that according to Skamarock (2004) only scales exceeding the grid spacing by several times are resolved. Model output is available every 6 h. Details about the model can be found in Malardel and Wedi (2016) and in references therein.
The second model data set that we use is the ERA-Interim reanalysis. ERA-Interim is a global atmospheric reanalysis starting from 1979, which is based on a 2006 release of the IFS. The horizontal grid spacing of the data set is approximately 80 km (T255). For the current study, model fields were evaluated on 37 standard pressure levels between 1000 and 1 hPa, which we converted to geopotential heights and interpolated onto a regular vertical grid with 1 km spacing. For details about ERA-Interim see Dee et al. (2011).
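The vertical regridding of the model data can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify how the conversion and interpolation were carried out, so the division of geopotential by standard gravity, the linear interpolation, and the helper names `geopotential_to_height_km` and `to_regular_grid` are all choices made here; the sample values are synthetic.

```python
import numpy as np

G0 = 9.80665  # standard gravity (m s^-2) used to convert geopotential to height

def geopotential_to_height_km(phi):
    """Geopotential (m^2 s^-2) -> geopotential height (km)."""
    return np.asarray(phi, dtype=float) / G0 / 1000.0

def to_regular_grid(z_km, temp_k, z_grid_km):
    """Linearly interpolate a profile onto a regular height grid.

    Pressure-level data may be ordered top-down, so the levels are sorted
    first; grid points outside the data range are set to NaN.
    """
    z_km = np.asarray(z_km, dtype=float)
    temp_k = np.asarray(temp_k, dtype=float)
    order = np.argsort(z_km)
    return np.interp(z_grid_km, z_km[order], temp_k[order],
                     left=np.nan, right=np.nan)

# Synthetic example: six irregular levels, top-down, with a linear profile.
phi = np.array([470000.0, 394000.0, 309000.0, 256000.0, 203000.0, 160000.0])
z_levels = geopotential_to_height_km(phi)
temp_levels = 220.0 + 2.0 * z_levels          # K, purely illustrative
z_grid = np.arange(20.0, 41.0, 1.0)           # 20-40 km, 1 km spacing
temp_grid = to_regular_grid(z_levels, temp_levels, z_grid)
```

Because the synthetic profile is linear in height, the interpolated values reproduce it exactly, which makes the routine easy to check before it is applied to real fields.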
Please note that ECMWF does assimilate RO bending angle data (among many other data sets) from a variety of instruments including (but not limited to) the METOP data for both the ERA-Interim reanalysis and the IFS analyses (see Poli et al., 2010, as well as the ECMWF website).Thus, ECMWF model fields and METOP RO data are obviously not completely independent.

Comparison between RO and ECMWF temperature data
In this subsection we systematically compare RO temperatures with ERA-Interim and IFS model data. As a start, Fig. 2 shows zonal mean temperatures for the months March, June, and December 2015 derived from METOP GPS RO dry data (left column), GPS RO wet data (middle column), and from ERA-Interim. Note that from now on we will refer to METOP GPS RO dry and wet data as "RO-dry" and "RO-wet" data for brevity. Overall, all data sets agree well, with, however, notable differences between the dry temperatures and the other two data sets in the troposphere and at the highest altitudes above 40 km. These findings are not surprising given that the retrieval for wet temperatures uses ECMWF IFS data as a priori information, the assumption of dry conditions is certainly violated in the troposphere, and the quality of RO observations in general decreases significantly above ∼ 40 km altitude (Marquardt and Healy, 2005). In the following, we hence restrict our comparison to altitudes between 20 and 40 km.
For a more quantitative comparison, we have binned the ECMWF data sets on the same space and time grid as the RO data; i.e., mean profiles were determined for the period of 1 month and a latitude-longitude-altitude grid of 5° by 10° by 1 km. Figure 3 shows corresponding scatter plots of RO-dry temperatures against IFS data for all 30 months of data considered in this study (i.e., July 2014-December 2016) as well as histograms of the temperature differences between the data for three selected altitudes. This reveals very large correlation coefficients close to 1 between the data, with a general degradation of the (still very good) correlation as well as an increasing bias between the data with increasing altitude. The full altitude variation of the correlation coefficients between the considered data sets as well as the median temperature differences along with the corresponding 10th and 90th percentiles is shown in Fig. 4. This again shows an almost perfect correlation between ERA-Interim and IFS data (as expected) and between the RO-wet temperatures and the IFS. Again, only the dry temperatures show a notable disagreement with the other data sets at altitudes above ∼ 35 km. This is further quantified with the median biases (and percentiles) shown in panel b of Fig. 4, which shows a median bias of +1 K (+2 K) between RO-dry temperatures and the IFS (i.e., IFS temperatures are larger than dry RO temperatures) at 34 km (40 km), with a correspondingly large variability range as indicated by the percentiles. We note that these values are in excellent quantitative agreement with a previous study in which GPS RO observations were compared to ECMWF data (Scherllin-Pirscher et al., 2011). Compared to this bias of the RO-dry data, it is again not surprising to see that the RO-wet temperatures show a much smaller bias (close to zero) to the IFS and that also the corresponding variability range is greatly re- 2014c). In all, RO temperatures agree well with ERA-Interim and IFS temperatures such that it appears justified to proceed and next compare corresponding E_P values.
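The per-altitude statistics discussed here (correlation coefficient, median difference, and 10th and 90th percentiles) can be sketched as below. This is an illustrative outline, not the analysis code of the study: the function name `compare_fields`, the array layout, and the synthetic data are assumptions. The difference is taken as model minus RO, so that a positive bias means the model is warmer, in line with the sign convention stated in the text.

```python
import numpy as np

def compare_fields(ro, model):
    """Per-altitude statistics for two fields shaped (n_alt, n_samples).

    NaNs mark empty grid cells and are ignored pairwise.
    Returns correlation, median of (model - ro), and its 10th/90th percentiles.
    """
    n_alt = ro.shape[0]
    r = np.full(n_alt, np.nan)
    med = np.full(n_alt, np.nan)
    p10 = np.full(n_alt, np.nan)
    p90 = np.full(n_alt, np.nan)
    for k in range(n_alt):
        ok = np.isfinite(ro[k]) & np.isfinite(model[k])
        if ok.sum() < 3:
            continue  # too few pairs for meaningful statistics
        r[k] = np.corrcoef(ro[k, ok], model[k, ok])[0, 1]
        diff = model[k, ok] - ro[k, ok]
        p10[k], med[k], p90[k] = np.percentile(diff, [10, 50, 90])
    return r, med, p10, p90

# Synthetic test data (assumption): the "model" is the "RO" field plus 1 K.
rng = np.random.default_rng(1)
ro = 220.0 + 10.0 * rng.standard_normal((3, 500))
model = ro + 1.0
ro[0, :5] = np.nan  # a few empty cells to exercise the masking
r, med, p10, p90 = compare_fields(ro, model)
```

For real data, each row would hold all grid cells and months at one altitude level.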

Derivation of E_P
We next turn to the derivation of E_P from the various input temperature data sets considered in this study. E_P is defined as follows:

$$E_P = \frac{1}{2} \frac{g^2}{N^2} \overline{\left(\frac{T'}{T_0}\right)^2}, \qquad (1)$$

where g is the acceleration of gravity, $N^2 = \frac{g}{T_0}\left(\frac{dT_0}{dz} + \frac{g}{c_p}\right)$ is the (squared) buoyancy frequency with the specific heat capacity of air at constant pressure c_p, T' is the temperature perturbation owing to the GW, and T_0 is the background temperature. The overbar denotes averaging, which is here carried out over the spatial domain of the latitude-longitude grid and the time period of 1 month. In Eq. (1) all quantities depend on height z except for g, for which we use a constant value of 9.81 m s^-2. The main challenge in deriving E_P(z) from measured temperature profiles lies in the separation between background and perturbations. Different studies have used various approaches such as filtering of profiles in the vertical or in the horizontal, provided that the horizontal sampling is sufficient. See Khaykin (2016) and Ehard et al. (2015) for recent critical discussions of the advantages and disadvantages of different techniques.
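As a worked illustration of the definition of E_P and N^2, the sketch below evaluates both for a single profile. Several points are assumptions: the value c_p = 1004 J kg^-1 K^-1 is not given in the text, the function names are hypothetical, and the averaging denoted by the overbar is left to the caller (for example, a mean over all profiles in a grid cell and month).

```python
import numpy as np

G = 9.81     # acceleration of gravity (m s^-2), constant as stated in the text
CP = 1004.0  # specific heat of air at constant pressure (J kg^-1 K^-1), assumed

def buoyancy_frequency_squared(t0, z_m):
    """N^2 = g/T0 * (dT0/dz + g/c_p), with z in metres and T0 in kelvin."""
    dt0_dz = np.gradient(t0, z_m)
    return G / t0 * (dt0_dz + G / CP)

def potential_energy_density(t_prime, t0, z_m):
    """Per-profile E_P = 1/2 * g^2/N^2 * (T'/T0)^2 in J kg^-1 (before averaging)."""
    n2 = buoyancy_frequency_squared(t0, z_m)
    return 0.5 * G**2 / n2 * (t_prime / t0) ** 2

# Check against a case with a closed-form answer: for an isothermal background,
# N^2 = g^2 / (c_p T0), so E_P reduces to c_p T'^2 / (2 T0).
z_m = np.arange(20e3, 41e3, 1e3)
t0 = np.full(z_m.shape, 240.0)
t_prime = np.full(z_m.shape, 1.0)
ep = potential_energy_density(t_prime, t0, z_m)
print(ep[0])  # about 2.09 J kg^-1 for T' = 1 K and T0 = 240 K
```

The isothermal case gives a convenient order-of-magnitude reference: a 1 K perturbation on a 240 K background corresponds to roughly 2 J kg^-1.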
For this study, we follow the approach of Ehard et al. (2015); i.e., we apply a fifth-order Butterworth filter with a cutoff wavelength of 15 km to vertical temperature profiles from the RO measurements, ERA-Interim, the IFS, and ground-based lidar measurements. Applying this filter to altitude profiles implies that scales longer than 15 km are assumed to be the "background" (climatological structure plus planetary waves), denoted T_0(z), while shorter scales are assumed to be fluctuations due to atmospheric gravity waves. This separation is expected to work well except in the tropical stratosphere, where Kelvin waves are known to occur with vertical wavelengths well below 15 km (e.g., Ern et al., 2008; Randel and Wu, 2005). Hence, E_P must be expected to be biased high in the tropics. Nevertheless, we stick to this approach since it has the advantage that all data sets analyzed in this study can be treated with identical analysis routines, thus allowing us to directly and quantitatively compare E_P values from four independent data sets.
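The vertical background separation can be sketched with SciPy as follows. This is a minimal example under stated assumptions rather than the implementation of Ehard et al. (2015): the zero-phase forward-backward application (`sosfiltfilt`), the 0.1 km sampling, the function name `split_background`, and the synthetic profile are all choices made here for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def split_background(temp, dz_km=0.1, cutoff_km=15.0, order=5):
    """Split a vertical temperature profile into background T0 and perturbation T'.

    A low-pass Butterworth filter keeps vertical scales longer than cutoff_km.
    The filter is applied forward and backward (zero phase), which is an
    assumption made for this sketch.
    """
    nyquist = 0.5 / dz_km                      # cycles per km
    wn = (1.0 / cutoff_km) / nyquist           # normalized cutoff frequency
    sos = butter(order, wn, btype="low", output="sos")
    t0 = sosfiltfilt(sos, temp)
    return t0, temp - t0

# Synthetic profile (assumption): linear background plus a wave with
# 5 km vertical wavelength and 2 K amplitude.
z = np.arange(0.0, 100.0, 0.1)                 # km
background = 220.0 + 0.5 * z
wave = 2.0 * np.sin(2.0 * np.pi * z / 5.0)
t0, t_prime = split_background(background + wave)

# Away from the profile ends, T0 follows the background and T' the wave.
interior = (z >= 40.0) & (z < 60.0)
print(np.std(t_prime[interior]))               # close to 2 / sqrt(2)
```

Near the ends of a profile the filter output is affected by edge padding, which is one practical reason to treat the lowest and highest altitudes of an analysis range with care.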
Resulting T_0(z) profiles are then used to derive N^2(z) profiles. Arbitrarily chosen sample profiles from RO-dry data are shown in Fig. 5. Figure 5 shows cases with both strong (middle panel) and weak GW activity (left panel). These sample profiles further show that the background temperature determination has weaknesses in cases with a very pronounced tropopause, as in the right panel. We will come back to this issue in more detail in Sect. 5. Here, neither the pronounced tropopause (at around 17 km) nor the inversion layer above (i.e., between 20 and 25 km) is well captured by the Butterworth filter, resulting in unrealistically large temperature perturbations, which must not be confused with real gravity-wave-induced temperature perturbations. This is a general problem with all techniques that analyze vertical temperature profiles, which has motivated many authors to exclude the tropopause region and the lowest altitudes above it from further analyses (see, e.g., Schmidt et al., 2008, for a detailed discussion and an approach to derive GW properties in the vicinity of the tropopause). For this reason, we will exclude altitudes below 20 km from our analysis and focus on the altitude range between 20 and 40 km only, knowing, of course, that the highest altitudes need to be treated with care since noise of RO data is known to pick up significantly above ∼ 35 km altitude (Marquardt and Healy, 2005).
[...] (but still slightly underestimated) by the IFS but completely missed by ERA-Interim due to the much coarser horizontal resolution of the latter. Finally, at the highest altitudes, the overall seasonal variation of E_P that is observed with the RO sensors is reproduced by the models, but modeled E_P values are smaller than those derived from RO observations by factors between 2 and 3. This is expected since the sponge layer in the ECMWF models starts strongly damping any small-scale structures above 10 hPa or ∼ 32 km (Jablonowski and Williamson, 2011; Ehard, 2017) and since RO measurement noise picks up substantially above 35 km (see Marquardt and Healy, 2005, and our analysis in Fig. 4 and corresponding discussion).
Next, we compare the same RO time series to local E_P observations obtained with Rayleigh lidar (see Fig. 8). The portable lidar systems as well as the data analysis procedure used during the three campaigns have been described in detail in Kaifler et al. (2015, 2017). In short, Rayleigh lidar measurements yield relative density profiles at altitudes where pure molecular scatter accounts for the signal, i.e., from above the stratospheric aerosol layer. Hence, data are available for altitudes above ∼ 30 km (and below ∼ 90 km) but may be extended to lower altitudes after careful analysis, ensuring that stratospheric aerosol scatter did not contribute to the signal. Relative density profiles are then converted to temperatures, applying hydrostatic downward integration. Finally, E_P values are derived in the same manner as for the RO data described above (see Sect. 3).
The comparison shown in Fig. 8 reveals that lidar and RO data generally show very similar seasonal variation. However, the comparison also shows that the local lidar observations yield significantly larger E_P values by up to a factor of ∼ 2. This is likely because the lidar observations are sensitive to a larger part of the gravity wave spectrum than the RO observations. As described in Sect. 2.1, the horizontal line-of-sight resolution of the RO observations is approximately 190-270 km. Hence, depending on the orientation of the wave vector relative to this line of sight, the RO technique may not resolve waves with horizontal wavelengths shorter than these 190-270 km (if the phase fronts are aligned with the line of sight; the RO technique might, however, be able to detect GWs with shorter horizontal wavelengths than is the case if the phase fronts are perpendicular to the line of sight; see Kursinski et al. (1997) and de la Torre and Alexander (2005) for details). Hence, it is clear that RO observations are only sensitive to GWs with rather large horizontal wavelengths, whereas lidar observations may also detect much smaller-scale gravity waves. Note that there is also a (moderate) difference in vertical resolution, which is 900 m for the lidar temperatures and ∼ 1.4 km for the RO data (Kaifler et al., 2015; Kursinski et al., 1997). In addition, we also need to realize that the spatial sampling for both data sets is very different: while the E_P values based on RO data are typically based on 20-40 single (snapshot) temperature profiles that have been obtained in a geographical area of 5° in latitude [...] sampling on the resulting E_P values, it is conceivable that the large geographical area over which the RO data are obtained might result in a smearing out of local GW maxima and should hence tend to smaller values compared to local observations.
In all, we conclude from the comparison of time series at the three considered locations that the agreement between GPS RO and IFS and ERA-Interim data is generally very good, whereas comparison to local observations indicates that RO E_P values are biased low, which is likely due to the different observational filters of the two techniques (see, e.g., Alexander et al., 2010; Ern et al., 2004, for a thorough discussion of observational filters of different techniques). Next, we finally compare GPS RO with IFS and ERA-Interim data on a global basis. For all 30 months between July 2014 and December 2016 we have computed E_P on a grid of 5° in latitude, 10° in longitude, and 1 km in the vertical for the whole considered altitude range of 20-40 km. For each altitude, we have then analyzed the relation between the two GPS RO data sets and the model data sets in terms of correlation coefficients as well as in terms of absolute differences. An initial impression of the statistical relation between E_P values from RO-dry data and from IFS data is presented in Fig. 9, which shows corresponding scatter plots along with a linear regression to the data as well as histograms of the absolute difference between the two data sets for three selected altitudes. Figure 9 shows a very large correlation of R = 0.94 at 22 km, a minimum value of R = 0.45 at 28 km, and a slightly larger value of R = 0.56 again at 38 km altitude. Furthermore, it is common to all three histograms that IFS values are biased low with respect to the RO data. Interestingly, though, the distribution is broadest at the lowest considered altitude, with much narrower distributions above.
The complete altitude variation of correlations as well as biases is shown in Fig. 10, which shows correlation coefficients and median differences (along with 10th and 90th percentiles) between ERA-Interim and IFS, between RO-dry data and IFS data, and between RO-wet data and IFS data. Figure 10 shows several interesting features. Starting with the correlation coefficients, those are generally large (between 0.5 and 1.0) except for the altitude range between 25 and 30 km, where the correlations of both RO data products (wet and dry) with model data show a minimum with values as low as 0.4. Above 30 km, however, correlation coefficients increase again. Besides this striking minimum between 25 and 30 km, the overall envelope of the altitude variation shows larger correlation coefficients between 0.9 and 1.0 below 25 km and values between 0.8 (for the correlation between ERA-Interim and IFS) and 0.5 (for the correlation between the RO-dry data and IFS data) at 40 km. Turning to absolute differences (right panel in Fig. 10), the median differences between ERA-Interim and IFS data are very small (less than 1 J kg^-1), with IFS values being slightly larger than ERA-Interim values. Concerning the absolute differences between RO and IFS data, both RO data products yield systematically larger E_P values than the IFS, where, however, the median difference between the RO-wet data and the IFS data is significantly smaller than the difference between the more "original" RO-dry data and the IFS data. Interestingly, both the median difference and its variability (indicated by the percentiles) are quite large at 20 km and decrease significantly up to an altitude of 25 km, above which both median differences and related variability increase again up to the maximum altitudes considered.

Discussion
In order to identify the reason for the reduced correlation between RO and IFS data between 25 and 30 km as well as the relatively large bias below ∼ 23 km, we next consider a comparison of latitude-longitude distributions of E_P values at selected altitudes based on RO-dry data and IFS data. Corresponding results for December 2015 and June 2015 are presented in Figs. 11 and 12, respectively. We start with a discussion of the relatively low correlation coefficients at altitudes between 25 and 30 km. Inspection of Figs. 11 and 12 reveals that the likely reason for this is that the IFS is hardly simulating any gravity wave activity at the altitude levels of lowest correlation, whereas the observations do show some weak but clearly detectable GW activity. The reason why the IFS does not simulate any (very weak) GW activity in the considered vertical wavelength range at these altitudes is not clear at this point, but this behavior is consistent for all months considered in this study and should be further investigated in the future. As for the bias at altitudes below 25 km, the E_P distributions shown at 20 and 22 km show that the strongest (apparent) GW activity is here observed in a band of ±20° around the Equator, with significantly larger values seen in RO data than in IFS data. This is, however, the region of the tropical tropopause and its related TTIL. Note that it is on purpose that we refer to the tropical tropopause inversion layer as TTIL instead of the more commonly known TIL, since the latter term has usually only been used for the midlatitude TIL and not the tropical one that we are dealing with here (Birner et al., 2002, 2006; Pilch Kedzierski et al., 2016). That this is indeed the case for the data set considered here is demonstrated in Fig. 13, which shows zonal mean N^2 values based on RO and IFS data. Note that the N^2 values in Fig. 13 were computed from monthly mean zonal mean temperatures and must not be confused with the N^2 values used in our E_P calculation, which is based on T_0 profiles. Remember that T_0 profiles result from filtering individual temperature profiles with a fifth-order Butterworth filter with cutoff wavelength at 15 km such that T_0 profiles only contain spatial scales larger than 15 km and hence do not contain information on the TTIL. Figure 13 clearly shows that it is indeed the latitude and altitude range of the TTIL which coincides with corresponding regions of large E_P values in the considered data sets. In addition, Fig. 13 also shows that the TTIL is more pronounced in the RO data than in the ERA-Interim data. Hence, it is tempting to speculate that the large E_P values seen in the tropics and the corresponding large differences between the RO data and the IFS data arise because our algorithm to derive E_P values from temperature profiles by means of separating background temperatures from gravity-wave-induced disturbances fails in this altitude and latitude region. In order to test this idea further, we present zonal mean E_P values as a function of latitude and altitude between 20 and 40 km altitude based on both RO-dry data and IFS data in Fig. 14. This figure clearly shows the region of large E_P values between 20 and 25 km altitude and at latitudes between −20 and +20°. It also shows that RO values in this region are significantly larger than in the IFS data set. In order to test whether these are indeed real indications of gravity wave activity or rather artifacts due to the TTIL, we have next applied our algorithm to derive E_P values to monthly mean zonal mean temperature profiles. For those, it can safely be assumed that they do not contain any remaining gravity wave signatures (since many profiles have been averaged) such that any significant nonzero E_P values must be artifacts due to shortcomings of the algorithm. The result of this exercise is shown in the middle panels of Fig. 14. Quite obviously this analysis yields regions of very large apparent E_P values in regions of the TTIL.
Compared to the panels in the upper row of the figure, it is also clear that these artifacts actually dominate the E_P values in the TTIL region. In addition, we note that additional artifacts are observed at higher altitudes and also in other latitude and altitude regions. These may be caused by tropical Kelvin waves or other planetary-scale features such as inertial instability (e.g., Ern et al., 2008; Smith and Riese, 1999). However, their absolute values are significantly smaller than those in the data sets in the upper row such that the contribution of these artifacts to the overall E_P values is not significant. This is also clearly seen in the lowermost panels of Fig. 14, which show the difference between the full E_P distribution (in the top row) and the contributions from the monthly mean zonal mean profiles (in the middle). In these "corrected" E_P distributions, the maximum values in the TTIL region have basically disappeared, whereas there is hardly any change visible in other altitude and latitude regions. Coming back to panel b of Fig. 10, we hence conclude that the relatively large differences seen below 25 km do not reflect real differences in terms of gravity wave activity in RO data and model data.
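The correction described here amounts to a subtraction of two zonal mean fields, which can be sketched as follows. The sketch is illustrative only: the function name `corrected_zonal_mean_ep`, the array layout (altitude, latitude, longitude), and the toy numbers are assumptions. The "artificial" E_P is taken as given, i.e., as the result of applying the E_P algorithm to monthly mean zonal mean temperature profiles.

```python
import numpy as np

def corrected_zonal_mean_ep(ep_grid, ep_artifact):
    """Subtract the artificial E_P from the zonal mean of the full E_P field.

    ep_grid:     monthly mean E_P, shape (n_alt, n_lat, n_lon); NaN = empty cell
    ep_artifact: E_P derived from monthly mean zonal mean temperature profiles,
                 shape (n_alt, n_lat)
    """
    ep_zonal = np.nanmean(ep_grid, axis=-1)   # average over longitude
    return ep_zonal - ep_artifact

# Toy example (assumption): 2 altitudes x 3 latitudes x 4 longitudes with an
# apparent maximum at the lowest altitude in the equatorial band.
ep_grid = np.ones((2, 3, 4))
ep_grid[0, 1, :] = 9.0
ep_artifact = np.zeros((2, 3))
ep_artifact[0, 1] = 7.0
ep_corrected = corrected_zonal_mean_ep(ep_grid, ep_artifact)
```

Because the artifact estimate is a zonal mean, any zonally asymmetric part of the non-gravity-wave signal is left untouched by this subtraction.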
Instead, the differences are caused by differences in the representation of the TTIL and the difficulty to properly derive E P values in its environment from vertical profiles alone. We finally attempt to determine the quality of the corrected E P values in Fig. 14 by comparing them to E P values using a horizontal background determination method. Horizontal estimation of T 0 was previously found to be superior to a vertical background determination by Khaykin (2016) and Schmidt et al. (2016). While the sampling statistics of the METOP RO data on a daily basis (i.e., only 1100 profiles distributed over the whole globe) is too poor to allow us to apply a horizontal background determination to them, we may easily perform a corresponding analysis of the high-resolution IFS data. For this purpose the spectral model output of the IFS for December 2015 has been reconstructed at T42, i.e., at a horizontal grid spacing of 500 km. These fields have then been used as background temperatures T 0 (z, λ, φ), where λ is latitude and φ is longitude, in order to compute monthly mean zonal mean distributions of E P. Such monthly mean zonal mean E P distributions for December 2015 are presented in Fig. 15. In the same figure we also show corresponding fields of the vertical kinetic energy, VE = (1/2) w² (Geller and Gong, 2010). Note that VE is a good indicator of gravity waves in the stratosphere since vertical velocities due to other air motions are significantly smaller. While VE values are significantly smaller than E P values (by about a factor of 1000 in the IFS model) it is still instructive to compare the spatial morphology of the corresponding fields. This comparison clearly reveals that the proposed correction of E P distributions derived using a vertical background determination (see Fig.
14 and related text) improves the comparison between E P and VE but that it cannot eliminate all features that are apparently not due to gravity waves. Closer inspection of the data sets reveals that this is partly because some of the non-gravity wave structures (mainly the TTIL) are not zonally homogeneous such that correcting for them using zonal mean fields cannot eliminate the non-gravity wave structures completely. We hence conclude that this correction may be recommended for application to data sets that can only be analyzed using a vertical background determination method such as for the METOP data with relatively scarce sampling statistics. However, even after this correction, regions within ±30° latitude around the Equator need to be considered with care due to additional potential contamination of E P by Kelvin waves or other planetary-scale features. In any case, if the sampling statistics allows, our analysis clearly shows that in general a horizontal background determination is advantageous in that it better avoids contributions to E P that are not caused by gravity waves.
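To illustrate the correction discussed above, the following minimal Python sketch computes E P per altitude level and subtracts the apparent E P obtained from the monthly mean zonal mean profile. It assumes the commonly used definition E P = (1/2) (g/N)² (T'/T 0)² with N² = (g/T 0) (dT 0/dz + g/c p). The function names, the constants, and the centered-difference gradient are illustrative assumptions and not the implementation used in this study.

```python
G = 9.81      # gravitational acceleration in m s^-2 (assumed constant with height)
CP = 1004.0   # specific heat of dry air at constant pressure in J kg^-1 K^-1


def brunt_vaisala_sq(t0, dz_m):
    """N^2 = g/T0 * (dT0/dz + g/cp) from a background profile T0 (K)."""
    n = len(t0)
    n2 = []
    for i in range(n):
        # Centered differences inside the profile, one-sided at both ends.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dtdz = (t0[hi] - t0[lo]) / ((hi - lo) * dz_m)
        n2.append(G / t0[i] * (dtdz + G / CP))
    return n2


def ep_profile(t_pert, t0, dz_m):
    """E_P = 0.5 * (g/N)^2 * (T'/T0)^2 at each level, in J kg^-1."""
    n2 = brunt_vaisala_sq(t0, dz_m)
    return [0.5 * G * G / n2[i] * (t_pert[i] / t0[i]) ** 2
            for i in range(len(t0))]


def corrected_ep(ep_full, ep_apparent):
    """Full E_P minus the apparent E_P of the monthly mean zonal mean profile."""
    return [full - apparent for full, apparent in zip(ep_full, ep_apparent)]
```

For an isothermal background of 220 K and a perturbation of 2 K, the assumed definition reduces to E P = 0.5 c p T'²/T 0, i.e., about 9.1 J kg⁻¹. Note that the plain difference can become negative where the apparent E P exceeds the full value; how such bins should be treated is not specified here.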

Summary and conclusions
In this paper we compared operational METOP GPS RO temperatures and derived gravity wave potential energy densities with corresponding ECMWF operational analysis and ERA-Interim reanalysis data sets. This was done to answer two questions: firstly whether the sampling and data quality of the operational RO data set is sufficient to properly characterize the global gravity wave activity (measured in terms of E P ) on a monthly basis and, secondly, whether the METOP observations are consistent with the ECMWF model fields such that the latter can be used for the interpretation of observational results.
For this purpose, we analyzed a total of 30 months of RO data for the period from July 2014 to December 2016. We calculated monthly mean temperatures and E P values on a grid of 5° in latitude, 10° in longitude, and at a vertical resolution of 1 km for altitudes between 20 and 40 km. This was done for two RO data sets, namely for so-called "dry" and "wet" data both provided by EUMETSAT's ROM SAF. Dry temperatures are directly derived from refractivity profiles which in turn are estimated from bending angle observations with the GPS RO technique. In contrast, wet temperatures are the result of a one-dimensional variational retrieval that uses additional a priori information on atmospheric humidity and temperature from ECMWF model fields. Subsequently both temperatures and E P values from RO observations and from ECMWF analysis and reanalysis model fields were compared rigorously. The comparison of temperatures showed very low systematic differences between RO-dry temperatures and ECMWF model fields between 20 and 30 km (i.e., median temperature differences between −0.2 and +0.3 K), which then increased with height to yield median differences of +1.0 K at 34 km and +2.2 K at the maximum considered altitude of 40 km. Compared to this, median differences between RO-wet temperatures and ECMWF model data were below 0.16 K for all considered altitudes, which is as expected since ECMWF model data were used to constrain the RO data retrieval.
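The gridding step described above can be sketched as follows. This is a minimal illustration that assigns individual samples to 5° by 10° by 1 km bins and averages them per bin. The function names, the tuple layout of the samples, and the handling of the poles and of negative longitudes are assumptions made for this example only.

```python
import math
from collections import defaultdict


def bin_index(lat, lon, alt_km, dlat=5.0, dlon=10.0, dalt=1.0):
    """Integer (latitude, longitude, altitude) bin indices for one sample."""
    n_lat = int(round(180.0 / dlat))
    # Clamp so that a sample exactly at +90 deg falls into the last latitude bin.
    i_lat = min(math.floor((lat + 90.0) / dlat), n_lat - 1)
    # Wrap longitudes such as -10 deg onto the range [0, 360).
    i_lon = math.floor((lon % 360.0) / dlon)
    i_alt = math.floor(alt_km / dalt)
    return (i_lat, i_lon, i_alt)


def monthly_bin_means(samples):
    """samples: iterable of (lat, lon, alt_km, value); returns {bin: mean}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, alt_km, value in samples:
        key = bin_index(lat, lon, alt_km)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The same routine can be applied to temperatures or to E P values, as long as all samples passed in belong to the same month.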
We then introduced a method to derive E P from temperature profiles by applying a fifth-order Butterworth filter with cutoff wavelength of 15 km to both RO and model data. An initial comparison of E P time series in selected altitude ranges and at three selected locations in Sodankylä, northern Scandinavia, in the German Bavarian Forest, and in Lauder, New Zealand, yielded overall very good agreement: below 35 km, this agreement was very good both in terms of seasonal variation and in terms of absolute E P values. A striking result, however, was that for northern Scandinavia, which is known as a region of strong orographic wave activity, the horizontally coarser-resolved ERA-Interim data underestimated a large winter peak of E P that was present in both the RO data and the higher-resolution IFS data. At altitudes above 35 km, however, both models did follow the observed seasonal variation of E P qualitatively but underestimated the observed values by about a factor of 2. This is likely caused by the damping of small-scale model structures by the model's sponge layer. Also, it is well known that noise in RO data picks up substantially above 35 km such that several previous studies have recommended restricting the useful range of RO data for GW analysis to below 35 km (e.g., Schmidt et al., 2008). This previous recommendation is clearly supported by our analysis.
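As a rough illustration of the separation into background and perturbation, the sketch below applies a low-pass filter with a fifth-order Butterworth amplitude response and a 15 km cutoff wavelength in vertical wavenumber space. It is a stand-in and not the filter implementation used in this study: the naive discrete Fourier transform treats the profile as periodic, so real profiles would need detrending or padding to limit edge effects, and all function names are hypothetical.

```python
import cmath
import math


def butterworth_background(temps, dz_km, cutoff_km=15.0, order=5):
    """Low-pass a profile using a Butterworth amplitude response (zero phase)."""
    n = len(temps)
    # Naive forward DFT, O(n^2), adequate for profiles with tens of levels.
    spectrum = [
        sum(temps[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        for k in range(n)
    ]
    kc = 1.0 / cutoff_km  # cutoff wavenumber in cycles per km
    filtered = []
    for k in range(n):
        # Fold the index so positive and negative wavenumbers get equal gain.
        wavenumber = min(k, n - k) / (n * dz_km)
        gain = 1.0 / math.sqrt(1.0 + (wavenumber / kc) ** (2 * order))
        filtered.append(spectrum[k] * gain)
    # Inverse DFT; the imaginary part vanishes up to rounding for real input.
    return [
        sum(filtered[k] * cmath.exp(2j * math.pi * k * j / n)
            for k in range(n)).real / n
        for j in range(n)
    ]


def perturbations(temps, dz_km, cutoff_km=15.0):
    """Return (perturbation profile, background profile)."""
    background = butterworth_background(temps, dz_km, cutoff_km)
    return [t - b for t, b in zip(temps, background)], background
```

With this response, a wave of 6 km vertical wavelength is passed almost entirely into the perturbation profile, whereas a 60 km structure remains in the background.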
The same E P time series from RO observations were then also compared to local Rayleigh lidar observations. This comparison showed a qualitatively similar seasonal E P variation with both experimental techniques but it also revealed that the RO technique underestimates the locally observed values by about a factor of 2. This low bias is likely caused by the very different observational filter of RO and lidar observations where in particular the long line of sight of RO observations that are carried out in limb geometry severely hampers the detection of waves with horizontal wavelengths smaller than 190-270 km while the lidar observations are also sensitive to much smaller horizontal wavelengths. Finally we compared the full 30-month data set of RO and model E P fields. The corresponding statistical analysis shows large correlation coefficients (0.4-1.0) between all considered data sets (RO-dry, RO-wet, ERA-Interim, and IFS) for all altitudes between 20 and 40 km. A minimum correlation (of still 0.4) was found at altitudes around 28 km, where the ECMWF analysis and reanalysis fields do not seem to capture the GW activity that is observed in the RO data. The reason for this discrepancy could not be identified and should be investigated in a future study. Concerning absolute differences between observed and modeled E P values, the median difference was relatively small at all altitudes with an exceptional feature between 20 and 25 km where both the median difference between RO and model data increased and where the corresponding variability was also found to be very large. The reason for this was identified as an artifact in the E P algorithm: this erroneously interprets the pronounced climatological feature of the TTIL at latitudes between ±20° and altitudes between 20 and 25 km as gravity wave activity, hence yielding (a) very large E P values in this area and (b) large differences between model and observations because the RO data show a much more pronounced TTIL than IFS and ERA-Interim. Based on that finding we also suggested a correction for this effect based on an estimate of this "artificial" E P using monthly mean zonal mean temperature profiles which do reveal a very pronounced TTIL but which should not contain any remaining GW signatures due to strong averaging. In addition, this technique to derive and correct E P based on vertical profiles was compared to an alternative method applying a horizontal background temperature determination method to IFS data. We find that the above-introduced correction may be recommended for application to data sets that can only be analyzed using a vertical background determination method such as the METOP data with relatively scarce sampling statistics. However, if the sampling statistics allows, our analysis also shows that in general a horizontal background determination is advantageous in that it better avoids contributions to E P that are not caused by gravity waves like the TTIL and potentially also Kelvin waves and other planetary-scale features with short vertical wavelengths (i.e., less than 15 km).
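The statistical measures quoted in this summary (linear fit, correlation coefficient, and median difference between two data sets on the same grid) can be computed along the following lines. This is a minimal sketch using an ordinary least-squares fit and the Pearson correlation coefficient; whether exactly these estimators were used in this study is an assumption, and the function name is hypothetical.

```python
import math
import statistics


def compare_fields(x, y):
    """Return slope b, intercept a, Pearson R of y versus x, and median(y - x)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # least-squares slope b
    intercept = mean_y - slope * mean_x    # y intercept a
    r = sxy / math.sqrt(sxx * syy)         # Pearson correlation coefficient R
    median_diff = statistics.median([yi - xi for xi, yi in zip(x, y)])
    return slope, intercept, r, median_diff
```

Here x and y would hold co-located monthly mean values from two data sets (for example, model and RO-dry) at one altitude, flattened over all grid cells and months.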
In summary, our analysis shows good quantitative agreement between monthly mean RO-dry and ERA-Interim and IFS data in the altitude range between 20 and 40 km. Hence, both research questions posed at the beginning of this study can be answered positively: for one, this good agreement shows that METOP RO-dry data are a suitable database to study monthly mean global gravity wave activity in the altitude range between 20 and 40 km (with the caveat that the tropical latitudes need to be considered with particular care). In addition, the good agreement between RO-dry and ECMWF data also implies that the combination of both appears to be a versatile combined data set for the study of processes determining the GW climatology. Future questions to be considered are, for example, how far the strong stratospheric jet streams influence the observed GW morphology in the stratosphere. While model results of Dunkerton (1984) and more recently also Sato et al. (2009) and Sato et al. (2012) have long suggested that the waves should be refracted into the jet streams, observational evidence for this process based on global data is still scarce. This and other research questions will be investigated in future studies.

Figure 1. (a) Number of METOP-A and B radio occultations per 5° latitude and 10° longitude bin in June 2015. The total number of occultations in this month is about 35 000. (b) Number of occultations per 5° latitude bin integrated over all longitudes.

Figure 2. Zonal mean temperatures as a function of latitude and altitude for the months March, June, and December 2015 (a-i) from METOP-A and B radio occultations (a, d, g) and from ERA-Interim (c, f, i).

Figure 3. Scatter plots (a) between RO-dry temperatures and corresponding IFS data for 30 months of data between July 2014 and December 2016 for three selected altitudes. The red line shows a linear fit to the data with slope b, y intercept a, and correlation coefficient R (see insert). Panel (b) shows histograms of the corresponding temperature differences between IFS and RO-dry data for the same selected altitudes.
Figure 5. (a) Sample radio occultation temperature profiles from December 2015 (black lines) with background profiles (red lines) as determined with a fifth-order Butterworth filter with 15 km cutoff wavelength following Ehard et al. (2015). (b) Corresponding temperature perturbation profiles (radio occultation profile minus background profile).

Figure 6. Monthly mean latitude-longitude cross sections of E P at selected altitudes of 30, 33, 36, and 39 km (a-l) for December 2015. (a, d, g, j) METOP RO-dry data, (b, e, h, k) IFS data, and (c, f, i, l) ERA-Interim data. In all panels, black contour lines show zonal wind values from ERA-Interim.

Figure 8. (a-c) Comparison of time series of monthly mean E P from METOP RO data for different altitude ranges (black, blue, and red curves; see insert for color code) with local Rayleigh lidar measurements of E P for the stations of Sodankylä (a, d), the Bavarian Forest (b, e), and Lauder (c, f). Lidar E P are shown as yellow (25-35 km), light blue (35-45 km), or green (45-55 km) lines. (d-f) Number of RO profiles (black lines) and nightly mean lidar profiles (light blue lines) entering the monthly mean shown in the panels above.

Figure 9. (a, c, e) Scatter plots between E P values derived from the IFS and RO-dry data for three different altitudes, i.e., 22 km (a, b), 28 km (c, d), and 38 km (e, f). The red line shows a linear fit to the data with slope b, y intercept a, and correlation coefficient R (see insert). (b, d, f) Corresponding histograms of the difference between the two data sets.

Figure 11. (a, c, e) Latitude-longitude distributions of E P based on GPS RO-dry data for December 2015 and altitudes of 20, 28, and 38 km (a-f). (b, d, f) Same as panels (a, c, e) but based on IFS data. In all panels black contours show zonal wind values from ERA-Interim.

Figure 13. Zonal mean distribution of N² as a function of latitude and altitude for the months June 2015 (a, c) and December 2015 (b, d) based on GPS RO-dry data (a, b) and IFS data (c, d).
Figure 14. (a, b) Monthly mean zonal mean distributions of E P as a function of latitude and altitude for December 2015 based on RO-dry data (a) and IFS data (b). (c, d) Zonal mean apparent E P values derived from applying the E P algorithm to monthly mean zonal mean temperature profiles. (e, f) Difference between (a, b) and (c, d).
Figure 15. (a) Monthly mean zonal mean distribution of E P from IFS data derived after detrending in the horizontal with T42 IFS fields. (b) Monthly mean zonal mean distribution of VE = (1/2) w².
Comparison of METOP E P values with ECMWF model data and ground-based lidar measurements
We next present a systematic comparison of E P values derived from METOP RO-dry temperatures, the IFS, and ERA-Interim. As an initial impression, Fig. 6 shows monthly mean latitude-longitude cross sections of E P at selected altitudes of 30, 33, 36, and 39 km for December 2015. At 30 km, the RO data reveal pronounced GW activity over Scandinavia, over the Iberian peninsula and north Africa, and in a band in the vicinity of the Equator, with strongest activity in the tropical central Pacific (135-180° E). Moving to 33 km altitude, E P values increase with pronounced activity still over Scandinavia, strong activity at around 40° N in the Atlantic storm track region, and an additional activity center over the northern part of South America. At larger altitudes, these general features remain, but become smeared out geographically. Generally speaking, this overall morphology of GW activity is well reproduced by both the IFS and ERA-Interim with some notable differences. First of all, E P values from