Validation of the CrIS fast physical NH 3 retrieval with ground-based FTIR

Presented here is the validation of the CrIS (Cross-track Infrared Sounder) fast physical NH 3 retrieval (CFPR) column and profile measurements using ground-based Fourier transform infrared (FTIR) observations. We use the total columns and profiles from seven FTIR sites in the Network for the Detection of Atmospheric Composition Change (NDACC) to validate the satellite data products. The overall FTIR and CrIS total columns have a positive correlation of r = 0.77 (N = 218) with very little bias (a slope of 1.02). Binning the comparisons by total column amounts, for concentrations larger than 1.0 × 10 16 molecules cm −2 , i.e. ranging from moderate to polluted conditions, the relative difference is on average ∼ 0–5 % with a standard deviation of 25–50 %, which is comparable to the estimated retrieval uncertainties in both CrIS and the FTIR. For the smallest total column range (< 1.0 × 10 16 molecules cm −2 ), where there are a large number of observations at or near the CrIS noise level (detection limit), the absolute differences between CrIS and the FTIR total columns show a slight positive column bias. The differences between the CrIS and FTIR profiles are mostly within the range of the estimated retrieval uncertainties for single-level retrieved profile values, with average differences in the range of ∼ 20 to 40 %. The CrIS retrievals typically show good vertical sensitivity down into the boundary layer, with the sensitivity typically peaking at ∼ 850 hPa (∼ 1.5 km). At this level the median absolute difference is 0.87 (std = ± 0.08) ppb, corresponding to a median relative difference of 39 % (std = ± 2 %). Most of the absolute and


Introduction
The disruption of the nitrogen cycle by the human creation of reactive nitrogen has created one of the major challenges for humankind (Rockström et al., 2009). Global reactive nitrogen emissions into the air have increased to unsurpassed levels and are currently estimated to be four times larger than pre-industrial levels (Holland et al., 1999). As a consequence the deposition of atmospheric reactive nitrogen has increased, causing ecosystems and species loss (Rodhe et al., 2002; Dentener et al., 2006; Bobbink et al., 2010). Ammonia (NH 3 ) as fertilizer is essential for agricultural production and is one of the most important reactive nitrogen species in the biosphere. NH 3 emission, atmospheric transport, and atmospheric deposition are major causes of eutrophication and acidification of soils and water in seminatural environments (Erisman et al., 2008, 2011). Through reactions with sulfuric acid and nitric acid, ammonium nitrate and ammonium sulfate are formed, which embody up to 50 % of the mass of fine-mode particulate matter (PM 2.5 ) (Seinfeld and Pandis, 1988; Schaap et al., 2004). PM 2.5 has been associated with various health impacts (Pope III et al., 2002). At the same time, atmospheric aerosols impact global climate directly through their radiative forcing effect and indirectly through the formation of clouds (Adams et al., 2001; Myhre et al., 2013). By fertilizing ecosystems, deposition of NH 3 and other reactive nitrogen compounds also plays a key role in the sequestration of carbon dioxide (Oren et al., 2001).
Despite the significance and impact of NH 3 on the environment and climate, its global distribution and budget are still relatively uncertain (Erisman et al., 2007; Clarisse et al., 2009; Sutton et al., 2013). One of the reasons is that in situ measuring of atmospheric NH 3 at ambient levels is complex due to the sticky nature and reactivity of the molecule, leading to large uncertainties and/or sampling artefacts with the currently used measuring techniques (von Bobrutzki et al., 2010; Puchalski et al., 2011). Measurements are also very sparse. Currently, observations of NH 3 are mostly available in north-western Europe and central North America, supplemented by a small number of observations made in China (Van Damme et al., 2015b). Furthermore, there is a lack of detailed information on its vertical distribution as only a few dedicated airborne measurements are available (Nowak et al., 2007, 2010; Leen et al., 2013; Whitburn et al., 2015). The atmospheric lifetime of NH 3 is rather short, ranging from hours to a few days. In summary, global emission estimates have large uncertainties. Estimates of regional emissions attributed to source types that are different from the main regions are even more uncertain due to a lack of process knowledge and atmospheric levels (Reis et al., 2009).
Over the last decade the development of satellite observations of NH 3 from instruments such as the Cross-track Infrared Sounder (CrIS), the Infrared Atmospheric Sounding Interferometer (IASI, Clarisse et al., 2009; Coheur et al., 2009; Van Damme et al., 2014a), the Atmospheric Infrared Sounder (AIRS, Warner et al., 2016) and the Tropospheric Emission Spectrometer (TES, Beer et al., 2008; Shephard et al., 2011) has shown the potential to improve our understanding of the NH 3 distribution. Recent studies show that the global distribution of NH 3 measured at a twice daily scale (Van Damme et al., 2014a, 2015a) can reveal seasonal cycles and distributions for regions where measurements were unavailable until now. Comparisons of these observations to surface observations and model simulations show underestimations of the modelled NH 3 concentration levels, pointing to underestimated regional and national emissions (Shephard et al., 2011; Heald et al., 2012; Nowak et al., 2012; Zhu et al., 2013; Van Damme et al., 2014b; Lonsdale et al., 2017; Schiferl et al., 2014, 2016). However, the overall quality of the satellite observations is still highly uncertain due to a lack of validation. The few validation studies showed a limited vertical, spatial and/or temporal coverage of surface observations for a proper uncertainty analysis (Van Damme et al., 2015b; Sun et al., 2015). A recent study by Dammers et al. (2016a) explored the use of Fourier transform infrared (FTIR-NH 3 , Dammers et al., 2015) observations to evaluate the uncertainty of the IASI-NH 3 total column product. The study showed the good performance of the IASI-LUT (look-up table; Van Damme et al., 2014a) retrieval with a high correlation (r ∼ 0.8), but indicated an underestimation of around 30 % due to potential assumptions of the shape of the vertical profile (Whitburn et al., 2016; IASI-NN, neural network), uncertainty in spectral line parameters and assumptions on the distributions of interfering species.
The study showed the potential of using FTIR observations to validate satellite observations of NH 3 , but also stressed the challenges of validating retrievals that do not provide the vertical measurement sensitivity, such as the IASI-LUT retrieval. Since no IASI satellite averaging kernels are provided for each retrieval, and thus no information is available on the vertical sensitivity and/or vertical distribution of each separate observation, it is hard to determine the cause of the discrepancies between the observations.
The new CrIS fast physical retrieval uses an optimal estimation retrieval approach that provides the information content and the vertical sensitivity (derived from the averaging kernels), and robust and straightforward retrieval error estimates based on retrieval input parameters. The quality of the retrieval has so far not been thoroughly examined in comparison to other observations. The original retrieval study used Observing System Simulation Experiment (OSSE) studies to evaluate the initial performance of the CrIS NH 3 retrieval, and reported a small positive retrieval bias of 6 % with a standard deviation of ±20 % (ranging from ±12 to ±30 % over the vertical profile). Note that no potential systematic errors were included in these OSSE simulations. That study also shows good qualitative comparisons with the Tropospheric Emission Spectrometer (TES) satellite (Shephard et al., 2011) and the ground-level in situ quantum cascade laser (QCL) observations (Miller et al., 2014) for a case study over the Central Valley in CA, USA, during the DISCOVER-AQ campaign. However, there has not yet been an extensive validation of the CrIS NH 3 retrievals using direct comparisons with vertical profile observations. In this study we provide direct comparisons of the CrIS-retrieved profiles with ground-based FTIR observations, as well as comparisons of CrIS total column values with those of the FTIR and IASI.

The CrIS fast physical retrieval
CrIS was launched in late October 2011 on board the Suomi NPP platform. CrIS follows a sun-synchronous orbit with a daytime equator overpass at 13:30 LT (local time, ascending) and a night-time equator overpass at 01:30 LT. The instrument scans along a 2200 km swath using a 3 × 3 array of circular-shaped pixels with a diameter of 14 km at nadir for each pixel, which become larger ovals away from nadir. In this study we use the CrIS fast physical NH 3 retrieval (CFPR). The retrieval is based on an optimal estimation approach (Rodgers, 2000) that minimizes the differences between CrIS spectral radiances and simulated forward model radiances computed from the Optimal Spectral Sampling method (OSS) OSS-CrIS (Moncet et al., 2008), which is built from the well-validated Line-By-Line Radiative Transfer Model (LBLRTM) (Clough et al., 2005; Shephard et al., 2009; Alvarado et al., 2013) and uses the HITRAN database (Rothman et al., 2013) for its spectral lines. The fast computational speed of OSS facilitates the operational production of CrIS-retrieved (level 2) products using an optimal estimation retrieval approach (Moncet et al., 2005). The CrIS OSS radiative transfer forward model computes the spectrum for the full CrIS LW band, at the CrIS spectral resolution of 0.625 cm −1 (Tobin, 2012); thus the complete NH 3 spectral band (near 10 µm) is available for the retrievals. However, only a small number of microwindows are selected for the CrIS retrievals to both maximize the information content and minimize the influence of errors. Worden et al. (2004) provide an example of a robust spectral region selection process that takes into consideration both the estimated errors (i.e. instrument noise, spectroscopy errors, interfering species, etc.) and the associated information content in order to select the optimal spectral regions for the retrieval. The a priori profile selection for the optimal estimation retrievals follows the TES retrieval algorithm (Shephard et al., 2011).
Based on the relative NH 3 signal in the spectra the a priori is selected from one of three possible profiles representing unpolluted, moderate, and polluted conditions. The initial guess profiles are also selected from these three potential profiles.
An advantage of using an optimal estimation retrieval approach is that averaging kernels (sensitivity to the true state) and the estimated errors of the retrieved parameter are computed in a robust and straightforward manner. The total satellite retrieved parameter error is expressed as the sum of the smoothing error (due to unresolved fine structure in the profile), the measurement error (random instrument noise in the radiance spectrum propagated to the retrieval parameter), and systematic errors from uncertainties in the nonretrieved forward model parameters and cross-state errors propagated from retrieval to retrieval (i.e. major interfering species such as H 2 O, CO 2 , and O 3 ) (Worden et al., 2004).
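The quantities named above can be written compactly in the standard optimal estimation notation of Rodgers (2000). The expressions below are the textbook forms, given as a sketch rather than as the exact CFPR implementation; the symbols (y for the measured radiances, F for the forward model, K for its Jacobian, x_a and S_a for the a priori profile and covariance, S_ε for the measurement noise covariance, G for the gain matrix, S_sys for the systematic term) are notation assumed here and are not defined in the text.

```latex
% Cost function minimized by the retrieval
J(x) = \left[y - F(x)\right]^{T} S_{\epsilon}^{-1} \left[y - F(x)\right]
     + \left(x - x_{a}\right)^{T} S_{a}^{-1} \left(x - x_{a}\right)

% Gain matrix, averaging kernel and degrees of freedom for signal
G = \left(K^{T} S_{\epsilon}^{-1} K + S_{a}^{-1}\right)^{-1} K^{T} S_{\epsilon}^{-1},
\qquad A = G K,
\qquad \mathrm{DOFS} = \mathrm{tr}(A)

% Total error covariance: smoothing + measurement + systematic
S_{\mathrm{total}} = (A - I)\, S_{a}\, (A - I)^{T}
                   + G\, S_{\epsilon}\, G^{T}
                   + S_{\mathrm{sys}}
```

In this notation, reporting only the measurement error corresponds to keeping the middle term of the last expression.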
As of yet we have not included error estimates for the systematic errors. The CrIS smoothing error is computed, but since in these FTIR comparison results we apply the FTIR observational operator (which accounts for the smoothing error), the smoothing error contribution is not included in the CrIS errors reported in the comparisons. Thus, only the measurement errors are reported for observations used here; these errors can thus be considered the lower limit of the total estimated CrIS retrieval error. Figure 1 shows an example of CrIS NH 3 observations surrounding one of the ground-based FTIR instruments. This is a composite map of all days in Bremen with observations in 2015. This figure shows the widespread elevated amounts of NH 3 across north-western Germany as observed by CrIS.
Since the goal of this analysis is to evaluate the CrIS retrievals that provide information beyond the a priori, we only performed comparisons when the CrIS spectrum contains an NH 3 signal. We also focused our efforts on FTIR stations that have FTIR observations with total columns larger than 5 × 10 15 molecules cm −2 (∼ 1-2 ppb surface VMR, volume mixing ratio). This restriction does mean that a number of sites of the FTIR-NH 3 data set will not be used. For comparability of this study to the results of the IASI-LUT evaluation in an earlier study by Dammers et al. (2016a) we include a short paragraph on the performance of the IASI-LUT and the more recent IASI-NN product when applying similar constraints.

FTIR-NH 3 retrieval
The FTIR-NH 3 product used in this study is similar to the set described in Dammers et al. (2016a) and is based on the retrieval methodology described by Dammers et al. (2015). The retrieval methodology uses two spectral microwindows with a spectral width that depends on the NH 3 background concentration determined for the observation stations and location (wider window for stations with background concentrations less than one ppb). An optimal estimation approach (Rodgers, 2000) is used, implemented in the SFIT4 algorithm (Pougatchev et al., 1995; Hase et al., 2004, 2006). There are a number of species that can interfere to some extent in both windows, with the major species being H 2 O, CO 2 and O 3 and the minor species N 2 O, HNO 3 , CFC-12, and SF 6 . The HITRAN 2012 database (Rothman et al., 2013) is used for the spectral lines. A further set of spectroscopic line parameter adjustments is added for CO 2 , taken from the ATMOS database (Brown et al., 1996), as well as a set of pseudo-lines for the broad absorptions by the CFC-12 and SF 6 molecules (created by NASA-JPL, G.C. Toon, http://mark4sun.jpl.nasa.gov/pseudo.html). The NH 3 a priori profiles are based on balloon measurements (Toon et al., 1999) and refitted to match the local surface concentrations (depending on the station either measured or estimated by model results). For the interfering species a priori profiles we use the Whole Atmosphere Community Climate Model (WACCM, Chang et al., 2008, v3548). The estimated errors in the FTIR-NH 3 retrievals are of the order of ∼ 30 % (Dammers et al., 2015) with the uncertainties in the NH 3 line spectroscopy being the most important contributor. Based on the data requirements in Sect. 2.1, a set of seven stations is used (Table 1). For all sites except Wollongong in Australia we use the basic narrow spectral windows. For Wollongong the wide spectral windows are used. For a more detailed description of each of the stations see the publications listed in Table 1 or Dammers et al. (2016a).

IASI-NH 3
The CrIS retrieval will also be compared with corresponding IASI/FTIR retrievals using results from a previous study by Dammers et al. (2016a). Both the IASI-LUT (Van Damme et al., 2014a) and the IASI-NN (Whitburn et al., 2016) retrievals from observations by the IASI instrument aboard MetOp-A will be used. A short description of both IASI retrievals is provided here; for a more in-depth description see the respective publications by Van Damme et al. (2014a) and Whitburn et al. (2016). The IASI instrument on board the MetOp-A platform is in a sun-synchronous orbit and has a daytime overpass at around 09:30 LST (local solar time) and a night-time overpass at around 21:30 LST. The instrument has a circular footprint of about 12 km diameter for nadir-viewing angles, with off-nadir observations along a swath of 2100 km. Both IASI retrievals are based on the calculation of a dimensionless spectral index called the hyperspectral range index (HRI; Van Damme et al., 2014a). The HRI is representative of the amount of NH 3 in the measured column. The IASI-LUT retrieval makes a direct conversion of the HRI to total column density with the use of a look-up table (LUT). The LUT is created using a large number of simulations for a wide range of atmospheric conditions which link the thermal contrast (TC, the difference between the air temperature at 1.5 km altitude and the temperature of the Earth's surface) and the HRI to a NH 3 total column density. The retrieval includes a retrieval error based on the uncertainties in the initial HRI and TC parameters. The more recent IASI-NN retrieval (Whitburn et al., 2016) follows similar steps but makes use of a neural network. The neural network combines the complete temperature, humidity and pressure profiles for a better representation of the state of the atmosphere. At the same time the retrieval error estimate is improved by including error terms for the uncertainty in the profile shape, and the full temperature and water vapour profiles.
The IASI-NN version uses the fixed profiles that were described by Van Damme et al. (2014a) but allows for the use of third party profiles to improve the representation of the NH 3 atmospheric profile. The IASI-LUT and IASI-NN retrievals have both been previously compared with FTIR observations (Dammers et al., 2016a, b). They compared reasonably well with correlations around r = 0.8 for a set of FTIR stations, with an underestimation of around 30 % that depends slightly on the magnitude of total column amounts, with the IASI-NN performing slightly better.

Data criteria and quality
NH 3 concentrations show large variations both in space and time as a result of the large heterogeneity in emission strengths due to spatially variable sources and drivers such as meteorology and land use. This high variability poses challenges in matching ground-based point observations made by FTIR instruments with CrIS downward-looking satellite measurements, which have a 14 km nadir footprint. For the pairing of the measurement data we apply data selection criteria similar to those described in Dammers et al. (2016a) and summarized in Table 2. To minimize the impact of the heterogeneity of the sources, we choose a maximum of 50 km between the centre points of the CrIS observations and the FTIR site location. To diminish the effect of temporal differences between the FTIR and CrIS observations, a maximum time difference of 90 min is used. Topographical effects are reduced by choosing a maximum altitude difference of 300 m at any point between the FTIR site location and the centre point of the satellite pixel location. The altitude differences are calculated using the Space Shuttle Radar Topography Mission Global product at 3 arc-second resolution (SRTMGL3, Farr et al., 2007). To ensure the data quality of the CrIS-NH 3 retrieval for version 1.0, a small number of outliers with a maximum retrieved concentration above 200 ppb (at any point in the profile) were removed from the comparison data set. While a surface NH 3 value of 200 ppb (and above) is potentially possible (i.e. downwind of forest fires), it is highly unlikely to occur over the entire footprint of the satellite instrument. Moreover, after inspecting these data points, they seem to be affected by numerical issues in the fitting procedure (possibly due to interfering species). As we are interested in validating the CrIS observational information (not just a priori information), we only select comparisons that contain some information from the satellite (degrees of freedom for signal, DOFS, ≥ 0.1).
Note that on average the observations have a DOFS between 0.9 and 1.1. The DOFS ≥ 0.1 filter only removes some of the outliers at the lower end. No explicit filter is applied to account for clouds; however, clouds will implicitly be accounted for by quality control as CrIS will not measure a NH 3 signal (e.g. DOFS < 0.1) below optically thick clouds (e.g. cloud optical depth > ∼ 1). In addition, the CrIS observations are matched with FTIR observations taken only during clear-sky conditions, which mostly eliminates influence from cloud cover. Finally, the high signal-to-noise ratio (SNR) of the CrIS instrument allows it to retrieve NH 3 from a thermal contrast approaching 0 K during daytime observations (Clarisse et al., 2010). Given this, we decided not to apply a thermal contrast filter to the CrIS data. No additional filters are applied to the FTIR observations beyond the clear-sky requirement. For both IASI retrievals, we use the same observation selection criteria as described in Dammers et al. (2016a). The set of criteria is similar to those used here for the CrIS observations. Observations from both IASI retrievals are matched using the overpass time, and longitudinal and latitudinal positions. For comparability with CrIS a spatial difference limit of 50 km was used, instead of the 25 km spatial limit used in the previous study. Furthermore we apply the thermal contrast (> 12 K, difference between the temperatures at 1.5 km and the surface) and Earth's skin temperature criteria to the IASI observations to match the previous study.
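The CrIS/FTIR pairing criteria described above can be sketched as a simple filter. This is a minimal illustration, not the processing code used in the study: the `CrisObs` and `FtirObs` containers, their field names and the `is_match` function are hypothetical, the distance is computed with a great-circle (haversine) formula on a spherical Earth, and the 300 m altitude criterion is simplified to the difference between the site and the pixel centre rather than being checked along the path between them.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds taken from the selection criteria in the text.
MAX_DISTANCE_KM = 50.0
MAX_TIME_DIFF = timedelta(minutes=90)
MAX_ALT_DIFF_M = 300.0
MAX_VMR_PPB = 200.0
MIN_DOFS = 0.1


@dataclass
class CrisObs:  # hypothetical container for one CrIS retrieval
    lat: float
    lon: float
    time: datetime
    altitude_m: float    # terrain height at the pixel centre
    dofs: float          # degrees of freedom for signal
    max_vmr_ppb: float   # largest retrieved value in the profile


@dataclass
class FtirObs:  # hypothetical container for one FTIR retrieval
    lat: float
    lon: float
    time: datetime
    altitude_m: float


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_match(cris: CrisObs, ftir: FtirObs) -> bool:
    """True if the pair passes all distance, time, altitude and quality criteria."""
    close = haversine_km(cris.lat, cris.lon, ftir.lat, ftir.lon) <= MAX_DISTANCE_KM
    simultaneous = abs(cris.time - ftir.time) <= MAX_TIME_DIFF
    # Simplification: only the end points are compared, not the terrain in between.
    flat = abs(cris.altitude_m - ftir.altitude_m) <= MAX_ALT_DIFF_M
    informative = cris.dofs >= MIN_DOFS
    plausible = cris.max_vmr_ppb <= MAX_VMR_PPB
    return close and simultaneous and flat and informative and plausible
```

A list of candidate pairs can then be reduced with `[(c, f) for c in cris_list for f in ftir_list if is_match(c, f)]`, where `cris_list` and `ftir_list` are again placeholder names.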

Observational operator application
To account for the vertical sensitivity and the influence of the a priori profiles of both retrievals, we apply the observational operator (averaging kernel and a priori of the retrieval) of the FTIR retrieval to the CrIS-retrieved profiles. The CrIS observations are matched to each individual FTIR observation in time and space following the matching criteria. The FTIR averaging kernels, a priori profiles and retrieved profiles are first mapped to the CrIS pressure levels (a fixed pressure grid; for elevated sites the lowest layers are made smaller or cut off to fit the grid). Following Rodgers and Connor (2003) and Calisesi et al. (2005) this results in the mapped FTIR averaging kernel, $A_{\mathrm{ftir}}^{\mathrm{mapped}}$, the mapped FTIR a priori, $x_{\mathrm{ftir}}^{\mathrm{mapped,apriori}}$, and the mapped FTIR-retrieved profile, $x_{\mathrm{ftir}}^{\mathrm{mapped}}$. Then we apply the FTIR observational operator to the CrIS observations using Eq. (1):

$$x_{\mathrm{CrIS}} = x_{\mathrm{ftir}}^{\mathrm{mapped,apriori}} + A_{\mathrm{ftir}}^{\mathrm{mapped}} \left( x_{\mathrm{CrIS}}^{\mathrm{retrieved}} - x_{\mathrm{ftir}}^{\mathrm{mapped,apriori}} \right) \qquad (1)$$

Here $x_{\mathrm{CrIS}}^{\mathrm{retrieved}}$ denotes the CrIS-retrieved profile on the CrIS pressure grid. The smoothed CrIS profile $x_{\mathrm{CrIS}}$ calculated from Eq. (1) provides an estimate of the FTIR retrieval applied to the CrIS satellite profile. Next we evaluate both total column and profile measurements.
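The smoothing step of Eq. (1) follows the standard form of Rodgers and Connor (2003), in which the a priori is added to the averaging kernel applied to the departure from the a priori. The sketch below illustrates it with NumPy. The function and argument names are hypothetical, the inputs are assumed to be already mapped to the common CrIS pressure grid, and `total_column` assumes the profile is expressed as per-layer partial columns so that a plain sum is meaningful.

```python
import numpy as np


def apply_ftir_operator(x_cris, x_apriori_ftir, avk_ftir):
    """Smooth a CrIS profile with the (mapped) FTIR averaging kernel and a priori.

    x_cris          : CrIS-retrieved profile, shape (n,)
    x_apriori_ftir  : FTIR a priori mapped to the CrIS grid, shape (n,)
    avk_ftir        : FTIR averaging kernel mapped to the CrIS grid, shape (n, n)
    """
    x_cris = np.asarray(x_cris, dtype=float)
    x_a = np.asarray(x_apriori_ftir, dtype=float)
    avk = np.asarray(avk_ftir, dtype=float)
    # x_a + A (x - x_a): the part of the CrIS profile the FTIR retrieval could see.
    return x_a + avk @ (x_cris - x_a)


def total_column(partial_columns):
    """Sum per-layer partial columns into a total column (assumed input units)."""
    return float(np.sum(partial_columns))
```

Two limiting cases make the behaviour easy to check: an identity averaging kernel returns the CrIS profile unchanged, and a zero averaging kernel returns the FTIR a priori.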
For the first validation step, following Dammers et al. (2016a), who evaluated the IASI-LUT (Van Damme et al., 2014a) product, we sum the individual profile (x CrIS ) to obtain a column total to compare to the FTIR total columns. This step makes it possible to evaluate the CrIS retrieval in a similar manner to the IASI-LUT retrieval. If multiple FTIR observations match a single CrIS overpass, we average them into a single value, and we likewise average the matching CrIS observations. It is therefore possible to have multiple FTIR observations, each with multiple CrIS observations, all averaged into a single matching representative observation. For the profile comparison this averaging is not performed, to keep as much detail as possible. An important point is that this approach assumes that the FTIR retrieval gives a better representation of the truth. While this may be true, the FTIR retrieval will not match the truth completely. For readability, the next sections describe the case in which we apply the FTIR observational operator to the CrIS values. Because the assumption that the FTIR represents the truth is not exactly right, the appendix includes a similar set of results that use the CrIS observational operator instead of the FTIR observational operator.

Total column comparison
The total columns are averaged as explained in Sect. 2.4 to show a direct comparison of FTIR measurements with CrIS observations in Fig. 2. A 3σ outlier filter was applied to calculate the regression statistics. The filtered outliers are displayed in grey, and may be caused by low information content (DOFS) and terrain characteristics. For the regression we used the reduced major axis regression (Bevington and Robinson, 1992), accounting for possible errors both in the x and y values. There is an overall agreement with a correlation of r = 0.77 (P < 0.01, N = 218) and a slope of 1.02 (±0.05). At the lower range of values the CrIS column totals are higher than the observed FTIR values. The CrIS retrieval possibly overestimates due to the low sensitivity to low concentrations. Without the sensitivity the retrieval will find a value closer to the a priori, which may be too high. When the comparisons are broken down by station (Fig. 3), the correlation varies from site to site, from a minimum of 0.28 in Mexico City (possibly due to retrieval errors associated with the highly irregular terrain) to a maximum of 0.84 in Bremen. Similarly to Mexico City, the comparison also shows an increase in scatter for Pasadena, where the FTIR site is also located on a hill. In Toronto and Bremen there is good agreement when NH 3 is elevated (> 20 × 10 15 molecules cm −2 ), and low bias in the CrIS total columns for intermediate values (between 10 and 20 × 10 15 molecules cm −2 ) except for the outlying observation in Bremen, which is marked as an outlier by our 3σ filter used for Fig. 2. In Wollongong, there is less agreement between the instruments. There are two comparisons with large CrIS to FTIR ratios while most of the other comparisons also show a bias for CrIS. For both cases the bias can be explained by the heterogeneity of the ammonia concentrations in the surrounding regions.
The two outlying observations were made during the end of November 2012, which coincides with wildfires in the surrounding region. Furthermore the Wollongong site is located on the coast, which will increase the occurrences in which one instrument observes clean air from the ocean while the other observes inland air masses.
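The regression statistics quoted for Fig. 2 combine an outlier filter with a reduced major axis (RMA) fit. The sketch below shows one way to compute both. The RMA slope is taken as the ratio of the standard deviations of y and x with the sign of the correlation coefficient, which is the usual definition. How the 3σ filter was defined in the study is not stated here, so filtering on the CrIS minus FTIR difference is an assumption, as are the function names.

```python
import numpy as np


def sigma_filter(x, y, n_sigma=3.0):
    """Boolean mask keeping pairs whose difference lies within n_sigma of the mean.

    Assumption: the filter acts on the difference y - x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = y - x
    return np.abs(diff - diff.mean()) <= n_sigma * diff.std()


def rma_regression(x, y):
    """Reduced major axis regression, treating x and y symmetrically.

    Returns (slope, intercept, r).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept, r
```

With `ftir` and `cris` as arrays of matched total columns (placeholder names), the two steps would be chained as `keep = sigma_filter(ftir, cris)` followed by `rma_regression(ftir[keep], cris[keep])`. Unlike ordinary least squares, the RMA slope does not shrink towards zero when the x values carry noise, which is why it suits a comparison where both instruments have errors.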
The mean absolute (MD) and relative difference (MRD) are calculated following Eqs. (4) and (5), with N being the number of observations. We evaluate the data by subdividing the comparisons over a set of total column bins as a function of the FTIR total column value of each individual observation. The bins (ranging from 5 × 10 15 to 25 × 10 15 molecules cm −2 in steps of 5 × 10 15 molecules cm −2 ) give a better representation of the performance of the retrieval, as they show the differences as a function of the magnitude of the total column densities. The results of these total column comparisons are presented in Fig. 4. Table 3 summarizes the results for each of the FTIR to satellite column comparisons in two total column bins, which split the comparisons between columns smaller and larger than 10 × 10 15 molecules cm −2 . A few combinations of the IASI-NN and FTIR retrievals have a small denominator value that causes problems in the calculation of the MRD. A 3σ outlier filter based on the relative difference is applied to remove these outliers (< 10 × 10 15 molecules cm −2 , only the IASI-NN set). The statistical values are not given separately by site because of the low number of matching observations for a number of the sites.

Table 3. Results of the total column comparisons of the FTIR to CrIS, FTIR to IASI-LUT and FTIR to IASI-NN. N is the number of averaged total columns, MD is the mean difference [10 15 molecules cm −2 ], MRD is the mean relative difference [frac, in %]. Note that the combined value N does not add up with all the separate sites as observations have been included for FTIR total columns > 5 × 10 15 molecules cm −2 . Column headers: Retrieval; Column total range in molecules cm −2 ; N; MD in 10 15 (1σ); MRD in % (1σ); FTIR mean in 10 15 (1σ).

The CrIS/FTIR comparison results show a large positive difference in both the absolute (MD) and relative (MRD) differences for the smallest bin (5.0-10.0 × 10 15 molecules cm −2 ). The rest of the CrIS/FTIR comparison bins, with NH 3 values > 10.0 × 10 15 molecules cm −2 , agree very well, with a nearly constant bias (MD) around zero, which slightly dips below zero in the middle bin, and a standard deviation of the order of 5.0 × 10 15 molecules cm −2 . The standard deviation over these bins is also more or less constant, and the weak dependence on the number of observations in each bin indicates that most of the effect is coming from the random error on the observations. The relative difference becomes systematically smaller with increasing column total amounts, and tends towards zero with a standard deviation of ∼ 25-50 %, which is of the order of the reported estimated errors of the FTIR retrieval (Dammers et al., 2015).
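The binned MD and MRD statistics can be sketched as follows. Eqs. (4) and (5) are not reproduced in this text, so the exact definitions are an assumption: MD is taken as the mean of the satellite minus FTIR column, and MRD as the mean of that difference divided by either the FTIR column (default) or the average of the two columns (`symmetric_denominator=True`). The remark above about small denominators for some IASI-NN pairs hints at the second form, but that is a guess. Function names are hypothetical and columns are expressed in units of 10^15 molecules cm^-2.

```python
import numpy as np

# Bin edges in units of 1e15 molecules cm-2: 5, 10, 15, 20, 25.
EDGES = np.arange(5.0, 30.0, 5.0)


def mean_differences(sat, ftir, symmetric_denominator=False):
    """Return (MD, MRD in %) for paired satellite and FTIR total columns."""
    sat = np.asarray(sat, dtype=float)
    ftir = np.asarray(ftir, dtype=float)
    diff = sat - ftir
    # Assumed denominators; the study's Eq. (5) may differ.
    denom = 0.5 * (sat + ftir) if symmetric_denominator else ftir
    md = diff.mean()
    mrd = 100.0 * np.mean(diff / denom)
    return md, mrd


def binned_differences(sat, ftir, edges=EDGES):
    """Group pairs by FTIR column and return (lo, hi, N, MD, MRD) per non-empty bin."""
    sat = np.asarray(sat, dtype=float)
    ftir = np.asarray(ftir, dtype=float)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (ftir >= lo) & (ftir < hi)
        if in_bin.any():
            md, mrd = mean_differences(sat[in_bin], ftir[in_bin])
            rows.append((lo, hi, int(in_bin.sum()), md, mrd))
    return rows
```

Binning on the FTIR value rather than on the satellite value keeps the reference measurement as the independent variable, which matches the description of the bins in the text.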
For a comparison with previously reported satellite results, we included both the IASI-LUT (Van Damme et al., 2014a) and the IASI-NN (Whitburn et al., 2016) comparisons with the FTIR observations. To put the results of this study into perspective, Fig. A1 in the Appendix shows the total column comparison for both IASI products. The IASI products show similar differences as a function of NH 3 column bins, which is somewhat different from the CrIS/FTIR comparison results. The absolute difference (MD) is mostly negative and smallest in magnitude for the smallest total column bin, with a difference around −2.5 × 10 15 (std = ±3.0 × 10 15 , N = 229) molecules cm −2 , which slowly increases as a function of the total column. However, the relative difference (MRD) is at its maximum for the smallest bin, with a difference of the order of −50 % (std = ∼ ±50 %, N = 229), which decreases to ∼ −10-25 % (std = ±25 %) with increasing bin value. For both the IASI-NN and IASI-LUT retrievals we find an underestimation of the total columns, which originates mostly from a large systematic error in combination with more randomly distributed error sources such as the instrument noise and interfering species. This is similar to results reported earlier for IASI-LUT (Dammers et al., 2016b).
A number of factors, besides the earlier reported FTIR uncertainties, can explain the differences between the FTIR and CrIS measurements. The small positive bias found for CrIS points to a small systematic error. The higher SNR, from both the low radiometric noise and high spectral resolution, enables it to resolve smaller gradients in the retrieved spectra, which can potentially provide greater vertical information and detect smaller column amounts (lower detection limit). This could explain the larger MRD and MD CrIS differences at the lower end of the total column range. However, a number of standalone tests with the FTIR retrieval showed only a minor increase in the total column following a decrease in spectral resolution, which indicates that the spectral resolution itself is not enough to explain the difference.

Profile comparison
The CrIS- and FTIR-retrieved profiles are matched using the criteria specified above in Table 2 and compared. A CrIS observation can be included multiple times in the comparison, as there can be more than one FTIR observation per day and/or multiple satellite overpasses that match a single FTIR observation.

A representative profile example
An example of the profile information contained in a representative CrIS and FTIR profile is shown in Fig. 5. Although the vertical sensitivity and distribution of NH 3 differs per station, this example is fairly representative. The FTIR usually has a somewhat larger DOFS, of the order of 1.0-2.0 depending mostly on the NH 3 concentration, compared to the CrIS total of ∼ 1 DOFS. Figure 5a shows an unsmoothed FTIR averaging kernel [vmr vmr −1 ] of a typical FTIR observation. The averaging kernel (AVK) peaks between the surface and ∼ 850 hPa, which is typical for most observations. In specific cases with plumes passing over the site, the averaging kernel peak is at a higher altitude, matching the location of the NH 3 plume. The CrIS averaging kernel (Fig. 5b) usually has a maximum somewhere in between 680 and 850 hPa depending on the local conditions. This particular observation has a maximum near the surface, an indication of a day with high thermal contrast. Both the FTIR and CrIS concentration profiles have a maximum at the surface with a continuous decrease whose shape mostly matches the a priori profile, as expected given the low DOFS. This is visible for layers at the lower pressures (higher altitudes) where the FTIR and CrIS a priori and retrieved volume mixing ratios become similar and near zero. The absolute difference between the FTIR and CrIS profiles can be calculated by applying the FTIR observational operator to the CrIS profile, as we described in Sect. 2.5. The largest absolute difference (Fig. 5d) is found at the surface, which is also generally where the largest absolute NH 3 values occur. The FTIR smoothed relative difference (red, striped line) peaks at the pressure where the sensitivity of the CrIS retrieval is highest (∼ 55 %), and goes down to ∼ 20-30 % for the higher altitude and surface pressure layers. Overall the retrievals agree, with most of the difference explained by the estimated errors of the individual retrievals.
For an illustration of the systematic and random errors on the FTIR and CrIS profiles shown in Fig. 5, see the figures in the Appendix. For the FTIR error profile see Fig. A2 (absolute error) and Fig. A3 (relative error) and for the CrIS measurement error profile see Fig. A4. Please note that we only show the diagonal error covariance values for each of the errors, which is common practice. The total column of our example profile is ∼ 20 × 10 15 molecules cm −2 , which is slightly larger than average. The total random error is < 10 % for each of the layers, mostly dominated by the measurement error, which is somewhat smaller than average (Dammers et al., 2015) following the larger NH 3 VMR. A similar value is found for the CrIS measurement error, with most layers showing an error < 10 %. The FTIR systematic error is ∼ 10 % near the surface and grows to 40 % for the layers between 900 and 750 hPa. The error is mostly due to the errors in the NH 3 spectroscopy (Dammers et al., 2015). The shape of the relative difference between the FTIR and CrIS closely follows the shape of the systematic error on the FTIR profile, pointing to that error as the main cause of the difference.

All paired data
Figure 5 (caption fragment). [...] (c) shows the retrieved profiles of both FTIR (blue) and CrIS (cyan) with the FTIR values mapped to the CrIS pressure layers. Also shown are the FTIR a priori (green), the CrIS a priori (purple), the CrIS-retrieved profile smoothed with the FTIR averaging kernel [CrIS (FTIR AVK)] (yellow) and the FTIR profile smoothed with the CrIS averaging kernel [FTIR (CrIS AVK)] (red). In (d), the blue line is the absolute difference between the FTIR profile (blue, c) and the CrIS profile smoothed with the FTIR averaging kernel (yellow, c) with the red line as the corresponding relative difference.
In Fig. 6 all the individual site comparisons are merged. The Mexico City site was left out of this figure because its large number of observations, in combination with a different pressure grid due to the high altitude of the city, would obscure the overall analysis and bias the results towards those of one station. Similar to the single profile example, the FTIR profile peaks near the surface for most observations, slowly going towards zero with decreasing pressure. When compared to the representative profile example a number of differences emerge. A number of FTIR observations peak further above the surface and are shown as outliers, which drag the mean further away from the median values. The combined CrIS profile in Fig. 6 shows a similar behaviour, although for the lowest pressure layer it has a lower median and mean compared to the layer above. The difference between Figs. 5 and 6e derives mostly from the number of observations used in the box plot, many with weak sensitivity at the surface. Similar to the single profile example in Fig. 5, the FTIR averaging kernels in Fig. 6c peak on average near or just above the surface (with the diagonal elements of the AVKs shown in the figure). The sensitivity varies a great deal between the observations as shown by the large spread of the individual layers. The CrIS averaging kernels (Fig. 6g) usually peak in the boundary layer around the 779 hPa layer with the two surrounding layers having somewhat similar values. The instrument is less sensitive to the surface layer, as is demonstrated by the large decrease in the AVK near the surface, but this varies depending on the local conditions. We find the largest absolute differences in the lower three layers, as was seen in the example in Fig. 5, although the differences decrease rather than increase. The relative difference shows a similar shape to Fig. 5. Overall both retrievals show agreement. The relative differences in the single-level retrieved profile values in Fig. 6h show an average difference in the range of ∼ 20 to 40 % with the 25th and 75th percentiles at around 60-80 %, which partially follows from our large range of concentrations. The absolute difference shows an average difference in the range of −0.66 to 0.87 ppb around the peak sensitivity levels of the CrIS observations (681 to 849 hPa). The lower number of surface observations follows from the fact that only the Bremen site is located at an altitude low enough for the CrIS retrieval to provide a result at this pressure level. Due to this difference in retrieval layering, the remaining 227 observations mostly follow from matching observations in Bremen, which is located in a region of significant NH 3 emissions. The switch between negative and positive values in the absolute difference (see Fig. 6d) occurs in the two lowest layers dominated by the Bremen observations and provides insight into the relation between absolute differences as a function of retrieved concentration. Figure 7 shows a summary of the differences as a function of the individual NH 3 VMR layer amounts. As seen before in the column comparison, e.g. Figs. 2 and 4, the CrIS retrieval gives larger total columns than the FTIR retrieval for the small values of VMR.
For increasing VMRs, this slowly tends to a negative absolute difference with a relative difference in the range of 20-30 %. However, note that the number of compared values in these high VMR bins is far lower than in the first three bins, so that these high VMR bins have a relatively smaller effect on the total column and merged VMR figures (Figs. 2 and 6). We now combine the results of Figs. 6 and 7 with Fig. 8 to create a set of subplots showing the difference between both retrieved profiles as a function of the maximum VMR of each retrieved FTIR profile. For the layers with pressure less than 681 hPa we generally find agreement, which is expected but not very meaningful, since there is not much NH 3 (and thus sensitivity) in these layers and any differences are smoothed out by the application of the observational operator. The relative differences for these layers all lie around ∼ 0-20 %. For the lowest two VMR bins we again find that CrIS gives larger results than the FTIR, around the CrIS sensitivity peak in the layer centred around 849 hPa, and to a lesser extent in the layer below. At these VMR levels (< 2 ppb) the NH 3 signal approaches the spectral noise of the CrIS measurement, making the retrievals more uncertain. The switch lies around 2-3 ppb, where the difference in the SNR between the instruments becomes less of an issue. Also easily observed is the relation between the concentration and the absolute and relative differences. This can be explained by the difference in sensitivity of the instruments, and the measurement noise of both instruments. For the largest VMR bin [> 4.0 ppb] we find that CrIS is biased for the four lowest layers. Differences are largest in the surface layer where only a few observations are available, almost all from the Bremen site. Most of these CrIS observations have a peak satellite sensitivity at a higher altitude than the FTIR.
Assuming that most of the NH 3 can be found directly near the surface, with the concentration dropping off with a sharp gradient as a function of altitude, it is likely that these concentrations are not directly observed by the satellite but are observed by the FTIR instruments. This difference in sensitivity should be at least partially removed by the application of the observational operator but not completely, due to the intrinsic differences between both retrievals. The CrIS retrieval uses one of three available a priori profiles, which is chosen following a selection based on the strength of NH 3 signature in the spectra. The three a priori profiles (unpolluted, moderately polluted and polluted) are different in both shape and concentration. Out of the entire set of 2047 combinations used in Fig. 8, only six are from the non-polluted a priori category. About one-third of the remaining observations use the polluted a priori, which has a sharper peak near the surface (see Fig. 5c) compared to the moderately polluted profile, which is used by two-thirds of the CrIS retrievals shown in this work. Based on the results as a function of retrieved VMR (as measured with the FTIR so not a perfect restriction), it is possible that the sharper peak at the surface as well as the low a priori concentrations are restricting the retrieval.
Figure 7. Summary of the absolute and relative actual error as a function of the VMR of NH 3 in the individual FTIR layers. The box edges are the 25th and 75th percentiles, the black line in the box is the median, the red square is the mean, the whiskers are the 10th and 90th percentiles, and the grey circles are the outlier values outside the whiskers. Only observations with a pressure greater than 650 hPa are used. The top panel shows the absolute difference for each VMR bin, the bottom panel shows the relative difference for each VMR bin.
The dependence of the differences on VMR can also possibly follow from uncertainties in the line spectroscopy. In the lower troposphere there is a large gradient in pressure and temperature and the impact of any uncertainty in the line spectroscopy is greatly enhanced. Even for a day with large thermal contrast and NH 3 concentrations (e.g. Fig. 5), the difference between both the CrIS and FTIR retrievals was dominated by the line spectroscopy. This effect is further enhanced by the higher spectral resolution and reduced instrument noise of the FTIR instrument, which potentially makes it more able to resolve the line shapes.
To summarize, the overall differences between both retrievals are quite small, except for the lowest layers in the NH 3 profile where CrIS has less sensitivity. The differences mostly follow the errors as estimated by the FTIR retrieval and further effort should focus on the estimated errors and uncertainties. A way to improve the validation would be to add a third set of measurements with a better capability to vertically resolve NH 3 concentrations from the surface up to ∼ 750 hPa (i.e. the first 2500 m). One way to obtain these would be aircraft observations flown in a spiral around the FTIR path, coinciding with a CrIS overpass. The addition of this third set of observations would improve our capabilities to validate the satellite and FTIR retrievals and point out which retrieval specifically is causing the absolute and relative differences at each of the altitudes.
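The box-plot summaries used in this section (differences grouped by VMR, with 25th and 75th percentile box edges, median, mean, and 10th and 90th percentile whiskers) can be sketched as follows. This is an illustrative reconstruction, not the code used for Figs. 7 and 8: the function name, the bin edges and the toy data are hypothetical, and only the percentile conventions are taken from the figure captions.

```python
import numpy as np

def binned_box_stats(vmr, diff, bin_edges):
    """Summarize differences per VMR bin with the box-plot statistics
    described in the figure captions (10/25/50/75/90th percentiles, mean).

    vmr       : FTIR VMR used for the classification (ppb)
    diff      : absolute or relative difference for each value of vmr
    bin_edges : increasing bin edges in ppb; bins are [lo, hi)
    """
    vmr = np.asarray(vmr, dtype=float)
    diff = np.asarray(diff, dtype=float)
    stats = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        values = diff[(vmr >= lo) & (vmr < hi)]
        if values.size == 0:
            stats.append({"bin": (lo, hi), "n": 0})
            continue
        p10, p25, p50, p75, p90 = np.percentile(values, [10, 25, 50, 75, 90])
        stats.append({
            "bin": (lo, hi),
            "n": int(values.size),
            "whisker_low": p10,
            "box_low": p25,
            "median": p50,
            "box_high": p75,
            "whisker_high": p90,
            "mean": values.mean(),
            # values outside the whiskers are drawn as outliers
            "n_outliers": int(((values < p10) | (values > p90)).sum()),
        })
    return stats

# Hypothetical data: FTIR VMR (ppb) and CrIS minus FTIR differences (ppb).
vmr = np.array([0.5, 0.6, 0.7, 5.0])
diff = np.array([1.0, 2.0, 3.0, -4.0])
summary = binned_box_stats(vmr, diff, [0.0, 2.0, np.inf])
```

Reporting the number of values per bin alongside the statistics makes it easier to judge how much weight the sparsely populated high VMR bins should be given.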

Conclusions
Figure 8. Summary of differences as a function of maximum volume mixing ratio (VMR). The maximum VMR of each FTIR profile is used for the classification. Absolute (a) and relative profile differences (b) following the FTIR and CrIS (FTIR AVK applied) profiles. Observations follow the pressure layers, i.e. the midpoints of the CrIS pressure grid. The box edges are the 25th and 75th percentiles, the black line in the box is the median, the red square is the mean, the whiskers are the 10th and 90th percentiles, and the grey circles are the outlier values outside the whiskers.
Here we presented the first validation of the CrIS-NH 3 product using ground-based FTIR-NH 3 observations. The total column comparison shows that both retrievals have a correlation of R = 0.77 (P < 0.01, N = 218) and almost no bias with an overall slope of 1.02 (std = ±0.05). For the individual stations we find varying levels of agreement, mostly limited by the small range of NH 3 total columns. For FTIR total columns > 10 × 10 15 molecules cm −2 the CrIS and FTIR observations are in agreement with only a small bias of 0.4 (std = ±5.3) × 10 15 molecules cm −2 , and a relative difference of 4.57 (std = ± 35.8) %. In the smaller total column range the CrIS retrieval shows a positive bias with larger relative differences of 49.0 (std = ± 62.6) % that mostly seem to follow from observations near the CrIS detection limit. The results of the comparison between the FTIR and the IASI-NN and IASI-LUT retrievals are comparable to those found in earlier studies. Both IASI products showed smaller total column values compared to the FTIR, with an MRD of ∼ −35 to −40 %. On average, the CrIS retrieval has one piece of information, while the FTIR retrieval shows slightly more vertical information with DOFS in the range of 1-2. The NH 3 profile comparison shows similar results, with a small mean negative difference between the CrIS and FTIR profiles for the surface layer and a positive difference for the layers above the surface layer. The relative and absolute differences in the retrieved profiles can be explained by the estimated errors of the individual retrievals. Two causes of uncertainty stand out, with the NH 3 line spectroscopy being the biggest factor, showing errors of up to 40 % in the profile example. The second factor is the signal-to-noise ratio of both instruments, which depends on the VMR: under large NH 3 concentrations, the FTIR uncertainty in the signal is in the range of 10 %; for measurements with small NH 3 concentrations this greatly increases. Future work should focus on improvements to the NH 3 line spectroscopy to reduce the uncertainty coming from this error source. Furthermore, an increased effort is needed to acquire coincident measurements with the FTIR instruments during satellite overpasses, as a dedicated validation effort will greatly enhance the number of available observations.
Furthermore, a third type of observation measuring the vertical distribution of NH 3 could be used for comparisons with both the FTIR and CrIS retrievals to further constrain the differences. These observations could be provided by an airborne instrument flying in spirals around an FTIR site during a satellite overpass.
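The total column statistics quoted above (correlation, regression slope, mean difference and mean relative difference) can be computed along the following lines. This is a sketch under stated assumptions: the regression method is not given in this section, so an ordinary least-squares fit is used purely for illustration, differences are taken as CrIS minus FTIR and normalized by the FTIR column, and the function name and toy columns are hypothetical.

```python
import numpy as np

def column_comparison_stats(ftir, cris):
    """Summary statistics for paired total columns (same units for both).

    Assumptions: ordinary least-squares fit of CrIS against FTIR,
    differences defined as CrIS minus FTIR, relative differences
    normalized by the FTIR column.
    """
    ftir = np.asarray(ftir, dtype=float)
    cris = np.asarray(cris, dtype=float)
    r = np.corrcoef(ftir, cris)[0, 1]
    slope, intercept = np.polyfit(ftir, cris, 1)
    diff = cris - ftir
    rel = 100.0 * diff / ftir
    return {
        "n": int(ftir.size),
        "r": r,
        "slope": slope,
        "intercept": intercept,
        "md": diff.mean(),
        "md_std": diff.std(ddof=1),
        "mrd": rel.mean(),
        "mrd_std": rel.std(ddof=1),
    }

# Hypothetical paired columns in units of 1e15 molecules cm-2.
ftir_cols = np.array([10.0, 20.0, 30.0, 40.0])
cris_cols = np.array([12.0, 22.0, 32.0, 42.0])
result = column_comparison_stats(ftir_cols, cris_cols)
```

A constant offset, as in this toy example, leaves the slope at one while the relative difference grows towards small columns, which is the same pattern that makes the relative statistics sensitive to observations near the detection limit.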
Data availability. FTIR-NH 3 data (Dammers et al., 2015) can be made available on request (M. Palm, Institut für Umweltphysik, University of Bremen, Bremen, Germany). The CrIS-FRP-NH 3 science grade (non-operational) data products used in this study can be made available on request (M. W. Shephard, Environment and Climate Change Canada, Toronto, Ontario, Canada). The IASI-NH 3 product is freely available at http://www.pole-ether.fr/etherTypo/index.php?id=1700&L=1 (Van Damme et al., 2015a).
Figure A1. Correlation between the FTIR and the IASI-LUT (a, blue) and IASI-NN (b, red) total columns using the coincident data from all measurement sites. The horizontal and vertical bars show the total estimated error on each FTIR and CrIS observation. A 3σ outlier filter was applied to the IASI-LUT data set and the same observations were removed from the IASI-NN set. In contrast to the earlier study by Dammers et al. (2016a) no thermal contrast filter was applied to the data set.
[...] Fig. 6e and b. The box edges are the 25th and 75th percentiles, the black line in each box is the median, the red square is the mean, the whiskers are the 10th and 90th percentiles, and the grey circles are the outlier values outside the whiskers.
Figure A6. Summary of the absolute and relative actual error as a function of the VMR of NH 3 in the individual FTIR layers. The box edges are the 25th and 75th percentiles, the black line in the box is the median, the red square is the mean, the whiskers are the 10th and 90th percentiles, and the grey circles are the outlier values outside the whiskers. Only observations with a pressure greater than 650 hPa are used. Panel (a) shows the absolute difference for each VMR bin, (b) shows the relative difference for each VMR bin.
Figure A7. Summary of actual errors as a function of VMR. The maximum VMR of each FTIR profile is used for the classification. Absolute (a) and relative profile differences (b) following the FTIR (CrIS AVK applied) and CrIS profiles. Observations follow the pressure layers, i.e. the midpoints of the CrIS pressure grid. The box edges are the 25th and 75th percentiles, the black line in the box is the median, the red square is the mean, the whiskers are the 10th and 90th percentiles, and the grey circles are the outlier values outside the whiskers.