First intercalibration of column-averaged methane from the Total Carbon Column Observing Network and the Network for the Detection of Atmospheric Composition Change

We present the first intercalibration of dry-air column-averaged mole fractions of methane (XCH4) retrieved from solar Fourier transform infrared (FTIR) measurements of the Network for the Detection of Atmospheric Composition Change (NDACC) in the mid-infrared (MIR) versus near-infrared (NIR) soundings from the Total Carbon Column Observing Network (TCCON). The study uses multi-annual quasi-coincident MIR and NIR measurements from the stations Garmisch, Germany (47.48° N, 11.06° E, 743 m a.s.l.), and Wollongong, Australia (34.41° S, 150.88° E, 30 m a.s.l.). Direct comparison of the retrieved MIR and NIR XCH4 time series for Garmisch shows a quasi-periodic seasonal bias leading to a standard deviation (stdv) of the difference time series (NIR–MIR) of 7.2 ppb. After reducing time-dependent a priori impact by using realistic site- and time-dependent ACTM-simulated profiles as a common prior, the seasonal bias is reduced (stdv = 5.2 ppb). A linear fit to the MIR/NIR scatter plot of monthly means based on same-day coincidences does not show a y-intercept that is statistically different from zero, and the MIR/NIR intercalibration factor is found to be close to ideal within 2-σ uncertainty, i.e. 0.9996(8). The difference time series (NIR–MIR) do not show a significant trend. The same basic findings hold for Wollongong. In particular, an overall MIR/NIR intercalibration factor close to the ideal 1 is found within 2-σ uncertainty. At Wollongong the seasonal cycle of methane is less pronounced and corresponding smoothing errors are not as significant, enabling standard MIR and NIR retrievals to be used directly, without correction to a common a priori. Our results suggest that it is possible to set up a harmonized NDACC and TCCON XCH4 data set which can be exploited for joint trend studies, satellite validation, or the inverse modeling of sources and sinks.

sinks are the uptake of methane by soils or the reaction with chlorine radicals (Denman et al., 2007).
Since the beginning of industrialization, methane concentrations in the atmosphere have more than doubled (e.g., Etheridge et al., 1998). However, there was a period of near-zero growth at the beginning of this century (Dlugokencky et al., 2003; Bousquet et al., 2006), and after 2006 the atmospheric methane concentration started to increase again (Rigby et al., 2008; Dlugokencky et al., 2009). The increase for the years 2007-2008 has been quantified, and possible causes discussed (e.g. Bousquet et al., 2011). More recently, it has been shown via ground-based FTIR (Fourier transform infrared) methane column measurements that the renewed increase after 2006 has been ongoing for about 5 yr until the present (end of 2011) with a rate of ≈ 5 ppb yr⁻¹ above northern mid-latitudes.
Ground-based column measurements of methane are complementary to in situ measurements in many respects; e.g. column measurements are representative of a larger geographical region, while in situ measurements can represent a specific location or biome. Measured methane columns are impacted by the varying stratospheric contribution, while the interpretation of surface measurements to infer sources and sinks can be impacted by so-called rectifier effects resulting from errors in the transport modeling. Rectifier effects can be avoided if column measurements are used, because these are insensitive to vertical mixing (Gloor et al., 2000). In situ measurements are directly traceable to calibration standards, while ground-based column measurements can be traced back to such standards via aircraft calibration campaigns. Column measurements are preferred for satellite validation since they provide the same quantity that satellites measure.
There are two established global networks performing ground-based remote sensing measurements of column-integrated methane. Within the Network for the Detection of Atmospheric Composition Change (NDACC, http://www.ndacc.org) solar FTIR measurements in the mid-infrared (MIR) have been performed for about two decades (currently 22 stations). Retrievals of methane from NDACC-MIR spectra have been used for trend studies (Angelbratt et al., 2011; Sussmann et al., 2012) and satellite validation (e.g. Sussmann et al., 2005). Since 2004 the NDACC has been complemented by the Total Carbon Column Observing Network (TCCON, http://www.tccon.caltech.edu/), which is dedicated to high-precision retrievals of climate gases (e.g. CO2, CH4, N2O) from solar absorption spectra in the near-infrared (NIR) spectral region (Wunch et al., 2011a). TCCON has been used for the validation of models (Houweling et al., 2010) and satellite measurements of methane (e.g. Morino et al., 2011; Schneising et al., 2012), but also for deriving information on sources and sinks of greenhouse gases (e.g. Wunch et al., 2009; Chevallier et al., 2011; Keppel-Aleks et al., 2012). The TCCON measurements are calibrated against the World Meteorological Organization (WMO) in situ trace gas measurement scales, using profiles obtained by aircraft in situ measurements flown over TCCON sites (Washenfelder et al., 2006; Deutscher et al., 2010; Wunch et al., 2010; Messerschmidt et al., 2011; Geibel et al., 2012). Currently, there are 18 operational TCCON stations, most of which have been established during the last couple of years.
If a sufficiently precise and accurate relationship can be established between the NDACC and TCCON column-averaged dry-air mole fractions of methane, then data from the two networks could be combined to provide wider spatial and temporal coverage than either network individually. This is not only an advantage for satellite validation but also provides the opportunity for trend analysis dating back 15 yr before TCCON operations began. It is, therefore, the goal of this study to establish the NDACC-TCCON intercalibration for XCH4. An important question in this context is whether or not one overall intercalibration factor for all stations can be found and quantified, or whether a site- and time-dependent intercalibration parameterization, with a significant linear and/or seasonal component, is necessary.
Our paper is structured as follows: After introducing the participating FTIR sites and their measurement settings in Sect. 2 along with the MIR and NIR retrieval strategies, we describe our intercomparison method (Sect. 3). The results are shown in Sect. 4. Section 5 gives a summary and Sect. 6 the conclusions with recommendations on the joint use of the MIR and NIR data along with an outlook.
2 Ground-based sounding of columnar methane in the MIR and NIR

Garmisch FTIR soundings
The Garmisch solar FTIR system (47.48° N, 11.06° E, 743 m a.s.l.) is operated by the group "Variability and Trends" at the Institute for Meteorology and Climate Research, Karlsruhe Institute of Technology, Germany. Operation of a Bruker IFS125HR interferometer was initiated in 2004 as part of TCCON, and the system took part in the aircraft calibration campaign of the EU project IMECC (Infrastructure for Measurement of the European Carbon Cycle; Geibel et al., 2012). Column-averaged methane is retrieved from single-scan measurements in the NIR (see Table 1 for the spectral micro windows) recorded with an InGaAs diode using a maximum optical path difference of 45 cm. The FTIR system also performs NDACC-type measurements in the MIR (Table 1). Measurements from the Garmisch FTIR have been used for satellite validation (de Laat et al., 2010; Morino et al., 2011; Wunch et al., 2011b), carbon cycle research, and studies of atmospheric variability and trends (e.g., Borsdorff and Sussmann, 2009; Sussmann et al., 2011). The intercalibration uses the Garmisch time series of July 2007-December 2011, which comprises 3403 MIR spectra and 35 171 NIR spectra.

Wollongong FTIR soundings
The Wollongong  (Griffith et al., 1998). It was replaced in 2007 with a Bruker IFS 125HR instrument set up for measurements in both the MIR and the NIR spectral ranges (Jones et al., 2013;Wunch et al., 2011a). For this study only the Bruker data were used. Spectra in the MIR range are recorded with an InSb detector, using an optical path difference of 257 cm and averaging two successive scans with an integration time of approximately four minutes. The settings for the NIR measurements are identical to those at Garmisch. The intercalibration uses the Wollongong time series of June 2008-December 2011 which comprises 1405 MIR spectra and 15 787 NIR spectra.

MIR and NIR retrieval strategies
The codes SFIT (MIR) and GFIT (NIR) have common roots as to the ray tracing and forward model; however, the inverse models are different. For the retrieval of XCH4 from NDACC-type MIR measurements the retrieval strategy MIR-GBM v1.1 is used in this study along with the spectral fitting software SFIT2 ver. 3.94 (Pougatchev et al., 1995). The basic features of MIR-GBM v1.1 are given in Table 1. SFIT is set up for a full profile retrieval via the use of a climatological covariance ("optimal estimation") or an inverse covariance, i.e. an ad hoc regularization matrix. The a priori volume mixing ratio (vmr) profiles used for SFIT, i.e. one fixed profile per site, have been derived from the Whole Atmosphere Chemistry Climate Model (WACCM); see Fig. 1 and Appendix B for details. For SFIT methane retrievals we found a Tikhonov-L1 regularization scheme to be favorable, with the regularization applied to an a priori profile given in relative units (per cent scale) and with an altitude-constant regularization strength. This is what we call the MIR-GBM v1.1 retrieval strategy, and it includes the use of 4-times daily NCEP pressure/temperature/humidity profiles to calculate the dry-air column, and 3 MIR spectral micro windows along with HITRAN 2000. The MIR retrievals are used as retrieved, i.e. they are not calibrated, e.g. to WMO/GAW trace gas measurement scales.
TCCON-type NIR measurements are analyzed with the spectral fitting software GFIT ver. 4.4.10 (release ggg 20091107), referred to as "GFIT" hereafter. The basic features of GFIT are given in Table 1, while more details can be found in Wunch et al. (2011a). GFIT uses an a priori profile derived from mid-latitude FTIR balloon measurements (Fig. 1a). Note there has been a recent GFIT update, i.e. ver. 4.8.6 (release ggg 2012 July Update) using site- and time-dependent a priori profiles (see Fig. 1b and "Note: impact of GFIT 2012 update"). Column-averaged dry-air mole fractions are retrieved by scaling an a priori profile to provide the best fit to the measured spectra and, finally, by dividing these columns by the dry-air column. The dry-air column is directly derived from the simultaneously retrieved O2 column. GFIT uses a broad spectral window including full bands in the NIR. The GFIT XCH4 results are scaled by a calibration factor of 0.978 that has been obtained from coincident measurements with aircraft equipped with WMO-scale in situ instrumentation, and this bias is attributed to spectroscopy uncertainties. Note that a recent European aircraft campaign provided another calibration factor for XCH4; see Geibel et al. (2012) for details. We use the Wunch et al. (2010) factor for this paper because it is the official factor used within TCCON for the time being. The averaging kernels for the NIR and MIR retrievals are given in Fig. 2.

Intercomparison method
Any direct comparison of two different remote sounders is potentially complicated because in general they contain a differing a priori impact, i.e. effects from (i) differing a priori profiles and (ii) differing smoothing effects because of differing averaging kernels influencing the retrieved trace gas column amounts. Therefore, our intercomparison strategy comprises (i) an approach for eliminating the impact from differing a priori profiles (Sect. 3.1) and (ii) a strategy for optimum selection of a common a priori profile model in order to minimize smoothing errors (Sect. 3.2). Finally, we investigate the impact from applying the strategies (i) and (ii) upon the time series (Sect. 3.3).

Eliminating the impact from differing a priori profiles
According to Rodgers (2000) the impact from differing a priori profiles can be taken into account by an a posteriori adjustment of the soundings for a common a priori profile $x_{\mathrm{common}}$. This approach has been applied recently for the comparison of carbon dioxide and methane columns measured by SCIAMACHY to ground-based FTIR measurements and to model results (Reuter et al., 2011; Schneising et al., 2012). In our case we obtain corrected column-averaged mole fractions $c_{\mathrm{cor}}$ for the MIR or NIR soundings which can be directly compared:

$$ c_{\mathrm{cor}} = c + \frac{1}{p_0} \sum_l \left(1 - a^l\right) \left(x^l_{\mathrm{common}} - x^l_a\right) \Delta p^l \qquad (1) $$

Here $c$ represents the column-averaged mole fraction of methane retrieved from MIR or NIR spectra. For every model layer $l$ the difference between 1 (i.e. the ideal averaging kernel) and the vector component $a^l$ of the total column averaging kernel in this layer is multiplied with the difference between the common a priori mole fraction $x^l_{\mathrm{common}}$ and the FTIR (MIR or NIR) a priori mole fraction $x^l_a$ as well as with the pressure difference $\Delta p^l$ between the lower and upper boundaries of layer $l$; $p_0$ denotes the surface pressure.
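The a posteriori adjustment described above can be illustrated with a minimal Python sketch. It is not the code used in the study; the function name, the argument layout (layers ordered from the surface upward, pressures given at the layer boundaries) and the units are assumptions made for this illustration.

```python
from typing import Sequence


def correct_to_common_prior(
    c: float,
    kernel: Sequence[float],
    x_prior: Sequence[float],
    x_common: Sequence[float],
    p_edges: Sequence[float],
) -> float:
    """Adjust a retrieved column-averaged mole fraction to a common a priori.

    c        : retrieved column-averaged mole fraction (e.g. ppb)
    kernel   : total column averaging kernel, one value per layer
    x_prior  : a priori mole fraction used in the retrieval, per layer
    x_common : common a priori mole fraction, per layer
    p_edges  : pressures at the layer boundaries, surface first
               (assumed layout: len(p_edges) == number of layers + 1)
    """
    n = len(kernel)
    if not (len(x_prior) == n and len(x_common) == n and len(p_edges) == n + 1):
        raise ValueError("inconsistent profile lengths")
    p0 = p_edges[0]  # surface pressure
    correction = 0.0
    for l in range(n):
        dp = p_edges[l] - p_edges[l + 1]  # lower minus upper boundary
        correction += (1.0 - kernel[l]) * (x_common[l] - x_prior[l]) * dp
    return c + correction / p0
```

With an ideal kernel (all values equal to 1) or with identical a priori profiles the function returns the retrieved value unchanged, which mirrors the limiting cases discussed in the text.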
Obviously, this correction can be neglected in cases of the averaging kernel being close to ideal or the a priori profile x a being close to x common . However, this is not the case in our application since the MIR and NIR a prioris and the MIR and NIR averaging kernels differ; see Figs. 1 and 2, respectively.
Equation (1) has been designed for post-retrieval exchange of an a priori profile. Therefore, in the ideal case, it should yield the same results as performing a retrieval after exchanging the a priori beforehand. However, Eq. (1) uses averaging kernels which are linear approximations of the retrieval, which is non-linear in x. We show in Appendix A that this non-linearity is small and negligible within the context of this paper. Therefore, we are able to use in this paper retrievals re-run after exchanging the a priori beforehand alongside retrievals corrected a posteriori via Eq. (1). The a posteriori correction is used where it saves computational effort.

Strategy for selecting a common a priori
After correction to a common a priori $x_{\mathrm{common}}$, there is still the smoothing term $(1 - a^l)(x^l_{\mathrm{common}} - x^l_{\mathrm{true}})$. This smoothing term varies seasonally because of the zenith angle dependency of the averaging kernels (Fig. 2). Also the magnitude of the smoothing term is different for MIR and NIR because of the differing averaging kernels. Our strategy to minimize this difference is to use time-dependent and site-dependent profiles $x_{\mathrm{common}}$(t, lat, lon) that are as close as possible to $x_{\mathrm{true}}$(t, lat, lon) at a site at the moment of observation.
Therefore, we favor the use of ACTM CH4 model profiles for each site as common a priori; see Fig. 1 and Appendix B for details. Briefly, ACTM-simulated vertical profiles of dry-air mole fractions on the native model vertical grid and nearest horizontal grid of the FTIR sites are sampled at 3-hourly intervals for use as a priori in this study. We interpolated the model profiles to each measurement time on the model pressure grid and applied this interpolated profile. Another favorable choice (especially for Wollongong) is the use of the MIR retrieval a priori, which is a time-constant but site-dependent prior $x_{\mathrm{common}}$(lat, lon) derived from the WACCM model. See also Appendix B for a description of how the WACCM-based prior has been set up.
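The temporal interpolation of the 3-hourly model profiles can be sketched as follows. The text does not state the interpolation scheme; linear interpolation in time, level by level on the fixed model pressure grid, is an assumption of this sketch, as are the function name and the data layout.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_profile(
    times: Sequence[float],
    profiles: Sequence[Sequence[float]],
    t: float,
) -> List[float]:
    """Linearly interpolate model profiles to the measurement time t.

    times    : sorted model sampling times (e.g. hours, 3-hourly steps)
    profiles : one profile per sampling time, all on the same pressure grid
    t        : measurement time, in the same units as `times`
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("measurement time outside the model time range")
    i = bisect_right(times, t)
    if i == len(times):  # t coincides with the last sampling time
        return list(profiles[-1])
    lo, hi = i - 1, i
    w = (t - times[lo]) / (times[hi] - times[lo])  # weight of the later profile
    return [(1.0 - w) * a + w * b for a, b in zip(profiles[lo], profiles[hi])]
```

The interpolated profile can then serve as the common prior for a measurement taken at time `t`.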
The benefit of using ACTM will be demonstrated later in quantitative terms; i.e. we will find a smaller seasonal bias between MIR and NIR retrievals using ACTM profiles as $x_{\mathrm{common}}$ compared to two other possible ad hoc choices for $x_{\mathrm{common}}$, namely using the time-constant (MIR or NIR) retrieval a prioris. To show this, the following 4 cases will be investigated in parallel: (i) using the original MIR and NIR a prioris, (ii) using time-dependent ACTM profiles as common prior $x_{\mathrm{common}}$, (iii) using the constant MIR retrieval a priori as $x_{\mathrm{common}}$, and (iv) using the constant NIR retrieval a priori as $x_{\mathrm{common}}$.

Impact of varied a priori profiles on the time series
For the intercomparisons we use monthly means calculated from individual MIR and NIR measurements recorded on the same days. Only months with > 5 measurements have been included.
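The construction of coincident monthly means can be sketched in Python as follows. The sketch assumes that the "> 5 measurements" criterion is applied to each sounder separately within a month; the text does not specify this, and the function name and input layout are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[date, float]  # (measurement day, XCH4 in ppb)


def coincident_monthly_means(
    mir: List[Sample], nir: List[Sample], min_count: int = 6
) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Return {(year, month): (MIR mean, NIR mean)} from same-day coincidences."""
    # Keep only days on which both sounders recorded at least one measurement.
    common_days = {d for d, _ in mir} & {d for d, _ in nir}
    by_month = defaultdict(lambda: ([], []))
    for d, value in mir:
        if d in common_days:
            by_month[(d.year, d.month)][0].append(value)
    for d, value in nir:
        if d in common_days:
            by_month[(d.year, d.month)][1].append(value)
    result = {}
    for key, (mir_vals, nir_vals) in sorted(by_month.items()):
        # Assumption: more than 5 measurements are required per sounder.
        if len(mir_vals) >= min_count and len(nir_vals) >= min_count:
            result[key] = (mean(mir_vals), mean(nir_vals))
    return result
```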
An example for the bias and the seasonal variation induced by changing an a priori profile is visualized in Fig. 3. It shows the impact on the Garmisch NIR time series from changing the standard GFIT a priori profile to ACTM profiles (Fig. 3a). An insignificant bias results (−0.27 ± 0.58 ppb) along with a significant change of the seasonal cycle (difference time series with stdv = 2.1 ppb). The analogous plot for Wollongong (Fig. 3b) shows a similar change in seasonality (stdv = 2.8 ppb) along with a larger, significant bias (−5.04 ± 1.07 ppb). The latter may be understood by the larger overall discrepancy between the GFIT a priori profile and the ACTM profiles at Wollongong compared to the Garmisch case; see Fig. 1. Figure C1 shows analogous plots for all the other cases with exchanged prior for Garmisch and Wollongong. Numbers are listed in Table 2. Each exchange causes a bias and a change in seasonality. The impact on seasonality tends to be larger for the cases where the original a priori profile is replaced by time-dependent ACTM profiles compared to the other cases. This is because in the cases where one of the two constant retrieval a priori profiles is used as common prior, the seasonal variation of the correction term is only driven by changes in the averaging kernels as a function of zenith angle. This can be seen from Table 2, e.g. stdv = 1.7 ppb for Garmisch MIR retrieved with GFIT a priori compared to stdv = 4.7 ppb for the retrieval based on ACTM, or stdv = 0.9 ppb for Wollongong NIR retrieved with WACCM a priori compared to stdv = 2.8 ppb for the retrieval based on ACTM.

Table 2. Impact of varied a priori profiles on mean XCH4 level retrieved in the NIR and MIR, and stdv of differences (retrieval with new a priori − retrieval with original a priori). Numbers are for monthly means constructed from same-day measurement coincidences. Uncertainties are 2 times the standard errors of the mean (2-σ/sqrt(n)). Columns: data set; new a priori; mean difference (retrieved with new a priori − retrieved with original a priori); stdv of differences (retrieved with new a priori − retrieved with original a priori); n, number of coincident monthly means. Values are in ppb.

Figure 4a shows a scatter plot of the NIR and MIR monthly means as retrieved with the original a prioris for Garmisch and Wollongong, respectively. Error bars on data points are 2-σ uncertainties derived from the stdv of the linear slope fit (2 stdv/√2). (Remark: we used this way of obtaining error bars because they reflect both the statistical uncertainty of the individual monthly means originating from the scatter of the retrievals and systematic errors of the monthly means due to errors in the seasonality. We found that the latter (systematic) error contribution is the dominant one: calculating the stdv of the monthly means directly from the retrievals gave significantly smaller numbers; i.e. retrieval scatter is not the dominant source of uncertainty. Furthermore, this (insignificant) uncertainty of the monthly means from the retrieval scatter changes strongly from month to month, because of the varying number of available measurements. Therefore, we did not use the scatter of the retrievals for weighting the individual monthly means during the slope fits.) Uncertainties for the slopes are derived from the fit and are at 2-σ.

Direct comparison
The linear MIR/NIR slopes (obtained from linear fits forced through zero) are not significantly different from 1 for both stations, i.e. 0.9998(11) for Garmisch and 0.9987(16) for Wollongong. In other words, there is no evidence from the direct comparison that an intercalibration of the MIR and NIR data sets would be required before using them together. This will be shown and discussed in more detail in the correlation analysis of Sect. 4.3 (along with the other cases where common a prioris are used for the NIR and MIR data). Figure 5a shows the same MIR and NIR monthly mean data as time series. It can be seen that the MIR and NIR seasonalities differ significantly (stdv = 7.2 ppb for the difference time series shown in the upper trace). An analogous plot for Wollongong can be found in Appendix C (Fig. C2c).
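A fit forced through zero and its 2-σ slope uncertainty can be sketched as below. The sketch assumes an unweighted least-squares fit with MIR as the dependent and NIR as the independent variable, and the textbook standard error for regression through the origin; the exact fitting procedure of the study may differ. The notation 0.9998(11) is read here as 0.9998 ± 0.0011.

```python
import math
from typing import Sequence, Tuple


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Unweighted least-squares fit y = b * x; returns (b, 2-sigma uncertainty of b)."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = sxy / sxx
    rss = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    # One fitted parameter, hence n - 1 degrees of freedom.
    std_err = math.sqrt(rss / (n - 1) / sxx)
    return slope, 2.0 * std_err


def differs_from_one(slope: float, two_sigma: float) -> bool:
    """True if the slope deviates from the ideal value 1 by more than 2 sigma."""
    return abs(slope - 1.0) > two_sigma
```

Applied to the numbers quoted in the text, `differs_from_one(0.9987, 0.0016)` is false, in line with the statement that the Wollongong slope is not significantly different from 1.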

Comparison with common a priori: analysis of seasonality
Figure 5b shows both NIR and MIR time series, but now retrieved using ACTM profiles as common a priori as described in Sect. 3. By comparison to the original time series (Fig. 5a) it can be seen that the exchange of the a priori profiles affects the MIR retrievals in a different way than the NIR retrievals. This is because of the differing original a priori profiles (Fig. 1) and the differing averaging kernels (Fig. 2).

Stdv of NIR-MIR difference time series
The effect of using the common ACTM a priori is that the seasonalities of the MIR and NIR XCH4 time series are in better agreement: the stdv of the difference time series NIR-MIR is 7.2 ppb for the original time series (Fig. 5a). After using the common ACTM a priori (Fig. 5b) the stdv of the difference time series is reduced to 5.2 ppb. Analogous plots for Wollongong can be found in Appendix C: here, the original stdv of 7.1 ppb (Fig. C2c) is reduced to stdv = 6.6 ppb (Fig. C2d) if ACTM profiles are used. Obviously, the reduction of the stdv by use of the time-dependent ACTM prior is smaller for Wollongong than for Garmisch. This may be understood by the fact that the Southern Hemisphere seasonal cycle (Wollongong) is less pronounced than the Northern Hemisphere cycle at Garmisch; for this reason the time-constant original prior is a better approximation for Wollongong than for Garmisch. Further cases were examined in which, instead of using ACTM, one of the two retrieval a prioris (WACCM or GFIT) has been used as common a priori profile: e.g. the original stdv of 7.2 ppb for Garmisch (Fig. 5a) is only reduced to stdv = 6.5 ppb (Fig. C2a) if the WACCM a priori profile is used, and it is reduced to 6.2 ppb if the GFIT a priori is used as a common prior (Fig. C2b). Obviously, the reduction of the stdv is smaller for the cases using one of the constant retrieval a prioris as common prior compared to the ACTM cases. This confirms what has been postulated in Sect. 3.2, namely that the seasonally varying smoothing term can be minimized by using the more realistic ACTM model as common prior.

NIR-MIR cross-correlation
Now we use the concept of cross-correlation to characterize and quantify the difference in NIR and MIR seasonalities shown in Fig. 5a as well as the reduction of this difference by using a common prior; see Fig. 6. In a strict mathematical sense, the seasonalities of the NIR and MIR data retrieved with the original a priori (blue line in Fig. 6) cannot be described by a simple phase shift because (i) the maximum of the cross-correlation is at zero time delay, (ii) the recurrences are weaker than the central maximum, and (iii) both the central maximum and the maxima of the recurrences are altogether < 1. However, the cross-correlation does show periodic recurrences, and the wings of the maxima are asymmetric towards negative time delays of about 1 month at half maximum. This behavior can be interpreted as being similar to a phase shift, and we will use the term "seasonal bias" for this behavior in the following discussion. For the data based on the common ACTM a priori (red line in Fig. 6) two things have changed: (i) the asymmetry of the maximum is reduced, and (ii) the maximum cross-correlation has increased and is closer to 1. This means that the seasonal bias is reduced by the use of ACTM. Figure C3a and b show similar but weaker effects for the cases where either of the two retrieval a prioris is used as common prior: the increase of the maxima towards 1 is less pronounced. Figure C3c-e show the analogous cases for Wollongong. Obviously, compared to Garmisch there are nearly no recurrences, and in the cases with common a prioris (red lines) the value of the maximum cross-correlation is similar to the reference cases with original a priori (blue lines). This can be understood by the fact that the seasonal cycle of the Southern Hemisphere site Wollongong is much less pronounced compared to the Northern Hemisphere site Garmisch, and this is in line with the findings from our analysis of stdv's in the previous section.
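The cross-correlation as a function of time delay can be illustrated with the following Python sketch. It assumes gap-free monthly series of equal length and uses the Pearson coefficient of the overlapping parts at each integer lag; the sign convention for the lag and the function names are choices made for this sketch, and the normalization used in the study is not specified in the text.

```python
import math
from typing import Dict, Sequence


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def lagged_correlation(
    nir: Sequence[float], mir: Sequence[float], max_lag: int
) -> Dict[int, float]:
    """Correlation of nir[t] with mir[t + lag] for lag = -max_lag..max_lag (months)."""
    n = len(nir)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            u, v = nir[: n - lag], mir[lag:]
        else:
            u, v = nir[-lag:], mir[: n + lag]
        out[lag] = pearson(u, v)
    return out
```

Passing the same series for both arguments yields the autocorrelation, e.g. of a NIR-MIR difference time series.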

Autocorrelation of NIR-MIR difference time series
Now we investigate the residual in Fig. 5b (stdv = 5.2 ppb) in more detail. An autocorrelation of this residual indicates that it is no white-noise residual but still contains some seasonality (blue line in Fig. 7). However, this seasonality has been reduced by the use of the common ACTM prior compared to the case with original a prioris. This can be seen via the larger-amplitude recurrences of the black line in Fig. 7 compared to the blue line. Figure 7 also shows that, for cases using either of the constant retrieval a prioris as common prior, the maxima of the recurrences are in between the original and the ACTM case (red and green lines in Fig. 7). This confirms once more that the ACTM prior does the best job in reducing the seasonal bias. Next we investigate the reason for the residual seasonality in Fig. 5b (stdv = 5.2 ppb). The question is whether one can understand the maxima of the corrected NIR-MIR differences (March 2008, March 2010, and March 2011) to be due to an SZA (airmass) dependency. We prepared coincidences now on a 10-min scale (our initial coincidences had been    Fig. 5b. We conclude that the observed small airmass dependency of the corrected NIR-MIR differences is not the dominant driver of their observed residual seasonality of Fig. 5b. From this we conjecture that the origin of this residual seasonality may be due to differences in the smoothing of $x^l_{\mathrm{ACTM}} - x^l_{\mathrm{true}}$ for MIR and NIR retrievals (see Sect. 3.2 for a discussion of this smoothing term).

Trend of the NIR-MIR difference time series
Another finding from analyzing the difference time series NIR-MIR is that they do not show a significant trend; this is important for trend studies based on joint use of MIR and NIR data. The trends have been obtained by a linear fit to the monthly mean difference time series. See Table 3 for derived numbers on trends and uncertainties for both stations and all cases with different a prioris.
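A linear trend fit to a monthly mean difference time series can be sketched as follows. Ordinary least squares with the usual standard error of the slope is an assumption of this sketch; the text does not state how the trend uncertainties in Table 3 were derived, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def linear_trend(t_years: Sequence[float], diff_ppb: Sequence[float]) -> Tuple[float, float]:
    """Fit diff = a + b * t; returns (b in ppb per year, 2-sigma uncertainty of b)."""
    n = len(t_years)
    t_mean = sum(t_years) / n
    d_mean = sum(diff_ppb) / n
    stt = sum((t - t_mean) ** 2 for t in t_years)
    slope = sum((t - t_mean) * (d - d_mean) for t, d in zip(t_years, diff_ppb)) / stt
    intercept = d_mean - slope * t_mean
    rss = sum((d - intercept - slope * t) ** 2 for t, d in zip(t_years, diff_ppb))
    # Two fitted parameters, hence n - 2 degrees of freedom.
    std_err = math.sqrt(rss / (n - 2) / stt)
    return slope, 2.0 * std_err
```

Under this convention a trend would be called significant if the absolute slope exceeds the returned 2-σ value.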

Comparison with common a priori: correlation analysis
The data sets for our correlation analysis are displayed via scatter plots of MIR and NIR monthly means: Fig. 4a shows the Garmisch and Wollongong case retrieved with the original retrieval a prioris, Fig. 4b with common ACTM prior, and Fig. 4c with common WACCM prior. Another case using the constant NIR (GFIT) retrieval a priori as common prior is given in Appendix C (Fig. C5). Table 4 gives an analysis of correlation significance via a t-test. The table shows numbers of Pearson's correlation coefficient r and the derived quality measure $r\sqrt{(n-2)/(1-r^2)}$, where n is the number of coincident monthly means. Significant correlation is achieved if the quality measure exceeds the t-value. The numbers show for both Garmisch and Wollongong data a significant MIR-NIR correlation with > 99 % probability for all cases, even for the cases where the retrievals are based on the original a priori. However, the benefit of using the ACTM model as common prior can be seen via a significantly enlarged quality measure: for Garmisch, the quality measure increases from 10.12 to 15.27 if the ACTM is used instead of the original a priori; for Wollongong the quality measure is increased from 7.17 to 7.75. Obviously, the improvement from using ACTM is more pronounced for Garmisch than for Wollongong. As discussed before, this can be attributed to the more pronounced seasonal cycle at Garmisch. The other cases, using either of the two retrieval a prioris as common prior, only show weaker effects upon the quality measure compared to the reference case with original a prioris. This once more confirms the advantage of using ACTM as a common prior in terms of bringing the (pronounced Northern Hemisphere) seasonality into agreement.
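The quality measure of the t-test can be computed with a few lines of Python. The function names are hypothetical, and the critical t-value (for n − 2 degrees of freedom at the chosen probability level) is assumed to be supplied by the caller, e.g. from a t-table.

```python
import math


def quality_measure(r: float, n: int) -> float:
    """t statistic r * sqrt((n - 2) / (1 - r^2)) for a Pearson coefficient r."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def is_significant(r: float, n: int, t_critical: float) -> bool:
    """True if the quality measure exceeds the supplied critical t-value."""
    return quality_measure(r, n) > t_critical
```

For a fixed number of coincident monthly means, a larger correlation coefficient gives a larger quality measure, which is how the improvement from a common prior shows up in Table 4.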

Significance of intercept and slope
The NIR and MIR retrieval methods are predicted to be both linear and have no intercept. If we apply least squares fits allowing for nonzero intercepts to the Wollongong and Garmisch data sets, the fits yield intercepts that are relatively large (typically 200 ppb or ≈ 10 % of the XCH4 values), but these are for all cases not significant within 2-σ uncertainty; see Table 5. This is a direct consequence of the relatively small dynamical range of XCH4 of ≈ 3 % (Fig. 4). Because of this situation we decided to perform fits with zero intercept, as concluded earlier by Wunch et al. (2010) in an analogous case. The slopes obtained from fits forced through zero are given in Table 5 as well. For the majority of cases (5 out of 8) the XCH4 intercalibration factors (i.e. slopes MIR/NIR) do not differ significantly from 1 within 2-σ uncertainty. This holds for both Garmisch and Wollongong MIR and NIR data retrieved with the original a prioris (slope 0.9998(11) or 0.2 per mille relative difference for Garmisch, slope 0.9987(16) or 1.3 per mille rel. difference for Wollongong), as well as for Garmisch and Wollongong data retrieved with the common WACCM prior (slope 0.9994(10) or 0.6 per mille rel. difference, and slope 0.9996(16) or 0.4 per mille rel. difference, respectively), and also for Garmisch data retrieved with common ACTM prior (slope 0.9996(8) or 0.4 per mille rel. difference). There are 3 cases where we also find slopes close to 1, however, with small deviations from 1 just above the 2-σ significance level (Table 5): for Garmisch data retrieved with common GFIT a priori we find a slope of 0.9980(10), for Wollongong data corrected to the common ACTM prior 1.0026(15), and for Wollongong data retrieved with common GFIT prior we find a slope of 1.0019(17). The slopes of these 3 cases correspond to differences in XCH4 of 3.6-4.8 ppb or 1.9-2.6 per mille. Although these NDACC-TCCON differences are significant within 2-σ, we want to note that they are relatively small, i.e. even smaller than the TCCON target accuracy of 3 per mille.

Atmos. Meas. Tech., 6, 397-418

Summary on the intercalibration of NDACC and TCCON XCH4 data
We conclude from the previous sections (in particular, Table 5) that the direct comparison of the original Garmisch and Wollongong MIR and NIR data sets as retrieved shows a very good overall agreement within the error bars: slope 0.9998(11) or relative difference 0.2 per mille for Garmisch, and slope 0.9987(16) or relative difference 1.3 per mille for Wollongong. That is, we do not find the need for applying an overall MIR/NIR intercalibration factor. However, the Garmisch MIR and NIR time series based upon the original retrieval a prioris do contain a significant seasonal bias, which appears to be dominated by the differing a priori profiles and averaging kernels of the MIR and NIR retrievals. It was shown that this seasonal bias can be significantly reduced by implementing the same a priori for the MIR and NIR data sets. This common a priori should ideally be based on a realistic site-specific and time-dependent model. This approach allows for the reduction of the differing smoothing errors due to the differing averaging kernels, leading to better agreement of the MIR and NIR seasonal cycles. The impact of this is stronger for Garmisch with its more pronounced (Northern Hemisphere) seasonal cycle compared to Wollongong. As outlined in the previous sections the best choice for Garmisch is the one with ACTM as common prior (MIR/NIR slope = 0.9996(8), stdv = 5.2 ppb). In Fig. 8a such a joint (NIR plus MIR) data set is shown for Garmisch; the monthly means have been constructed from the columns retrieved from the individual MIR and NIR spectra recorded within this month, each column weighted by the number of scans per spectrum. For Wollongong, MIR and NIR data agree well with original a prioris (slope = 0.9987(16), stdv = 7.1 ppb); see Fig. 8b for the joint (MIR plus NIR) data set. The advantage of using the common ACTM prior is less prominent in terms of MIR/NIR stdv (i.e. 6.6 ppb) due to the weaker seasonal cycle (compared to Garmisch).
Another fact is that for the Wollongong ACTM case there is this small but significant deviation from the ideal intercalibration factor 1, i.e. 1.0026(15). Therefore, a recommended alternative for Wollongong would be to use the common WACCM prior leading to a close-to-ideal slope of 0.9996(16), although the stdv is slightly increased (7.3 ppb). The joint data set based on the WACCM option is displayed in Fig. 8c. Note that there are practically no differences between Fig. 8b and c.
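The scan-weighted monthly means used for the joint data sets in Fig. 8 can be illustrated with a minimal Python sketch. The function name, the record layout, and all numerical values below are hypothetical and serve only to show the weighting; they are not taken from the Garmisch or Wollongong data.

```python
from collections import defaultdict


def weighted_monthly_means(records):
    """Return scan-weighted monthly means of XCH4.

    records: iterable of (month_key, xch4_ppb, n_scans) tuples, one per
    retrieved spectrum (MIR or NIR). The layout is a hypothetical choice.
    """
    weighted_sums = defaultdict(float)
    total_scans = defaultdict(float)
    for month, xch4, n_scans in records:
        weighted_sums[month] += xch4 * n_scans
        total_scans[month] += n_scans
    return {m: weighted_sums[m] / total_scans[m] for m in weighted_sums}


# Hypothetical example values (ppb), not measured data.
records = [
    ("2009-07", 1780.0, 2),  # e.g. one MIR spectrum built from 2 scans
    ("2009-07", 1790.0, 6),  # e.g. one NIR spectrum built from 6 scans
    ("2009-08", 1800.0, 4),
]
monthly = weighted_monthly_means(records)
print(monthly)
```

A spectrum averaged from more scans contributes proportionally more to the monthly mean, so the joint mean is not dominated by whichever instrument records more spectra with few scans each.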

Conclusions on joint use of NDACC and TCCON XCH4 data
It has been shown recently that the MIR XCH4 data can be used as retrieved for trend studies, if such studies are based on de-seasonalized data. On the other hand, we have shown in this paper that, in general, the information content and smoothing errors of the NIR and MIR retrievals can differ significantly, leading to differing seasonalities. Therefore, the use of these data sets for satellite validation or flux inversions needs to take the a priori profiles and averaging kernels of the retrievals into account. Satellite validation with a joint NDACC and TCCON data set would ideally use satellite data based on the same common realistic (model) a priori as used for the NIR and MIR ground FTIR data. This can be done either by reprocessing the satellite data with the common a priori or, with less effort, by using Eq. (1).
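As an illustration of an a posteriori exchange of the a priori, the following Python sketch applies the standard linear adjustment for column retrievals, in which the retrieved column is shifted by the sum over layers of h_l (1 − a_l) (x_new,l − x_old,l). This assumes that Eq. (1) is of this standard form; the notation of Eq. (1) itself may differ. Here h_l are pressure weights that sum to 1 and a_l is the column averaging kernel. The function name and all numbers are hypothetical.

```python
def exchange_prior(xch4_ret, avk, pressure_weights, prior_old, prior_new):
    """Adjust a retrieved column-averaged mole fraction to a new a priori.

    Assumed linear form (not necessarily identical to Eq. (1)):
        x_adj = x_ret + sum_l h_l * (1 - a_l) * (prior_new_l - prior_old_l)
    All profile arguments are per-layer sequences of equal length.
    """
    correction = sum(
        h * (1.0 - a) * (new - old)
        for h, a, old, new in zip(pressure_weights, avk, prior_old, prior_new)
    )
    return xch4_ret + correction


# Hypothetical three-layer example (ppb); not real retrieval output.
h = [0.5, 0.3, 0.2]                 # pressure weights, sum to 1
avk = [1.0, 0.8, 0.5]               # column averaging kernel per layer
prior_old = [1800.0, 1700.0, 1000.0]
prior_new = [1810.0, 1720.0, 900.0]

adjusted = exchange_prior(1780.0, avk, h, prior_old, prior_new)
print(adjusted)
```

Layers where the averaging kernel equals 1 contribute nothing to the correction, which is why the choice of prior matters most where the retrieval has reduced sensitivity.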
In future work we will apply the concepts introduced in this study to all other existing stations that perform coincident MIR and NIR soundings of column-averaged methane. The goal is to further confirm or refine the intercalibration behavior found in this work.
Finally, we investigated the recent NIR retrieval software update (GFIT ver. 4.8.6, release ggg 2012 July Update; see "Note: impact of GFIT 2012 update"). We found that, using GFIT 2012, the slopes for the direct NIR-MIR comparison are again not significantly different from 1, as found previously using GFIT 2009. However, GFIT 2012 is based upon a more realistic (i.e. site- and time-dependent) set of a priori profiles. Figure 1b shows that these are quite similar to the ACTM profiles (Fig. 1a). We conjecture that the new GFIT 2012 a priori profiles should be a good choice for use as a common a priori in order to minimize the impact of differing a priori profiles and smoothing errors in joint NDACC and TCCON studies and satellite validation.

Note: impact of GFIT 2012 update
After completion of this work a new official release of the GFIT (NIR) retrieval software became available (GFIT ver. 4.8.6, release ggg 2012 July Update). The main change relative to the GFIT version used in our paper (GFIT ver. 4.4.10, release ggg 20091107) is that the (one) a priori profile used for all sites is now corrected for the actual tropopause altitude on a per-day and per-site basis; see Fig. 1b. Figure 9a shows that the impact of this update is negligible in terms of the questions investigated in our paper; i.e. the bias (GFIT 2012 minus GFIT 2009) is only −0.3 (±0.09) ppb for Garmisch and no significant additional seasonality is introduced (difference time series with stdv = 0.3 ppb). For Wollongong, too, only a small impact is found in the bias (−1.68 ± 0.47 ppb) and seasonality (stdv = 1.2 ppb); see Fig. 9b. This means that the basic findings and conclusions from our paper should hold for retrievals with the new GFIT version as well. For example, Fig. 10 shows that, using the GFIT 2012 version, the slopes for the direct NIR-MIR comparison are again not significantly different from 1, as found previously using GFIT 2009 (Fig. 4a).

Validity of the linear approximation of Eq. (1)
Equation (1) contains an approximation, as it uses averaging kernels that are linear approximations of the true retrieval, which is non-linear in the state vector x. To investigate the validity of this approximation within the context of our paper, we performed new retrievals of the full Garmisch MIR and NIR time series using 3-hourly ACTM model profiles as prior and compared these to the alternative way of replacing the original a priori by ACTM, namely via Eq. (1). These two versions of ACTM-based time series were compared to the time series retrieved with the original a priori using 10-min coincidences. The results are shown in Fig. A1a. Here, the differences between the retrievals using the ACTM prior and the retrievals using the original prior are displayed as red crosses. The black crosses are the differences between the retrievals (based on the original a priori) corrected to the ACTM prior via Eq. (1) and the retrievals with the original a priori. Figure A1a shows that there are only small differences between the red and black crosses; these differences are displayed as green crosses. Figure A1b-f show analogous plots of the effects on MIR and NIR retrieval differences of exchanging their original a prioris with ACTM (MIR and NIR), GFIT (MIR), and WACCM (NIR) a prioris for both stations. From Fig. A1 we derived numbers on the mean bias and the seasonality of the bias introduced by the use of Eq. (1). These are summarized in Table A1. The general finding from Fig. A1 and Table A1 is that the non-linearity introduces significant but very small mean biases in both the MIR and NIR cases at Wollongong and Garmisch, and that the seasonality of these biases is negligible or small. Only in the case of Garmisch data based upon the ACTM a priori were non-linearity errors of > 1 ppb (bias and seasonal/zenith angle dependent stdv) found.

Fig. A1. (a) Red: XCH4 from Garmisch NIR measurements retrieved with 3-hourly ACTM profiles minus retrievals using the original (GFIT) prior. Black: same as red but using Eq. (1) for a posteriori exchange of the a priori profile. Green: difference between red and black; deviations from zero are due to non-linearity of the retrieval. Data basis is retrievals from individual NIR and MIR spectra constructed from same-day NIR-MIR coincidences. (b) Same as (a) but for Garmisch NIR retrievals using the WACCM a priori profile (i.e., the prior of the MIR retrievals), (c) Garmisch MIR retrievals using the 3-hourly ACTM profiles, (d) Garmisch MIR retrievals using the GFIT a priori, (e) Wollongong NIR retrievals using the WACCM a priori, and (f) Wollongong MIR retrievals using the GFIT a priori.

Table A1. Impact of non-linearity on XCH4 from using Eq. (1) for a posteriori exchange of an a priori profile versus performing a retrieval with exchanged a priori. Data basis is retrievals from individual MIR and NIR spectra, from same-day NIR-MIR coincidences. Uncertainties are 2 times the standard errors of the mean (2-σ/sqrt(n)). Columns: data set; a priori; mean bias from non-linearity (XCH4 retrieved with exchanged a priori minus XCH4 from use of Eq. 1) (ppb); stdv from non-linearity (XCH4 retrieved with exchanged a priori minus XCH4 from use of Eq. 1) (ppb).
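The quantities listed in Table A1 (mean bias, stdv, and an uncertainty of 2 times the standard error of the mean) can be computed from two coincident time series as in the minimal Python sketch below. The function name and the input values are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption; the text does not state which estimator was used.

```python
import math
import statistics


def bias_summary(series_a, series_b):
    """Return (mean bias, stdv, 2-sigma standard error) of series_a - series_b.

    The stdv is the sample standard deviation of the difference series
    (an assumption); the uncertainty is 2 * stdv / sqrt(n).
    """
    diffs = [a - b for a, b in zip(series_a, series_b)]
    n = len(diffs)
    mean_bias = statistics.fmean(diffs)
    stdv = statistics.stdev(diffs)
    return mean_bias, stdv, 2.0 * stdv / math.sqrt(n)


# Hypothetical coincident XCH4 values (ppb); not data from this study.
retrieved_with_new_prior = [1771.0, 1775.5, 1780.2, 1769.8]
adjusted_via_eq1 = [1770.5, 1775.1, 1779.4, 1769.9]

mean_bias, stdv, uncertainty = bias_summary(
    retrieved_with_new_prior, adjusted_via_eq1
)
print(mean_bias, stdv, uncertainty)
```

A mean bias is called significant in this sense when its magnitude exceeds the 2-sigma standard error, which can hold even for biases well below 1 ppb if the number of coincidences is large.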

Appendix B
Description of the a priori models

B1 ACTM-based prior
The model used for obtaining a common a priori profile for the MIR and NIR retrievals is the CCSR/NIES/FRCGC AGCM-based chemistry transport model (i.e., ACTM), which has been developed for simulating the major long-lived greenhouse gases (Patra et al., 2009). The ACTM simulations are conducted at T42 spectral truncation in the horizontal (≈ 2.8° × 2.8° latitude/longitude) and on 67 vertical levels covering the height range from the earth's surface to the mesosphere (≈ 1.3 × 10^−5 σ pressure or ≈ 80 km). The emissions and loss of methane in ACTM are adopted from the TransCom-CH4 simulation protocol (Patra et al., 2011). Comparisons showed that forward ACTM simulations of annual-mean methane are in close agreement (within 1 ppb) with measurements from surface sites as to inter-hemispheric gradients (Patra et al., 2011). ACTM-simulated vertical profiles of dry-air mole fractions on the native model vertical grid and at the horizontal grid point nearest to each FTIR site are sampled at 3-hourly intervals for use as a priori in this study. For each measurement time we interpolated the model profiles on the model pressure grid and applied the interpolated profile as a priori.
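The interpolation of the 3-hourly model output to an individual measurement time can be sketched in Python as follows. Linear interpolation in time, level by level on the fixed model pressure grid, is an assumption made here for illustration; the text states only that the profiles were interpolated. The function name and the profile values are hypothetical.

```python
def interpolate_profile(t_meas, t0, profile0, t1, profile1):
    """Linearly interpolate two model profiles to the measurement time.

    t_meas, t0, t1: times in hours, with t0 <= t_meas <= t1.
    profile0, profile1: mole fractions on the same model pressure levels
    at the two bracketing model output times.
    """
    if not t0 <= t_meas <= t1:
        raise ValueError("measurement time must lie between t0 and t1")
    w = (t_meas - t0) / (t1 - t0)
    return [(1.0 - w) * x0 + w * x1 for x0, x1 in zip(profile0, profile1)]


# Hypothetical profiles (ppb) at 09:00 and 12:00 on three model levels.
profile_09 = [1850.0, 1800.0, 1500.0]
profile_12 = [1856.0, 1803.0, 1500.0]

prior_at_10 = interpolate_profile(10.0, 9.0, profile_09, 12.0, profile_12)
print(prior_at_10)
```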

B2 WACCM-based prior
Chemical profiles for all targeted NDACC and many background species have been generated for all NDACC, TCCON and other sites for use as a priori. These a priori profiles have several advantages over other sources of a priori information. The modeled data employ surface emission data that can provide more accurate low-altitude mixing ratios, to which the FTIR retrieval may not be sensitive and which may not be included in other a priori sources, e.g. satellite profiles. By deriving a mean a priori from a long-term model run, the variability of the mean is also determined; it is a sole source of variability and a valuable component for understanding smoothing by the retrieval. To the accuracy of the model, the interspecies correlations are self-consistent. The global surface-to-mesosphere model provides consistency for all sites in the altitude range of interest for the FTIR retrievals. The WACCM model is described in Garcia et al. (2007).
To provide a priori that are as unbiased as possible, the a priori are an average from monthly sampling of the 40-yr portion from 1980 to 2020 of a 75-yr CCMVal model intercomparison. The CCMVal project is described in Eyring et al. (2007) and compares several models under specific IPCC scenarios for ozone recovery. In particular we use a moderate set of scenarios following REF2 and IPCC scenarios A1B for greenhouse gas emissions, AR4 for sea surface temperatures and surface halogen as prescribed by WMO/UNEP. Details can be found in Eyring et al. (2007). These a priori provide a reasonable mean from which observations will vary. The a priori were tested for applicability at all sites before adoption as an NDACC a priori standard.

Fig. C1. Same as Fig. 3 but showing the impact on (a) Garmisch NIR retrievals using the Garmisch WACCM a priori profile (i.e. the standard prior of the Garmisch MIR retrievals) as prior, (b) Garmisch MIR retrievals using 3-hourly ACTM profiles, (c) Garmisch MIR retrievals using the GFIT a priori profile (i.e. the standard prior of the NIR retrievals), (d) Wollongong NIR retrievals using 3-hourly ACTM profiles, (e) Wollongong NIR retrievals using the WACCM a priori profile, (f) Wollongong MIR retrievals with the a priori profile corrected to 3-hourly ACTM profiles via Eq. (1), and (g) Wollongong MIR retrievals using the GFIT a priori profile.

Same as Fig. 5 but using (a) for Garmisch the WACCM a priori profile as common prior (i.e. the standard prior of the Garmisch MIR retrievals), (b) for Garmisch the GFIT a priori profile (i.e. the standard prior of the NIR retrievals), (c) for Wollongong the original a prioris, (d) for Wollongong a correction to 3-hourly ACTM profiles as common prior, (e) for Wollongong the WACCM a priori profile, and (f) for Wollongong the GFIT a priori profile.

Same as Fig. 6, i.e. cross-correlations using original priors (blue) compared to cases with varied common a priori profiles (red): (a) Garmisch with WACCM prior, (b) Garmisch with GFIT prior, (c) Wollongong with ACTM prior, (d) Wollongong with WACCM prior, and (e) Wollongong with GFIT prior.