Interactive comment on “Intercomparability of XCO2 and XCH4 from the United States TCCON sites”

This manuscript presents a study of the intra-network bias of XCO2 and XCH4 among the four currently operating TCCON sites in the United States using a pair of portable low-resolution spectrometers of the type EM27/SUN, designated as mFTSs in the manuscript. Both the TCCON spectrometers, with a spectral resolution of 0.02 cm−1, and the mFTSs, with a spectral resolution of 0.5 cm−1, measure direct solar radiation in the near-infrared spectral region using standard InGaAs detectors. The intercomparison campaign was performed within a short period of 5 weeks to reduce the potential drift between the mFTSs. The authors consider different reasons for the residual differences in the Xgas products between the TCCON sites and the mFTSs. Some of these reasons, like the air-mass-dependent artifacts, surface pressure bias, a priori temperature profile error and the sensitivity of the averaging kernel difference,


Introduction
The Total Carbon Column Observing Network (TCCON) is a network of ground-based spectrometers that record near-infrared (IR) direct solar spectra from which column abundances of greenhouse gases are retrieved (Wunch et al., 2011b, 2015). Column-averaged dry-air mole fractions (DMFs, or Xgas, where "gas" is the species of interest) measured by multiple TCCON sites are used to evaluate Xgas retrievals from satellite measurements (for example, Dils et al., 2014; Kulawik et al., 2016; Nguyen et al., 2014; Wunch et al., 2011a). TCCON measurements are tied to the World Meteorological Organization (WMO) in situ trace gas measurement scales through extensive comparisons with in situ DMF profiles obtained by balloon and aircraft measurements (Deutscher et al., 2010; Geibel et al., 2012; Messerschmidt et al., 2011; Washenfelder et al., 2006; Wunch et al., 2010).
For the TCCON to meet the goals of satellite validation and carbon cycle flux estimates, measurements need to be precise and accurate. Currently, the 2σ single-sounding uncertainties of the TCCON are estimated to be 0.8 ppm (0.2 %) for XCO2 and 7 ppb (0.4 %) for XCH4 (Wunch et al., 2010). Systematic errors such as spectral ghosts (Messerschmidt et al., 2010), pressure offsets, instrument misalignment, or improper fitting of the continuum curvature (Kiel et al., 2016) can, however, produce biases between sites that remain even after averaging many single-sounding measurements. An error analysis by Wunch et al. (2015) suggests that biases of 0.2 % for XCO2 and 0.4 % for XCH4 could exist in the network even though the retrieval algorithm (GGG) has undergone continual improvements designed to reduce such biases.
In this study we quantify bias in XCO2 and XCH4 among the four operational TCCON sites in the United States (US) in 2015. These sites were at (1) the California Institute of Technology (Caltech), Pasadena, California (CA); (2) Armstrong Flight Research Center (AFRC), Edwards, CA; (3) Lamont, Oklahoma (OK); and (4) Park Falls, Wisconsin (WI). Bias quantification was accomplished by comparisons with two mobile EM27/SUN spectrometers (Gisi et al., 2012). A map of the US 2015 TCCON sites is shown in Fig. 1. The campaign is described in Sect. 2; the data processing and some sensitivity tests are described in Sect. 3. Comparisons between the sites are made in Sect. 4.

US TCCON 2015 intercomparability campaign
This campaign involved a comparison of simultaneous side-by-side measurements from two EM27/SUN instruments with TCCON measurements. One EM27/SUN instrument is operated by Caltech and one by Los Alamos National Laboratory (LANL). These instruments have been described in detail elsewhere (Gisi et al., 2012). Briefly, similar to the TCCON spectrometers, they measure direct solar near-IR spectra, albeit at a lower resolution (0.5 cm−1 versus 0.02 cm−1). They include a built-in solar tracker and are small and stable enough to be easily transported. We also refer to them herein as mobile Fourier transform spectrometers (mFTSs). For this study, both mFTSs employed the standard InGaAs (indium gallium arsenide) detector. To reduce the potential for drift between the mFTSs, the campaign was completed within a 5-week period. Based on the lack of drift between the two mFTSs, we conclude that the retrievals from their observations are internally precise over this period, so their Xgas measurements can be used as transferable comparison products.
The general strategy of the campaign was to visit each of the four TCCON sites shown in Fig. 1 and attempt at least 5 days of measurements. Two mFTSs were used so that any drift in their measurements would be noticed. In addition to the spectrometers, a traveling Coastal Environment Weather Station with a ZENO® data logger and Setra barometer was used for regular meteorological surface measurements at the AFRC, Lamont, OK, and Park Falls, WI, sites. At Caltech the on-site ZENO® data logger and Setra barometer were used. This type of barometer is used at each of the four US TCCON sites. The Setra sensor has a resolution of 0.1 hPa and a stated accuracy of 0.3 hPa. A Paroscientific 765-16B Portable Barometric Digiquartz® pressure standard with a stated accuracy of ±0.08 hPa or better was used as a traveling pressure standard. The Digiquartz® was compared with each of the on-site barometers. Surface pressure is important to the Xgas retrievals because it is used to derive the pressure altitude for the site.
In Table 1 we present the dates of the campaign as well as the number of coincident averaged measurements. Occasionally one mFTS recorded significantly fewer spectra due to unexpected halts during acquisition. This issue was mostly resolved by updating to the latest firmware provided by Bruker™ while at AFRC, but it shows an advantage of having multiple mFTS instruments. Our quality control filters were set after a preliminary look at the data. For this study our filters included 392 ppm < XCO2 < 404 ppm, 1.79 ppm < XCH4 < 1.865 ppm, and solar variation < 0.5 % within an interferogram. Prior to the campaign several of the TCCON sites used a mercury manometer as an absolute pressure reference. In the comparisons shown here, the current version of the public TCCON data (R0 for Park Falls, R1 for all others) is used, in which the surface pressure measurements at all sites are tied to the Digiquartz® (Iraci et al., 2014; Wennberg et al., 2014a, b, c). The mFTSs used the meteorological data from the Caltech on-site station or from the traveling Setra barometer with offsets applied to match the Digiquartz®.
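The quality control filters quoted in this paragraph can be expressed as a single predicate. The following is a minimal sketch rather than the campaign's processing code: the function and argument names (`passes_qc`, `solar_variation_pct`) are hypothetical, and the upper XCO2 bound is assumed to be in ppm.

```python
def passes_qc(xco2_ppm, xch4_ppm, solar_variation_pct):
    """Return True if one averaged measurement passes the filters in the text.

    Thresholds are taken from the text; argument names are illustrative only.
    """
    return (
        392.0 < xco2_ppm < 404.0          # XCO2 window (ppm)
        and 1.79 < xch4_ppm < 1.865       # XCH4 window (ppm)
        and solar_variation_pct < 0.5     # solar variation within an interferogram (%)
    )


# Hypothetical measurements: one that passes, one with XCO2 out of range.
examples = [(398.2, 1.832, 0.1), (405.0, 1.832, 0.1)]
kept = [m for m in examples if passes_qc(*m)]
```

Because all three conditions are strict inequalities, values exactly on a threshold are rejected.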

Site characteristics - Caltech
The Caltech site is located in Pasadena, CA (34.136° N, 118.127° W; 240 m a.s.l.), in the California South Coast Air Basin (SoCAB). Pasadena is in an urban environment where there are large diurnal variations of Xgas pollutants because of emissions and advection (Wunch et al., 2009, 2016). Emissions from the basin are estimated to be 167 Tg CO2 yr−1 and 448 ± 91 Gg CH4 yr−1 (Wunch et al., 2016). Pasadena is located towards the northern end of the basin, which is bounded by mountains. Two additional sides of the basin are also bounded by mountains, and the other side is bounded by the Pacific Ocean. General conditions during the August 2015 campaign were mostly clear skies with some cirrus clouds. We treat 2 different weeks at Caltech separately to estimate the limits of our methodology. The mean measured daytime XH2O for both weeks was 3540 ± 840 ppm (1σ).

Site characteristics - AFRC
The AFRC (also called Dryden or Edwards) is located in the Mojave desert at 34.960° N, 117.881° W (700 m a.s.l.). It is approximately 100 km north of Caltech and 100 km east of Bakersfield, CA. AFRC is on a military base, but the surrounding area is much less densely populated than the SoCAB. The area is mostly flat and devoid of vegetation. General conditions here during the campaign were cloud free with daytime surface temperatures of 36.4 (+4.0/−13.2) °C (95 % confidence interval, or CI) and a mean measured daytime XH2O of 2640 ± 250 ppm (1σ).

Site characteristics - Lamont
The Lamont, OK, site is located in an agricultural region that is mostly flat with some rolling hills (36.604° N, 97.486° W; 320 m a.s.l.). It is situated on the Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) site. The surrounding area is sparsely populated. During the campaign cumulus clouds were present, covering from less than 5 % to approximately 40 % of the sky. The mean measured daytime XH2O for the campaign week was 5080 ± 890 ppm (1σ).

Site characteristics - Park Falls
The Park Falls, WI, TCCON site has been described in more detail elsewhere (Washenfelder et al., 2006). Briefly, the site is in a sparsely populated but heavily forested region with low topographic relief (45.945° N, 90.273° W; 473 m a.s.l.). Conditions were highly variable, ranging from nearly cloud free to full coverage by stratocumulus clouds. Although more days were planned at this site than at the others, the often cloudy conditions meant that the least data were collected here. On 11 September 2015, the TCCON IFS 125HR instrument was realigned as part of routine maintenance. We treat the days before and the day after alignment separately. The mean measured daytime XH2O was 2480 ± 750 ppm (1σ) for this period.
Data processing and sensitivity tests
Parker et al. (2015) reported on the comparability of the mFTS Xgas products during the campaign and did not report any drift between them. The modulation efficiency (ME) at maximum optical path difference (MOPD) was reported to be 0.997-0.999 for the LANL mFTS throughout the campaign. The reported ME at MOPD for the Caltech mFTS was lower and more variable, though it is unclear whether or not this variation was due to error in the characterization.
A combined mFTS comparison product was created using an unweighted average of the measurements from the two spectrometers based on the recommendations of Parker et al. (2015). Averaging reduces the effect of drift (if any) in either one of the instruments. The observed biases of 0.05 ppm XCO2 and −1 ppb XCH4 between the mFTSs were added to the Caltech mFTS products before combining with the LANL products.
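The construction of the combined product can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the names `CALTECH_OFFSET` and `combined_mfts` are hypothetical, the offsets are applied with the sign given in the text (added to the Caltech product), and the inputs are assumed to be coincident averaged values.

```python
# Offsets added to the Caltech mFTS product before averaging (values from the text).
CALTECH_OFFSET = {"xco2_ppm": 0.05, "xch4_ppb": -1.0}


def combined_mfts(caltech_value, lanl_value, gas):
    """Unweighted average of the offset-adjusted Caltech value and the LANL value."""
    adjusted_caltech = caltech_value + CALTECH_OFFSET[gas]
    return 0.5 * (adjusted_caltech + lanl_value)


# Hypothetical coincident XCH4 values in ppb.
xch4_combined = combined_mfts(1831.0, 1830.0, "xch4_ppb")
```

An unweighted mean gives each instrument equal influence, so a drift in one instrument enters the combined product at half its size.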
As a first comparison to the mFTS data, no adjustments to TCCON data are made. These retrievals use the operational GGG2014 algorithm (Wunch et al., 2015). Retrievals with the mFTSs are also performed using GGG2014 with the EGI (EM27/SUN GGG Interferogram processing) suite for automation purposes (Hedelius et al., 2016). Both the high- and low-resolution retrievals used the same model pressure, temperature, altitude, and water profiles (pTz+H2O) generated from the NCEP/NCAR 2.5° reanalysis product (Kalnay et al., 1996). One profile interpolated to local solar noon is used per day in GGG2014. Several sensitivity tests have already been performed for TCCON retrievals using GGG2014 (Wunch et al., 2015) as well as for the mFTS retrievals using GGG2014 (Hedelius et al., 2016). We repeat some tests for data collected at the Caltech site. To test the sensitivity to the lower tropospheric temperature, a +10 K change is applied at all levels from the surface up to 700 hPa. The results are shown in Fig. 2 as a function of air mass (AM). We do not expect the temperature sensitivity to be the same for changes over fewer levels. In Table 2 we list changes in XCO2 and XCH4 at an air mass of 1.5 for temperature changes over different levels. Though the temperature bias at the surface is significant, comparison with sonde measurements suggests it decreases rapidly with altitude, making a bias of +10 K all the way to 700 hPa highly unlikely (David Pollard, personal communication, 2016).
Figure 2. Sensitivity of TCCON- and mFTS-retrieved XCO2 (a) and XCH4 (b) to a +10 K change in the planetary boundary layer (surface to 700 hPa) a priori temperature. Green and black points are raw sensitivities, and blue and grey points are their differences during the two periods at Caltech. Points are 10 min averages, n = 397. For XCO2 the TCCON-EM27 differences are small (< 0.15 %) but air mass dependent. For XCH4 the TCCON-EM27 differences are larger (0.3-0.4 %) but with little air mass dependence. The strong air mass dependence for XCO2 suggests that air mass needs to be taken into account for XCO2 surface temperature error adjustments.
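The temperature perturbation used in this sensitivity test can be sketched as below. This is an assumed, simplified representation: the a priori profile is taken to be two parallel lists of pressure (hPa) and temperature (K), and `perturb_lower_troposphere` is a hypothetical helper, not part of GGG2014.

```python
def perturb_lower_troposphere(pressure_hpa, temperature_k, delta_k=10.0, top_hpa=700.0):
    """Add delta_k to every level between the surface and top_hpa.

    Levels with pressure >= top_hpa lie at or below the 700 hPa surface in
    altitude, so they receive the perturbation; higher levels are unchanged.
    """
    return [
        t + delta_k if p >= top_hpa else t
        for p, t in zip(pressure_hpa, temperature_k)
    ]


# Hypothetical four-level profile.
pressure = [1000.0, 850.0, 700.0, 500.0]
temperature = [300.0, 290.0, 280.0, 260.0]
perturbed = perturb_lower_troposphere(pressure, temperature)
```

Changing `top_hpa` to another level (for example 850 hPa) gives the shallower perturbations of the kind listed in Table 2.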

Comparisons
Because of different spectral resolutions between the TCCON instruments (0.02 cm−1) and the traveling spectrometers (0.5 cm−1), we anticipate systematic differences in their Xgas retrievals (Gisi et al., 2012; Petri et al., 2012). Even in the absence of instrumental problems, spectroscopic inadequacies can cause systematic differences that correlate with T (temperature) errors, surface pressure errors, and solar zenith angle (SZA; Wunch et al., 2011b). In addition, the instruments have different averaging kernels (AKs) due to differences in spectral resolution. Thus, even though we use the same a priori gas volume mixing ratio and temperature profiles, errors therein will produce differences in the retrieved Xgas products (e.g., compare Wunch et al., 2015, and Hedelius et al., 2016). In this section we consider five reasons why the Xgas products from the two instrument types (mFTSs and TCCON) may differ.
First, we consider AM-dependent artifacts that arise because the effect of spectroscopic errors is resolution dependent. Second, we consider how surface pressure bias could affect retrievals, noting that surface pressure bias should be minimal amongst the current US TCCON data because of standardization to the common traveling Digiquartz® standard. Third, we consider effects of errors in the a priori temperature profile on retrievals from higher- versus lower-resolution spectra. Fourth, we consider the effects of differences in sensitivity from the AKs. Finally, we mention how a non-ideal ILS (instrument line shape) may affect retrievals.

Unadjusted comparisons and AM dependence
The comparisons prior to accounting for differences in temperature sensitivities and AKs are shown as box plots in Fig. 3 (Δ = TCCON − mFTS). The mFTS data were scaled to match the TCCON product and center the difference about zero, by dividing by scaling factors of 0.9987 for XCO2 and 1.0073 for XCH4. These factors were based on the TCCON and mFTS data at all sites and were used in combination with the TCCON-to-in situ profile bias correction (Wunch et al., 2015). An additional scaling factor is used because retrievals from lower-resolution spectra are biased compared to higher-resolution spectra due to errors in a priori profiles and spectroscopy (Gisi et al., 2012; Hedelius et al., 2016; Petri et al., 2012). For the box plots, we use the convention that the whiskers span the 90 % CI.
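The scaling and differencing described in this paragraph can be sketched as follows. The scaling factors are those quoted in the text; the function name `scaled_difference` and the dictionary layout are hypothetical, and the sketch ignores the separate TCCON in situ bias correction.

```python
# Empirical mFTS scaling factors quoted in the text (mFTS data are divided by these).
MFTS_SCALE = {"xco2": 0.9987, "xch4": 1.0073}


def scaled_difference(tccon_value, mfts_value, gas):
    """Return TCCON minus the scaled mFTS value for one coincident pair."""
    return tccon_value - mfts_value / MFTS_SCALE[gas]


# Hypothetical pair: an mFTS XCO2 value that is low by exactly the scaling factor.
delta = scaled_difference(400.0, 400.0 * 0.9987, "xco2")
```

With this convention a perfectly scaled pair gives a difference of zero, so the box plots show only the site-dependent residuals.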
AM- or SZA-dependent differences may arise due to spectroscopic errors (Frey et al., 2015). At higher SZAs sunlight passes through a longer atmospheric path, which increases the depth of the measured transmission lines. Spectroscopic errors can lead to bias that varies with SZA, even at clean-air sites (Wunch et al., 2011b). Though adding an AM-dependent correction did not improve the long-term mFTS to TCCON comparison in previous studies (Hedelius et al., 2016), here we noted significant AM dependencies. Air-mass-dependent corrections are accounted for in TCCON data, but these were developed for the high-resolution observations (Wunch et al., 2011b). When we attempted to correct the Xgas from the mFTS measurements as a function of SZA, we noted significant influences from local sources and sinks, even at the non-Caltech sites. This complicated the separation of spurious AM effects from true atmospheric variation. Additional measurements in areas with little atmospheric variation could aid in accounting for AM artifacts (Klappenbach et al., 2015). In this study, we apply a symmetric basis function to the mFTS products following Eq. (A12) in Wunch et al. (2011b), with coefficients determined empirically to reduce the overall diurnally varying difference between the mFTS and TCCON retrievals. Further, for estimates of bias we only use data within ±2 h of local noon so that comparisons are over similar SZAs at all sites. This constrains comparison data to have an AM between 1.05 and 1.85 (site means between 1.10 and 1.46). Recent work has shown residual dependencies on AM that could cause a high bias of ∼ 1 ppb XCH4 between AM 1.10 and 1.46 (Matthaeus Kiel, personal communication, 2017).
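The ±2 h noon window and its link to air mass can be illustrated with a short sketch. Two assumptions are made here that are not in the text: time is expressed as local solar hour, and air mass is approximated by the plane-parallel relation AM ≈ 1/cos(SZA), which is adequate only at the moderate SZAs considered. The function names are hypothetical.

```python
import math


def within_noon_window(local_solar_hour, half_width_h=2.0):
    """True if the observation lies within +/- half_width_h of local solar noon."""
    return abs(local_solar_hour - 12.0) <= half_width_h


def plane_parallel_air_mass(sza_deg):
    """Approximate air mass for a plane-parallel atmosphere (assumption)."""
    return 1.0 / math.cos(math.radians(sza_deg))


# Under this approximation, SZAs of about 18 and 57 degrees bracket
# the AM range of 1.05 to 1.85 quoted in the text.
am_low = plane_parallel_air_mass(18.0)
am_high = plane_parallel_air_mass(57.0)
```

Restricting the comparison to the noon window keeps the sampled air masses similar across sites, which limits how much any residual AM artifact can masquerade as a site bias.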

Surface pressure and temperature considerations
Surface pressure is used in the calculation of the dry-air column in GGG. It is an input to the retrievals to set the pressure altitudes of each site. A +1 hPa bias in surface pressure leads to average biases of approximately +0.036 % in XCO2 and +0.039 % in XCH4 for 10° < SZA < 20° and +0.034 % in XCO2 and +0.049 % in XCH4 for 70° < SZA < 80° (Wunch et al., 2015). Because pressure measurements are tied to the same Digiquartz® sensor (accuracy of ±0.08 hPa), surface pressure errors are expected to contribute less than 0.01 % to the XCO2 and XCH4 retrievals.
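The bound of 0.01 % follows from multiplying the quoted sensitivities by the accuracy of the pressure standard. The sketch below reproduces that arithmetic, assuming the Xgas response is linear in the pressure error over this small range; the dictionary keys and function name are illustrative.

```python
# Sensitivities in % Xgas per +1 hPa surface pressure bias (from the text).
SENSITIVITY_PCT_PER_HPA = {
    ("xco2", "sza_10_20"): 0.036,
    ("xch4", "sza_10_20"): 0.039,
    ("xco2", "sza_70_80"): 0.034,
    ("xch4", "sza_70_80"): 0.049,
}


def pressure_induced_bias_pct(pressure_error_hpa):
    """Scale each sensitivity linearly by the given pressure error (assumption)."""
    return {key: s * pressure_error_hpa for key, s in SENSITIVITY_PCT_PER_HPA.items()}


# Stated accuracy of the traveling pressure standard: 0.08 hPa.
biases = pressure_induced_bias_pct(0.08)
worst_case = max(biases.values())
```

The largest entry is the XCH4 value at high SZA, roughly 0.004 %, which is below the 0.01 % stated in the text.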
At different temperatures, the distribution of the molecular J states differs, which can affect the relative strengths of overlapping lines from different species. In GGG, bands are chosen to be reasonably temperature insensitive by including both high and low J lines to average out temperature sensitivity. In the lower-resolution spectra, lines are less well resolved. When the algorithm attempts to fit the lines, the overall fit may still be good even if the fits for individual species are incorrect in compensating ways.
We define a temperature error as the a priori surface interpolated temperature minus the measured site temperature. Histograms of the temperature errors at the different sites are shown in Fig. 4. NCEP temperatures are typically cooler than those measured on site. At AFRC the difference is particularly large: the NCEP reanalysis product underestimates the surface temperatures by ∼ 10 K at times in this desert region for this particular week. We also compared interpolated surface temperatures from the European Centre for Medium-Range Weather Forecasts (ECMWF; 0.125° × 0.125°), MERRA-2 (Modern Era Retrospective-Analysis for Research and Applications), GEOS-5 (Goddard Earth Observing System Model), and NAM12 (North American Mesoscale Forecast System, 12 km). Model surface temperature is lower than the AFRC TCCON temperature in all cases, and three of the five models have noon differences of ∼ 10 K. Differences are ∼ 7 K for GEOS-5 and ∼ 5 K for NAM12. Though error in the measurement may contribute to part of the T difference, the lower-resolution dynamical models may have a difficult time reproducing surface T at AFRC.
To account for error in the a priori temperature profiles near the surface, we apply two different tests separately. First, we assume the temperature error is uniform from the surface to 700 hPa and apply the results described in Sect. 3. Second, we apply corrections defining the temperature error separately at each level. The error at each level k was defined as the difference from the NCEP profile potential temperature, θNCEP,k − θmeasured,s (where "s" stands for surface), when θmeasured,s > θNCEP,k. Thus the adjusted potential temperatures aloft are always greater than or equal to θmeasured,s. Both corrections reduce the diurnal trend of XCH4 and XCO2 during the middle hours of the day but do not significantly alter the comparisons in the late afternoon. True temperature profiles are likely different from the NCEP noon profiles. Future releases of GGG will apply a post facto temperature correction for the lowest 3 km based on temperature-dependent water lines (Toon et al., 2016b). For future studies, we recommend adding dedicated sondes as part of the instrument suite for these field campaigns.
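The second, level-by-level definition of the temperature error can be sketched as below. This is an illustration under assumptions not stated in the text: potential temperature is computed with the standard dry-air relation θ = T (p0/p)^κ with p0 = 1000 hPa and κ ≈ 0.2857, the error is returned in potential-temperature units, and all function names are hypothetical.

```python
KAPPA = 0.2857  # R_d / c_p for dry air (assumed constant)


def potential_temperature(t_k, p_hpa, p0_hpa=1000.0):
    """Dry potential temperature in K."""
    return t_k * (p0_hpa / p_hpa) ** KAPPA


def level_errors(p_levels_hpa, t_ncep_k, t_surface_measured_k, p_surface_hpa):
    """Error per level: theta_NCEP,k - theta_measured,s where the latter is larger.

    Levels whose NCEP potential temperature already exceeds the measured surface
    value are left alone (error of zero), so all returned errors are <= 0.
    """
    theta_s = potential_temperature(t_surface_measured_k, p_surface_hpa)
    errors = []
    for p, t in zip(p_levels_hpa, t_ncep_k):
        theta_k = potential_temperature(t, p)
        errors.append(theta_k - theta_s if theta_s > theta_k else 0.0)
    return errors


# Hypothetical profile with a measured surface temperature 10 K above NCEP.
errors = level_errors([1000.0, 900.0, 700.0], [300.0, 295.0, 285.0], 310.0, 1000.0)
```

In this example the error is largest at the surface, shrinks at 900 hPa, and vanishes at 700 hPa, which mirrors the expectation that the surface bias decays quickly with altitude.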

Figure 5. A comparison of the averaging kernels at three different SZAs for the higher-resolution (HR) and lower-resolution (LR) instruments. The LR instruments are more sensitive to changes at the surface but less sensitive to changes in the stratosphere.

Averaging kernel differences
AKs (Fig. 5) are different for the 0.02 and 0.5 cm−1 instruments. We apply Eq. (A13) from Wunch et al. (2011a) to the TCCON Xgas (c) product to reduce the smoothing error (the contribution of different AKs). We denote the mFTS by subscript 1, the TCCON by subscript 2, and the TCCON product adjusted to reduce the smoothing error of the mFTS AKs as 1←2.
A hat (ˆ) represents a retrieved quantity, the subscript "a" denotes the prior, h is the pressure weighting function described by Connor et al. (2008), a is the column AK, x is the DMF a priori profile, and γ is the overall scaling factor applied to the TCCON a priori profile to obtain the retrieved Xgas.
Both the TCCON and the mFTS use the same a priori profiles. In Eq. (1), the TCCON profile γxa is treated as an approximation to the true atmospheric DMF profile (compare Eq. 3 from Rodgers and Connor, 2003). This is a better approximation in a sparsely populated location such as Lamont than at Caltech, where local anthropogenic emissions strongly influence the atmosphere. However, overall the application of Eq. (1) only makes differences of 0.00 (+0.04/−0.04) ppm and 0.01 (+0.17/−0.07) ppb (95 % CI) for XCO2 and XCH4, respectively, in this dataset.
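Equation (1) is not reproduced in this text. As a hedged illustration only, the sketch below assumes it has the form commonly used for this kind of adjustment, in which the adjusted column equals the a priori column plus (γ − 1) times the sum over levels of h, the mFTS column AK, and the a priori profile. The function name and the three-level example are hypothetical.

```python
def adjust_to_mfts_kernel(c_a, gamma, h, a_mfts, x_a):
    """Assumed form: c_a + (gamma - 1) * sum_j h_j * a_1j * x_aj.

    c_a    : a priori column-averaged DMF
    gamma  : TCCON scaling factor applied to the a priori profile
    h      : pressure weighting function per level (sums to 1)
    a_mfts : mFTS column averaging kernel per level
    x_a    : a priori DMF profile per level
    """
    weighted = sum(hj * aj * xj for hj, aj, xj in zip(h, a_mfts, x_a))
    return c_a + (gamma - 1.0) * weighted


# Hypothetical three-level example: with a kernel of 1 at every level the
# adjustment reduces to gamma times the a priori column.
h = [0.5, 0.3, 0.2]
x_a = [400.0, 400.0, 400.0]
adjusted = adjust_to_mfts_kernel(400.0, 1.01, h, [1.0, 1.0, 1.0], x_a)
```

When γ is close to 1 and the kernels are close to 1, the adjustment is small, which is consistent with the near-zero differences reported above for this dataset.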
GGG2014 a priori profiles do not take into account local anthropogenic emissions at the surface. In Fig. 6 we plot the in situ DMFs of CO2 and CH4 measured near the surface throughout the day as well as those from the a priori profiles used in the GGG2014 retrievals at the Caltech site. The in situ measurements were recorded using a Picarro cavity ring-down spectrometer, with standardization by comparison to three NOAA (National Oceanic and Atmospheric Administration) standards every 23 h. Given the intense local emissions, the measured in situ DMFs are significantly larger than the a priori near the surface. Using the same assumptions as Hedelius et al. (2016), the Xgas retrievals for two instruments in a polluted environment where the true and a priori profiles differ only at the surface are related by Eq. (2). Note that the error term has been omitted. The subscript s represents the surface. These assumptions are better for XCO2 than for XCH4, as changes in tropopause height can also make the a priori methane profile significantly different from the true profile (Saad et al., 2014). Over this time at Caltech, XHF averaged ∼ 50 ppt and γHF averaged ∼ 0.87, suggesting an a priori tropopause height that is too low. Using the β value from Saad et al. (2014), we estimate that a 13 % difference in γHF due to tropopause height would cause about a 0.24 % change in γCH4 (∼ 4 ppb), which is large enough that Eq. (2) is not valid for XCH4. We apply Eq. (2) to the XCO2 TCCON retrievals at the Caltech TCCON site, which leads to an adjustment of 0.22 (+0.54/−0.35) ppm (95 % CI).

Effects of a non-ideal ILS
Imperfections in the ILS due to misalignment of the TCCON FTSs can also cause site biases. At the sites described in this study, weekly internal lamp measurements of the internal, calibrated HCl cells (Hase et al., 2013) are collected from the 125HR instruments. We use LINEFIT 14.5 (Hase et al., 1999) software on HCl lines from monthly-averaged spectra to characterize the ILS. For Park Falls, spectra were averaged before and after realignment. Figure 7 shows the ME and phase error (PE) as functions of OPD. An ME not equal to 1 can indicate instrument misalignment, which may be from shear, angular, or defocus misalignment.
Effects of different types of misalignment on ME are not independent (Toon et al., 2016a). However, parameterizing changes in ME with OPD can be used to assess effects on Xgas retrievals (Griffith and Macatangay, 2010; Velazco et al., 2016; Wunch et al., 2011, 2015). These previous studies have found that each 1 % increase in ME at MOPD leads to a decrease on the order of 0.04 % in XCO2, though the change does vary with SZA. For XCH4, there is a decrease on the order of 0.03-0.05 % for a 1 % increase in ME at MOPD. The cause of the change in ME with OPD can, however, also significantly influence results. For example, Wunch et al. (2015) noted significantly different results for the same change in ME when the cause is shear versus angular misalignment.
We estimate biases based on ME at MOPD values alone, compared with AFRC. Based on the LINEFIT analysis of the lamp spectra, we would expect a low XCO2 bias of 0.02 % for Caltech, a high bias of 0.05 % for Lamont, and a high bias of 0.09 % for Park Falls (prior to realignment). The results of our study are not consistent with this expectation. Only Park Falls is consistently in the right direction, with a bias of ∼ 0.18 % before realignment. After realignment, Park Falls XCO2 was more in line with the other spectrometers, although based on the ME at MOPD results alone there should have been a change in the opposite direction. The Park Falls ILS was much more symmetrical after realignment, as seen by the PE curve in the lower panel of Fig. 7 being much closer to zero. For XCH4, both Park Falls and Lamont are biased in the expected direction from Armstrong, and the Park Falls-1 bias is ∼ 0.25 %. However, the Lamont bias is greater than expected from the single-value parameterization. A more complex parameterization of the ILS effect on Xgas (e.g., using the full function of ME with OPD, accounting for SZA dependence) might reduce the expected versus observed mismatch.
Atmos. Meas. Tech., 10, 1481-1493, 2017, www.atmos-meas-tech.net/10/1481/2017/
The Xair parameter from GGG can be used as a diagnostic for large misalignments, timing, and surface pressure errors. Xair is calculated by dividing the sum of all non-water molecules based on the surface pressure by the retrieved column of dry air based on column O2. Xair should be close to 1.0 and not vary, though empirically it is approximately 2 % lower due to spectroscopic errors for oxygen (Washenfelder et al., 2006). Wunch et al. (2015) showed an increase of about 0.3 % in Xair for a 1 % increase in ME at MOPD due to shear misalignment, and the change due to angular misalignment was < 0.03 %. In Fig. 8 Xair is shown for all the sites. At Park Falls Xair was approximately 0.979 before and 0.983 after alignment, which could correspond to an ME increase of about 0.013 at MOPD from shear realignment. LINEFIT results actually show a decrease in ME at MOPD after 11 September 2015, but XCO2 and XCH4 decreased. Based on Xair, XCO2 was expected to change by ∼ 0.2 ppm (compared with ∼ 0.08 ppm) and XCH4 was expected to change by 0.7-1.2 ppb (compared with ∼ 1.5 ppb). Residual differences may indicate measurement uncertainties.
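The expected changes quoted in this paragraph can be reproduced with the sensitivities given in the text. The sketch below is a back-of-the-envelope check, not the authors' calculation: it assumes linear scaling, uses nominal column values of 400 ppm XCO2 and 1830 ppb XCH4 (my assumption, chosen within the quality control windows), and reports magnitudes only.

```python
XAIR_PCT_PER_ME_PCT = 0.3           # % Xair per 1 % ME at MOPD (shear case, from the text)
XCO2_PCT_PER_ME_PCT = 0.04          # % XCO2 per 1 % ME at MOPD (magnitude)
XCH4_PCT_PER_ME_PCT = (0.03, 0.05)  # % XCH4 per 1 % ME at MOPD (magnitude range)


def expected_changes(xair_before, xair_after, xco2_ppm=400.0, xch4_ppb=1830.0):
    """Infer the ME change from the Xair change, then the implied Xgas changes."""
    xair_change_pct = 100.0 * (xair_after - xair_before) / xair_before
    me_change_pct = xair_change_pct / XAIR_PCT_PER_ME_PCT
    xco2_change_ppm = XCO2_PCT_PER_ME_PCT * me_change_pct / 100.0 * xco2_ppm
    xch4_change_ppb = tuple(
        s * me_change_pct / 100.0 * xch4_ppb for s in XCH4_PCT_PER_ME_PCT
    )
    return me_change_pct, xco2_change_ppm, xch4_change_ppb


# Park Falls Xair before and after realignment (values from the text).
me_pct, d_xco2, d_xch4 = expected_changes(0.979, 0.983)
```

With these inputs the inferred ME change is about 1.4 %, and the implied magnitudes are about 0.2 ppm for XCO2 and roughly 0.7 to 1.2 ppb for XCH4, in line with the values stated above.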

Truncated 125HR interferograms comparisons
Retrievals from the 125HR and mFTS instruments are inherently different due to the differences in resolution. By truncating the longer 125HR interferograms to the same length as those collected from the mFTS, similar-resolution spectra are obtained. This likely eliminates most discrepancies between the different types of measurements, except for some residual instrumental imperfections such as instrument misalignment or ghosts. Truncation also reduces the effects of ME variations due to the smaller MOPD. Truncation has been performed in past studies comparing retrieved Xgas from different-resolution spectrometers (Gisi et al., 2012; Hedelius et al., 2016; Petri et al., 2012). This test would provide little new information if truncation changed all retrieved DMFs in a uniform manner. However, past studies showed trun-
Figure 8. TCCON Xair compared with mFTS Xair within ±2 h of local noon. The differences are scaled by 1.001 to be centered about zero. Xair can be used as a diagnostic for misalignments, timing, or surface pressure errors.
The results of the truncation test are shown in Fig. 9, and changes are most easily seen from the unscaled (open) points. The sign of the change for XCO2 is inconsistent for the different sites. Previous studies also noted changes that were negative (Petri et al., 2012), positive (Gisi et al., 2012), or both (but with a preference towards negative; Hedelius et al., 2016) when using lower-resolution spectra. For lower-resolution spectra XCH4 increases, in agreement with previous studies (Hedelius et al., 2016; Petri et al., 2012).
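The idea of truncating an interferogram to a shorter maximum optical path difference can be sketched as below. This is a conceptual illustration only: real processing also involves steps such as phase correction and the Fourier transform, which are not shown. The data layout (parallel lists of OPD in cm and signal), the function names, and the use of the common convention resolution = 0.9/MOPD with MOPD values of 45 cm and 1.8 cm are assumptions, chosen because they reproduce the 0.02 and 0.5 cm−1 resolutions quoted in the text.

```python
def truncate_interferogram(opd_cm, signal, mopd_cm=1.8):
    """Keep only samples whose |OPD| does not exceed the new, shorter MOPD."""
    kept = [(x, s) for x, s in zip(opd_cm, signal) if abs(x) <= mopd_cm]
    return [x for x, _ in kept], [s for _, s in kept]


def nominal_resolution(mopd_cm):
    """Nominal spectral resolution in cm^-1, assuming resolution = 0.9 / MOPD."""
    return 0.9 / mopd_cm


# Hypothetical, very coarse interferogram samples.
opd = [-45.0, -1.8, 0.0, 1.0, 1.8, 30.0]
sig = [0.01, 0.2, 1.0, 0.4, 0.2, 0.02]
opd_short, sig_short = truncate_interferogram(opd, sig)
```

Discarding the long-OPD samples removes the information that distinguishes the high-resolution instrument, so the truncated spectra respond to a priori and spectroscopic errors more like the mFTS spectra do.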

Biases to overall median
The medians and standard deviations for data before and after considering differences in AKs and surface temperature are shown in Fig. 9. Though we have attempted to reduce artificial diurnal variation between the different instruments with the AM correction, there may still be some residual dependence on SZA. To reduce this dependence, which is larger at higher SZAs, only data within ±2 h of local noon are used. We use the Kruskal-Wallis one-way analysis of variance test, which assumes ordinal but not necessarily normally distributed data (Kruskal and Wallis, 1952), and compare data from each site to the median of data from all sites.
The null hypothesis of this test is that the medians do not differ. Line styles indicate the degree of significance from the Kruskal-Wallis tests.
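The Kruskal-Wallis statistic itself is simple to compute from pooled ranks. The sketch below is a generic, pure-Python implementation of the H statistic with midranks for ties and the standard tie correction; it is not the authors' analysis code, and it stops short of converting H to a p value (which is usually done with a chi-squared distribution with k − 1 degrees of freedom for k groups).

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / float(n ** 3 - n)
    return h / correction if correction > 0.0 else 0.0


# Hypothetical difference data (ppm) for three sites with no overlap.
h_stat = kruskal_wallis_h([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
```

Because the statistic depends only on ranks, a few outlying 10 min averages have limited influence, which is one reason a rank-based test suits this kind of comparison.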
Pooled differences are listed in Table 3 for different adjustments. These are represented by the averages of the group median differences, the overall median, and the average standard deviations. Park Falls TCCON data prior to realignment of the spectrometer are omitted. The sum of the median differences decreases for XCO2 after adjustments. However, this is not true of XCH4, which increases in variability after adjustment. Despite this overall increase for XCH4, these adjustments better reflect the intercomparability of the sites rather than the intercomparability of measurements from differing instruments. From Table 3, we estimate the average biases of all sites compared to the median to be 0.03 % for XCO2 and 0.08 % for XCH4.

Confidence intervals of the pairwise differences
We use the Critchlow-Fligner method to estimate simultaneous CI for the differences between all pairs of sites (Hollander et al., 2014). The Critchlow-Fligner test is nonparametric, so it is less sensitive to outliers and few assumptions are needed about the distribution of the underlying population of data. We use α = 0.05 to obtain 95 % confidence intervals of the differences between sites. Results are presented in Fig. 10 in order of decreasing median difference and separated by gas and adjustments. At the bottom is the ordering of the sites.
This comparison suggests that XCO2 is lowest for Lamont and highest for Park Falls-1 in both cases. There is a difference between the 2 different weeks at Caltech for unknown reasons. The largest difference within a 95 % CI is 0.6 ppm between Park Falls and Lamont; this difference is 1.0 ppm for the truncation test. However, most mid-range values are ∼ 0.2 to 0.3 ppm.
For XCH4, there was more of a change in site order between the two cases. For the truncation comparison the differences are even greater than for the AM+T+AK comparison, as indicated in Table 3. The largest difference within a 95 % CI is 4 ppb between Lamont and Caltech. For the truncation test the largest difference is between Armstrong and Caltech and is greater than 5 ppb. Mid-range values are 2-3 ppb.

Conclusions
We estimate the range of statistically significant site-to-site bias amongst the sites as < 0.3 ppm for XCO2 and < 3 ppb for XCH4. These were determined by comparing TCCON data with simultaneously collected data from co-located portable spectrometers, which we have assumed to be internally precise over the duration of the campaign. This assumption is supported by standard deviations of only 0.15 ppm for XCO2 and 1 ppb for XCH4 for the 10 min averaged differences between the two mFTS instruments over the campaign. Five reasons Xgas could differ among instruments were considered: (1) differences in averaging kernels, (2) differences in spurious air mass dependence from spectroscopy errors, (3) the a priori profile (e.g., temperature profile), (4) error in the measured surface pressure, and (5) instrument misalignments. Of these, the last four can cause site-to-site biases in the TCCON, and empirical adjustments to make the mFTS and TCCON datasets more comparable were made to the first three. When the 125HR interferograms were truncated so the spectra would be the same resolution as the mFTSs, differences from the first three inherently go away. As the spectroscopy is improved, the data should have smaller AM-dependent artifacts, though for now an empirical correction is used for the TCCON (Wunch et al., 2011). Updates to the retrieval algorithm to include line mixing may also make the AM dependence more predictable (Hartmann et al., 2009). The corrections based on T errors described in Sect. 4.2 are for the differences in sensitivity to T error between the mFTS and TCCON instruments and not for the different T errors at each TCCON site. Large temperature errors of +10 K from the surface through 850 hPa could cause errors of 0.08 % in XCO2 and 0.11 % in XCH4 at an air mass of 1.5. Biases due to a non-ideal ILS will be reduced in future versions of the GGG retrieval algorithm. Biases in surface pressure data can cause site biases but are expected to be less than 0.01 % in the current data revisions because surface pressure data were standardized to the same traveling standard. We recommend regular (~annual, depending on the pressure sensor accuracy) comparisons of meteorological pressure measured by on-site barometers with a universal standard for those making similar column measurements.

Figure. Pairwise 95 % CI of differences between sites. Differences are for data within ±2 h of local noon. Comparisons are ranked in order of decreasing mean difference. For each species, plots are shown for (1) corrections for air mass, differences in temperature sensitivity errors defining temperature errors layer by layer, and a reduction of the smoothing error from different averaging kernels; (2) differences by comparing results from 125HR spectra with lowered resolutions. At the bottom are the site orderings. Lines between indicate when the pairwise difference is first more than 0.
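The internal-precision check cited in the Conclusions (the standard deviation of 10 min averaged differences between the two mFTS instruments) could be computed along the following lines. This is only a sketch: the fixed 600 s bins, the function names, and the sample values are assumptions made for illustration and are not taken from the campaign processing.

```python
import statistics
from collections import defaultdict


def bin_means(times_s, values, width_s=600):
    """Average values into fixed-width time bins (default 10 min)."""
    bins = defaultdict(list)
    for t, v in zip(times_s, values):
        bins[int(t // width_s)].append(v)
    return {k: statistics.fmean(v) for k, v in bins.items()}


def paired_difference_std(times_a, vals_a, times_b, vals_b, width_s=600):
    """Standard deviation of binned differences (A minus B) over shared bins."""
    a = bin_means(times_a, vals_a, width_s)
    b = bin_means(times_b, vals_b, width_s)
    shared = sorted(set(a) & set(b))  # only bins where both instruments measured
    diffs = [a[k] - b[k] for k in shared]
    return statistics.stdev(diffs), diffs


# Hypothetical XCO2 values (ppm) sampled every 5 min by two instruments.
times = [0, 300, 600, 900, 1200, 1500]
inst1 = [400.0, 400.2, 400.1, 400.3, 400.2, 400.4]
inst2 = [400.0, 400.0, 400.1, 400.1, 400.4, 400.4]

sd, diffs = paired_difference_std(times, inst1, times, inst2)
print(f"{len(diffs)} shared bins, std of differences = {sd:.3f} ppm")
```

Restricting the comparison to bins in which both instruments recorded data keeps gaps in one record from biasing the difference statistics.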
Remaining differences are most likely from a combination of other errors mentioned by Wunch et al. (2015), such as instrumental misalignment and Doppler shifting of solar lines with respect to telluric lines. Some of these uncertainties will be reduced in the next version of GGG. Other remaining differences may be due in part to noise. Sufficiently large sample sizes should have helped reduce bias from noise, and the 15 min running standard deviations for TCCON were 0.11 % XCO2 and 0.13 % XCH4. Apparent differences between the weeks at Caltech suggest we are near the precision limit of our current methodology. Though we reduced the contributions to Xgas differences that arise from the different instrument types, there may remain additional contributions because of differences in resolution (Petri et al., 2012).
United States TCCON site-to-site biases measured herein are within the 2σ XCO2 and XCH4 uncertainties stated by Wunch et al. (2010). We suggest repeating this study comparing results from traveling spectrometers with those from the stationary TCCON sites, especially when aircraft and air-core data are not available to check for bias. Ideally, repeat campaigns will include multiple traveling mFTS instruments. Others may even consider taking three mFTS instruments so that a change in one would be noticeable by comparing it with the other two. When collocated, three or more EM27/SUN instruments can easily be operated by just one or two people. Multiple instruments also provide backup in case problems arise with one and can increase the signal-to-noise ratio. As a backup strategy, one traveling mFTS can be taken in the field and compared with an mFTS instrument left in a fixed location before and after the campaign. This second strategy is acceptable when there are no instrumental issues, or if it is known when and how issues affect Xgas measurements. This type of campaign can be repeated every few years, or with different sites (e.g., Sha et al., 2016), or with different gases that can be measured with an extended-band InGaAs detector with spectral filters (Hase et al., 2016). Similar studies should, however, also consider the current precision limits of these comparisons on various timescales. We hope others will improve on our methodology to estimate inter-site biases using portable spectrometers. A sufficient number of aircraft profiles may also aid in determining intercomparability. The NASA Atmospheric Tomography Mission (ATom), for example, will conduct global flights from summer 2016 through spring 2018 and will include profile measurements of CO2, CH4, CO, and N2O over many of the TCCON sites (https://espo.nasa.gov/home/atom). Data from ATom can be used to reevaluate TCCON uncertainties in the next version of GGG.
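The suggestion of carrying three mFTS instruments, so that a change in one is revealed by comparison with the other two, can be sketched as a simple odd-one-out test. The instrument labels, the sample values, and the 0.3 ppm threshold below are hypothetical and chosen only for illustration; a real campaign would need a threshold tied to the demonstrated precision of the instruments.

```python
import statistics
from itertools import combinations


def flag_drifting_instrument(series, threshold):
    """Return the one instrument whose mean offsets from both others exceed
    the threshold, or None if no single instrument stands out."""
    offsets = {}
    for a, b in combinations(series, 2):
        pair_diffs = [x - y for x, y in zip(series[a], series[b])]
        offsets[frozenset((a, b))] = abs(statistics.fmean(pair_diffs))
    suspects = [
        name for name in series
        if all(offsets[frozenset((name, other))] > threshold
               for other in series if other != name)
    ]
    return suspects[0] if len(suspects) == 1 else None


# Hypothetical coincident XCO2 measurements (ppm) from three collocated instruments.
side_by_side = {
    "mFTS-1": [400.00, 400.10, 400.20],
    "mFTS-2": [400.05, 400.10, 400.15],
    "mFTS-3": [400.50, 400.60, 400.70],  # offset by about 0.5 ppm
}

drifting = flag_drifting_instrument(side_by_side, threshold=0.3)
print("Instrument flagged:", drifting)
```

With only two instruments the same offset would be visible but could not be attributed to either one, which is the motivation for the third instrument.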
Data availability. TCCON data are currently hosted on the CDIAC and will also be available on the Caltech library data archive by the end of the year (Iraci et al., 2014; Wennberg et al., 2014a, b, c). Mobile FTS data are available upon request to the authors.
Competing interests. The authors declare that they have no conflict of interest.

Figure 1. Map of the United States with TCCON sites that were active in 2015 labeled. Normalized difference vegetation index (NDVI) from Terra MODIS (Moderate Resolution Imaging Spectroradiometer; Didan, 2015) and nightlights from VIIRS (Visible Infrared Imaging Radiometer Suite) in red are shown for September 2015.

Figure 3. Differences between the TCCON and the mFTS products that are unadjusted, except that overall scale factors have been applied to the mFTS data (XCO2: 0.9987; XCH4: 1.0073). Box plot width represents the number of comparison points. The center line is the median, the center box is the middle 50 % range of the data, and the whiskers are the 90 % range.

Figure 4. Histograms of the differences between the surface temperatures used in the retrievals (NCEP model) and the temperatures measured at the TCCON sites.

Figure 6. (a) Diurnal variation of in situ DMFs measured near the surface at Caltech on the days of TCCON to mFTS comparisons. A priori surface values are marked by an "x" at noon. (b) GGG2014 a priori profiles used in the retrievals, with lower CO2 and CH4 than was measured near the surface. Surface pressure is indicated by the dashed line.

Figure 7. Modulation efficiency and phase error for each of the 125HR instruments describe the ILS. Results are calculated from HCl lines using LINEFIT 14.5 on monthly averages of internal lamp spectra. For Caltech, two different months are shown. Park Falls-1 corresponds to August 2015 and Park Falls-2 corresponds to October 2015.

Table 1. Number of measurements prior to any filtering.

Table 2. Percent changes for T sensitivities at an air mass of 1.5 and a temperature change of +10 K.

Table 3. Mean differences pre- and post-adjustment for ±2 h of local noon.
Medians and standard deviations of the TCCON data compared to the mFTS product after various adjustments. Line style represents the significance of the difference of the group median from the median of all data by the Kruskal-Wallis test (p < 0.05: -; p < 0.2: --; otherwise: ...). Legend entries indicate what adjustments were applied to the data to make measurements from the different instrument types more comparable. Open symbols did not have a scaling factor applied to center about zero. AM is air mass adjustment; T is temperature error adjustment; AK is averaging kernel adjustment.