Consistent validation of satellite CO

Carbon–climate feedbacks are a major uncertainty in predicting the climate
response to anthropogenic forcing (Friedlingstein et al., 2006). Currently,
about 10 Gigatons (Gt) of carbon are emitted per year from human activity
(e.g., fossil fuel burning, deforestation), of which about 5 Gt stays in the
atmosphere, causing an annual CO

This paper tests different characteristics of model and satellite CO

The remainder of the paper is organized as follows: Sect. 2 describes
the satellite X

The characteristics of the carbon dioxide data sets that are compared to TCCON are summarized in Table 1. The following sections contain detailed descriptions of the data set versions and characteristics.

The TCCON consists of ground-based Fourier transform spectrometers (FTSs)
that measure high spectral (0.02 cm

Summary of the CO

TCCON site locations used for this work. The color indicates the year when each station started collecting data.

The total column dry-air mole fractions of CO

We use 20 TCCON stations, distributed globally (see Fig. 1), and these data
have been used extensively for satellite validation and bias correction
(e.g., Butz et al., 2011; Morino et al., 2011; Wunch et al., 2011b; Reuter et
al., 2011; Schneising et al., 2012; Oshchepkov et al., 2012), in flux
inversions (Chevallier et al., 2011), and in model comparisons (Basu et al., 2011; Saito et al., 2012). We use the GGG2014 data when available, and the
GGG2012 data from the Four Corners and Tsukuba_120HR sites. Note
that the overall bias between the GGG2012 and GGG2014 X

Stations that have special circumstances regarding validation and are
considered locally influenced are Garmisch, which lies in the midst of
complicated terrain where local atmospheric transport is difficult to model
and to measure from space; Four Corners, which is located in the vicinity of
two power plants with large CO

The Greenhouse gases Observing SATellite (GOSAT) takes measurements of
reflected sunlight in three shortwave bands with a circular footprint of
approximately 10.5 km diameter at nadir (Kuze et al., 2009; Yokota et al.,
2009; Crisp et al., 2012). The first useable science measurements were made
in April 2009, but due to changing observational modes in the early months,
we use data beginning in July 2009. In this work, we use column-averaged dry-air mole fraction (X

The following description of the SCanning Imaging Absorption spectroMeter
for Atmospheric CHartographY (SCIAMACHY) CO

The Bremen Optimal Estimation DOAS (BESD) algorithm is designed to analyze
SCIAMACHY sun normalized radiance measurements to retrieve the column-averaged dry-air
mole fraction of atmospheric carbon dioxide (X

The retrieved 26-element state vector consists of a second-order polynomial
of the surface spectral albedo in both fit windows, two instrument
parameters (spectral shift and slit function full width at half maximum in both fit windows, described in Reuter et al., 2010), a temperature
profile shift, a scaling of the H

A post-processor adjusts the retrieved X

CarbonTracker (CT) is an annually updated analysis of atmospheric carbon
dioxide distributions and the surface fluxes that create them (Peters et al.,
2007). CarbonTracker uses the Transport Model 5 (TM5) offline atmospheric
tracer transport model (Krol et al., 2005) driven by meteorology from the
European Centre for Medium-Range Weather Forecasts (ECMWF) operational
forecast model and from the ERA-Interim reanalysis (Dee et al., 2011) to
propagate surface emissions. TM5 runs at a global 3

In order to explicitly quantify the impact of transport uncertainty and
prior flux model bias on inverse flux estimates from CarbonTracker, the
CT2013b release is composed of a suite of inversions, each using a different
combination of prior flux models and parent meteorological model. Sixteen
independent inversions were conducted, using two terrestrial biosphere flux
priors, two air–sea CO

Monitoring Atmospheric Composition and Climate (MACC,

We show comparisons between satellite X

Time series for matches of CT2013b, MACC, SCIAMACHY, and GOSAT
vs. TCCON at Lamont

Fit of

Bias (left panel) and standard deviation (right panel) for CT2013b, MACC, BESD-SCIAMACHY, and ACOS-GOSAT vs. TCCON stations, arranged from high to low latitude. Comparisons that have a particularly low number of matches are Tsukuba and Lauder for SCIAMACHY and Lauder for GOSAT.

The SCIAMACHY and GOSAT comparisons in this paper are based on two different
definitions of coincidence criteria between TCCON and satellite data.
Satellite measurements that satisfy the so-called geometric criteria are
within

The choices used in this paper regarding model/TCCON match-ups are linearly
interpolating to the TCCON latitude, longitude, and time for the models, and
using the TCCON surface pressure for calculating X

Because of the Earth's curvature, high-latitude sites could have relaxed
coincidence in longitude, particularly for geometric coincidence. However,
stations north of 60

Overall bias (left panel, with error bars showing the standard deviation of the bias) and standard deviation (right panel, with stars showing the predicted error for satellites) for most stations (some stations removed, see text).

Figure 3 shows a summary of the comparisons for geometric criteria where
satellite matches are not averaged. Averaging and the effects of coincidence
criteria and satellite averaging are addressed in Sect. 3.4. The black box
shows five European stations which are very close, geographically, yet have
different biases. The gray bars labeled TCCON bias uncertainty in Fig. 3
signify the overall calibration uncertainty in TCCON which is estimated to be
0.4 ppm (Wunch et al., 2010, 2011a). The significance of the bias vs. TCCON is
estimated by the 5 %

Figure 4 shows the biases and standard deviations grouped globally and over
the northern and southern hemispheres. To estimate the overall bias and
standard deviations for single observations, we take out the outliers as
follows. As described in Sect. 2.1, we take out JPL, Four Corners, Bremen,
Garmisch, and Izaña for averaging. For satellites, we remove the above
plus Tsukuba and Lauder due to limited numbers of comparisons for SCIAMACHY.
For the mean NH bias, we take out stations poleward of 60

We test whether the biases seen in Figs. 3 and 4 are persistent from year to year. When at least two full-year averages exist for a station, the standard deviation of the yearly bias is calculated. The average over all stations of the yearly bias standard deviation is 0.3 ppm for all sets (CT2013b, MACC, SCIAMACHY, GOSAT).
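
The year-to-year persistence test described above can be sketched as follows. This is a minimal illustration, not the code used for the paper: the function name `yearly_bias_std`, the record layout, and the station labels are hypothetical, the sample standard deviation is an assumption (the text does not say which estimator was used), and the sketch does not check that each year is a complete full-year average.

```python
from collections import defaultdict
from statistics import mean, stdev


def yearly_bias_std(records):
    """Standard deviation of the yearly mean bias for each station.

    records: iterable of (station, year, diff_ppm), where diff_ppm is a
    matched (test data set minus TCCON) difference. Stations with fewer
    than two yearly averages are dropped.
    """
    by_station_year = defaultdict(list)
    for station, year, diff in records:
        by_station_year[(station, year)].append(diff)

    yearly_means = defaultdict(list)
    for (station, _year), diffs in sorted(by_station_year.items()):
        yearly_means[station].append(mean(diffs))

    # Assumption: sample standard deviation (n - 1 in the denominator).
    return {s: stdev(b) for s, b in yearly_means.items() if len(b) >= 2}


# Made-up differences in ppm for three hypothetical stations.
records = [
    ("A", 2010, 0.2), ("A", 2010, 0.4), ("A", 2011, 0.8), ("A", 2011, 1.0),
    ("B", 2010, -0.5), ("B", 2011, -0.5),
    ("C", 2010, 1.0),  # only one year, so station C is excluded
]
per_station = yearly_bias_std(records)
average_std = mean(per_station.values())
```

Averaging `per_station` over stations, as in the last line, corresponds to the single number quoted per data set in the text.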

Another important comparison is of the predicted and actual errors. The predicted error (also referred to as the a posteriori error) is reported for each satellite product, and we take the actual error to be the standard deviation of the satellite observation vs. TCCON. These two quantities should agree if the TCCON error is much smaller than the a posteriori error and the coincidence criteria do not degrade the agreement. The predicted and actual errors vary from site to site, e.g., from variations in albedo, aerosol composition, and solar zenith angle. We calculate the correlation between the standard deviation vs. TCCON and the predicted error as follows: the standard deviation of the satellite vs. TCCON is calculated at each TCCON station, and the correlation between these standard deviations and the predicted errors is then calculated across stations. ACOS-GOSAT has a 0.6 correlation and BESD-SCIAMACHY has a 0.5 correlation. This indicates that the predicted error should be utilized, e.g., when assimilating ACOS-GOSAT, as the variability in the predicted error represents variability in the actual error, though not perfectly. A scale factor should also be applied to the predicted errors. For ACOS-GOSAT, the predicted error averaged over all TCCON sites is 0.9 ppm, as compared to the actual error of 1.7 ppm, and can be corrected by applying a factor of 1.9 to the reported GOSAT errors. For BESD-SCIAMACHY, the predicted error of 2.3 ppm multiplied by 0.9 agrees with the 2.1 ppm actual error.
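
The two quantities discussed in this paragraph, the station-by-station correlation between actual and predicted error and the overall scale factor, can be sketched as follows. The function names and the per-station numbers are hypothetical. Only the site-averaged values (0.9 and 1.7 ppm for ACOS-GOSAT, 2.3 and 2.1 ppm for BESD-SCIAMACHY) come from the text. Taking the scale factor as the ratio of the site-averaged errors is an assumption that reproduces the quoted factors.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def error_scale_factor(actual, predicted):
    """Assumed definition: ratio of site-averaged actual to predicted error."""
    return mean(actual) / mean(predicted)


# Hypothetical per-station values in ppm, for illustration only.
actual_std = [1.2, 1.6, 1.9, 2.1]      # std of satellite minus TCCON
predicted_err = [0.7, 0.8, 1.0, 1.1]   # mean reported a posteriori error
station_corr = pearson(actual_std, predicted_err)

# Site-averaged values quoted in the text.
gosat_factor = error_scale_factor([1.7], [0.9])
sciamachy_factor = error_scale_factor([2.1], [2.3])
```

With the quoted averages, `gosat_factor` rounds to 1.9 and `sciamachy_factor` to 0.9, matching the factors stated above.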

We now directly compare performance of geometric and dynamic coincidence
criteria and averaging in terms of error. Figure 5 shows SCIAMACHY and GOSAT
standard deviations vs. TCCON for geometric and dynamic coincidence
criteria in the Northern Hemisphere. The stations used were those that had
entries for all comparisons, listed in the Fig. 5 caption. For

Standard deviation of SCIAMACHY and GOSAT minus TCCON for
different coincidence criteria and number of satellite observations
averaged,

Averaging matches of satellite data vs. TCCON at Lamont. As the number averaged increases, the standard deviation vs. TCCON decreases. CT2013 at the satellite vs. CT2013 at TCCON (purple) is used to quantify spatiotemporal mismatch error. The points are fit to Eq. (2) (black). For GOSAT the uncorrected data are also fit (black dashed). The initial guess minus TCCON standard deviation is shown as a green dashed line. We see that in this case, for GOSAT at Lamont, averaging more than about four observations improves over the initial guess.

To test the effects of spatial averaging, we calculate station by station
standard deviations of satellite–TCCON matched pairs as a function of

Bias for 3-month groups for each station, where each station is normalized to have 0 yearly bias. For satellites, stations are included when at least 20 matches are found in each season. Dynamic coincidence criteria are used. The station colors are coded by location: far NH gray, European and Park Falls red/yellow, midlatitude green, SH blue.

Bias for 3-month groups for Southern Hemisphere (left panel), 0–45

We calculate

The northern hemispheric average values, corrected by co-location and TCCON
error, are

The green dashed line in Fig. 6 shows the standard deviation of the
satellite prior vs. TCCON. Although using an optimal constraint will
result in an error lower than the prior error in the absence of systematic
errors, these satellite retrievals of CO

It is important to determine whether there are seasonally dependent biases, as these will impact flux distributions. We look at 3-month periods (DJF, MAM, JJA, SON), with the overall yearly bias at each site subtracted out to isolate the seasonal biases. To get enough comparisons, we use the dynamic criteria for satellite coincidences, as using the geometric criteria cuts down the comparisons with sufficient seasonal coverage to three stations (Park Falls, Lamont, and Wollongong). This is a simple averaging method which will later be compared to seasonal cycle amplitude fit results.
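
The simple averaging method described above can be sketched for a single station as follows. The function name `seasonal_bias` and the input layout are hypothetical. The threshold of 20 matches per season is taken from the figure description. Computing the yearly bias as the mean over all matches (rather than the mean of the four seasonal means) is an assumption.

```python
from collections import defaultdict
from statistics import mean

SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}
SEASONS = ("DJF", "MAM", "JJA", "SON")


def seasonal_bias(matches, min_per_season=20):
    """Seasonal bias for one station with the yearly bias removed.

    matches: list of (month, diff_ppm) pairs, where diff_ppm is a matched
    (test data set minus TCCON) difference. Returns None when any season
    has fewer than min_per_season matches.
    """
    by_season = defaultdict(list)
    for month, diff in matches:
        by_season[SEASON_OF_MONTH[month]].append(diff)

    if any(len(by_season[s]) < min_per_season for s in SEASONS):
        return None

    # Assumption: the yearly bias is the mean over all matches.
    yearly = mean([diff for _month, diff in matches])
    return {s: mean(by_season[s]) - yearly for s in SEASONS}


# Made-up differences in ppm with a 0.5 ppm overall bias.
demo = [(1, 1.5), (2, 1.5), (4, 0.5), (5, 0.5),
        (7, -0.5), (8, -0.5), (10, 0.5), (11, 0.5)]
result = seasonal_bias(demo, min_per_season=2)
```

Because the overall bias is subtracted, the four seasonal values describe only the seasonal departure from each station's mean offset.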

Figure 7 shows the biases for stations that have at least 20 matches in each
season, and Fig. 8 shows the results averaged by SH, 0–45

Seasonal cycle amplitude for different latitudes. Stations included for satellites are Karlsruhe, Orléans, Park Falls, Lamont, Darwin, Reunion (GOSAT), and Wollongong. Stations included for models are Sodankylä, Bialystok, Karlsruhe, Orléans, Park Falls, Lamont, Tsukuba, Saga, Darwin, Reunion, Wollongong, Lauder_120HR, and Lauder_125HR. Bold shows entries with statistically significant differences.

We compare model and satellite X

Seasonal cycle amplitude. The TCCON values are shown by the
circles. The averaging is done over 10

Yearly increases. Each comparison uses matched pairs with TCCON using locations which have at least 2 years of data for comparisons. See Table 3 for stations included. The start date and end date are averaged for the stations in each bin and are shown in the second to last column. Bold text shows one difference larger than predicted errors. The last column shows the average global yearly increase for the time period using Table 6.

The seasonal cycle amplitude is important for estimating sources and sinks
and global distributions. Table 3 shows the seasonal cycle
amplitudes grouped by latitude. As described in Sect. 4, the error is
calculated by several different methods. The error in Table 3 is the root mean
square of the end date choice, bootstrap error, averaging choice, and bin
standard deviation. The co-location error (only relevant for satellites) is
calculated as an average bias in the bin; this bias should be subtracted
from the satellite–TCCON differences to remove the effects of co-location
error. When

The significant findings from Table 3 are as follows. (1) In northern latitudes
(46–53

Findings from Sect. 3.5 which did not reach significance in Table 3 are as follows.
(1) SCIAMACHY should underestimate the seasonal cycle in 45–53

Figure 9 shows a global map of fits of the seasonal cycle amplitude of
SCIAMACHY, GOSAT, CT2013b, and MACC, with TCCON having at least 2 years of
matches shown by circles. This map shows how the results of Table 4 fit
into the global pattern (with the model fields matched to GOSAT locations
and times). Interestingly, the seasonal cycle amplitude varies
longitudinally; this pattern is seen in both satellite data sets and both
models. Since the amplitude is taken from the sampled harmonic, there is no
extrapolation, although the seasonal cycle will be underpredicted at high
latitudes where there are significant data gaps. The model data in Fig. 9 are
co-located with GOSAT, so the same gaps occur in GOSAT and the two
models, except that fits with errors larger than 10 % of the amplitude were
screened out. This map is consistent with Lindqvist et al. (2015), Fig. 8,
which also finds high values in the 45–50

The same fitting program used in the above section, CCGCRV, also calculates a
yearly increase. In Table 4 we compare the fitted yearly increase for TCCON,
which ranges from 1.92 to 2.55 ppm yr

Top: cross correlation between TCCON and SCIAMACHY (top left)
and GOSAT (top right) with matches using dynamic criteria at Park Falls.
The

The left two numerical columns show standard deviation drop within

Reuter et al. (2011, JGR, Table 2) found agreement within the calculated
errors at Park Falls and Darwin for BESD-SCIAMACHY and CT2009 vs. TCCON.
However, older data sets were used for this result. Looking specifically at
Park Falls, we see 1.80

Top: cross correlation examples between TCCON and CT2013 (left) or
MACC (right). Each panel shows the correlation and second-order polynomial fits
(top) and standard deviation (bottom) vs. offset in days of TCCON vs. satellite data. The correlation should be at a maximum and standard
deviation at a minimum at days offset

This section looks at the time offset correlation and standard deviation between the test data sets and TCCON. This checks whether, for example, a seasonal cycle is delayed or ahead of the TCCON seasonal cycle, which has important implications for flux estimates (Keppel-Aleks et al., 2012), whether there are seasonally dependent biases that are affecting the seasonal cycle, and whether the data sets are seeing the same seasonal cycle.
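
The time-offset correlation can be sketched on daily series as follows. This is an illustration under stated assumptions, not the analysis code: the function names, the dictionary layout keyed by day number, and the synthetic sine-wave data are all hypothetical. The sign convention (a positive offset means the test series lags the reference) is a choice made here. The sketch takes the offset of maximum correlation directly and omits the second-order polynomial fit shown in the figures.

```python
from math import pi, sin, sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lag_correlation(ref, test, offsets, min_pairs=3):
    """Correlation of ref[t] with test[t + d] for each offset d in days.

    ref, test: dicts mapping integer day number to a value in ppm.
    Offsets with fewer than min_pairs matched days are skipped.
    """
    result = {}
    for d in offsets:
        pairs = [(ref[t], test[t + d]) for t in ref if (t + d) in test]
        if len(pairs) >= min_pairs:
            x, y = zip(*pairs)
            result[d] = pearson(x, y)
    return result


# Synthetic check: the test series is the reference delayed by 10 days.
ref = {t: sin(2 * pi * t / 365) for t in range(730)}
test = {t: sin(2 * pi * (t - 10) / 365) for t in range(730)}
corr = lag_correlation(ref, test, range(-30, 31))
best_offset = max(corr, key=corr.get)
```

For real match-ups, where the series are noisy and have gaps, the peak of a smooth fit through `corr` would be less sensitive to noise than the raw maximum used here.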

Diurnal variability of CT2013b and MACC13.1 vs. TCCON in JJA
arranged by latitude. TCCON variability and maximum theoretical correlation
are shown, as well as actual correlation and slope for both models. The
slope is the model vs. TCCON fit to a straight line.

To compare seasonal cycle amplitudes, all data sets have 2 ppm yr

Results of the seasonal cycle phase error are tabulated in Table 5. Stations
not shown have either too few match-ups (e.g., Sodankylä) or too little
variability compared to the noise (e.g., Wollongong) to have useful
comparisons. The GOSAT retrieval markedly improves over the prior seasonal
cycle phase vs. TCCON at 12 out of 13 stations. For the six stations that
are not locally influenced (with no

Table 5 also shows the phase differences for the models, which have closer
spatial/temporal matches and lower single-match-up errors. Model–TCCON phase
differences could result from errors in model flux distributions, seasonal
timing, or transport errors. Table 5 shows the phase differences, which vary
from

Izaña will be briefly discussed. As noted in Sect. 2.1, the TCCON station
is located on a small island at 2.37 km above sea level (about 770 hPa),
whereas the MACC and CT2013b models at

Another finding worth noting is the comparisons at Lauder. In 2010 the Lauder_125HR instrument began routine operation, while the Lauder_120HR instrument continued to take TCCON data through to the end of 2010. Both MACC and CT2013b show no seasonal cycle correlation with the 120HR time series at Lauder, but do show correlation with Lauder_125HR time series. We attribute this to the improved precision of the 125HR data, and an increase of the seasonal cycle amplitude in 2011 and 2012 as compared to other years (e.g., compare 2011 vs. 2007). The phasing error found in the CT2013b comparison with the Lauder_125HR may be due to CT2013b not modeling the drivers of the seasonal cycle amplification in 2011 and 2012.

At Bremen and Four Corners, local effects that are not resolved at 3

At the surface, CO

The CT2013b model in general shows more daily variability and higher
correlations, which are in better agreement with TCCON. Since the satellite
observations are coincident

We find standard deviations of 0.9, 0.9, 1.7, and 2.1 ppm vs. TCCON for
CT2013b, MACC, GOSAT, and SCIAMACHY, respectively, with the single target
errors 1.9 and 0.9 times the predicted errors for GOSAT and SCIAMACHY,
respectively. There is a correlation

Regarding the quality of the dynamic relative to the geometric coincidence
criteria, the coincidence error estimated by models is larger for dynamic
coincidences by about 0.3 ppm, as seen in Table 2. However, the coincidence
error (0.3 to 0.4 ppm for geometric criteria and 0.6 to 0.7 ppm for dynamic
criteria) is not the dominant error. As seen in Fig. 5, the dynamic
coincidence criteria average 0.1 ppm higher error for unaveraged satellite
comparisons. This is small compared to a total error of 2.0 and 2.2 ppm,
respectively, for stations in the Northern Hemisphere. With maximum
averaging, as seen in Fig. 5, the errors are lower for dynamic vs. geometric because
the dynamic criteria find more observations to average. Figure 6 shows that at Lamont the average difference between geometric and
dynamic observations is 0.4 ppm for unaveraged satellite observations,
which is higher than average. This error difference reduces to less than 0.2 ppm
when all available observations are averaged, also seen in Fig. 6.
While coincidence error is an important error source, it is not the dominant
error source. Although dynamic coincidence criteria allow the inclusion of
more stations in analyses because of the larger number of coincidences,
comparisons to geometric coincidence results are done when possible. Biases
at individual stations have a year-to-year variability of

We focus on validating aspects of model and satellite data which may be
important for accurate flux estimates and CO

Biases vary by station (see Fig. 3); these station-dependent biases have a
standard deviation of

The seasonal cycle phase is a sensitive indicator of seasonally dependent biases in satellite data as well as issues with model fluxes or transport errors. The GOSAT root mean square (RMS) phase difference vs. TCCON across all sites is 16.1 days for the prior; this improves to 6.8 days for the GOSAT retrieved XCO2. The SCIAMACHY RMS phase difference vs. TCCON across all sites is 16.4 days for the prior; this improves to 13.2 days for the SCIAMACHY retrieved XCO2, reflecting the fact that SCIAMACHY data significantly improved the seasonal cycle phase at just two of the seven TCCON sites.

Model comparisons to TCCON are much less noisy as there are many more
matches. Most NH stations show the expected seasonal drop-off (e.g., see Fig. 11), with the peak correlation near 0 days, and an additional spike within

In comparing CO

Spatially and seasonally dependent biases are obstacles to accurate and better
resolved CO

We estimate the 90 min average TCCON standard deviation error by
calculating the standard deviation of adjacent time points and model
standard deviation of CT2013b and MACC13.1 vs. TCCON by station. These
values are used to estimate theoretical maximum correlations for seasonal
cycle and diurnal correlations using Eq. (3).

Susan Kulawik set the direction of the research and did much of the
analysis. The following authors were involved with discussions of results
with specific knowledge in the listed areas: Debra Wunch (TCCON), Christopher
O'Dell (ACOS-GOSAT), Christian Frankenberg (ACOS-GOSAT), Maximilian Reuter
(BESD-SCIAMACHY), Tomohiro Oda (CarbonTracker), Frédéric Chevallier (MACC),
Vanessa Sherlock (TCCON), Michael Buchwitz (BESD-SCIAMACHY), Greg Osterman
(ACOS-GOSAT), Charles Miller, CO

Funded by NASA ROSES ESDR-ERR 10/10-ESDRERR10-0031, “Estimation of biases and errors of CO2 satellite observations from AIRS, GOSAT, SCIAMACHY, TES, and OCO-2”.

Maximilian Reuter and Michael Buchwitz received funding from ESA (GHG-CCI project of ESA's Climate Change Initiative) and from the University and state of Bremen.

Information about all TCCON sites and their sources of funding can be found
on the TCCON website (

Manvendra K. Dubey is grateful for the funding for monitoring at Four Corners by LANL-LDRD, 20110081DR.

Frédéric Chevallier received funding from the EU H2020 Programme
(grant agreement no. 630080, MACC III). NCEP Reanalysis data used in dynamic
coincidence criteria were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado,
USA, from their website at