Interactive comment on “A simple empirical model estimating atmospheric CO2 background concentrations”

This manuscript describes a simple empirical model, called SECM, which simulates the time and latitude dependence of the column-averaged CO2 dry-air mole fraction, XCO2. This model could be useful as a prior and first guess in retrievals of XCO2. The paper provides a good description of the derivation of the SECM model and of its validation against both the CarbonTracker model, upon which it is based, and the observations from TCCON.


Introduction
Our current knowledge about atmospheric CO2 concentrations and surface fluxes at regional scales over the globe comes primarily from ground-based in situ measurements of air sampling networks and tall towers. These measurements are used by assimilation systems like NOAA's (National Oceanic and Atmospheric Administration) CarbonTracker (Peters et al., 2007, 2010), which model global distributions of atmospheric CO2 mixing ratios and surface fluxes. Therefore, within this publication, we consider CT2010 (CarbonTracker version 2010) as current knowledge and a reasonable a priori estimate for atmospheric CO2 concentrations. However, due to the sparseness of measurements, there are still large uncertainties, especially on the surface fluxes (Stephens et al., 2007). Satellite and ground-based remote sensing measurements of column-average dry-air mole fractions of atmospheric CO2 (XCO2) are promising candidates to significantly reduce these uncertainties in the future (Rayner and O'Brien, 2001; Houweling et al., 2004).
Typically, an XCO2 retrieval's sensitivity can deviate from unity within the atmospheric column. Broadly speaking, the retrieval "sees" only parts of the atmosphere, and the "hidden" parts are complemented with the a priori knowledge. This means one would like to use an a priori that is as realistic as possible, because the retrieval result contains part of the a priori. For the same reason, one would like to use a simple, traceable a priori, so that one can always distinguish between features coming from the measurement and those coming from the a priori.
We present a simple empirical CO2 model (SECM), which addresses these needs but can also be used for various other applications (as discussed later). SECM is essentially an empirical expression whose coefficients are determined by a least-squares fit to CT2010 XCO2 background fields.
Describing the spatial and/or temporal distribution of atmospheric CO2 through curve fitting or regression has a long tradition in the in situ measurement community. For individual measurement sites, e.g. Keeling et al. (1976) described the temporal evolution by a superposition of a trend component and a series of harmonic terms. Komhyr et al. (1985) applied the spline fitting technique to surface-based CO2 measurements of NOAA's flask sampling network in order to analyze the latitudinal distribution and temporal evolution. The work of Masarie and Tans (1995) is the basis for NOAA's GLOBALVIEW (http://www.esrl.noaa.gov/gmd/ccgg/globalview/) product. They developed a spatial and temporal interpolation and extrapolation scheme for NOAA's flask sampling network utilizing individual site records (and climatologies) as reference time series. SECM differs from earlier approaches as it (mainly) aims at global column averages (XCO2) rather than boundary layer concentrations. Additionally, SECM is not based on reference time series but on an empirical expression only.
In the following section, a simple empirical equation estimating the global distribution of XCO2 is given. Afterwards, an equation is presented to also estimate a simplified profile shape (Sect. 3). The corresponding parameterized error covariance matrix is given in Sect. 4. This matrix describes the uncertainty (and correlation) of the a priori and correspondingly influences the weight assigned to the a priori information in an optimal estimation retrieval. In Sect. 5, SECM is validated with TCCON (Total Carbon Column Observing Network) FTS (Fourier transform spectrometer) measurements. In order to demonstrate SECM's usability as a priori information for state-of-the-art satellite and ground-based XCO2 retrievals, we analyze the smoothing error introduced when using SECM instead of CT2010 (Sect. 6).

XCO2
Our first step aims at finding a simple empirical description of the global XCO2 distribution. Given the coarse assumption that the longitudinal dependency of XCO2 can statistically be neglected, we use CT2010 as an estimate for the true XCO2. In order to have a good estimate for background concentrations, we analyze a Pacific north/south transect, which is less influenced by local land sources and sinks (see Fig. 1). We then fit the parameters $a_{00}$-$a_{14}$ of the XCO2 estimation function $X_e$ so that the squared differences to the CT2010 transect are minimized:

$$X_e(t, l) = a_{00} + a_{01}\,t + a_{02}\tanh(a_{03}\,l + a_{04}) + S(t, l). \tag{1}$$

As one can see, $X_e$ depends on the date $t$ (in units of years since 2003) and latitude $l$. Geophysically, $a_{00}$ and $a_{01}$ account for a linear year-to-year increase mainly driven by anthropogenic CO2 emissions. $a_{02}$-$a_{04}$ define the background north/south gradient with typically larger values at northern latitudes due to anthropogenic emissions. $S$ represents the seasonal component modulating the increase and north/south gradient depending on date and latitude:
$$S(t, l) = \left(a_{05}\tanh(a_{06}\,l + a_{07}) + a_{08}\,t\right)\sin(2\pi t + a_{09}\,l) + \left(a_{10}\tanh(a_{11}\,l + a_{12}) + a_{13}\,t\right)\sin(4\pi t + a_{14}\,l). \tag{2}$$

The seasonal component has a 12-month period with a latitude-dependent phase ($a_{09}\,l$) and a 6-month period with a latitude-dependent phase ($a_{14}\,l$) (see e.g. Baldocchi et al., 2001; Chamard et al., 2003). Keeling et al. (1976) showed that only little information is contained in higher harmonics. The amplitudes of both periods are defined by $a_{05}$-$a_{08}$ and $a_{10}$-$a_{13}$, respectively. They can vary with latitude (e.g. due to more vegetation at northern latitudes (Conway et al., 1994)) and time (e.g. due to changing biospheric activity (Keeling et al., 1995)). Figure 1 shows the comparison of CT2010 and synthetic values using Eq. (1).
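The trend, gradient, and seasonal terms described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the coefficient list `A_DEMO` are hypothetical, the values in `A_DEMO` are placeholders rather than the fitted coefficients of Table 1, latitude is assumed to be given in degrees, and the 6-month term is assumed to mirror the form of the 12-month term.

```python
import math

def seasonal(t, lat, a):
    """Seasonal component S(t, l): a 12-month plus a 6-month harmonic.

    t   -- date in years since 2003
    lat -- latitude (assumed to be in degrees)
    a   -- sequence of the 15 coefficients a00..a14
    """
    annual = (a[5] * math.tanh(a[6] * lat + a[7]) + a[8] * t) \
        * math.sin(2.0 * math.pi * t + a[9] * lat)
    # Assumption: the 6-month term has the same structure as the annual one.
    semiannual = (a[10] * math.tanh(a[11] * lat + a[12]) + a[13] * t) \
        * math.sin(4.0 * math.pi * t + a[14] * lat)
    return annual + semiannual

def xco2_estimate(t, lat, a):
    """X_e(t, l): linear trend + north/south gradient + seasonal cycle."""
    trend = a[0] + a[1] * t
    gradient = a[2] * math.tanh(a[3] * lat + a[4])
    return trend + gradient + seasonal(t, lat, a)

# Placeholder coefficients for illustration only (NOT the fitted values).
A_DEMO = [375.0, 2.0, 1.0, 0.05, 0.0,
          2.0, 0.05, -1.0, 0.01, 0.01,
          0.5, 0.05, -1.0, 0.01, 0.01]

if __name__ == "__main__":
    # XCO2 estimate for mid-2008 at 50 degrees north with the demo coefficients.
    print(round(xco2_estimate(5.5, 50.0, A_DEMO), 2))
```

Because the model depends only on date and latitude, a single call per sounding is enough; no gridded input data are needed.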
In order to quantify the quality of the estimates derived with Eq. (1), we used an independent data set of 10 000 CT2010 CO2 profiles, chosen randomly over the globe in the period 2003-2009, from which we calculated XCO2. The standard deviation of the difference between SECM and CT2010, referred to as standard error in the following, amounts to 0.99 ppm in total, 1.15 ppm in the Northern Hemisphere (30° N-90° N), 1.06 ppm in the tropics (30° S-30° N), and 0.92 ppm in the Southern Hemisphere (90° S-30° S). The correlation between both data sets is 0.97.

Profile shape
In the second step we try to find a simple empirical function $x_e$ defining the shape of a mixing ratio profile at given XCO2: (3) This equation estimates the mixing ratio for the pressure (height) $p$ given as a fraction of the surface pressure $p_s$, i.e. $p \in [0, 1]$. The parameters $c_0$ and $c_1$ are determined, similarly to $a_{00}$-$a_{14}$, by least squares fitting to CT2010 mixing ratio profiles (see Table 1). At the pressure $p_t = 0.2$ (also given as a fraction of the surface pressure), the simplified atmosphere is split into two differently handled parts (approximately troposphere and stratosphere). The first two lines of Eq. (3) only account for preserving XCO2, while the profile shape is defined in the last part of Eq. (3). The idea is to have a linear decrease (with decreasing pressure) in the stratosphere ($p \le p_t$). This accounts for slow mixing processes resulting in "older" air (with lower CO2 mixing ratios) towards the top of the atmosphere. Within the troposphere, Eq. (3) also approximates the profile with a linear relation that has a continuous transition to the stratosphere. In contrast to the stratosphere, the slope in the troposphere depends on the seasonal component $S$. This results in increasing values (with height) in the growing season, where lowest values can be expected near the surface. Figure 2 shows the estimated profiles for three examples and corresponding CT2010 profiles. Eq. (3) can reproduce the CT2010 profile shape to some extent, but, especially in the lower boundary layer close to regional sources and sinks, distinct differences between SECM and CT2010 can be observed (see also Fig. 3). Additionally, the profile shapes could be improved if variations of the tropopause height were taken into account. In more complex future versions of SECM, one could realize this by, e.g., introducing additional model parameters accounting for latitudinal and/or seasonal variations of $p_t$.
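The idea of a two-segment linear profile that is continuous at $p_t$ and preserves XCO2 can be illustrated with the following sketch. It is explicitly not Eq. (3): the function name, the two slope arguments, and the way the offset is computed are hypothetical, the link between the tropospheric slope and $S$ via $c_0$ and $c_1$ is not modeled, and XCO2 is assumed to be the plain pressure-weighted mean of the mixing ratio over $p \in [0, 1]$.

```python
def linear_profile_sketch(p, xco2, slope_strat, slope_trop, p_t=0.2):
    """Mixing ratio at normalized pressure p for a two-segment linear profile.

    p           -- pressure as a fraction of surface pressure, 0 (top) to 1 (surface)
    xco2        -- column average the profile has to preserve (ppm)
    slope_strat -- slope above p_t in ppm per unit normalized pressure
                   (positive means lower values towards the top)
    slope_trop  -- slope below p_t in ppm per unit normalized pressure
    p_t         -- pressure level separating the two segments
    """
    # Offset chosen so that the integral of the profile over p in [0, 1]
    # equals xco2 (assumption: uniform weighting in pressure).
    offset = slope_strat * p_t ** 2 / 2.0 - slope_trop * (1.0 - p_t) ** 2 / 2.0
    value_at_pt = xco2 + offset
    if p <= p_t:
        return value_at_pt + slope_strat * (p - p_t)
    return value_at_pt + slope_trop * (p - p_t)

if __name__ == "__main__":
    # Hypothetical slopes: older air aloft, higher values near the surface.
    for p in (0.0, 0.2, 0.6, 1.0):
        print(p, round(linear_profile_sketch(p, 390.0, 5.0, 2.0), 3))
```

Both branches return the same value at `p == p_t`, which gives the continuous transition described in the text, and averaging the profile over a fine pressure grid recovers the prescribed XCO2.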

Error covariance matrix
We again use the randomly chosen data set of 10 000 CT2010 CO2 profiles to derive the error covariance matrix of SECM in comparison to CT2010. Figure 3 shows the error correlation matrix and the corresponding profile of the standard deviation of the difference between SECM and CT2010. We now use a simple correlation model (also used as an example in the textbook of Rodgers, 2000) to parameterize the correlation matrix $C$: (4) Here, $p_i$ and $p_j$ are the normalized pressure values of layers $i$ and $j$, and $\xi$ is the correlation length. Least squares fitting of the measured and parameterized error correlation matrix results in an optimal correlation length of $\xi = 0.30$. The profile of standard deviations $\sigma_i$ was parameterized with (see Fig. 3): (5) The elements of the parameterized error covariance matrix $S$ can now be calculated with: (6) The parameters of Eq. (5) have been chosen in a way that they subjectively fit the profile of standard deviations. Additionally, the chosen parameters ensure that the XCO2 variance which can be calculated from $S$ is consistent with the variance directly calculated from the XCO2 difference between SECM and CT2010.
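The construction of a covariance matrix from a correlation model and a profile of standard deviations can be sketched as follows. The exponential form of the correlation model is an assumption made here for illustration (the text only states that the model depends on $p_i$, $p_j$, and a correlation length $\xi = 0.30$), the function names are hypothetical, and the standard deviations in the example are placeholders, not the parameterization of Eq. (5).

```python
import math

def correlation_matrix(pressures, xi=0.30):
    """Assumed model: correlation decays exponentially with |p_i - p_j| / xi."""
    return [[math.exp(-abs(pi - pj) / xi) for pj in pressures]
            for pi in pressures]

def covariance_matrix(pressures, sigmas, xi=0.30):
    """Covariance from correlation and standard deviations: C_ij * s_i * s_j."""
    corr = correlation_matrix(pressures, xi)
    n = len(pressures)
    return [[corr[i][j] * sigmas[i] * sigmas[j] for j in range(n)]
            for i in range(n)]

def xco2_variance(cov, weights):
    """Variance of the column average, w^T S w, for layer weights summing to 1."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))

if __name__ == "__main__":
    p = [0.1, 0.3, 0.5, 0.7, 0.9]        # normalized layer pressures
    sigma = [1.0, 1.2, 1.5, 1.8, 3.0]    # placeholder standard deviations (ppm)
    w = [0.2] * 5                        # equal pressure weights
    cov = covariance_matrix(p, sigma)
    print(round(math.sqrt(xco2_variance(cov, w)), 3))
```

The last function corresponds to the consistency check mentioned in the text: the XCO2 variance implied by the parameterized matrix can be compared with the variance of the XCO2 differences computed directly.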
It should be kept in mind that the parameterized covariance matrix only describes errors of SECM with respect to CT2010. The total error contains an additional part because of differences between CT2010 and true atmospheric profiles. This means the parameterized covariance matrix can only be a reasonable approximation of the total error if the total error is dominated by the differences between SECM and CT2010. The differences between CarbonTracker and ground-based FTS measurements shown in the publications of, e.g., Reuter et al. (2011), Schneising et al. (2011), and Keppel-Aleks et al. (2012) indicate that this is probably not always the case. Therefore, a more realistic estimate of the total covariance structure could be determined by either deriving one covariance matrix from a comparison of SECM vs. truth or by combining two covariance matrices from a comparison of SECM vs. CT2010 (shown here) and a comparison of CT2010 vs. truth (similar to the work of Eguchi et al., 2010).

Comparison with TCCON
From Sect. 2 we already know that the synthetic XCO2 generated with SECM follows CT2010 quite well statistically. In this section, synthetic XCO2 values are compared with TCCON measurements. From the results we can estimate how well SECM reproduces reality. For each "good" flagged TCCON measurement in the period 2006-2010, we computed a corresponding SECM value (Fig. 4). SECM agrees with TCCON with an average standard error of 1.39 ppm (even though SECM has no diurnal component). This agrees reasonably well with the 0.99 ppm error obtained in comparison with CT2010 (Sect. 2), given the fact that TCCON measurements have a single measurement precision of about 0.6 ppm (Toon et al., 2009). The station-to-station bias (standard deviation of all station biases) amounts to 0.47 ppm, which is comparable to the TCCON accuracy (1σ) of about 0.4 ppm (Wunch et al., 2010).
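A rough plausibility check of these numbers is sketched below. It assumes that the SECM-vs-CT2010 error and the TCCON measurement noise are independent and therefore add in quadrature; this assumption and the helper names are introduced here for illustration and are not stated in the text.

```python
import math

def combine_in_quadrature(*errors_ppm):
    """Total standard error of independent error contributions."""
    return math.sqrt(sum(e * e for e in errors_ppm))

def residual_in_quadrature(total_ppm, *known_ppm):
    """Part of a total error not explained by the known contributions."""
    return math.sqrt(total_ppm ** 2 - sum(e * e for e in known_ppm))

if __name__ == "__main__":
    # 0.99 ppm (SECM vs. CT2010) and 0.6 ppm (TCCON precision) combined:
    print(round(combine_in_quadrature(0.99, 0.6), 2))         # about 1.16 ppm
    # Remainder needed to reach the observed 1.39 ppm:
    print(round(residual_in_quadrature(1.39, 0.99, 0.6), 2))  # about 0.77 ppm
```

Under this assumption, roughly 0.8 ppm of the observed scatter is left for contributions such as differences between CT2010 and the true atmosphere or the missing diurnal component.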
Despite the overall good statistical agreement, one can find some small but systematic deviations at some of the TCCON sites. At Bialystok, Bremen, and Park Falls, SECM shows less pronounced seasonal amplitudes than TCCON, resulting in a too slow spring drawdown and fall increase. At Darwin, the curvature in the SECM time series does not agree well with TCCON. However, the seasonal cycle is less pronounced here, so differences become more apparent. The reasons for these deviations can be found in the simplicity of SECM but also in shortcomings of CT2010 (e.g. Reuter et al., 2011; Schneising et al., 2011; Keppel-Aleks et al., 2012).

Smoothing error
The column averaging kernel (vector) of an XCO2 retrieval describes its height- (or pressure-) dependent sensitivity to the true CO2 mixing ratio. A perfect retrieval would have an averaging kernel equal to unity at every height under every measurement condition. Unfortunately, reality is different and averaging kernels deviate from unity. This results in the so-called smoothing error, which is non-zero if the retrieval's a priori CO2 profile differs from the true profile. In the following, we calculate the smoothing error profile $\Delta x$, which would be introduced when using SECM ($x_{secm}$) instead of CT2010 ($x_{ct}$) as a priori profile: (7)
Here $A$ is the diagonal column averaging kernel matrix, which is defined by the retrieval's column averaging kernel (vector). The column-average smoothing error $\Delta X$, i.e. the XCO2 smoothing error, can be derived by integration of Eq. (7) over all (dry-air) pressure intervals p: (8) Figure 5 shows typical averaging kernels of three state-of-the-art satellite-based full physics retrievals and the TCCON FTS retrieval algorithm (Washenfelder et al., 2006; Wunch et al., 2011). The satellite retrievals are SCIAMACHY BESD (Bremen optimal estimation DOAS, Reuter et al., 2010), GOSAT RemoTeC (developed at SRON, Butz et al., 2009), and GOSAT UOL-FP (University of Leicester Full Physics algorithm, Connor et al., 2008; Bösch et al., 2011). The averaging kernels depend not only on the instrument but also on the retrieval technique. This explains the differences between the averaging kernels of GOSAT RemoTeC and GOSAT UOL-FP. We used the averaging kernels to calculate the smoothing error $\Delta X$, which would have been introduced when using SECM instead of CT2010 as a priori profiles. For this purpose, we analyzed the 10 000 profiles of the randomly chosen data set (used before) and corresponding SECM profiles. The results are summarized in Table 2, which also shows the smoothing error introduced by a constant 380 ppm mixing ratio profile (as benchmark) and a constant mixing ratio profile with XCO2 calculated by SECM. Our results show that it is always better to use SECM profiles instead of a constant 380 ppm profile. Using SECM profiles with height-constant mixing ratios only slightly enhances the smoothing error.
Reuter et al. (2011) estimated the single measurement precision of BESD with 2.5 ppm; they found station-to-station biases having a standard deviation of about 0.8 ppm. This means the smoothing error of 0.83 ppm resulting from constant a priori profiles is comparable to BESD's accuracy. In contrast to this, SECM reduces the smoothing error to 0.17 ppm, which is distinctly lower than BESD's accuracy and precision.
The averaging kernel of the GOSAT UOL-FP retrieval is similar to BESD's averaging kernel. Consequently, the resulting smoothing errors are very similar (0.62 ppm Compared to BESD and UOL-FP, the TCCON FTS retrieval and also the GOSAT RemoTeC retrievals have averaging kernels that are closer to unity. For this reason, the observed improvement by using SECM instead of a constant a priori profile is less pronounced. All corresponding smoothing error values are equal to or less than 0.08 ppm and, therefore, distinctly lower than the FTS instrument's accuracy (0.4 ppm, Wunch et al., 2010) and precision (0.6 ppm, Toon et al., 2009).
However, averaging kernels change, e.g., with the solar zenith angle, so that the effect can be more pronounced under other viewing geometries. In all four cases (Table 2), the SECM-introduced smoothing error is significantly lower than the estimated model transport error of about 0.5 ppm (Houweling et al., 2010). This becomes important when doing surface flux inverse modeling.
Note: (i) Statistically, the smoothing error is not necessarily a systematic error, because the averaging kernel as well as the difference between SECM and truth can vary from measurement to measurement. (ii) The smoothing error becomes less important if XCO2 retrievals are used in an inverse modeling framework that accurately employs sounding-by-sounding averaging kernels within the assimilation process. However, in this case, the retrieval still profits from a well chosen first guess linearization point, which typically results in better convergence behavior.
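The smoothing-error calculation described in this section can be sketched as follows. The sketch assumes the common form $(A - I)(x_{secm} - x_{ct})$ for the profile error with a diagonal $A$, and a pressure-weighted average for the column value; the sign convention, the layer weights `dp`, and all function names are assumptions made for illustration, and the input values in the example are hypothetical.

```python
def smoothing_error_profile(avg_kernel, x_secm, x_ct):
    """Per-layer smoothing error (A - I)(x_secm - x_ct) for a diagonal A.

    avg_kernel -- column averaging kernel vector (diagonal of A)
    x_secm     -- SECM a priori profile (ppm)
    x_ct       -- CT2010 profile (ppm)
    """
    return [(a - 1.0) * (xs - xc)
            for a, xs, xc in zip(avg_kernel, x_secm, x_ct)]

def column_smoothing_error(avg_kernel, x_secm, x_ct, dp):
    """Pressure-weighted column average of the smoothing error profile.

    dp -- (dry-air) pressure thickness of each layer, in any consistent unit
    """
    dx = smoothing_error_profile(avg_kernel, x_secm, x_ct)
    return sum(d * w for d, w in zip(dx, dp)) / sum(dp)

if __name__ == "__main__":
    # Hypothetical three-layer example (top to surface).
    kernel = [0.6, 0.9, 1.1]
    secm = [386.0, 389.0, 390.0]
    ct = [385.0, 389.5, 392.0]
    layers = [0.2, 0.4, 0.4]
    print(round(column_smoothing_error(kernel, secm, ct, layers), 3))
```

An averaging kernel of exactly one in every layer yields zero smoothing error regardless of the a priori profile, which matches the "perfect retrieval" case described above.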

Conclusions
We presented a simple empirical model (SECM), which can be used to simulate atmospheric CO2 background concentrations in the form of mixing ratio profiles and XCO2. We assumed that CT2010 represents our current knowledge on the global distribution of XCO2, which can be gained (mainly) from surface-based flask measurements. Therefore, we used CT2010 to determine the free parameters of the proposed empirical model. SECM is able to reproduce CT2010 with a standard error of 0.99 ppm and a correlation of 0.97. In other words, a simple empirical equation (depending only on date and latitude) explains more than 94 % of CT2010's variability within the analyzed time period (including, e.g., CO2 weather), i.e. of our current knowledge on atmospheric CO2 concentrations.
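The step from the correlation to the explained variability can be written out as below, assuming that "explained variability" refers to the squared correlation coefficient $r^2$ between SECM and CT2010 XCO2:

```latex
r = 0.97
\quad\Longrightarrow\quad
r^2 = 0.97^2 = 0.9409 > 0.94
```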
The atmospheric CO2 profiles simulated by SECM have a linear pressure dependency with different slopes in troposphere and stratosphere. The standard error profile has values between 1 ppm and 2 ppm over large parts of the atmosphere, which means that SECM is able to roughly reproduce the profile shape. Larger deviations are found especially near the surface, where the influence of local sources and sinks is largest. In addition to SECM estimating XCO2 and CO2 profiles, we proposed a simple parameterization of the error covariance matrix, so that SECM can be used as a priori knowledge in an optimal estimation framework without additional external information.
We compared SECM XCO2 not only with CT2010 but also with TCCON FTS measurements. The average standard error of 1.39 ppm agrees reasonably well with the 0.99 ppm found when comparing SECM with CT2010. The standard deviation of all station-to-station biases amounts to 0.47 ppm, which is consistent with TCCON's accuracy of about 0.4 ppm.
The TCCON comparison goes one year beyond the fitting period 2003-2009. As we found no obvious problems in 2010, we conclude that SECM is also (at least to some extent) able to extrapolate into the future. In the case of extrapolating into a more distant future or past, it would be advantageous to replace the linear increase of Eq. (1) by an exponential or polynomial term. This, however, could require a longer fitting period to produce stable results. Additionally, one could think of dropping the time dependency of the seasonal amplitude ($a_{08}$ and $a_{13}$) when extrapolating from a short fitting period.
We analyzed the smoothing error introduced by using SECM instead of CT2010 in order to assess the usability of SECM as a priori profiles. For this purpose, we used typical averaging kernels of four state-of-the-art XCO2 retrieval algorithms. Our analysis shows two things: (i) using SECM instead of constant a priori profiles reduces the smoothing error; (ii) the smoothing error due to SECM is distinctly lower than the expected retrieval error and typical model transport errors. Therefore, one can conclude that SECM is well suited to be used as a priori information for the analyzed (or comparable) retrieval techniques. Using SECM also as first guess linearization point furthermore has the potential to enhance the convergence behavior of an iterative retrieval.
Of course, SECM cannot compete with physics-based models like CarbonTracker, because it is only a coarse statistical description of the past. Under no circumstances will it be able to capture any event deviating from this statistic, i.e. it is not possible to learn anything new from SECM. However, SECM has some distinct benefits: (i) SECM is extremely simple and can be implemented with minimal effort; (ii) SECM results are easily reproducible without the need for significant disk space or computing power; and (iii) SECM is always available.
Beyond the application as a priori information, SECM can be used for several other applications. Due to its availability, SECM can be used in a near real-time environment or for observing system simulation experiments, especially for future satellite missions (e.g. Bovensmann et al., 2010). Its accuracy meets the requirements to be used as XCO2 background in "CO2 proxy" methods for XCH4 retrievals (e.g. Frankenberg et al., 2005; Schneising et al., 2009).

Fig. 5. Typical column averaging kernels of four different XCO2 retrieval systems: FTS TCCON, SCIAMACHY BESD, GOSAT RemoTeC, and GOSAT UOL-FP. The FTS TCCON column averaging kernel is typical for a solar zenith angle of 50°. The averaging kernels of SCIAMACHY BESD and GOSAT RemoTeC represent global mean averaging kernels for August 2009. The averaging kernel of GOSAT UOL-FP is the global mean averaging kernel for September 2009.