Validation of five years (2003–2007) of SCIAMACHY CO total column measurements using ground-based spectrometer observations

This paper presents a validation study of SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) carbon monoxide (CO) total column measurements from the Iterative Maximum Likelihood Method (IMLM) algorithm using ground-based spectrometer observations from twenty surface stations for the five-year period 2003–2007. Overall we find good agreement between SCIAMACHY and ground-based observations for both mean values and seasonal variations. For high-latitude Northern Hemisphere stations absolute differences between SCIAMACHY and ground-based measurements are close to or fall within the SCIAMACHY CO 2σ precision of 0.2 × 10^18 molecules/cm^2 (∼10%), indicating that SCIAMACHY can observe CO accurately at high Northern Hemisphere latitudes. For Northern Hemisphere mid-latitude stations the validation is complicated by the vicinity of emission sources for almost all stations, leading to higher ground-based measurements compared to SCIAMACHY CO within its typical sampling area of 8° × 8°. Comparisons with Northern Hemisphere mountain stations are hampered by elevation effects. After accounting for these effects, the validation provides satisfactory results. At Southern Hemisphere mid- to high latitudes SCIAMACHY is systematically lower than the ground-based measurements for 2003 and 2004, but for 2005 and later years the differences between SCIAMACHY and ground-based measurements fall within the SCIAMACHY precision. The 2003–2004 bias is consistent with previously reported results, although its origin remains under investigation. No other systematic spatial or temporal biases could be identified based on the validation presented in this paper.
Validation results are robust with regard to the choices of the instrument-noise error filter, sampling area, and time averaging required for the validation of SCIAMACHY CO total column measurements. Finally, our results show that the spatial coverage of the ground-based measurements available for the validation of the 2003–2007 SCIAMACHY CO columns is sub-optimal for validation purposes, and that the recent and ongoing expansion of the ground-based network by carefully selecting new locations may be very beneficial for validation efforts for SCIAMACHY CO and other satellite trace gas measurements.
Correspondence to: A. T. J. de Laat (laatdej@knmi.nl)
Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
The SCIAMACHY instrument (SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY; launched March 2002) onboard the ENVISAT satellite (Bovensmann et al., 1999) has provided over five years of carbon monoxide (CO) data based on reflected sunlight measurements in the short-wave infrared around 2.3 µm.
Validation of SCIAMACHY CO with Ground Based Spectrometer (GBS) observations is complicated by the need for spatio-temporal averaging to obtain an acceptable precision of the SCIAMACHY CO columns. Furthermore, for most geolocations SCIAMACHY measures only once every six days. The irregular temporal sampling of the ground-based measurements and the occurrence of clouds significantly reduce the number of truly collocated measurements. Combined with the sparse GBS network, this has so far limited the validation of SCIAMACHY CO observations with ground-based measurements. Dils et al. (2006) presented a first validation using 11 GBS stations for one year (2003) of SCIAMACHY CO columns. Their results clearly showed that validation with GBS observations was difficult. They concluded that the data set used was too small to make an honest assessment of whether monthly mean values over their collocation grid of 2.5° × 10° or 5° × 10° latitude-longitude reach the target precision of 10% for CO. Furthermore, they found that the SCIAMACHY measurements for 2003 "exhibited clear flaws".
Other validation studies used GBS observations at a single location on a mountain top (Sussmann and Buchwitz, 2005) or GBS measurements from a measurement campaign on board of a ship (Warneke et al., 2005).
Results from the Iterative Maximum Likelihood Method (IMLM) retrieval algorithm, developed at the Netherlands Institute for Space Research (SRON), were also used in the Dils et al. (2006) study. However, this algorithm has since been improved, and the observational record now covers five years (2003–2007), including observations over both land and oceans (Gloudemans et al., 2009). The ocean observations considerably improve the spatial coverage of SCIAMACHY CO observations and enhance the possibilities for validation of SCIAMACHY CO total column measurements with GBS measurements, as a number of GBS stations are located on islands or close to the sea.
In this paper we present a validation of five years (2003–2007) of SCIAMACHY CO observations from the IMLM algorithm using twenty GBS stations. In previous studies we used the TM4 chemistry-transport model to quantify various effects that hamper the validation (de Laat et al., 2007, 2010; Gloudemans et al., 2009). This approach is also used in this study. SCIAMACHY measurements do not provide any information about the vertical distribution of CO, making the GBS CO total column measurements the obvious observational data for validation, rather than the aircraft data and vertical CO profiles used for the validation of infrared satellite measurements of CO.
This paper is organized as follows: Sect. 2 briefly describes the IMLM retrieval algorithm, GBS measurements and the TM4 model. Section 3 shows the GBS observations and describes the choice of the sampling area. Section 4 presents the validation of the SCIAMACHY CO measurements using the GBS observations, and in Sect. 5 we investigate the sensitivity of the validation results to the sampling area size, instrument-noise error filter, and the target precision. Section 6 ends the paper with a summary and conclusions.

SCIAMACHY CO
For this study we use SCIAMACHY CO total columns retrieved with the IMLM algorithm version 7.4 in the short-wave infrared wavelength range from 2324.5 to 2337.9 nm (Gloudemans et al., 2008, 2009). This spectral region is sensitive to the whole column, with almost uniform sensitivity from 200 hPa down to the surface (Gloudemans et al., 2008). In this paper, we assume that the SCIAMACHY CO total column is the real total column. De Laat et al. (2010) estimated that the effects of the SCIAMACHY CO a priori and averaging kernel were of the order of only a few percent.
[Figure caption fragment: "... Table 1. The color coding of the stations is the same as in Fig. 2."]
Single SCIAMACHY CO measurements have large instrument-noise errors, typically of the order of 10–100% of the total CO column value (de Laat et al., 2007). Hence, obtaining valuable information about CO from SCIAMACHY requires averaging multiple measurements and weighting them with their corresponding instrument-noise errors. Several studies have shown that reducing the instrument-noise error by averaging multiple measurements yields useful information about CO (de Laat et al., 2006; Gloudemans et al., 2006). De Laat et al. (2007) estimated that the SCIAMACHY CO precision is approximately 1 × 10^17 molecules/cm^2. In this study we use the averaging method introduced in Gloudemans et al. (2009), where observations for a selected area are averaged in time until a given threshold instrument-noise error is reached. The standard threshold instrument-noise error used in this paper is 1 × 10^17 molecules/cm^2. We thus construct a time series of time-area average SCIAMACHY observations for which the time intervals vary in length, but the averages all have the same instrument-noise error (rather than averages for constant time intervals with varying instrument-noise errors). This time series is then compared to the GBS observations. If multiple GBS observations fall within a SCIAMACHY CO time interval, they are averaged arithmetically. We vary neither the area size nor the threshold instrument-noise error during the averaging procedure. However, we test the sensitivity of our results to the choices of area size and instrument-noise error later on. Finally, we use SCIAMACHY CO observations over land as well as ocean measurements over low-altitude clouds between the surface and 800 hPa, using the same selection criteria as in Gloudemans et al. (2009) and de Laat et al. (2010). This greatly improves the spatio-temporal coverage, as discussed in these papers.
However, using measurements over low altitude clouds means that only the partial CO column above the cloud is observed. The effect this has on the validation is quantified by estimating the below-cloud CO partial column from TM4 model results.
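The below-cloud correction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name and the numerical values are hypothetical, and the TM4 below-cloud partial column is assumed to be available as a plain number.

```python
def fill_below_cloud(above_cloud_column, model_below_cloud_column):
    """Return the filled total column and the model share of that total.

    Both inputs are CO partial columns in molecules/cm^2. The model value
    stands in for the TM4 below-cloud estimate (a hypothetical input here).
    """
    total = above_cloud_column + model_below_cloud_column
    model_fraction = model_below_cloud_column / total
    return total, model_fraction


# Hypothetical numbers, chosen only to illustrate the arithmetic.
total, fraction = fill_below_cloud(1.6e18, 0.4e18)
print(f"filled column: {total:.2e} molecules/cm^2, model share: {fraction:.0%}")
```

The model share returned here corresponds to the kind of quantity one would inspect to judge how much of a filled column comes from the model rather than from the measurement.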

Ground-based data
The ground-based CO observations used in this study are collected at twenty locations worldwide, mainly from Fourier Transform Spectrometers (Fig. 1). The locations and altitudes of the stations are summarized in Table 1. The GBS observations represent daytime solar absorption measurements under clear sky conditions. For most stations CO columns from thermal infrared spectra around 4.7 µm have been used, except for Darwin, for which the short-wave infrared CO spectral features around 2.3 µm are used, the same spectral window as used for the SCIAMACHY CO retrievals. For the two Russian stations Zvenigorod and St. Petersburg, CO total column amounts are derived from direct solar IR spectra in the 4.7 µm CO absorption band using grating spectrometers (spectral resolution ∼0.2–0.4 cm^-1) equipped with a sun-tracking system (Dianov-Klokov, 1984; Dianov-Klokov et al., 1989; Mironenkov et al., 1996; Makarova et al., 2004). For ten of the stations data have been obtained from the public database of the Network for the Detection of Atmospheric Composition Change (NDACC; http://www.ndacc.org) (cf. [...]). For [...] station the measurement data used here were obtained directly from the University of Liège. Data from the Réunion station were provided by the Belgian Institute for Space Aeronomy (BIRA) (Senten et al., 2008; Duflot et al., 2010), and observations for Darwin have been kindly provided by the University of Wollongong (Paton-Walsh et al., 2010). Observations from Garmisch-Partenkirchen were provided by the Karlsruhe Institute of Technology (IMK-IFU) in Garmisch-Partenkirchen (Borsdorff and Sussmann, 2009). Both Darwin and Garmisch-Partenkirchen are official TCCON sites (Total Carbon Column Observing Network; Toon et al., 2009). Measurements from two Japanese stations, Rikubetsu and Moshiri, were provided by the Solar-Terrestrial Environment Laboratory (STEL) of Nagoya University in Japan.
A description of both Japanese sites and analysis of the measured CO columns for the period 1997-2005 can be found in Nagahama and Suzuki (2007).
Typical reported errors for GBS columns are 5% or less, although this varies from station to station. Nevertheless, these errors are considerably smaller than the errors of single SCIAMACHY CO column measurements and also smaller than the estimated SCIAMACHY precision; hence we ignore GBS errors for the remainder of the paper.
The effect of the GBS averaging kernels, also referred to as the "smoothing error", is small. Barret et al. (2003) report a smoothing error of 0.6% for the Jungfraujoch measurements, while Senten et al. (2008) report a smoothing error of 0.3% for La Réunion. Both studies use infrared measurements. Paton-Walsh et al. (2005) report a smoothing error of 5.8% for Darwin for near-infrared measurements around 2.3 µm, similar to the smoothing error reported for SCIAMACHY (de Laat et al., 2010), which observes at the same wavelengths.

Global chemistry-transport model TM4
We use the TM4 chemistry-transport model for the years 2003 to 2007 to quantify various effects that are important for the comparison of SCIAMACHY and GBS measurements. This model was also used in de Laat et al. (2007, 2010) and Gloudemans et al. (2009) and is described in more detail in Meirink et al. (2006). The horizontal resolution of TM4 is 3° × 2° longitude-latitude. Vertically, 25 levels are used for years prior to 2006 and 34 levels from 2006 onwards, because of a change in the number of vertical layers (from 60 to 91) used by the European Centre for Medium-Range Weather Forecasts (ECMWF) for their operational data. Meteorological ECMWF analysis input fields used in TM4 are pre-processed as described in Bregman et al. (2003). Actual biomass burning emission estimates are taken from the Global Fire Emission Database (GFED), version 2 (van der [...]). Anthropogenic emissions are based on the EDGAR v3 emission database (van Aardenne et al., 2001) and are modified to be representative of the year 2000, with a total of 331 Tg CO/year for fossil fuels. Oceanic and natural emissions are 40 and 75 Tg CO/year, respectively, as described in Houweling et al. (1998). Total biogenic emissions are 94 Tg CO/year.
De Laat et al. (2007, 2009) presented a validation of this model simulation for two years of observations using in situ surface CO measurements from the Global Monitoring Division (GMD) database. The results showed that in the Southern Hemisphere (SH) average CO surface concentrations agree very well, whereas in the Northern Hemisphere (NH) the model underestimates surface CO by 10–20% for nearly all stations. The agreement was better for background stations than for stations close to large emission sources, and the seasonal cycle of remote locations was closely matched by the model. These results suggest that the observed spatiotemporal CO variability is well reproduced by the model but that the model results contain a widespread Northern Hemisphere bias. This finding is consistent with Shindell et al. (2006), who drew similar conclusions based on a multimodel analysis of CO using both satellite and in situ measurements, and who attributed this bias to underestimated East Asian emissions in the TM4 model.

GBS columns and seasonal cycles
[...] with a wintertime maximum and summertime minimum related to photochemical destruction by OH, which is strong during boreal summer and weak during boreal winter. This leads to accumulation of CO in the Northern Hemisphere during autumn and winter and a strong decrease of CO during spring. A detailed discussion of the CO seasonality as seen in GBS observations can be found in Yurganov et al. (2005).
The largest amplitudes occur for the Russian locations in Zvenigorod and St. Petersburg and the Japanese stations Moshiri and Rikubetsu (Fig. 2b). The Russian stations can be affected by nearby forest and peat fires and are close to isolated major industrial areas (Yurganov et al., 2008). The Japanese stations are located under the outflow of East Asian pollution (Koike et al., 2006) and can also be influenced by Siberian fires (Nagahama and Suzuki, 2007). The Bremen, Garmisch-Partenkirchen and Egbert stations show seasonal cycles more similar to the Northern Hemisphere high-latitude stations in Fig. 2b, where the variability is dominated by photochemical destruction by OH. Figure 2c shows seasonal cycles of Northern Hemisphere mountain stations. The CO columns and amplitudes of the seasonal cycles are smaller than for the Northern Hemisphere high-latitude stations, which is related to the missing lowest 2–3 km of the troposphere where large emissions and photochemical destruction occur. Note that for Kitt Peak no observations are available beyond 2005 (Fig. 2c). Figure 2d shows the Southern Hemisphere stations, where the seasonal cycle is shifted by 6 months compared to the Northern Hemisphere stations. Both Arrival Heights and Lauder are remote from CO sources and show little variation on short timescales. On the other hand, Wollongong and Darwin, Australia, are located close to emission sources and show large increases in CO related to nearby forest fires (Paton-Walsh et al., 2005). For Réunion limited observations are available, but the increase in CO during the tropical biomass burning season in the southern half of Africa is nevertheless present, when Réunion is located under the outflow of African biomass burning plumes (Duflot et al., 2010; Senten et al., 2008).

The area for comparison
Because of the large SCIAMACHY CO instrument-noise errors, a direct comparison of individual SCIAMACHY measurements with GBS CO total columns is not meaningful. As a result, spatial and/or temporal averaging of the SCIAMACHY CO columns is required to reduce the instrument-noise error. As explained in Sect. 2.1, we use spatiotemporal averaging: for a selected area around the ground-based station, the so-called sampling area, we average in time until a threshold instrument-noise error of 1 × 10^17 molecules/cm^2 is reached. A weighted average is computed using the SCIAMACHY instrument-noise errors as the scaling factor (cf. de Laat et al., 2007).
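The averaging procedure can be illustrated with a short sketch. It is not the authors' code: the function names and the input layout are hypothetical, and the weights are assumed to be inverse-variance weights (1/error^2), which is one common way to use noise errors as a scaling factor and is not confirmed by the text.

```python
import math

THRESHOLD = 1.0e17  # molecules/cm^2, the standard threshold quoted in the text


def weighted_mean(values, errors):
    """Noise-weighted mean and its error, assuming inverse-variance weights."""
    weights = [1.0 / (e * e) for e in errors]
    weight_sum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / weight_sum
    return mean, math.sqrt(1.0 / weight_sum)


def adaptive_time_averages(daily_obs, threshold=THRESHOLD):
    """Add one day at a time until the error of the average reaches the threshold.

    daily_obs: list of (day, [(column, noise_error), ...]) sorted by day, already
    restricted to the sampling area. Returns (first_day, last_day, mean, error)
    tuples. A trailing interval that never reaches the threshold is dropped
    (a choice made for this sketch).
    """
    averages = []
    values, errors, first_day = [], [], None
    for day, measurements in daily_obs:
        if not measurements:
            continue
        if first_day is None:
            first_day = day
        for column, noise_error in measurements:
            values.append(column)
            errors.append(noise_error)
        mean, error = weighted_mean(values, errors)
        if error <= threshold:
            averages.append((first_day, day, mean, error))
            values, errors, first_day = [], [], None
    return averages


# Hypothetical input: one measurement per day with a 1.5e17 noise error.
obs = [(day, [(2.0e18, 1.5e17)]) for day in range(1, 7)]
for first, last, mean, error in adaptive_time_averages(obs):
    print(first, last, f"{mean:.3e}", f"{error:.2e}")
```

With this scheme the intervals vary in length while every average has an error at or below the threshold, which is the property the text relies on when comparing with the GBS observations.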
Two considerations are important for deciding on an optimal sampling area. The larger the sampling area, the more SCIAMACHY CO measurements are available, and thus the shorter the time interval needed for each average. However, the larger the sampling area, the less representative the corresponding SCIAMACHY CO column may be of the true local CO column derived from GBS measurements. There is thus a trade-off between the sampling area size and the time resolution. We calculated three statistics of the SCIAMACHY-GBS comparison for square sampling areas ranging from 1° × 1° to 20° × 20° latitude and longitude: the mean bias, the root-mean-square (rms) difference (a measure of the representativeness of the selected area and the scatter in the measurements), and the total number of SCIAMACHY measurements used for the comparison. Figure 3a shows the mean SCIAMACHY-GBS difference as a function of sampling area size for each GBS location and Fig. 3b shows the root-mean-square of the differences between SCIAMACHY and GBS CO total columns. The largest change in the absolute and rms difference occurs for small sampling area sizes. Beyond a sampling area size of 8° × 8° the differences remain nearly constant. This indicates that with increasing sampling area size the SCIAMACHY CO columns become less representative of the GBS locations. Figure 3c shows that the number of SCIAMACHY measurements used in the comparison increases with increasing sampling area size, as expected. For the best SCIAMACHY-GBS comparison one would like the rms differences to be as small as possible, i.e. a small sampling area (Fig. 3b), yet the number of observations as large as possible, i.e. a large sampling area (Fig. 3c). Hence, the deciding factor is the change of the mean difference as a function of the sampling area (Fig. 3a).
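The three statistics can be computed along the following lines. This is a sketch under stated assumptions: the sampling area is taken to be a latitude-longitude box centred on the station, the rms difference is taken as the root-mean-square of the raw SCIAMACHY minus GBS differences, and all names are hypothetical.

```python
import math


def in_sampling_area(lat, lon, station_lat, station_lon, size_deg):
    """True if (lat, lon) lies in a size_deg x size_deg box centred on the station."""
    half = size_deg / 2.0
    # Wrap the longitude difference into [-180, 180) so boxes can cross the date line.
    dlon = (lon - station_lon + 180.0) % 360.0 - 180.0
    return abs(lat - station_lat) <= half and abs(dlon) <= half


def comparison_statistics(pairs):
    """Mean bias, rms difference and count for (sciamachy, gbs) column pairs."""
    differences = [sat - gbs for sat, gbs in pairs]
    n = len(differences)
    bias = sum(differences) / n
    rms = math.sqrt(sum(d * d for d in differences) / n)
    return bias, rms, n


# Hypothetical collocated averages in units of 10^18 molecules/cm^2.
pairs = [(1.9, 2.0), (2.2, 2.1), (1.8, 2.0)]
print(comparison_statistics(pairs))
```

Repeating the selection and the statistics for box sizes from 1° to 20° would give curves of the kind shown in Fig. 3a to 3c.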
Since beyond a sampling area of 8° × 8° the mean differences do not change much, we start by investigating results for the smallest area size beyond which the differences are more or less constant, which is a square area of 8° × 8° around the GBS location. However, because of the weak dependence of rms differences on sampling area size we will later on also discuss validation results for larger sampling area sizes.
Figure 4 shows a scatter plot of all GBS CO total columns and corresponding SCIAMACHY CO total columns for the 8° × 8° areas, using the method described above. The scatter plot shows that the observations are close to the 1:1 line, but there is considerable scatter and there are clear differences between locations. In the next section the differences for each station are discussed in detail.

Southern Hemisphere locations
Figure 5a shows the time series of GBS and SCIAMACHY CO total columns for the Southern Hemisphere locations Arrival Heights, Lauder, Wollongong, Darwin, and Réunion. The corresponding statistics can be found in Table 2. For these stations the 8° × 8° sampling area includes many measurements over clouded ocean scenes. For these measurements the part of the column below the cloud is estimated based on TM4 model results and is added to the measured SCIAMACHY column above the cloud (cf. Sect. 2.1; Gloudemans et al., 2009; de Laat et al., 2010). For Arrival Heights we only took SCIAMACHY observations over oceans, because over land SCIAMACHY observes mainly over the high-altitude interior of Antarctica, causing an altitude difference. The SCIAMACHY and GBS observations show similar seasonal cycles, but SCIAMACHY underestimates the CO total columns on average by 0.1–0.49 × 10^18 molecules/cm^2 [...]
[Figure caption fragment: "... Fig. 4. For SCIAMACHY ocean measurements the estimated TM4 column below the cloud has been added to the SCIAMACHY CO partial column."]
[...] appears to slightly underestimate CO in 2006. Nevertheless, Table 2 shows that the average differences for both Réunion and Darwin are small and close to the estimated SCIAMACHY precision.
Differences for Wollongong are larger than for the other stations, but Wollongong is affected by local forest fires and orography that increase local CO amounts. As a result, Wollongong GBS CO total columns are less representative of the surrounding areas as measured by SCIAMACHY than the Arrival Heights and Lauder CO total columns.
For year-to-year changes, the comparison at Arrival Heights and Lauder shows that the bias is not constant over time (Fig. 5e). Annual mean differences between SCIAMACHY and GBS for 2003 and 2004 are −0.34 × 10^18 and −0.43 × 10^18 molecules/cm^2 for Arrival Heights and −0.28 × 10^18 and −0.31 × 10^18 molecules/cm^2 for Lauder, respectively. For 2005 to 2007 the differences are −0.12 × 10^18, −0.02 × 10^18 and 0.07 × 10^18 molecules/cm^2 for Arrival Heights and −0.11 × 10^18, −0.12 × 10^18 and 0.05 × 10^18 molecules/cm^2 for Lauder, respectively. These differences are considerably smaller than the differences for the years 2003 and 2004 and are close to or within the estimated SCIAMACHY precision. Similar behavior is not found for other locations. The origin of the SCIAMACHY Southern Hemisphere middle and high latitude bias is currently under investigation.
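As a quick check of the numbers quoted above, the annual mean differences can be compared with the 2σ precision of 0.2 × 10^18 molecules/cm^2 given in the abstract. The sketch assumes that the three later values correspond to 2005, 2006 and 2007 in that order; the function name is hypothetical.

```python
PRECISION_2SIGMA = 0.2  # in 10^18 molecules/cm^2, as quoted in the abstract

# Annual mean SCIAMACHY minus GBS differences in 10^18 molecules/cm^2 (from the text).
annual_mean_difference = {
    "Arrival Heights": {2003: -0.34, 2004: -0.43, 2005: -0.12, 2006: -0.02, 2007: 0.07},
    "Lauder": {2003: -0.28, 2004: -0.31, 2005: -0.11, 2006: -0.12, 2007: 0.05},
}


def years_outside_precision(differences, precision=PRECISION_2SIGMA):
    """Return the years whose absolute difference exceeds the given precision."""
    return sorted(year for year, diff in differences.items() if abs(diff) > precision)


for station, differences in annual_mean_difference.items():
    print(station, years_outside_precision(differences))
```

For both stations only 2003 and 2004 exceed the 2σ value, in line with the statement that the later years fall close to or within the SCIAMACHY precision.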
The rms differences are larger than expected based on the instrument-noise error. This may to some extent be related to representation differences, i.e. SCIAMACHY averages are representative of a larger area than the GBS averages. As a result, it can be expected that for larger comparison areas the rms differences increase, and that larger rms differences occur for GBS locations that are more affected by local emissions. Stations affected by local emissions, like the European continental stations or the Australian stations Darwin and Wollongong, have larger rms differences than more remote high-latitude European and Southern Hemisphere stations like Lauder and Arrival Heights (Fig. 3b and Table 2).

Northern Hemisphere mountain stations
For the mountain stations Jungfraujoch and Zugspitze SCIAMACHY columns are on average 0.76 × 10^18 and 0.62 × 10^18 molecules/cm^2 larger than GBS CO columns. However, the comparison for Garmisch-Partenkirchen (see also Table 2, Fig. 5c), located at 745 m above sea level at the foot of the Zugspitze mountain, shows no significant bias. The Jungfraujoch and Zugspitze measurement sites are located at approximately 3600 and 3000 m altitude, respectively. The SCIAMACHY measurements are more representative of the low-altitude area north of the Alps, as the average elevation within the 8° × 8° sampling area around Zugspitze and Jungfraujoch is only about 500 m, which is comparable to the altitude of Garmisch-Partenkirchen. The mean difference in CO total columns between collocated Garmisch-Partenkirchen and Zugspitze measurements is 0.64 × 10^18 molecules/cm^2, which is nearly equal to the SCIAMACHY-Zugspitze difference. The larger bias for Jungfraujoch compared to Zugspitze is related to the higher altitude of Jungfraujoch: the CO columns for Jungfraujoch are clearly lower than those for Zugspitze (cf. Fig. 2c), whereas the SCIAMACHY measurements within the comparison areas around both stations are comparable. Note that restricting the comparison to SCIAMACHY observations over the Alps with ground scene altitudes similar to that of Zugspitze or Jungfraujoch is not possible due to insufficient SCIAMACHY collocations.
Kitt Peak (Arizona, USA) is located at 2100 m altitude, surrounded by a high dry plateau remote from large CO sources. The SCIAMACHY 8° × 8° sampling area has an average altitude of 1000 m, but since the SCIAMACHY observations are weighted with the instrument-noise error, which is smaller for dry locations because of the higher surface reflectance, and since dry locations have more cloud-free observations, the effective altitude of the SCIAMACHY observations within the sampling area is about 1500 m, close to that of the Kitt Peak station. Hence, the mean SCIAMACHY CO column should be representative of the Kitt Peak measurements. Indeed, for Kitt Peak the differences between SCIAMACHY and GBS (0.02 × 10^18 molecules/cm^2; 1%) are well within the precision of the SCIAMACHY data.
For Izaña differences also fall within the precision of the SCIAMACHY data: 0.09 × 10^18 molecules/cm^2 (6%), because only SCIAMACHY observations over clouds with cloud top heights comparable to the Izaña station height have been taken into account, and the location is remote from any large CO emission regions.
Mauna Loa shows a larger difference of 0.21 × 10 18 molecules/cm 2 (20%) between SCIAMACHY and GBS, but this is still relatively small (twice the estimated SCIAMACHY precision). Given the limited number of correlative observations available for Mauna Loa (5) this larger difference may be a spurious result.
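The notion of an effective altitude used in the Kitt Peak discussion above can be illustrated with a small sketch. It assumes that the effective altitude is the noise-weighted mean of the ground scene altitudes, with inverse-variance weights; both the weighting and the sample values are assumptions made for illustration and are not taken from the paper.

```python
def effective_altitude(altitudes_m, noise_errors):
    """Noise-weighted mean altitude, assuming inverse-variance weights."""
    weights = [1.0 / (e * e) for e in noise_errors]
    return sum(w * z for w, z in zip(weights, altitudes_m)) / sum(weights)


# Hypothetical scenes: a low-lying scene with a large noise error and a
# high, dry scene with a smaller noise error (higher surface reflectance).
altitudes = [1000.0, 2000.0]   # metres
errors = [2.0e17, 1.0e17]      # molecules/cm^2
print(effective_altitude(altitudes, errors))
```

Because low-noise scenes dominate the weights, the effective altitude shifts towards the dry, elevated scenes, which is the mechanism invoked in the text.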

Northern Hemisphere mid-latitude low altitude stations
In this section we analyze observations from the low altitude stations Zvenigorod (near Moscow), St. Petersburg, Egbert (Canada), Garmisch-Partenkirchen and Bremen (Germany), Moshiri and Rikubetsu (Japan). For Zvenigorod, St. Petersburg, and Egbert GBS columns are larger than SCIAMACHY columns by 0.53 × 10 18 , 0.44 × 10 18 and 0.43 × 10 18 molecules/cm 2 , respectively (∼20%). All three stations are located close to large industrial areas or cities, which in case of the Russian locations are rather isolated CO sources. Furthermore, GBS measurements at Zvenigorod may also have been affected by local peat fires (Yurganov et al., 2009). The corresponding GBS measurements are thus likely affected by local emissions and therefore less representative for a larger SCIAMACHY sampling area around these locations. Note that for Zvenigorod the SCIAMACHY CO columns are unrealistically low in 2006. To a lesser extent this is also seen for St Petersburg and Bremen as well as for Jungfraujoch and Izaña. At the moment an explanation for this behavior is lacking.
[...] However, approximately 150 km further southeast at the location of Rikubetsu the difference is only −0.10 × 10^18 molecules/cm^2. Given the sampling area size of 8° × 8° there is considerable overlap in the SCIAMACHY measurements used for the comparisons for these stations, hence the variation in differences is unexpected. For both stations many ocean measurements are used in the comparison, but a check with clouds between the surface and 900 hPa rather than 800 hPa indicates that the differences between SCIAMACHY and GBS columns, including the filling of the SCIAMACHY columns with the TM4 values below the cloud, do not change significantly. Also, the differences hardly depend on the sampling area size (see Fig. 3), the bias difference is relatively small compared to the seasonal cycles, and there is excellent agreement between SCIAMACHY and GBS seasonalities. All this suggests that the differences between Rikubetsu and Moshiri are robust. One possible explanation could be that Moshiri is slightly affected by local pollution, but this requires further investigation.

Northern Hemisphere high latitude stations
Harestua, Kiruna and Ny Alesund all show similar GBS-SCIAMACHY differences and seasonal cycles (Fig. 5d and Table 2). All three stations are located close to or within oceans, hence mostly ocean measurements are used in the comparisons. After addition of the partial CO column below the cloud using TM4 results the differences become small compared to the 2σ SCIAMACHY noise error: −0.11 × 10^18 molecules/cm^2 for Harestua, −0.02 × 10^18 molecules/cm^2 for Kiruna and −0.03 × 10^18 molecules/cm^2 for Ny Alesund. For the latter two stations part of the seasonal cycle cannot be observed, as there are few or no SCIAMACHY observations available during Northern Hemisphere winter because of the high solar zenith angles. Nevertheless, the results show that the large springtime decrease in CO in the Northern Hemisphere due to photochemical destruction is well captured in the SCIAMACHY observations. Note that the 8° × 8° sampling area for Harestua has a considerable overlap with the 8° × 8° sampling area of Bremen, hence it is not surprising that the SCIAMACHY-GBS differences for both stations are comparable (Table 2). The sampling areas for both these stations include a significant number of clouded ocean measurements for which the modeled column below the cloud may be slightly underestimated (de Laat et al., 2010). Kiruna and Ny Alesund are located further north and remote from large Northern Hemisphere emission regions. The reported Northern Hemisphere model biases are smaller at high northern latitudes (Shindell et al., 2006). Thus, the model bias can explain why the differences between SCIAMACHY and GBS are less negative for Kiruna and Ny Alesund compared to Harestua and Bremen.
Fig. 6. The dashed bars indicate the average differences around the mean, the error bars indicate the root-mean-square of the differences and the solid red bars indicate the differences after adding the estimated TM4 column below the cloud for SCIAMACHY ocean measurements (not applied for Northern Hemisphere mountain stations).

Figure 6 summarizes the results for all stations. It can be seen that in the case of SCIAMACHY clouded ocean measurements a substantial part of the total column can be located below the cloud, and that the difference between SCIAMACHY and GBS is significantly reduced when an estimate of the column below the cloud based on TM4 results is included. The SCIAMACHY CO bias south of 45° S reported by de Laat et al. (2010) is significant in 2003 and 2004 but is close to or within the SCIAMACHY precision for later years (Fig. 5e). Stations with strong local influences on GBS measurements, such as Wollongong, Egbert, Moshiri, Zvenigorod, and St. Petersburg, clearly show significantly lower SCIAMACHY columns compared to the GBS measurements and thus are not representative of the sampling area used for validating the SCIAMACHY CO columns. The mountain stations Zugspitze and Jungfraujoch are surrounded by low-lying land regions, hence the SCIAMACHY and GBS measurements sample significantly different columns. These stations are therefore not very appropriate for validating SCIAMACHY CO columns, as long as a robust correction method is not available to reproduce the low tropospheric CO columns in the Alpine region. The remaining stations show differences close to or within the estimated measurement precision of SCIAMACHY CO. The standard deviation of the differences, as shown by the error bars in Fig. 6, is quite large for most stations and in particular for the stations with local influences. These standard deviations are larger than the typical GBS precision of <5%. The larger standard deviations are likely related to representation differences, which will be discussed in more detail in the following paragraph.

Table 2. Absolute and root-mean-square (σ) differences, in 10¹⁷ molecules/cm² and as a percentage of the average measured total column, between SCIAMACHY and GBS 2003-2007 average CO total columns, the number of GBS observations used (N), the contribution of the TM4 filling below ocean cloud pixels (TM4) and the relative contribution of ocean pixels to the mean (OCE). The second column indicates the type of location: Southern Hemisphere (SH), island mountain (IM), land mountain (LM), Northern Hemisphere (NH) and Arctic (AR). "n.a." stands for "Not Available." SCIAMACHY observations are sampled within an 8° × 8° area surrounding the GBS grid location and averaged, weighted by their respective instrument-noise errors. For the averaging, one day at a time is added until the threshold instrument-noise error of 1 × 10¹⁷ molecules/cm² is reached. If multiple GBS observations fall within the time range of the average SCIAMACHY CO total column, then the GBS observations are averaged as well. If the sampling area includes clouded ocean measurements, the results presented here include the SCIAMACHY below low-altitude cloud filling based on TM4 results, except for Izaña and Mauna Loa. As a result, for these locations no estimate is required for the missing ocean below-cloud partial column. In addition, Kitt Peak, Egbert and Zvenigorod are located too far away from oceans to have any ocean pixels contribute to the mean for an 8° × 8° sampling area.
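The averaging procedure described in the caption of Table 2 (add one day of SCIAMACHY measurements at a time until the instrument-noise error of the weighted mean reaches 1 × 10¹⁷ molecules/cm²) can be illustrated with a minimal Python sketch. The function name, the input layout and the choice of inverse-variance weights (1/σ²) are assumptions made for illustration; the text only states that measurements are weighted by their instrument-noise errors.

```python
import math


def accumulate_until_precise(daily_obs, threshold=1.0e17):
    """Add one day of measurements at a time until the error of the
    weighted mean is at or below `threshold` (molecules/cm^2).

    daily_obs: list of days, each a list of (column, noise_error) pairs.
    Returns (mean, error, days_used), or None if the threshold is never reached.
    """
    sum_w = 0.0   # running sum of weights
    sum_wx = 0.0  # running sum of weight * column
    for days_used, day in enumerate(daily_obs, start=1):
        for column, sigma in day:
            weight = 1.0 / sigma ** 2  # assumption: inverse-variance weighting
            sum_w += weight
            sum_wx += weight * column
        if sum_w > 0.0:
            error = 1.0 / math.sqrt(sum_w)  # error of the weighted mean
            if error <= threshold:
                return sum_wx / sum_w, error, days_used
    return None


# Hypothetical example: two days with two measurements each, all with a
# single-measurement noise error of 1.9e17 molecules/cm^2.
days = [
    [(2.0e18, 1.9e17), (2.2e18, 1.9e17)],
    [(1.8e18, 1.9e17), (2.0e18, 1.9e17)],
]
mean, error, days_used = accumulate_until_precise(days)
```

In this hypothetical example one day is not enough to reach the threshold, so the second day is added before the average is accepted. A sequence of days that never reaches the threshold yields `None`.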

Global validation results
Finally, Table 2 shows that the model contribution to the SCIAMACHY total columns due to the filling of SCIAMACHY ocean pixels for the FTIR locations can be as large as 20%. This contribution depends on the missing below-cloud partial column as well as on the weighted averaging and the number of ocean pixels used for calculating the mean. On a global scale the below-cloud partial columns are 16 ± 8% (2σ) (de Laat et al., 2010; their Fig. 5a).

Sampling area, instrument-noise error and precision
The SCIAMACHY columns used in the comparisons so far are based on spatio-temporal averaging of single measurements until a precision of 0.1 × 10¹⁸ molecules/cm² is reached.
In this section we briefly discuss the effect on the validation of filtering single SCIAMACHY measurements with different instrument-noise error thresholds and of using different precision thresholds in the spatio-temporal averaging. Three SCIAMACHY instrument-noise error thresholds (1.5 × 10¹⁸, 0.5 × 10¹⁸, and 0.2 × 10¹⁸ molecules/cm²) and three SCIAMACHY precision thresholds (0.2 × 10¹⁸, 0.1 × 10¹⁸, and 0.05 × 10¹⁸ molecules/cm²) are investigated, which results in nine different parameter combinations (Table 3). For each parameter set we calculate a skill score for the SCIAMACHY and GBS comparison for sampling areas ranging from 1° × 1° to 20° × 20°. The skill score is defined following Taylor (2001), with S the skill level (varying between 0 and 1), R the correlation coefficient and σ_f the ratio of the standard deviations of the two datasets. In cases where the standard deviations of both datasets are comparable and the correlations are high (R close to 1), the skill level will be close to 1 and the two CO datasets are very similar. A skill level of 0 indicates no resemblance between the two datasets. Note that the skill level is not sensitive to systematic biases.

Figure 7 shows the skill value for three stations for all these combinations, which are numbered according to the combinations listed in Table 3. Similar plots for all stations can be found in the supplementary information. For each test, increasing sampling area sizes are represented going from small sizes on the left to large sizes on the right. Note that experiment No. 2 represents the parameter values for the results discussed in Sect. 4. These three stations show very different behavior. For Lauder, skill levels decrease with increasing sampling area size, but skill levels increase with a stricter instrument-noise error filter and for smaller precision thresholds. For Kitt Peak there is no change in skill for any of the three parameters. For Harestua, skill levels increase with increasing sampling area size, a stricter instrument-noise error filter and smaller precision thresholds.
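As an illustration of the skill score introduced above, the following Python sketch implements one of the forms given by Taylor (2001), S = 2(1 + R) / (σ_f + 1/σ_f)², which corresponds to a maximum attainable correlation of 1. Whether this is the exact variant used for the results in this paper is an assumption. The function names are hypothetical, and the default minimum of 25 pairs follows the exclusion criterion stated in the caption of Fig. 7.

```python
import math


def correlation_and_stddevs(x, y):
    """Return the Pearson correlation and the two standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return None  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy), math.sqrt(sxx / n), math.sqrt(syy / n)


def taylor_skill(x, y, min_pairs=25):
    """Skill S = 2 (1 + R) / (sigma_f + 1 / sigma_f)^2.

    Assumed form: Taylor (2001) with maximum attainable correlation R0 = 1.
    Returns None when fewer than `min_pairs` pairs are available.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if len(x) < min_pairs:
        return None
    stats = correlation_and_stddevs(x, y)
    if stats is None:
        return None
    r, std_x, std_y = stats
    sigma_f = std_x / std_y  # ratio of the standard deviations
    return 2.0 * (1.0 + r) / (sigma_f + 1.0 / sigma_f) ** 2


# Hypothetical monthly series with a pure seasonal cycle (three years).
reference = [math.sin(2.0 * math.pi * m / 12.0) for m in range(36)]
biased = [v + 0.5 for v in reference]   # constant offset only
doubled = [2.0 * v for v in reference]  # same shape, twice the amplitude

skill_identical = taylor_skill(reference, reference)
skill_biased = taylor_skill(biased, reference)
skill_doubled = taylor_skill(doubled, reference)
```

The biased series obtains the same skill as the identical series, which reflects the statement in the text that the skill level is not sensitive to systematic biases, whereas a mismatch in amplitude lowers the skill even when the correlation is perfect.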
In general, we found that for a stricter instrument-noise error threshold the skill levels remain similar for most stations, although for some stations a slight increase is found (compare variations among parameter sets 1-4-7, 2-5-8 and 3-6-9). This increase appears to be restricted to stations with some SCIAMACHY outliers (see Fig. 5), which occur over the European stations Zvenigorod, St. Petersburg, Bremen, Jungfraujoch, Zugspitze and Garmisch-Partenkirchen.
Smaller precision thresholds increase the skill levels for most stations (compare variations among parameter sets 1-2-3, 4-5-6 and 7-8-9). This is related to the reduction of short-term variability in the CO column measurements when more measurements are averaged over a longer period, as is required to reach smaller precision thresholds. Short-term variations in CO columns are related to weather variability and air masses with different CO characteristics. They manifest themselves as random variations on top of the seasonal cycle. These random variations will differ between the SCIAMACHY measurements averaged over the sampling area and the GBS measurements, which reduces the skill when comparing both. For longer time averages, which are required to reduce instrument-noise errors, these random variations average out. As a result, the means of both the SCIAMACHY and GBS CO columns become more representative of the actual long-term mean and the seasonal cycle of CO, and as a consequence skill levels improve when using a stricter precision threshold.
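The averaging argument above can be illustrated with a small synthetic experiment in Python. Two series share the same seasonal cycle but carry independent random variations, and their correlation is compared before and after averaging over 30-day blocks. All numbers (a 360-day year, unit amplitude, unit noise, the random seed) are arbitrary illustrative choices and are not taken from the SCIAMACHY or GBS data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def block_means(series, width):
    """Average consecutive, non-overlapping blocks of `width` values."""
    return [sum(series[i:i + width]) / width
            for i in range(0, len(series) - width + 1, width)]


rng = random.Random(0)  # fixed seed so the experiment is repeatable
n_days = 5 * 360        # five idealized 360-day years
seasonal = [math.sin(2.0 * math.pi * d / 360.0) for d in range(n_days)]

# Same seasonal cycle, independent short-term variations in each series.
series_a = [s + rng.gauss(0.0, 1.0) for s in seasonal]
series_b = [s + rng.gauss(0.0, 1.0) for s in seasonal]

r_daily = pearson_r(series_a, series_b)
r_monthly = pearson_r(block_means(series_a, 30), block_means(series_b, 30))
```

In this idealized setting the independent variations largely cancel within each 30-day block while the shared seasonal cycle is retained, so `r_monthly` is considerably higher than `r_daily`.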
For a number of GBS locations the comparison also improves by changing the sampling area size. However, the optimal choice for the sampling area size remains station dependent.
An example of the SCIAMACHY-GBS comparison as shown in Fig. 5 but for a different parameter set (number 6 in Table 3, for a 20° × 20° area) can be found in supplementary Fig. 2.
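For orientation, the nine parameter combinations can be enumerated with a short Python sketch. The numbering used here is an assumption inferred from the text, not a reproduction of Table 3: sets 1-2-3 are taken to share an instrument-noise error threshold and differ in precision threshold, while sets 1-4-7 share a precision threshold and differ in instrument-noise error threshold. The dictionary keys are hypothetical names.

```python
from itertools import product

# Thresholds in molecules/cm^2, in the order quoted in the text.
NOISE_THRESHOLDS = [1.5e18, 0.5e18, 0.2e18]
PRECISION_THRESHOLDS = [0.2e18, 0.1e18, 0.05e18]

# Assumed numbering: the precision threshold varies fastest.
parameter_sets = {
    number: {"noise_threshold": noise, "precision_threshold": precision}
    for number, (noise, precision) in enumerate(
        product(NOISE_THRESHOLDS, PRECISION_THRESHOLDS), start=1
    )
}

for number, params in parameter_sets.items():
    print(number, params["noise_threshold"], params["precision_threshold"])
```

Under this assumed ordering, set 2 carries the precision threshold of 0.1 × 10¹⁸ molecules/cm² used for the results discussed so far, which agrees with the remark in the text that experiment No. 2 represents those parameter values.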
These results do not imply that a stricter instrument-noise error filter and smaller precision threshold should be used. Rather, they indicate that the signal-to-noise ratio of individual SCIAMACHY measurements is insufficient to derive useful information on short synoptic timescales. However, the results show that on monthly timescales or longer SCIAMACHY observations do contain useful information.

Fig. 7. Skill scores for the parameter combinations listed in Table 3, also indicated by the colors. The skill score, on a scale from zero to one, is a measure of the agreement between two series and is based on comparing the correlation between the two series and the root-mean-square error of the difference series (see text). The higher the skill score, the better the agreement. For each combination the skill scores are ordered from left to right from the smallest to the largest sampling area size. In addition, we excluded skill scores when fewer than 25 SCIAMACHY-GBS comparison values could be calculated.

Summary and conclusions
This paper presents a detailed validation of SCIAMACHY CO total columns with independent ground-based CO total column observations from twenty GBS stations worldwide for the five-year period 2003-2007. For all stations the seasonal cycles of SCIAMACHY and GBS agree well. For stations not affected by local emissions or altitude effects, differences between SCIAMACHY and GBS are close to or within the SCIAMACHY CO total column precision of 0.1 × 10¹⁸ molecules/cm² (∼5-10% of the SCIAMACHY CO columns). Stations with strong local influences, such as Wollongong, Egbert, Zvenigorod, and St. Petersburg, show significantly lower SCIAMACHY columns compared to the GBS stations. Because of the large SCIAMACHY sampling area of 8° × 8°, local CO enhancements as seen in the GBS measurements do not show up in the SCIAMACHY average. Note that the Moshiri station may also be affected by some local influences.
For the Northern Hemisphere mountain locations Jungfraujoch and Zugspitze SCIAMACHY columns are significantly larger than those of the GBS stations. This can be explained by the specific geographical location of both stations. Mauna Loa also shows a bias but this may be a spurious result as there are relatively few measurements at this location.
The Southern Hemisphere stations Arrival Heights and Lauder show a clear bias for the years 2003 and 2004, which is not present for later years. The bias found is consistent with the Southern Hemisphere bias south of 45° S mentioned in de Laat et al. (2010), and its origin is under investigation. No other time-dependent biases were identified, indicating that for now degradation of the SCIAMACHY CO channel seems to have only a minor effect, if any, on the retrieved columns.
For most GBS locations a better agreement between SCIAMACHY and GBS is found when a stricter precision threshold is used, which is a consequence of the spatio-temporal averaging: when CO column measurements are averaged over longer time periods, the effect of short-term, often local, variability is reduced and the SCIAMACHY and GBS CO columns become more representative of the long-term CO column variability within the sampling area, so that the skill improves. This indicates that, because of the large instrument-noise error of single SCIAMACHY measurements, there is little information on timescales shorter than a month. However, it also shows that SCIAMACHY observations can be used to study seasonal and interannual CO total column variability.
Using a stricter instrument-noise error filter results in fewer outliers in the SCIAMACHY CO columns for some stations, suggesting that SCIAMACHY observations with larger instrument-noise errors may lead to anomalously small CO total columns.

Finally, although the validation of SCIAMACHY with GBS observations yields satisfactory results, there are clear limitations to this validation. The spatial coverage of GBS locations is limited so that many important regions of the world are still missing, and SCIAMACHY measurements must be averaged over larger areas to lower the measurement noise. As a result, biases related to certain spatio-temporal surface parameters cannot be detected using the current set of available GBS measurements. The recent and ongoing strong deployment of new GBS instruments as part of TCCON, with Darwin and Garmisch as examples, will fill many gaps in the current GBS network.