Comparison of the regional CO₂ mole fraction filtering approaches at a WMO/GAW regional station in China

The identification of atmospheric CO₂ observation data which are minimally influenced by very local emissions/removals is essential for trend analysis, for the estimation of regional sources and sinks, and for the modeling of long-range transport of CO₂. In this study, four approaches are used to filter the atmospheric CO₂ observation records from 2009 to 2011 at one World Meteorological Organization/Global Atmosphere Watch (WMO/GAW) regional station (Lin'an, LAN) in China. The methods are based on the concentration of atmospheric black carbon (BC), on a statistical approach (robust extraction of baseline signal, REBS), on CH₄ as an auxiliary tracer (AUX), and on meteorological parameters (MET). All approaches capture the seasonal CO₂ cycle at LAN suitably well.


Introduction
Carbon dioxide (CO₂) is the most important greenhouse gas in the atmosphere. It contributes more than 60 % of the total radiative forcing (RF) of the long-lived greenhouse gases (AGGI, 2014). The large increase of atmospheric CO₂ of nearly 120 ppm above preindustrial levels has been unequivocally attributed to human-caused emissions (Keeling, 1993; WMO, 2015). Using atmospheric CO₂ observations, source/sink estimates can be constrained through inverse models, which is an important way to understand the carbon cycle in the land biosphere (Chevallier et al., 2011; Thompson et al., 2009). For this purpose, many ground-based stations have been set up around the world to monitor CO₂ mole fractions. So far, there are more than 150 sites worldwide where greenhouse gas mole fractions are measured (Artuso et al., 2009; Dlugokencky et al., 1995; Necki et al., 2003; Sirignano et al., 2010; Tans et al., 1990; WMO, 2015). Due to technical and logistical constraints such as access to the measurement site, power supply, or internet connection, very few monitoring stations are sufficiently remote to be permanently exposed to pristine air masses, while many Global Atmosphere Watch (GAW) stations are occasionally to frequently affected by local sources or sinks (Tsutsumi et al., 2006; Riley et al., 2005). The measurements at the majority of sites cannot fully represent the well-mixed CO₂ conditions in the regions. Hence data filtering is an essential part of the analysis of data from those sites when trying to retrieve representative trends, for the estimation of sources and sinks, or for the modeling of long-range transport of trace gases (Greally et al., 2007; Novelli et al., 2003; Prinn et al., 2001; Ryall et al., 1998).
S. X. Fang et al.: Comparison of the CO₂ data filtering approaches
Several methods have been applied in the past for the extraction of background (or regionally representative) mole fractions from ground-based measurements. (1) Filters based on specific trace gases or on trace gas : trace gas ratios (Brunke et al., 2004; Tsutsumi et al., 2006; Zanis et al., 2007). For example, Tsutsumi et al. (2006) used carbon monoxide (CO) as an indicator to filter the observed CO₂ mole fractions at the station Yonagunijima located in east Asia. Zanis et al. (2007) used the total reactive nitrogen (NOy) to CO ratio to distinguish different regimes at the high-altitude station Jungfraujoch in central Europe. Brunke et al. (2004) used radon and CO to classify their observations at Cape Point, South Africa. (2) Meteorological filters, which are the most commonly used method (Artuso et al., 2009; Chmura et al., 2008; Collaud Coen et al., 2011; Zellweger et al., 2003). This method considers various factors such as the local wind speed, wind direction, boundary layer height, information on the atmospheric stability, solar input, or general weather conditions. Sometimes it further considers the diurnal CO₂ variation, as it can be closely linked to the above parameters (Zhou et al., 2005). (3) Statistical methods, which generally use the variations (e.g., a low standard deviation) of the observed data in certain time windows as a threshold to select the regional values (Cunnold et al., 2002; Morimoto et al., 2003; Pickers and Manning, 2015; Zhang et al., 2007). (4) Numerical transport methods, which use atmospheric dispersion modeling (e.g., air mass back trajectories) to study the advection regimes and then distinguish periods with potential influence of local or regional sources/sinks from uninfluenced conditions (Cape et al., 2000; Manning et al., 2011; Ryall et al., 2001). Some studies combined two or more of the above methods to select the best well-mixed CO₂ mole fractions (e.g., Thoning et al., 1989). However, due to the different characteristics of each station, such as location (e.g., continental or coastal sites) and proximity to the biosphere and to diffuse or point sources, the best data filtering approach has to be carefully selected for each station (Ruckstuhl et al., 2012). One method can also be more useful than another at the same station depending on the parameter of interest. In brief, there is no standard method for selecting the background mole fractions from a continuous data series.
With the rapid development of its economy, China became the largest fossil fuel CO₂ emitter in 2006 and emitted 1.8 Pg carbon in 2011 (LeQuéré et al., 2013; Marland, 2012). The Yangtze Delta area is one of the most developed regions in China and is one of the largest global CO₂ emission regions (Gregg et al., 2008). The total population in this area was ∼159 million in 2010 (National Bureau of Statistics, 2011). Moreover, this area is a highly productive region for paddy rice and winter wheat in China, with a potentially large influence on the atmospheric CO₂ concentrations in this region. For example, winter wheat and rice production in this region represent 20 % of the total Chinese wheat harvest and 5 % of the entire Chinese grain production (Colby et al., 1992; Yan et al., 2003). To understand the character and the abundance of greenhouse gases in this region, the Chinese Meteorological Administration (CMA) built the Lin'an (LAN) station in the center of the Yangtze Delta area in 1983. The station has been included in the World Meteorological Organization/Global Atmosphere Watch (WMO/GAW) network as a regional station and is named after the small town of Lin'an, which is approximately 6 km southwest of the station. There was no in situ CO₂ measuring system until January 2009, when a cavity ring-down spectrometer (G1301, Picarro Inc.) was installed to continuously monitor atmospheric CO₂ and CH₄ mole fractions. Since then we have been acquiring first-hand greenhouse gas data at this station (Fang et al., 2013; Pu et al., 2014).
We have found that the CO₂ mole fractions at LAN were the highest of the four WMO/GAW stations in China (Liu et al., 2009; Fang et al., 2011). Based on the meteorological methods, we filtered the CO₂ records from 2009 to 2011 and estimated that 16.6 % of the data are likely to be regionally representative (Fang et al., 2014). However, in a different study, Pu et al. (2014) used black carbon (BC) as a chemical tracer to identify the influence of anthropogenic emissions and found that 27.3 % of the data were regionally representative. In a previous study (Fang et al., 2011), we also applied a purely statistical method to filter the data, which resulted in regionally representative conditions during 63.5 % of the time in 2009. The different data filtering approaches have produced different results in terms of the regional CO₂ mole fractions. Our previous study (Fang et al., 2014) obtained an average regional CO₂ mole fraction of 404.2 ± 3.9 ppm in 2011 at LAN by using a meteorological filter, while Pu et al. (2014) estimated a corresponding average value of 407 ± 5.3 ppm during the same period by using a different filtering approach. The difference (∼3 ppm) between these two methods can induce biases in the estimation of CO₂ abundances at the regional scale as well as in the calculation of sources/sinks by inverse models. In this paper, we applied four approaches to filter the observed data from 2009 to 2011 at LAN station and studied their applicability. The four methods use black carbon as a tracer, a statistical method, methane (CH₄) as a tracer, and meteorological parameters.

Measurement system
The LAN station (119°44′ E, 30°18′ N; 138.6 m a.s.l.) is about 50 km from Hangzhou (capital of the province of Zhejiang) and 150 km from Shanghai (the largest economic center and the second most populous city in China) (Fig. 1). North of the station (1.4 km away) is a small factory where charcoal is manufactured from bamboo wood. The town of Lin'an (with a population of ∼100 000) is approximately 6 km southwest of the station. The observatory is built on the top of a small hill (160 m a.s.l.) and is surrounded by hilly land and farming areas with dense vegetation coverage. The site is located in a humid subtropical monsoon climate zone, with a mean annual precipitation of 1480 mm and a mean temperature of 15.3 °C.
A cavity ring-down spectrometer (CRDS; Picarro Inc., model G1301) is used for continuous measurements of atmospheric CO₂ and CH₄. This type of instrument has been proven suitable for making precise measurements of CO₂ and CH₄ mole fractions, since its response is both highly linear and very stable (Chen et al., 2010; Crosson, 2008). The manufacturer reported that the precision of the instrument is 50 ppb for CO₂ and 0.7 ppb for CH₄ (1 standard deviation, 1σ) in 5 min. Sample air is drawn regularly from about 10 m a.g.l. (above ground level). At the end of 2010, a new sampling tower (50 m a.g.l.) was built and another sampling port was installed at 50 m. The Picarro system then switched the air sample stream between the 10 m and the 50 m intake every 5 min. The sample air is filtered, temperature stabilized, and dried to meet the high-quality target of the WMO/GAW network. Details of the system are described in Fang et al. (2013). Two standard gases are used to calibrate the measurements and a target gas is used to check the precision of the system routinely. All of the standards are linked to the WMO X2007 scale (Zhao and Tans, 2006). The CRDS system responds quickly to the sample and reports data with a frequency of 0.3 Hz. For the long-term time series, ambient air data are recorded as 5 min averages. Excluding the periods of system maintenance and calibration, more than 97 % of the total 5 min average data points were retained. After computing the CO₂ mole fractions, the data were manually inspected to flag any analytical or sampling problems. More than 97 % of the 5 min data remained after this filtering step. Then the data were aggregated to hourly averages for further study. Except when noted differently, the averaged values in this study are reported with 95 % confidence intervals (CI). The CO₂ concentrations are atmospheric CO₂ dry air mole fractions. The black carbon particles in this study were measured by an aethalometer (AE-31, Magee Scientific, USA). Seven wavelengths are used for the observations, with values of 370, 470, 520, 590, 660, 880, and 950 nm.

Data filtering approach
There are significant CO₂ diurnal variations at LAN in all seasons, indicating the strong influence of biological activity near the site, including absorption by photosynthesis and emissions by plants and soil, and the variation of boundary layer height. The CO₂ differences between the parallel measurements (10 and 50 m a.g.l.) at LAN also showed distinct diurnal variations, with the most stable and lowest differences (less than 0.2 ± 0.2 ppm) occurring from 10:00 to 16:00 (local time, LT). At remote sites where the local sources and sinks are negligible, the CO₂ diurnal variations are generally weak (e.g., Keeling et al., 1976; Zhou et al., 2005). Thus, in this study, the CO₂ data from 10:00 to 16:00 LT were first selected to represent better mixed conditions. Then the selected data were used for the study of different filtering approaches. We adopted four data selection methods to filter the selected CO₂ mole fractions. They are described as follows.
- Black carbon (BC) tracer: we adopt a routine similar to that used by Pu et al. (2014). The observed CO₂ was filtered based on the observation of both black carbon concentration and meteorological parameters. Pu et al. (2014) found a correlation coefficient (R) of 0.53 between black carbon concentrations and CO₂ mole fractions and concluded that CO₂ and black carbon have some common sources, such as fossil fuel combustion and biomass burning, within this area. Thus, we first excluded the episodes when the black carbon concentration exceeded 5000 ng m⁻³. During wet-season precipitation, when the black carbon concentration was very low, we used air mass back trajectory analysis to further flag the data which were likely influenced by anthropogenic emissions from nearby cities. Finally, we studied the average standard deviations (σ) of hourly CO₂ mole fractions as a function of wind speed using all data from 10:00 to 16:00 LT from 2009 to 2011. As shown in Fig. 2, the average σ decreased sharply when the local surface wind speed exceeded 1.5 m s⁻¹. Clearly, higher local surface wind speeds caused better mixed conditions and consequently more stable CO₂ mole fractions. Thus the remaining data were further flagged when the surface wind speed was below 1.5 m s⁻¹ to minimize the influence of very local sources and sinks.
- Statistical method (REBS): here we applied the robust extraction of baseline signal (REBS) to extract the regional CO₂ mole fractions, similar to the approach used in the Global Atmospheric Gases Experiment/Advanced Global Atmospheric Gases Experiment (GAGE/AGAGE) network (Ruckstuhl et al., 2001) to filter halocarbons and other non-CO₂ gases. Ruckstuhl et al. (2012) suggested that a meteorological filtering of the data should be applied prior to the application of the REBS method, as polluted conditions might induce a bias on the background classification. In this study, this was taken into consideration by using data from 10:00 to 16:00 LT. The REBS method is a purely nonparametric technique and assumes that the background signal varies very slowly relative to the contributions of the polluted signal. The observed concentrations Y(t_i) are defined by a regional concentration g(t_i), plus a polluted concentration m(t_i), plus the measurement errors E_i, i.e., Y(t_i) = g(t_i) + m(t_i) + E_i. The measurement errors E_i are assumed to be independent and Gaussian-distributed with mean 0 and variance σ². If the polluted signal m(t_i) is zero in a time period around t_0, the baseline signal g(t_0) can be estimated, even when the form of the curve g is unknown.
Hence the curve g(t_i) is approximated as being linear in a sufficiently small neighborhood around any given time point t_0. Details of the method are described by Ruckstuhl et al. (2012). A bandwidth of 60 days was used in this study, while other bandwidths of 90, 120, and 180 days were also tested. The bandwidth choice did not considerably influence the retrieved averages, and trends of the regionally representative CO₂ mole fractions were similar. In comparison with other methods, this approach did not have to be considerably adapted to the conditions at the individual measurement site.
- Auxiliary (AUX) tracer: this method uses CH₄ as an auxiliary indicator to filter the CO₂ time series. Many previous studies found positive correlations (mostly in winter) between the atmospheric CH₄ and CO₂ mole fractions (Conway et al., 1989; Tohjima et al., 2014; Wong et al., 2015; Worthy et al., 2009) as well as between the respective fluxes from ecosystems (Jamali et al., 2013; Repo et al., 2007). For the data series of CO₂ and CH₄ at LAN, we also observed an apparent correlation between them during the observing period (Fig. 3). The correlation coefficient (R) is higher than 0.5 for all seasons, which indicates that there are similar patterns of CO₂ and CH₄ sources. This phenomenon is more distinct in spring (R = 0.7) and winter (R = 0.8), when photosynthetic activity of the vegetation, i.e., the CO₂ uptake, is weak. In summer and autumn, the active absorption of CO₂ by terrestrial ecosystems may partly alter the CO₂-CH₄ correlation. Nevertheless, the positive coefficients still suggest that anthropogenic emissions dominate the carbon cycle at the LAN station. In remote areas, an uncorrelated or negatively correlated relationship is generally observed (e.g., Necki et al., 2003). As described above, we also used the robust extraction of baseline signal (REBS) method to filter the CH₄ data because it has proven to be suitable for extracting the background mole fractions of CH₄ at remote sites (Cunnold et al., 2002). By doing so, we flagged the hourly CO₂ records which correspond to the periods of locally influenced or regionally representative events of CH₄ filtered by the REBS method. Moreover, although there were correlations between the atmospheric CH₄ and CO₂ in all seasons at LAN, they were not perfectly correlated, meaning that some CO₂ events could not be determined by CH₄ mole fractions, especially for data points which were far away from the linear fit in Fig. 3.
To reduce this influence, we calculated the standard deviations (1σ) of the differences between the measured CO₂ mole fraction and the linear fit in each season. If the absolute difference was larger than 1σ, the data point was considered as locally influenced and was flagged as an outlier. This additional filter excluded most events with poor CH₄-CO₂ correlation (Fig. 3).
- Meteorological (MET) method: as in previous studies, the diurnal variation of CO₂ mole fractions, local surface wind direction, local surface wind speed, and information on nearby sources were all considered for the CO₂ data filtering (Fang et al., 2014; Zhou et al., 2004, 2005). According to the nearby potential contamination sources (nearby villages, industry, etc.), data recorded when local surface winds came from the SSW-SW and N sectors were excluded. Then the data were further flagged by discarding the events when the local surface wind speed was lower than 1.5 m s⁻¹ to minimize the influence of very local sources or sinks, as discussed above.
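The threshold steps shared by the BC and MET methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names, the convention that hourly records are labeled by their starting hour, and the degree bounds of the compass sectors are not given in the text and are hypothetical. The back-trajectory screening of the BC method is not reproduced.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_WIND_SPEED = 1.5   # m s^-1; slower winds are treated as locally influenced
MAX_BC = 5000.0        # ng m^-3; higher black carbon is treated as polluted
# ASSUMED: hourly records are labeled by start hour, so 10 <= hour < 16.
DAY_START, DAY_END = 10, 16

# ASSUMED sector bounds in degrees (16-point compass, 22.5 degrees per sector).
EXCLUDED_SECTORS = [
    (348.75, 360.0), (0.0, 11.25),   # N, which wraps around 0 degrees
    (191.25, 236.25),                # SSW through SW
]


@dataclass
class HourlyRecord:        # hypothetical record layout
    hour: int              # local hour of day, 0-23
    wind_dir: float        # degrees, direction the wind blows from
    wind_speed: float      # m s^-1
    bc: float              # black carbon, ng m^-3
    co2: float             # dry air mole fraction, ppm


def is_daytime_and_windy(rec: HourlyRecord) -> bool:
    """Steps shared by both methods: well-mixed hours and enough wind."""
    return DAY_START <= rec.hour < DAY_END and rec.wind_speed >= MIN_WIND_SPEED


def in_excluded_sector(wind_dir: float) -> bool:
    """True if the wind comes from the N or SSW-SW sectors."""
    d = wind_dir % 360.0
    return any(lo <= d < hi for lo, hi in EXCLUDED_SECTORS)


def is_regional_met(rec: HourlyRecord) -> bool:
    """MET-style selection: time window, wind speed, and wind sector."""
    return is_daytime_and_windy(rec) and not in_excluded_sector(rec.wind_dir)


def is_regional_bc(rec: HourlyRecord) -> bool:
    """BC-style selection without the back-trajectory step."""
    return is_daytime_and_windy(rec) and rec.bc <= MAX_BC
```

A record with easterly wind at 2 m s⁻¹ and low black carbon at noon passes both checks, whereas the same record with wind from 225° fails only the MET check.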
After the multistep filtering, there were still a few discrete data points remaining with high/low CO₂ mole fractions in the BC, AUX, and MET methods. These outliers are unlikely to represent regional CO₂ conditions, as regional values should not peak within a few hours. Thus we used a mathematical method to further flag the remainder of the data in the BC, AUX, and MET methods. The standard deviation (σ) of hourly CO₂ data in a 60-day bandwidth (similar to the REBS method) was calculated. The differences between every data point and the 60-day average were calculated. Data were flagged and excluded if the difference exceeded 3σ. After that, the remaining data in the BC, AUX, and MET methods were considered as the least influenced by local sources or sinks.
Figure legend: The closed blue circles represent the filtered regional events. The open gray circles represent local events which are influenced by very local sources or sinks. The red lines are results fitted to the filtered regional events using the curve-fitting method by Thoning et al. (1989).
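The 3σ step described above can be sketched as follows. The text does not state whether the 60-day window is centered on each data point or whether the procedure is iterated, so this sketch assumes a centered window (±30 days) and a single pass; the function name and arguments are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def flag_3sigma(times, values, window_days=60, n_sigma=3.0):
    """Return one boolean per point: True if it deviates from the mean of
    its surrounding window by more than n_sigma standard deviations.

    ASSUMPTIONS: centered window, single pass, population standard
    deviation. The loop is O(n^2), which is fine for a short sketch.
    """
    half = timedelta(days=window_days / 2)
    flags = []
    for t, v in zip(times, values):
        # All values whose timestamps fall inside the window around t.
        window = [x for s, x in zip(times, values) if abs(s - t) <= half]
        mu = mean(window)
        sigma = pstdev(window)
        flags.append(abs(v - mu) > n_sigma * sigma)
    return flags


# Synthetic daily series near 400 ppm with one artificial spike.
start = datetime(2010, 1, 1, 12)
times = [start + timedelta(days=i) for i in range(40)]
values = [400.0 + (i % 2) for i in range(40)]
values[20] = 450.0
flags = flag_3sigma(times, values)
kept = [v for v, f in zip(values, flags) if not f]
```

In this synthetic example only the spike is flagged; the retained values stay within the 400-401 ppm range.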

Filtered regional CO 2 mole fractions
Figure 4 illustrates the filtered CO₂ results in the four approaches. From top to bottom are the results of the BC, REBS, AUX, and MET methods, respectively. The filtered regional mole fractions account for ∼12.2 % in BC, 15 % in REBS, 12.8 % in AUX, and 16.5 % in MET of the total valid hourly data. The low proportions of regional CO₂ in the four methods reflect the strong influences of local sources and sinks. The overall seasonal patterns of regional CO₂ retrieved in the four approaches are similar, with peaks in winter and troughs in summer, which is consistent with other observations in the Northern Hemisphere (Nevison et al., 2008). Table 1 compares the annual regional CO₂ mole fractions in the four methods. The annual mole fractions in the REBS method are apparently higher than the others, with the smallest discrepancy of 1.7 ± 0.2 ppm in 2010. As the REBS method uses a purely mathematical method, the high CO₂ mole fractions induced by local sources may enhance the σ values and enlarge the regional CO₂ band, and subsequently introduce higher

Mean seasonal cycles and trends
We used the curve-fitting method described by Thoning et al. (1989) to derive the seasonal CO₂ trends based on the filtered regional mole fractions from 2009 to 2011 (Figs. 4 and 5). The data were fitted to a function with three polynomial terms for the long-term trend and four annual harmonic terms. To minimize the influence of inconsistent records in the four methods, the data were fitted and interpolated with the same time interval (2 h). For comparison, the marine boundary layer (MBL) reference surface CO₂ mole fractions computed by NOAA/GMD for a similar latitude (sine of latitude of 0.5, i.e., 30° N) are also presented (NOAA, 2015). The monthly CO₂ mole fractions in the four methods are apparently higher than the MBL values during the whole year, with an average difference of 10.9 ± 0.1 for BC, 13.9 ± 0.2 for REBS, 11.2 ± 0.2 for AUX, and 11.5 ± 0.2 ppm for MET.
The large difference indicates that the Greater Yangtze Delta area is an important net source of atmospheric CO₂. The monthly CO₂ variations in the four approaches show similar patterns, with minimum values in August and maximum values in December. The appearance of the lowest values matches the minimum of the MBL reference. As reported in a previous study (Fang et al., 2014), the highest CO₂ difference between the Northern Hemisphere and LAN was in December, and was due to the lower boundary layer in winter, as well as the increase in fossil fuel consumption (partly for domestic heating) and cement burning, as well as plant respiration in this season. In general, these four approaches do well in capturing the seasonal CO₂ cycles at the LAN station. However, there are also differences between the monthly CO₂ mole fractions. The monthly CO₂ values in the REBS method are always higher than in the other methods (Fig. 5a). This is because the REBS method uses variations of the raw data (standard deviation) as a threshold to flag the locally influenced CO₂ mole fractions. The "noisy" CO₂ mole fractions (mostly high outliers) may draw the trend of regional events upward and subsequently induce higher regional values. This result also indicates that the REBS method may be less suitable for CO₂ data filtering for a site without a well-defined background, as in the case of LAN. In fact, this method is mostly used at remote sites with few local sources and sinks (e.g., Zhang et al., 2013).
During the winter-spring period, the regional CO₂ mole fractions retrieved with the BC method are apparently lower than those retrieved with the other methods. This is because the BC method mainly refers to the measured black carbon concentrations. Emissions of black carbon and CO₂ from fossil fuel and biomass burning occur at both the local and regional scale (Baumgardner et al., 2002), and the BC method robustly flags the CO₂ mole fractions when black carbon concentrations exceed the threshold value (5000 ng m⁻³). However, it is difficult to distinguish the local emissions of black carbon from the regional contents. Especially during the winter-spring seasons, the regional black carbon and CO₂ concentrations are both high due to the increase of fossil fuel consumption and cement burning (Feng et al., 2014). These high concentrations should still represent the levels at the regional scale and should not be flagged. Thus in the BC method, the flagging of higher CO₂ mole fractions as locally representative is probably the reason for the lower regional CO₂ values during the winter-spring period. It should also be mentioned that the use of a constant threshold in the BC method may lead to slight errors in the estimation of regional CO₂ mole fractions. In addition, the BC method is mainly geared towards polluted air masses altered by anthropogenic sources, whereas the influence of the land biosphere during daytime remains unconsidered. Moreover, the large difference in lifetime, which is 4-12 days for black carbon (Cape et al., 2012) and decades or longer for CO₂ (Moore and Braswell, 1994), may also contribute to the bias of this method.
The monthly CO₂ mole fractions in the AUX method are lower than in the other methods in summer. During this time of the year, a large amount of CH₄ is emitted from wetlands (e.g., from rice paddy fields) on the eastern China Plain (Lu et al., 2000; Zhang et al., 2010). Thus, these high values should represent regional conditions rather than local events (Fang et al., 2013). On the other hand, the active processes of photosynthesis by local and regional vegetation in summer reduce the observed CO₂ mole fractions, especially by local vegetation, which has a strong negative influence on the CO₂ mole fractions during the afternoon. This influence can also be seen from the frequently lower CO₂ mole fractions at 10 m a.g.l. than at 50 m a.g.l. in the daytime. These different source/sink regimes may cause some CO₂ mole fractions to be flagged as regional even though they are actually influenced by the absorption of local vegetation, subsequently leading to lower regional values in summer.
Compared with the BC, REBS, and AUX methods, there is no apparent disadvantage of the MET method, which attempts to eliminate the influence of local sources and sinks using meteorological information. Figure 5b illustrates the detrended seasonal cycles of CO₂ in the four approaches. The peak-to-trough amplitudes of regional CO₂ are 14.4 ± 0.1, 18.6 ± 0.1, 22.7 ± 0.1, and 20.4 ± 0.1 ppm for the BC, REBS, AUX, and MET methods, respectively. The amplitude for the BC method is the lowest, which is ascribed to the lower CO₂ mole fractions during the winter-spring period and the higher values than in the AUX and MET methods in summer. The higher CO₂ mole fractions in the BC method in summer may be due to the lower black carbon concentrations (Feng et al., 2014). The highest CO₂ amplitude is observed in the AUX method, which is ascribed to the lowest CO₂ mole fractions in summer.
In addition to providing seasonal cycles, the method of Thoning et al. (1989) also provides an estimate of the trend over the full measurement period 2009-2011. The regional CO₂ mole fractions all show positive trends, with annual growth rates of 1.8 ± 0.01 for BC, 2.8 ± 0.01 for REBS, 3.2 ± 0.01 for AUX, and 3.1 ± 0.01 ppm yr⁻¹ (standard error) for MET. According to the statistics from the WMO Greenhouse Gas Bulletins (2011, 2012), the global CO₂ growth rate exceeded 2 ppm yr⁻¹ from 2009 to 2012. The growth rate for the BC method is lower than the global average, which may partly be caused by the different CO₂ to black carbon ratios in the considered years. As the regional CO₂ value in the BC method is based upon the black carbon concentrations, the increasing fossil fuel standards (upgraded from Chinese national stage 3 to 4 since 2010) and exhaust efficiency may induce different ratios between CO₂ and black carbon concentrations and hence yield a smaller CO₂ growth rate. Similar to the annual CO₂ mole fractions (Table 1), the annual growth rates of the AUX and MET methods are close. It should be mentioned that only 3 years of data are used to evaluate the annual CO₂ growth rate. The relatively short time series may inevitably induce bias in the growth rate estimation, which needs to be treated with caution.
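The function form used in this section (three polynomial terms for the long-term trend plus four annual harmonics) can be illustrated with an ordinary least-squares fit. The sketch below is not the full procedure of Thoning et al. (1989), which additionally filters the residuals; it only fits the function and separates trend from seasonal cycle. The function names and the synthetic test series are hypothetical.

```python
import numpy as np


def design_matrix(t_years, n_poly=3, n_harm=4):
    """Columns: 1, t, t^2, then sin/cos pairs for harmonics 1..n_harm."""
    cols = [t_years ** k for k in range(n_poly)]
    for k in range(1, n_harm + 1):
        cols.append(np.sin(2 * np.pi * k * t_years))
        cols.append(np.cos(2 * np.pi * k * t_years))
    return np.column_stack(cols)


def fit_trend_and_cycle(t_years, co2, n_poly=3, n_harm=4):
    """Least-squares coefficients of the polynomial-plus-harmonics model."""
    A = design_matrix(t_years, n_poly, n_harm)
    coeffs, *_ = np.linalg.lstsq(A, co2, rcond=None)
    return coeffs


def split_components(t_years, coeffs, n_poly=3, n_harm=4):
    """Evaluate the long-term trend and the detrended seasonal cycle."""
    A = design_matrix(t_years, n_poly, n_harm)
    trend = A[:, :n_poly] @ coeffs[:n_poly]
    seasonal = A[:, n_poly:] @ coeffs[n_poly:]
    return trend, seasonal


# Synthetic 3-year series: 2 ppm/yr growth plus a 5 ppm annual sine wave.
t = np.linspace(0.0, 3.0, 500)          # time in years since the start
co2 = 400.0 + 2.0 * t + 5.0 * np.sin(2 * np.pi * t)
coeffs = fit_trend_and_cycle(t, co2)
trend, seasonal = split_components(t, coeffs)
amplitude = seasonal.max() - seasonal.min()   # peak-to-trough amplitude
growth_rate = coeffs[1]                       # linear term, ppm per year
```

With real data, the peak-to-trough amplitude and the growth rate would be read off the fitted seasonal and trend components in the same way, although the values reported in this study come from the authors' own implementation.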

Comparison of local CO 2 events
The benefit of a successful extraction of regional values is twofold. The identified regional values can be used, e.g., for the determination of regionally representative trends. In addition, the data considered to be locally influenced can be used to learn more about the sources and sinks in the vicinity of the station. Figure 6a displays the seasonal variations of local CO₂ mole fractions in the four approaches. Due to the considerable proportion of local events below the regional band in REBS (blue dots in Fig. 3), the local CO₂ mole fractions were separated into "REBS-P" events (in or above the regional band) and "REBS-D" events (below the regional band). The data were also fitted and smoothed by the method of Thoning et al. (1989). The local CO₂ events all reveal a broad spring maximum peaking in May and a distinct winter maximum with the highest value in December. Minimum values are all observed in August. The peak in December and the trough in August agree with the seasonal pattern of the regional data. However, there is another distinct peak in May in all the approaches. Feng et al. (2014) investigated the black carbon measurements in Shanghai, China (150 km from LAN), and found three peaks in January-February, April-June, and November-December from 2010 to 2011. As the anthropogenic emissions of black carbon and CO₂ in the Yangtze area have a similar spatial distribution (Qin and Xie, 2012), the peaks in May and December are probably due to enhanced anthropogenic emissions in these months. However, the peak in May is blurred due to the dampening effect caused by the CO₂ uptake with the onset of the growing season.
The local CO₂ mole fractions in REBS-P are always higher than for the other methods, as most of the events identified by REBS-P are observed during nighttime (00:00-08:00 LT), when local emissions are strong and the boundary layer becomes lower. In contrast, the CO₂ mole fractions in REBS-D are mostly observed at midday (12:00-16:00 LT) and are apparently lower than in the other methods, reflecting the strong absorption by local vegetation. Thus it can be seen that at LAN station, the REBS method tends to define a median band of the CO₂ record as regionally representative. Except from December to January, the local CO₂ mole fractions in the BC method are higher than those in the AUX and MET methods. As discussed above, the tendency to flag higher CO₂ mole fractions as locally representative is probably the main reason for the higher CO₂ values. This result also indicates that the BC method induces bias in the local CO₂ estimations.
Meteorological data (such as surface wind direction and speed) could help to understand greenhouse gas emissions and transport (Dlugokencky et al., 1995; Massen and Beck, 2011). Figure 6b shows the wind rose distribution patterns of local CO₂ mole fractions in the four methods. The distributions are similar, with the highest CO₂ values in the SW-SSW sectors, including in REBS-P and REBS-D. This is due to the anthropogenic emissions from the town of Lin'an, located approximately 6 km southwest of LAN (Fig. 1). The local CO₂ mole fractions in the WSW to SSW sectors in the BC method are apparently higher than in the other methods. This is also probably due to the tendency to flag higher CO₂ mole fractions emitted from the town. The local CO₂ mole fractions in these sectors in the MET method are also higher than those of the REBS and AUX methods. We also studied the wind-rose CO₂ distributions in different seasons and found that most of the discrepancies occurred in summer. This phenomenon can also be seen in Fig. 6a, with higher values in the MET method than in the AUX and REBS methods in August. As discussed above, the lower local CO₂ mole fractions in the AUX and REBS methods in summer are probably due to the local CO₂ mole fractions (below the regional band in Fig. 3) being flagged as absorption by local sinks (e.g., by photosynthesis of local vegetation).
Figure 7. The filtered regional and local CO₂ mole fractions from 28 to 31 December 2010 (LT). The black dots represent the regional events and the gray dots denote the local events. Phase 1 and phase 2 represent the periods from 06:00 to 16:00 LT on 29 and 30 December 2010, respectively.
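A wind-rose style summary of the kind discussed in this section amounts to averaging the CO₂ mole fractions per wind sector. The following sketch assumes a 16-point compass with sectors 22.5° wide and centered on their nominal bearings; the function names and the sample values are hypothetical and are not taken from the LAN record.

```python
from collections import defaultdict

# 16-point compass, clockwise from north.
SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def sector_of(wind_dir):
    """Map a wind direction in degrees to its 16-point compass sector."""
    # Shift by half a sector so that N covers 348.75 to 11.25 degrees.
    idx = int(((wind_dir % 360.0) + 11.25) // 22.5) % 16
    return SECTORS[idx]


def mean_co2_by_sector(wind_dirs, co2_values):
    """Average CO2 mole fraction (ppm) for every sector that has data."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, c in zip(wind_dirs, co2_values):
        s = sector_of(d)
        sums[s] += c
        counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}


# Made-up example: two south-westerly hours and one easterly hour.
means = mean_co2_by_sector([225.0, 220.0, 90.0], [420.0, 430.0, 400.0])
```

Applied separately to the locally flagged data of each filtering method, such sector means would give the kind of comparison shown in the wind-rose figure.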

Case analysis
To further investigate the differences among the four data filtering approaches, we used a period in winter as a case study. In winter, CO 2 mole fractions at LAN are high due to strong emissions and weak absorption by the regional terrestrial ecosystems. Here we selected two time periods, from 06:00 to 16:00 LT on 29 December 2010 (period 1) and on 30 December 2010 (period 2) (see Fig. 7). Period 1 features elevated CO 2 mole fractions, while period 2 reveals "normal" CO 2 values. We computed 3-day back trajectories at 500 m a.g.l. for the period with elevated CO 2 (28 December 2010 19:00 to 29 December 2010 06:00) using the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) dispersion model (Draxler and Rolph, 2003). The model is based on NCEP/NCAR Reanalysis data, and the trajectories were calculated for every hour (01:00, 02:00, 03:00 LT, ...). Figure 8 shows all calculated trajectories in period 1. It can be seen that almost all of the air masses reaching LAN were transported over the cities of Hangzhou (province of Zhejiang, ∼ 50 km east of the station) and Nanjing (province of Jiangsu, ∼ 230 km north of the station). The black carbon concentration also increased during this period. Since black carbon is mainly emitted by fossil fuel combustion and biomass burning (Penner et al., 1993; Cooke and Wilson, 1996), the enhanced CO 2 mole fractions were probably caused mainly by the transport of emissions from these cities. Both the BC and AUX methods flag all data in period 1 as locally influenced (Fig. 7). In the BC method, all the CO 2 data are flagged because the black carbon concentrations are noticeably higher than the yearly average. The meteorological conditions from 10:00 to 16:00 LT in period 1 favored dilution, i.e., the average surface wind speed was 2.6 m s −1 and the σ of the hourly mole fractions was less than 1.3 ppm. Although the CO 2 mole fractions increased in this period, they were more likely influenced by regional sources (e.g., from Hangzhou and Nanjing) than by local sources. Thus, the BC and AUX methods may erroneously assign local conditions. In period 2, some data points from 10:00 to 16:00 LT are flagged as regional except in the REBS method. As discussed above, REBS is a purely statistical method. The frequent occurrence of high mole fractions in winter may enlarge the σ and may consequently shift the regional events away from the real trend. As a result, some low regional CO 2 mole fractions such as those in period 2 may not be identified.

Discussion and conclusions
The main purpose of data filtering at a regional station is to identify the data which are least influenced by local sources and sinks (Tsutsumi et al., 2006). However, due to the unique conditions at each station (i.e., topography, air mass transport, economic development level, etc.) and the complex influences of local sources and sinks, there is no ultimate way to rigorously distinguish the locally influenced CO 2 from the original data series. Thus data filtering at this type of regional station is a relatively empirical task. In this study, four data filtering approaches are used to flag the observed data from 2009 to 2011 at Lin'an (LAN) station in the Yangtze Delta area, China. Each of the methods applies multiple steps to flag the observed CO 2 mole fractions. The strong diurnal variations of observed CO 2 mole fractions and the discrepancy between the parallel measurements (10 and 50 m a.g.l.) indicate that selecting daytime data only is the first and critical step in studying CO 2 mole fractions at this kind of station; it is therefore applied as the first filtering step in all four approaches. The four methods in this study are all suitable to capture the main features of the seasonal cycle of regional CO 2 at LAN, but the different regimes in these methods also introduce bias into the regional or local mole fraction evaluations.
The BC method should be treated with caution, as it is difficult to distinguish the local emissions of black carbon from the regional contents. Especially during the winter-spring seasons, this method may underestimate the regional CO 2 mole fractions at LAN. Moreover, it is mainly geared to polluted air masses altered by anthropogenic sources and does not consider the influence of the land biosphere. Additionally, the different lifetimes of atmospheric CO 2 and black carbon may also introduce errors into the estimation, especially during rainy periods. In this study, the annual mole fractions, the annual growth rate, and the local CO 2 values in the BC method are different from those in the other three methods. It should be mentioned that the BC method in Pu et al. (2014) is different from the one in the present study. Besides using different wind speeds as the filter criterion (2 m s −1 ) and excluding outliers, they used all data from both daytime and nighttime. The emissions from local vegetation and accumulation in the shallow boundary layer at night definitely enhanced the filtered CO 2 mole fractions and induced higher annual values than those in our study.
The REBS method is purely statistical. This method is appealing as it requires no additional information (site-specific criteria, additional observations). However, it may also induce errors when evaluating the regional CO 2 mole fractions, e.g., overestimating the regional values. In a previous study (Fang et al., 2011), we estimated an annual average of 405.3 ppm in 2009 by using REBS, which was noticeably higher than the averages in this study (Table 1). Due to the "noisy" CO 2 mole fractions at regional sites like LAN, the filtered regional trend may be drawn upward or pulled down from the real variation.
Although there are correlations between CH 4 and CO 2 at LAN, the different source/sink regimes may introduce bias into the regional CO 2 estimation in the AUX method, typically in summer. Further, the atmospheric CH 4 and CO 2 at LAN are not perfectly correlated, meaning that some CO 2 events cannot be determined by the CH 4 mole fractions.
In comparison to these approaches, the MET method has fewer disadvantages for the data selection. As this method mainly focuses on the influence of potential local sources and sinks and considers diurnal variations and meteorological conditions, it is reasonable for identifying the influence of local sources and sinks and is suitable for application at other regional stations. We also studied the results at one other station in China (Longfengshan, 127°36′ E, 44°44′ N; 330.5 m a.s.l.) using different BC, AUX, and REBS methods, but a similar MET method. The results also indicated that the MET method was the most favorable (Fang et al., 2015). However, we have to mention that due to the intake height (only 10 m a.g.l.) in this study and the complex influence of the land biosphere, the data selected at LAN may not fully represent the volume at the regional scale. Although we selected the data from 10:00 to 16:00 LT, when the boundary layer was the highest, and when the surface wind speed exceeded 1.5 m s −1 , the influence of the local land biosphere could not be fully eliminated. This influence can be seen from the frequently lower CO 2 mole fraction at 10 m a.g.l. than at 50 m a.g.l. during daytime in summer.
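The two MET criteria named in this section (the 10:00 to 16:00 LT window and a surface wind speed above 1.5 m s −1 ) can be sketched as a simple flagging step. This is a hedged illustration only: the full MET method applies further steps not restated here, and the record layout, the function names, and the treatment of the window as hours 10 to 15 inclusive are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HourlyRecord:
    """One hourly observation (hypothetical layout for this sketch)."""
    local_time: datetime   # local time (LT)
    co2_ppm: float         # hourly mean CO2 mole fraction
    wind_speed_ms: float   # hourly mean surface wind speed


def is_regional_candidate(record, start_hour=10, end_hour=16, min_wind_ms=1.5):
    """Apply only the two criteria stated in the text.

    Assumption: the 10:00-16:00 LT window is taken as hours 10..15 inclusive.
    """
    in_daytime_window = start_hour <= record.local_time.hour < end_hour
    return in_daytime_window and record.wind_speed_ms > min_wind_ms


def split_records(records):
    """Return (regional_candidates, locally_influenced) lists of records."""
    regional = [r for r in records if is_regional_candidate(r)]
    local = [r for r in records if not is_regional_candidate(r)]
    return regional, local
```

Keeping the thresholds as keyword arguments makes it straightforward to test other site-specific values, since the text stresses that filtering at regional stations is relatively empirical.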
However, it cannot be concluded that the other three methods are not suitable for CO 2 data selection. For example, the four methods were also applied to process the observed CO 2 record at Mt. Waliguan station (100°54′ E, 36°17′ N; 3816 m a.s.l.) in China without excluding the nighttime data. As this station is a WMO/GAW global site located in a remote area, we found no distinct differences between the filtered CO 2 mole fractions, including the seasonal cycles, annual growth rates, and background mole fractions (data not shown); the results agreed very well with the background information in the Northern Hemisphere. In this study, we selected daytime data only for the four approaches to exclude the influences of very local sources/sinks (e.g., vegetation). However, for sites without strong local sources/sinks and with indistinct diurnal CO 2 variations, the nighttime data may also represent the regional background and can be used. Atmospheric black carbon is mainly from fossil fuel combustion and biomass burning. Thus the BC method may be applied at some remote sites to identify the anthropogenic influence on the observed CO 2 records. The theory of REBS assumes that the background signal varies very slowly relative to contributions of the regional signal. The results in this study indicate that REBS is not suitable for CO 2 data filtering at regional stations like LAN. Instead, it may be applied at some remote stations like Mt. Waliguan. The AUX method can be applied at sites where the atmospheric CH 4 and CO 2 are subject to the same sources, and it can also be applied at some remote stations. Moreover, due to the different characteristics and source/sink regimes of various gas species, the suitability of a particular filtering method may even differ when looking at different trace gases at the same sampling site. This needs to be studied separately.

Figure 1. Geographic map of the LAN station. The red stars denote cities or towns near the station. The yellow star indicates the Qingshan Lake nearby.

Figure 2. Average standard deviations of hourly CO 2 mole fractions versus wind speed based on all data from 10:00 to 16:00 LT from 2009 to 2011. The standard deviations of hourly CO 2 data were calculated based on the 5 min segments.

Figure 3. Correlations between CH 4 and CO 2 mole fractions based on all data from 10:00 to 16:00 LT at LAN station. Spring: March-May; summer: June-August; autumn: September-November; winter: December-following February. The red lines show linear fits between the CH 4 and CO 2 mole fractions. The blue lines in each chart bracket the CO 2 values within ±1σ of the data from 10:00 to 16:00 LT in each season.

Figure 4. Filtered CO 2 mole fractions in the four approaches (BC: black carbon as tracer; REBS: Robust Extraction of Baseline Signal; AUX: CH 4 as auxiliary tracer; MET: meteorological filter). The closed blue circles represent the filtered regional events. The open gray circles represent local events which are influenced by very local sources or sinks. The red lines are results fitted to the filtered regional events using the curve-fitting method by Thoning et al. (1989).
regional CO 2 values. The annual mean CO 2 mole fractions all display increasing trends. Although the annual CO 2 determined by the BC method increases from 402.0 ± 0.1 ppm in 2010 to 402.7 ± 0.2 ppm in 2011, the absolute increase (∼ 0.7 ppm) is too small compared with the increases from the other methods, as well as the global means. For example, the global average CO 2 increases were 2.3 ppm in 2009-2010 and 2.0 ppm in 2010-2011 based on WMO/GAW's statistics (WMO, 2011, 2012) and were 2.39 ppm in 2009-2010 and 1.71 ppm in 2010-2011 based on observations from NOAA's network (Dlugokencky and Tans, 2015). Considering the global CO 2 growth rate and the increasing emissions of CO 2 in China (CDIAC, 2015; Tohjima et al., 2014), it is unlikely that the regional CO 2 mole fractions at LAN remained almost constant from 2010 to 2011. In fact, the long-term trend in the BC method (polynomial part of the curve-fitting function) decreases after February 2011 until the end of the year (data not shown), which is opposite to the variations of total CO 2 emissions (or black carbon) expected from fossil fuel emissions in China, whose increase rose from 0.15 Pg C in 2009-2010 to 0.21 Pg C in 2010-2011 (CDIAC, 2015). As the BC method uses a fixed black carbon concentration (5000 ng m −3 in this study) as a threshold to filter the CO 2 record, a large proportion of high regional CO 2 mole fractions in 2011 may be flagged as local events, and consequently, a decreasing long-term trend was obtained. The absolute CO 2 increases from the other three methods indicate smaller increases from 2009 to 2010 and larger increases from 2010 to 2011, which corresponds better with the trend in total CO 2 emissions in China.
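The effect of a fixed black carbon threshold can be illustrated with a short sketch. It covers only the threshold step (the BC method in this study applies further steps), it assumes that records above the threshold are the ones flagged as local, and the function names and input layout are hypothetical.

```python
# Fixed black carbon threshold stated in the text (ng m^-3).
BC_THRESHOLD_NG_M3 = 5000.0


def flag_by_black_carbon(records, threshold_ng_m3=BC_THRESHOLD_NG_M3):
    """Split CO2 values into (regional, local) using a fixed BC threshold.

    `records` is an iterable of (co2_ppm, bc_ng_m3) pairs (hypothetical layout).
    Assumption: BC above the threshold marks the CO2 value as a local event.
    """
    regional, local = [], []
    for co2_ppm, bc_ng_m3 in records:
        if bc_ng_m3 > threshold_ng_m3:
            local.append(co2_ppm)
        else:
            regional.append(co2_ppm)
    return regional, local


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)
```

With such a rule, hours in which regionally elevated CO 2 coincides with elevated black carbon are removed from the regional subset, so the regional mean can only stay the same or decrease relative to the unfiltered mean whenever high CO 2 and high black carbon occur together. This is the mechanism the paragraph above proposes for the small 2010-2011 increase in the BC method.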

Figure 5. (a) Variations of monthly CO 2 mole fractions in the four methods, also compared to the surface values at similar latitudes (30° N) from the MBL reference (MBL, 2015). The data in this figure are values smoothed by the curve-fitting method of Thoning et al. (1989). (b) The detrended seasonal CO 2 cycles in the four methods. This is the smoothed curve minus the trend. Error bars indicate 95 % confidence intervals.

Figure 6. (a) Variations of locally influenced CO 2 mole fractions in the four methods. The data are CO 2 mole fractions smoothed by the curve-fitting method of Thoning et al. (1989). REBS-P denotes local events in or above the regional band in REBS in Fig. 3, and REBS-D denotes local events below the regional band. (b) Wind-rose distribution of locally influenced CO 2 mole fractions by the four approaches at LAN. The REBS-P and REBS-D show similar distributions. Thus the local events are averaged together as REBS to avoid a jump in the scale (red dots and line). Error bars indicate 95 % confidence intervals.

Table 1. The annual regional CO 2 mole fractions by the four methods from 2009 to 2011.