Identiﬁcation of spikes associated with local sources in continuous time series of atmospheric CO, CO 2 and CH 4

. This study deals with the problem of identifying atmospheric data inﬂuenced by local emissions that can result

Abstract.This study deals with the problem of identifying atmospheric data influenced by local emissions that can result in spikes in time series of greenhouse gases and longlived tracer measurements.We considered three spike detection methods known as coefficient of variation (COV), robust extraction of baseline signal (REBS) and standard deviation of the background (SD) to detect and filter positive spikes in continuous greenhouse gas time series from four monitoring stations representative of the European ICOS (Integrated Carbon Observation System) Research Infrastructure network.The results of the different methods are compared to each other and against a manual detection performed by station managers.Four stations were selected as test cases to apply the spike detection methods: a continental rural tower of 100 m height in eastern France (OPE), a high-mountain observatory in the south-west of France (PDM), a regional marine background site in Crete (FKL) and a marine cleanair background site in the Southern Hemisphere on Amsterdam Island (AMS).This selection allows us to address spike detection problems in time series with different variability.Two years of continuous measurements of CO 2 , CH 4 and CO were analysed.All methods were found to be able to detect short-term spikes (lasting from a few seconds to a few minutes) in the time series.Analysis of the results of each method leads us to exclude the COV method due to the requirement to arbitrarily specify an a priori percentage of rejected data in the time series, which may over-or underestimate the actual number of spikes.The two other methods freely determine the number of spikes for a given set of parameters, and the values of these parameters were calibrated to provide the best match with spikes known to reflect local emissions episodes that are well documented by the station managers.More than 96 % of the spikes manually identified by station managers were successfully detected both in the SD and the REBS methods after the best adjustment of parameter values.At PDM, measurements made by two analyzers located 200 m from each other allow us to confirm that the CH 4 spikes identified in one of the time series but not in the other correspond to a local source from a sewage treatment facility in one of the observatory buildings.From this experiment, we also found that the REBS method underestimates the number of positive anomalies in the CH 4 data caused by local sewage emissions.As a conclusion, we recommend the use of the SD method, which also appears to be the easiest one to implement in automatic data processing, used for the operational filtering of spikes in greenhouse gases time se-

Introduction
Continuous measurements of long-lived greenhouse gases (GHG) such as carbon dioxide (CO 2 ) and methane (CH 4 ) at ground-based monitoring stations are commonly used in atmospheric inversions for the estimation of surface fluxes.The variability of GHG continuous time series reflects atmospheric transport processes and surface fluxes.One difficulty in matching these measurements with atmospheric transport model simulations is that they exhibit variability at a wide range of timescales, which is imperfectly captured by transport models due to their limited spatial resolution and to uncertain surface emission inventories.In particular, local emissions in the vicinity of stations can have a major influence on concentrations, generating brief but intense positive perturbations, hereafter referred to as "spikes".Every measurement has a specific spatial representativeness, and knowledge of this information allows for a much finer interpretation of the observation.It is desirable, in continuous GHG time series, to separate those data strongly influenced by local emissions (fluxes within less than few kilometres) from those influenced by regional (few tens of kilometres) and large-scale (hundreds or thousands of kilometres) fluxes and transport.The influence of local fluxes, in particular of nearby point sources of emissions, should be filtered out prior to the use of the time series in inversion models if the models do not have the ability to represent it.For instance, a road near a station can emit CO 2 , causing spikes in the time series, because this road is not accounted for in the emission inventory used in an inversion.
Having empirical information on the representativeness of continuous GHG time series, e.g. a logbook available for each station, allows for more precise interpretation of the atmospheric measurements in terms of the processes involved in the observed variability.It is interesting, for example, to assign the contribution of specific sources (e.g. point sources of fossil CO 2 emissions or biomass burning events) within the local vicinity of the station.Several methods have been proposed to account for local to regional influences in greenhouse gas observations according to other observables, such as wind speed and direction (Perez et al., 2012a) and tracers like radon-222 or black carbon (Biraud et al., 2002;Fang et al., 2015;Williams et al., 2016).Air-mass trajectory information is also frequently used (Ramonet and Monfray, 1996;Ferrarese et al., 2003;Maione et al., 2008;Fleming et al., 2011;Perez et al., 2012;Gerbig et al., 2006).Other methods based on a statistical treatment of time series (Giostra et al., 2011;Ruckstuhl et al., 2012) are easier to generalize because they require no additional observable.A commonly used strategy by modelers, using transport models of a typ-ical resolution from 10 to 50 km, consists of systematically removing some periods of the day (e.g.nighttime for surface stations or daytime for mountain sites) in order to filter the influence of non-resolved mesoscale circulations or vertical transport processes poorly represented by models (e.g.sporadic turbulence in stable or neutral nighttime boundary layers).
The development of regional networks for monitoring GHG and related tracer concentrations leads to an increasing number of continuous measurement stations, especially in continental areas.For example, the European ICOS (Integrated Carbon Observation System) Research Infrastructure is developing a network of tall towers for very precise GHG measurements across the European continent.It is thus important to characterize the representativeness of each individual measurement in order to separate spikes from local emissions that should not be used in studies aiming at constraining regional fluxes.In this study, our objective is to compare methods that could be used operationally to remove the contaminations from local sources at continuous measurement stations.Local contamination may be due, for example, to fossil-fuel-based power generation at the station facility and local traffic.The short-term variations (few seconds to minutes) of GHG associated with those of local sources have rarely been analysed, and they have generally been time averaged with consecutive data.Some studies, however, have been focusing on local emissions on the basis of the detection of short-term "spikes" (Monster et al., 2015).Here, "local" refers to emissions at less than a few kilometres from the station causing positive short-term spikes of a few seconds to a few minutes superimposed on the signal resulting from boundary layer mixing, synoptic transport and regional fluxes.Other methods such as the Fourier transform filters (Savitkzy Golay, 1964) and the wavelet transforms (e.g.Wee et al., 2008) have been considered at the beginning of this study, but these methods require continuous time series and smooth signals.Considering that the measurements are regularly interrupted due to different reasons (e.g.calibration, flushing time after switching from sampling level to another, power or internet outage), we had to select a method that handles time series with data gaps.Moreover, applying a Fourier transform method on continuous measurements provides a signal composed by frequencies only, and all information that varies with time will be lost.In other words, we can analyse what happens (spikes to be filtered out) without knowing when this happens, which is essential information to better understand the sources of contaminations.We compare here spike detection algorithms for local sources in greenhouse gases (CO 2 and CH 4 ) and long-lived tracer time series (CO).The algorithms chosen in this study have been applied to air pollution data (e.g.ultrafine particles, particulate matter and nitrogen dioxide NO 2 ) which have shorter lifetimes than CO 2 , CH 4 and CO (Brantley et al., 2014).In the case of GHG, spikes can be caused by local sources but also by the fast transport of remote emissions.Compared to short lifetime species, spikes in GHG are not always larger than the variability associated with synoptic scales.For CO 2 , uptake by local vegetation may occasionally lead to negative spikes, which will not be evaluated in this study (only positive spikes are considered).
The three spike detection algorithms -coefficient of variation (COV), robust extraction of baseline signal (REBS) and standard deviation of the background (SD) -are described in Sect.2, then applied to 2 years of continuous measurements of CO 2 , CH 4 and CO at four stations representative of the European network of GHG monitoring stations.The study will focus more on the SD and the REBS method, since they are fully automatic and they do not require any a priori information for the implementation.The results are discussed in Sect.3. Wherever possible, the ability of an algorithm to successfully detect and remove the effects of local sources and transport is verified using independent information about the presence and position of known local emissions.

Methodology
We selected four contrasting atmospheric GHG measurement sites operated by LSCE (Laboratoire des Sciences du Climat et de l'Environnement): a tall-tower station in France, a high-mountain station in France, a regional marine background site in Crete and a marine clean-air site in the Southern Hemisphere, which provided continuous data from 2013 to December of 2015 (Table 1).Continuous measurements used in this study are averages with 1 min time resolution and are processed in near real time (NRT) by the ICOS Atmospheric Thematic Centre (Hazan et al., 2016).The four stations have been used in regional and global atmospheric inversions to estimate GHG surface fluxes at regional and global scales (e.g.Bergamaschi et al., 2018;Le Quéré et al., 2007;Saunois et al., 2016).

Measurement sites
Amsterdam Island (AMS; 37 • 48 S, 77 • 32 E).This marine background station is operated since 1980 to monitor trends of trace gases in the southern hemispheric mid-latitude cleanair atmosphere.The observatory is located on the coast of a small island (55 km 2 ) covered by short grasslands, in the middle of the Indian Ocean 3000 km south-east of Madagascar.Measurements are performed at the Pointe Bénédicte site located north of the island, on the edge of a 55 m cliff above sea level.The air is sampled at the top of a 20 m high tower.The station contributes to the Global Atmospheric Watch program (WMO/GAW).The data used to feed the WMO/GAW database and estimate the long-term trends are filtered according to local wind measurements to avoid the influence of CO 2 emissions from the island itself (Ramonet and Monfray, 1996).Finokalia (FKL; 35 • 20 N, 25 • 40 E).This coastal station is located on the northern coast of Crete, 350 km south of mainland Greece.The nearest city is Heraklion with a population of about 150 000 inhabitants, 50 km west of the station.There are no significant anthropogenic emissions within a circle of 15 km around the station (Kouvarakis et al., 2000).The station is on the top of a 230 m hill above sea level, and the air is sampled at the top of a 15 m mast.The dry season from April to September is associated with strong winds from north and north-west (central Europe and Balkans), and the wet season from October to March is associated with air masses from North Africa (south and south-west winds) in addition to the dominant north-westerly winds.The station is operated by the Environmental Chemical Processes Laboratory (ECPL) at the University of Crete, which also collects aerosol and reactive gases (Hildebrandt et al., 2010;Pikridas et al., 2010;Bossioli et al., 2016;Kopanakis et al., 2016).
Pic du Midi (PDM; 42 • 56 N, 0 • 08 E).This highmountain site is located at 2877 m a.s.l. on the north-west side of the Pyrenees range in south-west France, 150 km east of the Atlantic Ocean and 200 km west of the Mediterranean Sea.Due to its high elevation, the station often samples tropospheric air from the Atlantic Ocean, as well as air masses from continental Europe during high-pressure conditions over France (north-easterly winds) or from the Iberian Peninsula under southerly winds.Upslope winds and mesoscale circulations are frequent especially in summer and early autumn, bringing boundary layer air mostly from southwest France (covered by intensive croplands and forests; Gheusi et al., 2011;Tsamalis et al., 2014;Fu et al., 2016).
Observatoire Pérenne de l'Environnement (OPE; 48 • 33 N, 5 • 30 E).This 120 m tall tower is located in a rural area at 395 m a.s.l. in the north-east of France (250 km east of Paris).It is located in a transition zone between oceanic westerly regimes and easterly winds advecting air from eastern Europe.The station continuously measures air quality and greenhouse gases since September 2011 as part of the European ICOS.Every hour, ambient air is sampled for 20 min alternatively at heights of 10, 50 and 120 m on the tower (Table 1).

Measurement methods
The gas analyzers used at the four stations are cavity ringdown spectroscopy instruments (CRDS; O'Keefe and Deacon, 1988), namely Picarro/G2401 analyzers at FKL, OPE and PDM with CO 2 , CH 4 and CO and Picarro/G2301 at AMS with CO 2 and CH 4 (Table 1).The measurement protocols used at the four stations are similar and based on ICOS specifications (https://www.icos-ri.eu/documents/ATCPublic).A calibration using four reference gases is performed every 3 to 4 weeks.Two more reference gases are analysed regularly for quality control purposes.The raw data (0.2 to 0.5 Hz) are transferred once per day to a central server and NRT datasets are available within 24 h.The NRT data processing (Hazan et al., 2016) includes automatic filtering of raw data based on the physical parameters of the analyzers (e.g.cavity temperature and pressure) and threshold values for rejection of outliers.This last filter aims to reject aberrant values from the NRT dataset.It may happen that it rejects an extreme but real event, for instance due to an urban pollution plume.In such cases, the data will be validated afterwards by the station manager.After the automatic processing, the station managers are invited to validate or invalidate data manually using a specific software developed by the ICOS Atmospheric Thematic Centre.For data manually flagged as invalid, a reason must be provided (e.g.leakage, maintenance, local traffic).This procedure does not ensure the systematic rejection of spikes in the data from local or regional processes.Meteorological measurements are also performed at the four stations with barometric pressure, temperature, wind speed, wind direction and relative humidity.Wind speed and direction are measured using 2-D or 3-D ultrasonic sensors installed at the same height of the greenhouse gas measurements.The sensors are adapted to the local weather; for instance at PDM (2877 m a.s.l.) the sensor is heated to avoid icing.

Spike detection algorithms
Three algorithms were tested to detect positive short-duration GHG spikes lasting from a few seconds to a few minutes, using time series of 1 min averaged mole fractions of CO 2 (as illustrated in the Supplement, Fig. S1), CH 4 and CO.The three methods presented in this section are commonly based on the calculation of the local standard deviations of measurements.A spike is detected when the difference between a previously determined background and the current data value is above a defined threshold.We will present in this section the corresponding threshold for the three methods.

Coefficient of variation method
The COV method (Brantley et al., 2014) is a modified version of the method presented by Hagler et al. (2010).It was developed to analyse data from a mobile laboratory measuring ultrafine particle concentrations near a road transect (Brantley et al., 2014) for peak detection of carbon monoxide, which was used as an indicator of the passage of vehicles.In our application we calculate the COV coefficients for CO 2 , CH 4 and CO time series following two steps.First, the standard deviation of a moving 5 min time window (with one window for each 1 min data point) is calculated (2 min before and after each 1 min data point).Second, the standard deviation of each time window is divided by the mean value of the complete time series.The 99th percentile of the COV coefficients is used as a threshold above which 1 min data are considered to be part of a spike.We also identified as contaminated data all data recorded 2 min before and after each contaminated data.The COV method is sensitive to the choice of threshold percentile.In the Supplement we illustrate in Fig. S2a an example of spike detection using the COV method during a CO contamination episode known to be affected by a local fire.One important feature of the COV algorithm, compared to the other methods, is the a priori definition of the percentage of data to be filtered (threshold percentile), meaning that the number of spike data is not automatically detected.

Standard deviation of the background
The SD method (Drewnick et al., 2012) considers that a time series is a combination of a smooth signal superimposed with a fast variable signal.The variable signal component in our case is related to local emissions causing spikes.To determine the variability of background concentration levels we calculated the standard deviation (σ ) of data falling between the first and the third quartile of the entire dataset.A sensitivity test with various quantile ranges is presented in Sect.3.1.We then select the first available data point, called C unf (un-flagged data, example in the Supplement Fig. S2b), assuming that it is not in a spike.The next data point in the time series C i is evaluated with respect to C unf ; spikes are defined by data values higher than a threshold defined as C unf plus an additive value: α × σ + √ n × σ (e.g. the red data point in Fig. S2b), where α is a parameter to control the selection threshold, and n is the number of points between C unf and C i .The value of α depends on the time series variability.A sensitivity analysis on the influence of α is presented in Sect.3.1.We set a default value of α = 1 for CO 2 and CH 4 and α = 3 for CO (Drewnick et al., 2012).The lower value for CO 2 and CH 4 is justified in Sect.3.1.The integer n contains a temporal information about the evolution of the time series.Indeed, while identifying a spike C i , the next data point is evaluated against C unf using an increased threshold to take in consideration the variability of the baseline during the spike event.If C i is lower than the threshold from Eq. ( 1), it is considered as "non-spike" and becomes the new reference value C unf .The following data will then be compared to this updated C unf .
The SD method was applied over 1-week time windows, i.e. the standard deviation over σ a week is used for threshold calculation.Using a longer period (e.g. 1 year) would give more weight to the seasonal and long-term variabilities, which are not relevant to identify short-term spikes using the 1-year standard deviation.

Robust extraction of baseline signal
The REBS method (Ruckstuhl et al., 2012) is a statistical method based on a local linear regression of the time series over a moving time window (characterized by a duration called the "bandwidth"), to account for the slow variability of the baseline signal, while outliers lying too far from the modelled baseline are iteratively discarded.The bandwidth h must be wide enough to allow for a sufficiently low fraction of outliers within h.The REBS code used here is based on the rfbaseline application developed in the IDPmisc package (Locher, et al., 2012) in R software.It is a modified version of the robust baseline estimation method developed to delete baseline from chemical analytical spectra (Ruckstuhl et al., 2001).The REBS method was applied at the high-Alpine Jungfraujoch site (Switzerland, 3580 m a.s.l.) and has been proven robust to estimate the background measurements of GHG (Ruckstuhl et al., 2012).The REBS method considers that greenhouse gas time series are composed of a background signal plus a regional contribution which may also include local effects (spikes) and measurement errors.The main difficulty is to correctly define the baseline signal of the measured time series.To achieve this goal, the choice of the bandwidth value is important.In the Jungfraujoch study, the baseline signal was defined as the smooth curve retrieved from REBS technique (Ruckstuhl et al., 2012) using a bandwidth of 90 days, in order to distinguish the contribution of regional emissions that add to the slow seasonal variability.Since, in our study, the targeted spikes last a few seconds to a few minutes, we chose to calculate the baseline using a bandwidth of 60 min to detect spikes of a few minutes (maximum 5 min).The threshold for spike detection in REBS is based on the calculation of a scale parameter β, which represents the standard deviation of data below the baseline curve, called ĝ (t i ).All measurements Y (t i ) that satisfy Y (t i ) > ĝ (t i ) + β × γ are classified as locally contaminated (illustration in Fig. S2c).β is a parameter to adjust the filtering strength.Ruckstuhl et al. (2012) set β = 3 for the detection of polluted data.For our purpose, a sensitivity test with different values of β is carried out in Sect.3.1.

Results
3.1 Optimization of the SD and REBS methods

Sensitivity to the parameters of the SD method
We conducted sensitivity tests in order to evaluate the influence of the two parameters α and σ used in the SD method.For α we tested values ranging from 1 to 3. Here, we present only the results for α = 1 and α = 3.For σ we compared the results calculated with σ based on 50 % of 1-week data, data between the first and third quartile (scenario σ b ) and for all the data of the week (scenario σ t ).We studied four configurations (two values of α with σ b or σ t ) on 1 min data every week at the four stations.Figure 1 shows an example of spikes detected by SD at FKL on 16 December 2014, corresponding to a known waste-burning episode reported by the station manager.The station logbook mentions waste-burning occurring nearby the station between 06:30 and 08:30, shown by a purple bar in Fig. 1.The blue area in Fig. 1 shows the CO data between the first and third quartiles, leading to a σ b = 3.6 ppb.Considering all the data, we obtain a 3 times higher standard deviation: σ t = 12.5 ppb.The SD method with α = 3 and σ b = 3.6 ppb selects two 1 min data points as spike as illustrated by the orange dots falling within the observed fire episode in Fig. 1.With α = 3 and σ t = 12.5 ppb, the method fails to detect any spike, indicating that the threshold value was too high.With α = 1 and σ b the SD method selects 44 additional 1 min spikes compared to α = 3 (data not reported as contaminated by the station manager).In both cases (α = 1 or α = 3) and σ t lead to a very high threshold and an underestimation of the number of spikes detected, since σ t includes the spike variabilities.
Based on this sensitivity test against a known local emission episode, we definitively rejected the use of σ t scenario.Table 2 represents the percentage of contaminated data detected over 1 year at the four sites, in the four tested configurations.As can be seen, using all 1 min data to calculate σ t leads to a higher threshold and consequently to less data detected as contaminated.On average over the four stations  and the three species, switching from σ b to σ t decreases the percentage of spikes by a factor 15 ± 16 (Table 2).Setting α = 3 increases the threshold and also decreases the number of spikes by on average a factor of 5 ± 7 (Table 2).The parameter α is related to the variability of the time series.Since our study aims to provide recommendations for automatic data processing of a monitoring network like ICOS in Europe, we would ideally keep the same set of parameters for all the stations of the network for each species.However, all the tests conducted in the present study have shown that it was not optimal to use the same parameter for the CO time series as for the CO 2 and CH 4 time series.Setting a lower α for CO leads to the overestimation of the number of spikes in the time series.This must result from the different variabilities of those trace gases.For instance, the ratio between hourly and minute-scale variabilities (characterized by standard deviations) for the sites used in this study is on average 2  2012), we decided to keep α = 3 for CO and set α = 1 for CH 4 and CO 2 because of their lower variability.

Sensitivity to the parameters of the REBS method
In order to evaluate the sensitivity of spikes to the parameter, β, we tested values of β ranging from 1 to 10.In this study, we present the REBS method using the default value β = 3 as proposed by (Ruckstuhl et al., 2012) in Jungfraujoch, compared with the optimal value for our purpose, β = 8.The resulting spike selection at FKL (during a local fire episode) is shown in Fig. 2. By setting β = 3, the REBS method detects the spike during the episode but it also finds other events which do not appear to be associated with evident contaminations (Fig. 2).With β = 8, the REBS correctly detects spikes during the fire episode (orange points in Fig. 2).We further compared these two values of β at the four stations every week for the year 2015 (from January to December) and report spike detection statistics in Table 3.About 10× more spikes for CO, and 5 to 7 times more for CH 4 and CO 2 , were detected by the REBS method with β = 3 compared to β = 8.Using β = 3, we detected more than 2 % of spikes for all species and up to 7 % for CO 2 at AMS.Using β = 8 these percentages are reduced to 0.2 and 1.5 %, respectively (Table 3).Indeed, β = 8 provides results in better agreement with spikes manually reported by site managers.Spike detection statistics for β ranging between 1 and 10 are presented in Table S1 in the Supplement, and additional illustrations for β = 1, 4, 8 and 10 are in Fig. S3.
Based on these sensitivity tests for the SD and REBS parameters, and the a prior estimation of the percentages of spikes manually detected by site managers, we apply the SD

Statistics of the three spike detection methods
The statistics for local spike detection with the three methods are given in Table 4. Due to the lack of completeness of the reports by the staff about potential local contaminations, we cannot compare those average statistics to the manual spike detection.With COV we detect an average of about 2 % of spikes with the 99th percentile threshold for all stations and species (Sect.2.2).With the methods SD and REBS, more variable percentages of spikes are found depending on the trace gas variabilities at each station.The percentages of contaminated data range from 0.1 % for CO 2 at AMS to 7 % for CH 4 at PDM.The value of 7 % detected for CH 4 at PDM is higher than at all other sites and species and reveals the influence of a source of methane on a site (see below and next paragraph).For OPE, we found a significant percentage of spikes (between 1 and 2 %) for all species, which may be explained by the higher number of local emission sources compared to other stations located in more pristine environments.
At FKL and AMS we obtain different percentages of spikes between SD and REBS for CO 2 .In fact, we assume that this difference can be related to the sea-land circulation when winds turn, leading to a fast change in atmospheric concentrations.For FKL, AMS and PDM, the percentage of spikes found with the SD and REBS methods vary by around 1 % except for CH 4 at PDM, where both SD and REBS detect high percentages of spikes (7 % for SD method and 2.3 % for REBS method).This is not expected for a high-mountain station.The results of a field campaign organized at PDM in Generally, the methods SD and REBS automatically detect spikes.However, the COV method requires a prior knowledge of datasets and the approximate number of data to be filtered.Because of this limitation for automatic spike detection we have discarded the COV method from further tests for the selection of the most reliable method for spike detection.

Comparison of SD and REBS methods to detect CH 4 spikes at the PDM clean-air mountain station
In this section we use field campaign data involving two instruments at PDM to study the efficiency of the SD and REBS methods.As noted above, the SD method detects 20× more spikes for CH 4 than for CO 2 at PDM ICOS site (Table 5).Looking for all possible local methane emissions at the site, we identified a small sewage treatment facility located about 20 m below the air intake of the analyzer (called AN-1) to be responsible for local CH 4 production.A test campaign was then organized between July and August 2015 with a second analyzer (called AN-2) installed 200 m away from of AN-1 (Fig. S4).The two analyzers were installed to measure simultaneously CH 4 and CO 2 molar fractions from 1 July to 31 August, as presented in Figs. 3 and 4. We applied the SD and REBS methods to the CH 4 and CO 2 time series from both analyzers.For CH 4 , analyzer AN-2 shows much fewer spikes than AN-1.For instance, between early July and late August 2015 there is more than 12 % of contaminated data with the SD method and 3 % with the REBS method in the AN-1 record, compared to only 0.8 % with SD and 0.7 % with REBS for the AN-2 instrument (Table 5).Considering that the two analyzers are measuring ambient air sampled 200 m apart, this large difference is clearly due to the local emission from the sewage facility.Interestingly, for CO 2 we detect more spikes in AN-2 than in AN-1 (Fig. 4).More than 1 % of CO 2 spikes were found in the AN-2 record compared to 0.5 % for AN-1 (Table 5, Fig. 4).This is explained by the proximity of a diesel generator to AN-2, used for a few hours during electrical storms.Both SD and REBS detect the same CO 2 spikes in both AN-1 and AN-2 time series (Fig. 4).
For CH 4 , SD and REBS methods confirm the frequent contamination of the AN-1 time series since 2014 and show a good ability to detect the spikes, yet with significant differences regarding the percentage of data detected as contaminated.Considering that the AN-2 analyzer provides a less contaminated CH 4 time series, we have used this experiment to compare between the two methods and select which one performs better for CH 4 spikes at PDM. Figures 5  and 6 represent the CH 4 and CO 2 measurements of AN-1 and AN-2.For AN-2, CH 4 concentrations (black data point in Fig. 5) rarely exceed 1950 ppb, whereas for AN-1 it exceeds 2000 ppb (black data point) and occasionally reaches almost 2200 ppb.SD and REBS methods both detect all contaminated data that range between 1980 and 2200 ppb for AN-1.The differences between the two automatic methods are more important for data that are below 1980 ppb. Furthermore, the filtered data (green data point) using the SD method better fit the 1 : 1 correlation line with the less contaminated analyzer than the REBS method (Fig. 5).The REBS method underestimates the lower part (foot) of the spikes (contaminated data that range between 1900 and 1980 ppb; Fig. 3b AN-1).However, for CO 2 the two methods detect nearly the same spikes (Fig. 4) and provide a similar filtered time series (Fig. 6).How can we explain the insufficient performance of the REBS method to detect the lower part of the CH 4 spikes?This method defines spikes using the estimated baseline (Ruckstuhl et al., 2012).When the population of contaminated data is high, the baseline is flawed due to the influence of spikes, and the baseline determination will be overestimated.In Fig. 5, we notice the missed detection of a number of contaminated data when using the REBS method due to the high values of the baseline.The SD method detects most of the local spikes at PDM, even if a slight underestimation of contaminated CH 4 data remains even after data filtering (Fig. 5).

Comparison between automatic and manual spike detection
In this section we analyse how SD and REBS methods detect spikes of CO 2 , CH 4 and CO that were independently identified by the station staff and related to a known local source of contamination at FKL and PDM.At FKL the contamination events reported by the site manager are associated with local fires nearby the station.The technical staff recorded dates of burning which could lead to significant emissions of trace gases, especially CO and CO 2 .It should be noted, however, that this information is not exhaustive in the sense that the person in charge does not necessarily have information on all burning events.We have matched the trace gas time series with the logbook information showing 17 days with local burning events between 2014 and 2015.We applied the SD and the REBS methods over 1- week time windows containing each burning event.First, we run the algorithms separately on the three species (CH 4 , CO 2 , and CO).Then, if the algorithm detects a spike in at least one species, we consider data for all other species as spikes as well.Figure S5 shows an example of the SD method applied on a fire episode between 15:00 and 16:00 on 6 November 2014.The spike occurred for the three species CO, CO 2 and CH 4 , with a similar pattern (spike also identified by the station manager).
Figure 7a represents the number of contaminated data detected by the automatic methods and manual flagging by the station staff.The numbers of selected data are split into three concentration ranges.The two automatic methods and the manual flagging detect the same number of contaminated data for CO classes higher than 400 ppb.We have an ex-  cellent agreement for the spikes with the highest concentrations.For the low-concentration spikes (< 400 ppb), the automatic methods are less selective than the manual flagging.In Fig. 8 we show an example of contaminated data detected by automatic and manual flagging methods at FKL.When the difference between uncontaminated (identified as reference) and spike data is not significant compared to a certain standard deviation threshold, the methods may thus fail.The data highlighted by the blue circle in Fig. 8 give an example of when spikes identified by automatic methods diverge from the manual identification.Such data are either close to the baseline REBS selection (Fig. 8c) or close to the C unf value for the SD method (Fig. 8b).At this point it is important to note that the person in charge of data flagging selects spikes using a known period (from a starting to an ending time).
A second comparison study between automatic methods and manual detection has been performed at PDM using the CO time series from December 2014 to February 2015.During winter, the station experienced several snowfall episodes and snow was removed with a diesel-powered snow blower.This operation influenced the GHG concentra-tions and leads to sharp spikes easily observed in the CO time series (Fig. S6).Most of the spikes are successfully detected by the SD and the REBS methods.Figure 7b represents the number of contaminated data detected by SD in red and REBS in green and data manually eliminated by the site manager in blue.Similar to the FKL local fires, the SD and the REBS methods detected the same number of spikes as the manual selection for high concentrations; 857 contaminated data points are detected by the SD method (same as the principal investigator, or PI) for concentrations higher than 400 ppb, and 828 data points are detected by the REBS method.The main difference between the automatic and the manual flagging methods are related to the lower part of the spikes.For 2861 data (CO < 400 ppb) flagged manually by the PI station, the SD method detects 2270 data points whereas the REBS method detects only 1799 data points.In fact, for moderate spikes the SD method selects 70 % of contaminated data according to the PI whereas the REBS method retrieves only 60 %.We have also calculated the number of events not considered by the manual flagging and considered by the automatic methods.For a total of 3402 data detected Table 6.Classification of the number of hours in which the SD method filtered at least 1 min data point for CO, CO 2 , and CH 4 at the four sites.The intervals represent the differences between filtered and the non-filtered time series averaged at a hourly scale in ppm for CO 2 and (ppb) for CO, and CH 4 .The values in brackets represent the percentages of the impacted hours on the whole time series.by the SD method, only 211 data were not considered by the PI, which represents 0.25 % on the whole period.For the REBS method, 133 data out of 2981 were not detected by the PI (nearly 0.15 %).However, these statements should be used with caution since the manual spike detection information is not exhaustive, and the person in charge does not necessarily have information on all contaminated events.

Influence of the spike detection on hourly averages:
In this section we estimate the impact of the spike detection on data used for atmospheric inversions, which are typically hourly or half-hourly averages.For this purpose we have calculated the differences between the hourly averages of the filtered and non-filtered time series.In Table 6, we present the number of hours in which at least 1 min data for each species was filtered.We classified the results into three intervals.For CO 2 , the first interval represents the values lower than 0.5 ppm, the second interval is for differences between 0.5 and 1 ppm and the third stands for the higher differences (values more than 1 ppm).For CH 4 and CO we set the first interval for values lower than 5 ppb, the second interval represents the data between 5 and 10 ppb and the third for differences higher than 10 ppb.Most of the differences between filtered and non-filtered hourly data vary between 0 and 0.5 ppm for CO 2 and between 0 and 5 ppb for CH 4 and CO.For CO 2 at the AMS station, the SD method detects 1454 1 min data points (Table 4), which occur in 104 h during the 3 years of measurements.Of those hours, 62 % are characterized by a difference up to 0.5 ppm, and 18 % show more than 1 ppm of difference.For CH 4 measurements in AMS, the 8801 contaminated data points detected by the SD method (Table 4) occur during only 21 h, this modifies the hourly averages by 5 ppb as a maximum.For four sites, we notice similar effect on the hourly averages.Most of the impacted hours are characterized by a difference within the first interval (0.5 ppm for CO 2 ; 5 ppb for CH 4 and CO).However, for OPE we observe higher differences with 53, 36 and 47 % of the impacted hours in the highest interval, respectively, for CO 2 , CH 4 and CO.This feature is probably related to the higher number of the nearby local emission sources nearby OPE site compared to the other stations, which are located in more pristine environments.Figure S7 shows a decrease of the number of impacted hours for higher intervals (the same pattern as the three other stations).Overall, the aggregation of filtered measurements at the hourly timescale showed a relatively weak impact of the filtered data for background sites, but more significant effect for stations located closer to local sources.

Conclusions
The recent increase in the number of studies that have been applied to study the spatial representativeness of GHG observations demonstrates the need to define efficient and reliable methods for the identification spikes related to local contamination sources.Three methods based on the standard deviation calculation were compared in order to provide an objective algorithm for the GHG data spike detection.
We addressed the problem of identifying concentration spikes of a few minutes duration in GHG continuous time series by applying automatic detection methods (COV, SD and REBS) previously used for atmospheric pollution but not systematically for GHG time series.Stations with different regimes of variability where local emission sources are identified without ambiguity (engines/waste near the station buildings, or fires nearby) are chosen to evaluate the performance of the automatic methods against spikes manually identified by station managers.The COV algorithm can be considered as a semiautomatic method since it requires an a priori choice of a percentage of data rejected as spikes.We tested the COV method with a percentage of 1 % of spike data for all species and for all stations.This limitation made the COV method less flexible and informative for universal automatic spike detection across different sites.For the two fully automatic methods (SD and REBS) we performed sev-eral sensitivity tests in order to recommend the best set of parameters for our four chosen stations, which are considered to be representative of most ICOS stations (disregarding those located in suburban environments).
The application of the automatic methods on contaminated time series at the Pic du Midi observatory showed the ability of SD and REBS to detect real spikes on the CH 4 time series caused by the sewage treatment facility of the observatory.Nevertheless, significant differences regarding the rejection percentage were noticed between the methods.Both methods have a tendency to unduly keep a certain fraction of the spike base (lowest concentrations in spikes).REBS is worse than SD in this respect.In the REBS method, when the percentage of spikes is high, the baseline determination is biased toward high concentrations, leading to underestimate spike anomalies above this baseline.However, the SD method correctly detects most of the contaminated data.The comparison between SD, REBS and the manual flagging methods showed good agreement with an overall percentage of 70 % of successful spike data detection for SD and 60 % for REBS, at two stations (FKL and PDM) where local contaminations are well identified by the local staff.These two automatic algorithms detect short-term spikes, allowing for a more consistent and automatic filtering of the time series even if they identify less contaminated data than by manually flagging.The estimation of the impact of the spike detection on data used for atmospheric inversions showed a relatively weak impact of the filtered data for background sites and a more significant effect for stations located closer to local sources.However, even if the implementation of an automatic algorithm can successfully identify short-term spikes due to local contaminations, it is important to note that the priority in the selection of a background site should be to avoid as much as possible the occurrence of such spikes.In the case where the spikes can not be totally avoided, it is then important to try to understand their cause and look for possible actions to minimize them.The modification of the air inlet at the Pic du Midi, described in this study, is a very good example of what can be done once the origin of spikes is understood.
The SD method is found to be efficient and reliable for the purpose of spike detection.It has been proposed for operational implementation in the ICOS Atmospheric Thematic Centre Quality Control (ATC-QC) software to perform daily spike detection of the near-real-time dataset of continuous ICOS stations.The first step will be to run the SD method in a test mode over all ICOS stations and compare with manual detection when available in order to set optimal values of parameters.This analysis can be complemented with wind speed and direction data in order to possibly attribute spikes to fixed local sources.

Figure 1 .
Figure 1.Comparison between two sets of α parameter for SD method.Red represents detected spikes for α = 1; orange data are the detected spikes for α = 3.The blue area shows the data between the first and the third quartile (q1= 0.25, and q2= 0.75).

Figure 2 .
Figure 2. Comparison between two sets of ß parameter for REBS method.Red represents detected data for β = 3; orange is the detected data for β = 8, applied on FKL measurement 6 of November 2014.

Figure 3 .
Figure 3. AN-1 CH 4 measurement at T55 building in panels (a) and (b) and AN-2 TDF building in panels (c) and (d).Black data points are the retained measurements; red points represent the flagged using SD method in panels (a) and (c) and REBS method in panels (b) and (d).

Figure 4 .
Figure 4. AN-1 CO 2 measurement at T55 building in panels (a) and (b) and AN-2 TDF building in panels (c) and (d).Black data points are the retained measurements; red points represent the flagged using SD method in panels (a) and (c) and REBS method in panels (b) and (d).

Figure 5 .
Figure 5. Plots of CH 4 measurements of AN-1 against AN-2.All data are in black, and the green points represent the retained data using SD method in panel (a) and REBS method in panel (b).

Figure 6 .
Figure 6.Plots of CO 2 measurements of AN-1 against AN-2.All data are in black, and the green points represent the retained data using SD method for (a) and REBS method for (b).

Figure 7 .
Figure 7. Number of flagged CO measurements using manual method (blue), SD method (red) and REBS method (green) for Finokalia (a) and Pic Du Midi (b).

Figure 8 .
Figure 8. Example of a spike detection using manual (a), SD (b) and REBS (c) methods during a known biomass burning event at Finokalia.

Table 2 .
Sensitivity of SD method spike detection for two sets of α (α = 1 and α = 3) and for two ranges of background data interval (σ b and σ t scenario) for the four stations and all species.

Table 3 .
Sensitivity of REBS spike detection method for two sets of β (β = 3 and β = 8) for the four stations and all species for the year 2015.
times smaller for CO compared to CO 2 and CH 4 .As recommended in Brantley et al. (2014) and Drewnick et al. (

Table 4 .
Percentage (rounded to one decimal) and number of contaminated data detected by SD, REBS and COV method overall stations (AMS, FKL, OPE and PDM) and for the three species CO, CO 2 and CH 4 .

Table 5 .
Percentages and number of contaminated data detected by SD and REBS methods for CO 2 and CH 4 at PDM.