Interlaboratory comparison of delta C-13 and delta D measurements of atmospheric CH4 for combined use of data sets from different laboratories

We report results from a worldwide interlaboratory comparison of samples among laboratories that measure (or measured) stable carbon and hydrogen isotope ratios of atmospheric CH 4 (δ 13 C-CH 4 and δD-CH 4 ). The offsets among the laboratories are larger than the measurement reproducibility of individual laboratories. To disentangle plausible measurement offsets, we evaluated and critically assessed a large number of intercomparison results, some of which have been documented previously in the literature. The results indicate significant offsets of δ 13 C-CH 4 and δD-CH 4 measurements among data sets reported from different laboratories; the differences among laboratories at modern atmospheric CH 4 level spread over ranges of 0.5 ‰ for δ 13 C-CH 4 and 13 ‰ for δD-CH 4 . The intercomparison results summarized in this study may be of help in future attempts to harmonize δ 13 C-CH 4 and δD-CH 4 data sets from different laboratories.

Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
Methane (CH 4 ) is an important anthropogenic and natural greenhouse gas, and it also plays a large role in atmospheric chemistry through its reaction with the hydroxyl radical. Since individual CH 4 source types have characteristic isotope signatures and loss processes are associated with specific kinetic isotope effects, carbon and hydrogen isotope ratios of CH 4 (δ 13 C-CH 4 and δD-CH 4 ) have been useful in constraining the global CH 4 budget. Dictated by global mass balance, the average isotopic composition of CH 4 in the atmosphere (δ 13 C-CH 4 or δD-CH 4 ) equals the flux-weighted isotopic composition of the sources, corrected for the total kinetic isotope effects of removal processes (e.g. Stevens and Rust, 1982; Cicerone and Oremland, 1988; Quay et al., 1991, 1999; Miller et al., 2002; Turner et al., 2017; Rigby et al., 2017). It has been pointed out that assignment of representative isotopic signatures of various CH 4 sources remains uncertain due to their large spatial and temporal variability across the globe (e.g. Sherwood et al., 2017), which could result in large uncertainties of isotope-based estimates of the global CH 4 budget (Schwietzke et al., 2016). Nonetheless, the value of isotope measurements was amply demonstrated by recent studies which suggested shifts in the global CH 4 source over the last decades (Schaefer et al., 2016; Rice et al., 2016; Nisbet et al., 2016; Schwietzke et al., 2016); without isotopic analyses such conclusions would have been difficult to achieve. The isotopic ratios are commonly reported using the delta notation:

$$\delta = \frac{R_{\mathrm{sample}}}{R_{\mathrm{standard}}} - 1,$$

where R represents the atomic ratio of the less abundant isotope over the most abundant isotope in the sample or the standard. Conventionally, measured values are reported relative to the international isotope-scale VPDB (Vienna Pee Dee Belemnite) for δ 13 C-CH 4 and VSMOW (Vienna Standard Mean Ocean Water) for δD-CH 4 in per mil.
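As a minimal sketch of the delta notation described above, the following Python functions convert between an isotope ratio and its delta value in per mil. The standard ratio used in the usage lines is a rounded, hypothetical number chosen only for illustration; it is not the accepted VPDB or VSMOW ratio.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Delta value in per mil: (R_sample / R_standard - 1) * 1000."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta: float, r_standard: float) -> float:
    """Inverse of delta_permil: isotope ratio for a delta value in per mil."""
    return r_standard * (1.0 + delta / 1000.0)


# Hypothetical standard ratio, for illustration only (assumption).
R_STANDARD = 0.0112

# A sample at about -47 per mil, typical of atmospheric CH4 carbon isotopes.
r_sample = ratio_from_delta(-47.0, R_STANDARD)
print(delta_permil(r_sample, R_STANDARD))
```

Because delta is a relative quantity, any error in the realized value of the standard ratio propagates directly into every reported delta value, which is why the calibration chain matters for interlaboratory consistency.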
Given that the atmospheric lifetime of CH 4 is about a decade, its variation in background air is relatively small. For that reason, its mole fraction and isotopic measurements require high precision and accuracy. Researchers have achieved measurement reproducibility of < 0.1 ‰ for δ 13 C-CH 4 and < 2 ‰ for δD-CH 4 . Incorporating δ 13 C-CH 4 and δD-CH 4 data sets in chemistry transport models is useful for quantitatively separating different CH 4 source categories, and attempts have been made to reduce uncertainties in the global CH 4 budget (e.g. Hein et al., 1997; Mikaloff Fletcher et al., 2004a, b; Monteil et al., 2011; Kirschke et al., 2013; Ghosh et al., 2015; Rice et al., 2016; Schaefer et al., 2016; Schwietzke et al., 2016; Röckmann et al., 2016; Turner et al., 2017; Rigby et al., 2017). However, although an increasing number of δ 13 C-CH 4 and δD-CH 4 data have been reported over the last decades, significant measurement offsets among laboratories have been found for both δ 13 C-CH 4 (e.g. Levin et al., 2012) and δD-CH 4 . It is clear that both traceability to the standard scales and interlaboratory comparisons (intercomparisons) are indispensable for combined use of δ 13 C-CH 4 and δD-CH 4 data from different laboratories. Many such intercomparisons have already been made, either on an ad hoc basis or on a more organized scale. However, a systematic evaluation of the underlying calibrations and related measurement offsets among laboratories has been lacking. It is also noted that some measurement programmes for δ 13 C-CH 4 and/or δD-CH 4 have been discontinued, and maintaining access to such data sets, including well-established interlaboratory offsets, is important.
Here we combine and evaluate the existing comparison results to quantify interlaboratory measurement differences in order to facilitate the use of δ 13 C-CH 4 and δD-CH 4 data. This study therefore opens the possibility for merging historic CH 4 isotope data reported from multiple laboratories (i.e. synthesis analysis of the existing data sets) for a better understanding of the global CH 4 budget.
We first present a technical overview of atmospheric δ 13 C-CH 4 and δD-CH 4 measurements and potential causes of measurement offsets among currently available data sets (Sect. 2), and then we summarize measurement methods by the laboratories that have conducted δ 13 C-CH 4 and δD-CH 4 measurements for air and ice core samples (Sect. 3). In Sect. 4, we report new intercomparison exercises between some groups. We then link the intercomparison results through a survey of previously published intercomparisons and provide the current best estimates of measurement offsets among data sets from different laboratories (Sect. 5). Finally, we summarize the current status and briefly discuss possible causes of the measurement offsets as well as remaining issues that should be kept in mind when combining currently existing data sets of isotopic composition of CH 4 (Sect. 6).

Table 1. List of laboratories that conduct measurements of δ 13 C-CH 4 and δD-CH 4 . For each laboratory, measurement systems and relevant information that could have contributed to the interlaboratory measurement offsets are summarized. Brackets in the RM column indicate the laboratory from which the original standard scale was propagated. See Fig. 1 for overview of the past intercomparison exercises, Fig. 2 for intercomparison summary and the list of participating institution/project acronyms in the text for the laboratory names.
a C1: Allison et al. (1995), C2: Santrock et al. (1985), C3: Craig (1957), C4: Assonov and Brenninkmeijer (2003).
b Raw ion current correction: the Kr interference was corrected by subtracting the Kr-caused anomalies in the raw ion current data. DI offset: the Kr interference was corrected by an offset relative to a DI-IRMS measurement. PCS: Kr was separated by a post-combustion separation column or cryogenically. See Sect. 2.5.
c R1: Lowe et al. (1991), R2: Lowe et al. (1994), R3: Quay et al. (1999)

In the 1990s, atmospheric δ 13 C-CH 4 (δD-CH 4 ) was analysed using an offline technique in which CH 4 was separated from the sample air and converted to CO 2 (H 2 ) for subsequent offline δ 13 C-CH 4 (δD-CH 4 ) analyses by dual-inlet isotope ratio mass spectrometry (DI-IRMS; e.g. Stevens and Rust, 1982; Lowe et al., 1991; Quay et al., 1991, 1999; Sugawara et al., 1996; Poß, 2003). The original methodology was based on the combustion of CH 4 in sample air, with interfering compounds such as CO 2 , H 2 O, N 2 O, CO and non-methane hydrocarbons being removed cryogenically, chemically or by gas chromatography before CH 4 combustion. The number of measurements was limited not only because of laborious and time-consuming laboratory procedures but also because large volumes of air sample were required (> 100 L STP for δD-CH 4 ). Later, a method based on a continuous-flow gas chromatography isotope ratio mass spectrometry (GC-IRMS) technique combined with combustion and pyrolysis furnaces became available (Merritt et al., 1995; Burgoyne and Hayes, 1998; Hilkert et al., 1999), which dramatically reduced time and effort in the laboratory and likewise the amount of sample air required (now typically 100 mL STP). Such systems are now used in most laboratories worldwide to acquire δ 13 C-CH 4 and δD-CH 4 data in the current and past atmosphere (Rice et al., 2001; Miller et al., 2002; Sowers et al., 2005; Ferretti et al., 2005; Morimoto et al., 2006; Fisher et al., 2006; Umezawa et al., 2009; Brass and Röckmann, 2010; Sperlich et al., 2013; Schmitt et al., 2014; Bock et al., 2014; Brand et al., 2016; Röckmann et al., 2016). Although these systems use a similar measurement principle, they vary among laboratories in the use of pre-concentration of CH 4 in sample air, GC separation and combustion/pyrolysis, data corrections and the specific IRMS instrument (see Schmitt et al., 2013, Sect. 3 and Table 1).
Besides analysis by mass spectrometry, laser-based spectroscopy has also been developed for atmospheric δ 13 C-CH 4 and δD-CH 4 measurements (Bergamaschi et al., 2000;Eyer et al., 2016), but detailed discussion on the technique is beyond the scope of this study.

Standard scales
VPDB and VSMOW are the standard scales for δ 13 C-CH 4 and δD-CH 4 , respectively. To make measurements traceable to these standard scales, each laboratory needs to calibrate its laboratory reference gases against reference materials (RMs) with known values on the standard scales. In this study, the term "calibration" means to measure a laboratory gas (for instance a laboratory working standard gas that is routinely compared with samples) against a standard at a higher hierarchy level and to assign to that working standard a δ 13 C-CH 4 or δD-CH 4 value traceable to the standard scale. In principle, all measurements at individual laboratories intend to ultimately anchor their working standards and sample gases to the VPDB or VSMOW scale using the RMs provided by the International Atomic Energy Agency (IAEA) or National Institute of Standards and Technology (NIST; Coplen et al., 2006; Brand et al., 2014). However, since RMs and recommended calibration methods for measurements of δ 13 C-CH 4 and δD-CH 4 in air have not yet been provided (Sperlich et al., 2012), individual groups have developed their own calibration strategies. Since the δ 13 C-CH 4 measurement by IRMS is made by δ 13 C analysis of CO 2 produced by oxidizing CH 4 in air, some laboratories use pure CO 2 gases as a working standard. In many laboratories, these internal CO 2 standard gases were calibrated by DI-IRMS against pure CO 2 produced from NBS-19, the primary anchor of the VPDB scale, or from other RMs (Table 1). Since the typical atmospheric δ 13 C-CH 4 value (about −47 ‰) differs considerably from the δ 13 C value of NBS-19 (+1.95 ‰), some laboratories have used other RMs with VPDB values close to atmospheric δ 13 C-CH 4 such as LSVEC (lithium carbonate reference material prepared by Harry J. Svec), IAEA-CO-9 and RM 8563 as a second anchoring point of the VPDB scale (see Table 1).
This minimizes the risk of significant errors in realization of the standard scale (due to scale contraction or 17 O correction, described in the following sections). A standard scale established this way at an individual laboratory was often propagated to laboratory-internal CO 2 standard gases at lower hierarchy levels, and they were used as the reference in DI-IRMS or GC-IRMS measurement of CO 2 processed from CH 4 in sample air. Ideally, this accurately links δ 13 C-CH 4 of the sample to the international isotope scale. In contrast, it has been recommended that a measured value of a sample is determined against a reference gas that undergoes all the preparation steps in the sample measurement line in order to cancel out possible isotopic fractionations due to different treatment between the sample and reference gases (principle of identical treatment; Werner and Brand, 2001). This concept has been taken into account in some laboratories; a working standard is calibrated for δ 13 C-CH 4 and sample measurements are referenced by comparison with measurements of that working standard processed in the same manner (e.g. Brand et al., 2016). Despite intentions of best traceability to RMs, the variety of calibrations has resulted in diverse realizations of the VPDB scale across δ 13 C-CH 4 measurement programmes. As in Table 1, the different RMs that have been applied to δ 13 C-CH 4 calibration include NBS-19 (limestone), IAEA-CO-9 (barium carbonate), LSVEC (lithium carbonate) and RM 8562-8564 (CO 2 ); see Coplen et al. (2006), Brand et al. (2014) and Sperlich et al. (2016). It is also noted that uncertainties of assigned values for these RMs range up to a few tenths of a per mil and the assigned values have been revised over time, which might have complicated the realization of the standard scale at each laboratory.
Furthermore, most of these RMs are in different chemical forms, and different isotopic fractionations may have occurred during acid digestion to CO 2 , which could have biased calibrations at each laboratory. Lastly, the WMO (2016) has reported exhaustion of NBS-19 and instability of LSVEC, both of which are critical RMs for the VPDB scale. Associated possible revision of δ 13 C values of RMs in the future will affect the consistency of the data sets from different laboratories.
For δD-CH 4 , in the conventional offline measurements, CH 4 in sample air needs to be processed to H 2 O followed by reduction to H 2 for a subsequent DI-IRMS measurement. GC-IRMS requires pyrolysis of CH 4 to H 2 . Therefore, individual laboratories have prepared internal standards of H 2 O (liquid) or H 2 (gas), which were calibrated against primary RMs (water) or H 2 reference gases certified for δD (Table 1). Although the situation is less complicated compared to δ 13 C-CH 4 in terms of variety in chemical properties of RMs, the lack of RMs for δD-CH 4 forced laboratories to develop their calibration methods and realize the standard scale individually. It is also noted that, similarly to δ 13 C-CH 4 , the principle of identical treatment has not been followed strictly at all laboratories. If it is not followed, sample measurements are subject to subtle changes in conditions of all the preparation steps (e.g. conversion of CH 4 ), while such changes do not affect the measured value of a reference gas injected directly into the IRMS.

Scale contraction
It has been found that cross contamination between sample and reference CO 2 gases shrinks the δ 13 C distance measured on DI-IRMS (Meijer et al., 2000;Verkouteren et al., 2003a, b). This effect is known as the scale contraction or η effect, and the magnitude is specific to the IRMS instrument and its settings. Since the VPDB scale for δ 13 C-CH 4 has been realized and propagated via CO 2 calibrations by DI-IRMS at individual laboratories, the instrument-dependent scale contraction effect could have caused a significant difference in measurement values, especially at the low δ 13 C values of atmospheric CH 4 of about −47 ‰ (Wendeberg et al., 2013).
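The scale contraction described above can be illustrated with a deliberately simple mixing model. This is a sketch under stated assumptions, not the formulation used in the cited studies: a hypothetical fraction `f` of each gas is assumed to leak into the other, and isotope ratios are assumed to mix linearly, which is only an approximation.

```python
def measured_delta(delta_true: float, f: float) -> float:
    """Delta (per mil) observed when a fraction f of sample and reference gas
    cross-contaminate each other (hypothetical linear mixing model)."""
    r_ref = 1.0  # ratios are expressed relative to the reference gas
    r_sam = 1.0 + delta_true / 1000.0
    r_sam_mix = (1.0 - f) * r_sam + f * r_ref  # sample polluted by reference
    r_ref_mix = (1.0 - f) * r_ref + f * r_sam  # reference polluted by sample
    return (r_sam_mix / r_ref_mix - 1.0) * 1000.0


# The contraction grows with the distance between sample and reference,
# so it matters more near -47 per mil than near 0 per mil.
for d in (-5.0, -47.0):
    print(d, measured_delta(d, f=0.002))
```

In this model the measured delta is always smaller in magnitude than the true one, and the absolute error scales with the delta distance, which is consistent with the concern raised above for atmospheric CH 4 values far from the NBS-19 anchor.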

17 O correction
For measurement of δ 13 C-CH 4 by IRMS, CH 4 is first oxidized to CO 2 and the different isotopic variants of the produced CO 2 are then registered on Faraday cups with mass-to-charge ratios m/z of 44, 45 and 46. Since the raw ion beam intensity for m/z = 45 is the sum of 13 C 16 O 2 and 12 C 17 O 16 O, the final δ 13 C value is obtained by correcting for the contribution of the 17 O-containing isotopologue, known as 17 O correction (e.g. Assonov and Brenninkmeijer, 2003). Several algorithms such as Craig (1957) and Santrock et al. (1985) have been suggested (see Assonov and Brenninkmeijer, 2003 and references therein) and implemented into software/programmes of the IRMS companies and individual laboratories. Assonov and Brenninkmeijer (2003) showed that the bias caused by different 17 O-correction algorithms could exceed the general repeatability achieved by IRMS measurements. The 17 O-correction method of each laboratory is listed in Table 1.
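To make the 17 O correction concrete, the sketch below uses the commonly written relations 45R = 13R + 2·17R and 46R = 2·18R + 2·13R·17R + 17R², together with an assumed mass-dependent link 17R = K·18R^λ. The reference ratios and the two exponent values are rounded, illustrative assumptions and do not reproduce the parameter set of any specific published algorithm; the point is only that inverting the same ion-current ratios with a different exponent yields a slightly different δ 13 C.

```python
# Rounded, illustrative reference ratios (assumptions, not a published set).
R13_REF, R17_REF, R18_REF = 0.0112, 0.00038, 0.00209


def r17_from_r18(r18, lam):
    """Assumed link 17R = K * 18R**lam, with K fixed by the reference ratios."""
    k = R17_REF / R18_REF ** lam
    return k * r18 ** lam


def forward(r13, r18, lam):
    """Ratios 45R and 46R of CO2 for given 13R and 18R."""
    r17 = r17_from_r18(r18, lam)
    return r13 + 2.0 * r17, 2.0 * r18 + 2.0 * r13 * r17 + r17 ** 2


def invert_r13(r45, r46, lam):
    """Recover 13R from 45R and 46R by bisection on 18R."""
    def residual(r18):
        r17 = r17_from_r18(r18, lam)
        return 2.0 * r18 + 2.0 * (r45 - 2.0 * r17) * r17 + r17 ** 2 - r46

    lo, hi = 0.0, r46  # residual is negative at lo and positive at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return r45 - 2.0 * r17_from_r18(0.5 * (lo + hi), lam)


def apparent_delta13(delta13, delta18, lam_true, lam_used):
    """delta13C (per mil) obtained when data generated with lam_true
    are corrected with lam_used."""
    r13 = R13_REF * (1.0 + delta13 / 1000.0)
    r18 = R18_REF * (1.0 + delta18 / 1000.0)
    r45, r46 = forward(r13, r18, lam_true)
    return (invert_r13(r45, r46, lam_used) / R13_REF - 1.0) * 1000.0


print(apparent_delta13(-47.0, -20.0, 0.528, 0.528))  # consistent correction
print(apparent_delta13(-47.0, -20.0, 0.528, 0.5))    # mismatched exponent
```

The size of the mismatch depends on how far the oxygen isotopic composition of the CO 2 is from that of the reference, so the bias is not a single constant that can be removed after the fact without knowing each laboratory's settings.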

Krypton interference in GC-IRMS
The transition from DI-IRMS to GC-IRMS analyses reduced the analytical effort, but also introduced complications that were initially not recognized and taken into account. It was recently found that atmospheric krypton (Kr) interferes with the δ 13 C-CH 4 GC-IRMS analysis if Kr is present in the ion source during the data acquisition of the CO 2 peak generated from CH 4 oxidation (hereafter CH 4 -derived CO 2 peak). Thus the δ 13 C-CH 4 measurements on a GC-IRMS system can be biased if Kr is not sufficiently separated either from CH 4 or from the CH 4 -derived CO 2 peak after the CH 4 combustion. Schmitt et al. (2013) demonstrated that the doubly charged krypton isotope 86 Kr 2+ , produced in the ion source of an IRMS, can cause lateral tailing extending into the Faraday cups used for δ 13 C analysis (i.e. m/z of 44, 45 and 46), which compromises the measured signal of the CH 4 -derived CO 2 peak. This effect had not been recognized for more than a decade since the early years of GC-IRMS measurements (Merritt et al., 1995) and thus has not been taken into account in many data sets of atmospheric δ 13 C-CH 4 reported in the meantime (e.g. Miller et al., 2002; Morimoto et al., 2006; Fisher et al., 2011; Röckmann et al., 2011; Umezawa et al., 2012a, b). Furthermore, because the Kr effect is system dependent and variable with time, applying plausible corrections to past data may not be feasible. Likewise, several gas species including Kr can affect δD-CH 4 measurements, and this effect is also system dependent.
Several solutions have been suggested to eliminate or account for the Kr interference. Among them, three methods have been implemented at different laboratories (Table 1). Briefly, (1) after the CH 4 oxidation to CO 2 , Kr is separated from the CH 4 -derived CO 2 by using a post-combustion separation column (PCS) or cryogenically; (2) an offset due to the Kr interference is estimated by comparison with a DI-IRMS measurement (DI offset); and (3) the Kr interference peak is subtracted from the raw ion current time series of the IRMS acquisition (raw ion current correction). A more detailed description has been presented elsewhere.

Measurements of participating laboratories
In this section, we briefly document measurement systems of individual laboratories for ease of reference in the following intercomparisons (Sects. 4 and 5). For details, we refer to more dedicated publications listed in Table 1. The table also visualizes differences among laboratories in terms of possible causes of the measurement offsets described in Sect. 2.

NIWA
The National Institute for Water and Atmospheric Research (NIWA, originally INS (Institute of Nuclear Sciences) and later INGS (Institute of Nuclear and Geological Sciences) until 1992) initiated systematic measurements of atmospheric δ 13 C-CH 4 by means of offline CH 4 separation and conversion followed by a DI-IRMS measurement in 1988 (Lowe et al., 1988, 1991). A suite of CO 2 working gases with δ 13 C values around −47 ‰ referenced to IAEA materials were utilized to calibrate the measurements. The overall reproducibility of the δ 13 C-CH 4 measurement was evaluated to be 0.02 ‰ (Lowe et al., 1991). The δ 13 C-CH 4 measurements at NIWA are ultimately calibrated against CO 2 produced from NBS-19, IAEA-CO-9 and LSVEC. The long-term δ 13 C-CH 4 records have been presented since then (Lowe et al., 1994, 1997, 2004; Bergamaschi et al., 2001; Schaefer et al., 2016). It has been reported that repeated measurements of the two working reference gases and archived air indicated no detectable drift over 16 years since 1992. NIWA has also operated a GC-IRMS system since 2004 (Ferretti et al., 2005) with reproducibility of 0.1 ‰. The Kr interference on the GC-IRMS δ 13 C-CH 4 measurement has been identified and is corrected by an offset relative to the conventional DI-IRMS measurement (see Sect. 4.1).

IMAU
The GC-IRMS system at the Institute for Marine and Atmospheric research Utrecht (IMAU) has been described by Brass and Röckmann (2010). The measurement reproducibility is estimated to be 0.07 ‰ for δ 13 C-CH 4 and 2.3 ‰ for δD-CH 4 . Sample air is measured against reference air that is processed in the GC-IRMS system in the same manner as a sample. The IMAU δ 13 C-CH 4 standard scale is based on a set of assigned values for 13 firn air samples measured at Max Planck Institute for Chemistry (MPIC; Bräunlich et al., 2001) and they are ultimately referenced to a CO 2 gas produced from NBS-19 (Röckmann, 1998; Bergamaschi et al., 2000). The δD-CH 4 standard scale is based on a set of reference gases originally produced at the MPIC (see Sect. 2.3). These calibration details have also been documented by Sperlich et al. (2016). The IMAU system was originally affected by Kr but was later modified to remove this interference. A correction was applied for data obtained before the system modification.

MPIC
The MPIC has reported δ 13 C-CH 4 and δD-CH 4 measurements at a baseline station (Bergamaschi et al., 2000) and for firn air samples (Bräunlich et al., 2001) based on an offline DI-IRMS measurement for δ 13 C-CH 4 (Bergamaschi et al., 2000) and a tunable diode laser-based absorption spectrometer (TDLAS) for δD-CH 4 (Bergamaschi et al., 1994). Some firn air measurements by Bräunlich et al. (2001) were performed by using a GC-IRMS system at the Laboratory of Glaciology and Geophysics of the Environment. As described in Sect. 3.2, the δ 13 C-CH 4 and δD-CH 4 standard scales of IMAU are based on those of MPIC. For the δ 13 C-CH 4 DI-IRMS measurement, the CH 4 -derived CO 2 was measured against a working standard (pure CO 2 ) that was calibrated against NBS-19 on a DI-IRMS system (Röckmann, 1998; Bergamaschi et al., 2000). The MPIC δD-CH 4 scale is based on measurements of standard gases at the Bundesanstalt für Geowissenschaften und Rohstoffe, Hannover, Germany. CH 4 was combusted to CO 2 and H 2 O, followed by reduction of H 2 O to H 2 for subsequent DI-IRMS analysis on H 2 . They were calibrated against VSMOW and SLAP (Bergamaschi et al., 2000). The measurements of atmospheric δ 13 C-CH 4 and δD-CH 4 at the MPIC were discontinued.

MPI-BGC
The Max Planck Institute for Biogeochemistry (MPI-BGC) set up a GC-IRMS system for δ 13 C-CH 4 and δD-CH 4 measurements, and it has been operated for air samples collected at baseline stations. The long-term (3 years) reproducibility was assessed to be 0.12 ‰ for δ 13 C-CH 4 and 1.0 ‰ for δD-CH 4 . Initially, the GC-IRMS measurements had been anchored to a working standard air calibrated by IMAU. The Kr effect was eliminated by a PCS column, and the initial calibration has in the meantime been replaced by a new primary calibration, where measurements are ultimately anchored to NBS-19 and LSVEC for δ 13 C-CH 4 and VSMOW-2 and SLAP-2 for δD-CH 4 . This calibration, termed JRAS-M16, is the basis for the δ 13 C-CH 4 and δD-CH 4 values from MPI-BGC reported in this paper.

UCI
The University of California Irvine (UCI) measured atmospheric δ 13 C-CH 4 by offline DI-IRMS and δD-CH 4 by GC-IRMS (Tyler et al., 1999, 2007; Kai et al., 2011). The UCI GC-IRMS system for both δ 13 C-CH 4 and δD-CH 4 has been described in detail by Rice et al. (2001). The measurement reproducibility of the GC-IRMS system was estimated to be 0.05 ‰ for δ 13 C-CH 4 and 1.5 ‰ for δD-CH 4 , while that of the offline DI-IRMS δ 13 C-CH 4 measurement was 0.05 ‰. Samples were measured against laboratory working standard gases of pure CO 2 for δ 13 C-CH 4 and pure H 2 for δD-CH 4 . The δ 13 C-CH 4 calibration is based on a CO 2 reference gas provided by NIWA, which was compared with CO 2 produced from NBS-19 and IAEA-CO-9 (Lowe et al., 1999). The δD-CH 4 calibration is referenced to three H 2 gas cylinders purchased from Oztech Gas Company (Rice et al., 2001). The possible Kr interference on the GC-IRMS system is unclear (the laboratory is now closed), but it appears that the Kr effect had been avoided using liquid nitrogen cooling of the GC column as surmised by Schmitt et al. (2013).

TU
The GC-IRMS system at Tohoku University (TU) has been described by Umezawa et al. (2009). The measurement reproducibility is estimated to be 0.08 ‰ for δ 13 C-CH 4 and 2.2 ‰ for δD-CH 4 . Sample measurements are made against pure CO 2 and H 2 working standard gases for δ 13 C-CH 4 and δD-CH 4 , respectively. The δ 13 C-CH 4 calibration is based on a CO 2 primary gas produced from NBS-19. The H 2 working standard for the δD-CH 4 measurement is referenced to water laboratory standards that are calibrated against VSMOW and SLAP. Measured δD-CH 4 values are corrected so that the value of a laboratory test gas is kept constant over time to take into account fluctuations in the measured value due to the condition of the pyrolysis furnace (Umezawa et al., 2009, 2012a). The Kr interference in the δ 13 C-CH 4 measurement was identified, but modification or correction has not been implemented. It has been documented that the δ 13 C-CH 4 measurement at TU shifted by +0.27 ‰ after July 2008 (the cause of this sudden shift has yet to be identified) and measurements afterwards were corrected for this value to keep the data consistent (Umezawa et al., 2012a, b). Note that TU recently made a rigorous re-evaluation of the long-term measurements of their working standard gas, and the TU δ 13 C-CH 4 data sets will be revised accordingly. Therefore, the comparison numbers presented here are not comparable to those in earlier publications (Umezawa et al., 2009, 2011, 2012a).

NIPR
The National Institute of Polar Research (NIPR) reported δ 13 C-CH 4 measurements at an Arctic site using a GC-IRMS system (Morimoto et al., 2006, 2017). The measurement reproducibility was evaluated to be 0.06 ‰. The δ 13 C-CH 4 calibration follows the same procedure as TU. By injecting different quantities of Kr, it was confirmed that ambient Kr does not significantly interfere with the δ 13 C-CH 4 measurements at NIPR.

UHEI
The University of Heidelberg (UHEI) carried out δ 13 C-CH 4 measurements by DI-IRMS (Levin et al., 1999, 2012). The typical measurement reproducibility was evaluated to be 0.05 ‰ (Levin et al., 1999). The UHEI δ 13 C-CH 4 measurements are calibrated against CO 2 reference materials (RM 8562, RM 8563 and RM 8564). Although reported previously only for signatures of source CH 4 (Levin et al., 1993), UHEI also took offline δD-CH 4 measurements of atmospheric samples by DI-IRMS and TDLAS (Poß, 2003). The δD-CH 4 measurements by DI-IRMS were taken for pure H 2 (H 2 O from CH 4 oxidation converted to H 2 with zinc as catalyst) and were calibrated against VSMOW and SLAP. Note that UHEI recently re-evaluated all their atmospheric δ 13 C-CH 4 and δD-CH 4 measurements rigorously, based on the history of laboratory standards used; therefore, comparison numbers published in earlier works are not comparable to the revised values presented here.

INSTAAR
The Institute of Arctic and Alpine Research (INSTAAR) of the University of Colorado, Boulder has measured δ 13 C-CH 4 and, intermittently, δD-CH 4 using a GC-IRMS system for flask air samples from the cooperative sampling network of the National Oceanic and Atmospheric Administration (NOAA; Miller et al., 2002). Reproducibilities of the δ 13 C-CH 4 and δD-CH 4 measurements are evaluated to be 0.08 ‰ and 2 ‰, respectively (Miller et al., 2002; White et al., 2016). The INSTAAR δ 13 C-CH 4 measurement currently follows the UCI calibration, while the δD-CH 4 measurement is not explicitly anchored to the VSMOW scale. The Kr interference in the δ 13 C-CH 4 measurement is significant, and a PCS column was therefore implemented into the system in May 2017. Correction of the data for the Kr interference (1998-present) is under evaluation. Of the data presented here, only the ice core intercomparison round robin (Sect. 3.4) and the INSTAAR-MPI-BGC comparison (Sect. 3.5) are free of Kr interference.

UB
The University of Bern (UB) makes δ 13 C-CH 4 measurements from ice cores using a GC-IRMS system with an overall reproducibility of 0.15 ‰ (Bock et al., 2017). The UB measurements are referenced to a whole-air working standard with a CH 4 mole fraction of 1508.2 ppb and an assigned δ 13 C-CH 4 value of −47.34 ± 0.02 ‰ (named "Boulder, CA08289" in Schmitt et al., 2014). This value is anchored to the standard scale used at INSTAAR (Sect. 3.10). UB also measures δD-CH 4 for ice core samples (Bock et al., 2010, 2017). The overall measurement precision for ice core samples (including extraction of air from an ice sample) was evaluated to be 2.3 ‰. The UB δD-CH 4 measurement is referenced by using an ambient air cylinder (named "Air Controlé") with a δD-CH 4 value of −93.6 ± 2.8 ‰, which was cross-referenced to a high-pressure cylinder filled at the Alert Station ("Alert 2002/11" with δD-CH 4 of −82.2 ± 1.0 ‰) analysed on the scale maintained at UHEI (Bock et al., 2010). However, this value has to be corrected to −85.2 ± 1.0 ‰ to account for the recent re-evaluation at UHEI (Sect. 3.9). All UB data published after 2011 are free of Kr interference.

AWI
The Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research (AWI) reported δ 13 C-CH 4 measurements from ice cores using a GC-IRMS system (Fischer et al., 2008; Möller et al., 2013). The measurement reproducibility was estimated to be 0.2 ‰. The δ 13 C-CH 4 measurements employed the UHEI standard scale via comparison of measurements of an air sample collected at Neumayer Station, Antarctica.

CIC
The Centre for Ice and Climate (CIC) of the Niels Bohr Institute has reported δ 13 C-CH 4 measurements from ice cores (Sperlich et al., 2015) using a GC-IRMS system with measurement reproducibility of 0.09 ‰ (Sperlich et al., 2013). CIC also set up an offline combustion system for samples with a large amount of CH 4 , which is combined with DI-IRMS for δ 13 C-CH 4 and with either a high temperature conversion/elemental analyser (TC/EA) coupled to IRMS or laser spectroscopy for δD-CH 4 (Sperlich et al., 2012); the measurement reproducibility is 0.04 ‰ for δ 13 C-CH 4 and 0.7 ‰ for δD-CH 4 . The CIC measurements are referenced to RM 8563 for δ 13 C-CH 4 and VSMOW-2 and SLAP-2 for δD-CH 4 . The combined uncertainty of this analytical system, including the uncertainty of the entire traceability chain, was estimated at 0.07 ‰ for δ 13 C-CH 4 and 0.7 ‰ for δD-CH 4 .

Intercomparison between UCI and IMAU
An intercomparison between UCI and IMAU was made by analysing six air samples at both laboratories; the air samples were collected along a flight track of commercial aircraft in the upper troposphere in the early phase of the CARIBIC (Civil Aircraft for the Regular Investigation of the atmosphere Based on an Instrument Container) project (Brenninkmeijer et al., 1999). The original samples were collected into large stainless steel cylinders (21 L) and aliquots of them were transferred into smaller stainless steel canisters (∼ 2.3 L) for storage after delivery to the MPIC laboratory. Different subsamples from identical original samples were sent to UCI and IMAU for analysis, and they were measured at UCI in 2008 and at IMAU in 2012 to 2013. The measurement results at both laboratories are summarized in Table 2. The result indicated significant differences of +0.42 ± 0.04 ‰ for δ 13 C-CH 4 (UCI value is higher than at IMAU) and of −10.7 ± 0.7 ‰ for δD-CH 4 (UCI value is lower than IMAU).
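An offset of the kind quoted above can be computed from paired measurements of the same samples. The sketch below is illustrative only: the input values are hypothetical and are not the Table 2 data, and the uncertainty is computed as the standard error of the mean difference, which is an assumption because the text does not state how the ± values were derived.

```python
from math import sqrt
from statistics import mean, stdev


def paired_offset(lab_a, lab_b):
    """Mean difference (lab_a minus lab_b) and its standard error
    for samples measured at both laboratories."""
    if len(lab_a) != len(lab_b) or len(lab_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(lab_a, lab_b)]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))


# Hypothetical delta-13C values (per mil) for six shared samples.
lab_a = [-47.05, -47.10, -47.02, -47.08, -47.12, -47.04]
lab_b = [-47.48, -47.50, -47.45, -47.52, -47.51, -47.47]
offset, err = paired_offset(lab_a, lab_b)
print(f"offset = {offset:+.2f} +/- {err:.2f} per mil")
```

A positive result means the first laboratory reports higher values than the second, matching the sign convention used in the text.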

Intercomparison between TU/NIPR and IMAU
Table 2. Result of intercomparison of δ 13 C-CH 4 and δD-CH 4 measurements between UCI and IMAU.

An intercomparison between TU/NIPR and IMAU was carried out during 2013-2015. The TU laboratory prepared four stainless steel canisters (∼ 1 L) filled with dried ambient air (canisters MD1 and MD2) and CH 4 -in-synthetic air gas (canisters MD3 and MD4) with CH 4 mole fractions ranging from 899 to 2117 ppb on the TU CH 4 scale (Aoki et al., 1992; Table 3). The canisters were analysed at TU and then sent to IMAU, after which they were sent back to TU and reanalysed to confirm the stability of the air samples in the canisters during the intercomparison exercise. The measurements at TU before and after the transport to IMAU from April 2013 to July 2015 indicated that possible drifts during canister storage and transportation are small (< 0.1 ‰ for δ 13 C-CH 4 and < 3.5 ‰ for δD-CH 4 ). NIPR also measured the canisters for δ 13 C-CH 4 . The results indicate significant differences of +0.50 ± 0.07 ‰ for δ 13 C-CH 4 (TU value is higher than IMAU) and of −13.9 ± 0.9 ‰ for δD-CH 4 (TU value is lower than IMAU; Table 3). The measurements of the four canisters at NIPR were +0.48 ± 0.11 ‰ higher than IMAU. However, the differences of δ 13 C-CH 4 measurements are smaller for the ambient air samples (MD1 and MD2) than the CH 4 -in-synthetic air samples (MD3 and MD4). It is also noted that the δ 13 C-CH 4 difference between the laboratories is largest for the low CH 4 mole fraction (∼ 900 ppb) sample (MD3). The cause is unclear, but might be related to (1) deviation in δ 13 C-CH 4 of the latter samples from the typical atmospheric value, i.e. scale contraction effect; (2) difference in air matrix, i.e. natural versus synthetic air; and (3) difference in linearity with respect to CH 4 mole fraction.
This result therefore indicates that the measurement offset is not constant for a wide range of δ 13 C-CH 4 values and CH 4 mole fractions or for differences in the air matrix. Since we focus in this study on comparison of atmospheric samples, the intercomparison results for the ambient air samples are considered as interlaboratory measurement offsets. The average differences for ambient air are +0.40 ± 0.04 for TU and +0.31 ± 0.03 ‰ for NIPR relative to IMAU. Likewise, the δD-CH 4 offset of TU versus IMAU is considered to be −13.1 ± 0.6 ‰.

Intercomparison between UHEI and MPI-BGC
An intercomparison between UHEI and MPI-BGC was conducted in 2013 on six archived air samples from Neumayer station, Antarctica. These samples, collected in the time period from 1988 to 2008, had been analysed by UHEI for δ 13 C-CH 4 and δD-CH 4 by DI-IRMS (two samples were analysed for δD-CH 4 additionally by TDLAS) during 2003-2010 and were stored in high-pressure cylinders. The typical reproducibility for the measurements is between 0.02 and 0.05 ‰ for δ 13 C-CH 4 and between 1.6 and 2.6 ‰ for δD-CH 4 . In 2013, duplicate aliquots were sampled in 1 L glass flasks and analysed at MPI-BGC. The measurement results at both laboratories are summarized in Table 4. The results show insignificant measurement offsets of +0.02 ± 0.05 for δ 13 C-CH 4 and of +0.4 ± 0.6 ‰ for δD-CH 4 (with the MPI-BGC values being more negative than those from UHEI in both cases).

Round robin comparison of ice core analysis laboratories
A round robin cylinder exercise was initiated to facilitate intercomparison of laboratories that measure δ 13 C-CH 4 and δD-CH 4 in ice core and firn air samples. Part of this exercise has been presented previously (Table 2 in Schmitt et al., 2013). Three high-pressure Al cylinders were filled with varying trace gas compositions to mimic present day, pre-industrial and last-glacial air mole fractions. The CH 4 mole fractions of these cylinders were 1830.6 (CA 03560), 904.0 (CC 71560) and 372.2 ppb (CA 01179) on the NOAA-2004 CH 4 scale, respectively. The cylinders were shipped to the laboratories listed in Table 5 for analysis of all constituents that each lab was capable of measuring at that time. In Table 5, we list the δ 13 C-CH 4 and δD-CH 4 results from each laboratory. The Kr interfering artefact associated with GC-IRMS δ 13 C-CH 4 analyses was taken into account in many of the analyses. In some cases, aliquots from the tanks were measured using offline combustion to CO 2 followed by δ 13 C-CH 4 analyses via conventional DI-IRMS. The cylinders were remeasured at PSU at the end of the round robin to verify that the isotopic composition had not shifted over the 9 years during the transportation of the cylinders. The difference between the 2007 and 2016 δ 13 C-CH 4 measured at PSU was less than 0.14 ‰ for two of the three cylinders, indicating that the isotopic composition of the cylinder air was stable throughout the intercomparison exercise. The third cylinder (CA 01179) was 0.58 ‰ off from the original measurement, which is just outside the analytical uncertainty associated with PSU measurements. There may have been a slight drift over the 9 years between measurements, although the cause has yet to be resolved. The results of the δ 13 C-CH 4 intercomparison showed agreement with the average standard deviation among all six participating laboratories better than 0.37 ‰ for the cylinders with high (CA 03560) and middle (CC 71560) mole fractions.
δD-CH 4 results show more scatter due to the difficult nature of the measurements and the offset among the standard scales.

Intercomparison between INSTAAR and MPI-BGC
STAAR values being more positive than those from MPI-BGC). The measurements for the cylinder with low δ 13 C-CH 4 values were 0.60 ‰ off between both laboratories presumably due to the scale contraction effect. It is noted that the INSTAAR measurements without the Kr removal yielded a higher δ 13 C-CH 4 value (+0.44 ± 0.02 ‰ relative to the MPI-BGC measurement) for one cylinder (LOUI-001), which presumably reflects the Kr interference pronounced at a lower CH 4 mole fraction.

Intercomparison based on co-located samples through the NOAA cooperative sampling network
The Cooperative Flask Sampling Network, operated by the NOAA Global Monitoring Division, collects air samples from numerous sites around the world, and INSTAAR has analysed those air samples for δ 13 C-CH 4 since 1998. There are several sites at which air samples have been concurrently collected by other laboratories. RHUL has analysed air samples at Alert (ALT), Canada and Ascension Island (ASC), and NIWA has done so at Baring Head (BHD), New Zealand. Although the individual laboratories do not measure the same sample air in these cases, these co-located air samples provide an opportunity for assessment of possible measurement offsets as examined previously (Levin et al., 2012).
(1) For the RHUL-INSTAAR difference, the δ 13 C-CH 4 data at ALT during 2009-2014 and at ASC during 2010-2015 were compared to each other if both air samples were collected within a 10 h interval. The ALT and ASC comparisons indicated that the INSTAAR measurement is +0.05 ± 0.16 (N = 350) and 0.00 ± 0.17 ‰ (N = 80) higher than RHUL, respectively. Note that, for this comparison, the RHUL GC-IRMS data were corrected by −0.20 ‰; the offset value was estimated from measurements of flasks filled from two different cylinders (CH 4 in air, both at ambient mole fraction level, one at ambient δ 13 C-CH 4 and the other at about −56 ‰ by spiking 13 C-depleted CH 4 ).
(2) For the NIWA-INSTAAR comparison, the δ 13 C-CH 4 data at BHD during 2009-2014 from both laboratories were compared if both air samples were collected within a 15 h interval. The result indicates that the INSTAAR measurement is +0.08 ± 0.11 ‰ (N = 45) higher than NIWA.
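The time-window matching used for the co-located samples can be sketched as follows. This is a minimal illustration, not the procedure actually used by the laboratories: the function name `colocated_differences`, the nearest-in-time pairing rule and all sample values are assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import mean

def colocated_differences(lab_a, lab_b, max_gap_hours):
    """Pair each lab_a sample with the nearest-in-time lab_b sample and
    return the delta differences (lab_a minus lab_b) for pairs collected
    within max_gap_hours of each other."""
    max_gap = timedelta(hours=max_gap_hours)
    diffs = []
    for time_a, delta_a in lab_a:
        # Nearest lab_b sample in time (hypothetical pairing rule).
        time_b, delta_b = min(lab_b, key=lambda s: abs(s[0] - time_a))
        if abs(time_b - time_a) <= max_gap:
            diffs.append(delta_a - delta_b)
    return diffs

# Invented (sampling time, delta13C in per mil) records, for illustration only.
lab_a = [
    (datetime(2010, 1, 5, 10), -47.10),
    (datetime(2010, 1, 12, 9), -47.05),
    (datetime(2010, 1, 19, 11), -47.20),
]
lab_b = [
    (datetime(2010, 1, 5, 14), -47.20),
    (datetime(2010, 1, 12, 15), -47.10),
    (datetime(2010, 1, 21, 11), -47.30),
]

diffs = colocated_differences(lab_a, lab_b, max_gap_hours=10)
print(f"N = {len(diffs)}, mean difference = {mean(diffs):+.3f} per mil")
```

In this invented example the third pair is 48 h apart and is therefore excluded by the 10 h window, leaving two matched pairs.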

Measurement offsets among laboratories
Here we revisit intercomparisons published previously. Some laboratories employed a standard scale from another laboratory. Such intercomparisons and interlaboratory scale propagations reported in the literature are displayed in Fig. 1. In this section we review the previous and present intercomparison measurements and accordingly suggest plausible measurement offsets among different laboratories (Fig. 2). Relevant information is summarized in Table 1 and the subsections below correspond to those in Sect. 3. Since some laboratories focus on δ 13 C-CH 4 and δD-CH 4 measurements from ice core and firn air samples to elucidate changes of atmospheric CH 4 in the past, Fig. 2 also combines δ 13 C-CH 4 and δD-CH 4 data both for the modern and past atmosphere. It is, however, noted that Fig. 2 suggests the measurement offsets at the modern CH 4 mole fraction and isotopic ratios and that such values could be different for the past atmosphere (see Sects. 4.2, 4.4 and 4.5).
In this study, we report δ 13 C-CH 4 offsets with respect to the conventional DI-IRMS measurement at NIWA (Lowe et al., 1991) because NIWA's δ 13 C-CH 4 measurements have been compared with those from the largest number of laboratories to date (Table 1 and Fig. 1). In contrast, intercomparisons of δD-CH 4 measurements among different laboratories have been limited. We report δD-CH 4 offsets of different laboratories with respect to the IMAU measurement. The uncertainties presented in this study are generally standard errors of the mean, but numbers in the literature are cited as is. It should therefore be noted that the uncertainties, in particular those calculated by error propagation, are not rigorously consistent in all places in the paper.
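For reference, an offset quoted as a mean with its standard error of the mean can be computed as in the sketch below. The helper name `mean_and_sem` and the paired differences are invented for illustration and do not correspond to any table in this paper.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_sem(values):
    """Return the mean and the standard error of the mean (SD / sqrt(N))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    return mean(values), stdev(values) / sqrt(n)

# Invented paired differences (laboratory minus reference laboratory, per mil).
differences = [0.10, 0.14, 0.12, 0.16, 0.08]
offset, sem = mean_and_sem(differences)
print(f"offset = {offset:+.3f} +/- {sem:.3f} per mil (N = {len(differences)})")
```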

δ 13 C-CH 4
As listed in Table 1, the DI-IRMS measurement at NIWA has been repeatedly intercompared with other laboratories. Importantly for this comparison, Bromley et al. (2012) reported the long-term stability of the measurement over the years 1992-2007, and it is likewise confirmed until 2011. The NIWA GC-IRMS system, based on the methodology of Miller et al. (2002), has an offset relative to the DI-IRMS of −0.19 ± 0.26 ‰. Measurements on the GC-IRMS informing this instrument comparison are subject to the Kr interference. A Kr-correction has since been derived in an empirical equation from the round robin intercomparison results (Schmitt et al., 2013 and Sect. 4.4), accounting for differences in the CH 4 mole fraction and an exponential fit to the GC-IRMS versus DI-IRMS results. The GC-IRMS system is currently equipped with a PCS column to eliminate the Kr interference.

δ 13 C-CH 4
According to Schmitt et al. (2013), the IMAU measurement at the present CH 4 mole fraction level is in agreement with NIWA with an offset value of −0.04 ± 0.07 ‰ (no. 2 in Fig. 2a). This corresponds to the round robin comparison for the cylinder with a CH 4 mole fraction of 1830.6 ppb (CA 03560) in Table 5 (Sect. 4.4). The difference is −0.03 ± 0.05 ‰ for data analysed before the modification to remove the Kr interference (see Table 2 in . The intercomparison in this study (Sect. 4.4) also shows that the IMAU offset is −0.08 ± 0.11 ‰ for the cylinder with the CH 4 mole fraction of 904.0 ppb (CC 71560).

δD-CH 4
As listed in Table 1, IMAU has made the most intercomparisons with other laboratories so far. It is noted that the standard scale at IMAU was propagated from the MPIC (Bergamaschi et al., 2000; Sect. 2.2) and that it recently showed reasonable agreement with the calibration at MPI-BGC.

δ 13 C-CH 4
As written in Sect. 3.3, the standard scale at the MPIC was transferred to IMAU (Brass and Röckmann, 2010;Sperlich et al., 2016). Since no direct comparison with NIWA is available, the MPIC offset relative to NIWA is estimated to be −0.04 ± 0.07 ‰, identical to the IMAU offset (no. 3 in Fig. 2a).

δ 13 C-CH 4
Intercomparison exercises of UCI with external laboratories have been made several times. The oldest intercomparison (Lowe et al., 1991) reported good agreement (< 0.02 ‰) between the former UCI laboratory (S. Tyler at NCAR) and NIWA (INS, IGNS at that time). Among the later measurements, there were two direct intercomparisons with NIWA.
(1) Tyler et al. (2007) reported an intercomparison result of UCI to be −0.01 ± 0.09 ‰ with respect to NIWA (top in no. 5, Fig. 2a). For this comparison, 16 air samples collected at Niwot Ridge, Colorado or Baring Head, New Zealand were exchanged between UCI and NIWA in 1998-1999.
(2) This study (Sect. 4.4 and Table 5) shows that the UCI measurements are +0.14 ± 0.12 (bottom of no. 5 in Fig. 2a) and +0.04 ± 0.08 ‰ higher than NIWA for the cylinders with high (CA 03560) and middle (CC 71560) CH 4 mole fractions, respectively.
(3) In contrast, the intercomparison in this study (Sect. 4.1 and Table 2) combined with the IMAU offset (Sect. 5.2) yields +0.38 ± 0.08 ‰ relative to NIWA (not shown in Fig. 2a), which is inconsistent with the above intercomparison results. The determinate error has yet to be resolved.

TU
5.6.1 δ 13 C-CH 4
The intercomparison in this study (Sect. 4.2) and the IMAU offset (Sect. 5.2) give an offset of the TU measurements relative to NIWA of +0.36 ± 0.08 ‰ (no. 6 in Fig. 2a). Measurements at TU have been regularly compared with those at NIPR and they are in agreement within reproducibility of both systems (Umezawa et al., 2009 and additional measurements since then). This is consistent with the previous intercomparison between NIPR and NIWA (Sect. 5.7) and indicates long-term intra-laboratory consistency of TU and NIPR measurements. It is reasonable that TU shares the offset level with NIPR, because both institutions use the same standard scale. As described in Sect. 2.6, it should be noted that the above offset value is not for the data sets currently available to the research community (Umezawa et al., 2011, 2012a), for which +0.32 ± 0.08 ‰ (not shown in Fig. 2) is recommended. Correction of the data sets from the earlier publications is under evaluation.

NIPR
5.7.1 δ 13 C-CH 4
An intercomparison between NIPR and NIWA was conducted in 2004 (Morimoto et al., 2006). After the recent update of the NIPR standard scale, the NIPR offset is evaluated to be +0.33 ± 0.04 ‰ higher than NIWA (top in no. 7, Fig. 2a). The intercomparison in this study (Sect. 4.2) combined with the IMAU offset (Sect. 5.2) indicates that the NIPR measurement is +0.27 ± 0.08 ‰ with respect to NIWA (bottom in no. 7, Fig. 2a), which is consistent with the above value.
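The indirect offsets quoted for TU and NIPR can be reproduced by chaining the ambient-air differences relative to IMAU with the IMAU offset relative to NIWA. The sketch below assumes that the offsets add and that the uncertainties are independent and combine in quadrature; the text does not state the propagation formula, so this is an assumption, although it reproduces the quoted values. The function name `chain_offsets` is hypothetical.

```python
from math import sqrt

def chain_offsets(*steps):
    """Combine successive (offset, uncertainty) steps: offsets add and
    uncertainties add in quadrature (assumes independent errors)."""
    total = sum(offset for offset, _ in steps)
    uncertainty = sqrt(sum(u ** 2 for _, u in steps))
    return total, uncertainty

# Values quoted in the text (per mil).
IMAU_VS_NIWA = (-0.04, 0.07)  # Schmitt et al. (2013)
TU_VS_IMAU = (0.40, 0.04)     # ambient-air canisters, this study
NIPR_VS_IMAU = (0.31, 0.03)   # ambient-air canisters, this study

for lab, step in (("TU", TU_VS_IMAU), ("NIPR", NIPR_VS_IMAU)):
    offset, unc = chain_offsets(step, IMAU_VS_NIWA)
    print(f"{lab} vs NIWA: {offset:+.2f} +/- {unc:.2f} per mil")
# prints:
# TU vs NIWA: +0.36 +/- 0.08 per mil
# NIPR vs NIWA: +0.27 +/- 0.08 per mil
```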

δD-CH 4
To our knowledge, no intercomparison exercises with UW have been reported.
5.9 UHEI
5.9.1 δ 13 C-CH 4
Levin et al. (2012) estimated the UHEI δ 13 C-CH 4 offset to be −0.169 ± 0.031 ‰ relative to NIWA (top in no. 9, Fig. 2a). The intercomparison between UHEI and MPI-BGC in this study (Sect. 4.3), together with the MPI-BGC offset (Sect. 5.4), also infers the UHEI offset to be −0.05 ± 0.13 ‰ (bottom in no. 9, Fig. 2a), which is consistent with the above value. Earlier measurements of three air samples at both UHEI and NIWA indicated that the UHEI offset is −0.04 ± 0.04 ‰ relative to NIWA (Poß, 2003;. It is also noted that, in an intercomparison presented by Nisbet (2005), the UHEI measurement was −0.07 ± 0.04 ‰ lower than NIWA. As these earlier comparison results were published before the rigorous corrections of the UHEI measurements, these values are not included in Fig. 2a.
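One simple way to express "consistent within the uncertainties" for two offset estimates is to compare their difference with the combined uncertainty. The criterion below (difference no larger than k times the quadrature sum, with k = 1) is an assumption made for illustration; the text does not define the test it applies.

```python
from math import sqrt

def consistent(a, b, k=1.0):
    """Return True if two (offset, uncertainty) estimates differ by no more
    than k times their combined (quadrature) uncertainty."""
    difference = abs(a[0] - b[0])
    combined = sqrt(a[1] ** 2 + b[1] ** 2)
    return difference <= k * combined

# UHEI offsets relative to NIWA quoted in the text (per mil).
uhei_levin_2012 = (-0.169, 0.031)
uhei_via_mpi_bgc = (-0.05, 0.13)

print(consistent(uhei_levin_2012, uhei_via_mpi_bgc))
```

With these two values the difference is 0.119 ‰ and the combined uncertainty is about 0.134 ‰, so the check passes under the assumed criterion.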

INSTAAR
5.10.1 δ 13 C-CH 4
Levin et al. (2012) estimated that the INSTAAR measurements have an offset of +0.132 ± 0.022 ‰ with respect to NIWA (top in no. 10, Fig. 2a). In an intercomparison exercise reported by Nisbet (2005), the INSTAAR measurement was +0.14 ± 0.06 ‰ higher than NIWA (not shown in Fig. 2a), which is consistent with the above value. This study (Sect. 4.4) indicates that the INSTAAR measurement is +0.15 ± 0.05 ‰ higher than NIWA for the cylinder with high CH 4 mole fraction (CA 03560; second top of no. 10 in Fig. 2a). The intercomparison between INSTAAR and MPI-BGC (Sect. 4.5) indicates that, combined with the MPI-BGC offset (Sect. 5.4), the INSTAAR offset is +0.21 ± 0.12 ‰ relative to NIWA (second bottom in no. 10, Fig. 2a). Lastly, the co-located sample intercomparison (Sect. 4.6) indicates the INSTAAR offset to be +0.08 ± 0.11 ‰ (bottom in no. 10, Fig. 2a). It is important to note again that only the round robin intercomparison measurements (Sect. 4.4 and second top of no. 10 in Fig. 2a) and the intercomparison with MPI-BGC (Sect. 4.5) were carried out with a PCS column to remove the Kr interference and that the data set currently available to the public from INSTAAR will be evaluated for future correction.
As described in Sect. 2.10, INSTAAR follows the standard scale of UCI. Tyler et al. (2007) reported that 10 air cylinders filled at Niwot Ridge, Colorado in 2000-2001 were analysed at both laboratories and that the result indicated an offset of INSTAAR of +0.04 ± 0.12 ‰ relative to UCI. The collection of air samples at Niwot Ridge for the UCI-INSTAAR comparison continued until 2003. A revisit to the measurement record showed that the INSTAAR offset relative to UCI had shifted over the years; the average differences are +0.02 ± 0.08 for 2000 (N = 7), +0.12 ± 0.07 ‰ for 2001 (N = 2) and +0.26 ± 0.03 ‰ for 2002 (N = 12). This may suggest excursions of the internal calibration of either laboratory for these years, but the cause has yet to be resolved; this problem will be addressed in a subsequent paper from either group. It is noted that the offsets relative to NIWA for both laboratories inferred from the different intercomparison pathways are consistent with each other within the uncertainties (Fig. 2a).

RHUL
5.11.1 δ 13 C-CH 4
Nisbet (2005) reported that the RHUL DI-IRMS measurements agreed well with NIWA with an offset of 0.00 ± 0.02 ‰ (top in no. 11, Fig. 2a). At the same time, they indicated that the RHUL GC-IRMS measurement has an offset of +0.11 ± 0.13 ‰ with respect to NIWA, and later Nisbet et al. (2016) reported that the GC-IRMS system has an offset of about +0.3 ‰ relative to NIWA (not shown in Fig. 2a). Based on measurements of air in two cylinders exchanged between RHUL and NIWA in 2011 and 2014, RHUL applied an offset correction (−0.20 ‰) to all data (see Sect. 4.6), by which the RHUL offset has now been evaluated to be +0.12 ± 0.03 ‰ (middle of no. 11 in Fig. 2a). The intercomparisons based on the co-located air samples via INSTAAR (Sect. 4.6), combined with the INSTAAR offset (Sect. 5.10), infer that the RHUL offset is +0.10 ± 0.03 ‰ relative to NIWA (bottom in no. 11, Fig. 2a).

PDX
5.12.1 δ 13 C-CH 4
Rice et al. (2016) presented an offset of +0.024 ± 0.088 ‰ of the PDX measurements relative to UW by comparing coinciding measurements of archive air samples at PDX and δ 13 C-CH 4 records from Quay et al. (1999) from stations Mauna Loa, Hawaii and Tutuila, American Samoa (1995-1996). With the UW offset with respect to NIWA (Sect. 5.8), it is indicated that the PDX measurement is +0.08 ± 0.09 ‰ higher than NIWA (no. 12 in Fig. 2a). This offset is consistent with the UCI offset with respect to NIWA within the uncertainties (note that PDX follows the UCI standard scale).

δD-CH 4
Since PDX follows the UCI standard scale (Teama, 2013;Rice et al., 2016), the likely offset is the same as that of UCI (no. 12 in Fig. 2b).

PSU
5.13.1 δ 13 C-CH 4
According to Schmitt et al. (2013), the PSU measurement has an offset of +0.03 ± 0.16 ‰ relative to NIWA after being corrected for the Kr interference. The measurements of the cylinder with a high CH 4 mole fraction (CA 03560) at PSU are +0.03 ± 0.16, +0.27 ± 0.16 and +0.13 ± 0.05 ‰ (no. 13 top, middle and bottom, respectively in Fig. 2a) higher than NIWA for different Kr corrections at different measurement times, these values being consistent with each other within the uncertainties.
5.14.2 δD-CH 4
Sapart et al. (2011) give an intercomparison result between UB and IMAU, indicating a UB offset of 0.0 ± 1.6 ‰ relative to IMAU (top in no. 14, Fig. 2b). This value is consistent with the intercomparisons between UB and IMAU reported by Bock et al. (2010). Later UB modified the measurement set-up, but the measurements of the same air samples before and after all modifications were in good agreement as presented by Bock et al. (2014). The intercomparison in this study (Sect. 4.4) shows that the UB measurement differs insignificantly by −0.8 ± 2.5 ‰ with respect to IMAU for the cylinder with high CH 4 mole fraction (CA 03560; bottom in no. 14, Fig. 2b).

CIC
5.16.1 δ 13 C-CH 4
Sperlich et al. (2012) reported measurements of an air cylinder at CIC, IMAU and UB. The CIC measurement is insignificantly different by +0.01 ± 0.09 ‰ from IMAU, and the CIC offset with respect to NIWA is estimated to be −0.03 ± 0.11 (top in no. 16, Fig. 2a). They have also reported that the CIC measurement is in agreement with UB with a difference of +0.00 ± 0.14 ‰. It is noted that, although the UB offset relative to NIWA is estimated to be significant (Sect. 5.14), the difference is still within uncertainties of the intercomparison exercises. Two pure CH 4 gases prepared by Sperlich et al. (2012) constitute crucial components of the reference gas series developed at MPI-BGC. This has provided a direct intercomparison between CIC and MPI-BGC. The CIC measurement is +0.09 ± 0.14 ‰ higher than MPI-BGC. Combined with the MPI-BGC offset (Sect. 5.4), the CIC offset with respect to NIWA is estimated to be +0.02 ± 0.18 ‰ (bottom in no. 16, Fig. 2a), which is consistent with the aforementioned value.

Summary and discussion
We carried out interlaboratory comparison exercises for atmospheric δ 13 C-CH 4 and δD-CH 4 covering many laboratories around the world. In addition, we reviewed previously published intercomparison results. The results indicated measurement offsets among laboratories, which range from −0.2 to +0.3 ‰ with respect to the NIWA DI-IRMS measurement for δ 13 C-CH 4 and up to −13 ‰ with respect to the IMAU measurement for δD-CH 4 . These offset values are larger than the measurement uncertainties from individual laboratories.
The significant δ 13 C-CH 4 measurement offsets among laboratories are obvious even though all laboratories ultimately refer to the VPDB scale. We have presented potential causes of the measurement offsets in individual laboratories (Sect. 2), with possible further causes being hidden in all preparation and measurement steps of standard materials.
(1) The scale contraction effect for DI-IRMS CO 2 analysis, which is instrument dependent, could be responsible for a considerable part of the observed offsets, given the fact that the atmospheric δ 13 C-CH 4 value (about −47 ‰) differs considerably from the primary anchor of the VPDB scale. (2) Individual laboratories have carried out calibrations against different RMs with different uncertainties of assigned values; such diverse calibration trajectories have also definitely contributed to the interlaboratory measurement offsets. Such RMs have different chemical properties and are processed to CO 2 at individual laboratories, at which different fractionation is possible. (3) Different algorithms for 17 O correction have been used for δ 13 C measurements at different laboratories, which could have caused biases among available data sets. (4) The Kr interference on a GC-IRMS system is in several cases a probable cause of the offsets, and unfortunately, this effect is system dependent and can vary with time and the instrument settings. Lastly, it is important to note that we summarized δ 13 C-CH 4 measurement offsets at the modern atmospheric CH 4 mole fraction level, but the offset may vary with the amount of CH 4 analysed (e.g. lower mole fractions in ice core analyses, see Tables 3, 5 and 6), because of a non-linear response of IRMS (Umezawa et al., 2009) and because the Kr interference is directly dependent on the Kr-to-CH 4 ratio. Furthermore, the intercomparisons presented here focus on modern atmospheric CH 4 of typically −47 ‰ and such comparisons for high and low δ 13 C-CH 4 values (e.g. CH 4 from ice cores or enriched/depleted source signatures) are to date very limited (Tables 3 and 6 in this study).
Concerning δD-CH 4 measurement offsets among laboratories, it is interesting that the listed laboratories can be roughly split into two groups whose δD-CH 4 measurements differ by ∼ 10 ‰. Some laboratories with higher δD-CH 4 values refer to an identical set of standards produced at the MPIC (MPIC and IMAU) or to the UHEI calibration (UHEI and UB), and measurements of these groups have been cross-referenced (see Sects. 2 and 4), showing reasonable agreement. The original calibrations were carried out using an offline CH 4 processing technique (cryogenic separation and conversion of CH 4 to CO 2 and H 2 O followed by H 2 O reduction to H 2 ) with subsequent analysis by DI-IRMS. The other laboratories with higher δD-CH 4 values recently developed their own primary calibrations independently (CIC and MPI-BGC). CIC used an offline CH 4 processing technique combined with DI-IRMS, whereas MPI-BGC adopted TC/EA coupled to continuous-flow IRMS. For the lower δD-CH 4 group, some laboratories carried out calibrations against Oztech H 2 gases (UCI, PDX and PSU) or have other calibration pathways (TU and INSTAAR; see Sect. 2). These laboratories used local H 2 working gas standards for GC-IRMS, which were calibrated with a separate DI-IRMS procedure. As is the case for δ 13 C-CH 4 , possible causes of the observed δD-CH 4 discrepancies could have arisen in all preparation and measurement steps. (1) The classical technique for DI-IRMS involves processing of H 2 O, and the associated steps in experimental lines are prone to surface adhesion and contamination of H 2 O; a considerable memory effect is therefore possible (Bergamaschi et al., 2000).
(2) Similarly to δ 13 C-CH 4 , calibration for δD-CH 4 involves measurements of standards with different chemical properties (H 2 O and H 2 ), and such calibrations at different laboratories could contribute to the offset. (3) Difficulties in maintaining stable pyrolysis conditions for GC-IRMS (Bock et al., 2010) might have affected measurements against local H 2 working standards in the cases where the principle of identical treatment (Werner and Brand, 2001) was not strictly followed. Lastly, it is noted that the non-linearity of the IRMS in δD-CH 4 measurements (Brass and Röckmann, 2010) may also play a role for samples with low mole fractions such as ice core analyses.
The measurement offsets summarized in this study should be thoroughly taken into account when data from different laboratories are combined, and this study will be of help when incorporating merged δ 13 C-CH 4 and δD-CH 4 data sets into a state-of-the-art chemistry transport model. However, it is recommended that data users contact the data providers directly for the latest information whenever possible. The Kr interference is under evaluation at some laboratories and it will possibly involve an update of the data sets that are currently available. More importantly, it is imperative to have common reference gases with transparent and reproducible traceability (for instance, Sperlich et al., 2016) and to carry out a systematic intercomparison programme (flask or cylinder round robin) in the research community to attain the necessary but ambitious high-compatibility goals of 0.02 ‰ for δ 13 C-CH 4 and 1 ‰ for δD-CH 4 (WMO, 2016). Such thorough efforts will facilitate optimized use of δ 13 C-CH 4 and δD-CH 4 data sets in a combined way and maximize the number of isotope data sets (and thus their spatial and temporal coverage) usable for enhancing our understanding of the global CH 4 cycle.
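To put the offsets in perspective against the compatibility goals, the sketch below compares a selection of the δ 13 C-CH 4 offsets quoted in this paper with the 0.02 ‰ goal. The selection of laboratories is illustrative rather than complete, the uncertainties of the offsets are ignored, and the names `GOALS`, `D13C_OFFSETS` and `exceeds_goal` are hypothetical.

```python
# Compatibility goals quoted in the text (WMO, 2016), in per mil.
GOALS = {"d13C": 0.02, "dD": 1.0}

# A selection of delta13C offsets relative to NIWA quoted in the text (per mil).
D13C_OFFSETS = {
    "IMAU": -0.04,
    "UHEI": -0.169,
    "INSTAAR": 0.132,
    "RHUL (DI-IRMS)": 0.00,
    "NIPR": 0.33,
    "TU": 0.36,
}

def exceeds_goal(offsets, goal):
    """Return the labs whose absolute offset is larger than the goal."""
    return sorted(lab for lab, off in offsets.items() if abs(off) > goal)

flagged = exceeds_goal(D13C_OFFSETS, GOALS["d13C"])
spread = max(D13C_OFFSETS.values()) - min(D13C_OFFSETS.values())
print("offsets larger than the goal:", ", ".join(flagged))
print(f"spread among listed labs: {spread:.2f} per mil")
```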
We welcome collaborative works that analyse the multiple data sets from laboratories that participated in this study (see data availability listed in Table 1). Data users can examine the offset numbers (Table 1 and Fig. 2) and adjust the data sets at least for data points with values close to the modern atmosphere in δ 13 C-CH 4 and δD-CH 4 as well as the CH 4 mole fraction. For data with CH 4 mole fractions and isotopic ratios that are far from modern background values (e.g. sample air from ice core and stratosphere and those influenced by sources), more intercomparisons are needed to establish correction factors among data sets.
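A data user might apply an offset along the following lines. This is only a sketch under stated assumptions: the offset is defined as laboratory minus reference (so it is subtracted to move data onto the reference scale), the mole-fraction window that defines "close to the modern atmosphere" is a hypothetical choice, and the records are invented. The function name `harmonize` is not from any existing package.

```python
def harmonize(records, offset, ch4_window_ppb=(1700.0, 2000.0)):
    """Shift delta values onto the reference-lab scale by subtracting the
    lab offset (lab minus reference). Records outside the mole-fraction
    window are returned unchanged and flagged as not adjusted."""
    low, high = ch4_window_ppb  # hypothetical window for "modern" air
    out = []
    for ch4_ppb, delta in records:
        if low <= ch4_ppb <= high:
            out.append((ch4_ppb, round(delta - offset, 3), True))
        else:
            out.append((ch4_ppb, delta, False))
    return out

INSTAAR_VS_NIWA = 0.132  # per mil, Levin et al. (2012) as quoted in the text

# Invented (CH4 mole fraction in ppb, delta13C in per mil) records.
records = [(1850.0, -47.20), (1795.0, -47.05), (900.0, -47.60)]
adjusted = harmonize(records, INSTAAR_VS_NIWA)
for row in adjusted:
    print(row)
```

The low-mole-fraction record is left untouched, reflecting the caveat that offsets established at modern atmospheric levels may not hold for ice core or source-influenced samples.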
Data availability. All the interlaboratory comparison data presented in this study are included in the tables of this paper.