Development and evaluation of a suite of isotope reference gases for methane in air

Measurements from multiple laboratories have to be related to unifying and traceable reference material in order to be comparable. However, such fundamental reference materials are not available for isotope ratios in atmospheric methane, which led to misinterpretations of combined data sets in the past. We developed a method to produce a suite of synthetic CH4-in-air standard gases that can be used to unify methane isotope ratio measurements of laboratories in the atmospheric monitoring community. To this end, we calibrated a suite of pure methane gases of different methanogenic origin against international reference materials that define the VSMOW (Vienna Standard Mean Ocean Water) and VPDB (Vienna Pee Dee Belemnite) isotope scales. The isotope ratios of our pure methane gases range between −320 and +40 ‰ for δ2H-CH4 and between −70 and −40 ‰ for δ13C-CH4, enveloping the isotope ratios of tropospheric methane (about −85 and −47 ‰ for δ2H-CH4 and δ13C-CH4 respectively).


P. Sperlich et al.: Development and evaluation of a suite of isotope reference gases

Introduction
Isotope ratios of CH4 in the present and the past atmosphere (e.g. from ice cores) are a powerful tool to study the biogeochemical processes that cause the variation of CH4 in the atmosphere (Stevens and Rust, 1982; Quay et al., 1991, 1999; Lowe et al., 1994; Sapart et al., 2012; Möller et al., 2013; Sperlich et al., 2015; Schaefer et al., 2016). Recently, two conflicting publications highlighted (i) the interpretative power when data sets from multiple laboratories are combined for spatiotemporal analysis of CH4 isotope ratios (Kai et al., 2011) and (ii) the pitfalls when differences due to laboratory offsets are misinterpreted as spatial variability of CH4 sources (Levin et al., 2012). Levin et al. (2012) identified calibration offsets between three laboratories by comparing their long-term observations in Antarctic background air, where the δ13C of CH4 is assumed to be free of spatial gradients. However, this technique is a temporary work-around that excludes the use of data sets from laboratories without a history of observations in Antarctica or a traceable link to Antarctic observations. This dilemma could be solved if suitable reference materials (RMs) were available to all laboratories that measure isotope ratios of atmospheric CH4.
Certified reference materials (CRMs) are provided by the IAEA, NIST and others for many analytes. The lack of CRMs for CH4 isotope ratios has long been recognised in the literature, ranging from pioneering papers (e.g. Craig, 1953; Schiegl and Vogel, 1970) to recent publications on analytical systems to measure isotope ratios in atmospheric CH4 (e.g. Sapart et al., 2011; Sperlich et al., 2013; Bock et al., 2014; Tokida et al., 2014; Eyer et al., 2015) as well as papers that present and interpret such data (e.g. Levin et al., 2012; Sapart et al., 2013; Schaefer et al., 2016). In the absence of CRMs for isotope ratios of CH4, many laboratories have developed methods to calibrate purified CH4 against CRMs that were available as a "second-best solution", thereby accepting the shortcoming that those CRMs have different physicochemical properties and are therefore not ideal (IAEA, 2003). For example, δ13C-CH4 calibrations were made against NBS 20 (limestone) and NBS 21 (graphite) by Stevens and Rust (1982), against NBS 16 (CO2) and NBS 20 (limestone) by Quay et al. (1991), against IAEA-CO-9 (barium carbonate) by Lowe et al. (1994), against NBS 19 (limestone) by Quay et al. (1999) and against RM 8563 (CO2) by Sperlich et al. (2012). Dumke et al. (1989) calibrated against the natural gas mixtures NGS 1, NGS 2 and NGS 3, which were not of the highest purity level with 81, 53 and 99 % CH4 respectively (e.g. IAEA, 2003; Brand et al., 2014).
It is furthermore important to understand the variation of the uncertainties of the applied CRMs, which range from an assigned value of 0.00 ‰ (NBS 19, the only primary measurement standard for VPDB) up to 0.56 ‰ (NGS 2) (Brand et al., 2014). The situation becomes even more complicated because the δ13C values of some of the applied CRMs were revised and changed by as much as 0.4 ‰ over time (e.g. NBS 21; Brand et al., 2014). As a consequence, this would require the adjustment of dependent δ13C-CH4 data. The use of different calibration methods and CRMs, and the changes of their assigned δ13C values, have undoubtedly contributed to calibration offsets between laboratories. This fact highlights how important it is that the applied CRMs and their δ13C values are reported in the metadata of the measurement results and that their uncertainty is included in the uncertainty budget of the measurements. Fortunately, the situation is more homogeneous for δ2H-CH4 calibrations, which were only made against CRM waters, such as VSMOW2, SLAP2 or their precursors (e.g. Schiegl and Vogel, 1970; Dumke et al., 1989; Quay et al., 1999; Sperlich et al., 2012). Brand et al. (2014) provide a comprehensive overview of the variation of δ2H-H2O values and associated uncertainties. Another common method for laboratories to anchor CH4 measurements to the VPDB or VSMOW isotope scales is to get their working standard (WS) calibrated by an external laboratory (e.g. Behrens et al., 2008; Brass and Röckmann, 2010; Bock et al., 2014; Schmitt et al., 2014; Rella et al., 2015; Brand et al., 2016). It is important to keep in mind that propagating isotope scales between laboratories also requires inclusion and propagation of the uncertainty of the respective isotope scale anchor.
In summary, the absence of unique CRMs for δ2H-CH4 and δ13C-CH4 led to a diversity of calibration trajectories. Significant calibration offsets between laboratories on the order of 0.05-0.09 ‰ for δ13C-CH4 were identified through co-located measurements by Levin et al. (2012) and Schaefer et al. (2016), while Bock et al. (2014) reported laboratory offsets of up to 15 ‰ for δ2H-CH4. Even though inter-laboratory differences can be established experimentally, e.g. by co-located measurements or regular round robins, such comparisons are not intended to re-define local scale anchors to the VPDB and VSMOW isotope scales (WMO, 2014) and can therefore not replace a unifying scale anchor.
Until recently, a comparable problem existed for observations of isotope ratios in atmospheric CO2. Ghosh et al. (2005) established a method to produce synthetic CO2-in-air standards, comprising isotopically calibrated CO2 and CO2-free air. The concept of these CO2-in-air standards is to provide a matrix reference material (m-RM), which is defined as RM that is mixed with matrix material to match the composition of the samples (IAEA, 2003). Since 2005, the ISOLAB of the Max Planck Institute for Biogeochemistry (MPI-BGC) in Jena, Germany, has distributed a suite of m-RMs, known as JRAS (Jena Reference Air Set), which is accepted as an isotope scale anchor by the community (WMO, 2012). Calibrating against the JRAS reduces laboratory offsets and has proven a successful method to reach and maintain the compatibility goal for isotope ratios in atmospheric CO2 (Wendeberg et al., 2013).
This paper describes an analogous method to produce synthetic CH4-in-air standards for δ2H-CH4 and δ13C-CH4, which we refer to as JRAS-M16 (short for JRAS-Methane 2016). We present new methods to calibrate a suite of isotopically different CH4 gases, which span a large isotopic range. We calibrate two CH4 gases for δ2H and δ13C and compare our results to independent calibrations made at a partnering laboratory to demonstrate the comparability of our new methods, thereby fulfilling the requirement to use two independent analytical methods during the development of quality control materials (QCMs) when CRMs are not available (IAEA, 2003). We produce synthetic CH4-in-air standards by diluting aliquots of calibrated CH4 with CH4-free synthetic air and include the full traceability chain in the uncertainty budget. Calibrated δ2H-CH4 and δ13C-CH4 values in our synthetic CH4-in-air standards bracket tropospheric values and enable two-point calibrations to account for scale compression effects (Coplen et al., 2006a). Our synthetic CH4-in-air standards can be tested by other laboratories in the community; alternatively, compressed air cylinders from other laboratories can be calibrated at MPI-BGC. Our long-term strategy is to establish JRAS-M16 as m-RM for δ2H-CH4 and δ13C-CH4 in the future. We hope that our efforts help the community to reach the scale anchor compat-

Materials and methods
Throughout this paper, we use the terms "calibration" and "measurement" with different intentions. We use calibration when samples are repeatedly compared against measurement standards of the highest possible hierarchy level (possible hierarchy levels include CRMs and WSs) in order to determine the isotopic composition of the analyte under consideration of the full traceability chain. In contrast, we use the measurement term when the analysis is not necessarily based on measurement standards of the highest possible hierarchy level, when the achievable uncertainty of the analysis is not of primary importance or when the uncertainty does not necessarily include the full traceability chain. For example, we use the measurement term for the experiments to establish the dependence of isotope ratios in the analyte on reactor temperatures of the analytical system.
The aim of our method is to calibrate and prepare synthetic CH4-in-air standards, as outlined in the flow diagram of Fig. 1. To this end, we calibrate two pure CH4 gases for their δ2H-CH4 and δ13C-CH4 isotope ratios against CRMs and WSs, where the latter are of comparable chemical composition to the former. We refer to these two CH4 gases as primary CH4 gases. The primary CH4 gases are then used to calibrate a suite of pure CH4 gases, which we refer to as secondary CH4 gases. The analytical methods we developed for δ2H-CH4 and δ13C-CH4 calibrations are based on well-established IRMS methods, thereby complying with the requirement to use established analytical systems for the production of QCMs when CRMs are not available (IAEA, 2003). Once calibrated, aliquots of both primary and secondary CH4 gases are diluted with CH4-free air to atmospheric CH4 mole fractions. We analyse the resulting synthetic CH4-in-air standards on a new analytical system that is designed to analyse atmospheric samples, thereby complying with the principle of identical treatment (PIT; Werner and Brand, 2001) during the analysis of the synthetic CH4-in-air standards. This enables us to determine the calibration difference between JRAS-M16 and the hitherto adopted method to reference δ2H-CH4 and δ13C-CH4 in atmospheric samples to the VSMOW and VPDB scales respectively. This difference represents the laboratory-specific correction that has to be applied to anchor all measurements from MPI-BGC to the new JRAS-M16 scale.

Gases, reference materials and hierarchy levels of calibrations
Our study is based on a suite of CH4 gases that differ in their methanogenic origin and therefore in their isotopic composition. We identify our CH4 gases by names as shown in Table 1. "Biogenic" and "Fossil" have been calibrated at the Centre for Ice and Climate (CIC), which is a department of the Niels Bohr Institute at the University of Copenhagen, Denmark (Sperlich et al., 2012). These gases allow testing and evaluating the performance of the analytical systems at MPI-BGC with independent methods, which is a required control mechanism for the development of QCMs when CRMs are not available (IAEA, 2003). Six other CH4 gases were purchased from suppliers of commercial gases or laboratory equipment (Air-Liquide, Westfalen AG, Linde, Messer, Campro Scientific) and were used as purchased or as mixtures thereof. The purity level of all our CH4 gases is ≥ 99.995 %. Our goals were to produce (i) a suite of CH4 gases that encompasses the isotopic composition of tropospheric CH4, and (ii) CH4 gases that closely match the isotopic composition of tropospheric CH4. For δ2H-CH4 this was achieved by spiking fossil CH4 gases with pure CH3D to yield "Martha-1", "Martha-2" and "Mike-1". Mike-1 was then mixed with a fossil CH4 gas to produce "Mike-2" while Martha-1 was spiked with pure CH3D to produce "Martha-2". Martha-1 and Mike-1 were thereby transitional CH4 mixtures.
We calibrated "Megan" and "Merlin" for δ2H-CH4 and δ13C-CH4 as primary CH4 gases (Fig. 1) against CRMs and WSs to the VSMOW and VPDB isotope scales respectively. Applied WSs are identical or similar in chemical composition to available CRMs in most cases (Table 2). All secondary CH4 gases were calibrated against the primary CH4 gases and are therefore of lower hierarchy level in the calibration scheme (Fig. 1). Megan was used as primary CH4 gas for all initial experiments and our first calibrations of secondary CH4 gases, until it was accidentally vented to ambient air in March 2015. In order to compensate for the loss, we calibrated Merlin against CRMs and WSs as a primary CH4 gas to replace Megan.
2.2 Referencing pure CH4 for δ2H against VSMOW/SLAP and against other pure CH4 gases
We use a high-temperature conversion elemental analyser (TC/EA) coupled to an isotope ratio mass spectrometer (IRMS; Delta Plus XL, Thermo Finnigan, Bremen, Germany) via an open split (ConFlo III, Thermo Finnigan, Bremen, Germany). The system at MPI-BGC has been operated for δ2H-H2O and δ18O-H2O analysis with high precision and negligible systematic errors for more than a decade (Gehre et al., 2004; Brand et al., 2009a) and is depicted in Fig. 2. Because TC/EA-IRMS systems are also used for δ2H analysis in hydrocarbons (e.g. Hilkert et al., 1999; Schimmelmann et al., 2016)
It is recommended to measure samples against standards with identical material-specific properties (PIT; Werner and Brand, 2001). Under such conditions, measurement artefacts are likely to cancel when, for example, H2O samples are calibrated against H2O standards. However, great care has to be taken when chemically identical or similar CRMs are not available so that sample and standard comprise materials with different chemical properties, which is the case when calibrating CH4 against H2O. Calibration errors may arise when only one or both materials are fractionated during analysis, where the latter is likely to occur with different fractionation factors.
We performed a range of experiments to test for systematic errors during H2O and CH4 analysis. (i) System memory occurs during the isotopic analysis of H2O due to adhesion of H2O onto internal surfaces. System memory is sufficiently minimised by repeated H2O injections and rejection of the first sample in every sequence. Remaining memory effects are corrected for in the evaluation routine as shown by Gehre et al. (2004). System memory is not created by CH4 injections but some δ2H-CH4 analyses may be affected by desorption of H2O, stemming from previous injections. (ii) We observe a systematic effect of the septum temperature on the resulting δ2H-H2O and operate the system with a septum temperature of 130
The introduction of H2 samples into the ion source of an IRMS leads to the formation of H3+ ions that are registered on the HD+ detector, which is accounted for by the so-called "H3-factor correction" (Friedman, 1953; Sessions et al., 2001). The H3-factor correction is experimentally determined and assumed to be constant until re-determined. Determining the H3-factor correction is part of the daily preparation routine at MPI-BGC and shows only minor variation with time. Theoretically, the H3+ formation could be dynamic during the experimental period with unknown variability. We matched the H2 peak heights resulting from both CH4 and H2O injections at around 5.5 ± 0.5 V in order to minimise the impact of an imperfect H3-factor correction. Peak widths were around 45 and 60 s for H2O- and CH4-derived H2 peaks respectively. A typical chromatogram of the δ2H-CH4 calibration including details on peak shape and background is shown in Fig. 3. The similarity between the CH4-derived and the H2O-derived H2 peaks allows the use of the standard integration software (ISODAT, Thermo Finnigan, Bremen, Germany).
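The idea behind the H3-factor correction can be illustrated with a short Python sketch. It assumes the commonly used form in which the H3+ contribution to the m/z 3 signal scales with the square of the m/z 2 signal and is removed point by point before peak integration. The function name, the equally spaced sampling and the numbers are illustrative assumptions and do not reproduce the ISODAT routine used in this study.

```python
def h3_corrected_ratio(i2, i3, k_h3):
    """Return the m/z 3 to m/z 2 area ratio of one peak after H3+ correction.

    i2, i3 : background-corrected m/z 2 and m/z 3 signals, sampled at equal
             time steps across the peak (hypothetical input format).
    k_h3   : H3 factor, assumed constant over the measurement sequence.
    """
    if len(i2) != len(i3):
        raise ValueError("i2 and i3 must have the same number of samples")
    area2 = sum(i2)
    if area2 == 0:
        raise ValueError("m/z 2 peak area is zero")
    # Assumption: the H3+ contribution at each point is k_h3 * i2**2.
    area3_corrected = sum(b - k_h3 * a * a for a, b in zip(i2, i3))
    return area3_corrected / area2


# Illustrative numbers only (arbitrary units).
ratio = h3_corrected_ratio([1.0, 2.0, 1.0], [0.1, 0.3, 0.1], k_h3=0.01)
```

Because the correction term grows with the square of the signal, sample and standard peaks of equal height and shape receive nearly the same correction, so an error in the H3 factor largely cancels in the resulting δ value. This is the motivation for matching the peak heights as described above.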
Megan and Merlin (Table 1) were calibrated in three independent sequences during 3 days against the in-house working standards "WWW-J1" and "BGP-J1" with a wide δ2H range from −67.0 to −187.1 ‰ (Table 2). WWW-J1 and BGP-J1 are independently calibrated against the international reference waters VSMOW2 and SLAP2 (Table 2). Other CH4 gases were initially also measured against working standards (WWW-J1 and BGP-J1) but were finally calibrated against Megan or Merlin, which were co-analysed in the same measurement sequence in a one-point calibration.

2.3 Referencing pure CH4 for δ13C against LSVEC/MAR-J1 and against other pure CH4 gases
We calibrated δ13C-CH4 in pure CH4 gases after conversion to CO2 using an elemental analyser (EA 1100, CE, Rodano, Italy) coupled to an IRMS (Delta Plus, Thermo Finnigan, Bremen, Germany) via open split (ConFlo III, Thermo Finnigan, Bremen, Germany). This system is routinely used for the analysis of 13C and 15N in samples with solid or liquid matrices (Werner et al., 1999; Brooks et al., 2003). We fitted a 1/16 in. tube of 70/30 % Cu/Ni alloy to the EA and used the previously described 10-port valve to inject the CH4 samples into the EA with a 10 mL min−1 helium flow (Fig. 4).
The plumbing of the system is designed so that gaseous CH4 and solid CRMs/WSs are applied to the same location inside the combustion reactor of the EA. All samples are combusted at a reactor temperature of 1020 °C (Werner et al., 1999) and experience identical analytical treatment thereafter. Following the combustion, each sample passes through a reduction reactor filled with elemental copper, which is kept at 650 °C to remove excess O2 and to reduce NOx if present. The sample is dried by passing through a Nafion™ membrane (Perma Pure LLC, Toms River, NJ, USA; not shown in
Measurement sequences to calibrate primary CH4 gases to the VPDB isotope scale are created by alternating blocks of manual CH4 injections and CRM/WS (Table 2) applications via autosampler. We applied one WS and one CRM (LSVEC) to calibrate the primary CH4 gases in a two-point calibration. While MAR-J1 was used as WS in most experiments, ALI-J1 was used once, during a calibration of Merlin. Megan and Merlin were each calibrated on 3 different days to determine the external reproducibility of the δ13C results. Chromatograms resulting from CH4 and from carbonate analyses using EA-IRMS are displayed in Fig. 5 and show very similar peak shapes for CH4 and carbonates. Typical m/z 44 amplitudes and peak widths were ∼ 7.4 ± 0.2 V and 101 ± 1 s, respectively, for both materials. We connected a primary CH4 and a secondary CH4 gas to the 10-port valve to calibrate the secondary CH4 gases (Table 1) for δ13C in a one-point calibration. All measurement results were corrected for scale compression based on the method suggested in Verkouteren and Klinedinst (2004), using an empirical, mass-spectrometer-specific correction factor of 1.0056.
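A two-point calibration as mentioned above can be sketched in a few lines of Python. The sketch assumes a simple linear mapping between measured and assigned δ values of two standards; the function names and all numbers are hypothetical and are not the values of LSVEC, MAR-J1 or any gas in this study.

```python
def two_point_fit(measured_a, measured_b, assigned_a, assigned_b):
    """Return slope and intercept mapping measured to assigned delta values (per mil)."""
    if measured_a == measured_b:
        raise ValueError("the two standards must differ in their measured values")
    slope = (assigned_b - assigned_a) / (measured_b - measured_a)
    intercept = assigned_a - slope * measured_a
    return slope, intercept


def apply_fit(delta_measured, slope, intercept):
    """Convert a measured delta value to the calibrated scale."""
    return slope * delta_measured + intercept


# Hypothetical standards: measured vs. assigned values in per mil.
slope, intercept = two_point_fit(-46.20, 1.90, -46.60, 2.00)
sample_calibrated = apply_fit(-40.00, slope, intercept)
```

A slope different from 1 absorbs scale compression of the instrument, which is why two standards that bracket the samples are preferable to a one-point calibration when the sample is isotopically far from the standard.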

Measurement uncertainty and error propagation
The fully propagated uncertainty for the primary CH4 gases ($U_{\mathrm{pCH4-tot}}$) is calculated as

$$U_{\mathrm{pCH4-tot}} = \sqrt{u_{\mathrm{CRM}}^{2} + u_{\mathrm{WS}}^{2} + u_{\mathrm{pCH4}}^{2}},$$

where $u_{\mathrm{CRM}}$, $u_{\mathrm{WS}}$ and $u_{\mathrm{pCH4}}$ indicate the uncertainty of the CRM, the applied working standards and the respective primary CH4 gas respectively. Both $u_{\mathrm{WS}}$ and $u_{\mathrm{pCH4}}$ are calculated as the standard error of the mean of all measurements, multiplied by t, Student's factor for a 95 % confidence limit to account for the limited number of measurements.
The uncertainty for the secondary CH4 gases ($U_{\mathrm{sCH4-tot}}$) is then calculated as

$$U_{\mathrm{sCH4-tot}} = \sqrt{U_{\mathrm{pCH4-tot}}^{2} + u_{\mathrm{sCH4}}^{2}},$$

where $u_{\mathrm{sCH4}}$ is the standard error of the mean of all measurements of the respective secondary CH4 gas, multiplied by t, Student's factor for a 95 % confidence limit. Therefore, $U_{\mathrm{pCH4-tot}}$ and $U_{\mathrm{sCH4-tot}}$ indicate the fully propagated uncertainty onto the VPDB or VSMOW isotope scales, representing the traceability chain. Note that the isotopic composition of LSVEC (Table 2) was recently found to show significant variability, most likely due to adhesion of H2O and reaction with air-CO2 (e.g. Qi et al., 2016; Schimmelmann et al., 2016). Until this problem is solved, the IAEA, one of the providers of LSVEC, advises increasing the uncertainty of LSVEC, which was hitherto assigned as 0.00 ‰. We follow the recommendation by S. Assonov (Sergey Assonov, IAEA, personal communication, 2016) and Schimmelmann et al. (2016) and adopt an uncertainty of 0.15 ‰ for the δ13C of LSVEC. Note that the new 0.15 ‰ uncertainty of LSVEC represents the largest single contributor to the total uncertainty budget in our δ13C calibrations. As a consequence, we present the combined uncertainty of the full traceability chain in two versions, the first being the hitherto adopted method using an uncertainty of 0.00 ‰ for LSVEC and the second being the method with an uncertainty of 0.15 ‰ for LSVEC.
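The propagation along the traceability chain can be sketched in Python as follows. The sketch assumes that the individual components are independent and combine as a root sum of squares, and it uses a small hard-coded table of two-sided 95 % Student's t factors. Function names and all numerical inputs are illustrative assumptions, not data from this study.

```python
import math
import statistics

# Two-sided 95 % Student's t factors for 1 to 10 degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def expanded_sem(values):
    """Standard error of the mean multiplied by Student's t (95 %)."""
    n = len(values)
    dof = n - 1
    if dof not in T_95:
        raise ValueError("this sketch supports 2 to 11 measurements only")
    sem = statistics.stdev(values) / math.sqrt(n)
    return T_95[dof] * sem


def combine(*components):
    """Root sum of squares of independent uncertainty components (assumption)."""
    return math.sqrt(sum(u * u for u in components))


# Hypothetical example in per mil: CRM, working standard, primary gas.
u_crm = 0.15
u_ws = expanded_sem([-46.58, -46.62, -46.60, -46.61])
u_p = expanded_sem([-40.02, -39.98, -40.01])
U_p_tot = combine(u_crm, u_ws, u_p)

# A secondary gas inherits the full uncertainty of the primary gas.
u_s = expanded_sem([-55.10, -55.06, -55.08])
U_s_tot = combine(U_p_tot, u_s)
```

With this form, a single dominant component such as a 0.15 ‰ uncertainty of the CRM sets a lower bound for every result further down the chain, regardless of how precise the individual measurements are.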

Producing synthetic CH4-in-air standards from pure CH4 and CH4-free air (JRAS-M16)
The MPI-BGC operates an analytical system (named ARAMIS) to dilute pure CO2 with CO2-free air to atmospheric CO2 mole fractions without isotopic fractionation (Ghosh et al., 2005). We use ARAMIS to dilute an aliquot of primary or secondary CH4 with CH4-free air to atmospheric CH4 mole fractions (∼ 2 ppm) in 5 L glass flasks with a final filling pressure of 1.8 bar absolute. The produced synthetic CH4-in-air standards represent the JRAS-M16 reference gases. The CH4-free matrix air has been target-mixed from ultra-pure constituents and contains N2, O2, N2O and Kr at atmospheric levels, so that the composition of the produced CH4-in-air standards is as close to ambient air as possible. Krypton was added to this matrix air to account for the measurement artefact during GC-IRMS analysis of CH4 for δ13C (Schmitt et al., 2013). A blank analysis of the CH4-free air yielded a maximum CH4 blank of 0.5 ppb. Because such a CH4 blank is too small for accurate isotopic analysis on our atmospheric system (Sect. (Schmitt et al., 2013).
MPI-BGC: a new system to measure δ2H-CH4 and δ13C-CH4 in air samples was recently developed at the MPI-BGC and is described in greater detail in Brand et al. (2016). The system at MPI-BGC is referred to as iSAAC, short for integrated System for Analysis of Atmospheric Constituents. iSAAC consists of a 16-port sample carousel to take two consecutive 100 mL aliquots of air from a glass flask or high-pressure cylinder for parallel analysis of δ2H-CH4 and δ13C-CH4, respectively, by continuous-flow GC-IRMS. The two air samples are routed through two identical but independent pre-concentration lines, one for the analysis of δ2H-CH4 and one for δ13C-CH4. In each line, CH4 is cryogenically separated from the main air constituents in a Hayesep D-filled trap at −130 °C and cryo-focussed in a further Hayesep D-filled trap at −110 °C. Each of the two analytical lines is equipped with its own cooling compressor to avoid the use of cryogenic liquids. The separated and cryo-focussed CH4 sample is released into a GC column from where it is routed either through a pyrolysis furnace (kept at 1400 °C) to convert the CH4 sample to H2 for δ2H-CH4 analysis or through a combustion furnace (kept at 1000 °C) to convert the CH4 sample to CO2 for δ13C-CH4 analysis. A post-combustion GC column separates the CH4-derived CO2 from Kr (Schmitt et al., 2013). CH4-derived H2 and CO2 samples are introduced via open splits into dedicated IRMS instruments, one each for δ2H-CH4 and δ13C-CH4 analysis. iSAAC has been operational since 2012 to measure air samples with a precision of 1.0 and 0.12 ‰ for δ2H-CH4 and δ13C-CH4 respectively. The precision is determined by the performance chart method (Werner and Brand, 2001) as the standard deviation (1σ) of all measurements of the quality control standard, which is analysed once in every measurement sequence (Brand et al., 2016). The reproducibility of δ13C-CH4 analyses is around 0.06 ‰ over the course of 1 day. All measurements on iSAAC so far have been allocated to the VPDB and VSMOW scales using an in-house WS that was calibrated against "Carina-1" (Table 1).
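For the dilution step described earlier in this section (5 L flask, 1.8 bar absolute, about 2 ppm CH4, maximum blank of 0.5 ppb), the following Python sketch estimates the amount of CH4 per flask and the size of a possible isotopic shift caused by the blank. It assumes ideal-gas behaviour, a filling temperature of 293.15 K and a linear isotope mass balance. The temperature and the δ value of the blank are purely hypothetical, because neither is given in the text.

```python
R_GAS = 8.314462618  # molar gas constant, J mol-1 K-1


def moles_of_gas(pressure_pa, volume_m3, temperature_k):
    """Amount of gas from the ideal gas law (assumption: ideal behaviour)."""
    return pressure_pa * volume_m3 / (R_GAS * temperature_k)


def blank_shift(delta_ch4, delta_blank, x_ch4_ppb, x_blank_ppb):
    """Shift of the mixture delta value (per mil) caused by a CH4 blank.

    Uses a linear mass balance in delta notation, which is an approximation.
    """
    mixed = (x_ch4_ppb * delta_ch4 + x_blank_ppb * delta_blank) / (x_ch4_ppb + x_blank_ppb)
    return mixed - delta_ch4


# 5 L flask at 1.8 bar absolute; 293.15 K is an assumed temperature.
n_air = moles_of_gas(1.8e5, 5.0e-3, 293.15)
n_ch4 = 2.0e-6 * n_air  # amount of CH4 at about 2 ppm

# Hypothetical blank composition of -30 per mil against a -47 per mil standard.
shift = blank_shift(-47.0, -30.0, 2000.0, 0.5)
```

Under these assumptions a flask holds less than 1 µmol of CH4, and a 0.5 ppb blank shifts δ13C-CH4 by only a few thousandths of a per mil.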

2.7 Histories to anchor δ2H-CH4 and δ13C-CH4 to the VSMOW and VPDB scales at IMAU and MPI-BGC
It is the intention of all laboratories analysing δ2H-CH4 and δ13C-CH4 to reference their samples relative to the VSMOW and VPDB scales respectively. However, possible accuracy errors in the laboratory-specific scale anchors often result in inter-laboratory offsets. In order to retrace the potential for calibration offsets between IMAU, MPI-BGC and JRAS-M16, we describe the history of the scale anchors for each laboratory.
IMAU: the calibration strategy at IMAU, including traceability chain and long-term control, is different for δ2H-CH4 and δ13C-CH4 (Brass and Röckmann, 2010). (i) Three synthetic gas mixtures with CH4 mole fractions of ∼ 9000 ppm were calibrated for δ2H-CH4 at the Max Planck Institute for Chemistry (MPI-C) in Mainz, Germany, using a tunable diode laser absorption spectrometer (TDLAS) technique. The TDLAS is described by Bergamaschi et al. (1994) with a measurement precision for δ2H-CH4 of 5.1 ‰ and an accuracy estimate of similar magnitude. The accuracy estimate is based on a comparison with the calibrations to the VSMOW scale by Dumke et al. (1989), which marks the origin of the isotope scale anchor for δ2H-CH4 at IMAU. Aliquots of the gases from Bergamaschi et al. (1994) were diluted with synthetic CH4-free air at IMAU to yield reference gases ("Cal1", "Cal2", "Cal3") with the δ2H-CH4 values initially assigned at MPI-C and atmospheric CH4 levels. Improved measurement precision and inter-laboratory comparisons led to a refinement of the δ2H-CH4 values of Cal1, Cal2 and Cal3, with recent values of +21.1, −19.0 and −164.9 ‰ respectively. Cal1, Cal2 and Cal3 represent the primary reference gases for δ2H-CH4 at IMAU and were used to calibrate the δ2H-CH4 in the working standard ("SiL") to the VSMOW scale. While Cal2 and Cal3 have become exhausted, Cal1 is still used in regular checks of the calibration scale, together with a set of firn air samples (see ii) that are used for δ13C calibration. (ii) IMAU's working gas SiL has also been calibrated for δ13C-CH4. This was achieved by co-analysing SiL with a suite of Antarctic firn gas samples, where the δ13C-CH4 of the latter had been determined by two laboratories (MPI-C and the Laboratoire de Géologie et Géophysique de l'Environnement (LGGE), Grenoble, France), using two different techniques (Bräunlich et al., 2001). The δ13C-CH4 scale anchors at LGGE and MPI-C are calibrated at MPI-C against a pure CO2 WS, which itself has been calibrated against NBS 19 (Bergamaschi et al., 2000), which represents the ultimate link to the VPDB scale for the scale anchor at IMAU. Using that method, the suite of firn gas samples was treated as a set of working standards to calibrate SiL to the VPDB scale by propagation from MPI-C and LGGE to IMAU. It is important to note that Brass and Röckmann (2010) highlighted that the firn gas itself is a set of samples and is not to be taken for a set of calibration standards. The calibration strategy was revised during 2013 to account for the Kr interference (Schmitt et al., 2013).
MPI-BGC: all measurements on iSAAC use a natural air WS that was calibrated against Carina-1 at MPI-BGC. Carina-1 and Carina-2 are natural air samples that were calibrated for δ2H-CH4 and δ13C-CH4 at IMAU (Table 2), using the analytical setup described by Brass and Röckmann (2010) and Sapart et al. (2011). While the calibration results of Carina-1 and Carina-2 from IMAU show excellent agreement in CH4 mole fractions (both 1910 ppb) and in δ13C-CH4 (within 0.01 ‰), their δ2H-CH4 values differ by 2.8 ‰ (Table 2). Because both Carina cylinders were filled at the MPI-BGC with Jena air on the same day within a short period of time during stable meteorological conditions, and because their δ13C-CH4 and CH4 mole fractions are in excellent agreement, a true difference in δ2H-CH4 between the two Carina cylinders seems unlikely. The magnitude of the δ2H-CH4 offset was smaller than the former δ2H-CH4 measurement precision at IMAU of ±4 ‰ (Brass and Röckmann, 2010) and was accepted as "agreement within measurement uncertainty" at the time. It is important to note that Carina-1 and Carina-2 were each calibrated on different days and in separate measurement sequences, which does not enable a direct comparison of the two gases. Therefore, a systematic calibration error in one of the two Carina gases is possible. In contrast, the superior measurement precision of iSAAC for δ2H-CH4 of 1.0 ‰ can resolve a true δ2H-CH4 difference of 2.8 ‰. However, Carina-1 and Carina-2 appear indistinguishable in δ2H-CH4 on iSAAC, as determined during several direct comparisons in independent measurement sequences. Therefore, the δ2H-CH4 offset between Carina-1 and Carina-2 must be due to an artefact of the calibration at IMAU. Our experiments at MPI-BGC indicate that the calibration of Carina-1 is indeed flawed. The choice to use Carina-1 as scale anchor for all iSAAC measurements at MPI-BGC was made arbitrarily, before it was known that its calibration was impacted by an artefact. In hindsight, Carina-2 would have been a better choice as VSMOW scale anchor for δ2H-CH4 at MPI-BGC. This calibration offset will be further addressed in a future comparison with IMAU, where a new system has been developed with an improved precision for δ2H-CH4 (Röckmann et al., 2016). All iSAAC measurements are anchored to the VSMOW and VPDB isotope scales based on the described scale propagation from IMAU to MPI-BGC, until JRAS-M16 is established as new m-RM for δ2H-CH4 and δ13C-CH4 in air.

Comparison of the existing isotope scales at MPI-BGC with new, synthetic CH4-in-air standards
The synthetic CH4-in-air standards produced in this study (Sect. 2.5) were analysed at MPI-BGC using iSAAC (Sect. 2.6). In that, the synthetic CH4-in-air standards
2012) presented the data with the measurement reproducibility, calculated as the pooled standard deviation of the measurements. Therefore, their uncertainty does not include the uncertainties of the full traceability chain. Furthermore, a statistical provision that accounts for the small number of measurements has not been made by Sperlich et al. (2012). This imposes a hurdle in the comparison with data from MPI-BGC. Therefore, we revise the uncertainty of the CIC data and calculate the full traceability chain as described in Sect. 2.4. Furthermore, all δ13C measurements from CIC are affected by a small offset of RM 8563 that has been reported by Coplen et al. (2006b) and are therefore shifted by 0.03 ‰ towards more depleted δ13C values. Moreover, the δ13C data presented in Sperlich et al. (2012) have not been corrected for scale compression. We are able to correct all CIC data for this effect, because the scale compression factor of the instrument at CIC was determined (1.0025) at the time the study of Sperlich et al. (2012) was published. Applying the scale compression correction shifts the δ13C-CH4 of Fossil and Biogenic by 0.01 and 0.05 ‰ towards more depleted δ13C values respectively. The revised data and uncertainties from CIC and the results from MPI-BGC for Biogenic and Fossil are shown in Table 4 for δ13C-CH4 and in Table 5 for δ2H-CH4.
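The two corrections applied to the CIC data above can be illustrated with a short Python sketch. It assumes that the scale compression factor stretches the raw δ value of the sample measured against the reference gas, and that this raw value is then converted to the VPDB scale with the exact δ-combination rule. The text does not state how the factor is applied in detail, so this is one plausible reading. Function names and numbers are hypothetical.

```python
def correct_scale_compression(delta_vs_ref, factor):
    """Stretch a raw delta value (per mil, sample vs. reference gas) by an empirical factor."""
    return delta_vs_ref * factor


def to_scale(delta_vs_ref, delta_ref_vs_scale):
    """Combine sample-vs-reference and reference-vs-scale deltas (per mil)."""
    return delta_vs_ref + delta_ref_vs_scale + delta_vs_ref * delta_ref_vs_scale / 1000.0


def shift_for_rm_offset(delta_on_scale, offset=0.03):
    """Shift a value towards more depleted values by a known reference material offset."""
    return delta_on_scale - offset


# Hypothetical sample: -15.00 per mil vs. a reference gas at -40.00 per mil on the scale.
raw = -15.00
uncorrected = to_scale(raw, -40.00)
corrected = to_scale(correct_scale_compression(raw, 1.0025), -40.00)
final = shift_for_rm_offset(corrected)
```

In this reading, the size of the scale compression correction grows with the isotopic distance between sample and reference gas, which is consistent with the larger shift reported for Biogenic than for Fossil.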
We perform two comparisons between CIC and MPI-BGC. (i) The calibration results for Fossil and Biogenic from CIC as published in Sperlich et al. (2012) are compared to the calibrations at MPI-BGC using the methods to calibrate pure CH 4 gases for δ 2 H-CH 4 and δ 13 C-CH 4 as described in Sects. 2.2 and 2.3. (ii) We performed new combustion experiments at CIC using Fossil and Biogenic and analysed the resulting CO 2 for δ 13 C at both CIC and MPI-BGC. These combustion experiments were made in 2012 but after the publication of Sperlich et al. (2012). Therefore, these experiments provide new data to evaluate the method at CIC. Following the δ 13 C analyses at CIC, the remaining CO 2 gases were cryogenically transferred and flame sealed in glass ampules for δ 13 C analysis at MPI-BGC. The δ 13 C analyses at MPI-BGC were made on "Cora", a MAT 252 dual-inlet IRMS (Thermo Finnigan, Bremen, Germany) that is used for δ 13 C and δ 18 O analysis of CO 2 in air or pure CO 2 gases (Brand et al., 2009b). Unfortunately, the comparison based on the new combustion experiments made at CIC could not include δ 2 H-CH 4 because the system was not capable of processing CH 4 samples large enough to provide sufficient amounts of H 2 O.
We use the indices CIC−old for experiments made at CIC and published by Sperlich et al. (2012) and CIC−new for the new combustion experiments at CIC. We use the index MPI−BGC * for the analysis at MPI-BGC of CO 2 samples that were combusted at CIC and MPI−BGC for the calibrations of the two CH 4 gases from CIC using the analytical methods at MPI-BGC presented above (Sects. 2.2 and 2.3).
Quay et al., 1999; Mikaloff Fletcher et al., 2004). The δ 13 C-CH 4 uncertainty in Megan and Merlin increases to 0.16 and 0.15 ‰, respectively, when the suggested uncertainty of 0.15 ‰ for LSVEC is taken into account in the traceability chain (Qi et al., 2016; Schimmelmann et al., 2016). However, we will use the uncertainty budget without the new uncertainty for LSVEC for the evaluation of internal results.

Results for secondary CH 4 gas calibrations against primary CH 4 gases
We made a total of 260 calibration measurements for the secondary CH 4 gases for δ 2 H-CH 4 and δ 13 C-CH 4 .Altogether, the secondary CH 4 gases cover a large range in δ 2 H (−320 to +36 ‰) and δ 13 C (−70 to −39 ‰), where the former was achieved by spiking some of the gases with pure CH 3 D.The results for secondary CH 4 gas calibrations are shown in Table 3, including the uncertainties of the full traceability chain.We found typical uncertainties on the order of 0.8 ‰ for δ 2 H-CH 4 calibrations and on the order of 0.07 and 0.17 ‰ for δ 13 C-CH 4 calibrations, where the latter includes the uncertainty of 0.15 ‰ in LSVEC.
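The relation between the two quoted δ 13 C-CH 4 uncertainties can be illustrated with a short calculation. The sketch below assumes that the LSVEC term is independent of the other components and is added as a root-sum-of-squares; this is our illustrative assumption and not a reproduction of the full procedure in Sect. 2.4. The function name is hypothetical.

```python
import math

def combine_in_quadrature(*components):
    """Root-sum-of-squares of independent standard uncertainties (per mil).

    Assumption for illustration: the components are uncorrelated.
    """
    return math.sqrt(sum(u * u for u in components))

u_without_lsvec = 0.07  # typical d13C-CH4 uncertainty quoted in the text (per mil)
u_lsvec = 0.15          # suggested uncertainty for LSVEC (per mil)

u_with_lsvec = combine_in_quadrature(u_without_lsvec, u_lsvec)
print(f"combined uncertainty: {u_with_lsvec:.2f} per mil")
```

Under this assumption the combined value rounds to the 0.17 ‰ stated above, which shows how strongly the LSVEC term dominates the budget.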

Results from the comparison between CIC and MPI-BGC
Our comparison results for δ 13 C-CH 4 show overall agreement within the uncertainties of the traceability chains (Table 4). The δ 13 C results from the previous and the new com- for Fossil and Biogenic, respectively, which is within the uncertainty of the full traceability chain and furthermore within the system reproducibility as stated in Sperlich et al. (2012). The δ 13 C differences between the results from the new combustion experiments measured at MPI-BGC * and at CIC (δ MPI−BGC * − δ CIC−new ) are 0.10 ‰ for Fossil and 0.11 ‰ for Biogenic and agree well within the combined uncertainty of both methods. Table 4 shows even better agreement for δ MPI−BGC * − δ CIC−old . Altogether, the comparisons highlight the reproducibility of CH 4 combustion experiments at CIC and the comparability of δ 13 C measurements in the combustion-derived CO 2 at both laboratories. When comparing the δ 13 C results from the new calibrations at MPI-BGC to the results based on combustion experiments at CIC, the results from MPI-BGC appear slightly more depleted in δ 13 C for both Fossil and Biogenic in all comparisons (Table 4). We find the smallest δ 13 C differences between δ MPI−BGC and δ CIC−new , amounting to −0.08 and −0.09 ‰ for Fossil and Biogenic respectively. The respective differences increase to −0.18 and −0.20 ‰ between δ MPI−BGC and δ MPI−BGC * . It is important to note that only the difference found in Biogenic between δ MPI−BGC and δ MPI−BGC * is outside of the sum of the uncertainties (Table 4). In contrast, we find excellent agreement in all comparisons when the uncertainty of 0.15 ‰ in LSVEC is taken into account. Table 4 also shows excellent agreement in the determination of the differences between Fossil and Biogenic in all δ 13 C measurements, which is an important quantity for the evaluation of scale compression.
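The δ 13 C differences quoted in this paragraph are linked by simple arithmetic, which the following sketch makes explicit. It only uses the numbers stated in the text; the dictionary and function names are hypothetical.

```python
# Differences quoted in the text, in per mil, keyed by gas name.
mpi_star_minus_cic_new = {"Fossil": 0.10, "Biogenic": 0.11}
mpi_minus_cic_new = {"Fossil": -0.08, "Biogenic": -0.09}
mpi_minus_mpi_star_quoted = {"Fossil": -0.18, "Biogenic": -0.20}

def chained_difference(gas):
    """(MPI-BGC - CIC-new) - (MPI-BGC* - CIC-new) = MPI-BGC - MPI-BGC*."""
    return mpi_minus_cic_new[gas] - mpi_star_minus_cic_new[gas]

for gas in ("Fossil", "Biogenic"):
    derived = round(chained_difference(gas), 2)
    print(gas, "derived:", derived, "quoted:", mpi_minus_mpi_star_quoted[gas])
```

The derived values reproduce the quoted differences between δ MPI−BGC and δ MPI−BGC * , so the three comparisons are internally consistent.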
Comparing the results for δ 2 H-CH 4 between CIC and MPI-BGC shows overall agreement (Table 5). The differences we find in the δ 2 H-CH 4 calibrations between Sperlich et al. (2012) and MPI-BGC (δ MPI−BGC − δ CIC−old ) are −1.8 and −2.4 ‰ for Fossil and Biogenic respectively. The difference for Fossil is just within the sum of the uncertainties of the measurements at CIC and MPI-BGC, whereas the difference for Biogenic is slightly larger than this sum. Note that the isotopic difference Fossil−Biogenic is consistently resolved, with 147.9 ‰ at MPI-BGC and 147.3 ‰ at CIC.
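The δ 2 H-CH 4 comparison can be retraced from the values in Table 5. In the sketch below, the assignment of the first value column to CIC−old and the second to MPI−BGC is inferred from the Fossil−Biogenic differences quoted in the text, and the agreement criterion is the linear sum of the two uncertainties, as worded above. All names are hypothetical.

```python
# (value, uncertainty) in per mil; column assignment inferred from the text.
table5 = {
    "Fossil": {"cic_old": (-170.1, 0.9), "mpi_bgc": (-171.9, 0.9)},
    "Biogenic": {"cic_old": (-317.4, 0.9), "mpi_bgc": (-319.8, 0.8)},
}

def compare(gas):
    """Return (difference MPI-BGC - CIC-old, sum of uncertainties, agreement flag)."""
    cic, u_cic = table5[gas]["cic_old"]
    mpi, u_mpi = table5[gas]["mpi_bgc"]
    diff = round(mpi - cic, 1)      # rounded to table precision
    limit = round(u_cic + u_mpi, 1)
    return diff, limit, abs(diff) <= limit

def fossil_minus_biogenic(lab):
    """Isotopic distance between the two gases as seen by one laboratory."""
    return round(table5["Fossil"][lab][0] - table5["Biogenic"][lab][0], 1)

for gas in table5:
    print(gas, compare(gas))
print("CIC-old:", fossil_minus_biogenic("cic_old"),
      "MPI-BGC:", fossil_minus_biogenic("mpi_bgc"))
```

The near-identical Fossil−Biogenic distances are the quantity used in the discussion to judge scale compression, independently of any constant offset between the laboratories.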

3.4 Results of δ 2 H-CH 4 and δ 13 C-CH 4 measurements in synthetic CH 4 -in-air standards to determine compatibility between the propagated isotope scale from IMAU and JRAS-M16 at MPI-BGC
The isotopic difference (δ iSAAC − δ pure ) is shown in Table 6 and indicates the offset between the scale that was propagated from IMAU to MPI-BGC (Sect. 2.8) and the new synthetic CH 4 -in-air standards (JRAS-M16), assuming no isotope fractionation during the dilution process. Our experiments show excellent agreement for δ 13 C-CH 4 with an average difference of +0.03 ± 0.10 ‰, thus confirming that the propagated scale from IMAU was already very close to the newly determined scale anchor for δ 13 C-CH 4 . For unknown reasons, the δ 13 C-CH 4 measurements of Melly, Fossil and Biogenic show a larger discrepancy between the two methods. Because the discrepancy for Biogenic exceeds the measurement uncertainty by a factor of 3, we have excluded this result from the determination of the laboratory offset.
(Coplen et al., 2006b) and scale compression effects. They are furthermore presented with revised uncertainties to include the full traceability chain (Sect. 2.4). The uncertainties of the full traceability chains with the recently suggested uncertainty in LSVEC of 0.15 ‰ are shown in brackets. The δ 13 C MPI−BGC * measurements used a system that is virtually unaffected by scale compression (Ghosh et al., 2005) and a WS calibration that is based on NBS 19 as the only CRM; therefore, the δ 13 C MPI−BGC * data do not suffer from the uncertainty in LSVEC. The difference Fossil−Biogenic can be used to compare scale compression effects between the respective methods.
, 2003), which we discuss in the following. Quantitative oxidation of CH 4 during δ 13 C-CH 4 analysis requires high reaction temperatures (e.g. Dumke et al., 1989). A major complication during δ 13 C-CH 4 analysis arises when oxidation yields are significantly lower than 100 % (Merritt et al., 1995; Fig. 4 in Sperlich et al., 2012). CH 4 is a potent source of protonation in the IRMS ion source (Anicich, 1993). Introducing unconverted CH 4 together with the CH 4 -derived CO 2 sample into the IRMS results in the formation of CO 2 H + in the ion source, which produces an isobaric interference on the m/z 45 trace, where the δ 13 C signal is measured. This artefact can be prevented when CO 2 and CH 4 are separated after the oxidation, which we achieve with the post-combustion chromatographic column in both the EA-IRMS system (Sect. 2.3) and iSAAC (Sect. 2.6). Note that this effect would cause an accuracy shift towards more enriched δ 13 C-CH 4 values predominantly during primary CH 4 gas calibrations, because CH 4 samples would be affected by CO 2 H + formation in the ion source while the analysis of the used CRMs would not.
We carefully checked the completeness of CH 4 conversion (EA-IRMS and TC/EA-IRMS) by monitoring for residual CH 4 with the IRMS instruments. In the ion source, CH 4 molecules are subject to fragmentation and re-combination processes, resulting in CH 4 -typical mass spectra during mass abundance scans in the IRMS (Brunnée and Voshage, 1964). The strongest CH 4 -specific signal occurs on the m/z 15 trace (CH + 3 ), which makes the m/z 15 signal a good indicator for incomplete CH 4 conversion (Sperlich et al., 2012). The CH + 4 signal at m/z 16 is not suitable for CH 4 quantification due to the interference with the O + signal from CO + 2 fragmentation. We tune the m/z 44 collector of the IRMS to monitor the m/z 15 trace during the analysis of a CH 4 sample and find an amplitude of 0.12 mV. From Sperlich et al. (2012) we estimate that about 40 % of the total CH 4 signal in a mass abundance scan is recorded on m/z 15. The total CH 4 signal in the mass abundance scan would therefore amount to ∼ 0.3 mV, which we can compare to the ∼ 7000 mV on m/z 44 from a typical CH 4 injection into the EA-IRMS (e.g. Fig. 5). This approximation suggests a CH 4 oxidation efficiency of > 99.9 %. An analogous experiment on the TC/EA-IRMS system (Sect. 2.2) shows a conversion efficiency of CH 4 of > 99.9 % as well. Because the ionisation energy of CH 4 is comparable to that of both CO 2 and H 2 , we can ignore this effect in the above determinations. Therefore, we conclude that the CH 4 conversion at MPI-BGC is complete and that we can rule out incomplete conversion as a source of measurement errors.
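The estimate of the oxidation efficiency can be retraced with the numbers given in this paragraph. The sketch below treats the ratio of IRMS signals as a proxy for the ratio of residual CH 4 to CH 4 -derived CO 2 , which presumes comparable ionisation behaviour as argued in the text; the function and variable names are hypothetical.

```python
def residual_ch4_fraction(mz15_mv, mz15_share, mz44_mv):
    """Residual CH4 signal relative to the CO2 signal on m/z 44.

    mz15_mv:    amplitude observed on m/z 15 (mV)
    mz15_share: fraction of the total CH4 signal recorded on m/z 15
    mz44_mv:    amplitude of the CH4-derived CO2 peak on m/z 44 (mV)
    """
    total_ch4_mv = mz15_mv / mz15_share  # scale m/z 15 up to the full CH4 spectrum
    return total_ch4_mv / mz44_mv

residual = residual_ch4_fraction(mz15_mv=0.12, mz15_share=0.40, mz44_mv=7000.0)
efficiency_percent = 100.0 * (1.0 - residual)
print(f"residual CH4 fraction: {residual:.1e}")
print(f"oxidation efficiency: {efficiency_percent:.3f} %")
```

With these inputs the residual fraction is a few parts in 100 000, consistent with the stated efficiency of > 99.9 %.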
It has been demonstrated that the introduction of carbonates into the high-temperature oxidation furnace of the EA-IRMS yields a high CO 2 conversion rate and δ 13 C results of high precision and accuracy (Coplen et al., 2006b).In order to test for the completeness of carbonate digestion, we added tungsten trioxide (WO 3 ) to some of the carbonate samples during weighing (about 1 : 1 by weight).The goal of this experiment is to increase the instantaneous reaction temperature and to provide additional oxygen during the liberation of CO 2 from different carbonates.While the addition of WO 3 had no effect on the analysis of CaCO 3 and Li 2 CO 3 , it improved the peak shape during BaCO 3 analysis (Table 2).
However, it did not impact its δ 13 C. We conclude that the carbonate digestion is not limited by either temperature or oxygen availability and omitted the addition of WO 3 in further reactions. Note that the accurate analysis of carbonates is critical for accurate CH 4 calibrations, even if CH 4 injections themselves are not compromised.

Table 5. Comparison of δ 2 H results between CIC and MPI-BGC. Indices of the header are explained in Sect. 2.9 of the main text. The uncertainty of all data includes the full traceability chain (Sect. 2.4), which includes revised uncertainties of the CIC data (Sect. 2.9). The difference Fossil−Biogenic allows us to compare scale compression effects between both methods.

Gas name           δ 2 H CIC−old (‰)    δ 2 H MPI−BGC (‰)
Fossil             −170.1 ± 0.9         −171.9 ± 0.9
Biogenic           −317.4 ± 0.9         −319.8 ± 0.8
Fossil−Biogenic    147.3                147.9

A considerable advantage of the conversion of carbonates in the high-temperature oxidation furnace of the EA-IRMS over other methods (e.g.acid reaction) is that the oxygen isotopic composition is homogenised for all samples.This balances the 17 O correction, which accounts for the isobaric interference between δ 13 C-CO 2 and δ 17 O-CO 2 on m/z 45.The 17 O correction is statistically dependent on the δ 18 O-CO 2 of each individual sample.Hence, any uncertainty arising from the 17 O correction during the calculation of δ 13 C values from m/z 45 ion currents tends to cancel out.The applied 17 O correction is a function built into the evaluation software of the IRMS.The algorithm and ratio assumptions are based on Assonov and Brenninkmeijer (2001).The same technique had been used to revise the VPDB scale by adding LSVEC as a second scaling point (Coplen et al., 2006b).
The EA-IRMS analysis of carbonates includes a well-characterised blank contribution that is due to the carbon impurities within the tin capsules that are used for carbonate analyses (Werner et al., 1999). In contrast, no such blank is expected when samples are analysed without tin capsules, as would be the case for gaseous CH 4 samples. We did not observe a significant δ 13 C difference between δ 13 C-CH 4 analyses in which tin capsules were added to the CH 4 injections and the δ 13 C bias was subsequently corrected for, and analyses performed without tin capsules. We consistently added tin capsules to each δ 13 C-CH 4 analysis and applied the routine blank correction to all measurements, in compliance with the PIT between analyses of carbonate reference materials and CH 4 samples.
For δ 2 H analyses, we chose an analogous approach and process both H 2 O and CH 4 using the high-temperature reactor of the TC/EA-IRMS system. Possible artefacts can arise mainly from the stronger surface activities of H 2 O vs. CH 4 prior to the conversion to H 2 (and CO or carbon). H 2 O injections can lead to memory effects, which need to be taken into
Table 6. Differences in δ 2 H-CH 4 and δ 13 C-CH 4 between primary/secondary CH 4 gas calibrations and iSAAC measurements of the synthetic CH 4 -in-air standards using the scale anchor based on Carina-1. Differences are calculated as δ iSAAC − δ pure . The bottom line shows the average and the standard deviation (1σ) of considered differences, excluding the value of Biogenic (

Discussion of the comparison between CIC and MPI-BGC
We compare the results of δ 2 H-CH 4 and δ 13 C-CH 4 calibrations achieved by the two independent methods from CIC and MPI-BGC in Tables 4 and 5. Note that the verification of the principal calibration method (MPI-BGC) by an independent method (CIC) is required for the preparation of QCMs when CRMs are not available (IAEA, 2003). The comparison between CIC and MPI-BGC is to some degree representative of the situation of the community analysing atmospheric δ 2 H-CH 4 and δ 13 C-CH 4 without access to international reference air but with locally produced or propagated standard gases.
Even though there is no significant difference between the intercomparison results for δ 13 C-CH 4 , and the difference in δ 2 H-CH 4 is rather small, there seems to be a systematic pattern that the samples combusted at CIC are generally more enriched in both δ 2 H and δ 13 C (Tables 4 and 5). The cause for this offset is not yet fully understood but will be discussed in more detail. The δ 13 C-CH 4 calibrations presented in Table 4 were made on three different IRMS systems with three different working standards. All δ 13 C measurements were corrected for potential scale compression effects, except for the MPI-BGC * analyses, which were made on an IRMS system specifically tuned to render scale compression effects for δ 13 C negligible, as demonstrated by Ghosh et al. (2005). Because the difference in δ 13 C between Fossil and Biogenic is remarkably well resolved in all comparison measurements (Table 4), we conclude that our δ 13 C comparison does not suffer from a significant scale compression error. Rather, the difference in δ 13 C between the methods seems related to the method of CH 4 conversion. In principle, incomplete CH 4 combustion in the experiments at CIC would create a δ 13 C pattern where the affected experiments appeared more enriched in δ 13 C. This is because the remaining CH 4 fraction in the combustion-derived CO 2 gas would be introduced into the dual-inlet IRMS together with the CO 2 and form CO 2 H + ions, which create an artefact on m/z 45 (Sect. 4.1). However, we carefully tested every sample for residual CH 4 and are confident that the CH 4 combustions at CIC have been complete. Therefore, we cannot resolve this difference further.
We also observe a small δ 2 H-CH 4 offset between CIC and MPI-BGC. The δ 2 H measurements at CIC were made using combustion-derived H 2 O with two different methods (TC/EA-IRMS and CRDS). Moreover, the measurement procedures at CIC included WSs covering the full VSMOW/SLAP scale. In contrast, the direct δ 2 H-CH 4 analysis of the secondary CH 4 gases at MPI-BGC was performed as a one-point calibration against Megan or Merlin with a δ 2 H-CH 4 similar to that of Fossil (Table 3). Please note that δ 2 H scale compression often arises during the analysis of H 2 O because it interacts with all sorts of surfaces in the analytical system. However, CH 4 gas behaves very much like pure H 2 in the high-temperature conversion system and a careful H + 3 -factor determination often results in accurate isotopic distances. If the control of scale compression at MPI-BGC was limited due to the one-point calibration, we would expect the isotopic difference between Biogenic and Fossil to be smaller in the results from MPI-BGC than from CIC. However, this is clearly not the case. The isotopic difference between Biogenic and Fossil (δ Fossil − δ Biogenic ) appears to be very similar in the calibrations of both laboratories with 147.3 ‰ at CIC and 147.9 ‰ at MPI-BGC, even showing a slightly larger difference at MPI-BGC (Table 5). Therefore, we are confident that the observed, small δ 2 H offset is not caused by scale compression effects in one of the laboratories. Moreover, the excellent agreement between the experimentally controlled scale compression at CIC and the method at MPI-BGC indicates that the analysis at MPI-BGC is free of significant scale compression artefacts over the tested isotopic range of ∼ 150 ‰.
The comparisons show small differences in the calibration results, but we found no evidence that either one of the two analytical methods is more accurate. Note that the differences in δ 2 H-CH 4 and δ 13 C-CH 4 exceed the compatibility goals of 1 and 0.02 ‰, respectively, by a factor of 2 to 10 (WMO, 2014). We interpret the results of this comparison to reflect calibration differences between laboratories that are to be expected when CRMs are not available. Finally, we conclude that our new method is as capable of calibrating CH 4 gases to the international isotope scales, and as accurate, as the method presented by Sperlich et al. (2012). However, we think that our new methods are more suitable for the task of producing and maintaining a suite of calibration gases for the following reasons.
- The methods at MPI-BGC are more time efficient than the method of Sperlich et al. (2012). While the new methods at MPI-BGC can be used to calibrate an entire suite of CH 4 gases within a relatively short time, the method of Sperlich et al. (2012) is capable of processing only one sample per day.
- The new MPI-BGC methods are based on continuous-flow IRMS and follow the PIT to the highest possible degree. In comparison, the method of Sperlich et al. (2012) is based on the combustion of CH 4 in an offline reactor, which requires re-oxidation after every sample and partial dismantling of the system to retrieve the sample for isotopic analysis. Because the analytical system at CIC could theoretically be at a different state for every sample (oxidation state, air leak rate) and because the system at CIC does not allow us to compare two CH 4 gases directly against each other, the methods at MPI-BGC are superior in the ability to fulfil the PIT.
Even though the method at CIC proved to be very reproducible, we cannot rule out that a variation in the oxidation state of the reactor or an undetected air leakage into the system would affect the analysis of some CH 4 samples more than others.Because fulfilling the PIT is of paramount importance for isotope ratio analysis (e.g.Werner and Brand, 2001;Schimmelmann et al., 2016), we believe the method at MPI-BGC is less vulnerable to measurement errors in future calibrations.

4.3 Discussion on the compatibility between the scale anchors for δ 2 H-CH 4 and δ 13 C-CH 4 as propagated from IMAU to MPI-BGC and JRAS-M16
We interpret the excellent agreement between the δ 13 C and CH 4 calibrations in Carina-1 and Carina-2 from IMAU (Table 2) as evidence that both gases are precisely referenced and suitable for scale propagation from IMAU to MPI-BGC. The synthetic CH 4 -in-air standards were analysed on iSAAC for δ 2 H-CH 4 and δ 13 C-CH 4 and their isotope values were assigned using a WS that was calibrated against Carina-1. We can then interpret the δ 13 C difference between the iSAAC measurement and the calibrated synthetic CH 4 -in-air standards of +0.03 ± 0.10 ‰ as an accurate estimate for the calibration offset between the propagated scale anchor at MPI-BGC and the newly developed JRAS-M16.
Unfortunately, the situation is currently less straightforward for δ 2 H-CH 4 . The two WSs Carina-1 and Carina-2 were calibrated at IMAU with a difference in δ 2 H-CH 4 of 2.8 ‰ that was insignificant at the time (Table 2). Because Carina-1 and Carina-2 appear indistinguishable in δ 2 H-CH 4 when compared on iSAAC with a measurement precision for δ 2 H-CH 4 of 1.0 ‰ (Sect. 2.6), we cannot determine the laboratory offset with the same certainty as for δ 13 C-CH 4 . If either Carina-1 or Carina-2 were representative of the calibrations at IMAU, the δ 2 H-CH 4 offset between the laboratories would amount to +4.2 ± 1.2 or +1.4 ± 1.2 ‰ respectively. A further comparison that includes new measurements on the current system at IMAU is required to determine the δ 2 H-CH 4 offset accurately. This offset can be resolved, for example, when a set of synthetic CH 4 -in-air standards (JRAS-M16) is analysed at IMAU in future.

Discussion on possible use of synthetic CH 4 -in-air standards in future
We demonstrated the ability to test the compatibility between IMAU and MPI-BGC by comparing scale anchors that were previously propagated from IMAU to MPI-BGC to JRAS-M16 gases. Future developments include an interlaboratory comparison to test whether a dedicated set of our synthetic CH 4 -in-air standards (JRAS-M16) could provide a community anchor to the VPDB and VSMOW scales with documented accuracy. A further important test would be to determine to what extent the use of centrally calibrated standard gases could increase compatibility. A recent incident provides a good example of the vulnerability of δ 13 C-CH 4 observations in the atmosphere without suitable m-RM. LSVEC, the second CRM anchor to the VPDB scale, has recently been discovered to be less reliable than anticipated. Until further notice, LSVEC is suggested to be treated with an enhanced δ 13 C uncertainty of 0.15 ‰ (S. Assonov, personal communication, 2016). It is important to appreciate that this uncertainty is fully added to the uncertainty of δ 13 C-CH 4 measurements, due to the similarity of LSVEC (−46.6 ‰) and tropospheric CH 4 (−47.5 ‰) in δ 13 C. That is, the new uncertainty of LSVEC contributes the largest component in the full error budget of δ 13 C-CH 4 analysis. Note that the suggested uncertainty of LSVEC is (i) on the order of the seasonal δ 13 C-CH 4 cycle in the Southern Hemisphere and (ii) a multiple of the analytical precision of laboratories monitoring δ 13 C-CH 4 . If measurements of δ 13 C-CH 4 considered the new uncertainty for LSVEC, the significance of signals such as the seasonal variability in the Southern Hemisphere would be lost at the cost of a better representation of accuracy. Including the uncertainty of LSVEC may further impact the compatibility between several laboratories and, for example, suggest an artificially imposed spatial δ 13 C-CH 4 gradient, based on calibration artefacts. We advocate the scientific gain when accuracy and compatibility are differentiated (WMO, 2014). The community benefits from a referencing method that enables a compatibility level that is smaller than the atmospheric δ 13 C-CH 4 signal, with resolving spatiotemporal δ 13 C-CH 4 differences as the primary goal. We think that establishing JRAS-M16 as community scale anchor could be a valuable step towards reaching this goal. As appropriate for any scale anchor that is intended to be usable for the whole community over long periods of time, the scale anchors will have to be re-calibrated frequently in order to detect possible drifts or to improve and correct previous assignments. The results of these efforts will be made available to the public at regular intervals.
We propose the distribution of JRAS-M16, a set of synthetic CH 4 -in-air standards in 5 L glass flasks.While two JRAS-M16 gases shall be used as calibration standard, an optional third JRAS-M16 gas can be used as unknown that is calibrated against the known JRAS-M16 gases as measurement control standard.This experiment would simulate the case when all participating laboratories measure the same sample directly against the same m-RM using the method that is otherwise applied to every sample in the respective laboratory and has the potential to determine the achievable compatibility.A further possibility to share the JRAS-M16 scale anchor would be to send cylinders with air-WSs to MPI-BGC for calibration.Because a dedicated target of this work is to achieve best possible accuracy with JRAS-M16, we provide the uncertainty of the full traceability chain.Once a new CRM has been found in replacement of LSVEC, the δ 13 C-CH 4 and the traceability chain of JRAS-M16 will be revised accordingly.This will also be made upon future CRM revisions or replacements.

Conclusions
The number of laboratories that measure isotope ratios of atmospheric CH 4 is growing and combining data from multiple laboratories could enable new science and increasingly powerful analysis.However, merging data from multiple laboratories for analysis is currently hampered by the lack of reference materials that enable the community to produce a unified data set.To overcome this problem and to improve compatibility between laboratories, we produced synthetic CH 4 -in-air standards (JRAS-M16).We modified standard online IRMS techniques to calibrate pure CH 4 gases for δ 2 H and δ 13 C on international VSMOW and VPDB isotope scales respectively.Because such instrumentation is available to many isotope laboratories, our technical modifications and experiments can be reproduced elsewhere.Eight of the calibrated CH 4 gases were diluted with CH 4 -free air in 5 L glass flasks to produce synthetic CH 4 -in-air standards with known δ 2 H-CH 4 and δ 13 C-CH 4 values.These synthetic gas mixtures were then analysed on a newly developed system (iSAAC) to measure δ 2 H-CH 4 and δ 13 C-CH 4 in air samples.Hitherto, iSAAC used working standards as scale anchors for δ 2 H-CH 4 and δ 13 C-CH 4 , which were calibrated at a partnering institute (IMAU).The history of the propagated isotope scales goes more than 2 decades back in time and includes the propagation between several laboratories.We determine δ 2 H-CH 4 and δ 13 C-CH 4 in our synthetic CH 4 -in-air standards using the scale anchor propagation from IMAU and compare the results with our calibration results for δ 2 H-CH 4 and δ 13 C-CH 4 .We use this method to determine the δ 13 C-CH 4 offsets between the scale anchor that was propagated from IMAU and JRAS-M16, thereby providing a method to improve compatibility.Further comparisons are required to determine the offset for δ 2 H-CH 4 .
We welcome other laboratories to further test our calibrations by analysing JRAS-M16 air sets, which will be available upon request.Another possibility could be to have cylinders with air-WSs sent to MPI-BGC for calibration using JRAS-M16 as scale anchor.JRAS-M16 may help laboratories to anchor δ 2 H-CH 4 and δ 13 C-CH 4 observations to unified community scale anchors.This might be a useful step towards reaching the compatibility goals between laboratories, leading to an improved understanding of atmospheric CH 4 .Future work includes a revision of the δ 13 C-CH 4 calibrations once the replacement for LSVEC is established.This will reduce the uncertainty of the δ 13 C-CH 4 scale anchors significantly.The LSVEC replacement should extend to the δ 13 C-depleted range of biogenic CH 4 gases.

Data availability
The results of our final calibrations with the associated uncertainties of the full traceability chains are published as a Supplement to this paper. The supplementary data file also contains the revised calibrations of the data by Sperlich et al. (2012). These include corrections for the offset in RM 8563 and for the scale compression effect in the IRMS at CIC.

The injection of H 2 O samples into the reactor is critical because it is prone to isotopic fractionation (Werner and Brand, 2001). This fractionation is mainly caused by system memory due to adhesion of injected H 2 O to the reactor walls. The isotopic fractionation can be overcome by repetitive injections of H 2 O samples with identical isotopic composition, thereby overwriting the memory effect until it reaches a marginal level. For H 2 O analyses under constant analytical conditions (e.g. constant reactor temperature), the adhesion effect is a function mainly of the amount of injected H 2 O sample. Moreover, the effect on the isotopic composition scales with the isotopic difference between two consecutive samples (Gehre et al., 2004). Because there is no adhesion of the sample during CH 4 analysis, this memory effect is pronounced only during the analysis of H 2 O in our study. Subsequent CH 4 analysis does not contribute to system memory but can still be affected by H 2 O desorption from internal surfaces of the analytical system. Therefore, memory effects of H 2 O can propagate into the CH 4 calibrations. Memory effects are identified in a series of replicate H 2 O measurements and are corrected for by modelling the memory function as described in Gehre et al. (2004) and Brand et al. (2009a) on a routine basis, as our system has been used for isotope analysis of H 2 O samples for more than a decade. We conclude that our results are free of artefacts arising from sample memory. Isotopic fractionation during the analysis of the reference waters can also be caused by insufficiently heated septa (Gehre et al., 2004). We injected 106 identical H 2 O samples while we increased the septum temperatures in nine steps from 76 to 137 °C. In general, we observed a δ 2 H enrichment with increasing septum temperature. A systematic increase of δ 2 H-H 2 O with septum temperature is apparent above 90 °C until δ 2 H-H 2 O values plateau at septum temperatures around 130 °C (Fig. A1, blue circles). The stabilising δ 2 H-H 2 O at high temperatures suggests quantitative H 2 O processing without significant isotope fractionation, in line with previous observations (Gehre et al., 2004). In contrast, the three δ 2 H-H 2 O values below 90 °C (red diamonds) show an insignificant but slight increase in δ 2 H-H 2 O with septum temperature, which deviates from the pattern above 90 °C. We cannot explain the mismatch between the two patterns above and below 90 °C. We speculate that the initial heating of the septum to temperatures between 70 and 90 °C caused the desorption of accumulated H 2 O, which was desorbed once the septum was heated to temperatures above 90 °C.
Quantitative conversion of both CH 4 and H 2 O in the high-temperature reactor is of utmost importance for our study, because incomplete conversion causes isotopic fractionation in the reaction products (e.g. Burgoyne and Hayes, 1998; Hilkert et al., 1999; Gehre et al., 2004). The reactor temperature is critical for the efficiency of the conversion process. We performed an experiment with CH 4 and H 2 O injections at different reactor temperatures (Fig. A2). For water injections we observe a pronounced, nonlinear δ 2 H-H 2 O change of ∼ 15 ‰ as the reactor temperature increases from 1300 to 1450 °C, reaching a plateau above 1400 °C. The pattern is consistent with previous observations in both trend and magnitude (Gehre et al., 2004). In contrast, the linear fit for δ 2 H-CH 4 increases by only about 1 ‰ over the 150 °C temperature range. However, the slope is statistically insignificant, as shown by the 95 % confidence interval of the linear fit (Fig. A2). This analyte-specific isotope variation is also reflected in the areas of the H 2 O- and CH 4 -derived H 2 peaks (Fig. A2), albeit with considerable scatter in the data. While the H 2 O-derived H 2 peak areas increase with increasing reactor temperature, the CH 4 -derived H 2 peak areas remain constant within the error bars throughout the experiments. For an unknown reason, three out of six H 2 peaks that resulted from H 2 O injections at 1400 °C were 10-15 standard deviations smaller than the remaining three peaks. We present the averages and 1σ standard deviations of the H 2 peaks with and without removal of these outliers in Fig. A2, which shows the exceptional pattern at 1400 °C. Despite this peak size variability, the isotopic composition of all H 2 O injections at 1400 °C is in good agreement. Our experiments indicate that reactor temperatures in excess of 1400 °C are required especially for quantitative conversion of H 2 O, while the effects of reactor temperature on both the yield and the isotopic composition of CH 4 -derived H 2 are comparatively small. Therefore, we operate the reactor at a temperature of 1450 °C to guarantee quantitative conversion without isotope fractionation of both H 2 O (Gehre et al., 2004) and CH 4 (Burgoyne and Hayes, 1998; Hilkert et al., 1999).
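The significance test applied to the δ 2 H-CH 4 trend (whether the 95 % confidence interval of the fitted slope includes zero) can be sketched as follows. This is a minimal illustration and not the analysis code used in this study: the function names are hypothetical, an ordinary least-squares fit is assumed, and the Student-t critical value is passed in by the caller (for example 2.776 for a 95 % interval with six data points, i.e. four degrees of freedom).

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Ordinary least-squares slope with a two-sided confidence interval.

    x, y   : equally long sequences (e.g. reactor temperature, delta value)
    t_crit : Student-t critical value for len(x) - 2 degrees of freedom
             (assumption: supplied by the caller, e.g. 2.776 for n = 6)
    Returns (slope, lower_bound, upper_bound).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and standard error of the slope.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(rss / (n - 2) / sxx)

    half_width = t_crit * slope_se
    return slope, slope - half_width, slope + half_width


def slope_is_significant(x, y, t_crit):
    """True if the confidence interval of the slope excludes zero."""
    _, lower, upper = slope_confidence_interval(x, y, t_crit)
    return not (lower <= 0.0 <= upper)
```

A slope whose interval spans zero, as reported for δ 2 H-CH 4 in Fig. A2, would return `False` from `slope_is_significant`.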
The Supplement related to this article is available online at doi:10.5194/amt-9-3717-2016-supplement.

Figure 1. Calibration hierarchy to produce synthetic CH 4 -in-air standards including links of the traceability chain. The long, central arrow shows that the primary CH 4 gases were directly calibrated against CRMs for δ 13 C but not for δ 2 H. The uncertainty (U) associated with each calibration hierarchy level is indicated by indices that are described in Sect. 2.4.

Figure 2. Configuration of the manual two-position 10-port valve with two 1 mL sample loops (shown in the grey dashed box) and the TC/EA-IRMS system for δ 2 H-CH 4 calibration. The TC/EA-IRMS reactor (displayed as in Gehre et al., 2004) is fed either by the sample line from the 10-port valve or by the syringe via autosampler (not shown). The size of components is chosen to increase clarity.

Figure 3. Chromatograms of δ 2 H-CH 4 calibration sequences using TC/EA-IRMS with traces of m/z 2 and m/z 3 shown in black and blue respectively. The bottom panel shows an example of an entire calibration sequence, which begins with three square-shaped peaks of pure H 2 , followed by alternations of three to four H 2 O- and three to four CH 4 -derived H 2 peaks before the sequence ends with another three square-shaped peaks of pure H 2 . The top left panel enlarges H 2 peaks from H 2 O (peak nos. 6-7) and CH 4 (peak nos. 8-9) injections respectively. A zoom into baseline details of H 2 O-derived peak no. 7 and CH 4 -derived peak no. 8 is shown in the top right panel. Red lines indicate the sections used for peak integration by the IRMS software (peak widths are 43 and 59 s for H 2 O- and CH 4 -derived H 2 peaks respectively).

Figure 4. The 10-port valve for manual CH 4 injections is coupled to the EA-IRMS system through a custom-made gas inlet into the combustion (oxidation) unit for δ 13 C-CH 4 calibration. The proportions of illustrated components are chosen to increase clarity.

Fig. 4) and a Mg(ClO 4 ) 2 trap before it enters the GC column (3 m, 1/4 in.; Porapak PQS, CE instruments) held at 80 °C. Thereafter, the sample enters the IRMS through the open split. Measurement sequences to calibrate primary CH 4 gases to the VPDB isotope scale are created by alternating blocks of manual CH 4 injections and CRM/WS (Table 2) applications via autosampler. We applied one WS and one CRM (LSVEC) to calibrate the primary CH 4 gases in a two-point calibration. While MAR-J1 was used as WS in most experiments, ALI-J1 was used once, during a calibration of Merlin. Megan and Merlin were each calibrated on three different days to determine the external reproducibility of the δ 13 C results. Chromatograms resulting from CH 4 and from carbonate analyses using EA-IRMS are displayed in Fig. 5 and show very similar peak shapes for CH 4 and carbonates. Typical m/z 44 amplitudes and peak widths were ∼ 7.4 ± 0.2 V and 101 ± 1 s respectively for both materials. We connected a primary CH 4 and a secondary CH 4 gas to the 10-port valve to calibrate the secondary CH 4 gases (Table 1) for δ 13 C in a one-point calibration. All measurement results were corrected for scale compression based on the method suggested in Verkouteren and Klinedinst (2004), using an empirical, mass-spectrometer-specific correction factor of 1.0056.
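The arithmetic behind a two-point calibration can be illustrated with a short sketch. It assumes the generic linear normalisation between two anchors (here a WS and a CRM); the function and parameter names are hypothetical, and the sketch does not reproduce the scale-compression correction of Verkouteren and Klinedinst (2004) or any other step of the data reduction used in this study.

```python
def two_point_calibration(measured, meas_ref1, true_ref1, meas_ref2, true_ref2):
    """Map a measured delta value (in per mil) onto a reference scale.

    The two reference materials define a straight line between their
    measured values (meas_ref1, meas_ref2) and their assigned values on
    the target scale (true_ref1, true_ref2). The sample is placed on
    that line. All names are illustrative assumptions.
    """
    if meas_ref1 == meas_ref2:
        raise ValueError("the two reference materials must differ in measured delta")

    # Stretch factor: ratio of the assigned span to the measured span.
    stretch = (true_ref2 - true_ref1) / (meas_ref2 - meas_ref1)

    # Shift relative to the first anchor, then rescale.
    return true_ref1 + (measured - meas_ref1) * stretch
```

By construction the function returns the assigned value when it is given the measured value of either anchor, so the calibrated result is most robust for samples that lie between the two reference materials.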

Figure 5. Chromatograms of δ 13 C-CH 4 calibrations using EA-IRMS with traces for m/z 44, 45 and 46 in green, brown and black respectively. Bottom panels show complete chromatograms of CH 4 and Li 2 CO 3 analyses while the two top panels zoom into the baseline of the traces. The first three square-shaped peaks stem from injections of a pure CO 2 WS while the more Gaussian-shaped peaks result from CH 4 - and Li 2 CO 3 -derived CO 2 analysis. The two red lines indicate the sections that the IRMS software uses for peak integration (CO 2 peak widths are 101 and 100 s for CH 4 and Li 2 CO 3 analysis respectively).
Figure A1. The δ 2 H variation of H 2 O injections with septum temperature. Blue circles show average δ 2 H-H 2 O values for septum temperatures above 90 °C, the black line is the quadratic polynomial fit to the data above 90 °C, and red diamonds display δ 2 H-H 2 O values at septum temperatures below 90 °C. The error bars show 1σ standard deviations and the grey dashed lines indicate the typical precision limit of 1 ‰ for δ 2 H-H 2 O analysis (Gehre et al., 2004) around the δ 2 H-H 2 O value of the polynomial fit for the septum temperature of 130 °C (set point during calibration experiments). The grey dashed lines show that our δ 2 H-H 2 O analyses remain within a typical precision level as long as the septum temperature is controlled to ∼ 130 ± 10 °C.

Figure A2. The dependence of δ 2 H and H 2 peak areas of H 2 O and CH 4 injections on reactor temperatures between 1300 and 1450 °C. Top and bottom panels show H 2 O and CH 4 experiments respectively. δ 2 H isotope ratios are shown in blue for H 2 O and green for CH 4 and refer to the left-hand axes. Average H 2 peak areas are indicated by grey crosses and refer to the right-hand axes. All error bars indicate the standard deviation. The red diamond shows the average peak area and the respective standard deviation including the outliers (see Appendix text). Y axis ranges are matched between top and bottom panels to enable direct comparison of the temperature effect for H 2 O and CH 4 . Equations describe the fits in both panels, displayed by dashed lines. Continuous lines in the bottom panel indicate the 95 % confidence interval of the linear fit.

Table 1. Gases used for this study. Note that Mike-1 and Martha-1 were transitional CH 4 mixtures and no longer exist.

Table 2. Measurement standards used in this study. "CRM" and "WS" identify certified reference material and in-house working standards respectively. The uncertainties of the δ 2 H and δ 13 C data from MPI-BGC correspond to the 95 % confidence limit of the error of the mean. We include the uncertainty estimate that the IAEA recently suggested for LSVEC. Publications and additional comments related to the standards are listed in the last column.
extreme scenario would be 0.04 and 0.007 ‰ for δ 2 H-CH 4 and δ 13 C-CH 4 , respectively, which is negligible in both cases.

Table 3. Results of CH 4 isotope calibrations. Gas names as used in the main text and their function as primary or secondary CH 4 are shown in columns 1 and 2 respectively. All uncertainty estimates include the full traceability chain (Sect. 2.4). Note that we provide uncertainty estimates for δ 13 C-CH 4 without and with the uncertainty of 0.15 ‰ in LSVEC in columns 6 and 7 respectively. Martha-1 and Mike-1 were transitional gases and were used to produce Martha-2 and Mike-2.

Table 4. Results for comparison in δ 13 C between CIC and MPI-BGC. Indices of the header are explained in Sect. 2.9 of the main text. The CIC data are corrected for the offset in RM 8563

Discussion on the experimental artefact elimination during δ 2 H-CH 4 and δ 13 C-CH 4 calibrations in primary and secondary CH 4 gases

We present δ 2 H and δ 13 C calibrations in pure CH 4 gases against CRMs, WSs and other CH 4 gases. Samples and reference materials were always analysed in the same analytical systems, thereby complying with the PIT as much as possible. The only limitation of the PIT is due to the chemical difference between unknown samples (CH 4 ) and the known reference materials (carbonates, H 2 O) used for anchoring the CH 4 gases to the respective isotope scales. In order to calibrate the primary CH 4 gases accurately, we need to exclude or eliminate material- and method-specific errors (IAEA (Werner and Brand, 2001) text. 2 H-H 2 O and subsequent δ 2 H-CH 4 analyses, either by discarding initial injections or making appropriate corrections (Werner and Brand, 2001). H 2 O injections produced the highest H 2 yields and stable δ 2 H-H 2 O values at reactor temperatures of 1450 °C. Therefore we kept the reactor at 1450 °C during all calibration measurements. In addition, we found a minor dependence of δ 2 H-H 2 O on the septum temperature. We experimentally determined a septum temperature of 130 °C at which the effect on δ 2 H-H 2 O was insignificant and kept the septum at 130 °C during all calibrations. We describe the experiments on reactor temperature and septum temperature in more detail in Appendix A. Note that it is essential to exclude systematic, material-specific errors to make H 2 O and CH 4 reactions directly comparable for δ 2 H calibration. Based on these experiments we conclude that the δ 2 H-CH 4 calibrations do not contain measurement errors introduced by bracketing δ 2 H-H 2 O analyses.