Estimation of background gas concentration from differential absorption lidar measurements

Approaches are considered to estimate the background concentration level of a target species in the atmosphere from an analysis of the measured data provided by the National Physical Laboratory’s differential absorption lidar (DIAL) system. The estimation of the background concentration level is necessary for an accurate quantification of the concentration level of the target species within a plume, which is the quantity of interest. The focus of the paper is on methodologies for estimating the background concentration level and, in particular, contrasting the assumptions about the functional and statistical models that underpin those methodologies. An approach is described to characterise the noise in the recorded signals, which is necessary for a reliable estimate of the background concentration level. Results for measured data provided by a field measurement are presented, and ideas for future work are discussed.


Introduction
Differential absorption lidar (DIAL), which is based on the optical analogue of radar, provides the capability to mea-sure remotely the concentration and spatial distribution of compounds in the atmosphere (Measures, 1984).The ability to scan the optical measurement beam through the atmosphere enables pollutant concentrations to be mapped and emission fluxes to be determined.The mobile ground-based DIAL systems developed at the National Physical Laboratory (NPL) for pollution monitoring have been used worldwide for over 20 years (Milton et al., 1992;Robinson et al., 2011).They have been deployed for routine monitoring, emission factor studies, research investigations, and targeted monitoring campaigns.A key feature of the NPL system is that it operates in both ultraviolet and infrared regions.The infrared measurements at wavelengths of around 3 µm target a range of hydrocarbon gases, including methane that has a significant background concentration level (of the order of 1.8 ppmv) (Butler and Montzka, 2016).
The lidar technique is based on transmitting a pulse of laser radiation into the atmosphere and measuring the light scattered by the atmosphere and returned to the system.The DIAL system extends the basic lidar technique by operating the laser alternately at two adjacent wavelengths (Measures, 1984).One of these wavelengths, termed the "on-resonant wavelength", is chosen to be a wavelength that is absorbed by the target species, and the other, the "off-resonant wavelength", is a wavelength that is not absorbed significantly by the target species.The two wavelengths are chosen to be close so that parameters describing atmospheric variability, such as differences in scattering media and interference compounds, are the same for both wavelengths.Any measured difference in the returned signals is therefore due to absorption by the target species.
A critical part of the analysis of the measured data provided by the DIAL system is to estimate, and subsequently P. Harris et al.: Background estimation from DIAL measurements correct for, the background concentration level of the target species along the optical measurement path.This is necessary for an accurate quantification of the concentration level in the atmosphere of the target species, which may be a pollutant or greenhouse gas emission, for the purpose of source attribution and to support decisions made on the basis of that quantification.This paper is concerned with approaches to analysing the measured data provided by the DIAL system to estimate the background concentration level of the target species.
The paper is organised as follows.In Sect. 2 methodologies for estimating the background concentration level are described and contrasted.One approach makes assumptions about the nature of the functional and statistical models used to represent the values of path-integrated concentration calculated in terms of the measured data.A second approach relies upon knowledge of the statistical model for the noise in the measured data, and approaches to understand and characterise the noise are considered in Sect.3. Results for measured data provided by a field measurement are presented in Sect. 4. Finally, concluding remarks and a discussion of future work are given in Sect. 5.

Two-step linear least-squares (LLS) approach
For a given elevation angle θ , the path-integrated concentration C(r) of the target species at a distance r along the optical measurement path is given by C(r) = 1 2γ log (S off (r) − α)/P off (S on (r) − β)/P on , where S off (r) and S on (r) are the returned signals corresponding to the off-resonant and on-resonant wavelengths, respectively; γ is a known differential absorption coefficient; α and β represent systematic offsets of the signals due to the instrumentation (Milton et al., 1987); and P off and P on are known normalisation constants used to correct for the different laser powers at the two wavelengths.In the absence of a plume containing the target species, C(r) can be modelled as a straight-line function of distance r, with A representing a constant systematic offset and B representing the background concentration level of the target species along the measurement path.By equating the expressions (1) and ( 2), and re-parameterising, it follows that involving unknown parameters a, b, α, and β, and A and B can be recovered using the expressions Given measured values s off,i and s on,i of the signals S off (r) and S on (r), respectively, at distances r i (i = 1, . .., m), an approach often applied to determining estimates of the unknown parameters in model (3) involves the following two steps: i. evaluate an estimate α 0 of α as the average of measured values s off,i (i = m far , . . ., m), and similarly evaluate an estimate β 0 of β as the average of measured values s on,i (i = m far , . . ., m); ii. evaluate estimates (a 0 , b 0 ) of (a, b) as the solution to the LLS problem min The index m far in step (i) is chosen empirically to define a part of the two signals for which the values of the backscattered lidar signals can be considered to have essentially fallen to zero and correspond to measurements made in the far field.
The indices m low and m hgh in step (ii) are chosen empirically to define that part of the signals that encompasses the backscattered lidar measurement range that is free from the effects of obstructions and for which the signal-to-noise ratio for the values y i (i = m low , . . ., m hgh ) is not too small such that those values provide useful information about the background level.
Figures 1 and 2 illustrate the application of this two-step approach to the data obtained from a field measurement.Figure 1 shows the measured values for the two signals and indicates the positions of the windows of data used in each analysis step.The signals each comprise m = 999 values, and the windows are defined by setting m low = 30, m hgh = 300, and m far = 500.Since the sampling rate of the lidar data acquisition system is 40 MHz, each index step represents a distance of 3.75 m for the DIAL measurements, and therefore the length of the path is (approximately) 3.75 km, the far field in step (i) corresponds to distances greater than 1875 m, and the data analysed in step (ii) corresponds to distances between 112.5 m and 1125 m. Figure 2 shows the path-integrated concentration data calculated in terms of the measured signal values and the estimates α 0 and β 0 , and the straight-line model with A 0 and B 0 calculated in terms of the estimates a 0 and b 0 using the expressions (4). to near-field scatter rather than the backscattered lidar return signals.For indices i > m hgh , the measured signal values are such that either the path-integrated concentration values calculated in terms of them and the estimates α 0 and β 0 are undefined or they are dominated by noise and provide little useful information about the background concentration level.The two-step LLS approach is based on two main assumptions.Firstly, an assumption is made about the consistency of a straight-line model with the data y i (i = m low , . . ., m hgh ).In practice, a plume containing the target species (whose position and concentration are unknown but are to be determined) may be present, and, consequently, the model may be inadequate for the data.Indeed, deviations of the data from the model may provide information about the position and pathintegrated concentration level of the plume.Secondly, an assumption is made about the statistical model for the data y i (i = m low , . . ., m hgh ), viz.that the errors in the data are realisations of random variables that are independent and identically distributed (Mardia et al., 1979).It is apparent from Fig. 2 that the deviations between the path-integrated concentration data and the model are far from random, and consequently there is strong evidence that such an assumption about the statistical model is invalid.It follows that there can be doubt about the reliability of the estimate of the background concentration level provided by the two-step LLS approach.

One-step generalised least-squares (GLS) approach
Some information about the location of the plume -in terms of values of indices m s and m e , where m s ≤ m e , that define a window known to contain the plume -is often available be- cause there can be knowledge about the location of the source of the plume relative to the location of the DIAL system.However, the reliability of this information will vary with the quality of the knowledge of the emission source location and the meteorological conditions at the time of the measurement.Given this additional information, the model (2) can be replaced by one of the form with B representing (as before) the background concentration level and A 2 the path-integrated concentration level of the target species within the plume.(The condition A 2 = 0 would indicate the absence of a plume.)Use of the model ( 8) is a simple way to address the deficiency of the model (2) by representing the overall (macro) affect of the plume on the path-integrated concentration level without attempting to model the detail (at a micro level) of how the concentration level varies within the plume.
The deficiency in the statistical model for the data y i can be addressed by characterising the statistical properties (or probability distributions) for the errors in that data.However, such a characterisation is not straightforward because (a) those errors are obtained as a non-linear transformation (involving a quotient and a logarithm) of the errors in the measured values of the individual returned signals and (b) they depend on the errors in the estimates (α 0 , β 0 ) of (α, β), which, in turn, depend on the measured values for (parts of) the returned signals.However, it might be expected that the errors in the measured values of the individual returned signals themselves would be simpler to characterise and treat.
An alternative analysis approach is then based on solving a GLS problem as follows (Forbes, 1993;Forbes et al., 2002;Milton et al., 2006).Let I 1 = {m LOW , . . ., m s } and P. Harris et al.: Background estimation from DIAL measurements I 2 = {m e , . . ., m HGH } identify those parts of the measured data before and after the window containing the plume; e is the vector containing e off,i = s off,i − S off,i , i ∈ I , and e on,i = s on,i − S on,i , i ∈ I , where I = I 1 ∪ I 2 ; and V is the covariance matrix for the complete set of data s off,i , i ∈ I , and s on,i , i ∈ I .The index m LOW will usually be chosen equal to m low , but m HGH can exceed m hgh because the approach uses the measured data s off,i and s on,i directly and does not require the calculation of the data y i .Estimates of unknown parameters a 1 ; a 2 ; b; α; β; S off,i , i ∈ I ; and S on,i , i ∈ I , are obtained as the solution to the GLS problem mine subject to the constraints and The parameters A 1 , A 2 , and B in the model ( 8) are recovered from those appearing in the above formulation using (cf. expression 4).By using the equality constraints ( 10) and ( 11) to eliminate the parameters S off,i , i ∈ I , the problem takes the form of an unconstrained non-linear leastsquares problem with unknown parameters a 1 ; a 2 ; b; α; β; and S on,i , i ∈ I , and can be solved using standard numerical algorithms (Paige, 1979;Gill et al., 1981;Björck, 1996).
The problem provides (simultaneously) estimates of the unknown parameters together with the covariance matrix associated with those estimates.In terms of those estimates and uncertainties, it provides estimates and uncertainties for A 2 (the path-integrated concentration level of the target species within the plume), B (the background concentration level of the target species), and the values of the returned signals at distances r i , i ∈ I .The above formulation of the GLS problem makes no assumption about the functional form of the returned signals.In an alternative formulation, the returned signal corresponding to the on-resonant wavelength is modelled as in terms of basis functions φ j (r) (j = 1, . . ., n), and the GLS problem is formulated as follows: determine estimates of unknown parameters a 1 , a 2 , b, α, β, and c j (j = 1, . .., n) that solve the minimisation (9) subject to the constraints and For example, the signal S on (r) (or each part of the signal before and after the window containing the plume) might be described by a cubic polynomial spline function with the basis functions φ j (r) taking the form of B-spline basis functions of order four specified in terms of "knots", which are the positions where the cubic polynomial pieces of the spline curve are joined (de Boor, 1978;Cox, 1972Cox, , 1993)).The reason to use a parametric representation of the signal is to impose a degree of smoothness on the values of the returned signals.
A number of variants of the above formulations of the GLS problem are also considered: 1.Only the data after the window containing the plume are treated, i.e.I 1 is taken to be the empty set, and the GLS problem is solved for estimates of the parameters a ≡ a 1 + a 2 , b, α, β, and c j (j = 1, . . ., n).
2. The parameters α and β are fixed, for example, to be the values α 0 and β 0 determined in terms of the data s off,i and s on,i for i = m far , . .., m, as in the two-step LLS approach.
3. The values α 0 and β 0 are considered to be estimates of the parameters α and β, and the minimisation (9) takes the form mine where λ α and λ β reflect the "quality" of the estimates α 0 and β 0 .
The reasons for considering these variants of the basic problem, e.g. in which the additional terms in the minimisation (16) provide a regularisation of the minimisation (9), are discussed in Sect.3.
To apply the analysis approach, it is necessary to know the covariance matrix V based on a characterisation of the noise in the (raw) returned signals.Obtaining such knowledge is the subject of Sect. 3. In contrast to the two-step LLS approach, it is expected that the alternative analysis approach will provide reliable estimates of the required quantities supported by an assessment of their associated uncertainties.

Auto-regressive model for the noise
A characterisation of the noise in the two returned signals is undertaken in terms of estimates of the errors in the measured data for those signals.Let index m E ≥ m e be such that it can be stated a priori with confidence that the indices m E , . . ., m HGH define a part of the signals that does not include a plume.The data s off,i and s on,i within that part of the signals are smoothed (or filtered), and the residual deviations associated with the smoothed data are used as estimates of the errors in the measured data.Figure 3 illustrates the residual deviations defined by least-squares cubic polynomial spline fits to the measured data for the signals shown in Fig. 1 with m E = 100 and m HGH = 500.A cubic polynomial spline function is chosen as a flexible and empirical model to describe the underlying signals for r m E ≤ r ≤ r m HGH .An alternative approach to filtering the data is to use a wavelet decomposition, but the results obtained and described in Sect. 4 are not substantially different.
Figure 4 shows the auto-and cross-correlation functions associated with the estimates of the measurement errors for the two signals.The graphs in these figures suggest that the errors are correlated, both within each signal and between the two signals.Another method for studying longterm correlations (or "memory") in signals is detrended fluctuation analysis (DFA) (Peng et al., 1994), which has been applied successfully to a wide range of applications in physics, medicine, climatology, and other areas.When the auto-correlation function takes the form the corresponding detrended fluctuation function takes the form where η and ζ are related by η = 1−ζ /2 (Kantelhardt, 2001).Whilst Gaussian noise is usually assumed to be "white", i.e. its auto-correlation function decays exponentially and the DFA scaling exponent is η = 0.5, in real-world systems the stochastic variables are often described by co-called "red" noise with η > 0.5, which means that values have memory of previous ones.The DFA scaling exponent is estimated from the linear fit to the data in a log-log plot of the fluctuation function F (s). Values of η between 0.5 and 1 denote stationary noise, whereas values bigger than 1 denote non-stationary long-term correlated noise, with random walk noise having η = 1.5.When applied to the estimates of the measurement errors for the two signals shown in Fig. 3, the values of the DFA scaling exponents are just less than 1.5 for small lags s, indicating that the signals have appreciable short-term memory.
A (multi-output) auto-regressive (AR) model is used to characterise the noise in the signals.Let d off,i and d on,i denote the residual deviations as displayed in Fig. 3.Then, the AR model takes the form and where q is the order of the AR model, (w off,i , w on,i ) are random draws from the bivariate Gaussian distribution N (0, ), and (w off,i , w on,i ) is assumed to be uncorrelated with (w off,j , w on,j ) for i = j .(In a general form of the AR model defined above, the number of terms in each summation can be chosen to be different.)For a given choice of order q, fitting the AR model to the residual deviations provides estimates κ ,k and τ ,k ( = 1, 2, k = 1, . . ., q) for the parameters κ ,k and τ ,k , with an associated covariance matrix, together with an estimate of .Let be the Cholesky decomposition of (Golub et al., 1996), and define where For the residual deviations shown in Fig. 3 and the AR model of order q = 4 fitted to those residual deviations, Fig. 5 illustrates the transformed residual deviations z off,i and z on,i , which are expected to be random draws from the bivariate Gaussian distribution N (0, I), where I denotes the identity matrix.Figure 6 shows the auto-and cross-correlation functions associated with z off,i and z on,i .The graphs in Figs. 5 and 6 suggest that the transformed residual deviations are much less correlated compared with the estimates d off,i and d on,i of the measurement errors, and they resemble much more closely a white (Gaussian) noise process.Furthermore, the values of the DFA scaling exponents are approximately 0.5 for all lags s.An approach to selecting the order q of the AR model used to characterise the noise is to obtain model fits for increasing orders and to select that value of q for which the quality of the fit is not substantially improved using higher orders.Figure 7 illustrates how the standard deviations and correlation coefficient derived from the covariance matrix for the pairs (w off,i , w on,i ) of residual deviations associated with a fitted AR model vary with the order q.It is seen that these statistics essentially saturate for q ≥ 4, which motivates the selection of the model order made above.Furthermore, it is seen that the standard deviations for the two signals saturate to values that are quite similar, suggesting that the white-noise processes underlying the noise in the two returned signals are themselves comparable.

Formulation of GLS problem with auto-regressive noise model
The estimated AR model determines the covariance matrix V in the formulation (9) of the GLS problem.However, it is unnecessary to explicitly construct the matrix V, which can be a large matrix.Instead, the fitted AR model is used to define Distance/km for i ∈ I 1 ≡ {m LOW + q, . . ., m s } and i ∈ I 2 ≡ {m e + q, . . ., m HGH }.Then, since (f off,i , f on,i ), i ∈ I = I 1 ∪I 2 , is uncorrelated with (f off,j , f on,j ) for i = j and has covariance matrix , the formulation (9) can be replaced by or equivalently by where and the minimisation is subject to the same constraints ( 10) and (11) (or 14 and 15).For the signals shown in Fig. 1, and using an AR model of order q = 4 to characterise the noise in those signals as determined above, consider solving the GLS problem for the data defined by indices I ≡ I 2 = {m e = 100, . .., m HGH = 500} after the window containing the plume (variant 1 in Sect.2.2) and for various fixed values of the parameters α and β (variant 2 in Sect.2.2).Let here a function of α and β, be the mean square error evaluated at the solution, where |I | is the number of indices in I and p = 2 + n is the number of unknown parameters.Figure 8 shows how S 2 depends on α when β = β 0 and on β when α = α 0 .The graphs in the figure also show as vertical dashed lines the particular values α 0 and β 0 .Although a minimum with respect to β is well defined and indeed corresponds to the value β 0 , the same is not the case for α, since S 2 is essentially constant for values of α less than α 0 when β = β 0 .
Figure 9 illustrates the solution to the GLS problem with unknown parameters a, b, α, β, and c j (j = 1, . . ., n).The solution is displayed by showing the path-integrated concentration data calculated in terms of the estimates of α and β and the straight-line model calculated in terms of the estimates of A and B. For comparison, the path-integrated concentration data calculated in terms of the estimates of α 0 and β 0 , and the linear least-squares straight-line fit to that data are also shown: cf.Fig. 2. Figure 10 illustrates the transformed residual deviations corresponding to the solution to the GLS problem shown in Fig. 9. Figures 11 and 12 illustrate the same information but for the solution to the GLS problem with α = α 0 and β = β 0 .It can be seen that the quality of the two solutions, in terms both of the values of S 2 and of the distributions of transformed residual deviations, are comparable.However, the solutions themselves are quite different, which is a reflection of the fact that the parameter estimates are confounded with each other and are not well defined by the data (as illustrated also in Fig. 8).It is for this reason that consideration of regularised versions of the GLS problem (variants 2 and 3 in Sect.2.2) are considered necessary.
Finally, Figs. 13 and 14 illustrate the solution to the GLS problem with unknown parameters a, b, α, β, and c j (j = 1, . . ., n) for the data defined by indices I ≡ I 2 = {m LOW = 30, . . ., m HGH = 500} that includes data before and within the window containing the plume but does not allow for the presence of the plume in the functional model.Here, particularly for data to the left of the window, the transformed residual deviations tend to be large and suggest a lack of fit of the model, which is attributed to the effect of the plume.It is for this reason that adjusting the functional model for the measured data, as well as characterising the statistical model for the data, is an important part of the analysis.

Results
Results are presented for field measurements made for six elevation angles θ spanning the interval 3.7 to 9.7 • .Figures 1 to 14 relate to the measured data corresponding to the largest elevation angle in that interval.Additionally, Figs. 15 and 16 illustrate the solution to the GLS problem with unknown parameters a, b, and c j , (j = 1, . . ., n) for the data defined by indices I = I 1 ∪ I 2 with I 1 = {m LOW = 30, . . ., m s = 50} and I 2 = {m e = 100, . . ., m HGH = 500}, i.e. allowing for the presence of a plume.Note that in Fig. 16 a larger scale for the vertical axis compared to previous graphs showing transformed residual deviations is needed to show the behaviour of the deviations corresponding to positions before the window set to contain the plume.Here, the transformed residual deviations suggest that the noise characteristics may be dif- In red, the results of the two-step LLS approach applied to the data with indices m low to m hgh : the path-integrated concentration data calculated using α 0 and β 0 , and the linear least-squares straight-line fit to that data (cf.Fig. 2).In blue, the results of the GLS approach applied to the data with indices m e to m HGH : the data are calculated in terms of estimates of α and β, and the straightline model in terms of estimates of A and B.
ferent for the data before and after the window considered to contain the plume.Although the transformed residual deviations for the data before the plume appear essentially random, their magnitude is larger than would be expected.The analysis described in Sects. 2 and 3 for the measured data relating to the largest elevation angle is repeated for the other elevation angles, and the results are presented in Fig. 17 and Tables 1 and 2. For all elevation angles the same window of data containing the plume and the same order for the AR model used to characterise the noise in the returned signals are considered.Figure 17 compares the estimates of the parameters κ ,k and τ ,k (with κ 1,0 and τ 2,0 set to be unity), and Table 1 compares the estimates of the elements of the covariance matrix that jointly define the AR model used to characterise the noise in the returned signals for each elevation angle.There is generally good agreement between the estimates of κ 1,k and τ 2,k , which suggests that the "autocorrelation" terms in the statistical model defined by those parameters are reproducible across the different elevation angles.The estimates of κ 2,k and τ 1,k are less in agreement, and in particular the estimates of κ 2,k for the smallest elevation angle appear somewhat different from those for the other elevation angles (lower-right graph of Fig. 17), which suggests the "cross-correlation" terms are not so reproducible.A technical explanation for that property of the statistical model is currently not available but will be investigated in further work.
Table 2 gives the estimates of the background concentration level B obtained in various ways and compares them for different elevation angles.In all cases, the offset parameters α and β are set to the values α 0 and β 0 calculated in terms of the signal values measured in the far field.Estimates of B are obtained using the LLS approach; the GLS approach using data with indices I 2 = {m e = 100, . . ., m HGH = 500} considered to be beyond the plume; and the GLS approach using data with indices I 1 ∪ I 2 and I 1 = {m LOW = 30, . . ., m s = 50}, i.e. allowing for the presence of a plume.In the last case, an estimate is also provided for the parameter A 2 representing the path-integrated concentration level of the target species within the interval defined by the indices m s = 50 to m e = 100.For all but the smallest elevation angle, the estimate of A 2 is positive, which may be taken to indicate the presence of a plume somewhere within the chosen window.
It may be expected that the results provided by the GLS approach using the data with indices I 2 are the most reliable because, firstly, data that are possibly influenced by the presence of a plume are excluded and, secondly, the same data are used for the purpose of noise characterisation.Indeed, the estimates show the least variability with elevation angle compared with the other approaches to determine estimates of the background concentration level, and particularly compared with the estimates provided by the two-step LLS approach.The results provided by the GLS approach using the data with indices I = I 1 ∪ I 2 are, for some elevation angles, influenced appreciably by the additional data measured close to the DIAL system and before the plume.That data constitute only a small part of the data set (about 5 %), and, as noted above, there is evidence of an inconsistency between those data and the functional and/or statistical models used in the analysis.In particular, the nature of the transformed residual deviations suggests that the statistical model determined using the data after the plume may not apply for the data before the plume.Ideally, the noise in the data before the plume would be characterised either separately or in combination with the data after the plume, but that is made difficult Path-integrated concentration/ppmv because the amount of data is small and there is a gap between the two parts of the data.Further analysis using data sets for which there is no plume may help to give a better understanding of this aspect.

Concluding remarks and future work
This paper has been concerned with approaches to analysing the recorded measured data obtained using NPL's DIAL system to estimate the background concentration level of a target species in the atmosphere.The estimation of the background concentration level is necessary for an accurate quantifica- tion of the concentration level of the target species within a plume, which is the quantity of interest.The paper has focussed on methodologies for estimating the background concentration level and, in particular, contrasting the assumptions about the functional and statistical models that are part of those methodologies.An approach to estimating the background concentration level has been described.It uses a functional model for the path-integrated concentration level that allows for the presence of a plume.It also uses a statistical model in the form of an auto-regressive function to describe the noise in the measured data for the signals corresponding to the off-resonant and on-resonant wavelengths.The functional and statistical models are then used to formulate a generalised leastsquares problem whose solution provides estimates of the background concentration level and other model parameters, including the path-integrated concentration level of the target species in the plume.Results have been presented for field measurements made for six elevation angles using a variant of the generalised least-squares approach in which the offset parameters α and β in the functional model are set to fixed values.From an analysis of the measured data recorded beyond the plume the results show reasonable consistency between the statistical models for the noise and between the estimates of the background concentration level correspond-Table 2. For each elevation angle θ (column 1), estimates (α 0 , β 0 , b 0 ) provided by the two-step LLS approach (LLS, columns 2-4), estimate b of b provided by the GLS approach with α = α 0 and β = β 0 for data with indices I 2 = {m e = 100, . . ., m HGH = 500} (GLS 1 , column 5), and estimates ( b, a 2 ) of (b, a 2 ) provided by the GLS approach with α = α 0 and β = β 0 for data with indices I 1 ∪ I 2 with I 1 = {m LOW = 30, . . ., m s = 50} (GLS 2 , columns 6-7).ing to the different elevation angles.However, the estimates of the background concentration level are influenced by the inclusion of measured data before the plume.

LLS GLS
The paper describes on-going work on the analysis of the measured data provided by NPL's DIAL system.In future publications, issues associated with the results presented in this paper will be addressed, and further aspects of the analysis will be described, including the following: 1.The approach relies on the availability of knowledge about the location of the plume, which is then used as the basis for excluding data that can be expected to be inconsistent with the functional and statistical models used in the analysis.There is benefit in trying to refine and improve that knowledge to obtain a short window containing the plume and to increase the amount of data available to estimate the background concentration level.Similar developments could also be used to identify the presence of an unexpected plume.
2. The presented results have not included an assessment of the quality of the estimate of the background concentration level in the form of a statement of uncertainty, which is a necessary part of the quantification of the background concentration level.The assessment will need to consider not only the influence of noise in the recorded signals, which has been characterised using a particular statistical model, but also the uncertainty associated with that model, which has been estimated from an analysis of the measured data.
3. The presented results suggest that the statistical model obtained to characterise the noise in the recorded signals after the plume may provide only a partial description of the noise before the plume, in the sense that it captures the correlation structure of the noise but not its magnitude.A complete description of the noise is necessary if the part of the signals before the plume are to be used to obtain a reliable estimate of the background concentration level supported by an associated uncertainty.
4. The analysis has been applied separately and independently for each elevation angle θ .Making an assumption about the dependence of the background concentration level on θ -for example, that it is a constant or slowly varying function of θ -would allow the measured data for the different elevation angles to be aggregated and analysed together.
5. Additional experimental data -for example, direct measurement of the background concentration level undertaken independently, such as in situations where a plume is known not to exist -would assist in the validation of the results of the analysis.

Data availability
The measured data used in this work are not publically available.The data are part of a field measurement undertaken using NPL's DIAL system to quantify the concentration level of methane in a plume produced by a methane source, and as such they are commercially sensitive.The data are used here to demonstrate and compare the described methods for the estimation of the background concentration level of methane in the proximity of the source.
Author contributions.Tom Gardiner, Rod Robinson, and Fabrizio Innocenti defined the models and provided the measured data from field measurements.Peter Harris, Nadia Smith, and Valerie Livina developed the analysis approaches, implemented the approaches in software, and processed the measured data to obtain numerical results.Peter Harris prepared the manuscript with contributions from all co-authors.

Figure 2 .
Figure 2. Path-integrated concentration data and straight-line model calculated using the two-step LLS approach.

Figure 5 .Figure 6 .
Figure 5. Transformed residual deviations for (parts of) the measured signals shown in Fig. 1 obtained using a multi-output autoregressive model of order four for the residual deviations shown in Fig. 3: (left) at off-resonant wavelength and (right) at on-resonant wavelength.

Figure 8 .
Figure 8. Values of the mean square error for (left) different values of α with β = β 0 , and (right) different values of β with α = α 0 , corresponding to GLS fits to data with indices m E to m HGH .The vertical dashed lines in the two graphs correspond, respectively, to the values α 0 and β 0 .

Figure 10 .Figure 11 .
Figure 10.Values of the transformed residual deviations for the signals at (left) the off-resonant wavelength, and (right) the onresonant wavelength, for the results of the GLS approach shown in Fig. 9.

Figure 12 .Figure 13 .
Figure 12.As Fig. 10 but for the results of the GLS approach shown in Fig. 11.

Figure 14 .
Figure 14.As Fig. 10 but for the results of the GLS approach shown in Fig. 13.

Figure 15 .Figure 16 .
Figure15.As Fig.9but for the GLS approach applied to the data with indices m LOW to m HGH and assuming a plume is contained in a window defined by indices m s and m e .The vertical lines show the endpoints of the window that is assumed to contain the plume.

Table 1 .
For each elevation angle θ, estimates of the parameters κ 1,k (top left), τ 1,k (top right), τ 2,k (bottom left), and κ 2,k (bottom right) defining the AR model in a characterisation of the noise in the two returned signals for different elevation angles.For each elevation angle θ , estimates of the elements of defining the AR model in a characterisation of the noise in the two returned signals.