Improvement of vertical velocity statistics measured by a Doppler lidar through comparison with sonic anemometer observations

Since turbulence measurements from Doppler lidars are being increasingly used within wind energy and boundary-layer meteorology, it is important to assess and improve the accuracy of these observations. While turbulent quantities are measured by Doppler lidars in several different ways, the simplest and most frequently used statistic is vertical velocity variance (w′2) from zenith stares. However, the competing effects of signal noise and resolution volume limitations, which respectively increase and decreasew′2, reduce the accuracy of these measurements. Herein, an established method that utilises the autocovariance of the signal to remove noise is evaluated and its skill in correcting for volume-averaging effects in the calculation of w′2 is also assessed. Additionally, this autocovariance technique is further refined by defining the amount of lag time to use for the most accurate estimates of w′2. Through comparison of observations from two Doppler lidars and sonic anemometers on a 300 m tower, the autocovariance technique is shown to generally improve estimates of w′2. After the autocovariance technique is applied, values of w′2 from the Doppler lidars are generally in close agreement (R2 ≈ 0.95− 0.98) with those calculated from sonic anemometer measurements.

Abstract.Since turbulence measurements from Doppler lidars are being increasingly used within wind energy and boundary-layer meteorology, it is important to assess and improve the accuracy of these observations.While turbulent quantities are measured by Doppler lidars in several different ways, the simplest and most frequently used statistic is vertical velocity variance (w 2 ) from zenith stares.However, the competing effects of signal noise and resolution volume limitations, which respectively increase and decrease w 2 , reduce the accuracy of these measurements.Herein, an established method that utilises the autocovariance of the signal to remove noise is evaluated and its skill in correcting for volume-averaging effects in the calculation of w 2 is also assessed.Additionally, this autocovariance technique is further refined by defining the amount of lag time to use for the most accurate estimates of w 2 .Through comparison of observations from two Doppler lidars and sonic anemometers on a 300 m tower, the autocovariance technique is shown to generally improve estimates of w 2 .After the autocovariance technique is applied, values of w 2 from the Doppler lidars are generally in close agreement (R 2 ≈ 0.95 − 0.98) with those calculated from sonic anemometer measurements.

Introduction
Various scanning strategies and measurement methods have been used to quantify turbulence characteristics from Doppler lidar (DL) observations.(Sathe and Mann, 2013) summarise the state-of-the-art DL turbulence measurement techniques and limitations with current observations, several of which are briefly described here.One method involves velocity structure functions, which can be calculated longitudinally along the beam or transversely across azimuths in sector plan position indicator (PPI) scans (e.g.Eberhard et al., 1989;Frehlich and Cornman, 2002;Krishnamurthy et al., 2011;Davies et al., 2004).Values of the horizontal wind variance (σ 2 u and σ 2 v ) can be calculated from range height indicator scans, by first separating the measured velocity into height bins and calculating the variance in velocity at each height (e.g.Banta et al., 2006;Pichugina et al., 2008).A novel six-beam technique proposed by (Sathe et al., 2015) can be used to calculate all six terms within the Reynolds stress tensor.
Quantifying vertical velocity variance w 2 and calculating vertical velocity spectra are some of the simplest and most direct measurements of turbulence that are possible with a DL, since no complex scanning strategies are required and w is being directly measured at high temporal resolution (≈ 1 Hz).These measurements have been used in many studies (e.g.Hogan et al., 2009;Lothon et al., 2009;Barlow et al., 2011;Shukla et al., 2014).For this measurement, T. A. Bonin et al.: Vertical velocity variance measurements from Doppler lidars the DL simply points at zenith and continually collects measurements of the vertical velocity w, for which w 2 , spectra of w, and other turbulence statistics of w can be calculated at every range gate over a user-defined time interval.These measurements are often used to derive other important planetary boundary layer (PBL) variables.Sensible and latent heat fluxes can be retrieved using w 2 and w skewness profiles (Gal-Chen et al., 1992;Davis et al., 2008;Dunbar et al., 2014).The mixing height can be determined from profiles of w 2 as the height where w 2 decreases below a threshold value (Pearson et al., 2010;Barlow et al., 2011).Integral time and length scales, which are critical parameters for turbulence schemes within numerical models, can be calculated from the autocorrelation of w (Lenschow et al., 2000;Lothon et al., 2006).Eddy dissipation rate can be estimated from the spectrum of w (O' Connor et al., 2010).With all of these above variables being derived from observed fluctuations of w, it is important to assess the accuracy of DL w measurements and their derived statistics.
The mean wind speeds computed from DL velocityazimuth display (VAD) or Doppler beam swinging (DBS) scans have been shown to compare well to those from anemometers, radiosondes, and radar wind profilers (e.g.Smith et al., 2006).In comparison to sonic anemometers, the sampling volume and averaging time of DLs is large (∼ 20 m and 1 s respectively); therefore DLs are unable to resolve smaller scales of turbulence.A diagram showing these effects is provided in Fig. 1.Additionally, DL data can be noisy when aerosol loading, and the signal-to-noise ratio (SNR) is small, largely due to limitations in accurately estimating the mean frequency of the returned signal (e.g.Frehlich and Yadlowsky, 1994).These two limitations have opposite effects on computed higher-order statistics such as w 2 .Noise increases computed w 2 and resolution volume effects reduce the values of w 2 measured by the DL compared to the true atmospheric variance.(Barlow et al., 2011) compared the standard deviation of w, σ w , with those from a sonic anemometer and found that the sonic anemometer generally observed larger values of σ w due to the higher sampling frequency.When the sonic anemometer data were averaged to match the frequency of the DL observations, the values of σ w from the DL and sonic anemometer were in better agreement, but considerable scatter still existed.(Fuertes et al., 2014) used measurements from three synchronous DLs to compute the three-dimensional wind vector at 0.5 Hz for comparison with sonic anemometer measurements, and showed that the DL and sonic measurements were in agreement when the sonic observations were filtered and downsampled to match the spatial and temporal sampling of DL measurements.
Both (Barlow et al., 2011) and (Fuertes et al., 2014) highlight that DLs are incapable of resolving turbulence on small spatial and temporal scales, as shown in Fig. 1, but do not offer corrections for these limitations in the DL measurements.The DL is able to resolve many of the larger turbulent eddies, but vertical velocities associated with eddies smaller than the range-gate size, such as those shown in grey, cannot be resolved.Many of the smaller eddies may be captured by the sonic anemometer, since their resolution volume is much smaller than the DL.Image is not to scale.(Hogan et al., 2009) attempt to correct for underestimates of w 2 by extrapolating the power spectrum out to higher frequencies that cannot be resolved by the DL, but do not have sonic anemometer measurements to which these corrected DL w 2 values can be compared.While this method may correct for the inability of DLs to capture smaller scales of turbulence, appropriate techniques for removing noise from DL observations when SNR is low were not discussed.Herein, we propose to use the autocovariance method discussed by (Lenschow et al., 2000) to correct for the effects of both noise and the resolution volume and accurately determine the value of the radial velocity variance.In this case, the analysis focuses on measurements when the lidar beam was pointed vertically.(Lenschow et al., 2000) originally proposed this method as a means of measuring higher-order moments in noisy data, and the technique will be discussed in detail in Sect.3.
Few studies compare lidar-derived second-and third-order statistics that utilise the autocovariance method with in situ measurements.(Turner et al., 2014a) has compared Raman lidar estimates of water vapour variance and skewness with those from aircraft observations and showed general agreement in the profile shape, but significant differences in the observations were also apparent.Some disagreement in the measurements was partially attributed to sampling differences.(Lenschow et al., 2012) compared normalised profiles of DL-derived vertical velocity variance, skewness, and kurtosis with those from other experiments and generally found good agreement.However, to date, the accuracy of these higher-order statistical values from lidars has not been closely evaluated by comparison with other in situ observations.
Herein, we analyse the applicability of the autocovariance method of retrieving variance values.DL measurements and derived estimates of w 2 are directly compared with those from sonic anemometers to address the following questions.
1. What are the optimal parameters that should be used when applying the autocovariance method?How sensitive are the derived statistics to these parameters?
2. What scales of turbulence can DLs explicitly resolve?Can the autocovariance technique be used to correct for the limitations of time and volume averaging?
3. How robust are DL-derived estimates of vertical velocity variance, and how does the accuracy of these change with height and for different stability regimes?
A description of the instrumentation used and the experiment, including weather conditions during the measurements, is provided in Sect. 2. The autocovariance technique and the ideal number of lags used in its application is described in detail within Sect. 3. Within Sect.4, the effect of temporal averaging and number of lags used in the fitting is discussed using time-averaged sonic anemometer data.Comparisons of DL and sonic anemometer measurements and derived statistics are presented within Sect. 5. Potential additional applications of this technique and the need for other intercomparison studies are discussed in Sect.6.A summary and the main conclusions are provided in Sect.7.

Experiment and instrumentation
Measurements used in this study were collected during the Lower Atmospheric Thermodynamics and Turbulence Experiment (LATTE), which was conducted at the Boulder Atmospheric Observatory (BAO) from 10 February to 28 March 2014 with a small extension to 28 April 2014.The BAO is located 25 km east of the foothills of the Rockies within gently rolling terrain near Erie, CO, USA.The BAO has a suite of permanently and semi-permanently installed meteorological and boundary-layer instruments, such as sodars, a ceilometer, and a 300 m instrumented tower.More complete details of the BAO facility and the surrounding terrain are discussed by (Kaimal and Gaynor, 1983).In addition to these permanently installed instruments, several Doppler lidars and an unmanned aerial system were deployed at the site by Lawrence Livermore National Laboratory (LLNL) and the University of Oklahoma (OU).Additionally, the National Center for Atmospheric Research (NCAR) deployed a new 449 MHz wind profiler (see Lindseth et al., 2012) at the site for validation of wind and reflectivity measurements.
One of the primary objectives of LATTE was to measure and validate PBL three-dimensional turbulence fields with multiple Doppler lidars, sonic anemometers, an unmanned aerial system, and radar.Several different DL scanning strategies outlined in Sect. 1 were tested for comparison of turbulence measurements with those collected from sonic anemometers, which have been compared by (Newman et al., 2016).Herein, we focus on measurements taken between 17:00 UTC (11:00 local standard time, LST) 26 March and 15:00 UTC (09:00 LST) 28 March, when two DLs were placed within 2 m of the sonic anemometer booms on the 300 m tower.Since these measurements are in close proximity with each other, vertical velocity statistics calculated from the DL measurements can be directly compared with those from sonic anemometers.A summary of the instruments used within this study is provided in Table 1, and they are described in more detail below.

Boulder atmospheric observatory tower
The 300 m BAO tower is permanently instrumented with cup anemometers, temperature, humidity, and ozone sensors at 10, 100, and 300 m on booms extending to the southeast (154 • ) of the tower.In addition to these measurements, six sonic anemometers were temporarily installed on booms on both the south-eastern and north-western (334 • ) sides of the tower.The sonic anemometers were equally spaced every 50 m in height, with the lowest mounted at 50 m and the highest at the top of the tower at 300 m.Six R. M. Young 3-D sonic anemometers (model 81000, R. M. Company, Traverse City, Michigan, USA) provided by OU were installed on the north-western booms, and these were sampled at 30 Hz. NCAR provided the Campbell Scientific 3-D sonic anemometers (CSAT 3, Campbell Scientific, Logan, Utah, USA), which sampled at 60 Hz and were installed on the south-eastern booms.A spike filter was used to remove erroneous measurements, in which data points that were farther than 3 standard deviations of the mean, calculated over 30 min windows, were removed.

Doppler lidars
Two DLs, a LLNL WindCube v2 (henceforth LLNL WC) and an OU Halo Streamline (henceforth OU DL), were deployed next to the base of the 300 m tower from 26 to 28 March.The WC was situated to the north-west of the tower, while the OU DL was deployed south-east of the tower.The DLs were located a few metres from the ends of the booms on the tower so that the beam would not be obscured.For the analysis, turbulence statistics from each DL were generally compared with those from the sonic anemometers above each lidar.Specifically, statistics from the LLNL WC are compared with those from the R. M. Young anemometers, and statistics from the OU DL are compared with those from the CSAT 3 anemometers.Similarly to the sonic anemometers, a spike filter was used to remove radial velocity measurements that were more than 3 standard deviations of the mean over a 30 min window.

OU Halo Streamline
The OU DL uses a pulsed, heterodyne 1.5 µm laser to detect backscattered energy from aerosols within the atmosphere and to determine the radial velocities along the laser beams from the Doppler shift in the received signal.The range gate size is user-adjustable, with a minimum spacing of 18 m that was used during this portion of LATTE.The smallest range gate spacing of 18 m was chosen to minimise the effects of volume averaging.The focus was set at 300 m, which is the minimum possible focus length, so that generally the largest SNR and highest-quality data are at that height.Since the aerosol content of the air was generally low during the experiment, as predominantly westerly winds advected clean air from the Rocky Mountains over the BAO site, the noise in the measurements tends to increase significantly for measurements closer to the surface and farther from the focus height.Details about the Halo Streamline hardware, specifications, and theory of operation can be found in (Pearson et al., 2009).

LLNL WindCube v2
The LLNL WC was designed primarily for wind energy applications and thus continuously conducts DBS scans for retrieval of horizontal winds within the lowest 200 m of the atmosphere.The DBS scan consists of consecutive beams off-zenith pointing north, east, south, west, and followed by a vertically pointing beam.largest at 50 m and decrease with height; thus the highestquality measurements are closer to the surface than those from the OU DL.

Meteorological conditions
To document the general meteorological conditions, mean temperature, wind speeds, and bulk Richardson number (Ri) at three heights during the 2-day period when the DLs were located next to the towers are shown in Fig. 2. The value of Ri, which is used as a proxy for stability, is calculated as where g is gravity, θ is the potential temperature, θ is the difference in temperature between the measurement height and 10 m, z is height, z is the difference in sampling height (i.e, z−10 m), and V is the wind speed at the height at which Ri is calculated, implicitly assuming a no-slip condition for which the surface wind speed is zero.
During a significant portion of the observational period, the wind speed in the lowest 300 m was greater than 5 m s −1 .Particularly high winds occurred around 03:00 UTC on 27 March.Since the wind direction was westerly (wind direction not shown), these were likely associated with downsloping flow that frequently occurs in this area on the lee of the mountains, which transports momentum downward from higher in the troposphere (e.g.Brinkmann, 1974).Conditions predominantly supported weakly stable and neutral stability during the study period, as indicated within Table 2. Stable conditions (Ri > 0.25) were observed for several hours on 27 March, but strongly unstable conditions where Ri < −0.1 were never present.
3 Correction of lidar variance values (Lenschow et al., 2000) discusses a methodology of estimating second-to fourth-order moment values within noisy measurements, with a focus on observations taken by various types of lidars.This technique has been used in numerous studies to estimate second-and higher-order moments of water vapour from Raman lidars and differential absorption lidars (DIALs) (e.g.Machol et al., 2004;Wulfmeyer et al., 2010;Turner et al., 2014a, b), temperature from Raman lidar (e.g.Behrendt et al., 2015), ozone from a DIAL (e.g.Machol et al., 2009;Alvarez II et al., 2011), velocity from Doppler lidars (e.g.Grund et al., 2001), and it was also extended to correct eddy-covariance flux measurements of trace gases (e.g.Mauder et al., 2013;Peltola et al., 2014).This technique has been used to study higher-order statistics in the convective boundary layer (e.g.Muppa et al., 2016;Van Weverberg et al., 2016); it has also been used to remove random noise from routine measurements of variance, including within the stable boundary layer (e.g.Tucker et al., 2009;Pal et al., 2013;Schween et al., 2014;Bonin et al., 2015).While this methodology has been used within many studies, to the authors' knowledge there are no in-depth evaluations of corrected and uncorrected turbulence statistics, using the (Lenschow et al., 2000) technique, of any quantity measured by a lidar against in situ observations.Herein, we evaluate the applicability of this technique to extract accurate variance estimates from lidar measurements.

Overview of method
The method described by (Lenschow et al., 2000) to obtain turbulence statistics in noisy data is outlined here.The second-order autocovariance function (M 11 ) of a stationary time series is defined as where w is an autocorrelated variable (herein, specifically vertical velocity), is contamination from random white noise, t is the time-lag, and primes denote deviations from the mean.If the noise is uncorrelated, as is expected with lidar measurements, the cross terms become small and negligible, thus at a lag of zero This relationship shows that the measured variance by the lidar M 11 (0) is the result of both the true atmospheric variance w 2 and the noise variance in the returned signal 2 .By assuming that w 2 is largely due to isotropic turbulence within the inertial subrange (Monin and Yaglom, 1979), which is generally true within the PBL except when gravity waves are present, the expected autocovariance function M * 11 is in which C is a parameter related to eddy dissipation since w is a component of the velocity.Henceforth, the fitting of Eq. ( 4) will be referred to as the "structure function fitting", as the 2/3-power within Eq. ( 4) ultimately stems from Kolmogorov's structure function (Kolmogorov, 1941).By treating both w 2 and C as unknowns and fitting M * 11 to the observed M 11 at lags within the inertial subrange, estimates of w 2 and 2 can be made wherein Using this relationship implicitly requires that Taylor's frozen hypothesis is valid (Taylor, 1938), meaning that turbulent eddies do not evolve over time as they pass through the resolution volume for timescales over which the fitting of Eq. ( 5) is applied.This method requires that at least a portion of the inertial subrange is resolved by the instrument.For many studies that have been performed (e.g.Wulfmeyer et al., 2010;Turner et al., 2014b;McNicholas and Turner, 2014), the authors have manually ensured that the lower-frequency portion of the inertial subrange is resolved.However, it is not always possible to manually ensure that the inertial subrange is partially resolved for routine measurements, wherein systems are running continuously for long durations and variance profiles are constantly produced through an automated routine.Thus, herein the applicability of this technique is evaluated for all conditions and time periods.

Number of lags for fitting
For the most accurate and robust estimates of variance or higher-order moments, the proper number of lags to use for the fitting is not well known or trivial.Within previous studies that use this method, the number of lags is not stated or is a seemingly arbitrary number that the authors determined were within the inertial subrange (e.g.Lenschow et al., 2000;Wulfmeyer et al., 2010).The fitting has been typically applied using the first lag up through a maximum lag time (τ max below) that ranges from 12.5 s (e.g.McNicholas and Turner, 2014) to over 100 s (e.g.Behrendt et al., 2015).Ideally, the smallest lag used in the fitting should correspond to the timescale at which contributions to M 11 from turbulent eddies that cannot be explicitly resolved become negligible.This can also be thought of as the timescale corresponding to the smallest-scale eddies that can be resolved by the lidar.Here, we define the ideal lags to use in the fitting, based on the spatial resolution of the instrument and turbulence characteristics.The smallest lag τ min is defined as where r is the size of the range gate or resolution volume and V is the mean horizontal wind speed.Assuming that turbulence is isotropic, this ensures that τ min is large enough that eddies of the same size or smaller than the resolution volume will not negatively affect the fitting, which would lead to underestimates of the true M 11 at short lags,.Within this study, the maximum value of τ min is set to 8 s for time periods when the wind speed is small.This maximum for τ min is somewhat arbitrary, but a maximum value is needed since τ min → ∞ as V → 0. Since the evolution of turbulent structures becomes significant for larger lag times (Higgins et al., 2013), this limit for τ min is a compromise between minimising the effects of both volume averaging and time evolution of turbulence.
The largest lag should be within the inertial subrange, but not so long that frozen turbulence cannot be safely assumed.We define the largest lag τ max as where t int is the integral timescale.Under convective conditions, generally τ max = t int 2 .However, under more stable conditions, computed values of t int often become much larger than the typical timescales within the inertial subrange due to the small value of w 2 .Under these conditions, the time at which M 11 = M 11 (0) 2 is used for τ max instead.Values of t int are defined as wherein w 2 is M * 11 (0), and t int and w 2 are both iteratively solved.Within the integral, the value for M 11 (0) is replaced with M * 11 (0) to remove the contamination of uncorrelated noise.Values of t int were also calculated by replacing M 11 with M * 11 for the lidar data (not shown), but those values of t int were biased low compared to the computed t int from the sonic anemometers.Values of t int calculated from the DL observations compared with those calculated from sonic anemometer measurements are shown in Fig. 3. Generally, the values of t int calculated from the OU DL are in better agreement with those derived from anemometer measurements than with those from the LLNL WC.This is likely due to the faster sampling rate of the OU DL; therefore the time between lags (dt) is shorter in the numerical integration.Considerable scatter is evident in the estimates of t int due to differences in the values of M 11 at various lags, as discussed in Sect.5.2.If the conditions are such that τ min is greater than τ max , such as in the stable boundary layer with weak winds, then τ max is set to be one lag more than τ min .Typical values of τ min and τ max that were used for the OU DL during LATTE are shown in Table 3.
When using the autocorrelation fitting to determine the values of w 2 and 2 , it is expected that M * 11 (0) (i.e.w 2 ) is less than M 11 (0), and that 2 is the positive difference of M 11 (0) and w 2 .An example of this is shown in Fig. 4a, in which the fitting of Eq. ( 4) leads to a smaller estimate of w 2 than M 11 at lag zero.However, we observe that under periods of strong turbulence and high SNR, estimates of w 2 can be greater than M 11 (0) as shown in Fig. 4c.Using the definition in Eq. ( 6), values of 2 are negative in these cases, which is physically impossible since 2 in the signal is always positive or zero.To our knowledge, this behaviour has not been discussed in any previous studies.We attribute this negative error to spatio-temporal-averaging effects, for which the smaller scales of turbulence cannot be properly captured by the DL and the true noise in the measurements is small.This is also apparent within the spectra within Fig. 4d, where the Doppler lidar power spectrum density is smaller than that of the sonic anemometer at high frequencies (f > 0.1 Hz).Sampling errors are not likely to cause this discrepancy due to the fact that the lidar beam was within a few metres of the sonic anemometer, and the same averaging time was used.This underestimate generally occurs when τ min is large (> 4 s), which is the case when averaging effects are significant due to the slow advection of eddies through the measurement volume.The accuracy of the fitting under conditions when M * 11 (0) is both greater and smaller than M 11 (0) is discussed in Sect. 5 through comparison with sonic anemometer measurements.
Numerous studies discuss the importance of considering the averaging time when measuring turbulence statistics (e.g.Lenschow et al., 1994;Mahrt, 1998;Hollinger and Richardson, 2005).These errors are related to the representativeness of statistics from a single point measurement to the PBL as a whole, and need to be considered when making generalisa-   tions about the PBL from single-point measurements.However, they are outside of the scope of this study.Since measurements from the DL and sonic anemometer were taken within a few metres of each other, which is smaller than the resolution volume of the DL, measured statistics are expected to be very similar to each other and errors due to the spatial separation of the instruments should be minimal.Any differences in statistics of w between the two instruments are more likely due to differences in sizes of sampling volumes and measurement principles.

Effect of temporal averaging and number of lags for fitting
As discussed in Sect.3.2, the proper number of lags to use for the fitting of the Eq.(4) has previously not been carefully evaluated.Here, the autocovariance method is applied using various time lags to identify the accuracy of estimated values of w 2 for differing numbers of lags.Measurements of w from the SE sonic anemometer at 50 and 300 m are averaged over 1 s (1 Hz) and 10 s (0.1 Hz) intervals to simulate the typical averaging times of DLs and DIALs/Raman lidars respectively.From these averaged time series, values of M 11 are calculated to test various lag times for the extrapolation.
Examples of M 11 calculated from the raw and averaged sonic anemometer measurements are shown in Fig. 5, with fittings of the structure function applied using the specified number of lags outlined in Sect.3.2.τ min is calculated assuming a range gate size of 20 m, simulating DL values of τ min .Values of M 11 (0) from the 1 and 0.1 Hz averaged data are smaller than those from the 60 Hz observations, since smallscale fluctuations are removed during the averaging.However, values of M 11 at larger lags, from the 1 Hz averaged and raw observations, are often very similar.Thus, the fitting of the structure function to the 1 Hz observations generally accurately models M 11 , and the extrapolation to zero lag is nearly identical to that from the raw time series.This clearly exemplifies how this technique can be used to increase the variance to a more accurate value when small-scale fluctuations are removed during averaging.Values of M 11 from the 10 s averaged data can be markedly different to those from the raw time series, especially when M 11 decreases quickly at small timescales as in Fig. 5a, e. Due to these differences, values of M 11 are not accurately modelled when fitting the structure function to the 0.1 Hz observations.
To further evaluate the effects of averaging time and lag time, estimates of w 2 either directly computed from the averaged time series or using M * 11 are compared with those from raw sonic anemometer measurements, shown in Fig. 6.Estimates of w 2 are calculated from the 0.1 and 1 Hz measurements using both 10 and 100 s of lag time and the previously defined number of ideal lags, wherein τ min ranged from 0.9 to 8 s and τ max varied between 2 and 41 s. (Lenschow et al., 2000) initially suggested using the first few lags for the fitting, thus 10 s of lag time is chosen to fulfil this suggestion.The 100 s of lag time shown is similar to those used when applying this method to DIAL or Raman lidar measurements (e.g.Behrendt et al., 2015), which is necessary since these observations are contaminated by significantly larger values of 2 and the sampling rate is much lower.
Using the previously mentioned ideal number of lags, the w 2 estimates using M * 11 from the 1 Hz averaged data are in close agreement with those calculated from the raw 60 Hz measurements for the entire range of w 2 .This demonstrates that the structure function fit accurately models the sonic autocovariance at short lag times, which has not been shown in previous studies.Furthermore, this indicates that the sonic data contain little noise after the spike removal mentioned in Sect.2.1, thus these sonic observations may be used for verification of DL measurements.Additionally, the lags defined in Sect.3.2 are appropriate for retrieving accurate estimates of w 2 .When comparing the estimates using M * 11 with di-rectly computed w 2 from the 1 s averaged time series (red in Fig. 6c, d), the measured value of the variance is clearly improved.At 50 m, the directly computed w 2 is biased low by ≈ 11 %, which is reduced to 2.5 % if M * 11 is used as the variance instead.At 300 m, the improvement is less dramatic since turbulence scales are often larger, and the time averaging does not remove as much as of the variance.Still, the low bias is removed if M * 11 is used instead of the directly computed w 2 .This improvement clearly shows and quantifies how the autocovariance technique can be used to correct for averaging effects, which are more important closer to the surface.
This agreement also shows that the random error that may arise from using varying numbers of lags, rather than a set number of lags, is minimal.If the random error were to change drastically based on the variable number of lags used in the fitting, there would be much larger scatter for some of the data points depending on the amount of lags that are used.However, since the linear regression line for the 1 Hz data (red line in Fig. 6) follows a nearly one-to-one relationship and coefficient of determination is ≈ 1, the fitting is good for all lag times used and the random error due to the variable lag time is small.
With the 0.1 Hz averaged observations, M * 11 is generally in agreement with the raw value (using 60 Hz data) of w 2 when using the ideal number of lags at 300 m, although greater scatter is apparent than for the 1 Hz data.In fact, using M * 11 instead of the directly computed w 2 for the 0.1 Hz data reduces the low bias from 16.6 to 3.5 %.However at 50 m, the value of w 2 is often biased very low (by > 40 %) since turbulent scales are much smaller than at 50 m and much of the variance is averaged out.This underestimate at 50 m is not improved by using M * 11 since the inertial subrange, which is typically small at 50 m, is not typically resolved after the 10 s averaging.
By using the first few lags of the autocovariance (i.e. 10 s lag time), values of w 2 are accurately retrieved at 300 m.When this amount of lag time is used at 50 m, the variance is generally underestimated by 15-20 %, as shown in Fig. 6a.Since the integral length scale decreases closer to the ground (Lenschow and Stankov, 1986), the inertial subrange becomes smaller and does not extend to lower frequencies.Thus, while 10 s of lag time may be appropriate at 300 m, it yields an underestimate of w 2 at 50 m; this demonstrates that it is best to use a variable lag time for the retrieval of variance, one that scales with the integral timescale for the given measurements at a certain height.Otherwise, profiles of variance may be biased at certain heights leading to erroneous conclusions or scalings.
When 100 s of lag time is used within the fitting of the structure function, estimates of w 2 are generally grossly underestimated regardless of the averaging time.This lag time should never be applied under stable conditions since the inwww.atmos-meas-tech.net/9/5833/2016/Red and brown dots are the estimate from sonic data averaged to 1 and 0.1 Hz respectively, after the fitting is applied using the specified number of lags discussed in Sect.3.2.Magenta and blue dots are estimates of sonic data averaged to 1 and 0.1 Hz respectively, after the fitting is applied using 100 s of lag time.Green dots are the estimates from the 1 Hz sonic data, using 10 s of lag time for the fitting.Comparison of directly computed w 2 at 50 (c) and 300 m (d) from the raw 60 Hz compared and averaged observations averaged to 1 (red) and 10 s (brown).Black line indicates a one-to-one relationship.The dashed lines show the linear regressions, with the equations for linear regression and coefficients of determination shown in the upper-left corner.ertial subrange is small.Additionally, a 100 s lag time is too large during unstable conditions, leading to underestimates of w 2 .Furthermore, since w 2 is underestimated, values of t int as defined in Eq. ( 9) will be overestimated, leading to the inaccurate interpretation that a larger number of lags is acceptable.

Comparison of vertical velocity statistics
Results presented herein utilise a 30 min window for the computation of M 11 and other vertical velocity statistics from the DLs and sonic anemometers.Averaging times of 10 min and 1 h were also considered and tested.The primary advantage of a longer averaging time is the reduction of sampling errors w 2 , which are errors due to sample representativeness of the population, calculated using formulations within (Lenschow et al., 1994).As a trade-off, a longer averaging window does not allow the resolution of rapidly changing features.During the study period, boundary layer conditions were often quickly evolving (see Fig. 2), thus a 1 h window was ruled out.The 30 min averaging window was chosen as a compromise to resolve changing conditions while ensuring the robustness of statistics by reducing the sampling error compared to a shorter averaging time.
Regardless, the results and statistics between sonic anemometer and DL observations that are presented herein did not significantly change depending on whether a 10, 30 min, or 1 h averaging time was used.The fact that the statistics did not change significantly depending on the averaging time is primarily due to only small differences in sampling volumes caused by the close proximity of the lidar beam to the sonic anemometer.Furthermore, this confirms that sampling errors are not an important source of differences in measurements between the lidar and sonic due to the short distance between sampling volumes.Quantitatively, sampling errors were found to be less than 5 % of w 2 for 84 % of all DL observations due to the long half-hourly averaging time in comparison to t int , which was typically of the order of 1 min.In fact, for 50 % of the DL estimates, the sampling error is less than 1 %.Thus, throughout the rest of this study, sampling error is generally not a significant source of error and discrepancy between the DLs and the sonic anemometers, especially since the measurements were taken within such close proximity of each other.If the sonic anemometer and DL were farther apart, these errors would be much more important.
As thoroughly discussed by (McCaffrey et al., 2016), the 300 m BAO tower itself affects the flow around it, most notably downwind of the tower where a significant wake exists.To eliminate observations possibly waked by the tower, observations are removed when the upwind direction was within ±45 • of the tower.Measurements are also removed during time periods when precipitation or virga are evident on ceilometer measurements at the BAO site, since precipitation affects the DL measured w.Since the method discussed in Sect. 3 quantifies and removes noise, no explicit SNR filter was used to remove observations.All measurements of w from the DLs, regardless of the SNR value, are used in the computation of w 2 .However, DL data are only considered during time periods when the estimated 2 is smaller than the threshold.Several different threshold values were evaluated, and a threshold of 1 m 2 s −2 was a good compromise between keeping data for which accurate w 2 statistics can be retrieved and eliminating meaningless results.Threshold values based on the ratio of 2 to w 2 were evaluated, but no threshold ratio could be found that both kept accurate values of low w 2 and removed inaccurate values of high w 2 .

Comparison of time series of w from DLs and sonic anemometers
Example time series of w from each DL compared with similar measurements from sonic anemometers are shown in Fig. 7 for both convective and stable periods.Generally, measurements from both the sonics and DLs show similar trends in how w varies over time.Maxima and minima of w occur at nearly the same time, which is particularly apparent in Fig. 7a, b where the fluctuations are much larger.The magnitude of each individual maximum/minimum of w is generally less in the DL observations compared to those from the sonic anemometers.The longer time and larger volume averaging of the DLs reduces the magnitude of its observed fluctuations.Additionally, both DLs do not resolve all of the fluctuations that occur at short timescales.Differences in sampling frequency of the OU DL and LLNL WC are evident, particularly in Fig. 7a, b.Since the LLNL WC has a lower sampling rate than the OU DL, turbulence statistics computed from its measurements are not as representative of the true atmospheric variance as those from the OU DL.
Observations during stable conditions, shown in Fig. 7c, d, show less agreement in the time evolution of w between the DL and sonic anemometer measurements compared to those during unstable conditions.This can be explained by the fact that during more neutral/unstable conditions atmospheric turbulence is governed by large turbulent eddies while dur-ing stable conditions much smaller eddies, which cannot be fully resolved by DL measurements, dominate.Differences in sampling volume and frequency thus have stronger effects during stable conditions, explaining why the scatter between the two different DLs becomes larger during stable conditions than during unstable conditions.

Comparison of DL, sonic spectra and autocovariance function
Several examples of M 11 and the spectra of w from the OU DL and LLNL WC, compared with similar statistics from the collocated sonic anemometer, are shown in Figs. 8, 9.The cases were chosen to show the accuracy of the DL spectra and M 11 under various atmospheric stabilities and different noise levels, when noise or spatiotemporal averaging effects are evident.Measurements from the sonic anemometer are averaged to replicate the averaging time and sampling frequency for each DL to show how they are affected by each of these parameters and evaluate the accuracy of the method discussed in Sect.3. Generally, the spectra for the OU DL show good agreement with those calculated from sonic anemometer measurements.Under all of the cases presented, the lower frequency end of the inertial subrange is captured by the OU DL, as a portion of the spectra follows the theoretical −2/3 line.However, at high frequencies (f > 0.1 Hz) within Fig. 8b, d, the OU DL spectra flattens out or increases, which is likely due to a combination of noise in the signal increasing the variance and spectral aliasing.
Noise within the OU DL spectra in Fig. 8f is large enough to cause an increase in the spectra at high frequencies.Additionally, within both the OU DL and downsampled sonic spectra shown in Fig. 8, the flattening of the spectra at the highest resolved frequencies is due to spectral aliasing from the smallest-scale turbulent motions being undersampled in time (see Kirchner, 2005).While the spectra of w from the OU DL are generally in agreement with those from the sonic anemometers, significant differences are apparent in M 11 computed from the two instruments (Fig. 8a, c, e).For example, within Fig. 8a, M 11 values are similar for short lags up to 10 s, after which M 11 computed from the OU DL is greater than that from the sonic anemometer.On the contrary, within the time period for Fig. 8e, the values of M 11 are in better agreement at larger lags (greater than 30 s), but the OU DL-derived M 11 is much lower than those computed from the sonic anemometers at shorter lags.The reasons for these differences are not clear, but are likely due to differing measurement volumes.The anemometer samples a volume of air precisely at 300 m, while the comparable OU DL measurement is centred at 297 m and averaged over a layer between 288 and 306 m.Thus, the exact measurements of w and its derived statistics are expected to be slightly different between the sonic anemometer and DL.Within all three cases shown in Fig. 8, M 11 (0) calculated from the OU DL measurements is smaller than that calculated from the sonic anemometer measurements.This indicates that, within these cases, the variance is underestimated by the OU DL.Using the structure function fit, the estimated values of the variance from the OU DL data are improved for the cases shown in Fig. 8a, c.Within Fig. 8e, the fitting leads to a smaller value of OU DL-derived w 2 due to the fact that noise is present, as shown in Fig. 8f.By applying the structure function fit to the sonic anemometer measurements that are averaged to simulate the OU DL observations, it is shown that the fitting uses a proper number of lags to estimate the expected M 11 to lag zero.The effect of averaging time on retrieved estimates of variance is discussed more thoroughly in Sect. 4.
Based on the presented spectra in Fig. 9b, d, f, LLNL WC w spectra are generally in good agreement at most frequencies with those derived from sonic anemometer measurements.Similarly to the OU DL spectra, the LLNL WC spectra are often larger than those from the anemometers at high frequencies due to noise in the signal and spectral aliasing of higher frequencies that are not resolved by the reduced sampling rate.The LLNL WC is often, but not always, able to resolve the lower-frequency portion of the inertial subrange.Within the convective conditions shown in Fig. 9d, f, the high-frequency region of the LLNL WC spectra follows the −2/3-law expected within the inertial subrange.However, within the time period shown in Fig. 9b, the inertial subrange is not resolved due to the fact that turbulence scales are small and that the sampling frequency of 0.25 Hz is not fast enough to capture the smaller turbulence scales.
Generally, the values of M 11 at various lags computed from either the LLNL WC or sonic anemometers are in agreement with each other.However, differences in M 11 do exist due to similar reasons to those discussed for the OU DL.For time periods when much of the lower-frequency portion of the inertial subrange could be resolved (i.e. in Fig. 9c, e), the structure function fitting yields an improved estimate for w 2 compared to the raw variance at M 11 (0), which is closer to the sonic-derived value.However, when only a small portion of the inertial subrange is resolved, the fitting of Eq. ( 4) poorly models the true values of M 11 at short lags and the estimated value of w 2 is grossly underestimated.This is expected, as the fitted function only applies within the inertial  subrange.For each fitting of M * 11 , the amount of time lag used in the fitting did not change for each of the cases shown herein.Since values of t int at 100 m were typically small during the experiment, the resulting values of τ max were also small.Since w was only sampled every 4 s, this inherently limited the values of τ min and τ max that could be used due to the discretization of the lag times.

Accuracy of DL variance estimates
A time series comparison of DL and sonic anemometer measurements of w 2 at 100 and 300 m is shown in Fig. 10.The DLs capture the evolution of the w 2 well, showing good agreement with the sonic anemometers.Both measurement systems capture the typical diurnal cycle, with w 2 increasing during the morning hours and decreasing in the afternoon, generally diminishing to weak turbulence overnight.Additionally, the DL systems capture several short duration events, such as the burst of turbulence at ≈ 02:30 UTC when the wind speed rapidly increased for the first time (Fig. 2).Generally, using M * 11 to measure w 2 only slightly increases its value compared to the raw variance.However, there are some time periods when increased noise is present and M * 11 provides a much better estimate than the raw vari-  Measurements shown are those calculated from the raw sonic observations (blue), sonic data averaged to match the lidar averaging time (red), and LLNL WC observations (green).Dashed lines overlaid on M 11 are the fittings of the structure function fit to the corresponding measurement.The solid black line in the spectra denotes the theoretical −2/3 slope in the inertial subrange.The fitting for the filtered sonic data uses the same lags as for the DL.M 11 and the spectra are computed over 00:30-01:00 UTC (a, b), 21:30-22:00 UTC (c, d), and 17:30-18:00 UTC (e, f) on 27 March.Observations in (a, b) are during weakly stable conditions (Ri = 0.1), while data shown in (c-f) are during near-neutral conditions (|Ri| < 0.05).Values of τ max and τ min for each time period is shown in the lower left of (a, c, e).Spectra are smoothed for clarity using a 5-point moving average.ance alone, such as 12:00 UTC on the 27 for the LLNL WC and 12:00 UTC on the 28 for the OU DL.Measurements of w 2 from the DLs are generally in less agreement with those from the sonic anemometers at night, especially as a relative error with respect to the magnitude of w 2 .
Comparisons of 30 min-averaged w 2 from the DL observations, which were either directly computed or estimated using the structure function fitting, and those from sonic anemometers are shown in Fig. 11.For both the LLNL WC and OU DL, using the structure function fitting to estimate values of w 2 generally provided more accurate and less bi-ased values, based on the higher values of R 2 , a slope of the best-fit line closer to one, and a y-intercept closer to zero compared to the values computed directly from the time series.While the improvement in w 2 estimates is not drastic, which is not surprising given the sampling rate and that noise is often not large, the high bias under weak turbulence is removed and the low bias under stronger turbulence (w 2 > 0.5 m 2 s −2 ) is reduced.The greater scatter in the LLNL WC measurements is attributed to its reduced sampling frequency.Since there are 3 s gaps in its measurements of w while it collects data at off-zenith angles for the DBS scan, values of w 2 computed from the LLNL WC are not as robust as those from the OU DL, which took w measurements continuously with a sampling frequency of 0.7 Hz.
Still, LLNL WC estimates of w 2 are generally in good agreement (R 2 ≈ 0.9) with those from the sonic anemometers and show low bias when using the autocovariance fitting.
For the OU DL, estimates of w 2 are generally improved when using the structure function fitting for the entire range of variance values.When the raw OU DL w 2 value is lower than the sonic value, the autocovariance technique generally increases the estimate of w 2 .Conversely, when significant noise is present and the raw OU DL w 2 is greater than the sonic value, the autocovariance technique generally reduces the estimate of w 2 , improving the estimate.This is particularly evident in Fig. 11b, where there are a cluster of points of uncorrected w 2 that are much larger than the true atmospheric variance.These overestimates are coincident with a time period when the OU DL SNR was reduced.For estimates of w 2 from the LLNL WC, directly computed values of w 2 are generally larger than those computed from the structure function fitting regardless of the magnitude of the turbulence.Again, the autocovariance fitting accurately removes the contribution of noise in the measured w 2 , which is particularly apparent when w 2 is less than 0.5 m 2 s −2 in Fig. 11d.This improvement is mostly during a time period of reduced SNR for the LLNL WC at 150 m.For time periods when turbulence is strong, the autocovariance fitting to the LLNL WC data often leads to reduced values of w 2 ; this could be due to increased noise in the LLNL WC observa-tions compared to those from the OU DL under similarly strongly turbulent conditions or the fact that the structure function fitting is not as accurate with a smaller number of lags used due to the reduced sampling rate of the LLNL WC compared to the OU DL.
As shown earlier, the DL must be able to resolve a portion of the inertial subrange in order to accurately extract measurements of w 2 .If part of the inertial subrange is not explicitly resolved when turbulence scales are small, then a proper fitting that is representative of how M 11 actually varies at small lags cannot be accurately applied.Regardless, even in these conditions when turbulence is weak, it is especially important to not simply use w 2 directly computed from the time series, as 2 is often a large component of the computed w 2 , as shown for small values of w 2 in Fig. 11b, d.Thus, even for these cases, applying the structure function fitting often provides more accurate estimates of w 2 , although M * 11 (0) values systematically underestimate the true variance.
For the OU DL, estimates of w 2 are generally in better agreement at 300 m than at lower heights, as shown in Fig. 11.In fact, during the 2-day observational period, w 2 is generally more underestimated at 150 m than at 300 m, reflected by the smaller slope of the best fit line (0.838 at 150 m compared to 0.943 at 300 m).While the reason for this is not entirely clear, it is thought that the more accurate measurements are made at higher heights due to the fact that eddies are larger further from the ground, which are better resolved by the DL.These differences in how the accuracy of lidar variance measurements change with height needs to be considered when evaluating how second-and higher-order statistics vary with height.

Effect of turbulence characteristics and stability
The relationship between the accuracy of turbulence parameters measured by both DLs and stability, specifically Ri, during those observations is shown in Fig. 12.During neutral/unstable conditions when Ri is close to zero or negative, the estimates of w 2 from both DLs are generally more accurate than those measurements during stable conditions.This is evident from the lower scatter and ratios of w 2 which are closer to one under unstable conditions for both uncorrected and corrected estimates.Additionally, especially for OU DL measurements, the corrected measurements during convective conditions are larger and more accurate than those that are uncorrected.When conditions are stable, there is substantially more scatter in the quality of the DL measurements and the improvement due to the structure function fitting is less clear.There are times when the correction technique improves the w 2 estimates, such as when significant noise is accurately removed.These time periods also tend to occur when SNR is reduced, as shown in Fig. 13.The method can  also lead to worse estimates of w 2 , when turbulence scales are small and the inertial subrange is not properly resolved by the DLs.Considering both of these factors, it is best, under stable conditions, to only use the extrapolation to estimate w 2 when the SNR falls below a certain threshold value, such as −17 dB for the LLNL WC or −19 dB for the OU DL.This threshold will vary based on the lidar system and operating parameters.

Discussion
Below, recommendations are made as to the implementation of this technique for use with DLs based on these results.Additionally, the importance of validation studies for measurements from various types of lidars is discussed.

Possible applications to other DL scanning techniques
Within Sect.5, it is shown that the autocovariance technique can be used to improve DL turbulence measurements, specifically w 2 , by removing noise and correcting for unresolved turbulence structures.This method could similarly be applied to measurements of other turbulent quantities.For instance, for a DL continuously pointing at a very low elevation (near zero) into the wind, values of σ 2 u can be derived by using a similar technique.Furthermore, this technique could be applied in conjunction with more advanced scanning strategies.For turbulence measurements using the six-beam scanning strategy (Sathe et al., 2015), variances are first computed for each of the six independent beams.The six components of the Reynolds stress tensor can be computed from the individual variances of the six beams.However, due to the equations for computing each component of the Reynolds stress tensor, the effect of noise within each individual measurement is magnified.In particular, if there is a large amount of noise in the vertical beam compared to other beams from differences in SNR, then negative values of σ 2 u and σ 2 v can be computed (Newman et al., 2016), which is not realistic.Thus, if the observations within each beam are taken at a quick enough sampling rate to resolve the inertial subrange, the autocovariance technique should be applied to variances calculated from each beam before computing the velocity variances.

Importance of validation studies for various types of lidars
Remote sensors, such as lidars, have the ability to measure various quantities throughout the atmosphere.However, it is imperative that these measurements are compared with in situ observations for validation.Through this, relative accuracies can be quantified, so that future measurements using only remote sensors can be correctly interpreted and utilised.
While the first careful analysis of w 2 estimates measured using a method proposed by (Lenschow et al., 2000) are presented here, further intercomparison studies of DL and in situ measurements are needed.Since this study was conducted over a short 2-day period in early spring, the atmospheric conditions were not representative of the wide range that may occur over the entire year, as unstable/convective conditions were not present.Additionally, the measurement comparisons all were within the lowest 300 m of the PBL.While this provided a larger overlap region than allowed by most conventional meteorological towers, it still only encompasses a fraction of the possible PBL depth.As discussed in Sect.5.3, there is evidence that biases in DL w 2 change with height, which needs to be investigated further.These biases will affect how the DL-measured profiles of w 2 vary with height.
In addition, measurements of higher-order moments from other types of lidars, such as DIALs and Raman lidars, should also be compared with in situ measurements.Due to the larger averaging time and often larger lag times used, errors associated with these lidars are likely higher and mea-T.A. Bonin et al.: Vertical velocity variance measurements from Doppler lidars sured quantities are likely biased low, as discussed in Sect. 4. (Turner et al., 2014a) compared Raman lidar-derived estimates of water vapour variance and skewness with those measured from aircraft, showing general good agreement in the trends in the profiles.However, the accuracy of the values of the DIAL and Raman lidar higher-order moments should be carefully evaluated.

Summary and Conclusions
Here, a method discussed by (Lenschow et al., 2000) for measuring higher-order statistics using autocovariances from lidar data is carefully evaluated.Specifically, estimates of vertical velocity variance and integral timescales derived from DL observations are compared with similar measurements from collocated sonic anemometers.Two DLs, a WindCube v2 and a Halo Streamline, were placed within a few metres of the 300 m tower at the Boulder Atmospheric Observatory in Erie, Colorado, USA from 26 to 28 March 2014.The tower was instrumented with sonic anemometers at 50 m intervals, up to 300 m, for validation and comparison with measurements from the DL.
The impact of several parameters on the accuracy and quality of lidar variance estimates is investigated using two methods.Firstly, sonic anemometer observations are averaged to simulate typical averaging times of different types of lidars, after which the autocovariance technique is used with various lag times to retrieve the variance values.Secondly, variances computed from the sonic anemometers are compared with those from the DL observations.Both were computed directly using the autocovariance technique.Through these comparisons, the following is shown.
-The amount of lag time used within the fitting of the structure function to the autocovariance is critical for accurate estimates of w 2 , and the number of lags leading to accurate retrievals of variance estimates are defined herein.Short lag times, for which the small-scale turbulent eddies are not accurately sampled (i.e. less than τ min ), should not be used when applying the fitting of the structure function.Long lag times, which are generally used when extracting higher-order moments from DIALs and Raman lidars, lead to gross underestimates of the true atmospheric variance.
-In addition to removing random noise, the autocovariance technique also corrects for limitations of spatiotemporal averaging of the measurements when calculating variance.Within future studies, this technique may be used to provide more robust and accurate estimates of variance, overcoming a low bias due to spatiotemporal averaging that has been documented by (Barlow et al., 2011) and (Fuertes et al., 2014).This will lead to more accurate profiles of vertical velocity variance for verifying theoretical profile scaling techniques.
-Generally, estimates of the vertical velocity variance from the DLs agree with those computed from sonic anemometers at the same measurement height, especially during unstable conditions.Small differences in the measurements can be attributed to differences in averaging volumes and averaging heights.For the Wind-Cube v2, more substantial differences in the measurements are due to the reduced sampling rate of the measurements, as vertical velocity is only measured every 4 s.
-When stable conditions are present, using the autocovariance technique improves estimates of variance when non-negligible noise is present, but often leads to underestimates of variance when little noise contamination is present.This is due to the fact that the inertial subrange is not sufficiently resolved under these conditions.Therefore, under stable conditions when variance is small, it is best to only use the covariance technique if the SNR falls below a certain threshold value for the instrument.
The importance of intercomparison studies for remote sensor measurements is highlighted.In particular, techniques for retrieving various derived statistics can be validated and refined through the intercomparison of remote sensor measurements with high-quality in situ observations.Limitations in the applicability of the techniques can be identified as well.Since it is shown that DL-derived turbulence measurements are generally improved by applying the autocovariance techniques, it is believed that this method can be applied to more measurements taken using more advanced scanning strategies, such as the six-beam technique.

Data availability
Temperature and mean wind observations from the 300 m tower are publicly available at http://www.esrl.noaa.gov/psd/technology/bao/ (NOAA, 2014).The high-rate raw sonic anemometer and Doppler lidar data are available upon request (timothy.bonin@noaa.gov).

Figure 1 .
Figure1.Diagram showing various scales of turbulence compared to the resolution volumes of the DLs and sonic anemometers on the tower.DL beams are denoted by red lines and range gates by black line segments over it.The DL is able to resolve many of the larger turbulent eddies, but vertical velocities associated with eddies smaller than the range-gate size, such as those shown in grey, cannot be resolved.Many of the smaller eddies may be captured by the sonic anemometer, since their resolution volume is much smaller than the DL.Image is not to scale.

Figure 3 .
Figure3.Comparison of t int as calculated from the sonic anemometer and DL M 11 values, using Eq.(9).Red shows values calculated from LLNL WC measurements and blue shows those calculated from the OU DL observations.Black solid line shows a one-toone relationship.The dashed lines show the linear regressions, with the equations for linear regression and coefficients of determination shown in the upper-left corner.

Figure 4 .
Figure 4. (a, c) The dashed black lines are example fittings of the structure (Eq.4) to M 11 values from the DL observations, which are shown by black dots.(b, d) Spectra from the lidar measurements (green) are compared with those from sonic anemometer measurements (blue) for the same periods shown in (a, c) respectively, with the black solid line denoting the theoretical −2/3 slope in the inertial subrange.Spectra are smoothed for clarity using a 5-point moving average.The fitting shown in (a) is expected when white noise is present, as the peak at lag zero is attributed to noise.Within (c), M 11 at lags zero and one are less than is expected from isotropic turbulence.This is attributed to the volume and time-averaging effects of the DL when noise values are very low, visible within the spectra in (d).

Figure 5 .
Figure5.(a, c, e) Sample M 11 from the 300 m 60 Hz sonic data (blue dots) compared with the sonic data averaged to 1 Hz (red dots) and 0.1 Hz (green dots).Dashed lines are the fittings of Eq. (4) to the filtered sonic data, using the specified number of lags discussed in Sect.3.2.Corresponding spectra are shown in (b, d, f), with the black solid line denoting the theoretical −2/3 slope in the inertial subrange.M 11 and the spectra are calculated over 02:00-02:30 UTC (a, b), 20:30-21:00 UTC (c, d), and 01:00-01:30 UTC (e, f) all on 27 March.Values of τ min and τ max are provided in the lower left.Spectra are smoothed for clarity using a 5-point moving average.

Figure 6 .
Figure 6.Relation of w 2 values computed from the raw 60 Hz 30 min sonic time series compared with those estimated as M *11 with the filtered sonic anemometer data at 50 (a) and 300 m (b).Red and brown dots are the estimate from sonic data averaged to 1 and 0.1 Hz respectively, after the fitting is applied using the specified number of lags discussed in Sect.3.2.Magenta and blue dots are estimates of sonic data averaged to 1 and 0.1 Hz respectively, after the fitting is applied using 100 s of lag time.Green dots are the estimates from the 1 Hz sonic

Figure 7 .
Figure7.Time series of w during unstable (a, b) and stable (c, d) conditions, with each stability being the same during 10 min time periods when the sonic anemometer is not waked by the 300 m tower.OU DL data are at 300 m (a, c), and LLNL WC measurements are at 100 m (b, d).Red line shows the DL time series, while the blue line shows the comparable time series from the collocated sonic at the same height.Note that y axis scales for w are different in the top and bottom panels.

Figure 8 .
Figure8.Sample M 11 (left column) and the corresponding spectra (right column) averaged over different 30 min time periods for measurements at 300 m.Measurements shown are those calculated from the raw sonic observations (blue), sonic data averaged to match the lidar averaging time (red), and OU DL observations (green).Dashed lines overlaid on M 11 are the fittings of the structure function fit to the corresponding measurement.The solid black line in the spectra denotes the theoretical −2/3 slope in the inertial subrange.The fitting for the filtered sonic data uses the same lags as for the DL.M 11 and the spectra are computed over 05:30-06:00 UTC (a, b), 17:30-18:00 UTC (c, d), and 23:00-23:30 UTC (e, f) on 27 March.Observations in (a, b) are during stable conditions (Ri = 0.64), while data shown in (c-f) are during near-neutral conditions (|Ri| < 0.05).Values of τ max and τ min for each time period are shown in the lower left of (a, c, e).Spectra are smoothed for clarity using a 5-point moving average.

Figure 9 .
Figure9.Sample M 11 (left column) and the corresponding spectra (right column) averaged over different 30 min time periods for measurements at 100 m.The spectra are made by averaging 10 min spectra within the 30 min time period to reduce the noise within the spectra.Measurements shown are those calculated from the raw sonic observations (blue), sonic data averaged to match the lidar averaging time (red), and LLNL WC observations (green).Dashed lines overlaid on M 11 are the fittings of the structure function fit to the corresponding measurement.The solid black line in the spectra denotes the theoretical −2/3 slope in the inertial subrange.The fitting for the filtered sonic data uses the same lags as for the DL.M 11 and the spectra are computed over 00:30-01:00 UTC (a, b), 21:30-22:00 UTC (c, d), and 17:30-18:00 UTC (e, f) on 27 March.Observations in (a, b) are during weakly stable conditions (Ri = 0.1), while data shown in (c-f) are during near-neutral conditions (|Ri| < 0.05).Values of τ max and τ min for each time period is shown in the lower left of (a, c, e).Spectra are smoothed for clarity using a 5-point moving average.

Figure 10 .
Figure 10.Time series comparison of w 2 measurements from the OU DL at 300 m (a) and the LLNL WC at 100 m (b) over the experimental time period.For each lidar, the raw measurement is calculated directly from the w measurements and is identical to M 11 .Sunrise and sunset were approximately 12:50 and 00:20 UTC respectively.Data were removed at ≈ 07:00 UTC on 28 March due to contamination from precipitation.

Figure 11 .
Figure 11.Comparison of w 2 computed from DL observations with those from the sonic anemometer observations at different heights.Heights were chosen to highlight differences in the quality of observations with height, and where high-quality sonic and lidar observations are available.Observations from the OU DL are shown in (a, b), while LLNL WC measurements are shown in (c, d).Red denotes w 2 computed from the raw DL data (i.e.M 11 (0)), while blue is for values wherein w 2 is taken as M * 11 (0).Equations of the best fits are shown in the upper left of each plot, with R 2 being the coefficient of determination.Values of w 2 are averaged over 30 min windows.

Figure 12 .Figure 13 .
Figure 12.Relation of stability with error in lidar measured w 2 compared to w 2 computed from sonic anemometer measurements, for raw (red) and corrected (blue) measurements.Measurements from the OU DL at 300 m are shown in (a), while LLNL WC measurements at 100 m are shown in (b).For some time periods under stable conditions (e.g.Ri greater than 0.25, shown by vertical black line), uncorrected DL measurements have very high error, the ratio is greater than 2, and points are off the graph.During the study period, the conditions were predominantly near-neutral, which is why there are fewer data points during stable conditions.

Table 1 .
Overview of all instruments and their properties.

Table 2 .
Number of 30 min time periods under various stability conditions, based on the Ri calculated for 100 m.

Table 3 .
Typical values of τ min and τ max that are used during LATTE for the OU DL.