Evaluation of Turbulence Measurement Techniques from a Single Doppler Lidar

Measurements of turbulence are essential to understand and quantify the transport and dispersal of heat, moisture, momentum, and trace gases within the planetary boundary layer. Through the years, various techniques to measure turbulence using Doppler lidar observations have been proposed. However, the accuracy of these measurements has rarely been validated against trusted in situ instrumentation. Herein, data from the eXperimental Planetary boundary layer Instrumentation Assessment (XPIA) are used to verify Doppler lidar turbulence profiles through comparison with sonic anemometer measurements. 5 For 17 days at the end of the experiment, a single scanning Doppler lidar continuously cycled through different turbulence measurement strategies: velocity azimuth display, six-beam, and range height indicators with a vertical stare. Measurements of turbulence kinetic energy, turbulence intensity, and shear velocity from these techniques are compared with sonic anemometer measurements at six heights on a 300-m tower. The six-beam technique is found to generally measure turbulence kinetic energy and turbulence intensity the most accurately at all heights, showing little bias in its observations. 10 Turbulence measurements from the velocity azimuth display method tended to biased low near the surface, as large eddies were not captured by the scan. None of the methods evaluated were able to consistently accurately measure the shear velocity. Each of the scanning strategies assessed had its own strengths and limitations that need to be considered when selecting the method used in future experiments.

generated or parameterized in numerical weather prediction models and simulations. Scanning Doppler lidars are capable of addressing this need, by measuring vertical profiles of turbulent quantities throughout the entire PBL.
Turbulence has been measured using multiple Doppler lidars in recent years. Mann et al. (2009) performed the first intercomparison between triple-Doppler lidar turbulence measurements with sonic anemometry, and investigated the effects of volume averaging. Fuertes et al. (2014) used three Doppler lidars that were staring at the same point for 1-hr to retrieve a 0.5 Hz 5 timeseries of u, v, and w. From these measurements, all six components of the Reynolds stress tensor (u 2 , v 2 , w 2 , u w , v w , u v ) were directly calculated, showing good agreement with sonic anemometer observations. Newman et al. (2016a) used a similar method to measure velocity variances over several days, for comparison with other turbulence measurements.
Additionally, Newman et al. (2016a) extended the use of 'virtual towers', which are constructed using measurements from multiple lidar systems, to measure vertical profiles of turbulence quantities, whereas in previous work 'virtual towers' were 10 used to make profiles of the mean wind (e.g. Calhoun et al., 2006;Gunter et al., 2015). However, the long timeseries of staring data necessary to measure turbulence with meaningful significance (see Lenschow et al., 1994) precludes the possibility of making accurate turbulence profiles routinely, especially in the convective boundary layer where turbulent length scales are large. Doppler lidars are also expensive and many research groups do not have access to multiple systems. Hence, the focus in this paper will be on scanning strategies that provide vertical profiles of turbulence from a single Doppler lidar. 15 Over the years, several different techniques have been proposed to measure turbulence using measurements from a single Doppler lidar (Sathe et al., 2015a). A review of wind lidar turbulence measurements, through 2013, is provided by Sathe and Mann (2013). Since then, much of the recent work has focused on evaluating and improving turbulence measurements from the Doppler beam swinging (DBS) technique, which is often used by low-powered Doppler lidars typically used in the wind energy field (e.g., Lundquist et al., 2015;Newman et al., 2016b;Kumer et al., 2016). Herein, the DBS technique will not 20 be evaluated since the exact methodology for the retrieval of turbulence statistics is still under development, with corrections being applied to reduce the effect of variance contamination and other error sources. Instead, three other scanning strategies were evaluated to determine their accuracies for measurement of turbulent quantities. The earliest proposed method to measure turbulence kinetic energy (TKE), individual velocity variances (u 2 , v 2 , w 2 ), and covariances (u w , v w , u v ) from Doppler lidar observations is discussed by Eberhard et al. (1989), using the residuals of a velocity-azimuth-display (VAD) fitting on 25 a conical plan position indicator (PPI) scan. Banta et al. (2002Banta et al. ( , 2006 used vertical-slice range-height-indicator (RHI) scans to quantify the streamwise velocity variance u 2 within the PBL during nocturnal low-level jets. Vertical stares have been used in numerous studies (e.g. Pearson et al., 2010;Barlow et al., 2011;Bonin et al., 2015;Tonttila et al., 2015) to measure the boundary layer depth, vertical velocity spectra and variance, and dissipation. Recently, Sathe et al. (2015b) has proposed a method of using a six-beam strategy, which is essentially a modified DBS scan that typically uses five beams, to measure 30 velocity variances, covariances, and TKE. Each of these strategies will be described in more detail within Sect. 2, and evaluated herein.
Turbulence measurements are needed to address a range of problems, which involve different breadths of the turbulence spectrum. All applications require accurate measurements of fluctuations by the largest energy-containing turbulent eddies and at least the lowest wavenumbers of the inertial subrange. Within the inertial subrange the magnitude of the fluctuations drops off 35 2 Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2017-35, 2017 Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 13 February 2017 c Author(s) 2017. CC-BY 3.0 License. quickly (exponentially) with increasing wavenumber, so high wavenumbers make correspondingly smaller contributions to the total variances (Taylor, 1938). Detailed studies of turbulence dynamics, which may include studies of inflows to wind turbines or turbulence generated by them, may require accurate representation of fluctuations over the entire turbulence spectrum from large-eddy to dissipation scales (e.g., Troldborg and Sørensen, 2014). For such studies, employing the best data acquisition strategies and understanding the errors involved is important. Other studies may not require this degree of precision. For 5 example, evaluating the ability of NWP models to predict TKE involves values of 2-4 m 2 s −2 in convective conditions and 1-2 m 2 s −2 in weakly stable conditions. Such accuracies are achievable without measuring the entire spectrum. Many field programs are employing scanning lidar remote sensing in arrays to investigate spatial and temporal variations of the mean wind, as recommended in Banta et al. (2013). In such cases, the measurement of turbulence is not the primary goal, so the dataacquisition and scanning approaches are not optimized for turbulence measurement. It is still desirable to obtain quantitative 10 turbulence information (e.g., for NWP verification) from the scans that are performed. It is essential to understand the error properties of these techniques to know whether the calculated values are useful for the intended purpose.
To systematically evaluate these different turbulence measurement techniques, a Doppler lidar cycled each hour continuously through the methods during the last two weeks of the eXperimental Planetary boundary layer Instrument Assessment (XPIA) field campaign. These measurements are compared with sonic anemometer measurements at six heights on a 300 m 15 meteorological tower located 540 m from the lidar. Through this comparison, the following questions will be addressed in this study.
-How accurate are the various single-Doppler turbulence measurement strategies in determining turbulence characteristics? Does the accuracy vary depending on the measurement height?
-What main caveats need to be considered when applying each technique? How should random errors and instrument 20 noise be characterized and treated?
-What is the optimal operational scanning strategy to derive turbulence estimates? Should different strategies be used for different objectives?
To address these questions, the paper is organized as follows. An overview of the experiment and the instrumentation used is detailed in Sect. 3. The various scanning strategies and methods to measure turbulence, including specific details of 25 implementation, are described in Sect. 2. Within Sect. 4, the techniques are statistically compared through validation with sonic anemometry. Implications for future studies and possible future research directions are discussed within Sect. 5. A summary and conclusions are provided in Sect 6.

Turbulence measurement strategies
The scanning procedures used most often by Doppler lidars are azimuthal scanning, elevation scanning, and stares at one look 30 angle. Each of these approaches can be used to measure one or more of the velocity variances and covariances. The theory for 3 Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2017-35, 2017 Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 13 February 2017 c Author(s) 2017. CC-BY 3.0 License. turbulence measurements is based on the relationship between the observed radial velocity v r and the flow within the resolution volume given by wherein u is the streamwise horizontal velocity, v is the crosswise horizontal velocity, w is the vertical velocity, φ is the elevation angle above the horizon, θ is the angle between u and the azimuth of the lidar, and is uncorrelated random error in 5 the measurement. The value of typically increases with range from the lidar, as the signal-to-noise ratio (SNR) decreases. By squaring Eq. 1 and removing the mean from each quantity, the radial velocity variance is given by v 2 r = u 2 cos 2 θ cos 2 φ+v 2 sin 2 θ cos 2 φ+w 2 sin 2 φ+2u v sin θ cos θ cos 2 φ+2u w cos θ cos φ sin φ+2v w sin θ cos φ sin φ+ 2 , where the covariance terms involving are zero since it is uncorrelated. All of the turbulence measurements techniques are ultimately based on Eq. 2. Brief derivations and details of how these measurements are made, in addition to modifications 10 introduced within this study, are described here. Complete derivations for each method can be found in the works cited.

Velocity azimuth display
While PPI scans have been used to take accurate measurements of the mean wind through VAD analysis (Smith et al., 2006), these scans can also be used to quantify turbulence. Eberhard et al. (1989) details a technique for measuring turbulence from PPI scans, based on pioneering work by Wilson (1970) and Kropfli (1986) wherein turbulence is measured using Doppler radar 15 observations. From PPI scans at two sufficiently different elevation angles, all six components of the Reynolds stress tensor can be retrieved using the residuals of the VAD fitting by utilizing a partial Fourier decomposition of Eq. 2. However, the covariances and momentum fluxes u v , u w , and v w can be measured from any single PPI scan, and TKE can be obtained from a single scan if φ = 35.3 • (for mathematical basis, see Eq. 4a in Eberhard et al., 1989).
A sample PPI scan for a turbulent time period is shown in Fig. 1a. For each range ring, the mean wind speed and direction are 20 determined using VAD analysis. The complete VAD analysis described by Browning and Wexler (1968) includes terms for the vertical velocity of the scatterers as well as horizontal divergence, stretching deformation, and shearing deformation. However, a more simplified variation of the VAD analysis is often used by neglecting divergence and the deformation terms. For the results presented here, the simplified form is used since it yields more accurate estimates of the measured turbulent quantities when compared with sonic anemometer measurements. This may be due to variability from large turbulent motions being 25 incorrectly partitioned into the divergence or deformation terms. However, in complex terrain or other locations these terms may not be negligible. An example of the fitting of this equation and its residuals v r , which are deviations from the expected mean v r , is shown in Fig. 1b. If the mean flow (i.e., u, v, and w) is homogeneous over the scanning circle, then the residuals of the fitting are results of turbulent motions and . This is visualized within Fig. 1c, wherein coherent areas of positive and negative v r represent turbulent eddies. Since turbulent structures are correlated spatially, can be quantified and removed by applying a structure function fit to the autocovariance of v r across radials for a given range gate using the technique outlined by Lenschow et al. (2000). To our knowledge, this is the first time that the autocorrelation technique has been used to remove noise variance from a scan, as it is typically used for a timeseries from prolonged stares (Mayor et al., 1997). This technique can lead to an overestimate of when the inertial subrange is smaller than the distance between adjacent azimuths, which is 5 more likely at long ranges from the lidar as the spatial separation between adjacent beams increases.
Previously, measurements using the technique described by Eberhard et al. (1989) have not been evaluated against in situ observations. Wang et al. (2015) used a variation of this technique by applying it to a 30 • sector PPI and assumed isotropic turbulence to relate v 2 r to TKE. Estimates of TKE from the arc scan showed good agreement (r 2 = 0.89) with those from sonic anemometer on a linear scale. Other studies (e.g., Berg et al., 2013;Sathe et al., 2015b) have used a loose variation of this 10 VAD technique to quantify turbulence by using only a small number of beams (4 − 6) spaced around the entire 360 • , which is substantially different from using more than 100 beams around the sampling ring. Sathe et al. (2015b) propose a technique to measure all six components of the Reynolds stress tensor by continuously cycling between measurements at six different angles. One beam is vertical, and the other five are at a set elevation angle (45 • herein) 15 and are equally spaced 72 • apart in azimuth. For each beam, the time series of v r are linearly detrended over a fixed time window, which is 20-min here, and v r is computed as its residual. A shorter window or a higher-order detrending may result in effectively removing large-scale eddies. Values of v 2 r are computed for each beam separately. Thus, there are six known values of v 2 r , one for each beam, and each is a function of differently weighted velocity variances and covariances based on the scan elevation and azimuth, as in Eq. 2. This can be represented by the matrix relationship

Six-beam
where M is a 6 by 6 matrix of coefficients based on different combinations of θ and φ, as in Eq. 2. Thus, it is possible to solve for the six unknown components of the Reynolds stress tensor through an inversion of Eq. 3.

5
For each beam, the lidar stared at the given location for 1 s before advancing to the next position. Since data were collected at 2 Hz, two samples were collected 0.5 s apart. To remove uncorrelated noise 2 from the observed v 2 r for each beam, the autocovariance at the first lag for the samples that were 0.5 s apart was taken as v 2 r , following the technique presented by Lenschow et al. (2000). This likely results in a slight underestimate of v 2 r , since contributions from small eddies that are uncorrelated over short timescales are removed. In the future, it is recommended that more samples be collected along each 10 beam so that a structure-function or linear fitting may be applied to the autocovariance for a more robust measurement of v 2 r for each beam. On average, the scanner took ≈ 3.6 s to slew between beam positions, so that the scanner returned to the same beam every ≈ 27 s.

RHI scans and vertical stares
Shallow RHI scans have also been used as a means to measure horizontal velocity variances (e.g., Banta et al., 2006;Pichugina 15 et al., 2008b).These scans are conducted by scanning from the horizon up to ≈ 30 • , typically at two angles orthogonal to each other. Since the scans are mostly at low angles, it is assumed that the observed v r are due to the horizontal wind and that the contribution of w is negligible. To ensure that measurements at different elevation angles are comparable, values of v r are 20 where v rH is the radial velocity projected in the horizontal. For each RHI, observations are binned by height (30-m bins used herein) which are used to make a mean profile of v rH . The profile of v rH is used to calculate deviations from the mean flow v rH . This is done by simply taking the difference between v rH and v rH for the given height, where v rH is linearly interpolated between the center of mass of each height grid. The variance of v rH is calculated using the same height grid to produce a profile of v 2 rH , which is the horizontal wind variance within the RHI plane. An example of this process and each derived product is  When one of these scans is oriented with the mean flow and the other transversely, the two measured profiles of v 2 rH can be treated as u 2 and v 2 respectively. If the scans are not oriented in such a way or if large directional shear is present, it is possible to rotate the variances to be aligned with the mean flow by zero if turbulence is assumed to be isotropic. Thus, values of u 2 and v 2 can be computed directly through the rotation. The mean wind profile, including wind speed and direction necessary for the rotation, are directly computed using the two profiles of v rH . Using this technique, there is no straightforward way to remove contamination from 2 in the variances. Thus, data were removed if the SNR< −27 dB to reduce contamination from highly noisy data.
To calculate TKE, values of w 2 also need to be known. For quantification of w 2 , vertical stares were used in conjunction 5 with the shallow RHI scans. Vertical stares are the most straightforward method to measure any vertical turbulent quantity with a Doppler lidar. Since the w profile is continuously measured, it is simple to take the variance of time series of w to obtain w 2 . However, 2 contaminates the measurement and needs to be removed to improve the accuracy of the measurement.
As described earlier, the autocovariance technique described by Lenschow et al. (2000) is used to remove instrument noise.
Herein, values of σ 2 w are taken as the extrapolated −2/3 structure function fit to the autocovariance of the time series at lags 10 1-5. Using this technique removes contamination by 2 and mitigates volume averaging effects, which otherwise reduce the observed w 2 (Bonin et al., 2016).

Experimental overview
The eXperimental Planetary boundary layer Instrument Assessment (XPIA) field campaign was conducted at the now-defunct Boulder Atmospheric Observatory (BAO) in the spring of 2015. The primary goal of the XPIA experiment was to assess the ca-15 pabilities for measuring flow in the planetary boundary layer with current technology such as wind profiling radars, microwave radiometers, sonic anemometers, various Doppler lidar systems, and other additional instrumentation. Herein, observations from a single scanning Doppler lidar system are compared with those from sonic anemometers on a meteorological tower. Pertinent details about the sensors and site are described here, but a complete list of the instrumentation and a thorough description of the experiment along with its various objectives can be found in Lundquist et al. (2016). 20 The BAO was located in Erie, CO, approximately 25 km east of the foothills of the Rocky Mountains, and was designed primarily for PBL research as well as testing and calibration of various atmospheric sensors (Kaimal and Gaynor, 1983  (2016). Specifically, data from the NW and SE sonics are removed when the wind direction is from between 100-170 • and 300-20 • respectively. Turbulence statistics from the sonic anemometers were averaged over 20-min blocks for comparison, similar to the averaging time for the various lidar scanning strategies discussed below. Any 20-min averages where the statistics between the two sonic anemometers at the same height differed by a factor of 2 or more were removed, to ensure the statistics 5 were comparable and not affected by the tower.
The modest complexity of the terrain and the proximity of the site to the mountains present complications in calculating turbulence quantities from remote-sensing data using the techniques described here. Under these conditions, the flow can exhibit non-turbulent variability along a scan that contribute to unknown degree to the calculated variances and covariances producing larger variance and discrepancies between lidar and tower measured turbulent quantities. This variability is not 10 expected over more homogeneous topography (e.g., Pichugina et al., 2008a)  from clouds or other strong targets were occasionally apparent in the signal. These erroneous echoes were removed using a discontinuity-based algorithm described by Bonin and Brewer (2016 where U is the mean horizontal wind speed, is most often used as it is a measure of the variability of the inflow into the turbine and affects the design requirements (International Electrotechnical Commission, 2005). In boundary-layer meteorology and air 15 quality, TKE calculated as is often used as a measure of the turbulent mixing in the atmosphere (Stull, 1988;Arya, 1999). Additionally, the covariance terms u w and v w are the momentum flux and are necessary to test and validate models of atmospheric flow. Since these measures of turbulence are most commonly used, they are the focus of this section. A complete statistical comparison of each 20 measured variable is provided in Appendix A.

Turbulence Kinetic Energy
For the six-beam, VAD with multiple φ, and RHI/vertical stare techniques, TKE was directly computed as the sum of the measured velocity variances using Eq. 8. As discussed within Sect. A sample 24-hr time series of TKE is provided in Fig. 4 to demonstrate the ability of the different methods to capture temporal changes. The diurnal pattern of TKE decreasing overnight between 03:00-12:00 UTC is visible in measurements where x and y are data points on their respective axes in linear units and b is a constant. Transforming the equation back into linear space for ease of interpretation, the equation shown in the upper left of Fig. 5a, c, e, g is y = bx.

5
Each of the techniques evaluated herein generally shows skill in measuring TKE, as indicated by r 2 values greater than 0.6 in Fig. 5. Considering that the sonic anemometer and lidar measurements represent different spatial areas, which vary according to the scanning technique, and that each are subject to sampling error, the authors consider the correlation between the lidar and sonic TKE to be good. Quantifying the sampling error would allow a more statistical determination of it measurements are in agreement within their respective uncertainties. However, no techniques that the authors are aware of can be used to estimate 10 the sampling error from any of the scan types except the vertical stares (see Lenschow et al., 1994). Due to these limitations and for consistency, sampling errors are not estimated for any of the scans. Instead, the relative correlations between sonic The six-beam technique demonstrates the best ability to measure TKE overall, as evident by the largest r 2 and slope of 0.945, close to unity. Additionally, the histogram of the ratio of TKE measurements in Fig. 5f shows a distinct peak around 1 with reduced spread compared to Fig. 5b Fig. 5e. Upon manual inspection of these high outliers, many are due to contamination of range folded echoes. When range folded echoes appear intermittently, the v r time series within each beam position changes 10 erratically, resulting in an anomalously large variance, and spuriously increasing the observed TKE. The discontinuity-based algorithm used to detect range folded echoes largely relies on contextual information from proximate beams in time and space (Bonin and Brewer, 2016) not available from the six-beam technique. These anomalous echoes can typically be detected and removed in PPI, RHI, and stare scans, but these range folded returns persist through the quality-control process of the six-beam measurements and degrade the accuracy of the calculated variances .

15
As discussed in Sect. The low bias is more pronounced for the two-φ technique, although the scatter is reduced slightly as evidenced by the larger r 2 20 in Fig. 5c. This reduced scatter is attributed to smaller sampling errors, since twice the amount of data go into the measurement.
The overall low biases may be due to the inability to capture all the scales of turbulence; the largest eddies may not be fully captured and resolved within the scanning circle. This effect would be more pronounced for higher elevation PPIs, such as at 50.8 • , and explain the more significant low bias in TKE for the two PPIs used herein. If a lower φ were used (i.e., at 25 • ), the low bias may not be as pronounced for the two φ VAD TKE. (i.e., < 0.1 m 2 s −2 ), as apparent in Fig. 5g. The cause of the overestimate during weakly turbulent conditions is unclear, but it may be due to spatial variability of the flow that the sonic cannot detect or the inability to remove instrument noise effects.
These random errors are quantified and removed in the VAD and six-beam techniques as detailed in Sect. 2, but no established technique exists to remove these errors from RHI measurements, which may be addressed in a future study. Within Fig. 5h, this high bias under weakly turbulent conditions manifests itself as right-skewed distribution. The bias of TKE measurements using the RHI and vertical stare method is independent of height and small (Fig. 5a), as the slope is generally around 1. The TKE measurement becomes more accurate with height, indicated by the increase of r 2 with height in Fig. 5b. The cause for the increase in accuracy with height is unclear. The low bias of VAD and VAD 35.3 • TKE observations becomes less significant with height, as shown in Fig. 5a. Coincidentally, the VAD and VAD 35.3 • r 2 values 10 increase with height, representing less scatter and more accurate values at higher altitudes.
To examine the decrease in low bias and increase in accuracy of VAD TKE measurements with height, the VAD circle diameter and typical largest eddy size, the integral length scale l, are compared. First, the integral time scale t int is calculated from a linearly detrended 20-min timeseries of u from the sonic anemometer as  are shown as a function of height and with reference to PPI scan diameters in Fig. 7. Generally, individual 35.3 • and 50.8 • PPI scans do not fully sample the largest turbulent eddies close the ground, since the scan circle diameter is often less than l. The largest eddies are better captured by these scans at higher altitudes, especially for the 35.3 • PPI scan. At 300-m, the integral scale is less than the 35.3 • scan diameter over 90% of the time.
The results shown in Fig. 7 explain why VAD and VAD 35.3 • TKE measurements become more accurate and less biased 5 with height, as the largest turbulence scales are better captured. These effects are not important to the RHI method, since the spatial extent of the average is typically several km, much larger than the typical eddy size. Since the vertical stare and six-beam techniques use time series analysis, the largest scales of turbulence are observed if the time window length exceeds the integral time scale, which is often ∼ 10 − −100 s during daytime (Lenschow et al., 2000;Bonin et al., 2016).

10
Similar to the analysis of TKE presented in Sect. 4.1, measurements of TI measured from the six-beam, VAD (using 2 φ angles), and RHI techniques are compared and validated here. Since TI → ∞ as U → 0 following Eq. 7, TI is only calculated when U > 1 m s −1 . A sample time series of TI is shown in Fig. 8. The diurnal trend in TI is clearly visible in the sonic measurements, as TI is generally low (3-10%) at night until 12:00 UTC and TI is larger (20-70%) during the day. During the morning hours (i.e.,12:00-18:00 UTC), U was less than 2.5 m s −1 , causing TI to become large. Despite some scatter, TI 15 measurements from the Doppler lidar show a similar trend with smaller TI values at night and larger ones during the day.
Nonphysical negative u 2 values due to computational artifacts as described previously by Newman et al. (2016b) have been removed from the analysis. Measurements of TI at all heights over the entire experiment are summarized in Fig. 9. For each of the three techniques analyzed, the r 2 for TI is ≈ 0.2 lower than it is for TKE. This indicates the combined velocity variance components in TKE are more accurately measured than individual velocity variances separately (see also and w 2 comparison statistics). Still, the VAD, six-beam, and RHI techniques each show skill in measuring TI. The VAD and six-beam techniques perform comparably, having a similar r 2 and slope indicated a low bias. Sathe et al. (2015b) also show that the six-beam technique tends to underestimate u 2 . The RHI TI measurements show more scatter than the other two methods, given the lower r 2 , but showed little bias.
The slope and r 2 of the best fit line as a function of height is shown in Fig. 10. Similarly to TKE, VAD measurements of 5 TI are biased low near the ground as indicated by the slope of ≈0.7 at 100 m. By 250 m, the bias becomes small as most of the turbulence scales are resolved, as discussed in Sect. 4.1. The accuracy of the VAD TI measurement does not change significantly with height, as r 2 does not consistently depend on height. The six-beam TI measurement is biased consistently low regardless of height, as indicated by the slope of ≈ 0.83 at all heights, and the scatter of the measurements does not have a consistent trend with height. The slope of RHI TI tends to be larger near the ground and slowly decrease with height, as 10 evidenced in Fig. 10a. Coincidently, the scatter associated with these measurements decreases significantly with height, as the r 2 increases from 0.12 to 0.56.

Shear Velocity
The momentum flux terms u w and v w can be combined through the calculation of a stress velocity scale u * by

15
Of the techniques analyzed, only the six-beam and VAD methods have a theoretical basis for measuring the covariances u w and v w necessary to compute u * . Each PPI scan at any φ can independently provide a measurement of the covariances, so the u * values shown here are taken as the average of all PPI scan at both 35.3 • and 50.8 • . An example of a 24-hr time series  of u * is shown in Fig. 11. The sonic data are not shown for 00:00-03:00 UTC and 14:00 UTC, since the u * measurements on opposing booms were more than a factor of two different from each other, even though neither sonic was waked. Thus, neither is taken as a baseline measurement. For this sample period, the lidar and sonic data show a similar trend, values of u * decreasing for 00:00-12:00 UTC, rapidly increasing for 12:00-15:00 UTC, and remaining nearly constant after 15:00 UTC.
These trends are a result of u * steadily decreasing overnight, increasing in the morning hours, and remaining steady over the 5 day.
The comparison between sonic and lidar u * measurements are summarized in Fig. 12. Both the VAD and six-beam techniques generally overestimate u * , as shown in both the histograms and scatter plots. During time periods when the sonic estimated u * is small (i.e., < 0.1 m s −1 ), the lidar techniques predominately overestimate u * as indicated by the large number of data points above the one-to-one line in Fig. 12a, c. The small r 2 for the best fit lines indicates that there is substantial scatter 10 in the comparison of the lidar and sonic measurements. Thus, the six-beam and VAD methods show little skill in being able to accurately measure u * and the covariances, as shown in Table 2. These results do not significantly change with height (not shown), as the r 2 remains small for both methods at all measurement heights between 50-300 m. The accuracy of covariance 17 Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2017-35, 2017 Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 13 February 2017 c Author(s) 2017. CC-BY 3.0 License. Figure 10. Same as Fig. 6 for TI instead of TKE. and u * measurements from the VAD and six-beam methods has not been evaluated in the past, but here the measurements are found to exhibit large error. Over simpler topography, Berg et al. (2013) present results from a DBS technique that produced more accurate measurements of u w and v w . 18 Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2017-35, 2017 Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 13 February 2017 c Author(s) 2017. CC-BY 3.0 License. Figure 12. Same as Fig. 5 for u * instead of TKE. Lidar measurements are from TKE from two φ (a, b) and six-beam technique (c, d).

Discussion
From the results shown in Sect. 4, it is clear that TKE can be measured by each of the three techniques analyzed. However, measurements of each individual term of the Reynolds stress tensor are more difficult to accurately measure. The velocity covariances are particularly difficult to quantify, as the six-beam and VAD techniques show little skill in their measurement.
It is thought that the poor comparison for the covariance terms is due to the fact that the sampling error for the measurement 5 exceeds the covariance typical dynamic range. Based on sonic anemometer observations, 80% of |u v |, |u w |, and |v w | were < 0.1 m 2 s −2 . Also, covariance terms having small correlations take much longer to converge to a stable value (Lenschow et al., 1994).

Strengths and limitations of each strategy
Each of the scanning strategies evaluated herein has its own strengths and limitations. One of the biggest limitations for all 10 of the techniques except vertical stares is that turbulence is assumed to be homogeneous over the area of each scan. Thus, these techniques do not always work well in complex terrain or differential land use where turbulence can significantly vary spatially. For the VAD and RHI techniques in particular, spatial variations in the mean wind due to local drainage flows (e.g.,

19
Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2017-35, 2017 Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 13 February 2017 c Author(s) 2017. CC-BY 3.0 License. Banta et al., 1997;Choukulkar et al., 2012) can result in large deviations from the spatially averaged mean wind. Since these methods are unable to differentiate turbulent deviations from mean deviations, turbulence is overestimated. In these situations, it may be possible to use arc segments from the PPI scans to compute TKE (Wang et al., 2015) over different radials where the mean flow is homogeneous. With the current technology, multi-Doppler measurements (i.e., Mann et al., 2009;Fuertes et al., 2014;Newman et al., 2016a) are best able to quantify turbulence at specific locations in complex terrain.

5
The spatial height resolution for the PPI, six-beam, vertical stare, and RHI scans largely depends on scan geometry. Direct measurements of w 2 from vertical stares can only be taken starting at the height of the lowest range gate and the spatial resolution is limited to the range gate size. Since the six-beam technique presented here and in previous studies (e.g., Sathe et al., 2015b;Newman et al., 2016b) uses a vertical beam, the spatial resolution is limited by that beam the same as vertical stares. Future studies may try removing the vertical beam and instead use six-beams all at φ = 45 • , or another φ, to take 10 measurements at a lower altitude. The vertical resolution of a PPI scan is dictated by its φ: a larger φ results in a higher minimum measurement height, reduced vertical resolution, greater height coverage, and reduced horizontal scan footprint compared to typical eddy size. The residuals in the PPI scans are more sensitive to w 2 for a larger φ, and are more sensitive to u 2 and v 2 for a smaller φ. The height resolution of an RHI scan is truly customizable, as u 2 and v 2 are computed by user-defined height bins. Since RHI scans typically start or end at the horizon, u 2 and v 2 can be calculated within a few 15 meters of the surface. On the other hand, this technique is especially susceptible to non-turbulent horizontal variability along the scan due to complex terrain and other effects, especially since small φ that cover large distances horizontally are used.
Although a 20-min averaging interval is used here for comparison, measurements for several of the techniques could be made much quicker. Turbulence statistics from vertical stares and the six-beam technique are computed through typical time series analysis; thus, the time series needs to be long enough to ensure that the largest turbulent eddies pass through the resolution 20 volume to be captured (see Lenschow et al., 1994), yet short enough that the flow can be considered stationary (Banta et al., 2013). Since the VAD and RHI methods compute turbulence through quantifying spatial variability, they are not subject to these same sampling error limitations when all scales of turbulence are captured in the scanning volume. With a data rate of 2 Hz, each PPI scan with 240 beams takes 2 min to complete; thus TKE can be measured in 2 min with a scan at φ = 35.3 • and each velocity variance and covariance can be measured from two scans, taking 4 min. Since each RHI scan can be conducted 25 in < 1 min, u 2 and v 2 can be measured in ≈ 2 min.
The methods presented here measure velocity variances, but none currently are able to distinguish atmospheric turbulence from submeso motions, including waves. Since the value of TKE is calculated from u 2 , v 2 , and w 2 , the observed TKE may be a mixture of turbulent and submeso variances and not always a measure of pure atmospheric turbulence. Considering these submeso motions have been predominantly documented within the nocturnal stable PBL when turbulence is typically weak 30 (Mahrt, 2014), the value of TKE defined as a measure of turbulent motion may be overestimated when waves are present.
Numerically differentiating between non-turbulent and turbulent motions is difficult (Stewart, 1969) and is best done through multiresolution decomposition or wavelet analysis (e.g., Cuxart et al., 2002;Vickers and Mahrt, 2003;Viana et al., 2010). This requires a high-resolution (> 1 Hz) time series. Thus, out of the methods analyzed, only the vertical stare has the data necessary separate turbulent from non-turbulent motions using established techniques.

Future directions for improving turbulence estimates
One of the main limitations of the six-beam technique using current commercially available scanning Doppler lidars is the return time between samples at the same beam position (≈ 27 s). When the six-beam technique was performed, the scanner spent 78% of the time slewing from one beam to the next. Thus a 2-axis hemispheric scanner is not be the best option for running the six-beam technique. A wedge scanner that can quickly rotate between beam positions is more appropriate, as the 5 time between beams could be minimized. This would yield a higher temporal resolution time series for each beam, enabling a better method of noise removal through a structure function fit (Lenschow et al., 2000) and possibly differentiating between turbulent and non-turbulent variances through multiresolution decomposition or wavelet analysis.
The RHI technique is best suited for measuring u 2 and v 2 near the surface (< 100 m), since the measurements need to be made at a low angle. However, there is currently no method to remove random errors from RHI measurements. Thus, it may be 10 better in the future to simply perform shallow horizontal stares where φ < 20 • to measure the horizontal variances. Removing noise and correcting for volume averaging effects would be straightforward (Bonin et al., 2016), and it also may be able to distinguish turbulence from non-turbulent motions.

Conclusions
The XPIA field experiment was conducted at the Boulder Atmospheric Observatory in the spring of 2015. For 17 days at 15 the end of the experiment, a Leosphere WINDCUBE 200S ® continuously alternated between a PPI, RHI, vertical stare, and six-beam scanning strategy. Measurements from each scan type were used to calculate components of the Reynolds stress tensor and other measures of turbulence. These Doppler lidar turbulence measurements were compared to those from sonic anemometers on a 300-m tower located 540 m from the lidar to evaluate the accuracy of each technique.
Overall, TKE and velocity variances (i.e., u 2 , v 2 , w 2 ) were more accurately measured by the six-beam technique. Six-beam 20 measurements showed the best agreement with the sonic-anemometer data across all ranges of turbulence magnitude. Additionally, the error and bias of the six-beam turbulence measurements did not significantly change with height. On the other hand, the VAD measurements of TKE and velocity variances tended to become more accurate with height. VAD-measured turbulence tended to be biased low near the surface, and this bias decreased with height. This bias is attributed to the inability of the PPI scan to resolve all scales of turbulence near the surface, since the largest eddies extend beyond the scanning circle. 25 The scanning volume geometrically becomes larger with height; thus, the PPI is better able to resolve all scales of turbulence and make more accurate measurements of turbulent quantities farther from the surface. Although the sonic anemometer observations agreed most poorly with RHI-measured TKE and TI, it showed little bias (slope of linear regression for TKE was 1.003) and still showed considerable skill in measuring turbulence. The inability to quantify and remove random errors from the RHI measurements led to an overestimate under time periods when turbulence was weak (TKE< 0.1 m 2 s −2 ). The methods 30 evaluated herein showed little skill in measuring u * and velocity covariances.
When selecting a scanning strategy in future experiment, one needs to consider the desired turbulence measurements. While the RHI technique may be the least accurate of the three evaluated, it is the only method that can obtain measurements just above the surface. If a rapid update time is desired (i.e., < 5 min), the VAD technique may best address these needs. Vertical stares and the six-beam technique use time series analysis to quantify turbulence. If the temporal resolution is sufficiently high, established techniques may be used to partition turbulent and non-turbulent variance, which no method currently exists for the RHI and VAD data.
Appendix A: Comparison of all measured turbulent quantities 5 A complete statistical comparison of all measures of turbulence is provided for reference in Table 2. For brevity, only a selected portion of these results were closely evaluated within the main body of the manuscript. While the results in Table 2 summarize measurements at all heights, the accuracy and bias of velocity variances and covariances as a function of height are similar to those presented in Sect. 4. The results have been summarized through comparisons of values on both a logarithmic scale (except for the covariances which can be negative) and a linear scale.
Sciences Division for supporting the instrumentation at the BAO. We express appreciation to the National Science Foundation for supporting the CALB deployments of the tower instrumentation (https://www.eol.ucar.edu/field_projects/cabl).