LiSBOA (LiDAR Statistical Barnes Objective Analysis) for optimal design of lidar scans and retrieval of wind statistics – Part 1: Theoretical framework



Abstract. A LiDAR Statistical Barnes Objective Analysis (LiSBOA) for the optimal design of lidar scans and retrieval of the velocity statistical moments is proposed. LiSBOA represents an adaptation of the classical Barnes scheme for the statistical analysis of unstructured experimental data in N-dimensional space, and it is a suitable technique for the evaluation over a structured Cartesian grid of the statistics of scalar fields sampled through scanning lidars. LiSBOA is validated and characterized via a Monte Carlo approach applied to a synthetic velocity field. This revisited theoretical framework for the Barnes objective analysis enables the formulation of guidelines for the optimal design of lidar experiments and the efficient application of LiSBOA for the postprocessing of lidar measurements. The optimal design of lidar scans is formulated as a two-cost-function optimization problem, including the minimization of the percentage of the measurement volume not sampled with adequate spatial resolution and the minimization of the error on the mean of the velocity field. The optimal design of the lidar scans also guides the selection of the smoothing parameter and the total number of iterations to use for the Barnes scheme. LiSBOA is assessed against a numerical data set generated using the virtual lidar technique applied to the data obtained from a large eddy simulation (LES). The optimal sampling parameters for a scanning Doppler pulsed wind lidar are retrieved through LiSBOA, and then the estimated statistics are compared with those of the original LES data set, showing a maximum error of about 4 % for both mean velocity and turbulence intensity.

1 Introduction
Reliable measurements of the wind velocity vector field are essential for understanding the complex nature of atmospheric turbulence and for providing valuable data sets for the validation of theoretical and numerical models. However, field measurements of wind speed are typically characterized by large uncertainties due to the generally unknown and uncontrollable boundary conditions (Braham, 1979), the broad range of timescales and length scales involved (Cushman-Roisin and Beckers, 1990a), and the complexity of the physics (Stull, 1988). Furthermore, the large measurement volume, which typically extends throughout the height of the atmospheric boundary layer, forces experimentalists to select the sampling parameters as a trade-off between spatial and temporal resolution.
Wind speed has traditionally been measured through local sensors, such as mechanical, sonic, and hot-wire anemometers (Liu et al., 2019; Kunkel and Marusic, 2006). Despite their simplicity, mechanical anemometers are affected by errors due to the flow distortion caused by the supporting structures, drawbacks under harsh weather conditions (Mortensen, 1994), and overspeeding (Busch and Kristensen, 1976). Furthermore, their relatively slow response results in a limited range of measurable time-length scales, which makes them unsuitable, for instance, for measuring the turbulent flow around urban areas (Pardyjak and Stoll, 2017). Sonic anemometers can measure the three velocity components, with frequencies up to 100 Hz (Cuerva and Sanz-Andrés, 2000), in a probing volume of the order of 10⁻⁴ m³, yet measurements might still be affected by the wakes generated by the supporting structures, such as met towers and struts, and they are sensitive to temperature variations (Mortensen, 1994). Hot-wire anemometers, although they provide a full characterization of the energy spectrum, require a complicated calibration (Kunkel and Marusic, 2006) and are extremely fragile (Wheeler and Ganji, 2010a). Furthermore, traditional single-point sensors are unable to provide an adequate characterization of the spatial gradients of the wind velocity vector, which are particularly significant in the vertical direction (Cushman-Roisin and Beckers, 1990b). To overcome this issue, arrays of anemometers supported by meteorological masts have been deployed in several field campaigns (Haugen et al., 1971; Bradley, 1983; Taylor and Teunissen, 1987; Emeis et al., 1995; Pahlow et al., 2001; Berg et al., 2011; Kunkel and Marusic, 2006).

S. Letizia et al.: LiSBOA (LiDAR Statistical Barnes Objective Analysis) – Part 1. Published by Copernicus Publications on behalf of the European Geosciences Union.
Besides the mentioned capabilities, lidars present some important limitations, such as a reduced range in adverse weather conditions (precipitation, heavy rain, fog, low clouds, or low aerosol concentration; Liu et al., 2019; Mann et al., 2018) and a limited spatiotemporal resolution, namely about 20 m in the radial direction and about 10 Hz in sampling frequency. These technical specifications, together with the nonstationary wind conditions typically encountered in field experiments, pose major challenges to the application of wind lidars for the statistical analysis of turbulent atmospheric flows.
In the realm of wind energy, early lidar measurements were limited to the qualitative analysis of snapshots of the line-of-sight (LOS) velocity, i.e., the velocity component parallel to the laser beam (Käsler et al., 2010; Clive et al., 2011). Fitting of the wake velocity deficit was also successfully exploited for the extraction of quantitative information about wake evolution from lidar measurements (Aitken and Lundquist, 2014; Wang and Barthelmie, 2015; Kumer et al., 2015; Trujillo et al., 2016; Bodini et al., 2017). To characterize velocity fields with higher statistical significance, the time averages of several lidar scans were calculated for periods with reasonably steady inflow conditions (Machefaux et al., 2015; Van Dooren et al., 2016). In the case of data collected under different wind and atmospheric conditions, clustering and bin-averaging of lidar data were carried out (Machefaux et al., 2016; Garcia et al., 2017; Bromm et al., 2018; Zhan et al., 2019, 2020). Finally, more advanced techniques for first-order statistical analysis, such as variational methods (Xia et al., 2008; Newsom and Banta, 2004), optimal interpolation (Xu and Gong, 2002; Kongara et al., 2012), least squares methods, and Navier-Stokes solvers (Astrup et al., 2017; Sekar et al., 2018), were applied for the reconstruction of the velocity vector field from dual Doppler measurements.
Besides the mean field, the calculation of higher-order statistics from lidar data to investigate atmospheric turbulence is still an open problem. In this regard, Eberhard et al. (1989) re-adapted the postprocessing of the velocity azimuth display (VAD) scans (Lhermitte, 1969; Wilson, 1970; Kropfli, 1986) to estimate all the components of the Reynolds stress tensor by assuming horizontal homogeneity of the mean flow within the scanning volume, which can be a limiting constraint for measurements in complex terrains (Frisch, 1991; Bingöl, 2009). Range height indicator (RHI) scans were used to detect second-order statistics, spectra, skewness, the dissipation rate of the velocity field, and even heat flux (Gal-Chen et al., 1992). Recently, in the context of wind radar technology, but readily applicable to lidars as well, a promising method for the estimation of the instantaneous turbulence intensity (i.e., the ratio between the standard deviation and the mean of the streamwise velocity), based on the Taylor hypothesis of frozen turbulence, was proposed by Duncan et al. (2019). More advanced techniques exploit the additional information on turbulence carried by the spectrum of the backscattered lidar signal (Smalikho, 1995). However, this approach requires lidar raw data, which are generally not available for commercial lidars. For a review of turbulence statistical analyses through lidar measurements, the reader can refer to Sathe and Mann (2013). Another typical scanning strategy to obtain high-frequency lidar data consists of performing scans with fixed elevation and azimuthal angles of the laser beam while maximizing the sampling frequency (Mayor et al., 1997; O'Connor et al., 2010; Vakkari et al., 2015; Frehlich and Cornman, 2002; Debnath et al., 2017b; Choukulkar et al., 2017; Lundquist et al., 2017).
For remote sensing instruments, data are typically collected based on a spherical coordinate system and then interpolated over a Cartesian reference frame oriented with the x axis in the mean wind direction. This interpolation can be a source of error, especially if a linear interpolation method is used (Garcia et al., 2017; Carbajo Fuertes et al., 2018; Beck and Kühn, 2017; Astrup et al., 2017). Delaunay triangulation has also been widely adopted for coordinate transformation (Clive et al., 2011; Trujillo et al., 2011, 2016; Iungo and Porté-Agel, 2014; Machefaux et al., 2015), yet its accuracy has not been quantified in the case of nonuniformly distributed data. It is reasonable to weight the influence of the experimental points on their statistics according to the distance from the respective grid centroid, such as with uniform, hyperbolic (Van Dooren et al., 2016), or Gaussian weights (Newsom et al., 2014; Wang and Barthelmie, 2015; Zhan et al., 2019). The use of distance-based Gaussian weights for the interpolation of scattered data over a Cartesian grid is at the basis of the Barnes objective analysis (or Barnes scheme; Barnes, 1964), which has been systematically used in meteorology but only sporadically used for lidar data. It represents an iterative statistical ensemble procedure to reconstruct a scalar field arbitrarily sampled in space; the reconstructed field is low-pass filtered with a cut-off wavelength that is a function of the parameters of the scheme.
The scope of this work is to define a methodology to postprocess scattered data of a turbulent velocity field measured through a scanning Doppler wind lidar (or eventually other remote sensing instruments) to calculate the mean, standard deviation, and even higher-order statistical moments on a Cartesian grid. The proposed methodology, referred to as the LiDAR Statistical Barnes Objective Analysis (LiSBOA), represents an adaptation of the classic Barnes scheme to N-dimensional domains, enabling applications to nonisotropic scalar fields through a coordinate transformation. A major point of novelty of LiSBOA is the estimation of the wind velocity variance (and, eventually, higher-order statistics) from the residual field of the mean, which also provides adequate filtering of dispersive stresses due to data variability not connected with the turbulent motion. A criterion for the rejection of statistical data affected by aliasing, due to the undersampling of the spatial wavelengths under investigation, is formulated. LiSBOA is assessed against a synthetic scalar field to validate its theoretical response and the formulated error metric. Detailed guidelines for the optimal design of a lidar experiment and the effective reconstruction of the wind statistics are provided. The effectiveness of the proposed scheme in the identification of the optimal scanning parameters and retrieval of turbulence statistics is quantified using virtual lidar data.
It will be shown in the following that the revisited Barnes scheme offers several advantages compared to the above-cited techniques for lidar data analysis: (i) it allows one to explicitly select the cut-off wavenumber to filter out small-scale variability, while retaining relevant modes in the flow field; (ii) the distance-based weighting function provides smoother fields than linear interpolation or window averaging, while still being simpler and computationally inexpensive compared to more sophisticated techniques (e.g., optimal interpolation and variational methods); and (iii) it provides guidance for the optimal design of lidar scans to investigate specific wavelengths in the flow. On the other hand, the procedure requires estimates of input parameters for the flow under investigation and the lidar system used. If these parameters cannot be obtained from the existing literature or preliminary tests, a sensitivity study of the LiSBOA results to the input parameters can be carried out.
The remainder of the paper is organized as follows: in Sect. 2, the extension of the Barnes scheme theory to N-dimensional domains and higher-order statistical moments is presented. In Sect. 3, the theoretical response function of LiSBOA is validated against a synthetic case, while guidelines for proper use of the proposed algorithm and optimal scan design are provided in Sect. 4. In Sect. 5, the accuracy of LiSBOA is tested using the virtual lidar technique. Challenges in the application of the methodology to field experimental data are then discussed in Sect. 6. Finally, concluding remarks are provided in Sect. 7.
2 The Barnes objective analysis – fundamentals and extension to statistical N-dimensional analysis

The Barnes scheme was originally conceived as an iterative algorithm aiming to interpolate a set of sparse data over a Cartesian grid (Barnes, 1964), and it was inspired by the successive correction scheme of Cressman (1959). The first iteration of the algorithm calculates a weighted space-averaged field, g⁰, over a Cartesian grid from the sampled scalar field, f. The mean field is then iteratively modified by adding contributions to recover features characterized by shorter wavelengths, which are inevitably damped by the initial averaging process. In this work, we adopt the most classical form of the Barnes scheme as follows:

$$ g^0_i = \frac{\sum_j w_{ij} f_j}{\sum_j w_{ij}}, \qquad g^m_i = g^{m-1}_i + \frac{\sum_j w_{ij} \left[ f_j - \phi\left(g^{m-1}, \mathbf{r}_j\right) \right]}{\sum_j w_{ij}}, \tag{1} $$

where g^m_i is the average field at the ith grid node with coordinates r_i (bold symbols indicate vectorial quantities) for the mth iteration, f_j is the scalar field sampled at the location r_j, and φ represents the linear interpolation operator from the Cartesian grid to the sample location. The weights w_ij, used for the sample acquired at the location r_j in the calculation of the statistics of f at the grid node with coordinates r_i, are defined as follows:

$$ w_{ij} = e^{-\frac{\left|\mathbf{r}_j - \mathbf{r}_i\right|^2}{2\sigma^2}}, \tag{2} $$

where σ is referred to as the smoothing parameter, and |·| indicates the Euclidean norm. For practical reasons, the summations over j are performed over the neighboring points included in a ball with finite radius R_max (also called the radius of influence) centered at the ith grid point. In this work, following Barnes (1964), we select R_max = 3σ, which encompasses 99.7 % and 97 % of the volume of the weighting function in 2D and 3D, respectively. In the literature, there is a lack of consensus regarding the selection of the total number of iterations (Barnes, 1964; Achtemeier, 1989; Smith and Leslie, 1984; Seaman, 1989) and of the smoothing parameter (Barnes, 1994a; Caracena, 1987; Pauley and Wu, 1990).
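The iterative scheme above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code: the function names are invented, and a nearest-grid-node lookup stands in for the linear back-interpolation operator φ; it also assumes every grid node has at least one sample within the radius of influence.

```python
import numpy as np

def barnes_mean(samples_xy, samples_f, grid_xy, sigma=1.0, m_iter=3):
    """Iterative Barnes scheme (sketch of Eqs. 1-2): Gaussian distance
    weights, radius of influence R_max = 3*sigma, nearest-node lookup
    standing in for the back-interpolation operator phi."""
    r_max = 3.0 * sigma
    # pairwise squared distances between grid nodes (rows) and samples (cols)
    d2 = ((grid_xy[:, None, :] - samples_xy[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w[d2 > r_max ** 2] = 0.0      # truncate beyond the radius of influence
    wsum = w.sum(1)               # assumes wsum > 0 at every grid node
    g = w @ samples_f / wsum      # 0th iteration: weighted space average
    nearest = d2.argmin(0)        # closest grid node for each sample
    for _ in range(m_iter):
        # add back shorter-wavelength contributions damped by the first pass
        g = g + w @ (samples_f - g[nearest]) / wsum
    return g
```

For a spatially constant field, the correction terms vanish and the scheme returns the constant at every node, which is a quick sanity check of the weighting logic.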
A reduction of the smoothing parameter, σ , as a function of the iteration, m, was originally proposed by Barnes (1973); however, this approach turned out to be detrimental in terms of noise suppression (Barnes, 1994c).
In the frequency domain, the Barnes objective analysis is tractable as a low-pass filter applied to a scalar field, f, with a response that is a function of the spatial wavelength, depending on the smoothing parameter, σ, and the number of iterations, m. This feature has been exploited in meteorology to separate small-scale from mesoscale motions (Doswell, 1977; Maddox, 1980; Gomis and Alonso, 1990). The spectral behavior of the Barnes scheme has traditionally been characterized by calculating the so-called continuous response at the mth iteration, D^m(k), with k being the wavenumber vector. D^m(k) is defined as the ratio of the amplitude of the Fourier mode e^{ik·x} (with i = √−1) in the reconstructed field, g^m, to its amplitude in the input field, f, in the limit of a continuous distribution of samples and an infinite domain. The analytical expression for the continuous response was provided by Barnes (1964) and Pauley and Wu (1990) for 1D and 2D domains, respectively, while, in the context of LiSBOA, it is extended to N dimensions to enhance its applicability. Furthermore, besides the spatial variability of f, the temporal coordinate, t, is introduced to determine the response of the statistical moments of f.
We consider a continuous scalar field, f(x, t), which is defined over an N-dimensional domain, x. It is further assumed that the field f is ergodic in time. In practice, ergodic data can be obtained by selecting samples collected during a temporal window exhibiting stationary boundary conditions or, more generally, through a cluster analysis of discontinuous data (Machefaux et al., 2016; Bromm et al., 2018; Iungo et al., 2018; Zhan et al., 2019, 2020). By adopting the approach proposed by Pauley and Wu (1990), and by taking advantage of the isotropy of the Gaussian weights (Eq. 2), we can define the LiSBOA operator at the 0th iteration as follows:

$$ g^0(\mathbf{x}) = \frac{\int_V w\left(\left|\mathbf{x} - \mathbf{x}'\right|\right) \left[ \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} f\left(\mathbf{x}', t\right) \mathrm{d}t \right] \mathrm{d}\mathbf{x}'}{\int_V w\left(\left|\mathbf{x} - \mathbf{x}'\right|\right) \mathrm{d}\mathbf{x}'}, \tag{3} $$

where t₁ and t₂ are the initial and final times. The term within the square brackets represents the mean of f over the considered sampling interval [t₁, t₂], which is indicated as f̄. Moreover, to reconstruct a generic qth central statistical moment of the scalar field, f, it is sufficient to apply the LiSBOA operator of Eq. (3) to the fluctuations about f̄ raised to the qth power as follows:

$$ \overline{f'^q}(\mathbf{x}) = \frac{\int_V w\left(\left|\mathbf{x} - \mathbf{x}'\right|\right) \left[ \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \left( f - \overline{f} \right)^q \mathrm{d}t \right] \mathrm{d}\mathbf{x}'}{\int_V w\left(\left|\mathbf{x} - \mathbf{x}'\right|\right) \mathrm{d}\mathbf{x}'}. \tag{4} $$

For practical applications, the mean field f̄ is generally not known, but it can be approximated by the LiSBOA output, g^m, interpolated at the sample locations through the operator φ. By comparing Eq. (4) with Eq. (3), it is clear that the response function of any central moment of order higher than one is equal to the 0th-iteration response of the mean, g⁰. Indeed, Eq. (4) can be interpreted as the 0th iteration of the LiSBOA spatial operator (Eq. 3) applied to the fluctuation field raised to the qth power. By leveraging the convolution theorem, it is possible to calculate the response function of the mean at the 0th iteration of LiSBOA in the frequency domain (see Appendix A for more details). This result, combined with the recursive formula of Barnes (1964) for the response at the generic iteration, m, provides the spectral response of LiSBOA for the mean as follows:

$$ D^m = 1 - \left(1 - D^0\right)^{m+1}, \qquad D^0 = \prod_{p=1}^{N} e^{-\frac{\pi^2 \sigma^2}{2 n_p^2}}, \tag{5} $$

where n is the half-wavelength vector associated with k. Equation (5) states that, for a given wavenumber (i.e., half-wavelength), the respective amplitude of the interpolated scalar field, g^m, is equal to that of the original scalar field, damped as a function of the smoothing parameter, σ, and the number of iterations, m. This implies that the parameters σ and m should be selected so as to avoid significant damping of the wavelengths of interest, i.e., those dominating the spatial variability of the scalar field under investigation.
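The spectral response just described can be evaluated numerically. The following sketch assumes the Gaussian-filter response exp(−π²/(2(n/σ)²)) per direction, combined with the Barnes recursion D^m = D^(m−1) + D⁰(1 − D^(m−1)); the function name and arguments are illustrative.

```python
import numpy as np

def lisboa_response(n_over_sigma, m, ndim=1):
    """Theoretical response D^m for half-wavelength-to-smoothing ratio
    n/sigma after m iterations. The per-direction 0th-iteration response
    is exp(-pi^2 / (2 (n/sigma)^2)); for an isotropic mode in ndim
    dimensions the responses multiply. The Barnes recursion collapses to
    the closed form D^m = 1 - (1 - D^0)^(m+1)."""
    d0 = np.exp(-np.pi ** 2 / (2.0 * np.asarray(n_over_sigma, float) ** 2)) ** ndim
    return 1.0 - (1.0 - d0) ** (m + 1)
```

The closed form makes the trade-off explicit: increasing m raises the response at all wavelengths, while well-resolved modes (large n/σ) already start near unity at the 0th iteration.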
For real applications, the actual LiSBOA response function can depart from the abovementioned theoretical response (Eq. 5) for the following reasons: (i) the convolution integral in Eq. (3) is calculated over a ball of finite radius R_max; (ii) f is sampled over a discrete domain, which introduces related limitations, such as the risk of aliasing (Pauley and Wu, 1990); (iii) the distribution of the sampling points is usually irregular and nonuniform, leading to larger errors where a lower sample density is present (Smith et al., 1986; Smith and Leslie, 1984; Buzzi et al., 1991; Barnes, 1994a) or in proximity to the domain boundaries (Achtemeier, 1986); and (iv) an error is introduced by the back-interpolation function, φ, from the Cartesian grid, r_i, to the location of the samples, r_j (Eq. 1) (Pauley and Wu, 1990).
Before proceeding with further analysis, it is necessary to address the applicability of LiSBOA to anisotropic and multi-chromatic scalar fields. Generally, the application of LiSBOA with an isotropic weighting function is not recommended in the case of severe anisotropy of the field and/or of the data distribution. At the early stages of objective analysis techniques, the use of an anisotropic weighting function proved to be beneficial for increasing accuracy while highlighting patterns elongated along a specific direction, based on empirical (Endlich and Mancuso, 1968) and theoretical arguments (Sasaki, 1971). Furthermore, the adoption of a directional smoothing parameter, σ_p, where p is a generic direction, allows maximizing the utilization of the data retrieved through inherently anisotropic measurements, such as the line-of-sight fields detected by remote sensing instruments (Askelson et al., 2000; Trapp and Doswell, 2000). With this in mind, we propose a linear scaling of the physical coordinates before the application of LiSBOA to recover a pseudo-isotropic velocity field. The scaling reads as follows:

$$ \tilde{x}_p = \frac{x_p - x^*_p}{n_{0,p}}, \tag{6} $$

where x* is the origin of the scaled reference frame, and n_{0,p} is the scaling factor for the pth direction. Hereinafter, the tilde (~) refers to the scaled frame of reference. From a physical standpoint, the scaling is equivalent to the adoption of an anisotropic weighting function, while the rescaling approach is preferred to ensure generality with respect to the mathematical formulation outlined in this section. The scaling factor, n₀, is an important parameter in the present framework and is referred to as the fundamental half-wavelength, while the associated Fourier mode is denoted as the fundamental mode. The selection of the fundamental half-wavelength should be guided by a priori knowledge of the dominant length scales of the flow in the various directions.
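The coordinate scaling of Eq. (6) is a one-line, per-direction operation; a minimal sketch with illustrative names:

```python
import numpy as np

def scale_coords(x, x_star, n0):
    """Linear scaling of Eq. (6): x_tilde_p = (x_p - x*_p) / n0_p,
    where n0_p is the fundamental half-wavelength in the pth direction."""
    return (np.asarray(x, float) - np.asarray(x_star, float)) / np.asarray(n0, float)
```

A mode with directional half-wavelengths equal to n₀ maps to an isotropic mode of unit half-wavelength in the scaled frame, which is the property the isotropic Gaussian weights rely on.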
Modes exhibiting degrees of anisotropy different to that of the selected fundamental mode will not be isotropic in the scaled mapping, which leads to the following two consequences: first, their response will not be optimal, in the sense that the shortest directional wavelength can produce excessive damping of the specific mode (Askelson et al., 2000); second, the shape preservation of such nonspherical features in the field reconstructed through LiSBOA is not ensured (Trapp and Doswell, 2000).
Regarding the reconstruction of the flow statistics through LiSBOA, two categories of error can be identified. The first is the statistical error due to the finite number of samples of the scalar field, f, available in time. This error is strictly connected with the local turbulence statistics, the sampling rate, and the duration of the experiment. The second error category is the spatial sampling error, which is due to the discrete sampling of f in the spatial domain x. The Petersen-Middleton theorem (Petersen and Middleton, 1962) states that the reconstruction of a continuous and band-limited signal from its samples is possible if, and only if, the spacing of the sampling points is small enough to ensure nonoverlapping of the spectrum of the signal with the replicas distributed over the so-called reciprocal lattice (or grid). The latter is defined as the Fourier transform of the specific sampling lattice. The 1D version of this theorem is the well-known Shannon-Nyquist theorem (Shannon, 1984). An application of this theorem to nonuniformly distributed samples, like those measured by remote sensing instruments, is unfeasible due to the lack of periodicity of the sampling points. To circumvent this issue, we adopt the approach suggested by Koch et al. (1983), who defined the random data spacing, d, as the equivalent distance that a certain number of samples, N_exp, enclosed in a certain region would have if they were uniformly distributed over a structured Cartesian grid. The generalized form of the random data spacing reads as follows:

$$ \tilde{d} = \left( \frac{V}{N_{\exp}} \right)^{1/N}, \tag{7} $$

where V is the volume of the hypersphere with radius R_max = 3σ centered at the specific grid point, and N_exp represents the number of non-colocated sample locations included within the hypersphere. Then, the Petersen-Middleton theorem for the reconstruction of the generic Fourier mode of half-wavelength n can be translated into the following constraint:

$$ \frac{\tilde{d}}{\tilde{n}} < 1. \tag{8} $$

Violation of this inequality (Eq. 8) leads to local aliasing, with the energy content of the undersampled wavelengths being added to the low-frequency part of the spectrum.
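A hedged sketch of the random-data-spacing computation and the resulting sampling check, assuming the spacing is the Nth root of the hypersphere volume per sample (function names are illustrative):

```python
from math import gamma, pi

def random_data_spacing(n_exp, r_max, ndim):
    """Random data spacing (Eq. 7 sketch): the equivalent uniform spacing
    of n_exp samples within the hypersphere of radius r_max = 3*sigma."""
    volume = pi ** (ndim / 2) / gamma(ndim / 2 + 1) * r_max ** ndim
    return (volume / n_exp) ** (1.0 / ndim)

def is_resolved(d, n):
    """Petersen-Middleton-type constraint (Eq. 8): the mode of
    half-wavelength n is considered resolved only if d/n < 1."""
    return d / n < 1.0
```

In 1D the "hypersphere" is the interval of length 2·r_max, so 6 samples within r_max = 3 give a unit spacing, right at the edge of resolving a unit half-wavelength.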

3 LiSBOA assessment through Monte Carlo simulations
The spectral response of LiSBOA is studied through the Monte Carlo method. The goal of the present section is twofold, namely validating the analytical response of the mean and variance (Eq. 5) and characterizing the sampling error of LiSBOA as a function of the random data spacing. For these aims, a synthetic 3D scalar field is generated, while its temporal variability is reproduced locally by randomly sampling a normal probability density function. Specifically, the synthetic scalar field is as follows:

$$ f(x, y, z, t) = 1 + \sin\left(\frac{\pi x}{n}\right)\sin\left(\frac{\pi y}{n}\right)\sin\left(\frac{\pi z}{n}\right) + \left[ 1 + \sin\left(\frac{\pi x}{n}\right)\sin\left(\frac{\pi y}{n}\right)\sin\left(\frac{\pi z}{n}\right) \right]^{0.5} \aleph(0, 1), \tag{9} $$

where ℵ is a generator of random numbers with a normal probability density function with zero mean and unit standard deviation. The constant of one in the two terms on the right-hand side (RHS) of Eq. (9) does not affect the LiSBOA response and is introduced to obtain both the mean and the variance of f equal to the following function:

$$ \overline{f} = \overline{f'^2} = 1 + \sin\left(\frac{\pi x}{n}\right)\sin\left(\frac{\pi y}{n}\right)\sin\left(\frac{\pi z}{n}\right). \tag{10} $$

It is noteworthy that f̄ − 1 is a monochromatic isotropic function.
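The synthetic field can be generated as follows; a sketch assuming an isotropic half-wavelength n and NumPy's normal generator standing in for ℵ (names and defaults are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_field(x, y, z, n=2.0, rng=rng):
    """Monte Carlo test field (Eq. 9 sketch). By construction, both the
    time mean and the variance of f equal
    1 + sin(pi x/n) sin(pi y/n) sin(pi z/n)."""
    harm = np.sin(np.pi * x / n) * np.sin(np.pi * y / n) * np.sin(np.pi * z / n)
    # multiplicative sqrt(1 + harm) makes the local variance track the mean
    return 1.0 + harm + (1.0 + harm) ** 0.5 * rng.standard_normal(np.shape(x))
```

At a point where the harmonic term equals one (e.g., x = y = z = n/2), repeated draws should exhibit a sample mean and a sample variance both close to 2.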
An experimental sampling process is mimicked by evaluating the scalar field f through randomly and uniformly distributed samples collected at the locations r_j. The latter are distributed within a cube spanning the range ±10σ in the three Cartesian directions. The total number of sampling points considered for each realization, N_s, is varied from 500 up to 20 000 to explore the effects of the sample density on the error. The sampling process is repeated L times for each given distribution of N_s points to capture the variability in the field introduced by the operator ℵ. The whole procedure can be considered as an idealized lidar experiment, where a scan including N_s sampling points is performed L times to probe an ergodic turbulent velocity field.
Since the response is only a function of n/σ and m (Eq. 5), for the spectral characterization of LiSBOA, the parameter n/σ is varied among the values [1, 2, 3, 4, 5]. An implementation of the LiSBOA algorithm for discrete samples is then applied to reconstruct the mean, g^m, and variance, v^m, of the scalar field, f, over a Cartesian structured grid, r_i, with a resolution of 0.25. Figure 1 depicts an example of the reconstruction of the mean scalar field, g^m, and its variance, v^m, from the Monte Carlo synthetic data set.
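The variance (or a generic qth moment) amounts to applying the 0th-iteration weighted average to the residuals raised to the qth power, per Eq. (4). An illustrative sketch, assuming the residuals f_j − φ(g^m, r_j) have already been computed at the sample locations:

```python
import numpy as np

def lisboa_moment(samples_xy, residual_pow_q, grid_xy, sigma=1.0):
    """Higher-order moment sketch (Eq. 4): 0th-iteration Gaussian-weighted
    average of the qth power of the precomputed fluctuations. No iterative
    correction is applied, since the response of moments of order > 1
    equals the 0th-iteration response of the mean."""
    d2 = ((grid_xy[:, None, :] - samples_xy[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w[d2 > (3.0 * sigma) ** 2] = 0.0   # radius of influence R_max = 3*sigma
    return w @ residual_pow_q / w.sum(1)
```

With q = 2 this returns the variance field v⁰; spatially uniform squared residuals are reproduced exactly, which is a quick consistency check.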
For the error quantification, the 95th percentile of the absolute error calculated over the grid points r_i (AE_95 hereinafter) is adopted as follows:

$$ AE_{95}\left(n/\sigma, m, N_s, L\right) = \underset{i}{\operatorname{prctile}_{95}} \left| g^m(\mathbf{r}_i) - \left\{ 1 + D^m \left[ \overline{f}(\mathbf{r}_i) - 1 \right] \right\} \right|. \tag{11} $$

The AE_95 quantifies the discrepancy between the outcome of LiSBOA and the analytical input damped by the theoretical response evaluated over the Cartesian grid. As highlighted in Eq. (11), the expected value of AE_95 is a function of the half-wavelength over the smoothing parameter, n/σ, the number of iterations, m, the number of samples, N_s, and the number of realizations, L. To investigate the link between AE_95 and the abovementioned parameters, the Pearson correlation coefficients are analyzed (Table 1). The number of samples, N_s, which is inversely proportional to the data spacing, d (Eq. 7), is the variable exhibiting the strongest correlation with the error for both the mean and the variance. This indicates, as expected, that a larger number of samples for each measurement realization is always beneficial for the estimates of the statistics of the scalar field, f. Furthermore, the negative signs of the correlations ρ(AE_95, N_s) and ρ(AE_95, n/σ) corroborate the hypothesis that the ratio d/n, i.e., the random data spacing per half-wavelength, is the main driving factor for the sampling error (Koch et al., 1983; Barnes, 1994a; Caracena et al., 1984).
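A sketch of the error metric: the reference is the analytical input damped by the theoretical response, and the unit offset of Eq. (9) is assumed to pass through undamped (an assumption of this illustration; the function name is invented).

```python
import numpy as np

def ae95(g_rec, f_ref, response):
    """95th-percentile absolute error (AE95 sketch): discrepancy between
    the LiSBOA output g_rec and the analytical input f_ref damped by the
    theoretical response, with the constant offset of 1 preserved."""
    damped = 1.0 + response * (f_ref - 1.0)
    return np.percentile(np.abs(g_rec - damped), 95)
```

By construction, a reconstruction that matches the damped reference exactly yields AE_95 = 0, and a uniform bias shows up one-to-one in the metric.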
The small positive correlation ρ(AE_95, m) detected for the mean is due to an amplification of the error occurring during the iterative process (Barnes, 1964). This issue will be discussed in more detail in Sect. 4. For the variance, ρ(AE_95, m) is practically negligible, confirming that the response of the higher-order statistics is insensitive to the number of iterations, m. Finally, the negative correlations with L show that the statistical error is inversely proportional to the number of realizations collected. The dependence ρ(AE_95, L) is mainly due to the statistical error connected with the temporal sampling; thus, the number of realizations, L, is progressively increased until convergence of the AE_95 is achieved. Figure 2 displays the behavior of the error as a function of N_s and L. The values displayed represent the median over all wavelengths and iterations, with the AE_95 being only mildly dependent on these parameters. As Fig. 2 shows, increasing the number of realizations, L, beyond 100 has a negligible effect on the error; thus, a final value of L = 200 is selected for the remainder of this analysis.
To verify the analytical response of the mean and variance of the scalar field, f, a numerical estimator of the response is defined as the median in space of the ratio between the field reconstructed via LiSBOA and the expected value of the synthetic input, as follows:

$$ \widehat{D}^m = \underset{i}{\operatorname{median}} \left[ \frac{g^m(\mathbf{r}_i) - 1}{\overline{f}(\mathbf{r}_i) - 1} \right]. \tag{12} $$

In the calculation of the numerical response through Eq. (12), the influence of the edges is removed by rejecting points closer than R_max to the boundaries of the numerical domain. Furthermore, the zero crossings of the synthetic sine function (|f̄ − 1| < 0.1) are excluded to avoid singularities. A comparison between the actual and the theoretical response (Eq. 5) for several wavelengths of the input function is reported in Fig. 3 for the case with the highest number of samples, N_s = 20 000. An excellent agreement is observed between the theoretical prediction and the Monte Carlo outcome, which indicates that, in the limit of negligible statistical error (large L) and adequate sampling (large N_s and near-uniformly distributed samples), the response approaches the predictions obtained from the developed theoretical framework.
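The numerical response estimator can be written compactly (illustrative names; the zero-crossing threshold of 0.1 follows the text):

```python
import numpy as np

def numerical_response(g_rec, f_mean, zero_tol=0.1):
    """Numerical response estimator (Eq. 12 sketch): median over grid
    points of the ratio between the reconstructed and input harmonic
    amplitudes, excluding near-zero-crossings |f_mean - 1| < zero_tol
    to avoid singular ratios."""
    keep = np.abs(f_mean - 1.0) >= zero_tol
    return np.median((g_rec[keep] - 1.0) / (f_mean[keep] - 1.0))
```

If the reconstruction is exactly the input harmonic damped by a factor D, the estimator returns D, which is the property exploited in the comparison of Fig. 3.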
The trend of the response of the mean (Fig. 3a) suggests that, for a given wavelength, the same response can be achieved with an infinite number of combinations of σ and m; specifically, a larger σ requires a larger number of iterations, m, to achieve a certain response, D^m. It is noteworthy that, for a smaller number of iterations, m, the slope of the response function is lower. This feature can be beneficial for practical applications, for which the LiSBOA response will exhibit small changes for small variations of n. However, a lower slope of the response function can be disadvantageous for the suppression of short-wavelength noise. Figure 3b confirms that the response of the variance (and, similarly, of higher-order statistics) is not a function of the total number of iterations, m, and is equal to the response of the mean at the 0th iteration, D⁰.
Finally, the link between the error and the random data spacing, d, is investigated. In Fig. 4, the discrepancy with respect to the theory, quantified by the AE95, is plotted versus the random data spacing normalized by the half wavelength for a fixed total number of iterations, m = 5. The values displayed on the x axis represent the median over all grid points, ri. This analysis reveals a strong correlation between the normalized random data spacing and the error, corroborating that, in the limit of negligible statistical error (i.e., a high number of realizations, L), the uncertainty is mainly driven by the local data density normalized by the wavelength, which is related to the Petersen-Middleton criterion. Indeed, the cases satisfying the Petersen-Middleton constraint (Eq. 8) are those exhibiting an AE95 smaller than ∼ 40 % of the amplitude of the harmonic function f for both the mean and variance. However, if a smaller error is needed, it will be necessary to reduce the maximum threshold value for d/Δn.
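The AE95 metric used throughout this characterization is simple to evaluate; the following sketch (the function name and interface are ours, not from the original implementation) computes the 95th percentile of the absolute error between a reconstructed statistic and its reference over the grid nodes.

```python
import numpy as np

def ae95(field_estimate, field_reference):
    """95th percentile of the absolute error between a reconstructed
    statistic and its reference over the grid nodes (illustrative sketch;
    name and interface are not from the original paper)."""
    abs_err = np.abs(np.asarray(field_estimate) - np.asarray(field_reference))
    return np.percentile(abs_err, 95)

# toy usage: error of a perturbed estimate of a sine field
x = np.linspace(0.0, 2.0 * np.pi, 500)
reference = np.sin(x)
estimate = reference + 0.05 * np.cos(7.0 * x)  # deterministic perturbation
err = ae95(estimate, reference)
```

Because the perturbation amplitude is 0.05, the resulting AE95 is bounded by 0.05 for this toy case.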

Guidelines for an efficient application of LiSBOA to wind lidar data
An efficient application of LiSBOA to lidar data relies on the appropriate selection of the parameters of the algorithm, namely the fundamental half wavelengths, Δn0, the smoothing parameter, σ, the number of iterations, m, and the spatial discretization of the Cartesian grid, dx. Furthermore, the data collection strategy must be designed to ensure adequate sampling of the spatial wavelengths of interest so that the Petersen-Middleton constraint (Eq. 8) is satisfied. In this section, we show that the underpinning theory of LiSBOA, along with an estimate of the properties of the flow under investigation, can guide the optimal design of a lidar experiment and the evaluation of the statistics for a turbulent ergodic flow. The whole procedure can be divided into three phases, namely characterization of the flow, design of the experiment, and reconstruction of the statistics from the collected data set. First, the integral quantities of the flow under investigation required for the application of LiSBOA need to be estimated, such as the extension of the spatial domain of interest, the characteristic length scales, the integral timescale, τ, the characteristic temporal variance of the velocity, u′², and the expected total sampling time, T, which depends on the typical duration of stationary boundary conditions over the domain. These estimates can be based on previous studies available in the literature, numerical simulations, or preliminary measurements.
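To make the role of these parameters concrete, the following one-dimensional sketch implements the iterative Barnes scheme that LiSBOA builds upon: Gaussian weights with smoothing parameter σ, a radius of influence Rmax = 3σ, and m residual-correction passes. It is an illustrative toy under our own naming, not the authors' implementation; the coordinate scaling and N-dimensional machinery of LiSBOA are omitted.

```python
import numpy as np

def barnes_iterative(x_samples, f_samples, x_grid, sigma, m, r_max_factor=3.0):
    """1D sketch of the iterative Barnes objective analysis: Gaussian-weighted
    averages of scattered samples on a grid, followed by m residual-correction
    passes (illustrative, not the authors' code)."""
    r_max = r_max_factor * sigma
    # Gaussian weights between every grid node and every sample, cut at r_max
    dist = np.abs(x_grid[:, None] - x_samples[None, :])
    w = np.exp(-dist**2 / (2.0 * sigma**2))
    w[dist > r_max] = 0.0
    wsum = w.sum(axis=1)

    def analysis(values):
        # weighted average at each grid node; NaN where no sample is in range
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(wsum > 0, (w @ values) / wsum, np.nan)

    g = analysis(f_samples)                      # 0th iteration
    for _ in range(m):
        # residuals at the sample locations (grid field interpolated back)
        resid = f_samples - np.interp(x_samples, x_grid, g)
        g = g + analysis(resid)                  # residual-correction pass
    return g

# toy usage: recover a sine sampled at scattered locations
rng = np.random.default_rng(0)
x_s = np.sort(rng.uniform(0.0, 2.0 * np.pi, 400))
f_s = np.sin(x_s)
x_g = np.linspace(0.0, 2.0 * np.pi, 64)
g0 = barnes_iterative(x_s, f_s, x_g, sigma=0.3, m=0)
g5 = barnes_iterative(x_s, f_s, x_g, sigma=0.3, m=5)
```

With m = 0, the Gaussian averaging leaves a visible amplitude deficit; the residual-correction passes progressively restore the amplitude of the resolved wavelength, consistent with the response behavior discussed in the previous section.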
Then, it is necessary to define the fundamental half wavelengths, Δn0, which are required for the coordinate scaling (Eq. 6). Imposing fundamental half wavelengths equal to (or even smaller than) the estimated characteristic length scales of the smallest spatial features of interest in the flow is advisable. This ensures isotropy of the mode associated with the fundamental half wavelength (and of all the modes characterized by the same degree of anisotropy) and guides the selection of the main input parameters of the LiSBOA algorithm, i.e., the smoothing parameter, σ, and the number of iterations, m. Indeed, Δn0 can be considered as the cut-off half wavelength of the spatial low-pass filter represented by the LiSBOA operator. To this end, it is necessary to select σ and m to obtain a response of the mean associated with the fundamental mode, Dm(Δñ0), as close as possible to one. After the coordinate scaling (Eq. 6), the response of the fundamental mode is universal, and it is reported in Fig. 5. For instance, if we select a response equal to 0.95, then all the points lying on the isocontour defined by the equality Dm(Δñ0) = 0.95 give, in theory, the same response for the mean of the scalar field f. This implies that an infinite number of σ-m combinations allow us to obtain a response of the mean equal to the selected value. However, with increasing σ, the response at the 0th iteration, D0(Δñ0), reduces, which indicates a lower response for the higher-order statistics. For the LiSBOA application, the following aspects should also be considered: the smaller the σ, the smaller the radius of influence of LiSBOA, Rmax, and, thus, the lower the number of samples averaged per grid node, Nexp, and the greater the statistical uncertainty; an excessively large m can lead to overfitting of the experimental data and noise amplification (Barnes, 1964); the higher the m, the higher the slope of the response function (see Fig. 3), which improves the damping of high-frequency noise but produces a larger variation in the response of the mean across spatial wavelengths; and the radius of influence, Rmax (and therefore σ), can affect the data spacing, d, in the case of a nonuniform data distribution.
A few handy combinations of the smoothing parameter and total number of iterations yielding Dm(Δñ0) = 0.95 are provided in Table 2.
As mentioned above, all these σ-m pairs allow us to achieve roughly the same response for the mean, while the response for the higher-order statistics reduces with an increasing number of iterations, m.
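The interplay between σ and m behind such tables follows from the classical iterative-response result of Barnes (1964): if the first pass has response D0, then m residual-correction passes yield Dm = 1 − (1 − D0)^(m+1), while the variance retains the 0th-iteration response D0. The sketch below uses illustrative D0 values (not the ones in Table 2) to show that a heavier first-pass smoothing demands more iterations to reach Dm = 0.95, at the cost of a lower response of the higher-order moments.

```python
def barnes_response(d0, m):
    """Response of the mean after m residual-correction iterations of the
    Barnes scheme, given the first-pass (0th-iteration) response d0
    (Barnes, 1964). The response of the variance remains d0."""
    return 1.0 - (1.0 - d0) ** (m + 1)

def iterations_for_target(d0, target=0.95):
    """Smallest m such that the response of the mean reaches `target`."""
    m = 0
    while barnes_response(d0, m) < target:
        m += 1
    return m

# illustrative first-pass responses: heavier smoothing (smaller d0)
# needs more iterations, but keeps a lower variance response d0
for d0 in (0.4, 0.6, 0.8):
    m = iterations_for_target(d0)
    print(f"D0 = {d0:.1f} -> m = {m}, Dm = {barnes_response(d0, m):.3f}")
```

For instance, D0 = 0.4 requires m = 5 to reach Dm ≥ 0.95, whereas D0 = 0.8 reaches it with a single iteration.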
As a final remark about the selection of Δn0, we should consider that, if the fundamental half wavelength is too large compared to the dominant modes in the flow, small-scale spatial oscillations of f will be smoothed out during the calculation of the mean, with the consequence of underestimated gradients and incorrect estimates of the high-order statistics due to the dispersive stresses (Arenas et al., 2019). On the other hand, the selection of an overly small Δn0 would require an excessively fine data spacing to satisfy the Petersen-Middleton constraint (Eq. 8), which may lead to an overly long sampling time, or it may even exceed the sampling capabilities of the lidar.
The optimal lidar scanning strategy aimed at characterizing atmospheric turbulent flows implies finding a trade-off between a sufficiently fine data spacing, which is quantified through d in the present work (Eq. 7), and an adequate number of time realizations, L, to reduce the temporal statistical uncertainty. Considering a total sampling period, T, for which statistical stationarity can be assumed, and a pulsed lidar that scans Nr points evenly spaced along the lidar laser beam, with a range gate Δr and accumulation time τa, the total number of collected velocity samples is equal to Ns = Nr · T/τa. The angular resolution of the lidar scanning head in azimuth (Δθ, for plan position indicators, PPIs), elevation (Δβ, for range height indicators, RHIs), or both axes (for volumetric scans) can be selected to modify the angular spacing between consecutive lines of sight (i.e., the data spacing) and the total sampling period of a single scan, τs (i.e., the number of realizations, L).
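The bookkeeping in the paragraph above can be sketched as follows; the numbers are illustrative example settings (not prescriptions from the paper), and the single-scan duration is estimated simply as the number of beams times the accumulation time.

```python
# Illustrative bookkeeping for the sampling parameters discussed above
# (numbers are example settings, not prescriptions from the paper).
T = 600.0        # total sampling period [s]
tau_a = 0.5      # accumulation time per beam [s]
N_r = 39         # range gates per beam
span_deg = 20.0  # azimuth span of a PPI scan [deg]
d_theta = 2.5    # azimuth resolution [deg]

beams_per_scan = int(span_deg / d_theta) + 1   # lines of sight per scan
tau_s = beams_per_scan * tau_a                 # duration of a single scan [s]
L = int(T / tau_s)                             # number of scan realizations
N_s = N_r * T / tau_a                          # total velocity samples, Ns = Nr*T/tau_a

print(beams_per_scan, tau_s, L, N_s)
```

Refining Δθ increases the number of beams per scan and thus τs, which, for a fixed T, reduces L: this is precisely the spatial-resolution versus statistical-convergence trade-off discussed above.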
The design of a lidar scan aiming to reconstruct the turbulent statistics of an ergodic flow through LiSBOA can be formalized as a two-objective (or Pareto front) optimization problem. The first cost function of the Pareto front, referred to as εI, is the percentage of grid nodes for which the Petersen-Middleton constraint, applied to the smallest half wavelength of interest (i.e., Δn0), is not satisfied. With respect to the scaled reference frame, this can be expressed through Eq. (13), where the square brackets are Iverson brackets and Ni is the total number of nodes of the Cartesian grid, ri. For a more conservative formulation, rejecting all the points with a distance smaller than Rmax from an undersampled grid node, i.e., one with d > 1, is recommended. This condition ensures that the statistics are based solely on regions that are adequately sampled. The cost function εI depends not only on the angular resolution but also on Rmax, which is equal to 3σ in this work. In general, increasing σ results in a larger number of samples considered for the calculation of the statistics at each grid point, ri, and, thus, in a reduction in εI. Therefore, a larger σ entails a larger percentage of the spatial domain fulfilling the Petersen-Middleton constraint. The smoothing parameter, σ, also plays a fundamental role in the response of the higher-order statistical moments. Specifically, if the reconstruction of the variance or higher-order statistics is important, the response D0(Δñ0) should be included in the Pareto front analysis as an additional constraint.

Table 2. Selected combinations of σ and m for achieving a ∼ 95 % recovery of the mean of the selected fundamental half wavelength and the associated response of the higher-order moments (HOM).

The second cost function for the optimal design of lidar scans, εII, is equal to the standard deviation of the sample mean, which, for an autocorrelated signal, is given by Eq. (14) (Bayley and Hammersley, 1946),
where ρp is the autocorrelation function at lag p, τ is the integral timescale, and the approximation is based on George et al. (1978). The velocity variance, u′², and the autocorrelation, ρp, are functions of space; however, to a good degree of approximation, they can be replaced by a representative value and considered as being uniform in space. Figure 6 shows the standard deviation of the sample mean normalized by the standard deviation of the velocity as a function of the number of realizations, L, and for different integral timescales, τ. It is noteworthy that the standard deviation of the sample mean represents the uncertainty of the time average at each measurement point, rj, while the final uncertainty of the mean field at the grid nodes, ri, is generally reduced by the spatial averaging process intrinsic to LiSBOA. Note that the estimates of the statistical error obtained through LiSBOA do not consider other sources of error, such as the accuracy of the instruments and the spatial averaging due to the lidar measuring process (Rye and Hardesty, 1993; O'Connor et al., 2010; Puccioni and Iungo, 2020). If needed, other error estimates can be combined with the sampling error estimated through LiSBOA for a more comprehensive error analysis (Wheeler and Ganji, 2010b). Furthermore, LiSBOA allows the calculation of velocity statistics including contributions of eddies with different sizes, which span from the largest eddy advected within the total sampling time to the smallest eddy detectable for a given accumulation time (Puccioni and Iungo, 2020). Therefore, a careful preprocessing of the lidar data should be performed to remove contributions due to nonturbulent mesoscale eddies (Högström et al., 2002; Metzger et al., 2007; O'Connor et al., 2010). The whole procedure for the design of a lidar scan and retrieval of the statistics is reported in the flow chart of Fig. 7.
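A minimal sketch of the two cost functions (εI and εII in our notation) under stated assumptions: εI is computed as the percentage of grid nodes whose scaled data spacing violates the Petersen-Middleton constraint (the Iverson bracket [d > 1]), and εII as the standard deviation of the sample mean of an autocorrelated signal via the standard exact expression; the closed-form approximation used in the paper is not reproduced here, and the function names and interfaces are ours.

```python
import numpy as np

def cost_I(d_tilde):
    """Percentage of grid nodes whose scaled data spacing violates the
    Petersen-Middleton constraint (Iverson bracket [d > 1])."""
    d_tilde = np.asarray(d_tilde)
    return 100.0 * np.mean(d_tilde > 1.0)

def cost_II(var_u, L, rho):
    """Standard deviation of the sample mean of an autocorrelated signal:
    sqrt(var/L * (1 + 2 * sum_{p=1}^{L-1} (1 - p/L) * rho_p)),
    where `rho` holds the autocorrelation at lags 1..L-1 (sketch)."""
    p = np.arange(1, L)
    corr_term = 1.0 + 2.0 * np.sum((1.0 - p / L) * np.asarray(rho)[: L - 1])
    return np.sqrt(var_u / L * corr_term)

# uncorrelated realizations recover the familiar sigma/sqrt(L) law
L = 100
sigma_mean_uncorr = cost_II(1.0, L, np.zeros(L - 1))
```

Positive autocorrelation inflates εII relative to the uncorrelated σ/√L value, which is why slower scans (fewer, more correlated realizations) pay a statistical penalty on the mean.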
Summarizing, from a preliminary analysis of the velocity field under investigation, we estimate the maximum total sampling time, T, the characteristic integral timescale, τ, the characteristic velocity variance, u′², and the fundamental half wavelengths, Δn0. This information, together with the settings of the lidar (namely the accumulation time, τa, the number of points per beam, Nr, and the gate length, Δr), allows the generation of the Pareto front as a function of Δθ and/or Δβ and for different values of σ. Based on the specific goals of the lidar campaign in terms of the coverage of the selected domain (i.e., εI), the statistical significance of the data (i.e., εII), and, eventually, the response of the higher-order statistical moments (i.e., D0(Δñ0)), the LiSBOA user should select the optimal angular resolution, Δθ and/or Δβ, and the set of allowable σ values. Due to the abovementioned nonideal effects on LiSBOA, the selection of σ is finalized during the postprocessing phase, when the lidar data set is available and the statistics can be calculated for different σ-m pairs. For the resolution of the Cartesian grid, Koch et al. (1983) suggested that it should be chosen as a fraction of the data spacing, which, in turn, is linked to the fundamental half wavelength; specifically, they suggested a grid spacing in the range dx ∈ [Δn0/3, Δn0/2]. In this work, we have used dx = Δn0/4, which ensures a good grid resolution with acceptable computational costs. By following the steps outlined in the present section, the mean, variance, or even higher-order statistical moments of the velocity field can be accurately reconstructed for the wavelengths of interest. It is worth mentioning that the LiSBOA of wind lidar data should always be combined with a robust quality control process of the raw measurements.
Indeed, the space-time averaging operated by LiSBOA makes the data analysis sensitive to the presence of data outliers, which need to be identified and rejected beforehand to prevent contamination of the final statistics. The interested reader is referred to Manninen et al. (2016), Beck and Kühn (2017), and Vakkari et al. (2019) for more information on the quality control of lidar data. On a final note, for applications of LiSBOA, the uncontrollable environmental conditions and the uncertainty in the flow characteristics needed as input for LiSBOA may pose some challenges, which will be discussed in more detail in Sect. 6.

LiSBOA validation against virtual lidar data
The LiSBOA algorithm is applied to a synthetic data set generated through the virtual lidar technique to assess its accuracy in the calculation of statistics for a wind turbine wake probed through a scanning lidar installed on the turbine nacelle. For this purpose, a simulator of a scanning Doppler pulsed wind lidar is implemented to extract the line-of-sight velocity from a numerical velocity field produced through high-fidelity large eddy simulations (LES). Due to their simplicity and low computational costs, lidar simulators have been widely used for the assessment of postprocessing algorithms of lidar data and scan design procedures (Stawiarski et al., 2015; Lundquist et al., 2015; Mirocha et al., 2015).
As a case study, we use the LES data set of the flow past a single turbine with the same characteristics as the NREL (National Renewable Energy Laboratory) 5-MW reference wind turbine (Jonkman et al., 2009). The rotor is three bladed and has a diameter D = 126 m. The tip-speed ratio of the turbine is set to its optimal value of 7.5. A uniform incoming wind with a free-stream velocity of U∞ = 10 m s−1 and a turbulence intensity of 3.6 % is considered. The rotor is simulated through an actuator disk with rotation, while the drag of the nacelle is taken into account using an immersed boundary method (Ciri et al., 2017). More details on the LES solver can be found in Santoni et al. (2015). The computational domain has dimensions Lx × Ly × Lz = 12D × 6D × 6D in the streamwise, spanwise, and vertical directions, respectively, and it is discretized with 960 × 256 × 300 uniformly spaced grid points, resulting in a spacing of dx = 0.0125D, dy = 0.025D, and dz = 0.0202D. A radiative condition is imposed at the outlet (Orlanski, 1976), while periodicity is applied in the spanwise direction. For the sake of generality, a uniform incoming wind is generated by imposing free-slip conditions at the top and bottom of the numerical domain. Ergodic velocity vector fields are available for a total time of T = 750 s.
For the estimation of the flow characteristics necessary for the scan design, the azimuthally averaged mean and standard deviation of the streamwise velocity, as well as the integral timescale, are considered (Fig. 8). The use of cylindrical coordinates is justified by the axisymmetry of the statistics of the wake velocity field generated by a turbine operating in a uniform velocity field (Iungo et al., 2013a; Viola et al., 2014; Ashton et al., 2016).
The streamwise LES velocity field shows the presence of a higher-velocity jet surrounding the nacelle, while u/U∞ exhibits a clear minimum placed at y/D ∼ 0.25 (Fig. 8a). These flow features are consistent with the double-Gaussian velocity profile typically observed in the near-wake region (Aitken and Lundquist, 2014). In Fig. 8b, the standard deviation of the streamwise velocity has high values in the very near wake (x/D < 1) in the proximity of the rotor axis, which is most probably connected with the vorticity structures generated in the proximity of the rotor hub and their dynamics (Iungo et al., 2013a; Viola et al., 2014; Ashton et al., 2016). Similarly, enhanced values of the velocity standard deviation occur at the wake boundary (r/D ≈ 0.5), which are connected with the formation and dynamics of the helicoidal tip vortices (Ivanell et al., 2010; Debnath et al., 2017c). A peak of √u′²/U∞ is observed around x/D ≈ 3, which can be considered as being the formation length of the tip vortices. The integral timescale is evaluated by integrating the biased sample autocorrelation function of the time series of u up to its first zero crossing (Zieba and Ramza, 2011). The integral timescale is generally smaller within the wake than the typical values observed in the free stream, which is consistent with the smaller dimensions of the wake vorticity structures compared to the larger energy-containing structures present in the incoming turbulent wind.
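The integral-timescale estimate described above can be sketched as follows for a one-dimensional time series; this is our illustrative rendering of the "integrate the biased autocorrelation up to its first zero crossing" recipe, not the authors' code.

```python
import numpy as np

def integral_timescale(u, dt):
    """Integral timescale from the biased sample autocorrelation of u,
    integrated (trapezoidal rule) up to its first non-positive value
    (cf. Zieba and Ramza, 2011). Illustrative sketch."""
    u = np.asarray(u, dtype=float)
    up = u - u.mean()
    n = up.size
    # biased autocorrelation: normalization by n at every lag
    acf = np.correlate(up, up, mode="full")[n - 1:] / (n * up.var())
    # first zero crossing (acf[0] = 1, so index is >= 1 if it exists)
    zero = int(np.argmax(acf <= 0.0)) if np.any(acf <= 0.0) else acf.size
    # trapezoidal integration of acf[0:zero]
    return dt * (acf[:zero].sum() - 0.5 * (acf[0] + acf[zero - 1]))

# usage sketch: exponentially correlated synthetic series (AR(1) process)
# with a known integral timescale of 5 s
rng = np.random.default_rng(1)
tau_true, dt = 5.0, 0.5
phi = np.exp(-dt / tau_true)
u = np.zeros(20000)
for i in range(1, u.size):
    u[i] = phi * u[i - 1] + rng.standard_normal()
tau_hat = integral_timescale(u, dt)
```

For this synthetic series, the estimate recovers the prescribed timescale to within the sampling noise of the autocorrelation tail.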
To reconstruct the mentioned flow features, the fundamental half wavelengths in the spanwise and vertical directions selected for this application of LiSBOA are Δn0,y = Δn0,z = 0.5D, which allows the retrieval of cross-stream spatial features of the velocity field as small as the rotor blade, as typically observed in the near wake (Aitken and Lundquist, 2014; Santoni et al., 2017). Furthermore, considering the streamwise elongation of the isocontours of the flow statistics shown in Fig. 8a, a conservative value of the fundamental half wavelength in the x direction, Δn0,x = 2.5D, is selected. This information could also have been inferred from previous studies (e.g., Chamorro and Porté-Agel, 2010; Abkar and Porté-Agel, 2013; Zhan et al., 2019).
The availability of the LES data set allows testing the relevance of the selected Δn0 by evaluating the 3D energy spectrum of u/U∞ and u′²/U∞² in the physical and scaled reference frames (Eq. 6). The spectra are azimuthally averaged by exploiting the axisymmetry of the wake. The spectra in the physical reference frame (Fig. 9a and b) reveal the clear signature of a streamwise elongation of the energy-containing scales for both the velocity mean and variance, with the energy being spread over a larger range of frequencies in the radial direction compared to the streamwise direction. After the scaling (Fig. 9c and d), the spectra become more isotropic in the spectral domain, namely the energy is distributed more equally along the k̃x and k̃r axes. In Fig. 9c, the blue dashed line represents the intersection with the k̃x-k̃r plane of the spherical isosurface that, in the wavenumber space, is characterized by Dm(Δñ0) = 0.95. All the modes contained within that sphere are reconstructed with a response Dm > 0.95, while higher-frequency features lying outside will be damped. Numerical integration of the 3D energy spectrum shows that 94 % of the total spatial variance of the mean is contained within that sphere, which ensures that the energy-containing modes in the mean flow are adequately reconstructed with the selected parameters.
The analysis of the flow statistics reported in Fig. 8 enables estimates of the flow parameters needed as input for LiSBOA. For instance, the wake region is characterized by √u′²/U∞ ≈ 0.1 and τU∞/D ≈ 0.4 (τ ≈ 5 s). A main limitation of lidars is represented by the spatiotemporal averaging of the velocity field, which is connected with the acquisition process. Three different types of smoothing mechanisms can occur during the lidar sampling. The first is the averaging along the laser beam direction within each range gate, which has commonly been modeled through the convolution of the actual velocity field with a weighting function within the measurement volume (Smalikho, 1995; Frehlich, 1997; Sathe et al., 2011). The second process is the time averaging associated with the sampling period required to achieve a backscattered signal with adequate intensity (O'Connor et al., 2010; Sathe et al., 2011), while the last one is the transverse averaging (azimuth-wise or elevation-wise) occurring in the case of a scanning lidar operating in continuous mode (Stawiarski et al., 2013). These filtering processes lead to a significant underestimation of the turbulence intensity (Sathe et al., 2011), an overestimation of the integral length scales (Stawiarski et al., 2015), and a damping of the energy spectra for increasing wavenumbers (Risan et al., 2018; Puccioni and Iungo, 2020).
A total of three versions of a lidar simulator are implemented for this work. The simplest one is referred to as the ideal lidar, which samples the LES velocity field at the experimental points through a nearest-neighbor interpolation. This method minimizes the turbulence damping while retaining the geometry of the scan and the projection of the wind velocity vector onto the laser beam direction. The second version of the lidar simulator reproduces a step-stare lidar, i.e., a lidar that scans for the entire duration of the accumulation time at a fixed direction of the laser beam. A total of two filtering processes take place for this configuration, namely the beam-wise convolution and the time averaging. To model the beam-wise average, the retrieval process of the Doppler lidar is reproduced using a spatial convolution along the beam (Eq. 15), where n is the lidar laser beam direction, u is the instantaneous velocity vector, and the dot indicates the scalar product. A triangular weighting function, φ(s), was proposed by Mann et al. (2010) (Eq. 16), where Δr is the gate length. The former expression is valid assuming matching time windowing, i.e., a gate length equal to the pulse width, and a velocity value retrieved based on the first moment of the backscattering spectrum. Despite its simplicity, Eq. (16) has been shown to estimate realistic turbulence attenuation due to the beam-wise averaging process of a pulsed Doppler wind lidar (Mann et al., 2009). Furthermore, time averaging occurs due to the accumulation time necessary for the lidar to acquire a velocity signal with sufficient intensity and, thus, signal-to-noise ratio. This process is modeled through a window average within the acquisition interval of each beam. For the sampling of the LES velocity field in space and time, a nearest-neighbor interpolation method is used.
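The beam-wise averaging can be sketched with a common form of the triangular weighting, φ(s) = (1 − |s|/Δr)/Δr for |s| < Δr and zero otherwise, normalized to unit integral; the paper's exact Eq. (16) is not reproduced here, and the function names and the discrete normalization are ours.

```python
import numpy as np

def triangular_weight(s, gate_length):
    """Triangular range-gate weighting phi(s) ~ 1 - |s|/dr for |s| < dr,
    normalized to unit integral (a common form; not the paper's Eq. 16)."""
    s = np.asarray(s, dtype=float)
    return np.clip(1.0 - np.abs(s) / gate_length, 0.0, None) / gate_length

def los_velocity(u_along_beam, s, gate_length):
    """Beam-wise averaged line-of-sight velocity at a gate centered at s = 0:
    discrete weighted average of the projected velocity along the beam."""
    w = triangular_weight(s, gate_length)
    return np.sum(w * u_along_beam) / np.sum(w)

# usage: along-beam coordinate relative to the gate center [m] and a
# linearly sheared projected velocity; the symmetric weighting leaves the
# gate-center value of a linear profile unchanged
s = np.linspace(-50.0, 50.0, 501)
u = 8.0 + 0.02 * s
u_los = los_velocity(u, s, gate_length=25.0)
```

For nonlinear velocity profiles (e.g., the sharp wake shear layer), the same averaging damps the small-scale content, which is the turbulence-attenuation mechanism discussed above.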
The third version of the lidar simulator mimics a pulsed lidar operating in continuous mode and performing PPI scans, where, in addition to the beam-wise convolution and time averaging, an azimuth-wise averaging occurs due to the variation in the azimuth angle of the scanning head during the scan. The latter is taken into account by averaging, in addition to the time average, all the data points included within the angular sector defined in Eq. (17), where r is the radial distance from the emitter, while θ and β are the associated azimuth and elevation angles, respectively. The subscript p refers to the pth lidar data point. Following the suggestions by Stawiarski et al. (2013), the out-of-plane thickness, Δz, is considered equal to the length of the diagonal of a cell of the computational grid.
It is noteworthy that the accuracy estimated through the present analysis only includes the error due to the sampling in time and space and to the data retrieval. Other error sources, such as the accuracy of the instrument (Rye and Hardesty, 1993; O'Connor et al., 2010), are not included and should be coupled with the LiSBOA estimates for a more general error quantification (Wheeler and Ganji, 2010b). Figure 10a shows a snapshot of the streamwise velocity field over the horizontal plane at hub height obtained from the LES. The respective radial velocity data obtained from the three versions of the lidar simulator, considering a scanning pulsed wind lidar deployed at the turbine location and at hub height, highlight the increased spatial smoothing of the radial velocity field introduced by the various averaging processes connected with the lidar measuring process, namely beam-wise, temporal, and azimuthal averaging (Fig. 10).
The application of LiSBOA requires the provision of the technical specifications of the lidar, specifically the accumulation time, τa, the number of gates, Nr, and the gate length, Δr. For this work, these parameters are selected based on the typical settings of the WindCube 200S and StreamLine XR lidars (El-Asha et al., 2017; Zhan et al., 2019, 2020), namely τa = 0.5 s, Nr = 39, and Δr = 25 m. Furthermore, to probe the wake region, a volumetric scan is conducted, consisting of several PPI scans with azimuth and elevation angles uniformly spanning the range ±10° with a constant angular resolution in both axes, while the virtual lidar is placed at the turbine hub.
With the information provided about the flow under investigation and the lidar system, it is possible to draw the Pareto front for the optimization of the lidar scan as a function of different combinations of the angular resolutions of the lidar scanning head, Δθ and Δβ, and the smoothing parameter of LiSBOA, σ, as shown in Fig. 11 for the case under investigation.
For the optimization of the lidar scan, the lidar angular resolution, Δθ, is evenly varied over a total of seven cases, from 0.75 to 4°, whereas three values of the ratio Δβ/Δθ, namely 0.5, 1, and 2, are tested separately. The four values of σ recommended in Table 2 to achieve a response of the mean Dm(Δñ0) = 0.95 are considered here. In Fig. 11, the markers indicate the different σ and, thus, the response of the high-order statistical moments, D0(Δñ0). Changing the ratio Δβ/Δθ affects the optimal Δθ (circled in black in Fig. 11); however, it has a negligible effect on the magnitude of the optimal εI and εII. For the rest of the discussion, we select the setup Δβ/Δθ = 1, as suggested by Fuertes Carbajo and Porté-Agel (2018). The Pareto front for Δβ/Δθ = 1 (Fig. 11b) shows that increasing Δθ from 0.75 up to 2.5° drastically reduces the uncertainty on the mean (εII) by roughly 70 % but does not significantly affect the data loss consequent to the enforcement of the Petersen-Middleton constraint (εI). For larger angular resolutions, the statistical significance improves just marginally but at the cost of a relevant data loss. For Δθ ≥ 2°, in particular, εI becomes extremely sensitive to σ, with the most severe data loss occurring for small σ (i.e., small Rmax). The Pareto front also shows that achieving a higher response for the higher-order statistics, D0(Δñ0), generally entails an increased data loss and/or statistical uncertainty of the mean. This analysis suggests that the optimal lidar scan for the reconstruction of the mean velocity field should be performed with Δθ = 2.5° and σ = 1/4 or 1/6. Virtual lidar simulations are performed for all the values of the angular resolution utilized in the Pareto front reported in Fig. 11. The streamwise component is estimated from the line-of-sight velocity through an equivalent velocity approach (Zhan et al., 2019).
The latter states that, for small elevation angles (i.e., β ≪ 1) and under the assumptions of a vertical velocity negligible compared to the horizontal component (i.e., |w| ≪ √(u² + v²)) and of a uniform wind direction, θw, a proxy for the streamwise velocity can be calculated from the line-of-sight velocity (Eq. 18). The mean velocity and turbulence intensity are then reconstructed through LiSBOA. The maximum error is quantified through the 95th percentile of the absolute error, AE95, using as reference the LES statistics interpolated on the LiSBOA grid. Figure 12 reports the AE95 of the flow statistics for all the virtual experiments. The error for the mean field (Fig. 12a-c) is mostly governed by the angular resolution, with a higher error occurring for slower scans. This is a clear consequence of the increased statistical uncertainty due to the limited number of scan repetitions, L, that are achievable for small Δθ values and a fixed total sampling period, T, while the AE95 stabilizes for Δθ ≥ 2.5°. The trend of the AE95 for the mean velocity with the smoothing parameter-iteration pair, σ-m, is less significant, since the theoretical response of the fundamental mode is ideally equal for all four cases. Conversely, the error on the turbulence intensity (Fig. 12d-f) shows a low sensitivity to the angular resolution but a steep increase for small σ values, which is due to the reduction in the radius of influence, Rmax, and in the number of points averaged per grid node.
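Under those assumptions, the equivalent-velocity inversion amounts to dividing the line-of-sight velocity by the geometric projection factor of the beam onto the streamwise direction. The sketch below assumes the common form u_eq = u_LOS/(cos β cos(θ − θw)); the paper's exact Eq. (18) is not reproduced here, and the function name is ours.

```python
import numpy as np

def equivalent_velocity(u_los, theta, beta, theta_w=0.0):
    """Proxy for the streamwise velocity from the line-of-sight velocity,
    assuming small elevation, negligible vertical velocity, and a uniform
    wind direction theta_w (a common form of the equivalent-velocity
    approach; angles in radians). Illustrative sketch."""
    return u_los / (np.cos(beta) * np.cos(theta - theta_w))

# consistency check: project a purely streamwise wind onto a beam with
# azimuth theta and elevation beta, then invert the projection
theta, beta, u_true = np.deg2rad(8.0), np.deg2rad(5.0), 10.0
u_los = u_true * np.cos(beta) * np.cos(theta)   # forward projection, theta_w = 0
u_eq = equivalent_velocity(u_los, theta, beta)
```

The inversion is exact for a purely streamwise wind; departures from the stated assumptions (nonzero w, nonuniform direction) appear as retrieval errors.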
From a more technical standpoint, the error on the mean velocity field appears to be relatively insensitive to the type of lidar scan, with the spatial and temporal filtering operated by the step-stare and continuous lidars even being beneficial in some cases. In contrast, the error on the turbulence intensity exhibits a more consistent and opposite trend, with the continuous lidar showing the most severe turbulence damping. This feature has been extensively documented in previous studies (see, e.g., Sathe et al., 2011).
The 3D fields of mean velocity and turbulence intensity calculated over T = 750 s through the first optimal configuration (i.e., Δθ = 2.5°, σ = 1/4, and m = 5) are rendered in Figs. 13 and 14, respectively. Furthermore, in Fig. 15, azimuthally averaged profiles at three downstream locations are also provided for a more insightful comparison. The mean velocity field is reconstructed fairly well, regardless of the type of lidar scan, due to the careful choice of the fundamental half wavelengths, Δn0, for this specific flow. On the other hand, the reconstructed turbulence intensity is highly affected by the lidar processing, which leads to a visible damping of the velocity variance for the step-stare mode and, even more, for the continuous mode. The ideal lidar scan, whose acquisition is inherently devoid of any space-time averaging, allows the retrieval of the correct level of turbulence intensity for locations x ≥ 4D, while in the near wake it struggles to recover the thin turbulent ring observed in the wake shear layer. Indeed, such a short-wavelength feature has a small response for the chosen settings of LiSBOA, particularly Δn0 and σ (see Fig. 3b). On the other hand, any attempt to increase the response of the higher-order moments, for instance by reducing the fundamental half wavelengths or decreasing the smoothing and the number of iterations, would result in a higher data loss and fewer experimental points per grid node.
Finally, Figs. 16 and 17 show the mean velocity and turbulence intensity over several cross-flow planes and for all the σ-m combinations tested for the ideal lidar and the optimal angular resolution. For the mean velocity, the most noticeable effect is the increasingly severe data loss as a consequence of the reduction in σ, which indicates σ = 1/4 with m = 5 as the most effective setting. The turbulence intensity exhibits, in addition to the data loss, a moderate increase in the maximum value for smaller σ, which is due to the higher response of the higher-order statistics (see Table 2). However, this effect is negligible in the far wake, where the radial diffusion of the initially sharp turbulent shear layer results in a shift of the energy content towards scales with larger Δn, which are fairly well recovered, even for σ = 1/4.

Notes on LiSBOA applications
LiSBOA can be applied to lidar data sets that are statistically homogeneous as a function of time, t. This statistical property can be ensured with two approaches. The first approach consists of considering lidar data collected continuously in time, with a given sampling frequency, for a period during which environmental parameters, such as the wind speed and direction, the Obukhov length, and the bulk Richardson number for the atmospheric stability regime, are constrained within prefixed intervals (e.g., Banta et al., 2006; Iungo et al., 2013b; Kumer et al., 2015; Puccioni and Iungo, 2020). For instance, the statistical stationarity of a generic flow signal, α, can be verified through the nonstationarity index (IST; Liu et al., 2017), which compares the variance calculated over the whole record with the mean value of the variances calculated over consecutive nonoverlapping subperiods (Eq. 19). The IST values should be lower than a selected threshold, depending on the specific flow parameter considered (Foken et al., 2004). A second approach to ensure the statistical homogeneity of the lidar data set consists of performing a cluster analysis based on environmental parameters, such as those mentioned above. This can be a fruitful alternative when the application of the first approach leads to too-short periods with statistical stationarity and, thus, to a low accuracy in the calculation of the turbulent statistics. With the clustering approach, larger data sets can be achieved for each cluster, enabling an enhanced statistical convergence (see, e.g., applications of clustering analysis to lidar measurements of wind turbine wakes in Machefaux et al., 2016; Bromm et al., 2018; Iungo et al., 2018; Zhan et al., 2019, 2020). The results of LiSBOA for the optimal design of wind lidar scans are affected by the selection of the input parameters, such as the total sampling time, T, the integral timescale, τ, the velocity variance, u′², and the fundamental half wavelengths, Δn0.
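The stationarity check can be sketched as follows; this is our illustrative rendering of the index described above (mean of subperiod variances compared with the whole-record variance, cf. Foken et al., 2004), not the paper's exact Eq. (19).

```python
import numpy as np

def nonstationarity_index(alpha, n_sub):
    """Nonstationarity index (IST) sketch: relative difference between the
    variance over the whole record and the mean of the variances computed
    over n_sub consecutive nonoverlapping subperiods."""
    alpha = np.asarray(alpha, dtype=float)
    n = alpha.size - alpha.size % n_sub           # trim so the record splits evenly
    subs = alpha[:n].reshape(n_sub, -1)
    var_sub = subs.var(axis=1).mean()             # mean of subperiod variances
    var_tot = alpha[:n].var()                     # whole-record variance
    return abs((var_sub - var_tot) / var_tot)

# usage: a stationary white-noise record gives a small IST, while a record
# whose variance is dominated by a slow trend gives a large one
rng = np.random.default_rng(0)
stationary = rng.standard_normal(6000)
trended = stationary + np.linspace(0.0, 10.0, 6000)
ist_stat = nonstationarity_index(stationary, 6)
ist_trend = nonstationarity_index(trended, 6)
```

In practice, records with an IST above the chosen threshold would be excluded from the LiSBOA statistics or routed to the clustering approach described above.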
In this section, we discuss the sensitivity of LiSBOA to these input parameters by considering, as a reference case, the volumetric scan performed with the virtual lidar technique on the LES data set analyzed in Sect. 5. The respective results are summarized in Fig. 18. The total sampling time, T, directly affects cost function ε_II through the parameter L, which represents the number of realizations. In Fig. 18a–d, different Pareto fronts are generated for the case under investigation by varying T from 3 min to 1 h. The Pareto fronts exhibit similar trends for the various values of T and generally higher values of ε_II, and thus lower statistical accuracy, for smaller T. For all the cases, the optimal configuration is still that selected in Sect. 5, namely Δθ = 2.5° and σ = 1/4. For this sensitivity study, the characteristic integral timescale, τ, has been varied between 0 s (completely uncorrelated data) and 35 s, with the upper value being based on the largest integral length scale in the atmospheric boundary layer (ABL), according to ESDU (1975), and considering an advection velocity of 8 m s⁻¹. The respective Pareto fronts reported in Fig. 18e–h show that the optimal lidar scan is weakly affected by variations of τ, which is an advantageous feature of LiSBOA for applications in which this parameter cannot be estimated from previous investigations or the literature.
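The paper's Eq. (14) is not reproduced in this excerpt. As a hedged illustration of why T and τ enter the standard deviation of the sample mean, the sketch below uses a generic exponential autocorrelation model; the function name and the correlation model are assumptions, not the paper's formulation.

```python
import numpy as np

def sample_mean_std(variance, T, dt, tau):
    """Illustrative standard deviation of the sample mean of an
    exponentially correlated signal (a hypothetical stand-in for a
    cost function like epsilon_II).

    L = T / dt realizations are collected with spacing dt; the
    autocorrelation rho(j dt) = exp(-j dt / tau) inflates the
    variance of the mean relative to the uncorrelated case.
    """
    L = int(T / dt)
    j = np.arange(1, L)
    rho = np.exp(-j * dt / tau) if tau > 0 else np.zeros(L - 1)
    # variance of the mean of L equally spaced correlated samples
    var_mean = (variance / L) * (1.0 + 2.0 * np.sum((1.0 - j / L) * rho))
    return np.sqrt(var_mean)

# A longer sampling time T reduces the error on the mean, while a
# longer integral timescale tau (fewer independent samples) raises it,
# consistent with the trends of the Pareto fronts in Fig. 18.
```

Under this model, τ = 0 recovers the familiar variance/L result for uncorrelated samples, which matches the "completely uncorrelated data" limit mentioned above.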
Regarding the characteristic velocity variance, u'², it is a multiplicative parameter for cost function ε_II (Eq. 14). Therefore, even though it affects the accuracy of the retrieved statistics, it does not alter the selection of the optimal scanning parameters.
The choice of the fundamental half wavelength, n0, deserves special attention, since it affects both the optimal scan design and the retrieval of the data statistics. The fundamental half wavelength can be considered the cut-off wavelength of the spatial low-pass filtering operated by LiSBOA (Sect. 4). The selection of n0 depends mostly on the length of the smallest spatial feature of interest in the flow under investigation, so the Pareto front is likely to be rather sensitive to changes in n0. As mentioned in Sect. 4, if the fundamental half wavelengths are too large compared to the predominant spatial modes, the turbulence statistics may be contaminated by over-smoothing and dispersive stresses, whereas an overly small n0 may require angular and radial resolutions that are too fine, a longer sampling period, and, thus, a smaller number of repetitions for a given T. Figure 18i–l shows the Pareto fronts calculated at different n0,x. The previously selected optimal setup (Δθ = 2.5° and σ = 1/4) still belongs to the optimality frontier. Finally, Fig. 18m–p displays the effect of varying n0,y on the Pareto front. Unlike the other cases, the shape of the front is very sensitive to n0,y, with a significant increase in the data loss, ε_I, consequent to refinements of the angular resolution, Δθ. The Pareto front correctly indicates that finer angular resolutions are needed to adequately sample a velocity field characterized by smaller wavelengths.
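The data loss cost function counts grid nodes where the local data spacing exceeds the half wavelength of interest in some direction (the Petersen–Middleton-type sampling constraint discussed in this paper). A hypothetical sketch, with the function name and inputs chosen for illustration:

```python
import numpy as np

def data_loss(spacing, n0):
    """Hypothetical sketch of a data-loss cost function: the percentage
    of grid nodes whose local data spacing violates the sampling
    constraint (spacing larger than the half wavelength n0 along any
    direction) and which are therefore rejected.

    spacing : (N_nodes, N_dims) local data spacing at each grid node
    n0      : (N_dims,) fundamental half wavelengths
    """
    spacing = np.asarray(spacing, dtype=float)
    bad = np.any(spacing > np.asarray(n0), axis=1)  # violated in any direction
    return 100.0 * np.mean(bad)                     # percentage of rejected nodes

# Shrinking n0 along one direction (finer wavelengths of interest) can
# only increase the rejection rate for the same scan, mirroring the
# sensitivity of the Pareto fronts to n0,y.
```

This captures the qualitative coupling in the text: refining the wavelengths of interest without refining the angular resolution drives the data loss up.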
For the sake of completeness, the influence of the different fundamental half wavelengths on the statistics is assessed by calculating the AE95 between LiSBOA and the LES for the statistics reconstructed using several combinations of n0,x and n0,y (Fig. 19). Larger values of n0,x and n0,y are detrimental to the accuracy of both the mean velocity and turbulence intensity, due to the over-smoothing of both fields and, for the turbulence intensity only, dispersive stresses. On the other hand, excessively small values of n0,y lead to a slight increase in error as a consequence of the smaller number of samples per grid node. Nonetheless, the most relevant effect, in this case, is the high data loss already identified in the Pareto front (e.g., Fig. 18m, n). It is worth noting that the choice of n0 = [2.5, 0.5, 0.5]D in Sect. 5, which was purely based on physical considerations about the expected relevant modes in the near wake, turned out to be the optimal configuration in terms of the overall error of ū/U∞ and √u'²/U∞. We acknowledge that the technical specifications required by LiSBOA (namely τ_a, N_r, and Δr) depend on the specific lidar system used, the contingent atmospheric conditions, and the best practices followed by the user. Since these parameters are greatly case dependent, they are not discussed further in this context. In general, the selection of the accumulation time and gate length is a trade-off between achieving a target maximum range and keeping a sufficiently fine radial resolution and a sufficient intensity of the backscattered lidar signal. In the case of uncertain environmental conditions, checking, before the deployment, the influence of selected combinations of τ_a, N_r, and Δr on the Pareto front is recommended.
For the sake of completeness, LiSBOA is compared with other widely used techniques for the statistical postprocessing of wind lidar data, specifically the Delaunay triangulation (see, e.g., Clive et al., 2011; Trujillo et al., 2011, 2016; Iungo and Porté-Agel, 2014; Machefaux et al., 2015), linear interpolation in spherical coordinates (see, e.g., Mohr and Vaughan, 1979; Fuertes Carbajo and Porté-Agel, 2018), and window averaging (see, e.g., Newsom et al., 2008). Figure 20 shows the mean velocity and turbulence intensity fields retrieved from the considered LES data set through the various techniques for a step-stare virtual lidar scan. The various methods use the same grid as for LiSBOA (see Sect. 5). The voids in the 3D rendering correspond to regions outside of the data distribution for the Delaunay triangulation and the linear interpolation (i.e., extrapolation cannot be performed) or, for the window average, to bins with a standard error on the mean higher than 15 % of the incoming wind speed (for an analogy with LiSBOA, see Fig. 4). A qualitative comparison of the results reported in Fig. 20 with those for LiSBOA in Figs. 13c and 14c reveals that LiSBOA is the method enabling the largest spatial coverage of the retrieved statistics for the same lidar scan. From the parameter ε_I, which is reported in Table 3 for the various methods, it is noted that LiSBOA has the lowest data rejection rate (ε_I = 33 %), while the largest is for the window average (ε_I = 66 %). Overall, all the methods, except for the window average, have similar accuracy in the retrieval of the mean velocity (see the mean absolute percentage error, MAPE, in Table 3), yet LiSBOA is the method with the lowest error. Furthermore, LiSBOA is the only method not showing artifacts in the retrieval of the turbulence intensity over space, such as enhanced turbulence intensity and unexpected peaks, as the significantly lower error on the turbulence intensity confirms.
This result is in agreement with Trapp and Doswell (2000), where the effectiveness of the Barnes scheme in the suppression of short-wavelength noise, compared to linear interpolation, was already noted. Finally, the computational time, using MATLAB on a quad-core i7 laptop, is small and comparable for all the algorithms considered (∼ 1–2 s), with only the linear interpolation being considerably faster (Table 3). On a final note, it is noteworthy that LiSBOA is currently formulated for a single scalar field, namely a velocity component (radial or equivalent horizontal). However, in principle, the procedure can be extended to vector fields, such as fully 3D velocity fields. Furthermore, other constraints can be added for the optimal scan design, such as imposing a divergence-free velocity field for incompressible flows. Also, some modifications could extend the application of LiSBOA to other remote sensing instruments, such as sodars and scanning radars.
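As a generic illustration of the Delaunay-based baseline (not the paper's LES setup), scattered samples of a smooth field can be linearly interpolated onto a regular grid with SciPy's `griddata`, which triangulates the sample locations internally; points outside the convex hull of the data receive NaN, the data-loss behavior noted above.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical stand-in for the Delaunay-triangulation baseline:
# scattered samples of a smooth scalar field, mapped onto a regular
# grid by piecewise-linear interpolation over the triangulation.
rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 1.0, size=(2000, 2))       # scattered sample locations
vals = np.sin(2 * np.pi * pts[:, 0]) * pts[:, 1]  # sampled scalar field

gx, gy = np.meshgrid(np.linspace(0.05, 0.95, 20),
                     np.linspace(0.05, 0.95, 20))
rec = griddata(pts, vals, (gx, gy), method="linear")  # NaN outside the hull

truth = np.sin(2 * np.pi * gx) * gy
err = np.nanmean(np.abs(rec - truth))  # mean absolute reconstruction error
```

For a smooth field sampled this densely, linear interpolation is accurate but offers no noise suppression; the short-wavelength amplification the text attributes to it shows up once the samples carry measurement noise.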

Conclusions
A revisited Barnes objective analysis for sparse, nonuniformly distributed, and stationary lidar data has been formulated to calculate the mean, variance, and higher-order statistics of the wind velocity field over a structured N-dimensional Cartesian grid. This LiDAR Statistical Barnes Objective Analysis (LiSBOA) provides a theoretical framework to quantify the response in the reconstruction of the velocity statistics as a function of the spatial wavelengths of the velocity field under investigation, and to quantify the sampling error.
LiSBOA has been validated against volumetric synthetic 3D data generated through Monte Carlo simulations. The results of this test have shown that the sampling error for a monochromatic scalar field is mainly driven by the data spacing normalized by the half wavelength.
The LiSBOA framework provides guidelines for the optimal design of scans performed with a scanning Doppler pulsed wind lidar and for the calculation of the wind velocity statistics. The inputs of the optimization problem are background information about the turbulent flow under investigation, such as the total sampling time, the expected velocity variance, and the integral length scales; the technical specifications of the lidar, such as the range gate and the accumulation time; and the spatial wavelengths of interest for the velocity field. The formulated optimization problem has two cost functions, namely the percentage of grid nodes not satisfying the Petersen–Middleton constraint for the smallest half wavelength of interest (i.e., lacking adequate spatial resolution to avoid aliasing in the statistics) and the standard deviation of the sample mean. The outputs of the optimization problem are the lidar angular resolution and, for a given response of the mean field, the allowable smoothing parameters and number of iterations to use for LiSBOA.
LiSBOA has been validated using a data set obtained through the virtual lidar technique, namely by numerically sampling the turbulent velocity field behind the rotor of a 5 MW turbine obtained from a large eddy simulation (LES). The 3D mean streamwise velocity and turbulence intensity fields have shown a maximum error with respect to the LES data set of about 4 % of the undisturbed wind speed for the mean streamwise velocity and 4 % of the streamwise turbulence intensity in absolute terms. Wake features, such as the high-velocity stream in the hub region and the turbulent shear layer at the wake boundary, have been accurately reconstructed.
In the companion paper, LiSBOA is used to reconstruct the turbulence statistics of utility-scale turbine wakes probed with scanning pulsed Doppler lidars. That study also illustrates the detailed preconditioning applied to the raw lidar data to extract statistically stationary and normalized radial velocity data and showcases the potential of LiSBOA for wind energy research.
Appendix A: Derivation of the analytical response function of LiSBOA

The first iteration of LiSBOA produces a weighted average in space of the scalar field, f, with the weights being Gaussian functions centered at the specific grid nodes, x. In the limit of a continuous function defined over an infinite domain, Eq. (3) represents the convolution between the scalar field, f, and the Gaussian weights, w. Therefore, the response function of LiSBOA can be expressed in the spectral domain as follows (Pauley and Wu, 1990):

D^0 = F[w],    (A1)

where the operator F indicates the Fourier transform (FT). The FT of the weighting function in Eq. (A1) can be conveniently recast as the product of N 1D FTs as follows:

F[w] = ∏_{p=1}^{N} [ ∫ e^{−x_p²/(2σ²)} e^{−i k_p x_p} dx_p ] / [ ∫ e^{−x_p²/(2σ²)} dx_p ],    (A2)

where k_p is the directional wavenumber and i = √−1. Hence, by leveraging the closed-form FT of the Gaussian function (Greenberg, 1998),

∫_{−∞}^{+∞} e^{−a x²} e^{−i k x} dx = √(π/a) e^{−k²/(4a)},    (A3)

we obtain the desired result, i.e., the following:

D^0 = ∏_{p=1}^{N} e^{−σ² k_p²/2}.    (A4)

Nomenclature

D^m    Response at the mth iteration
r_i    Position in the N-dimensional space of the ith grid node of the Cartesian grid
r_j    Position in the N-dimensional space of the jth sample
ε_I    Cost function I (data loss)
ε_II   Cost function II (standard deviation of the sample mean)
N_i    Total number of nodes of the Cartesian grid
τ      Integral timescale
x̃      Spatial variable in the scaled frame of reference
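The first-pass response derived above, together with the classical Barnes successive-correction formula for the response at the mth iteration, can be sketched numerically. The scaled-coordinate convention (half wavelengths normalized so that k_p = π/n_p) is an assumption here, so the exact constants may differ from the paper's scaled frame.

```python
import numpy as np

def response(n, sigma, m=0):
    """Spectral response of an iterative Gaussian (Barnes-type) smoother.

    n     : half wavelengths of the field along each direction
            (array-like, assumed scaled frame of reference).
    sigma : smoothing parameter of the Gaussian weights.
    m     : number of successive-correction iterations.

    D0 follows from the Gaussian-kernel FT above; the mth iterate uses
    the classical Barnes form D^m = 1 - (1 - D0)^(m + 1).
    """
    n = np.asarray(n, dtype=float)
    k = np.pi / n                                  # directional wavenumbers
    D0 = np.prod(np.exp(-0.5 * (sigma * k) ** 2))  # first-pass response (A4)
    return 1.0 - (1.0 - D0) ** (m + 1)

# Iterations restore amplitude damped by the first pass: the response
# grows monotonically with m and tends to 1 for any resolved wavelength.
assert response([1.0], 0.25, m=0) < response([1.0], 0.25, m=5) < 1.0
```

This makes the trade-off in the scan-design problem concrete: a smaller σ or more iterations raises the response of short wavelengths, at the price of less smoothing of the sampling noise.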