LiSBOA: LiDAR Statistical Barnes Objective Analysis for optimal design of LiDAR scans and retrieval of wind statistics. Part I: Theoretical framework

A LiDAR Statistical Barnes Objective Analysis (LiSBOA) for optimal design of LiDAR scans and retrieval of the velocity statistical moments is proposed. The LiSBOA represents an adaptation of the classical Barnes scheme for the statistical analysis of unstructured experimental data in N-dimensional spaces and it is a suitable technique for the evaluation over a structured Cartesian grid of the statistics of scalar fields sampled through scanning LiDARs. The LiSBOA is validated and characterized via a Monte Carlo approach applied to a synthetic velocity field. This revisited theoretical framework for the Barnes objective analysis enables the formulation of guidelines for optimal design of LiDAR experiments and efficient application of the LiSBOA for the post-processing of LiDAR measurements. The optimal design of LiDAR scans is formulated as a two cost-function optimization problem including the minimization of the percentage of the measurement volume not sampled with adequate spatial resolution and the minimization of the error on the mean of the velocity field. The optimal design of the LiDAR scans also guides the selection of the smoothing parameter and the total number of iterations to use for the Barnes scheme.


Introduction
Reliable measurements of the wind-velocity vector field are essential to understand the complex nature of atmospheric turbulence and provide valuable datasets for the validation of theoretical and numerical models. However, field measurements of wind speed are typically characterized by large uncertainties due to the generally unknown and uncontrollable boundary conditions (Braham 1979), the broad range of time and length-scales (Cushman-Roisin and Beckers 1990a), and the complexity of the physics involved (Stull 1988). Furthermore, the large measurement volume, which typically extends throughout the height of the atmospheric boundary layer, imposes to the experimentalists the selection of the sampling parameters as a trade-off between spatial and temporal resolutions.
Wind speed has been traditionally measured through local sensors, such as mechanical, sonic, and hot-wire anemometers (Liu et al. 2019;Kunkel and Marusic 2006). Besides their simplicity, mechanical anemometers are affected by errors due to the flow distortion of the supporting structures and harsh weather conditions (e.g. ice) (Mortensen 1994). Furthermore, their relatively slow response results in a limited range of the measurable time-length scales, which makes them unsuitable, for instance, to measure the turbulent flow around urban areas (Pardyjak and Stoll 2017). Sonic anemometers can measure the three velocity component with frequencies up to 100 Hz (Cuerva and Sanz-Andrés 2000) in a probing volume of the order of 0.01 m, yet measurements might be still affected by the wakes generated by the supporting structures, such as met-towers and struts and they are sensitive to temperature variations (Mortensen 1994). Hot-wire anemometers, although they provide a full characterization of the energy spectrum, require a complicated calibration (Kunkel and Marusic 2006) and are extremely fragile (Wheeler 2004). Furthermore, traditional single-point sensors are unable to provide an adequate characterization of the spatial gradients of the wind velocity vector, which is particularly significant in the vertical direction (Cushman-Roisin and Beckers 1990b). To overcome this issue, several anemometers arranged in arrays and supported by meteorological masts have been deployed in several field campaigns (Haugen et al. 1971;Bradley 1983;Taylor and Teunissen 1987;Emeis et al. 1995;Pashow et al. 2001;Berg et al. 2011;Kunkel and Marusic 2006).
Besides the mentioned capabilities, LiDARs present some important limitations, such as reduced range in adverse weather conditions (precipitations, heavy rain or low aerosol concentration) (Liu et al. 2019) and a limited spatio-temporal resolution of this instrument, namely about 20 meters in the radial direction and about 10 Hz in sampling frequency. These technical specifications, associated with the non-stationary wind conditions typically encountered for field experiments, pose major challenges to apply wind LiDARs for the statistical analysis of turbulent atmospheric flows.
In the realm of wind energy, early LiDAR measurements were limited to the qualitative analysis of snapshots of the line-of-sight (LOS) velocity, i.e. the velocity component parallel to the laser beam (Käsler et al. 2010;Clive et al. 2011). Fitting of the wake velocity deficit was also successfully exploited to extract quantitative information about wake evolution from LiDAR measurements (Aitken and Lundquist 2014;Wang and Barthelmie 2015;Kumer et al. 2015;Trujillo et al. 2016;Bodini et al. 2017). To characterize velocity fields with higher statistical significance, the time averages of several LiDAR scans were calculated for time periods with reasonably steady inflow conditions (Iungo and Porté-Agel 2014;Machefaux et al. 2016;Van Dooren et al. 2016). In the case of data collected under different wind and atmospheric conditions, clustering and bin-averaging of LiDAR data were carried out (Machefaux et al. 2016;Garcia et al. 2017;Bromm et al. 2018;Zhan et al. 2019Zhan et al. , 2020. Finally, more advanced techniques for first-order statistical analysis, such as variational methods (Xia et al. 2008;Newsom and Banta 2004), optimal interpolation (Xu and Gong 2002;Kongara et al. 2012), least-squares methods , Navier-Stokes solvers (Astrup et al. 2017;Sekar et al. 2018) were applied for the reconstruction of the velocity vector field from dual-Doppler measurements.
Besides the mean field, the calculation of higher-order statistics from LiDAR data to investigate atmospheric turbulence is still an open problem. In this regard, Eberhard et al. (1989) re-adapted the post-processing of the velocity azimuth display (VAD) scans (Lhermitte 1969;Wilson 1970;Kropfli 1986) to estimate all the components of the Reynolds stress tensor by assuming horizontal homogeneity of the flow within the scanning volume, which can be a limiting constraint for measurements in complex terrains (Bingöl 2009;Frisch 1991). Range height indicator (RHI) scans were used to detect second-order statistics (Bonin et al. 2017), spectra, skewness, dissipation rate of the velocity field, and even heat flux (Gal-Chen et al. 1992).
A typical scanning strategy to obtain high-frequency LiDAR data consists in performing scans with fixed elevation and azimuthal angles of the laser beam while maximizing the sampling frequency (Mayor et al. 1997;O'Connor et al. 2010;Vakkari et al. 2015;Frehlich and Cornman 2002;Debnath et al. 2017a;Choukulkar et al. 2017;Lundquist et al. 2017). Recently, in the context of wind radar technology, but readily applicable to LiDARs as well, a promising method for the estimation of the instantaneous turbulence intensity based on the Taylor hypothesis of frozen turbulence was proposed by Duncan et al. (2019). More advanced techniques exploit additional information of turbulence carried by the spectrum of the back-scattered LiDAR signal (Smalikho 1995). However, this approach requires the availability of LiDAR raw data, which is not generally granted for commercial LiDARs. For a review on turbulence statistical analyses through LiDAR measurements, the reader can refer to Sathe and Mann (2013).
For remote sensing instruments, data are typically collected based on a spherical coordinate system, then interpolated over a Cartesian reference frame oriented with the x-axis in the mean wind-direction. This interpolation can be a source of error (Fuertes Carbajo and Porté-Agel 2018), especially if a linear interpolation method is used (Garcia et al. 2017;Carbajo Fuertes et al. 2018;Beck and Kühn 2017;Astrup et al. 2017). Delaunay triangulation has also been widely adopted for coordinate transformation (Clive et al. 2011;Trujillo et al. 2011;Iungo and Porté-Agel 2014;Trujillo et al. 2016;Machefaux et al. 2016), yet with accuracy not quantified in case of nonuniformly distributed data. It is reasonable to weight the influence of the experimental points on their statistics by the distance from the respective grid centroid, such as using uniform ), hyperbolic (Van Dooren et al. 2016 or Gaussian weights (Newsom et al. 2014;Wang and Barthelmie 2015;Zhan et al. 2019). The use of distance-based Gaussian weights for the interpolation of scattered data over a Cartesian grid is at the base of the Barnes objective analysis (or Barnes scheme) (Barnes 1964), which has been extensively used in meteorology. It represents an iterative statistical ensemble procedure to reconstruct a scalar field arbitrarily sampled in space and low-pass filtered with a cut-off wavelength that is a function of the parameters of the scheme.
The scope of this work is to define a methodology to post-process scattered data of a turbulent velocity field measured through a scanning Doppler wind LiDAR to calculate mean, standard deviation and even higher-order statistical moments on a Cartesian grid. The proposed methodology, referred to as LiDAR Statistical Barnes Objective Analysis (LiSBOA), represents an adaptation of the classic Barnes scheme to N-dimensional domains enabling applications for non-isotropic scalar fields through a coordinate transformation. A major point of novelty of the LiSBOA is the estimation of wind-velocity variance (and eventually higher-order statistics) from the residual field of the mean, which also provides adequate filtering of dispersive stresses due to data variability not connected with the turbulent motion. A criterion for rejection of statistical data affected by aliasing due to the undersampling of the spatial wavelengths under investigation is formulated.
The LiSBOA is assessed against a synthetic scalar field to validate its theoretical response and the formulated error metric. Finally, detailed guidelines for the optimal design of a LiDAR experiment and effective reconstruction of the wind statistics are provided.
The remainder of the manuscript is organized as follows: in section 2 the extension of the Barnes scheme theory to N-dimensional domains and higher-order statistical moments is presented. In section 3, the theoretical response function of the LiSBOA is validated against a synthetic case, while guidelines for proper use of the proposed algorithm are provided in section 4. Finally, concluding remarks are provided in section 5.
2. The Barnes Objective Analysis: fundamentals and extension to statistical N-dimensional analysis The Barnes scheme was originally conceived as an iterative algorithm aiming to interpolate a set of sparse data over a Cartesian grid (Barnes 1964) and it was inspired by the successive correction scheme by Cressman (1959). The first iteration of the algorithm calculates a weightedspace-averaged field, g 0 , over a Cartesian grid from the sampled scalar field, f . The mean field is iteratively modified by adding contributions to recover features characterized by shorter wavelengths, which are inevitably damped by the initial averaging process. In this work, we adopt the most classical form of the Barnes scheme as follows: where g m i is the average field at the i-th grid node with coordinates r i for the m-th iteration, f j is the scalar field sampled at the location r j and φ represents the linear interpolation operator from the Cartesian grid to the sample location. The weights for the sample acquired at the location r j and for the calculation of the statistics of f at the grid node with coordinates r i , w i j , are defined as: where σ is referred to as smoothing parameter and |.| indicates Euclidean norm. For practical reasons, the summations over j are performed over the neighboring points included in a ball with a finite radius R max (also called the radius of influence) and centered at the i-th grid point. In this work, following Barnes (1964), we selected R max = 3σ, which encompasses 99.7% and 97% of the volume of the weighting function in 2D and 3D, respectively.
In literature, there is a lack of consensus for the selection of the total number of iterations (Barnes 1964;Achtemeier 1989;Smith and Leslie 1984;Seaman 1989) and the smoothing parameter (Barnes 1994a;Caracena 1987;Pauley and Wu 1990). A reduction of the smoothing parameter, σ, as a function of the iteration, m, was originally proposed by Barnes (1973); however, this approach resulted to be detrimental in terms of noise suppression (Barnes 1994c).
In the frequency domain, the Barnes objective analysis is tractable as a low-pass filter applied to a scalar field, f , with a response as a function of the spatial wavelength depending on the smoothing parameter, σ, and the number of iterations, m. This feature has been exploited in meteorology to separate small-scale from mesoscale motions (Doswell 1977;Maddox 1980;Gomis and Alonso 1990). The spectral behavior of the Barnes scheme has been traditionally characterized by calculating the so-called continuous response at the m-th iteration, D m (k), with k being the wavenumber vector. D m (k) is defined as the ratio between the amplitude of the Fourier mode e ik·x (with i = √ −1) for the reconstructed field, g m , to its amplitude for in input field, f , in the limit of a continuous distribution of samples and infinite domain. The analytical expression for the continuous response was provided by Barnes (1964) and Pauley and Wu (1990) for 1D and 2D domains, respectively, while in the context of the LiSBOA it is extended to N dimensions to enhance its applicability. Furthermore, besides the spatial variability of f , the temporal coordinate, t, is introduced to determine the response of the statistical moments of f . Although the present discussion hinges on the choice of time as a non-spatial variable, the same approach can be applied to other non-spatial variables, such as Obukhov length to characterize atmospheric stability, operative conditions of a wind turbine, wind direction, assuming that statistical homogeneity holds along that coordinate.
We consider a continuous scalar field, f (x, t), which is defined over an N-dimensional domain, x. It is further assumed that the field f is ergodic in time. By adopting the approach proposed by Pauley and Wu (1990) and by taking advantage of the isotropy of the Gaussian weights (Eq. (2)), we can define the LiSBOA operator at the 0-th iteration as: where t 1 and t 2 are initial and final time. The term within the square brackets represents the mean of f over the considered sampling interval [t 1 , t 2 ], which is indicated as f . Moreover, to reconstruct a generic q-th central statistical moment of the scalar field, f , it is sufficient to apply the LiSBOA operator of Eq. (3) to the fluctuations over f to the q-th power: For practical applications, the mean field f is generally not known, but it can be approximated by the LiSBOA output, g m , interpolated at the sample location through the operator φ. By comparing Eq. (4) with Eq. (3), it is understandable that the response function of any central moment with order higher than one is equal to that of the 0-th iteration response of the mean, g 0 . Indeed, Eq.
(3) can be interpreted as the 0-th iteration of the LiSBOA spatial operator (viz. Eq. (3)) applied to the fluctuation field to the q-th power.
By leveraging the convolution theorem, it is possible to calculate the response function of the mean of the 0-th iteration of the LiSBOA in the frequency domain (see Appendix A for more details). This result, combined with the recursive formula of Barnes (1964) for the response at the generic iteration m, provides the spectral response of the LiSBOA for the mean: where ∆n is the half-wavelength vector associated with k. Equation 5 states that, for a given wavenumber (i.e. half-wavelength), the respective amplitude of the interpolated scalar field, g m , is equal to that of the original scalar field, damped with a function of the smoothing parameter, σ, and the number of iterations, m. This implies that the parameters σ and m should be selected properly to avoid significant damping for wavelengths of interest or dominating the spatial variability of the scalar field under investigation.
For real applications, the actual LiSBOA response function can depart from the above-mentioned theoretical response (Eq. (5)) for the following reasons: • the convolution integral in Eq. (3) is calculated over a ball of finite radius R max ; • f is sampled over a discrete domain and, thus, introducing related limitations, such as the risk of aliasing (Pauley and Wu 1990); • the distribution of the sampling points is usually irregular and non-uniform leading to larger errors where a lower sample density is present (Smith et al. 1986;Smith and Leslie 1984;Buzzi et al. 1991;Barnes 1994a) or in proximity to the domain boundaries (Achtemeier 1986); • an error is introduced by the back-interpolation function, φ, from the Cartesian grid, r i , to the location of the samples, r j (Eq. (1)) (Pauley and Wu 1990).
Before proceeding with further analysis, it is necessary to address the applicability of the LiSBOA to anisotropic and multi-chromatic scalar fields. Generally, the application of the LiSBOA with an isotropic weighting function is not recommended in case of severe anisotropy of the field and/or the data distribution. At the early stages of objective analysis techniques, the use of an anisotropic weighting function was proved to be beneficial to increase accuracy while highlighting patterns elongated along a specific direction, based on empirical (Endlich and Mancuso 1968) and theoretical arguments (Sasaki 1971). Furthermore, the adoption of a directional smoothing parameter, σ p , where p is a generic direction, allows maximizing the utilization of the data retrieved through inherently anisotropic measurements, such as the line-of-sight fields detected by remote sensing instruments (Askelson et al. 2000;Trapp and Doswell 2000). With this in mind, we propose a linear scaling of the physical coordinates before the application of the LiSBOA to recover a pseudo-isotropic velocity field. The scaling reads as: where x * is the origin of the scaled reference frame and ∆n 0,p is the scaling factor for the p-th direction. Hereinafter,· refers to the scaled frame of reference. From a physical standpoint, the scaling is equivalent to the adoption of an anisotropic weighting function, while the re-scaling approach is preferred to ensure generality to the mathematical formulation outlined in this section.
The scaling factor ∆n 0 is an important parameter in the present framework and is referred to as fundamental half-wavelength, while the associated Fourier mode is denoted as the fundamental mode. The selection of the fundamental half-wavelength should be guided by a priori knowledge of the dominant length-scales of the flow in various directions. Modes exhibiting degrees of anisotropy different than that of the selected fundamental mode, will not be isotropic in the scaled mapping, which leads two consequences: first, their response will not be optimal, in the sense that the shortest directional wavelength can produce excessive damping of the specific mode (Askelson et al. 2000); second, the shape preservation of such non-spherical features in the field reconstructed through the LiSBOA is not ensured (Trapp and Doswell 2000).
Regarding the reconstruction of the flow statistics through the LiSBOA, two categories of error can be identified. The first is the statistical error due to the finite number of samples of the scalar field, f , available in time. This error is strictly connected with the local turbulence statistics, the sampling rate, and the duration of the experiment. The second error category is the spatial sampling error, which is due to the discrete sampling of f in the spatial domain x. The Petersen-Middleton theorem (Petersen and Middleton 1962) states that the reconstruction of a continuous and band-limited signal from its samples is possible if and only if the spacing of the sampling points is small enough to ensure non-overlapping of the spectrum of the signal with his replicas distributed over the so-called reciprocal lattice (or grid). The latter is defined as the Fourier transform of the specific sampling lattice. The 1D version of this theorem is the well-known Shannon-Nyquist theorem (Shannon 1984). An application of this theorem to non-uniformly distributed samples, like those measured by remote sensing instruments, is unfeasible due to the lack of periodicity of the sampling points. To circumvent this issue, we adopted the approach suggested by Koch et al. (1983), who defined the random data spacing, ∆d, as the equivalent distance that a certain number of samples enclosed in a certain region, N exp , would have if they were uniformly distributed over a structured Cartesian grid. The generalized form of the random data spacing reads: where V is the volume of the hyper-sphere with radius R max = 3σ centered at the specific grid point and N exp represents the number of not co-located sample locations included within the hypersphere. Then, the Petersen-Middleton theorem for the reconstruction of the generic Fourier mode of half-wavelength ∆n can be translated as the following constraint: ∆d(r i ) < ∆n p , p = 1, 2, ..., N.
Violation of the inequality (8) will lead to local aliasing, with the energy content of the undersampled wavelengths being added to the low-frequency part of the spectrum.

LiSBOA assessment through Monte Carlo simulations
The spectral response of the LiSBOA is studied through the Monte Carlo method. The goal of the present section is twofold: validating the analytical response of mean and variance (Eq. (5)) and characterizing the sampling error of the LiSBOA as a function of the random data spacing.
For these aims, a synthetic 3D scalar field is generated, while its temporal variability is reproduced locally by randomly sampling a normal probability density function. Specifically, the synthetic scalar field is: where ℵ is a generator of random numbers with normal probability density function with mean value 0 and standard deviation equal to 1. The constant 1 in the two terms on the RHS of Eq.
(9) does not affect the LiSBOA response and is introduced to obtain both mean and variance of f equal to the following function: It is noteworthy that f − 1 is a monochromatic isotropic function.
An experimental sampling process is mimicked by evaluating the scalar field f through randomly and uniformly distributed samples collected at the locations r j . The latter are distributed within a cube spanning the range ±10σ in the three Cartesian directions. The total number of sampling points considered for each realization, N s , is varied from 500 up to 20,000 to explore the effects of the sample density on the error. The sampling process is repeated L times for each given distribution of N s points to capture the variability in the field introduced by the operator ℵ. The whole procedure can be considered as an idealized LiDAR experiment where a scan including N s sampling points is performed L times to probe an ergodic turbulent velocity field.
Since the response is only a function of ∆n/σ and m (Eq. (5)), for the spectral characterization of the LiSBOA, the parameter ∆n/σ is varied among the following values: [1,2,3,4,5]. An implementation of the LiSBOA algorithm for discrete samples is then applied to reconstruct the mean g m and variance v m of the scalar field f over a Cartesian structured grid, r i , with a resolution of 0.25. Fig. 1 depicts an example of the reconstruction of the mean scalar field, g m , and its variance, v m , from the Monte Carlo synthetic dataset.
For the error quantification, the 95-th percentile of the absolute error calculated at each grid point r i (AE 95 hereinafter) is adopted: The AE 95 quantifies the discrepancy between the outcome of the LiSBOA and the analytical input damped by the theoretical response evaluated over the Cartesian grid. As highlighted in Eq. (11), the expected value of AE 95 is a function of the half-wavelength over the smoothing parameter, ∆n/σ, the number of iterations, m, the number of samples, N s , and the number of realizations, L.
To investigate the link between AE 95 and the above-mentioned parameters, the Pearson correlation coefficients are analyzed ( Table 1). The number of samples N s , which is inversely proportional to the data spacing ∆d (Eq. 7), is the variable exhibiting the strongest correlation with the error for both mean and variance. This indicates, as expected, that a larger number of samples for each measurement realization is always beneficial for the estimates of the statistics of the scalar field, f . Furthermore, the negative sign of correlations ρ(AE 95 , N s ) and ρ(AE 95 , ∆n/σ), corroborate the hypothesis that the ratio ∆d/∆n, i.e. the number of samples per half-wavelength, is the main driving factor for the sampling error (Koch et al. 1983;Barnes 1994a;Caracena et al. 1984).
The small positive correlation ρ(AE 95 , m) detected for the mean is due to an amplification of the error occurring during the iterative process (Barnes 1964). The issue will be discussed more in detail in section 4. For the variance, ρ(AE 95 , m) is practically negligible, confirming that the response of the higher-order statistics is insensitive to the number of iterations, m. Finally, the negative correlations with L show that the statistical error is inversely proportional to the number of realizations collected. The dependence ρ(AE 95 , L) is mainly due to the statistical error connected with the temporal sampling and, thus, the number of realizations, L, is progressively increased until convergence of the AE 95 is achieved. Fig. 2 displays the behavior of the error as a function of N s and L. The values displayed represent the median for all the wavelength and iterations, being the AE 95 just mildly dependent on these parameters. As Fig. 2 shows, increasing the number of realizations, L, beyond 100 has a negligible effect on the error.
To verify the analytical response of mean and variance of the scalar field, f , a numerical estimator of the response is defined as the median in space of the ratio between the field reconstructed via LiSBOA and the expected value of the synthetic input, as: In the calculation of the numerical response through Eq. (12), the influence of the edges is removed by rejecting points closer than R max to the boundaries of the numerical domain. Furthermore, the zero-crossings of the synthetic sine function (| f − 1| < 0.1) are excluded to avoid singularities. A comparison between the actual and the theoretical response (Eq. (5)) for several wavelengths of the input function is reported in Fig. 3 for the case with the highest number of samples N s = 20, 000. An excellent agreement is observed between the theoretical prediction and the Monte Carlo outcome, which indicates that in the limit of negligible statistical error (large L) and adequate sampling (large N s and near-uniformly distributed samples) the response approaches the predictions obtained from the developed theoretical framework.
The trend of the response of the mean (Fig. 3a) suggests that, for a given wavelength, the same response can be achieved for an infinite number of combinations σ − m and, specifically, a larger σ requires a larger number of iterations, m, to achieve a certain response D m . It is noteworthy that for a smaller number of iterations, m, the slope of the response function is lower. This feature can be beneficial for practical applications for which the LiSBOA response will have small changes for small variations of ∆n. However, a lower slope of the response function can be disadvantageous for short-wavelength noise suppression. Fig. 3b confirms that the response of the variance, and similarly for higher-order statistics, is not a function of the total number of iterations, m, and is equal to the response of the mean for the 0-th iteration, D 0 .
Finally, the link between error and the random data spacing, ∆d, is investigated. In Fig. 4, the discrepancy with respect to theory quantified by the AE 95 is plotted versus the random data spacing normalized by the half-wavelength for a fixed total number of iterations m = 5. The values displayed on the x-axis represent the median over all grid points, r i . This analysis reveals a strong correlation between the normalized random data spacing and the error. This analysis corroborates that, in the limit of negligible statistical error (viz. a high number of realizations, L), uncertainty is mainly driven by the local data density normalized by the wavelength, which is related to the Petersen-Middleton criterion. Indeed, the cases satisfying the Petersen-Middleton constraint (Eq. (8)) are those exhibiting an AE 95 smaller than ∼ 40% of the amplitude of the harmonic function f for both mean and variance. However, if a smaller error is needed, it will be necessary to reduce the maximum threshold value for ∆d/∆n.

Guidelines for an efficient application of the LiSBOA to wind LiDAR data
An efficient application of the LiSBOA to LiDAR data relies on the appropriate selection of the parameters of the algorithm, namely the fundamental half-wavelengths, ∆n 0 , the smoothing parameter, σ, the number of iterations, m, and the spatial discretization of the Cartesian grid, dx.
Furthermore, the data collection strategy must be designed to ensure adequate sampling of the spatial wavelengths of interest, so that the Petersen-Middleton constraint (Eq. (8) with increasing σ the response at the 0-th iteration, D 0 (∆ñ 0 ), reduces, which indicates a lower response for higher-order statistics. For the LiSBOA application, the following aspects should be also considered: • the smaller σ, the smaller the radius of influence of the LiSBOA, R max , and, thus, the lower the number of samples averaged per grid node, N exp , and the greater the statistical uncertainty; • an excessively large m can lead to overfitting of the experimental data and noise amplification (Barnes 1964); • the higher m, the higher the slope of the response function (see Fig. 3), which improves the damping of high-frequency noise, but it produces a larger variation of the response of the mean with different spatial wavelengths; • the radius of influence R max (and therefore σ) can affect the data spacing ∆d in case of non-uniform data distribution.
Few handy combinations of smoothing parameter and total iterations for D m (∆ñ 0 ) = 0.95 are provided in Table 2. As mentioned above, all these σ − m pairs allow achieving roughly the same response for the mean, while the response for the higher-order statistics reduces with an increasing number of iterations, m.
As a final remark about the selection of ∆n 0 , we should consider that, if the fundamental halfwavelength is too large compared to the dominant modes in the flow, small-scale spatial oscillations of f will be smoothed out during the calculation of the mean, with consequent underestimated gradients and incorrect estimates of the high-order statistics due to the dispersive stresses (Arenas et al. 2019). On the other hand, the selection of an overly small ∆n 0 , would require an excessively fine data spacing to satisfy the Petersen-Middleton constraint (Eq. (8)), which may lead to an overly long sampling time or even exceed the sampling capabilities of the LiDAR.
The optimal LiDAR scanning strategy aimed to characterize atmospheric turbulent flows implies finding a trade-off between a sufficiently fine data spacing, which is quantified through ∆d in the present work (Eq. (7)), and an adequate number of time realizations, L, to reduce the temporal statistical uncertainty. Considering a total sampling period T, for which statistical stationarity can be assumed, and a pulsed LiDAR that scans N r points evenly spaced along the LiDAR laser beam, with a range gate ∆r and accumulation time τ a , the total number of collected velocity samples is then equal to N s = N r · T/τ a . The angular resolution of the LiDAR scanning head, ∆θ, can be selected to modify the angular spacing between consecutive line-of-sights (i.e. the data spacing) and the total sampling period for a single scan, τ s (i.e. the number of realizations, L).
The design of a LiDAR scan aiming to reconstruct turbulent statistics of an ergodic flow thorough LiSBOA can be formalized as a two-objective (or Pareto front) optimization problem. The first cost function of the Pareto front, which is referred to as I , is the percentage of grid nodes for which the Petersen-Middleton constraint applied to the smallest half-wavelength of interest (i.e. ∆n 0 ), is not satisfied. Regarding the scaled reference frame, this can be formalized as: where the square brackets are Iverson brackets and N i is the total number of nodes in the Cartesian grid, r i . For a more conservative formulation, it is recommended to reject all the points with a distance smaller than R max from an under-sampled grid node, i.e. with ∆d > 1. This condition will ensure that the statistics are based solely on regions that are adequately sampled. The cost function I depends not only on the angular resolution but also on R max , which is equal to 3σ in this work. In general, increasing σ results in a larger number of samples considered for the calculation of the statistics at each grid point r i and, thus, in a reduction of I . Therefore, a larger σ entails a larger percentage of the spatial domain fulfilling the Petersen-Middleton constraint. The smoothing parameter, σ, also plays a fundamental role in the response of higher-order statistical moments. Specifically, if the reconstruction of the variance or higher-order statistics is important, the response D 0 (∆ñ 0 ) should be included in the Pareto front analysis as an additional constraint.
The second cost function for the optimal design of LiDAR scans, I I , is equal to the standard deviation of the sample mean, which, for an autocorrelated signal, is (Bayley and Hammersley 1946): where ρ p is the autocorrelation function at lag p, τ is the integral time-scale and the approximation is based on George et al. (1978). The velocity variance, u 2 , and the autocorrelation, ρ p , are functions of space; however, to a good degree of approximation, they can be replaced by a representative value and considered as uniform in space. Fig. 6 shows the standard deviation of the sample mean normalized by the standard deviation of the velocity as a function of the number of realizations, L, and for different integral time-scales, τ. It is noteworthy that the standard deviation of the sample mean represents the uncertainty of the time-average of each measurement point, r j , while the final uncertainty of the mean field at the grid nodes r i is generally reduced due to the spatial averaging process intrinsic to the LiSBOA.
The whole procedure for the design of a LiDAR scan and retrieval of the statistics is reported in the flow chart of Fig. 7. Summarizing, from a preliminary analysis of the velocity field under investigation, we determine the maximum total sampling time, T, the characteristic integral timescale, τ, the characteristic velocity variance, u 2 , the fundamental half-wavelengths ∆n 0 . This information, together with the settings of the LiDAR (namely, the accumulation time, τ a , the number of points per beam, N r , and the gate length, ∆r), allow for generating the Pareto front as a function of ∆θ and for different values of σ. Based on the specific goals of the LiDAR campaign in terms of coverage of the selected domain (i.e. I ), the statistical significance of the data (i.e. I I ) and, eventually, the response of the higher-order statistical moments (i.e. D 0 (∆ñ 0 )), the LiSBOA user should select the optimal angular resolution, ∆θ, and the set of allowable σ values. Due to the above-mentioned non-ideal effects on the LiSBOA, the selection of σ is finalized during the post-processing phase when the LiDAR dataset is available and the statistics can be calculated for different pairs of σ − m values. For the resolution of the Cartesian grid, Koch et al. (1983) suggested that it should be chosen as a fraction of the data spacing, which, in turn, is linked to the fundamental half-wavelength. The same author suggested a grid spacing included in the range dx ∈ [∆n 0 /3, ∆n 0 /2]. In this work, we have used dx = ∆n 0 /4, which ensures a good grid resolution with acceptable computational costs.
By following the steps outlined in the present section, the mean, variance, or even higher-order statistical moments of the velocity field can be accurately reconstructed for the wavelengths of interest. It is worth mentioning that the LiSBOA of wind LiDAR data should always be combined with a robust quality-control process of the raw measurements. Indeed, the space-time averaging operated by the LiSBOA makes the data analysis sensitive to the presence of data outliers, which need to be identified and rejected beforehand to prevent contamination of the final statistics.

Conclusions
A revisited Barnes objective analysis for sparse and non-uniformly distributed LiDAR data has been formulated to calculate mean, variance, and higher-order statistics of the wind velocity field over a structured N-dimensional Cartesian grid. This LiDAR Statistical Barnes Objective Analysis In the companion paper (Letizia et al.), the LiSBOA is applied to velocity fields associated with wind turbine wakes obtained through large-eddy simulations and LiDAR measurements. As a final note, this work is intended to contribute to the improvement and standardization of the LiDAR data collection and analysis methodology. We believe that the formulation of validated tools for quantitative analysis of LiDAR data represents an important process to unleash the full potential of the scanning LiDAR technology for investigations of atmospheric turbulent flows.

Derivation of the analytical response function of the LiSBOA
The first iteration of the LiSBOA produces a weighted average in space of the scalar field, f , with the weights being Gaussian functions centered at the specific grid nodes, x. In the limit of a continuous function defined over an infinite domain, Eq. (3) represents the convolution between the scalar field, f , and the Gaussian weights, w. Therefore, the response function of the LiSBOA, can be expressed in the spectral domain as (Pauley and Wu 1990): where the operator F indicates the Fourier transform (FT). The FT of the weighting function in Eq.
(A1) can be conveniently recast as the product of N one-dimensional FT: where k p is the directional wavenumber and i = √ −1. Hence, by leveraging the closed-form FT of the Gaussian function (Greenberg 1998): we get the desired results, i.e.:   Table 2. Selected combinations of σ and m to achieve a ∼95% recovery of the mean of the selected fundamental half-wavelength and associated response of the higher-order moments ( Table 2. F . 6. Standard deviation of the sample mean normalized by the standard deviation of velocity as a function of the number of realizations, L, and for different values of the ratio between the integral time-scale and the sampling time, τ/τ s . F . 7. Schematic of the LiSBOA procedure for the optimal design of LiDAR scans and reconstruction of the statistics for a turbulent ergodic flow.