Sampling Error in Aircraft Flux Measurements Based on a High-Resolution Large Eddy Simulation of the Marine Boundary Layer

,

sults are compared with the theoretically derived formula of Lenschow and Stankov (1986).In support of these comparisons, we also evaluate and document the relevant integral length scales and correlations and show that for heights up to approximately 100 m (z/z i = 0.12), the length scales are accurately predicted by empirical expressions of the form I f = Az b .The Lenschow and Stankov expression is found to be remarkably accurate at predicting the random error for shorter flight tracks, but our empirically determined errors decay more rapidly with L than the L −1/2 relationship pre-

Introduction
The eddy correlation (EC) method has long been the principal means of making field measurements of turbulent fluxes of energy, momentum, and trace gases in the planetary boundary layer (PBL).While towers are commonly used to measure fluxes over longer time periods (weeks to years) at fixed locations, aircraft are the preferred platform for obtain-ing estimates of fluxes over larger areas, albeit for far shorter periods of time.
By their nature, EC estimates of fluxes are averages of quantities that randomly fluctuate on time scales ranging from fractions of a second to hours and over distance scales from centimeters to kilometers.Thus, in addition to the usual factors affecting measurement quality from any instrument, such as calibration, precision, time response, and site representativeness, the quality of eddy correlation estimates are critically subject to statistical sampling error as well as to potentially restrictive assumptions about temporal stationarity and/or spatial homogeneity.
In this paper, we are concerned exclusively with the problem of random sampling error in the context of aircraft flux measurements relying on in situ (e.g., gust probe) measurements of fluctuating wind vector and scalar quantities directly along the flight path.The key question that has been asked for decades is, how long of a flight path -whether consisting of a single extended leg or a series of shorter legs over a smaller region -is sufficient to obtain turbulent flux estimates of useful precision?This problem was examined theoretically, based on statistical models of turbulence, by Lenschow and Stankov (1986), Lenschow et al. (1994), and Mann and Lenschow (1994), with additional refinement of these ideas by Mahrt (1998).Others took an empirical approach, comparing aircraft flux estimates with those from nearby fixed towers (Desjardins et al., 1989;Grossman, 1992;Mahrt, 1998).Building on the theoretical arguments of Mahrt (1998), Hollinger and Richardson (2005) examined random errors in tower flux measurements; their findings have potential applicability to aircraft measurements as well.Finkelstein and Sims (2001) used what was characterized as a more general yet rigorous derivation to arrive at a differ- ent expression for the flux sampling error.Based on the properties of test data sets for several variables, their computed flux errors were typically about a factor of two larger than those predicted by the expressions of Mann and Lenschow (1994).
More recently, it has become possible to use large eddy simulation (LES) models to resolve turbulent fluctuations of mass, energy, and motion in the planetary boundary layer at fairly fine scales.Virtual tower or aircraft can then sample these fluctuations in a manner consistent with the platform (Schröter et al., 2000;Sühring and Raasch, 2013;Sühring et al., 2019).An attractive feature of this approach is the unique ability to compare the resulting flux estimate with the known "true" domain-averaged value, in contrast to the case for real-world comparisons between inherently dissimilar and imperfect aircraft and tower measurements of the area-averaged flux.The LES approach is also gaining considerable interest for its ability to examine the measurement and interpretation of flux measurements in the presence of spatial inhomogeneities (Sühring and Raasch, 2013;Sühring et al., 2019).
The single most important limitation of the LES approach has been the difficulty of accommodating the full range of spatial scales relevant to turbulent flux measurements.It is therefore noteworthy that Matheou (2018) has undertaken an LES run of exceptional grid size (4096×4096×1200) and fine resolution (1.25 m) to simulate a nocturnal cloud-topped marine boundary layer over a uniform 5.12×5.12km domain.
The present paper examines the aircraft sampling problem utilizing the single final time step of that simulation.The nature of the simulation allows us to cleanly isolate and study the statistical sampling problem over a fairly wide range of spatial wavelengths without the need to consider complicating factors such as spatial inhomogeneity or subgrid-scale flux contributions.
Because the simulation utilized cyclic lateral boundary conditions, it is possible to construct simulated flight paths that are continuous over much longer distances and yet do not resample the same locations in the domain.We are thus able to focus narrowly on the precision of aircraft flux measurements as a function of flight track length in the particular environment represented by this simulation.Our methods and motivations are in many ways similar to those of Schröter et al. (2000) and Sühring and Raasch (2013).However, we place greater emphasis on documenting the specific properties of the environment that most strongly affect sampling error; in particular, the integral length scale I f and the spatial correlation ρ w,ψ between vertical velocity w and an arbitrary scalar variable ψ, both of which appear in theoretical expressions for sampling error derived by Lenschow and Stankov (1986) and Lenschow et al. (1994).
For ease of comparison with the earlier error studies using LES, Table 1 summarizes key features of those studies.The present study benefits from both the finest grid resolution and largest grid size, but the overall horizontal domain dimension is smaller than that used by the two most recent previous studies.Ours is also unique in using an LES to ex- amine the aircraft flux sampling problem in a marine rather than continental environment.

LES description
The LES utilizes a buoyancy-adjusted stretched-vortex model (BASVM) described in detail by Chung and Matheou (2014) and Matheou and Chung (2014).In what is possibly the largest LES model run ever published, Matheou (2018) (Stevens et al., 2003).The observed stratocumulus case was described by mission participants as "very homogeneous" (Earth Observing Laboratory, 2007).
The LES grid resolution is 1.25 m in x, y, and z, and the grid size is 4096×4096×1200, corresponding to a The single final time step of the simulation, comprising a total of ∼800 GB of gridded data, was utilized for this study.Output included all grid-resolved model variables, including temperature T , specific humidity q, wind velocity components u, v, and w, and cloud water content q l .

Environment
Consistent with the observed environment on the flight date, the model was initialized with a boundary layer depth z i =840 m delineated by a sharp 8.5 K temperature inversion.Below z i , the initial potential temperature θ was 289 K and the specific humidity q was 9 g kg −1 .These values did not change significantly during the simulated evolution of the boundary layer (Fig. 1).A prescribed sea surface temperature of 292.5 K was used, leading to domain-averaged sensible and latent heat fluxes of 15 W m −2 and 115 W m −2 , respectively.The wind was initialized as geostrophic; at the conclusion of the simulation, the mean horizontal wind components u and v below z i were nearly uniform at 6.7 m s −1 and −5.0 m s −1 , respectively, corresponding to a wind di- The cloud base and top, defined here as the lowest and highest levels at which non-zero cloud liquid water occurred at any point in the horizontal domain, were found at 500 m and 885 m, respectively.We are concerned in this study primarily with the sub-cloud boundary layer at and below 400 m or 0.48z i .

Turbulent structure and domain-averaged fluxes
Many details concerning the model run and the overall tur-10 bulent structure of the simulated boundary layer are given by Matheou (2018).Here we focus on those grid-resolvable turbulent properties most directly relevant to an understanding of the simulated flux measurements.
Figure 2a shows that, at 400 m height (0.48z i ), the LES produces a vertical velocity spectrum approximately obeying a Kolmogorov power law for the vertical velocity field over more than two decades, from wavelengths greater than 1 km down to ∼8 m or about three times the Nyquist wavelength of 2.5 m.At this height, peak energy in the spectrum is near 1 km wavelength.At the much lower height of 40 m (0.05z i ), peak energy is found near 150 m, and there is substantially less energy at longer wavelengths.In both cases, the peak energy in w is thus found near ∼4 times the physical distance z from the lower boundary, consistent with dominant contributions from circular eddies of radius z centered at height z.
The spectra for temperature (Fig. 2b) and specific humidity (Fig. 2c) reveal considerably more energy at the longest wavelengths, even at 40 m height, while also rolling off earlier than w at the short wavelength end.The spectrum for temperature is also somewhat flatter than for either w or q.As expected, the spectrum for horizontal wind speed U (Fig. 2d) is notable in having no significant dependence on height.Overall, the spectra of scalar variables give reason for some confidence in the realism of the LES for wavelengths spanning the range from ∼10 m to ∼1 km.
Throughout this paper, we will highlight results at four representative heights: 10 m, 40 m, 100 m, and 400 m.These correspond to approximately 0.01z i , 0.05z i , 0.12z i , and 0.48z i , respectively.
Horizontal cross-sections of temperature T , specific humidity q , and vertical velocity w are presented for these heights in Figs.3-5, where the primed quantities indicate deviations from the domain-averaged value at each height.These show the expected evolution from predominantly finescale structure near the surface to consolidated perturbations of order 1 km diameter at 400 m.In the w field near the surface, there is a visually obvious alignment of structures with the mean wind.This directional anisotropy is far less apparent in q and is completely absent from T at low levels.
As shown in Fig. 6, the amplitudes of turbulent fluctuations also vary with height, with σ T decreasing from 0.12 K at 10 m to 0.04 K at 400 m and σ q decreasing from 0.36 g kg −2 to 0.13 g kg −2 .σ w , on the other hand, increases from 0.32 m s −1 to 0.72 m s −1 .
The local contributions to the turbulent fluxes of sensible and latent heat are (1) 10 and where ρ is the air density, C p = 1005 J kg −1 K −1 is the specific heat capacity at constant pressure, and L ≈ 2.5 × 10 6 J kg −1 is the latent heat of vaporization of water.Cross- sections of H and E are shown in Figs.7 and 8.We additionally compute the local u and v components of momentum flux as ( Where appropriate in the discussion below, the horizontal 20 wind vector (u, v) is rotated into a coordinate system aligned with the mean wind direction so as to isolate the along-wind and crosswind components (U , U ⊥ ).Domain-averaged vertical fluxes and turbulent kinetic energy (TKE) are depicted in Fig. 9.These do not include parameterized subgrid-scale fluxes, which become negligible relative to resolved eddy fluxes above about the 5th grid level, or 8 m.These fluxes serve as the "truth" values against which simulated aircraft flux measurements will be evaluated.As expected, the profile of sensible heat flux H (Fig. 9a) is quite linear except within cloud, where latent heating due to condensation sharply increases the positive correlation between T and w..It crosses through zero near 445 m (0.53z i ).The profile of latent heat flux is noticeably less smooth, as are τ u and τ v (Fig. 9b).Turbulent kinetic energy varies only weakly below cloud and has a sharp maximum at cloud top, as expected.(Fig. 9c).
Figure 10 presents ogive (cumulative distribution) plots of the spectral contributions to each flux.For all four flux variables, there is a pronounced shift from predominantly shortwavelength contributions at the lowest levels to much lowerfrequency contributions at 400 m.In particular, we find that 90% of the total sensible heat flux contribution comes from the wavelength ranges [6 m, 146 m], [8 m, 0.51 km], [12 m, 1.0 km], and [17 m, 2.6 km] at heights of 10 m, 40 m, 100 m, and 400 m, respectively, with qualitatively similar ranges for latent heat flux and momentum flux.Overall, these results suggest that 100 m is the height at which spectra of flux contributions for all variables are least likely to be biased by either the unphysical spectral roll-off below 10 m (see Fig. 2) or the LES domain size of 5.12 km.
3 Integral length scales

Significance
Central to the problem of estimating random error in aircraft measurements of turbulent fluxes is the integral length scale, also known as the autocorrelation length, of the fields being measured (Lenschow and Stankov, 1986;Lenschow et al., 1994;Sühring and Raasch, 2013): where A f (r) is the autocorrelation of generic spatial function f (x, y) at displacement r = x 2 + y 2 in any specified direction, A f (0) ≡ 1, and r 0 is the distance to the first zero crossing or one-half of the domain diameter (in this case 2.56 km), whichever is smaller.In the ideal "red noise" case, A f (r) = exp(−r/I f ), and A f (I f ) = e −1 .The length scale is thus useful for characterizing the minimum track length and/or spatial separation required in order to be able to treat two sets of measurements as statistically independent.

Comparison with proxy estimates
Apparent in the above results is that ( 6) and ( 7) generally yield large overestimates of I wψ for any scalar variable ψ.
For example, at 100 m height and measured in the crosswind direction, I q = 220 m and I w = 73 m.I q I w is then 127 m, a factor of 2.5 larger than the directly computed I wq = 51 m.In general, the minimum overestimate at or below 400 m was found to be about a factor of 2 or greater, with ratios in excess of 3-5 typical at lower heights.The difference increases further if one uses the correlation coefficient ρ w,ψ < 1 in the denominator per (7).Specific combinations of variables and heights can be easily examined using the relationships in Table 2.

Correlations
Also required in the evaluation of random flux errors from ( 5) is the correlation ρ w,ψ between the vertical velocity w and the scalar variable ψ of interest.Computed profiles of ρ w,ψ are given in Fig. 12.For horizontal momentum, we computed ρ w,U = ρ 2 w,u + ρ 2 w,v .
4 Simulated aircraft measurements

Track definitions
The central purpose of this paper is to investigate the empirical relationship between sampling error in turbulent flux measurements and the length of a continuous, ideal aircraft track through the virtual atmosphere represented by the LES domain.We require each track to be perfectly cyclic, albeit with a period L substantially longer than the dimensions of the domain.Specifically, track angles relative the x-axis are defined by φ n = arctan(n).For φ 1 = 45 • , the track makes a single pass through the domain before returning to its starting point, and the track length is 7.2 km.A continuation of the track would simply resample the same grid points.For φ 2 = 63.4 • , the track makes two widely separated passes through the domain before returning to its starting point, and the non-repeating track length is 11.4 km.The maximum value of n used was 20, for a 102.5 km continuous, nonoverlapping path.
For any given n, starting points for different tracks of the same length are uniformly distributed along the boundary of the domain, maintaining prescribed minimum separation (measured perpendicular to the paths) intended to ensure a reasonable degree of statistical independence.These minimum separations were chosen to be greater than I f for all relevant variables at the level in question; we used 20 m, 50 m, 100 m, and 150 m at heights of 10 m, 40 m, 100 m, and 400 m, respectively.
The effect of the minimum separation requirement is to achieve a roughly constant overall sampling density for the ensemble of tracks of a given length.Thus, N n is small for long tracks and larger for short tracks.By taking mirror images and 90 • rotations of the tracks defined in this way, the total number of distinct tracks available is actually 4N n , except for n = 1, in which case it is 2N n owing to duplication when rotating and flipping the 45 • track patterns.Figure 13 depicts an example track for n = 3 (top) and the full corresponding set of 32 distinct tracks (bottom) for a relatively wide minimum separation of 200 m used for visual clarity; the number of distinct tracks is significantly greater in the actual analysis.
Note that the algorithm described above is more elaborate than the fixed-angle track definitions utilized by Schröter et al. (2000) and Sühring et al. (2019).To reiterate, we are requiring each track of length L i to be cyclic with period L i , and we are distributing the realizations of a given track length so as to both maximize and equalize their separation, subject to the aforementioned minimum separations.This allows us to obtain an optimal set of quasi-independent tracks.Constraining the tracks to be cyclic also means that there is by definition no trend or unsampled low frequency contribution to be concerned about-"measured" fluxes are the same regardless of the starting point along the track, as long as the total distance followed by the virtual aircraft is L i .

Virtual flux observations
The "truth" value for each comparison is the domainaveraged flux value computed at each level (Fig. 9).These values utilized the deviations (primed quantities) from the true ensemble mean values for the entire domain.As measured against the true values, aircraft measurements are subject to two statistical sources of error: 1) error in the determination of the mean values relative to which primed quantities will be calculated , and 2) insufficient sampling of the fluctuating primed quantities themselves.Our virtual measurements simulate both effects by utilizing the track-determined mean values rather than the domain-averaged values.Both sources of error decrease asymptotically to zero in the limit of an infinitely long flight track, and the flux estimate will converge on the true ensemble value.
In the finite 5.12 km×5.12 km horizontal domain of our LES, there is the additional issue that a single long flight track can effectively sample the entire domain, so that the aircraft measurement can converge almost perfectly on the "true" value from measurements along a long but finite path.
In principle, then, the results obtained herein for long flight tracks should be considered lower bounds on the expected random error.However, because of the minimum separation maintained between parallel segments of a given tracks, we believe this potential bias is unimportant for the flight track lengths considered here.More important may be the inability of the finite LES domain to reproduce flux variations with wavelengths greater than the domain size.As previously discussed for Fig. 10, this seems likely to be an issue mainly at heights of ∼400 m and above.
Finally, a notable constraint rooted in the availability of only a single time step from the LES is that the virtual aircraft effectively traverses each path instantaneously.We are in effect relying on Taylor's "frozen turbulence" hypothesis and, following Lenschow and Stankov (1986), postulate that this hypothesis is valid in the present case if I f < V T f , where I f is the integral length scale, V is the speed of the aircraft, and T f is the autocorrelation time of the flux field being measured.Unfortunately, we are unable to determine T f from our single time-step, but for a real-world V ≈ 85 m sec −1 similar to that of the Univerity of Wyoming KingAir, we would only require T f > 1 s to satisfy the above inequality for flux measurements at a height of 100 m, based on the values of I f seen in Fig. 11.

Results
Raw sample results are presented in Figs.14-16, with each blue dot representing the estimated flux from a single flight track of the indicated length.The true domain-averaged values are indicated as dashed black lines, and the root-meansquare (RMS) deviation from the true value is depicted as a  red dashed line.For all flux variables, there is the expected decay in sampling error as the track length becomes longer.
Noteworthy features include the following: -Overall scatter is largest at the highest altitudes, consistent with the longer integral length scales I f and with 55 the common observation by previous authors that error tends to increase with height.
-For sensible heat flux at 400 m (Fig. 14d), the magnitude of the scatter is often far in excess of the very low (1.8 W m −2 ) mean flux itself, so that shorter track 60  lengths cannot even distinguish the sign of the mean flux.
-Relative to the mean values, scatter is particularly large for momentum flux.Because this quantity is positive-  definite (τ = τ 2 u + τ 2 v ), the distribution of errors at 400 m is strongly skewed and positively biased.
In Figs.17-19, we take a closer look at the dependence of sampling error on path length and compare our results with the random error predicted by (5).The black dots correspond to the empirical RMS error values previously represented in Figs.14-16 as red dashed curves.The black dashed lines represent fits to those values.The accompanying power-law expression for each fit appears in black.In red we have the minimum track length L 10 required for 10% relative accuracy.Finally, the blue line represents (5), with the parameter values required by that equation appearing in blue as well, where both I f and the correlation coefficients ρ w,ψ are computed directly from the LES-generated fields Noteworthy findings include the following: -Equation ( 5) due to Lenschow and Stankov (1986) and using the independently computed parameter values is remarkably accurate at predicting the random error for shorter track lengths.
-Empirical fits generally show a steeper decrease with track length than the L −1/2 relationship predicted by 10 (5), with exponents ranging from −0.64 to −0.70 at 10 m and 40 m, to nearly −1 for sensible and latent heat fluxes measured at a height 400 m.It is not clear whether the more rapid decay in error is primarily related to the finite domain size-with this issue being most likely problematic at 400 m-or to other departures from the idealized statistical model of turbulence assumed by Lenschow et al. (1994).
-For flights at and below 100 m, track lengths of approximately 30 km or less are sufficient to achieve 10% pre-20 cision in sensible and latent heat flux estimates in this particular environment.Required track lengths for momentum flux τ are considerably longer; nearly 90 km at 100 m height.
-At a height of 400 m, no reasonable track length is long enough to adequately measure the low sensible heat and momentum fluxes encountered there.

Conclusions
In this paper, we utilized the high-resolution large-eddy simulation of the marine boundary layer by Matheou (2018) to determine random flux sampling errors by a virtual aircraft flying tracks of various lengths at heights of 10 m, 40 m, 100 m, and 400 m.We eliminated the need for detrending by constructing these tracks to be cyclic with periods of the specified lengths.
We compared these empirical results with the theoretically derived expression of Lenschow and Stankov (1986).In support of these comparisons, we computed the required integral length scales I f and found that these are well described by expressions of the form I f = Az b for z ≤ 100 m, with coefficents A and b given in Table 2.The empirical exponents b are generally between about 2/3 and 1 for fluxes of sensible heat, latent heat, and momentum, in contrast to Lenschow and Stankov (1986), who found exponents of 1/3 to 1/2.The reason for this apparent discrepancy is unknown.We also evaluated the commonly cited approximation for the integral length scale for fluxes given by ( 6) or ( 7) and found that it consistently overestimated the true value by at least a factor of 2-5.
Using directly computed parameter values, we found that (5) is remarkably accurate at predicting random errors for short track lengths.However, it predicts a decay in error proportional to L −1/2 , whereas our empirical results show substantially more rapid decay in many cases.It is uncertain to what degree the difference is due to the finite domain size of the LES, to departures from the idealized statistical assumptions of Lenschow and Stankov (1986), or perhaps to a combination of both.It seems less likely that the first of these issues would affect the results for measurements at lower heights, as there is then very little contribution to total fluxes by wavelengths greater than 1 km, according to Fig. 10.
In general, we find that track lengths of ∼30 km or less are sufficient for measuring sensible and latent heat fluxes in this simulated environment to better than 10% precision, provided that flights are conducted at or below 100 m.Substantially longer flight tracks are required for momentum flux, a result traceable to the low correlation ρ U,w (and thus low corresponding fluxes) between the horizontal and vertical wind components even near the surface.

Figure 1 .
Figure 1.Domain-averaged profiles of a) potential temperature θ and b) specific humidity q at the end of the LES run.Dashed lines indicate the heights of the minimum and maximum altitudes at which non-zero cloud water occurred at any point in the domain.
used the BASVM to simulate a nocturnal stratocumulus case over the Pacific ocean southwest of Los Angles 10 corresponding to the 10 July 2001 research flight (RF01) of the second Dynamics and Chemistry of Marine Stratocumulus (DYCOMS-II) field study

Figure 2 .
Figure 2. Radially averaged power spectra for selected scalar variables at two representative heights 40 m = 0.05zi and 400 m = 0.48zi: a) vertical velocity, b) temperature, c) specific humidity, d) horizontal wind speed.For reference, black dashed lines depict the power-law exponent of −2/3 expected for a Kolmogorov spectrum in the inertial subrange.

Figure 14 .
Figure 14.Flux estimates vs. track lengths for sensible heat H. Red dashed curves depict the root-mean-squared deviation of estimates from the true domain-averaged flux (horizontal dashed line).

Figure 15 .
Figure 15.Flux estimates vs. track lengths for latent heat E. Red dashed curves depict the root-mean-squared deviation of estimates from the true domain-averaged flux (horizontal dashed line).

Figure 16 .
Figure 16.Flux estimates vs. track lengths for horizontal momentum τ (Reynolds stress).Red dashed curves depict the root-meansquared deviation of estimates from the true domain-averaged flux (horizontal dashed line).

Figure 17 .
Figure17.Relationship between ensemble RMS error and track length for sensible heat flux H. Black dots correspond to the red curves in Fig.14.An empirical power-law fit is depicted by the dashed black line, with the coefficients of the fit indicated in the expression for σ f .L10 (in red) indicates the track length required to achieve a 10% relative error.The solid blue line represents (5) using the parameter values given in blue.

Figure 18 .
Figure 18.Same as Fig. 17 but for latent heat flux E.

Table 1 .
Studies examining the sampling error problem in aircraft flux measurements with the aid of large eddy simulations (LES).

Table 2 .
Coefficients for the integral length scale I f ≈ Az b , valid up to at least 100 m altitude in this simulation, depending on the variable (see Fig.11), where both IL and z are in meters.Also given is the approximation I f ≈ Ā for selected variables.