Detecting turbulent structures on single Doppler lidar large datasets: an automated classification method for horizontal scans

10 Turbulent structures can be observed using horizontal scans from single Doppler lidar or radar systems. Despite the ability to detect the structures manually on the images, this method would be time-consuming on large datasets, thus limiting the possibilities to perform studies of the turbulent structures properties over more than a few days. In order to overcome this problem, an automated classification method was developed, based on the observations recorded by a scanning Doppler lidar (LEOSPHERE WLS100) and installed atop a 75-m tower in Paris city centre (France) during a 2-months campaign 15 (September-October 2014). The lidar recorded 4577 quasi-horizontal scans for which the turbulent component of the radial wind speed was determined using the velocity azimuth display method. Three turbulent structures types were identified by visual examination of the wind fields: unaligned thermals, rolls and streaks. A learning ensemble of 150 turbulent patterns was classified manually relying on in-situ and satellite data. The differences between the three types of structures were highlighted by enhancing the contrast of the images and computing four texture parameters (correlation, contrast, homogeneity and energy) 20 that were provided to the supervised machine learning algorithm (quadratic discriminate analysis). Using the 10-fold cross validation method, the classification error was estimated to be about 9.2% for the training ensemble and 3.3% in particular for streaks. The trained algorithm applied to the whole scan ensemble detected turbulent structures on 54 % of the scans, among which 34 % were coherent turbulent structures (rolls, streaks). 25 https://doi.org/10.5194/amt-2020-82 Preprint. Discussion started: 30 April 2020 c © Author(s) 2020. CC BY 4.0 License.


Introduction
The understanding of the connection between atmospheric physics and air pollutants' dispersion is a necessary step for improving regulation and monitoring of atmospheric pollution. The level of pollution is highly dependent on the weather and particularly on turbulence (Roth, 2007). The pollution peaks occur during weather conditions where the pollutants' dispersion is restrained e.g. stagnant conditions, low-altitude thermal inversion, low turbulence (Kallos et al., 1993).

30
Turbulent flows are motions characterized by high unpredictability. Nevertheless, coherent structures are developed in these flows (Tur and Levich, 1992). The principal aspect that determines a coherent structure is the instantaneously space and phase correlated vorticity of the turbulent fluid mass over the spatial extend of the flow structure. Furthermore, a coherent structure must maintain its form for a time period sufficient for time-averaged statistics calculations (Hussain, 1983). The most typical types of coherent structures are presented in the review of Young et al (2002), who classified structures into three 35 characteristic types: turbulent streaks, convective rolls and gravity waves.
The high wind shear between the surface layer and lower planetary boundary layer (PBL) can lead to the formation of the turbulent streaks in the surface layer that may extend to the mixed layer. Neutral or near-neutral stratification favours the formation of streaks though they may also form during stable and unstable conditions (Khanna and Brasseur, 1998). The physics behind their formation differs as the contribution of buoyancy varies in relation to the atmospheric conditions (Moeng 40 and Sullivan, 1994). Turbulent streaks are aligned with the horizontal wind with alternating stripes of stronger horizontal wind associated with a subsidence and stripes of weaker horizontal wind associated with an ascendance (Khanna and Brasseur, 1998). Formation, evolution and decay of streaks are rather short, equivalent to several tens of minutes, before they regenerate.
The average streak spacing is usually hundreds of meters (Drobinski and Foster, 2003).
In the mixed layer, horizontal roll vortices, also known as convective rolls, develop roughly aligned with the mean wind 45 (Lemone, 1972). Favourable conditions for the development and maintenance of convective rolls are the spatial variations of surface-layer heat flux, the low-level wind shear and the relatively homogeneous surface characteristics (Weckwerth and Parsons, 2006). As the rolls rotate in the vertical plane, they generate ascending and descending motions. These motions under convective conditions can form clouds in rows separated by clear sky areas known as cloud streets which is a characteristic visual feature used to identify rolls (Lohou et al., 1998). The rolls usually extend from the surface to the capping inversion 50 with a large variety of horizontal sizes from few kilometers to few tens of kilometers. They are characterized by long lifespan of hours or even days as opposed to the short lifespan of the streaks (Drobinski and Foster, 2003). Young et al (2002) distinguish rolls in narrow mixed-layer rolls, where the ascending air masses are one thermal wide (Weckwerth et al., 1999) and wide mixed-layer rolls, where multiple thermals are grouped within each ascending area (Brümmer, 1999). As Young et al (2002) stated, both types of rolls can be distinguished visually, with the narrow rolls having the form of a "string of pearls" 55 whereas the wide rolls look like a "band of froth".
Remote sensors are exceptionally useful for the identification of coherent turbulent structures. Their ability to scan large areas in a short period is advantageous compared to in situ measurements (Kunkel et al., 1980). Lhermitte (1962), Browning & Wexler (1968) were the first to implement the velocity azimuth display (VAD) technique, also known as plan position indicator (PPI) method, using Doppler radars. The PPI technique provides conical scans or even horizontal surface 60 scans with the appropriate combination of elevation and azimuth angles. Kropfli & Kohn in 1978 were able to study horizontal roll structures by using a dual-Doppler radar in order to observe the wind field in the three dimensions. Several studies followed for different type of radars with more efficient configurations (Kelly, 1982;Lohou et al., 1998;Reinking et al., 1981). Weckwerth et al. (1999) were able to study the evolution of horizontal convective rolls by combining Doppler radar observations with meteorological measurements, radiosondes, flight measurements and satellite images.

65
In recent years, various studies have been carried out by using lidars only. The PPI method can also be applied to Doppler lidars. Depending on the selected scanning method of the Doppler lidar it is possible to observe coherent turbulent structures in the atmospheric surface layer (Drobinski et al., 2004) as well as in the mixed-layer (Drobinski et al., 1998).
https://doi.org/10.5194/amt-2020-82 Preprint. Discussion started: 30 April 2020 c Author(s) 2020. CC BY 4.0 License. Newsom et al. (2008) and Iwai et al. (2008) introduced the dual-Doppler lidar method and revealed its benefits in the observation of coherent turbulent structures. This method was further improved by Träumner et al. (2015) using an optimized 70 dual-Doppler technique. They were able to identify different type of turbulent structures including elongated areas resembling turbulent streaks. They combined a two-dimensional autocorrelation function with the observation of the scans by eye.
However, the subjective classification by observing the images is a time-consuming approach. Furthermore, the use of two Doppler lidars is limited to the institutes that can afford such a high cost and collaborations on short-term campaigns. A much less expensive approach, and suitable for long periods, is to detect the passage of turbulent structures on sonic anemometer 75 time series. For instance, Barthlott et al. (2007), analysed 10 month of data from a meteorological tower located in the surface layer 20 km south of Paris, France and they observed coherent turbulent structures for 36% of the cases.
This study aims to identify the coherent turbulent structures on single Doppler lidar horizontal scans and develop an automatic classification process in order to handle large datasets. The structures are determined by combining texture analysis and machine learning technique. It relies on the observations of radial wind speed recorded using a scanning Doppler lidar 80 settled atop a 75 m-high tower in the centre of Paris, during a 2-month period in late summer/early fall. Section 2 presents the experimental set up of the study. The methodology for the identification and classification of the turbulent structures is demonstrated in Section 0. Subsequently, the results of the classification for the training ensemble as well as for the whole dataset are displayed in Section 4. Finally, the key points of the paper are summarized in Section 5.  The Doppler shift frequency between the emitted laser beam and the light backscattered by the aerosols is measured by heterodyne detection associated with Fast Fourier Transform as explained analytically by Kumer et al. 95 (2014) and Veselovskii et al. (2016). The lidar is sensitive only to the radial wind speed i.e. the wind projection along the light beam (counted positive when going away from the lidar). Table 2 showcases the scanning sequence as it was implemented during the VEGILOT campaign. For the classification of the turbulent structures, we focused in the current study on the almost horizontal PPI scans (1° elevation angle). During those scans, the lidar emitted beams in azimuth angles from 0° to 360° for a 2° resolution. This scenario was repeated every 18 minutes hence providing 4577 PPI scans 100 during the whole campaign. The duration of each scan was 3 minutes which is sufficiently short for the observation of https://doi.org/10.5194/amt-2020-82 Preprint. Discussion started: 30 April 2020 c Author(s) 2020. CC BY 4.0 License.

Experimental set up
structures. The maximum range of the scans reached 5 km (see white circle of Figure 1b) with a spatial resolution of 50 m. Due to the 1° elevation, the beam rise by about 87 m between the central point and the point at the 5 km.

Plan Position Indicator (PPI)
Almost horizontal scans near surface

Line of Sight (LOS)
Fixed direction shots

115
Assuming a homogeneous wind field for horizontal PPI scans, the radial wind measurements taken for the different beams at a given distance from the lidar should follow a cosine function of the azimuth angle, due to the projection of the wind along the beam direction (Eberhard et al., 1989). For instance, the observations at 2 km from the lidar (black ring on Figure 2a) are displayed on Figure 2(b) and can be fitted by a cosine function in the form of Eq. (1): where is the mean wind speed, max is the wind direction, is the azimuth angle of the beam and is the offset (Browning 120 and Wexler, 1968;Lhermitte, 1962). It is noteworthy that the value of is much smaller than for our data. It is possible to retrieve the mean wind from all the "rings" and subsequently calculate the mean wind projected on the beam direction which is displayed on Figure 2  of the patterns, the sign of the ′ in the current study is positive when the radial wind speed is stronger than the mean wind 125 speed and negative when it is weaker as it is illustrated in the sign convention of Figure 2(b). Jussie site is located in an urban area nearby hills, hence the surface roughness or the orography can affect the homogeneity of the wind field. As a result, in some cases the radial wind field does not follow a cosine function, and therefore the VAD method cannot be applied. This is apparent especially at night when low winds (below 2 m/s) do not have a defined direction (Wilson et al., 1976). Figure 3 presents a case where the radial wind field is not homogeneous. The radial wind speed 135 values e.g. at 2 km did not follow a cosine function (Figure 3b). Finally concerning streaks, a driving factor for their formation is the existence of a strong wind shear near the surface.
The observation of the horizontal wind profiles from the DBS scans revealed when the wind shear was higher than 2 m/s, which is defined as the threshold for nocturnal low level jet events (Stull, 1988) (Figure 4f). For many cases, the wind shear was accompanied by turbulent streaks patterns (Figure 4c) so or the training ensemble, only night cases of streaks were selected 165 to ensure that wind shear was the primary factor for the generation of turbulence. In total, 30 cases of each structure type were selected for the training ensemble with an extra category representing all the patterns that are not classified in the other three categories, such as chaotic patterns or cases when the VAD method cannot be applied (Figure 3).

175
In order to retrieve comparable texture analysis parameters from the turbulent wind field of the scans, the radial turbulent wind field was rotated so that the mean wind direction was aligned to the vertical (0° corresponds to a wind blowing from the North). Then, the coordinates were converted from polar to Cartesian. It was also important to adjust the contrast of the image so that the difference between the areas of positive and negative turbulent wind speed became more prominent.  For the automated classification of patterns, we need to map them to a space of corresponding numerical parameters. Each 185 reconstructed turbulent wind field is represented by a matrix (cells corresponds to pixels) from which 8×8 co-occurrence matrices (CM) can be constructed (Haralick et al., 1973).  Table 3: Co-occurrence matrix after the image pre-processing (Figure 5b) for the first neighbour ( = 1) and for a cell pair aligned with the mean wind and oriented in the same direction (azimuth = 0°  On the other hand, Table 4 shows the CM of Figure 5(b) for the third neighbour ( = 3) and for a cell pair oriented perpendicularly to the mean wind (transverse direction) with a clockwise rotation (azimuth angle = +90°). In this case, the occurrences have been distributed to the cells [1,1] and [8,8], as well as to the cells [1,8] and [8,1]. As we can see on Figure 5(b), the structures alternate between positive and negative values in the direction transverse to the mean wind, thus creating this difference in the CM compared to Table 3. 210

Texture analysis parameters for the classification of the turbulent structures
It is possible to compute several texture analysis parameters from each CM. Srivastava et al. (2018) were able to distinguish different synthetic patterns by using four texture analysis parameters: correlation, contrast, homogeneity and energy (Haralick et al., 1973;Yang et al., 2012). In their study, the striped patterns resemble the elongated patterns of streaks and rolls that we observe in the radial turbulent wind field. Therefore, the same texture analysis parameters were selected for calculation in our 215 dataset. More particularly, these parameters were computed by the Eq.
Correlation: At a given neighbour order , it is then possible to study the dependence of the texture parameters to the azimuth angle (see an example of such a dependence on Figure 6). The streaks and rolls have a more prominent peak in the longitudinal https://doi.org/10.5194/amt-2020-82 Preprint. Discussion started: 30 April 2020 c Author(s) 2020. CC BY 4.0 License. direction ( = 0°) compared to the unaligned thermals and "others" patterns. As streaks and rolls are aligned with the mean 225 wind (azimuth = 0°), those peaks result from the elongated shapes of these patterns. Three parameters of the curve in Figure 6 were selected in order to distinguish the different types of structures. For instance, fore the homogeneity curves, these parameters are defined by the Eq. (6), (7) and (8): 230 Amplitude: . Integral: . Symmetry: .
These three curve parameters were calculated for the four texture analysis parameters and for each of the thirty neighbour orders, which gives 360 parameters. In addition to these parameters, the UTC hour (close to solar time in Paris), the average mean wind speed and the root-mean-square error of the cosine fit ( Figure 2b) were included in the classification parameters.

235
The total number of classification parameters associated with each scan was therefore 363.

Algorithm training and classification error
In order to classify the turbulent structures according to the aforementioned texture analysis parameters, the supervised machine learning methodology was applied. The Quadratic Discriminant Analysis (QDA) algorithm was used, that minimizes 240 the total error probability of the classification, assuming that features of each class have a multidimensional Gaussian distribution. The QDA was trained (Hastie et al., 2009;Sokolov et al., 2020) with the 150-case ensemble described in Section 3.1: 30 cases of streaks, 30 cases of rolls, 30 cases of unaligned thermals and 60 cases of "others". The category of others was represented by twice more cases since it is expected to be the dominant category in the classification, as it includes the chaotic radial turbulent wind fields and the cases where the turbulent wind field was not computed successfully by the VAD method.

245
The classification error of the QDA technique could be estimated for the training ensemble by means of the 10-fold cross validation. In this method, the algorithm is trained using 90% of the training ensemble (135 cases), then it is applied to the remaining 10% (15 cases) and the resulting (output) classes are compared to the expected (target) classes. The process is https://doi.org/10.5194/amt-2020-82 Preprint. Discussion started: 30 April 2020 c Author(s) 2020. CC BY 4.0 License.
repeated 10 times, each time extracting a different 10% sample for test, until the entire training ensemble has been tested. The classification error is the total number of misclassified cases, expressed as a fraction of the training ensemble.

250
As the number of dimensions of the feature space (363) was significantly higher than the number of patterns of the training ensemble (150), the application of all the features leads to the curse of the dimensionality problem, when the classification works well only for the training data and fails for the test set. In order to deal with this problem, we reduced the feature space by selecting the most informative components using the stepwise forward selection algorithm (Sokolov et al., 2020). The resulting sequence of these components and the decrease of the 10-fold cross validation classification error are presented in 255 Figure 7. The classification error reached a minimum of about 9.2% when five parameters were used; taking more into account increased the classification error.
In particular, these parameters were the amplitude and symmetry of the homogeneity curve, the integral and the amplitude of the contrast curve and the integral from the correlation curve. As explained in Section 0, elongated structures show a more prominent peak in the homogeneity (or the opposite for contrast) curve and as a result, they are characterized by a higher 260 amplitude compared to unaligned thermals and "others" patterns. Similarly, the values of the integral of the curves with prominent peaks differentiate from quasi constant curves that occur for patterns with large enclosed areas (thermals) or for chaotic ones ("others"). The significant parameters cover various distances from the 2nd neighbour (100 m) to the 18th (900 m), hence demonstrating the ability of the algorithm to distinguish structures with different sizes. It is noteworthy that the curve parameters play a more significant role in the classification of the structures in comparison to time, mean wind field 265 and cosine fit RMSE. The detailed results of the cross-validation of QDA classification for the algorithm with five predictors are displayed in Table 5. The algorithm performs the most precise classification for the streaks with a classification error of 270 only 3.3% as one case was misclassified as rolls. Regarding the category "others", the results are equivalently accurate with a classification error of 3.3% as two cases were misclassified as thermals. Moreover, the performance of the algorithm for rolls was good with a classification error of 10% with 3 cases were misclassified as thermals. Thermals were the most troublesome type for classification by the algorithm, the algorithm classified correctly 24 cases. Four cases were misclassified as rolls and 2 cases as "others" showing a classification error of 20%. 275 https://doi.org/10.5194/amt-2020-82 Preprint. Discussion started: 30 April 2020 c Author(s) 2020. CC BY 4.0 License. Table 5: Confusion matrix calculated for the training dataset. The "target class" corresponds to the eye-made classification while the "output class" corresponds to the class attributed by the algorithm. Therefore, the cells in the "roll" column, for instance, give the number of roll cases that were classified properly (roll line) or improperly (other lines) in the different categories.

285
The algorithm classifies 54% of the two-month dataset as containing turbulent structures and 34% in particular as coherent turbulent structures (streaks, rolls). The most frequent cases of turbulent structures were streaks (25%) and the least frequent were rolls (9%). It is important to note that, in our classification, we consider only thermals and rolls during daytime. Figure   9 illustrates the number of occurrences for each type of structure at a particular time of the day during the two months of the campaign. It is evident that despite time was a much less significant classifier compared to the curves parameters, the structures 290 show a time preference. There were scarcely any rolls cases observed at night, though a few unaligned thermals were classified at night. This stems from the training process, where some cases of thermal were improperly classified as "other" and the reverse. On the contrary "others" cases were mostly observed during the night. This was expected since the cases of low winds https://doi.org/10.5194/amt-2020-82 Preprint. Discussion started: 30 April 2020 c Author(s) 2020. CC BY 4.0 License.
with no defined direction -when the VAD method cannot be applied-occur mainly during the night. We also see that streaks were observed more frequently during the night, when mechanical turbulence becomes dominant. This was also expected as 295 the nocturnal low level jets is a main driving factor for the formation of streaks.

Conclusions
The current study showcases that it is possible to identify and classify turbulent structures such as streaks, rolls and 300 unaligned thermals with horizontal scans from a single Doppler lidar. More particularly a two-months campaign in Paris (VEGILOT) provided radial wind speed observation for 4577 horizontal scans with a 1° elevation. The VAD method was used to retrieve the radial turbulent wind field. By analysing patterns of the radial turbulent wind field, it was evident that the elongated patterns (streaks, rolls) showed a prominent peak for the curves of the texture analysis parameters compared to more chaotic or enclosed patterns (unaligned thermals).

305
In order to apply the supervised machine learning methodology for the classification of the ensemble, a training ensemble of 150 cases was selected by combining visual examination of the patterns and studying characteristic physical properties corresponding to streaks, rolls and unaligned thermals. For this purpose, the QDA algorithm with stepwise forward selection of the features was applied and its performance was estimated using the cross-validation technique.
The algorithm allowed classifying correctly about 91% of the dataset using five texture analysis parameters as predictors.

310
The algorithm performed best for the category of turbulent streaks with a classification error of only 3.3%. Time, mean wind speed and the cosine fit RMSE of the VAD method were not selected by the algorithm for the classification. The whole ensemble of the 4577 scans was classified by the trained QDA algorithm using the five selected texture analysis parameters.
The results showed that 54% of cases were classified as structures among which 34% were coherent turbulent structures (streaks, rolls). The streaks were mostly observed during night whereas the thermals and rolls were almost exclusively observed 315 during the day, with only a few cases classified between sunset and sunrise. The classified ensemble can be used for statistical studies of the turbulent structures physical parameters, such as structure size as a function of weather conditions (PBL height, temperature, wind speed, radiation etc.). Moreover, the development of the structures can be analysed and comprehended.

Data availability
All lidar data used in the study are property of the Laboratoire de Physico-Chimie de l'Atmosphere (LPCA), Dunkirk,