Detection of supercooled liquid water clouds with ceilometers: Development and evaluation of deterministic and data-driven retrievals

. Cloud and aerosol lidars measuring backscatter and depolarization ratio are most suitable instruments to detect cloud phase (liquid, ice, or mixed phase). However, such instruments are not widely deployed as part of operational networks. In this study, we propose a new algorithm to detect supercooled liquid water clouds based solely on ceilometers measuring only 15 co-polarisation backscatter. We utilise observations collected at Davis, Antarctica, where low-level, mixed phase clouds, including supercooled liquid water (SLW) droplets and ice crystals remain poorly understood, due to the paucity of ground-based observations. A 3-month set of observations were collected during the austral summer of November 2018 – February 2019, with a variety of instruments including a depolarization lidar and a W-Band cloud radar which were used to build a 2-dimensional cloud phase mask distinguishing SLW and mixed phase clouds. This cloud phase mask is used as the reference 20 to develop a new algorithm based on the observations of a single polarisation ceilometer operating in the vicinity for the same period. Deterministic and data-driven retrieval approaches were evaluated: an extreme gradient boosting (XGBoost) framework ingesting backscatter average characteristics was the most effective method at reproducing the classification obtained with the combined radar-lidar approach with an accuracy as high as 0.91. This study provides a new SLW retrieval approach based solely on ceilometer data and highlights the considerable benefits of these instruments to provide intelligence 25 on cloud phase in polar regions that usually suffer from a paucity of observations. the cloud base and okta have been derived. In this study, raw data was collected with the University of Canterbury Vaisala CL51, e.g. , the full backscattered profile with a range of 15 km, a vertical resolution of 10 m and a time resolution of 15 s. These observations covered the PLATO period (November 2018 February 2019) for which the depolarization lidar and the W-Band radar operated and extended till October 2019. The data was pre-processed using a dedicated software developed by Kuma et al. (2021), namely the Automatic Lidar and Ceilometer Framework (ALCF). This software allows the processing of raw data from a variety of lidars and serves as a platform for comparing observations and models. Here, we used the version 1.1 of the software for processing the raw data generated by the Vaisala CL51 to: (1) daily netCDF files from the hourly Vaisala file format; (2) remove noise by applying a noise removal algorithm and subsampling the data to 5 min, 50 bins; (3) calibrate the attenuated backscatter using the approach of Hopkin et al. The


Introduction
Mixed-phase clouds play a critical role in the earth radiation budget, through their complex interactions with incoming and outgoing shortwave and longwave radiation. This effect is particularly important at higher latitudes with variation in radiation resolved profiles of cloud phase, which broadly followed the algorithms developed by Alexander and Protat (2018) and Noh et al. (2019). The raw 355nm lidar backscatter profiles were first processed to remove background noise and correct for beam overlap. We calibrated the lidar following the method of O'Connor et al. (2004), who demonstrated that the lidar ratio is constant within optically thick liquid non-precipitating stratocumulus clouds. We scaled the raw signal until the observed lidar ratio matched the theoretical lidar ratio within these clouds. Then, the calibration values obtained during stratocumulus for the limited number of optically thick clouds present above Davis were used to calibrate the three months of data collected during 125 the summer.
Following calibration, we used a speckle removal technique to flag spurious noise which is ubiquitous at high altitudes in both the parallel and perpendicular channels. We followed the method of Alexander et al. (2021) who used a first pass of the algorithm to extract bright clouds (with large vertical gradient in backscatter) in the co-polarised channel, and then assigned cloud phase based upon the layer-averaged backscatter and depolarisation (Hu et al., 2010). We isolated additional 130 hydrometeors and aerosols based on pixels which had depolarisation ratios exceeding molecular backscatter and variances within empirically determined thresholds. A region-of-interest analysis to extract conjoined regions removed any spurious pixels initially flagged as hydrometeors or aerosols. The result of these steps was a much greater detection of ice virga than only using the parallel backscatter because thin ice virga has large depolarisation ratios, making them readily detectable in the perpendicular channel. It also allowed attribution of liquid precipitation reaching the surface, because this second stage of the 135 algorithm didn't require vertical gradients of backscatter to determine the presence of hydrometeors.

Vaisala CL51 ceilometer observations
In operational settings, ceilometers usually report cloud base heights and oktas (percentage of cloud cover over a given area) without providing information on cloud phase. However, the instruments operating in the near infrared spectrum record the full backscattered profile from which the cloud base and okta have been derived. In this study, raw data was collected with the final pre-processed products were daily netCDF files including the total attenuated volume backscattering coefficient (ß, m -1 sr -1 ) at a resolution of 5 min and bin vertical resolution of 50 m.

ECMWF ERA5
The latest-generation reanalysis product ERA5 from the European Centre for Medium-Range Weather Forecasts (ECMWF) was used in this study (Hersbach et al., 2020). The ERA5 hourly data on pressure levels were extracted via the Copernicus portal (https://cds.climate.copernicus.eu) as monthly netCDF files containing the geopotential, potential vorticity (pv, K m 2 kg -1 s -1 ), relative humidity (r, %), air temperature (t, K), the specific cloud ice water content (ciwc, kg kg -1 ), the specific cloud 155 liquid water content (clwc, kg kg -1 ), the specific rain water content (crwc, kg kg -1 ), the specific snow water content (cswc, kg kg -1 ), the horizontal components of the wind speed (u and v, m s -1 ) and the vertical velocity (w, Pa s -1 ). During the YOPP, enhanced observations were conducted including four radio soundings per day at Davis, instead of two during normal periods.
The YOPP covered approximately the period with a concomitant operation of the W-Band radar and depolarisation lidar.

160
The nearest ERA5 grid point (located at 68.5 ºS, 78.0 ºE) to the location of the ceilometer, W-Band radar and depolarisation lidar was used as the centre of 9 neighbouring grid points forming a square. All the extracted variable fields were averaged to reduce potential spatial variability effects and reduce noise. A sensitivity to this averaging approach was also performed using only the central grid point and the averaging effect on the temperature and humidity fields was considered negligible. The vertical pressure level fields were linearly interpolated to the 50 m vertical resolution grid of the ceilometer, and the hourly 165 variables were linearly interpolated to 5 min to match the ceilometer time resolution.

Radar-lidar merged cloud phase mask
This cloud phase product is obtained by merging information obtained from the W-Band radar and the depolarisation lidar.

170
The principle is the same as the approach from Delanoë and Hogan (2010) with satellite-based sensors, which combined observations from CloudSat and CALIPSO, taking advantages of the different sensitivities of the radar and the lidar. The underlying principle for SLW versus mixed-phase classification of a grid point is that the W-Band radar is not sensitive enough to detect very small, supercooled liquid water droplets. As a result, when a value of reflectivity is measured for the grid point labelled as SLW by the lidar, it implies that there must be ice particles in the volume generating a backscattered radar signal, 175 mixed with SLW droplets as detected by the lidar. In this paper, the pipeline to produce the cloud phase mask was based on the procedure described in Noh et al. (2019), with some modifications.
In the first stage, the lidar and radar are re-gridded to the same temporal and vertical resolution, to create a new merged grid at 15-m vertical resolution, and 1-min temporal resolution. ERA5 reanalysis data (hourly on pressure levels) were extracted for the closest grid point to Davis and linearly interpolated in time and space onto the merged radar-lidar grid.

180
The second stage is to incorporate the cloud phase product from the RMAN lidar, as described in detail in Section 2.1.1 above.
In the third stage, the original lidar-only cloud phase labelling is refined utilising the cloud radar reflectivity field. If there is no measurement of reflectivity above the noise level for the grid point, we assume that there is no ice. In that case, points labelled "SLW" in the lidar-only cloud phase classification remain labelled as "SLW". Conversely, the presence of an observed radar reflectivity implies that there is ice in the volume as well, which triggers a new classification of the grid point as "Mixed-185 phase". Finally, the last stage consists in utilising the radar reflectivity to identify signals at subfreezing temperatures, while the lidar backscatter is fully attenuated by lower clouds and doesn't provide any information on the cloud phase. This case triggers the grid point to be labelled "Unknown" as there is no possibility to distinguish Ice particles-only from a mixed-phase, although there is certainty that these grid points are not containing only SLW (Noh et al., 2019). "Unknown" could therefore be interpreted as "Ice or mixed-phase" if needed.

Ceilometer cloud phase mask based on T19
The first cloud phase mask presented herein is based solely on ceilometer observations, following the work from T19. Liquid water droplets generate very high values of the ceilometer backscatter signal, and subsequent strong attenuation in the vertical 195 profile above the altitude of liquid water. T19 proposed a modification from the Cloudnet approach (Illingworth et al., 2007), utilising the shape of the attenuated backscatter profile, instead of using a single threshold value. The input to the technique is the pre-processed ceilometer dataset, e.g., the 50 m gate resolution, 5 min calibrated attenuated backscatter processed with ALCF as explained previously.

200
The exact approach proposed by T19 was implemented herein: the maximum of a localised peak value in the vertical profile of the backscatter is found, instead of the first value above a given threshold as in Cloudnet. However, the maximum of the peak value needs to exceed the same threshold value as in Cloudnet, namely the pivot ß value of ß = 2 x 10 -5 m -1 sr -1 , together with a maximum peak width at half height set at 150 m. The combination of these two criteria allows the identification of a rapidly attenuating signal, which is typical of liquid water layers. In the case of T19, this method of identification enabled 205 capture of the base of precipitating clouds, but the authors also noted its potential application for in-cloud icing detection. The authors also showed the possibility of identifying multiple peaks within the same profile with this method. However, they did not specify how to label the 50 m bins. We decided that based on the above, the altitude bin corresponding to the location of the peak (if found) was labelled as liquid water. Further to this, a reclassification was done to distinguish supercooled liquid https://doi.org/10.5194/amt-2022-10 Preprint. water from other liquid water based on the interpolated re-analyses temperature fields: if the temperature T was between 0 ℃ 210 and above -38 ℃, the grid points were classified as supercooled liquid water, otherwise they were classified as liquid water.
In addition to liquid water, precipitation and ice clouds were identified following the same approach as T19, by selecting grid points with values of backscatter above ß = 3 x 10 -6 m -1 sr -1 with a thickness of at least 350 m, e.g. 7 consecutive grid points satisfying this criteria, therefore showing no attenuation within at least 350 m. The base of these clouds was accordingly the 215 lowest grid point of the points within the profile satisfying these criteria. As noted by T19, liquid layers can be identified within precipitation and ice clouds as defined utilising our algorithm.
Fog is a phenomenon that probably occurs relatively frequently in the Southern Ocean and some regions of Antarctica (Lazzara, 2008), although few studies are available in the literature to accurately quantify its occurrence (Kuma et al., 2020). The same 220 method as T19 was again used here, detecting fog layers by identifying values of backscatter above ß = 10 -5 m -1 sr -1 for the lowest grid point (corresponding to 0-50 m above the surface) and a ß value 250 m above the instrument of ß < 3 x 10 -7 m -1 sr -Based on detailed observations of the cloud phase mask from T19 for days with substantial amounts of clouds, a large amount of speckle in the retrieved SLW phase was observed, corresponding to timesteps for which the radar-lidar cloud mask did not observe any SLW. This led us to investigate if an alternative algorithm could perform better for these conditions. Importantly, the concomitant high-resolution and robust observations of cloud phase using the combination of radar and lidar provided us with a reference that could be used to develop and validate our new algorithm.

230
This new algorithm relies on an initial signal analysis of each attenuated backscattered profile, as in T19, but also makes use of the statistical properties of the full dataset. It is based on a data-driven model including a learning and testing phase using the reference radar-lidar cloud mask.

235
< Figure 2 here > The first step is to build a dataset and design, train and test a supervised model. This first step is summarised in the flowchart in Figure 2. First, we detected all peaks in the dataset that had at least a width of 50 m, and a peak value of ß = 2 x 10 -5 m -1 sr -features were attributed to characterise peak properties: the value of the backscatter at peak location, the peak width, the value of the backscatter at peak width, the peak prominence (e.g. the difference between the peak value and the surrounding baseline), the peak altitude above ground level, the total number of detected peaks for a given profile and if several peaks, the order of the peak within that total number, with the lowest peak taking the number '0'. In addition to this, the peak temperature was also extracted using interpolated ERA5 fields. This 8-feature dataset of peak properties was then labelled for each row with a Boolean attribute based on the detection of either supercooled liquid water or mixed phase by the radar-lidar mask for that timestep (True: detection of SLW/Mixed Phase, False: no detection of SLW/Mixed Phase). We also accounted for the problem caused by signal extinction in multi-layer SLW situations: peak properties of a single peak corresponding to SLW with no extinction other than molecular in the lower levels, cannot be directly compared in terms of backscatter values to peak properties of a presumed SLW peak at higher altitude for which the signal would have undergone substantial extinction by 250 clouds or the presence of SLW of mixed phase below. The properties of a peak that would have undergone extinction will therefore see lower values of the value of the attenuated backscatter at peak location. To account for this extinction effect, we compared the value of the backscatter at peak location for single peaks, and for multiple peaks that would have experienced extinction (peaks with a peak order > 0). For single peaks, SLW data-only were selected based on the Boolean condition defined using the radar-lidar cloud mask. For multiple peaks, an arbitrary set of conditions must be defined to extract only 255 potential SLW peaks from the multiple peaks. These conditions were based on the observed statistical distribution of peak properties and were empirically set as: the width of the peak must be < 4, the peak width height must be > 40 x 10 -6 m -1 sr -1 , and the peak prominence must be > 60 x 10 -6 m -1 sr -1 . The distributions of the two sets of data (single peaks and multiple peaks) and their Kernel Density Estimates (KDEs) are shown in Figure 3. One would expect the same effect for multiple peaks, but the values at peak would be smaller due to varying degrees of extinction of the backscatter signal at lower levels. The difference between the median value of the single peak distribution 265 and the multiple peak distribution can be calculated and is equal to 4.20 x 10 -5 m -1 sr -1 . Our hypothesis was that for such a large amount of data in both cases (single and multiple peaks), this difference, or offset, was the value of the average extinction due to the presence of various phases in the cloud on the lower altitude that affects the potential SLW peaks. Adding this offset to the value of the backscatter at peak location for all datapoints of the multiple peaks' distribution would therefore "adjust" their peak values. This is shown as well in Figure 3; with the added offset, the adjusted distribution of multiple peaks covers roughly adjustment of the peak value for some of the peaks, the multiple peaks for which the timesteps were labelled "true" which did not meet the arbitrary criteria also had their labelling changed from "True" to "False". Following the above, a new "adjusted" peak properties dataset can be used for further analysis.

275
The next step of our algorithm development was to design, train and test a data-driven model that could predict the label of each of the peaks. A relatively novel tree-based ensemble method was proposed by Chen and Guestrin (2016), e.g. extreme gradient boosting or XGBoost, which is an improved version of gradient boosting with the advantages of reducing overfitting and computational costs. The excellent performances of this method for a wide range of applications, consistently 280 outperforming other methods such as Support Vector Machines or Random Forest led us to select this approach here. The principle of this algorithm relies on a "boosting" strategy, where predictions of "weak" learners are combined to produce a "strong" learner by utilising additive training strategies. The computational cost is reduced by allowing parallel computations during the training phase . Here, we only cover the fundamental principles of the additive learning, and the reader should refer to Chen and Guestrin (2015) for more details. The first learner was initially fitted to all input data; a second 285 model was then fitted to the residuals to reduce the disadvantage of the "weak" learner. This process of fitting was repeated several times until the model satisfied a predefined criterion. The prediction of the model for a given set of hyperparameters was obtained by combining the predictions of each learner. The function that describes the prediction at each step t can be written as eq. (1): where # ( ! ) is the learner at step t, ! (#&') and ! (#) are the predictions at steps t-1 and t, and ! is the input variable.
The extreme gradient boosting model uses the below expression to evaluate the model performance, eq. (2): where l is the loss function, n is the number of observations and Ω is the regularisation term defined as eq. (3): where is the regularisation parameter, is the minimum loss needed to partition the leaf node and is the vector of the 300 scores in the model leaves.
Features were passed to the model for training using a 3 Stratified k-folds cross validation. Stratified K-fold is a variation of k-fold (Ojala and Garriga, 2010)  Finally, once our model has been trained and tested on the data, the third step was to apply the algorithm including the trained model subsequently to each profile. This approach is summarised in the flowchart in Figure 4.

315
< Figure 4 here > The algorithm treated each vertical profile of attenuated backscatter sequentially: in the first step, peaks were detected, and their associated properties computed. If no peaks were detected, the timestep corresponding to that backscatter profile was 320 labelled as "not SLW". If several peaks were detected, a sequential pipeline as seen in Fig. 4 was implemented to correct potential SLW peaks for extinction using the statistical properties obtained in the pre-processing stage. For that given timestep, one or more peaks could be identified together with their properties. These peak features were passed to the previously trained XGBoost model for labelling, either "SLW" or "not SLW". If a given peak was labelled as "SLW", the corresponding bin at peak altitude was labelled 'SLW", as well as the surrounding bins, with the SLW lower and upper boundaries defined as twice 325 the peak width (at mid height) value. With this algorithm, several layers of SLW can be identified within a single profile, and SLW layers can be identified within or outside a cloud.

Strategies for intercomparison of cloud masks
As mentioned previously, the resolution of the merged radar-lidar mask was 1-min and 15-m, while the resolution of the ceilometer cloud mask was coarser (5-min, 50-m). For both masks, linearly interpolated hourly ERA5 variables fields have 330 been used. In order to compare both masks various strategies regarding resolutions can be considered.
For the masks intercomparison, the coarser resolution of the ceilometer mask was used, and the radar-lidar mask was subsampled to 5-min timestamps. Since spatial variabilities could occur at the finer vertical resolution of the W-Band, depolarisation lidar and ceilometer, grid to grid comparison of the masks was not considered suitable and relevant given the 335 objectives of this study. Instead, a comparison timestep to timestep was performed, integrating the information available over each vertical column. Then, two different strategies were used to subsample the merged radar-lidar mask: (i) the matching timestamps of both masks were found, and if SLW or mixed-phase was identified in one bin of the merged radar-lidar mask, that timestep was labelled as positive, otherwise it was labelled as negative. Similarly, if SLW was identified in one of the bins of the vertical column for the ceilometer mask, that timestamp was labelled as positive; (ii) a condition on the spatio-temporal 340 structure of SLW and mixed-phase bins was considered over 5-min periods: for the timestamp to be labelled as positive, a given number of consecutive bins labelled SLW or mixed-phase need to be found at the same height. Threshold values for consecutive bins were set at 3, 4 and 5 and this criterion was applied as a 5-min moving window on the merged radar-lidar cloud mask to produce three subsets of data corresponding to this second strategy. Labelling of SLW for the ceilometer cloud mask was performed similarly to the first strategy.
The recall is the ratio of true positives over the number of true positives and false negatives, e.g. the ability of the classification 365 to find all the positive samples and is defined as eq. (5): The f1 score (or hereafter also described as "accuracy") is defined as the arithmetic average of the precision and the recall, with its best value at 1 and its worst score at 0, with equal contribution of recall and precision (eq. 6): Note that in the current case of binary classification, positive and negative labelling can be inverted so that f1 scores can be calculated both for positive and negative cases. The accuracy is then calculated as the harmonic mean of the positive and negative f1 scores. The balanced accuracy on the other hand is defined as the arithmetic mean of the positive and negative 375 recall.

Results
3.1 Ceilometer backscatter profile analysis: the 6 th of January 2019 case study 380 < Figure 5 here > To illustrate the disparity between the various cloud phase retrievals, observations from the 6 th of January 2019 were selected as an example, as these included both low and higher clouds, with SLW present both within deep clouds or isolated from deep clouds. In figure 5, one can see the calibrated attenuated backscatter from the ceilometer (Fig. 5a), the reference cloud phase mask combining radar and lidar (Fig. 5b), and the cloud phase attribution from two retrievals following T19 (Fig. 5c), and the trained XGBoost model described in this work (Fig. 5d). The XGBoost algorithm was trained on a different set of the data, which excluded the 6 th of January 2019. During the first part of the day, until around 11:30 UTC, the radar-lidar cloud phase mask shows little occurrence of SLW or mixed phase, except at the very beginning of the day (the first 10 min) and around 2:00 UTC. In the second part of the day, clear horizontal bands of SLW can be observed, including times with clouds or precipitation below the SLW bands. The ceilometer backscatter (Fig. 5a) showed distinct signal patterns for the first and second part of the day, but visually it remains difficult to clearly distinguish strong backscatter during the first part of the day, that could indicate the presence of SLW.

395
< Figure 6 here > Figure 6 shows five selected vertical profiles of attenuated backscatter (Fig. 5a) for that day, chosen to illustrate the diversity 400 in attenuated backscattering signal profiles. It also shows the theoretical molecular backscatter at the wavelength of the ceilometer, the identified peaks, as well as their average properties. For the profiles A, D and E, SLW or mixed phase were identified in the reference radar-lidar mask. These three backscatter profiles presented common characteristics, such as a narrow peak (low value of the peak width at mid-height), relatively high values of the attenuated backscatter (above 10 -4 m -1 sr -1 ), and high prominences (high values of the difference between peak value and the surrounding baseline). Conversely, peaks 405 B and C present much wider peaks (higher values of peak width at mid-height), and smaller values of the prominence and were classified as ice.
< Figure 7 here > 410 In Figure 7, the average peak properties such as peak value, peak width, peak width height, the number of peaks per profile, peak altitude for each of the attenuated backscatter profiles for which peak had been identified for the 6 th of January 2019 are presented as scatter plots with peak β value as the y-axis. We chose to use peak value as the y-axis as this is the most discriminant peak characteristic, and by analysing scatter plots, we can visually observe clustering patterns. In Figure 7, a 415 single ß profile can generate multiple datapoints if several peaks were observed for that profile. The points A, D and E were well clustered within the same region in Fig. 7a and Fig. 7b, corresponding to high values of ß at peak, low values of the peak width and peak width height. Peaks A and E also appeared in the same region in Fig. 7c. Generally, SLW was observed when a single peak was observed for the ß profile. This was not always the case, as sometimes, SLW can be observed when several peaks are present: for instance, on the 6 th of January, the isolines showed that several instances of SLW were observed when 420 two peaks were present, but this dropped dramatically for three or four peaks. For that specific day, the scatter plot with peak altitude (Fig. 7d) shows that SLW is more frequent at higher altitudes with three clusters around 1000 m, 2000 m and 3500 m AGL, corresponding to the spatial organisation of SLW (Fig. 5b), while non-SLW peaks are far more frequent at lower altitudes (cluster located between 0 and 1000 m AGL). These non-SLW peaks are associated to low-level clouds as no fog was identified over that three-month period.

425
The data shown in Figures 5, 6, 7 demonstrate that peak average characteristics exhibit very specific features that could be used to detect the occurrence of SLW. The original approach from T19 was already skilful in identifying SLW. For the second half of the day on January 6 th , 2019, there was a very good match between the cloud mask based on T19 (Fig. 5c) and the reference cloud mask (Fig. 5b). Conversely, for the first half of the day, the T19 approach identified multiple SLW regions 430 within the cloud producing a speckle pattern of SLW. This was not observed by the radar-lidar observations and algorithm and was thus probably wrongly labelled by the T19 approach. Our new approach based on the use of average peak characteristics and a dedicated trained algorithm using the radar-lidar reference showed a great improvement in the retrievals: the second half of the day remained like the retrieval of T19, although labelling thicker bands of SLW utilising the peak width property. These thicker horizontal bands of SLW were more in line with the radar-lidar reference, which showed occurrence of SLW or mixed 435 phase of about the same thickness. For the first half of the day, our approach outperforms the T19 approach, by removing the spurious speckle patterns while keeping the correct detection of SLW at the very beginning of the day, also observed in the radar-lidar cloud mask (profile A in Fig. 6a). The SLW around 2:00 UTC found in the radar-lidar cloud phase mask was also retained by our new technique. An occurrence of SLW at around 5:00 UTC was present in our retrieval but not in the reference radar-lidar cloud mask. This may be due to an error in our retrieval, an error in the phase assignment in the lidar product, or 440 different observations made by the lidar, radar and ceilometer at that timestep. Nonetheless, our retrieval performed very well: computed accuracies for that day (6 th of January 2019) using the radar-lidar mask as the reference were equals to 0.84 for our new data-driven retrieval, compared to an accuracy score of 0.65 with the T19 approach. In the next section, we evaluate the performances of T19, a data-driven threshold approach and our XGBoost retrieval for our full dataset covering almost three months of data during the Southern Hemisphere summer.

Evaluation of retrievals for the PLATO period
In Figure 8, similarly to what was presented in Figure 7, the average peak properties such as peak value, peak width, peak width height, the number of peaks per profile, peak altitude for each of the attenuated backscatter profiles for which peak had been identified are presented as scatter plots with peak β value as the y-axis, but this time for the full dataset, equivalent to 11,327 datapoints. The two-dimensional distributions are like that of Figure 7, showing that the results of the case of January 6 th can be extrapolated to the full dataset. The value of β at peak is directly correlated to the peak width height, making that feature redundant. The value of β at peak is clearly the best feature to separate the SLW-labelled datapoints (True) to the non-SLW datapoints (False). The other features, e.g., peak width, number of peaks and peak altitude are weaker discriminants, but they reveal that non-SLW datapoints (False) are much more widespread than SLW datapoints (True), showing that extreme 455 values of the features are usually associated with the non-occurrence of SLW. Typically, many peaks (>2), or very wide peak widths (> 3) for each profile are associated to non-SLW. The presence of a dense concentration of non-SLW datapoints at lower altitudes (< 1000 m) shows that SLW is usually not observed at these low altitudes above ground level. Based on the clustering as observed in Figure 8 for the full dataset, we decided to implement a second classification using 470 arbitrary thresholds for each of the peak features. This has the advantage of not having to train and test a model and circumnavigates the need for a high-resolution reference dataset (although the arbitrary thresholds are based here on the cluster plots where the labelling has been done using the reference dataset). The arbitrary thresholds that we used to label a timestep as "SLW" were for peak features such as: value of β at peak > 5 x 10 -5 m -1 sr -1 , peak width < 4, peak width height > 40 x 10 -6 m -1 sr -1 , peak prominence value > 60 x 10 -6 m -1 sr -1 and the total number of peaks < 3. We evaluated the arbitrary threshold 475 approach on both the full dataset and the dataset with only identified peaks. For the full dataset, the accuracy score was 0.89, while for the dataset including peaks only, the accuracy was 0.76.
Our extreme gradient boosting model was trained and tested using a 3-fold stratified cross validation approach as described previously. The model was trained and tested for both the full dataset, and the dataset containing only timesteps for which 480 peaks had been identified. For the full dataset, the best testing accuracy score was 0.91 (with learning rate = 0.005, max depth = 12, child weight = 8) for an average training accuracy score of 0.95. For the dataset for which peaks were identified, the total https://doi.org/10.5194/amt-2022-10 Preprint.  proposed an explanation method for tree-based models, building on work based on classic game theoretic Shapley values (Lundberg and Lee, 2017). We implemented this explanation method, also known as "TreeExplainer" (Lundberg et al., 2020) 490 in its original python version. With TreeExplainer, we were able to provide local explanations for each prediction by calculating their Shapley (SHAP) values. The input features were the same as described for the implementation of our XGBoost model, e.g., the average adjusted peak properties. Figure 9 shows the distribution of SHAP values for these 8 input features, together with their normalised value represented as a colour bar. The features were ranked from the most important at the top of the graph (value of backscatter at the peak location) to the least important at the bottom (the peak number, or rank).

495
< Figure 9 here > 500 As expected, the value of ß at the peak location was the most important feature to produce accurate predictions: high values of ß were important to detect the presence of SLW, while low values of ß were skilful in predicting the absence of SLW. The total peak number was the second feature of importance, with low values of the total peak number contributing to better predictions of the presence of SLW. In fact, very well-defined horizontal bands of SLW showed these typical characteristics with a single narrow peak of high values. Peak prominence had a similar SHAP values distribution pattern as peak value, with 505 less defined clusters. This result is consistent with the scatter plot of peak value versus peak prominence in Figure 8b, showing a strong correlation between these two features. Low values of peak width were important in the prediction of the presence of SLW, showing the importance of the peak shape (narrow) in indicating the presence of SLW. Low values of the peak width height tend to also help predict the presence of SLW. Conversely, peak altitude and peak temperature are associated with average SHAP values close to 0, indicating that these two features were not important in the production of accurate predictions.

510
While we saw a tendency of SLW to be located within a preferred range of altitudes (between approximately 1000 and 4000 m ASL), this range was too wide to make altitude of the peak a useful criterion to help identify SLW. Low values of peak altitude however seemed to help identify conditions with the absence of SLW (as indicated by the negative and blue SHAP values for the peak altitude).

Discussion and conclusions
We have demonstrated that the presence of supercooled liquid water can be detected using solely the measured attenuated backscatter signal from a ceilometer, even in the absence of information on the signal depolarisation. We utilised coincident observations from a W-Band radar and a depolarisation lidar to build a reference cloud phase mask that was used as a 520 benchmark to compare to our ceilometer-retrieved supercooled liquid water detections. Utilising the ceilometer data and an existing approach proposed by T19, we obtained an overall accuracy of 0.84 and an accuracy for days with detection of strong backscatter signals of 0.72. We then developed an enhanced method, utilising the benchmark dataset of cloud phase observations to develop, train and test an extreme gradient boosting model.

525
Utilising the ceilometer observations and this model, we increased the overall accuracy of correctly identifying SLW layers to 0.91 and the accuracy for days with a detection of strong backscatter signals to 0.81. This enhanced model greatly improved detection accuracy, for the cases where multiple peaks in the backscatter were observed and were erroneously classified as SLW by the method from T19. The most important input features were the value of the backscattered signal at the peak, followed by the total number of peaks within that profile. In the current approach, we considered each profile (or timestep) 530 independently from one another (although we did consider all points together in the statistical analysis and in the model development). Alternative or complementary to our existing algorithm, data-driven approaches such as Recurrent Neural Networks (LSTM or GRU) could consider the spatio-temporal patterns of peak properties to predict the occurrence and location of SLW at the following timestep.

535
Ceilometers are relatively low-cost ground-based active atmospheric remote sensing tools as compared to Weather radars or depolarization lidars. They are commonly deployed at aerodromes but also at other operational or research atmospheric monitoring facilities. Here, we showed that the raw backscatter signal can be utilised to detect supercooled liquid water, thus broadening the observational capabilities of such instruments, for regions where observations are scarce, like Antarctica. The present work is the first of its kind utilising a benchmark radar-lidar cloud phase mask to train a dedicated model to detect 540 supercooled liquid water from the ceilometer backscatter only. It will be important to test the application of the same approach elsewhere, especially for current monitoring sites or for historical data including the same set of instruments presented here, that is weather radar, depolarisation lidar and ceilometers. Our approach was developed for a polar environment, and it would be important to see how the developed technique transfers to regions at mid or low latitudes.

545
Ground-based observations of supercooled liquid water are complementary to spaceborne observations. Satellite detection of supercooled liquid water suffers from the attenuation of the signal in the lower layers, and from a lower spatial and temporal resolution. The combination of satellite and ground observations has the potential to improve cloud phase products.
Knowledge of the cloud phase including supercooled liquid water at high-resolution can help develop and validate icing algorithms, with the objective of predicting aircraft airframe icing potential (Morcrette et al., 2019) or predicting the potential 550 icing of wind turbines for wind production (Hamalainen et al., 2020). Detection capabilities developed in this paper will enable important studies to examine the seasonal variability of the occurrence of SLW and to develop aircraft icing potential nowcasting capabilities.

Code and data availability
The ALCF is open-source and available at https://alcf-lidar.

Author contribution
AG performed the ceilometer code development and overall data analysis. AP and SA analysed the radar and depolarisation lidar data and produced the radar-lidar cloud mask. AP, SA, and AK provided regular scientific inputs on the analysis. AP,

565
SA, AK, and AMD designed the experimental setup at Davis to gather ground-based observations. PK developed the code to pre-process the ceilometer data and provided scientific inputs on the ceilometer data analysis. AG prepared the manuscript with contributions from all co-authors.

570
The authors declare that they have no conflict of interest.
Deployment of the instrumentation to Davis, and the contribution of S.P. Alexander to this project, was supported by the       converted to time by multiplying by 5 min. Peak are identified with a red dot, peak widths with a vertical orange line and peak prominences with a horizontal green line. The theoretical molecular backscattering was computed following the equation as found in Kuma et al. (2021) and is shown on each subpanel as a solid grey line.  values are shown with a blue to pink gradient as indicated by the right-hand side colour bar.