Cloud fields and aerosol classification with lidar using advanced AI approach

Peleg, Yonatan; Zeida-Cohen, Lior; Tzror, Imri; Bühl, Johannes; Ansmann, Albert; Chudnovsky, Alexandra; Yakhini, Zohar

doi:10.5194/amt-19-4415-2026

Articles | Volume 19, issue 13

Research article

03 Jul 2026

Abstract

Understanding the vertical distribution of aerosol and clouds is critical for climate modeling, weather forecasting, and air quality monitoring. Lidar observations are central to profiling atmospheric composition, yet signal attenuation in optically thick layers limits the effective retrieval of some important properties above those layers. More complex measurement approaches, which combine lidar and cloud radar systems, support more inclusive and accurate inference but raise the cost of data acquisition. In this study, we develop a deep learning framework that addresses this trade-off by enabling full-column aerosol and cloud classification using only standard lidar inputs. The framework achieves particularly high skill for aerosol typing and demonstrates robust, physically consistent classification of ice-cloud fields even under conditions of strong lidar signal attenuation, with liquid-cloud uncertainties primarily arising from closely related microphysical classes. The approach is based on a U-Net architecture trained to predict combined aerosol and cloud types from vertical profiles of backscatter and depolarization. Classification targets integrate established aerosol typing from PollyXT with cloud and precipitation categorization from Cloudnet, facilitating a unified scheme. The model achieves high precision, recall, and F1-scores above 95 %. By evaluating numerous complex case studies, we establish the model's ability to exploit information embedded in the lidar signal below attenuating layers, including structural and contextual features, to infer atmospheric conditions at higher altitudes. This offers a robust AI-based enhancement to lidar-based atmospheric profiling and target classification. The application of AI in this context closes the gap between the need for vertical cloud maps and the sparse availability of Cloudnet.

Received: 26 Oct 2025 – Discussion started: 26 Jan 2026 – Revised: 08 Jun 2026 – Accepted: 21 Jun 2026 – Published: 03 Jul 2026

1 Introduction

Aerosols and clouds are fundamental components of the Earth's atmosphere, exerting profound influences on the planet's climate system. They modulate radiative energy balance through scattering and absorption of solar and terrestrial radiation, significantly impacting the hydrological cycle through precipitation processes (Ramanathan et al., 2001; Albrecht, 1989). They also play a role in atmospheric chemistry by interacting with gases and providing surfaces for chemical reactions to occur (Intergovernmental Panel on Climate Change (IPCC), 2014; Jacob, 2000; Lelieveld and Crutzen, 1991). Clouds, in particular, have large but opposing effects on short-wave and long-wave radiation, resulting in a significant net cooling effect globally, although the magnitude remains uncertain (Hartmann and Doelling, 1991). Aerosol-cloud interactions (ACI) represent one of the largest uncertainties in current climate projections (Intergovernmental Panel on Climate Change (IPCC), 2014; Rosenfeld et al., 2014). Understanding the vertical distribution and properties of both aerosol and clouds is therefore critical to accurately understand and quantify their climatic impacts and to improve the representation of atmospheric processes in weather prediction and climate models (Winker et al., 2010; Weitkamp, 2005; Rogozovsky et al., 2023).

Significant progress has been made in the development of remote sensing techniques capable of profiling atmospheric constituents. Lidar has emerged as a powerful tool for providing detailed vertical profiles of aerosol particles and clouds with high spatial and temporal resolution (Cairo et al., 2024). Advanced lidar systems, such as multiwavelength Raman and polarization lidars, can retrieve not only the vertical distribution of aerosol backscatter but also intensive optical properties. These properties include the lidar ratio (extinction-to-backscatter ratio) and the particle linear depolarization ratio, which provide crucial information about particle size, shape, and absorption characteristics, enabling the classification of different aerosol types and their different vertical distribution (e.g. dust, smoke, marine, urban haze) (Rogozovsky et al., 2025, 2026; Baars et al., 2017). Networks like PollyNET, using standardized and automated PollyXT (POrtabLe Lidar sYstem with eXTended capabilities) (Engelmann et al., 2016) lidars, demonstrate the capability for continuous, near-real-time monitoring and characterization of aerosol profiles in diverse global locations (Baars et al., 2018, 2017).

Lidar is highly effective for detecting aerosols and thin cloud layers, yet it is constrained by signal attenuation (Winker et al., 2010; Weitkamp, 2005). The laser beam can be strongly scattered and absorbed by dense atmospheric constituents, particularly liquid water droplets (Haarig et al., 2023). In optically thick clouds, especially those containing liquid water, the lidar signal is usually fully attenuated within a few hundred meters above the cloud base, typically at optical depths (τ) around 3–5 (Winker et al., 2017). This attenuation prevents the lidar from probing the full vertical extent of the cloud, hindering the characterization of the height of the cloud top, the internal structure, and the thermodynamic phase (Kalesse-Los et al., 2022). This physical limitation is the primary reason why synergistic approaches like Cloudnet rely on cloud radar, which can easily penetrate multiple cloud layers, to provide information above the lidar attenuation height (Bühl et al., 2017).

Cloudnet integrates measurements from a lidar ceilometer, a cloud radar (typically millimeter-wavelength), a microwave radiometer (for integrated liquid water path), and thermodynamic profiles (temperature, humidity) from numerical weather prediction models. Its synergistic infrastructure is essential for retrieving continuous cloud microphysical properties, continuously evaluating numerical weather prediction and climate models, and advancing our understanding of aerosol-cloud interactions. It also provides a detailed target classification, distinguishing between clear sky, aerosol, various cloud phases (liquid droplets, supercooled liquid, ice), precipitation types (drizzle, rain, snow), and even non-meteorological targets like insects (Illingworth et al., 2007). Because full Cloudnet setups are complex and costly to operate and maintain, there is a need for new scalable, data-driven methods that can extract comparable cloud information from single-instrument observations such as lidar. While techniques exist to determine cloud base from lidar (Pal et al., 1992), accurately classifying the entire cloud column using only lidar data remains a significant challenge, particularly for multi-layer, mixed-phase, or deep convective cloud systems (Kalesse-Los et al., 2022).

The rapid advancement of deep learning, particularly through architectures such as Convolutional Neural Networks (CNNs) and the U-Net, has introduced innovative approaches for analyzing complex datasets. These methods have achieved outstanding performance in pattern recognition and image segmentation across a wide range of disciplines (Krizhevsky et al., 2017; LeCun et al., 2015; Ronneberger et al., 2015; Reichstein et al., 2019). In atmospheric science, deep learning has already demonstrated significant potential, for example in cloud phase classification from radar Doppler spectra (Schimmel et al., 2022), cloud detection and segmentation from satellite imagery, and the analysis of lidar point clouds (Biasutti et al., 2019). A particularly compelling application lies in addressing the long-standing problem of signal attenuation in lidar observations. Conventional retrieval methods are limited once the backscatter signal is extinguished, leaving the atmospheric structure above the attenuation height poorly constrained. Deep learning offers a new perspective: by leveraging the rich set of features, correlations, and contextual cues embedded in the portion of the lidar profile below the attenuation threshold, and potentially its spatial and temporal evolution (Bansal et al., 2022), it infers the atmospheric properties above. A U-Net model, with its strong capacity for hierarchical feature extraction and pattern recognition, is especially well suited to capture these subtle and non-intuitive relationships.

Machine learning has been increasingly applied to the detection and classification of aerosols and clouds from lidar observations. In most approaches, lidar measurements, commonly represented as time–height or along-track–height cross-sections, are treated as two-dimensional images. This formulation allows CNNs and U-Net architectures to learn spatial textures, morphological features, and contextual patterns directly from the data. For example, a CNN was developed for cloud–aerosol discrimination using only lidar measurements from NASA's Ice, Cloud, and Land Elevation Satellite (ICESat-2) (Oladipo et al., 2024). Similarly, a U-Net model enhanced with self-attention mechanisms was constructed to classify cloud and aerosol layers in atmospheric vertical profiles using CALIPSO L1 data (Zhou et al., 2024). Both studies demonstrated the capacity of deep learning to reliably separate aerosols from clouds. However, their focus remained limited to binary cloud–aerosol discrimination, without further subdivision into specific categories. Another recent study introduced a multitask machine learning framework for space-based lidar, capable of simultaneous cloud-aerosol discrimination and aerosol typing (Fuller et al., 2025). While their approach successfully improves the spatial resolution of retrievals compared to standard products, their model is trained on lidar-derived optical products and is therefore strictly bound by the physical signal limitations of the lidar instrument itself. Consequently, the model cannot infer or characterize atmospheric structures in regions where the lidar signal is fully attenuated.

Another direction of research has targeted aerosol sub-classification using lidar data. For example, one study applied traditional algorithms to first detect atmospheric layers and compute their integrated optical properties; the derived feature vectors were then classified by a standard artificial neural network (ANN) (Nicolae et al., 2018). A more recent comparison of six machine learning models for aerosol typing identified LightGBM as the most effective (del Águila et al., 2025). While these efforts highlight the promise of machine learning for aerosol categorization, they do not extend to the joint classification of cloud and aerosol subtypes. Importantly, distinct aerosol and cloud categories often exhibit complex cross-category and cross-type interactions. Capturing these interactions requires integrated datasets that explicitly combine both aerosol and cloud categories as classification targets.

This paper introduces a deep learning methodology aimed at achieving unified aerosol and cloud classification throughout the vertical atmospheric column using only standard lidar measurements as input. The approach utilizes a U-Net architecture trained end-to-end to map vertical profiles of lidar backscatter and depolarization to a combined target classification derived from PollyXT aerosol typing and Cloudnet categorization. Crucially, while elastic lidar observations are fundamentally limited by complete signal attenuation in optically thick clouds, our architecture leverages contextual learning to look beyond this physical barrier. Rather than attempting to retrieve impossible optical properties above the attenuation limit, the network generates probabilistic inferences for these upper atmospheric classes. These classifications are strictly constrained by the observed vertical structure below the cloud top and the surrounding thermodynamic context, offering a novel predictive capability where direct lidar observation fails.

2 Data

The primary input to the deep learning model consists of vertically resolved profiles obtained from ground-based lidar measurements in Limassol, Cyprus, between November 2016 and April 2018. Data were formatted as two-dimensional (2D) time-height images representing a sequence of profiles. Using 2D inputs allows the CNN architecture to exploit spatio-temporal context, capturing dynamic features or advection patterns relevant to the classification task. The data have a temporal resolution of 90 s and a vertical resolution of 37 m, and each image spans 24 h in time and 22.5 km in height. Data were provided by the Leibniz Institute for Tropospheric Research (TROPOS). The terms “sample” and “image” will be used interchangeably to describe one 2D (time-height) input, where an “image” is a multi-channel time-height dataset.
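As a rough illustration of the image dimensions implied by these numbers, the following minimal sketch computes the grid shape from the stated resolutions and spans and stacks several fields into one multi-channel image. The function names, the channel names, the (channel, height, time) axis order, and the random demo data are assumptions made for illustration; the bin count is obtained by integer division and the actual grid may differ by a bin.

```python
import numpy as np

TIME_RES_S = 90        # temporal resolution stated in the text
HEIGHT_RES_M = 37      # vertical resolution stated in the text
SPAN_S = 24 * 3600     # each image spans 24 h
SPAN_M = 22_500        # each image spans 22.5 km


def grid_shape():
    """Return (n_height, n_time) implied by the stated spans and resolutions."""
    n_time = SPAN_S // TIME_RES_S
    n_height = SPAN_M // HEIGHT_RES_M  # floor division; real grid may differ
    return n_height, n_time


def stack_channels(channels):
    """Stack a dict of 2D (height, time) fields into one (C, H, W) image."""
    names = sorted(channels)  # fixed order so channel indices are reproducible
    image = np.stack([channels[name] for name in names], axis=0)
    return names, image.astype(np.float32)


n_height, n_time = grid_shape()
rng = np.random.default_rng(0)
# Hypothetical channel names filled with random placeholder values.
demo = {name: rng.random((n_height, n_time)) for name in ("beta_532", "depol_532")}
names, image = stack_channels(demo)
print(image.shape)  # (2, 608, 960)
```

With the stated values, one day of data corresponds to 960 profiles of about 608 range bins each.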

All input features were selected based on their established relevance in lidar-based classification frameworks. Optical channels primarily drive aerosol discrimination via particle size and shape, while thermodynamic variables (temperature and pressure) constrain physically plausible cloud phases. The input features used in this study include: attenuated backscatter coefficient at 532 and 1064 nm, aerosol backscatter coefficients at 532 and 1064 nm, particle depolarization ratio at 532 nm, volume depolarization ratio at 532 nm, the backscatter-related Ångström exponent between 532 and 1064 nm, model pressure and model temperature. To generate the training dataset, all variables from the Cloudnet processing scheme were interpolated onto the finer PollyXT time-height grid (90 s, 37 m). To avoid introducing artifacts or blending discrete categorical classes, as numerical averaging would, this mapping was performed using nearest-neighbor interpolation. Consequently, the Cloudnet data were simply replicated onto the finer lidar grid, leaving the original categorical values entirely untouched.
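The nearest-neighbor mapping described above can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the function names, the coordinate values, and the class codes in the demo are hypothetical, and the sketch assumes both grids are given as 1D height and time coordinate vectors.

```python
import numpy as np


def nearest_index(src, dst):
    """For every dst coordinate, return the index of the nearest src coordinate."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Pairwise distances (len(dst) x len(src)); argmin picks the closest source bin.
    return np.abs(dst[:, None] - src[None, :]).argmin(axis=1)


def regrid_nearest(labels, src_h, src_t, dst_h, dst_t):
    """Replicate a categorical (height, time) mask onto a finer grid."""
    ih = nearest_index(src_h, dst_h)
    it = nearest_index(src_t, dst_t)
    # Pure index selection: no averaging, so no new class values can appear.
    return np.asarray(labels)[np.ix_(ih, it)]


# Hypothetical coarse mask with two height bins and two time bins.
coarse = np.array([[1, 8],
                   [1, 9]])
fine = regrid_nearest(coarse,
                      src_h=[0, 100], src_t=[0, 30],
                      dst_h=[0, 37, 74], dst_t=[0, 10, 20, 30])
print(fine)
```

Because the output is built only by indexing into the source array, every value on the fine grid is one of the original class codes, which is the property the text relies on.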

The target variable represents a unified classification that combines aerosol and cloud/precipitation types for each vertical bin in the profile. The construction of this unified mask follows a straightforward, rule-based merging strategy. The foundational mask is derived from the PollyXT target categorization algorithm (Baars et al., 2017). To integrate comprehensive cloud and precipitation data, this base mask is subsequently overwritten by the Cloudnet target classification (Illingworth et al., 2007) in any pixel where the Cloudnet radar detects cloud or precipitation particles.
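The overwrite rule can be illustrated with a short sketch. It assumes that both masks already share one grid and one unified class numbering, and that the set of cloud and precipitation class codes is known. The function name `merge_masks`, the argument names, and the demo class codes are hypothetical; the mapping from Cloudnet classes to unified classes is not specified here.

```python
import numpy as np


def merge_masks(polly_mask, cloudnet_mask, hydrometeor_classes):
    """Overwrite the PollyXT base mask wherever Cloudnet reports hydrometeors."""
    polly_mask = np.asarray(polly_mask)
    cloudnet_mask = np.asarray(cloudnet_mask)
    if polly_mask.shape != cloudnet_mask.shape:
        raise ValueError("masks must be on the same time-height grid")
    # True where Cloudnet reports a cloud or precipitation class.
    overwrite = np.isin(cloudnet_mask, sorted(hydrometeor_classes))
    return np.where(overwrite, cloudnet_mask, polly_mask)


# Hypothetical codes: 1 = clear, 3-4 = aerosol types, 8-11 = cloud classes,
# 0 = no hydrometeor detected by Cloudnet.
polly = np.array([[1, 3],
                  [4, 1]])
cloudnet = np.array([[0, 9],
                     [0, 8]])
unified = merge_masks(polly, cloudnet, hydrometeor_classes={8, 9, 10, 11})
print(unified)
```

Pixels without a Cloudnet hydrometeor detection keep their PollyXT label, so aerosol typing is preserved outside clouds and precipitation.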

Figure 1. Unified atmospheric target classification mask. A 2D time-height training label for 3 November 2016, created by integrating PollyXT aerosol typing and Cloudnet cloud/precipitation categorization. The vertical axis represents altitude (up to 22.5 km), and the horizontal axis represents 24 h of observations at 90 s resolution. Classes range from clear atmosphere (Class 1) to specific aerosol types (Classes 3–6) and various cloud phases including water droplets and ice crystals (Classes 8–11).
