In this study, we explore a new approach based on machine learning
(ML) for deriving aerosol extinction coefficient profiles, single-scattering
albedo and asymmetry parameter at 360 nm from a single multi-axis differential optical absorption spectroscopy (MAX-DOAS) sky scan.
Our method relies on a multi-output sequence-to-sequence model combining
convolutional neural networks (CNNs) for feature extraction and long
short-term memory networks (LSTMs) for profile prediction. The model was
trained and evaluated using data simulated by Vector Linearized Discrete Ordinate Radiative Transfer (VLIDORT) v2.7, comprising
1 459 200 unique mappings. Of the simulations, 75 % were randomly selected for training and the remaining 25 % for testing. The overall error of
estimated aerosol properties (1) for total aerosol optical depth (AOD) is

Aerosols play an important role in the Earth–atmosphere system by modifying the global energy balance, participating in cloud formation and atmospheric chemistry, and fertilizing land and ocean. Aerosols are widely spread in the troposphere, being emitted by anthropogenic and natural processes (primary aerosols) and formed by gas-to-particle conversion mechanisms (secondary aerosols). Aerosols are removed from the atmosphere by dry (gravitational settling and turbulent) deposition and wet deposition and have variable lifetimes ranging from a few minutes to a few weeks (Haywood and Boucher, 2000).

The spatial and temporal distribution of aerosols in the lower troposphere is highly variable and depends strongly on proximity to sources, aerosol type, meteorological conditions and photochemical processes. Horizontal and vertical heterogeneity of the aerosol distribution, properties and processes poses a serious challenge for modeling aerosol-induced radiative forcing and is an important source of uncertainty in climate modeling results (IPCC, 2013).

Macroscopic aerosol optical properties required for modeling aerosol radiative forcing include single-scattering albedo, scattering phase function and aerosol optical depth (AOD; Dubovik et al., 2002).

This paper investigates the potential of using advances in machine learning to invert aerosol properties (aerosol extinction coefficient profiles, single-scattering albedo and scattering phase function) from a hyperspectral remote-sensing technique called multi-axis differential optical absorption spectroscopy (MAX-DOAS).

Machine learning (ML) is a branch of artificial intelligence rooted in pattern recognition and statistics. The goal of ML is to build statistical (or mathematical) models of a real-world phenomenon from training examples. For instance, in supervised ML, a model is first presented with a set of paired examples (termed the training set), where every training example contains a pair of input and output variables. The goal of ML algorithms is then to find a mapping from the input variables to the output variables that matches the training examples and generalizes to unseen examples (termed the test set). The learned mapping (or model) can be applied to the inputs of test examples to predict their outputs. ML has several advantages. Firstly, it can sift through vast amounts of training data and discover patterns that are not apparent to humans. Secondly, the accuracy and efficiency of ML algorithms can continue to improve as the amount of training data increases. Thirdly, ML models are usually very fast to apply to test examples, since the time-consuming training is performed offline and only once. With these advantages, as well as the availability of faster hardware, ML has become one of the most popular data analysis techniques since the 1990s. In recent years, it has also been applied to the field of remote sensing (Efremenko et al., 2017; Hedelt et al., 2019).
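As a minimal illustration of the supervised set-up described above, the plain-Python sketch below splits paired examples at random into a 75 % training set and a 25 % test set (the proportions used later in this study), "trains" a 1-nearest-neighbour regressor and applies it to the unseen test inputs. The synthetic linear phenomenon and the nearest-neighbour model are illustrative assumptions, not the model used in this paper.

```python
import random

def train_test_split(pairs, train_fraction=0.75, seed=0):
    """Randomly split (input, output) pairs into a training set and a test set."""
    shuffled = list(pairs)                      # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def fit_nearest_neighbour(train):
    """'Training' a 1-nearest-neighbour regressor just stores the training examples."""
    def predict(x):
        # Output of the training example whose input lies closest to x.
        _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return predict

# Synthetic "real-world phenomenon" (an assumption): y = 2x + 1 sampled on a regular grid.
pairs = [(i / 10, 2 * (i / 10) + 1) for i in range(100)]
train, test = train_test_split(pairs)
model = fit_nearest_neighbour(train)

# Apply the learned mapping to the unseen test inputs and measure the mean absolute error.
mae = sum(abs(model(x) - y) for x, y in test) / len(test)
print(len(train), len(test), f"MAE = {mae:.3f}")
```

Because the test inputs never appear in the training set, the error measured here reflects generalization rather than memorization.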

Artificial neural networks (ANNs) are a class of ML methods that have been successfully applied to a number of commercial problems such as image detection, text translation and speech recognition. They are inspired by the biological neural networks constituting animal brains. As in a biological brain, an ANN is built from artificial neurons. An artificial neuron is a mathematical function that receives and processes input signals and produces output signals, or activations. Each neuron comprises weighted inputs, an activation function and an output. The weights of the neuron are adjustable parameters, while the activation function defines the relationship between the input signals and the output signals. When multiple neurons are composed in a layered manner (the output signals of neurons in a given layer are used as inputs for the neurons in the next layer), the result is an artificial neural network. A common algorithm for training ANNs is backpropagation, which passes the gradients of the errors on the training set from the output layer back to the inner layers so that the weights at all layers can be refined incrementally. Training converges when the changes in ANN weights across all layers fall below a certain threshold. Several gradient-based optimization methods use the gradients computed by backpropagation to update the weights, and these are built into the standalone ANN packages commonly used by the ML community. ANNs come in many types, depending on the specifics of the neuron arrangement or architecture. A simple type of ANN is the multilayer perceptron (MLP), in which all neurons in a given layer are fully connected to all neurons in the next layer (such layers are termed dense layers). More complex types of ANN include convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
Two important types of artificial neural networks used in this study are the CNNs (Fukushima, 1980; LeCun et al., 1999) and the long short-term memory (LSTM) neural networks (Hochreiter and Schmidhuber, 1997), which are variants of recurrent neural networks.
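The neuron, layer and backpropagation concepts above can be made concrete with a deliberately tiny example. The plain-Python sketch below trains a network with one dense hidden layer of eight tanh neurons and one linear output neuron on a toy one-dimensional function: the mean-squared-error gradient is propagated from the output layer back to the hidden layer, and all weights are updated by plain gradient descent. The layer size, learning rate, number of epochs and toy target are arbitrary assumptions; packages such as Keras automate the gradient computation and offer more elaborate optimizers, such as the RMSprop optimizer used later in this study.

```python
import math
import random

def forward(x, params):
    """Dense hidden layer of tanh neurons followed by one linear output neuron."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]   # weighted input + activation
    output = sum(v * h for v, h in zip(w2, hidden)) + b2
    return hidden, output

def mse(data, params):
    return sum((forward(x, params)[1] - t) ** 2 for x, t in data) / len(data)

def train_step(data, params, lr=0.05):
    """Backpropagate the mean-squared-error gradient and take one gradient-descent step."""
    w1, b1, w2, b2 = params
    n = len(w1)
    g_w1, g_b1, g_w2, g_b2 = [0.0] * n, [0.0] * n, [0.0] * n, 0.0
    for x, target in data:
        hidden, output = forward(x, params)
        d_out = 2.0 * (output - target) / len(data)            # dLoss/d(output)
        g_b2 += d_out
        for j in range(n):
            g_w2[j] += d_out * hidden[j]
            d_hidden = d_out * w2[j] * (1.0 - hidden[j] ** 2)  # chain rule through tanh
            g_w1[j] += d_hidden * x
            g_b1[j] += d_hidden
    return ([w - lr * g for w, g in zip(w1, g_w1)], [b - lr * g for b, g in zip(b1, g_b1)],
            [w - lr * g for w, g in zip(w2, g_w2)], b2 - lr * g_b2)

rng = random.Random(1)
data = [(i / 20, math.sin(3 * i / 20)) for i in range(-20, 21)]   # toy target on [-1, 1]
params = ([rng.uniform(-1, 1) for _ in range(8)], [0.0] * 8,
          [rng.uniform(-1, 1) for _ in range(8)], 0.0)
loss_start = mse(data, params)
for epoch in range(1000):
    params = train_step(data, params)
loss_end = mse(data, params)
print(f"training MSE: {loss_start:.4f} -> {loss_end:.4f}")
```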

Schematics of a simple CNN.

A convolutional neural network is a class of deep neural networks that
uses the convolution operation to define the type of connections from one
layer to another. While they have shown impressive results in extracting
complex features from images in computer vision applications
(Krizhevsky et al., 2012; Simonyan and
Zisserman, 2015), they are relevant in many other applications involving
structured input data, e.g., 1D sequences. A CNN is composed of an input
layer, multiple hidden layers and an output layer. The hidden layers usually
consist of several convolutional layers, followed by pooling layers, fully
connected layers (dense layers) and normalization layers. Figure 1 shows a
simple example of a CNN. The input vector (or sequence) is first passed
through a convolutional layer where it is convolved with three filters
(convolution kernels) of size 3 using the same padding to produce three
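The convolution step described above can be sketched in plain Python: a 1D input sequence is convolved with three filters of size 3 using "same" zero padding, so each of the three resulting feature maps has the same length as the input. The filter values, the ReLU activation and the size-2 max pooling are illustrative assumptions; in a trained CNN, the filter weights are learned. As in most deep learning libraries, the "convolution" is implemented as a cross-correlation (the kernel is not flipped).

```python
def conv1d_same(sequence, kernel):
    """1D convolution with 'same' zero padding (implemented, as in most deep learning
    libraries, as a cross-correlation: the kernel is not flipped)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(sequence) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(sequence))]

def relu(values):
    return [max(0.0, v) for v in values]

def max_pool(values, size=2):
    """Non-overlapping max pooling; halves the length for size=2."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]
filters = [
    [1.0, 0.0, -1.0],       # responds to decreasing values
    [1 / 3, 1 / 3, 1 / 3],  # local average (smoothing)
    [-1.0, 2.0, -1.0],      # responds to local peaks
]
feature_maps = [relu(conv1d_same(signal, f)) for f in filters]   # three maps, each of length 8
pooled = [max_pool(fm) for fm in feature_maps]                    # three maps, each of length 4
print(feature_maps[0])
```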

LSTM neural networks have many applications, such as speech recognition (Li and Wu, 2015) and handwriting recognition (Graves et al., 2008; Graves and Schmidhuber, 2009). They are a special kind of ANN termed recurrent neural networks (RNNs). RNNs are designed for modeling sequence-dependent behavior (e.g., in time). They are called “recurrent” because they perform the same operation for every element of a sequence, with the output at a given element depending on previous computations at earlier elements (Britz, 2015). This differs from traditional neural networks, in which all input–output examples are assumed to be independent of each other.
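The recurrence can be illustrated with a single-unit RNN unrolled over a short input sequence: the same weights are reused at every step, and each hidden state h_t = tanh(w_x x_t + w_h h_{t-1} + b) depends on the previous one. The scalar weights below are arbitrary assumptions, chosen to show how the effect of the first input fades over later steps.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Unrolled single-unit RNN: the same weights are applied at every step, and each
    hidden state depends on the current input and on the previous hidden state."""
    h, states = h0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)   # h_t = tanh(w_x*x_t + w_h*h_{t-1} + b)
        states.append(h)
    return states

# A single pulse at the first step; its influence decays through the recurrent weight w_h.
states = rnn_forward([1.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print([round(s, 3) for s in states])
```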

Figure 2 shows a diagram of an unrolled RNN with

Unrolled recurrent neural network.

An example of an LSTM cell is illustrated in Fig. 3, of which the update
rules are as follows:

LSTM cell diagram (modified from Thomas, 2018).
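For reference, a single update of an LSTM cell in the commonly used formulation with a forget gate can be sketched as follows. The forget, input and output gates (f_t, i_t, o_t) use a sigmoid activation, the candidate cell state uses tanh, the new cell state is c_t = f_t c_{t-1} + i_t c~_t, and the output is h_t = o_t tanh(c_t). This single-unit, scalar-weight version and its parameter values are illustrative assumptions, and its notation may differ from that of Fig. 3.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One update of a single-unit LSTM cell; p maps each gate name to (w, u, b)."""
    def gate(name, activation):
        w, u, b = p[name]
        return activation(w * x + u * h_prev + b)
    f = gate("forget", sigmoid)        # f_t: fraction of the old cell state kept
    i = gate("input", sigmoid)         # i_t: fraction of the candidate written
    o = gate("output", sigmoid)        # o_t: fraction of the cell state exposed
    c_tilde = gate("cell", math.tanh)  # candidate cell state
    c = f * c_prev + i * c_tilde       # c_t = f_t*c_{t-1} + i_t*c~_t
    h = o * math.tanh(c)               # h_t = o_t*tanh(c_t)
    return h, c

params = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "output", "cell")}  # illustrative
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5, 0.0]:
    h, c = lstm_step(x, h, c, params)
print(f"h = {h:.4f}, c = {c:.4f}")
```

The separate cell state c, updated additively through the forget and input gates, is what lets an LSTM carry information over longer sequences than the simple RNN.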

The MAX-DOAS technique has been widely used to derive vertical aerosol
extinction coefficient profiles in the lower troposphere. This is typically
done from ground-based measurements of oxygen collision complex
(

Demonstration of the MAX-DOAS principle:

The MAX-DOAS technique consists of measuring sky-scattered UV–VIS solar photons at multiple, primarily low, elevation angles (Fig. 4). MAX-DOAS is highly sensitive to tropospheric gases because of the increased photon path length through the lower troposphere at low elevation angles (Platt and Stutz, 2008). To eliminate the contribution from the upper atmosphere, solar spectra measured at low elevation angles are divided by a reference spectrum collected in the zenith direction. The DOAS technique has the advantage of not needing an absolute radiometric calibration.
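Why the zenith reference removes the upper-atmosphere contribution, and why no absolute calibration is needed, can be seen from a simplified Beer–Lambert model with a single absorber. The sketch assumes that the upper-atmosphere slant column is the same for all elevation angles in one scan and that the unknown solar intensity and instrument throughput are identical for both spectra; all numbers (cross-section, columns) are hypothetical, and scattering and spectral structure are ignored. Under these assumptions, the logarithm of the zenith/off-axis intensity ratio divided by the cross-section returns exactly the differential slant column of the lower atmosphere.

```python
import math

SIGMA = 5.0e-20   # hypothetical absorption cross-section (cm^2 per molecule)

def measured_intensity(i0, slant_column, throughput):
    """Beer-Lambert attenuation by one absorber; 'throughput' stands in for the
    unknown radiometric calibration of the instrument."""
    return throughput * i0 * math.exp(-SIGMA * slant_column)

i0, throughput = 1.0e14, 0.37          # unknown in practice; both cancel in the ratio
scd_upper = 4.0e17                      # upper-atmosphere slant column, same for every elevation angle
scd_lower = {90: 1.0e16, 30: 2.0e16, 15: 3.9e16, 5: 1.1e17}  # hypothetical lower-atmosphere columns

i_zenith = measured_intensity(i0, scd_lower[90] + scd_upper, throughput)
dscd = {}
for elev in (30, 15, 5):
    i_off = measured_intensity(i0, scd_lower[elev] + scd_upper, throughput)
    dscd[elev] = math.log(i_zenith / i_off) / SIGMA   # differential slant column (molecules cm^-2)
    print(elev, f"{dscd[elev]:.4e}")
```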

The first step of the DOAS retrieval is a spectral evaluation to calculate
the differential slant column density (

The inversion of Eq. (2) is often done in the framework of Bayes' theorem,
which allows for the assignment of probability density functions to all
possible states given measurements and prior knowledge of the state.
In practice, however, we are not interested in all possible solutions but
rather in a single, most “probable” solution together with its error estimate.
Equation (3) shows a transfer function that defines an estimated solution
(

In addition to the optimal estimation method (OEM), briefly described above, parameterized (Beirle et al., 2019; Vlemmix et al., 2015) and analytic (Frieß et al., 2019; Spinei et al., 2020) inversion algorithms have been developed. Frieß et al. (2019) provided a detailed intercomparison of currently available state-of-the-art inversion algorithms for MAX-DOAS measurements. Most of the current algorithms take between 3 and 216 s to process a single MAX-DOAS sky scan (Frieß et al., 2019), mainly because of the iterative inversion step. Aerosol extinction coefficient profiles are inverted, while the aerosol single-scattering albedo and asymmetry factor are typically assumed based on colocated AERONET measurements. These algorithms also require external information about the atmosphere (e.g., temperature and pressure profiles) that might not be readily available at the measurement timescales, as well as a priori information that typically does not exist. With an increasing number of MAX-DOAS 2D instruments worldwide capable of sunrise-to-sunset measurements (e.g., the Pandonia Global Network), fast methods are needed that can harvest the full information content of MAX-DOAS hyperspectral measurements.
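For a linear forward model y = Kx + ε with Gaussian measurement noise and a Gaussian prior, the most probable (maximum a posteriori) state of the OEM mentioned above has the closed form x̂ = x_a + (KᵀS_ε⁻¹K + S_a⁻¹)⁻¹KᵀS_ε⁻¹(y − Kx_a), with posterior covariance (KᵀS_ε⁻¹K + S_a⁻¹)⁻¹. Aerosol retrievals are nonlinear, which is why OEM-based algorithms iterate such steps. The sketch below evaluates one linear step for a single state element (for example, a trace-gas vertical column) observed through several dSCDs, with differential AMFs as the weighting function; the scalar state, the diagonal covariances and all numbers are simplifying assumptions.

```python
def oem_linear_scalar(y, k, sigma_y, x_a, sigma_a):
    """Linear optimal-estimation (maximum a posteriori) solution for one state element x
    observed as y_i = k_i * x + noise, with independent noise sigma_y and prior x_a +/- sigma_a."""
    info_meas = sum(ki ** 2 for ki in k) / sigma_y ** 2              # K^T S_e^-1 K
    info_total = info_meas + 1.0 / sigma_a ** 2                       # ... + S_a^-1
    weighted_innov = sum(ki * (yi - ki * x_a) for ki, yi in zip(k, y)) / sigma_y ** 2  # K^T S_e^-1 (y - K x_a)
    x_hat = x_a + weighted_innov / info_total
    sigma_hat = info_total ** -0.5                                    # posterior standard deviation
    averaging_kernel = info_meas / info_total                         # sensitivity to the true state (0..1)
    return x_hat, sigma_hat, averaging_kernel

damf = [1.0, 2.0, 4.0]      # hypothetical differential AMFs (weighting function)
dscd = [2.05, 3.9, 8.1]     # hypothetical measured dSCDs (true column about 2.0)
x_hat, sigma_hat, avk = oem_linear_scalar(dscd, damf, sigma_y=0.1, x_a=1.0, sigma_a=1.0)
print(f"x_hat = {x_hat:.3f} +/- {sigma_hat:.3f}, averaging kernel = {avk:.4f}")
```

An averaging kernel close to 1 means that the measurements, rather than the prior, determine the solution; weakly constrained states pull x̂ towards x_a.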

This study describes and evaluates a fast novel machine learning (ML)
approach for retrieving aerosol extinction coefficient profiles, asymmetry
factor and single-scattering albedo at 360 nm from

The rest of the paper is organized as follows. Section 3 provides an overview of the new retrieval algorithm. Section 4 focuses on training data generation using the radiative transfer model (Vector Linearized Discrete Ordinate Radiative Transfer, VLIDORT). Section 5 details the ML implementation. Section 6 provides an extensive comparison of ML-predicted versus “true” macroscopic aerosol properties outside the training dataset. Section 7 summarizes the findings.

Our approach consists of three stages: (1) training set generation; (2) a
one-time training that results in an inverse ML model

Schematics of the machine learning inversion algorithm.

First, a training set containing simulated measurements

The success of any ML model depends on the quality of the training data. Since there is no reliable dataset that combines simultaneous MAX-DOAS measurements and observations of aerosol macrophysical properties and vertical extinction coefficient profiles at 360 nm, we use a radiative transfer model to simulate MAX-DOAS measurements. In this study, we train our ML model on air mass factors (AMF) calculated from the simulated solar radiances at the bottom of the atmosphere.
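For orientation, the geometric approximation often used for MAX-DOAS relates the AMF of an absorber confined to a thin layer below the last scattering altitude to the elevation angle α as AMF ≈ 1/sin α, so the differential AMF relative to the zenith reference is dAMF ≈ 1/sin α − 1. The sketch below evaluates this approximation; it ignores aerosol scattering, which alters the light path and is therefore exactly what the radiative-transfer-simulated AMFs used in this study are sensitive to.

```python
import math

def geometric_amf(elevation_deg):
    """Geometric AMF for an absorber in a thin layer below the last scattering altitude."""
    return 1.0 / math.sin(math.radians(elevation_deg))

def geometric_damf(elevation_deg):
    """Differential AMF with respect to the zenith reference spectrum (AMF = 1 at 90 degrees)."""
    return geometric_amf(elevation_deg) - geometric_amf(90.0)

for elev in (2, 5, 10, 15, 30, 90):
    print(f"elevation {elev:2d} deg: dAMF = {geometric_damf(elev):.2f}")
```

The rapid growth of dAMF at the lowest elevation angles is why MAX-DOAS scans concentrate on them.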

AMF represents a ratio between the true average path that photons take
through a gas layer before detection by a MAX-DOAS instrument and the
vertical path. Since

Radiative transfer model settings.

In the absence of aerosols and clouds, only air molecules (mainly oxygen and
nitrogen) scatter solar photons in the Earth's atmosphere. This molecular-only (Rayleigh) scattering process is considered to be well understood
(Bodhaine et al., 1999), and

VLIDORT models radiative transfer processes at a specific wavelength in a stratified atmosphere. It requires geometrical and “optical” information about the atmospheric layers and the underlying ground surface. These include layer heights, pressure and temperature at layer boundaries for refractive geometry calculations, and the solar zenith, viewing zenith and relative azimuth angles (the latter between the viewing direction and the solar position). Each atmospheric layer is described by its total optical thickness, total single-scattering albedo and the set of Greek matrices specifying the total scattering law.
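As an illustration of the per-layer optical inputs listed above, the sketch below computes the Rayleigh optical depth of the full atmosphere at 360 nm with the empirical fit of Bodhaine et al. (1999), assigns the lowest 100 m layer its share in proportion to the pressure difference across the layer, and mixes it with an aerosol partial optical depth to obtain the layer's total optical thickness, single-scattering albedo and asymmetry parameter. The aerosol values come from the grid used in this study, but the layer pressures are approximate standard-atmosphere values and gas absorption is set to zero; this is a sketch under those assumptions, not the VLIDORT input preparation used in the paper. Rayleigh scattering is treated as conservative, with a zero first phase-function moment.

```python
def rayleigh_optical_depth(wavelength_um):
    """Total Rayleigh optical depth (sea level, 1013.25 hPa, 45 deg latitude) from the
    empirical fit of Bodhaine et al. (1999); wavelength in micrometres."""
    lam2 = wavelength_um ** 2
    return 0.0021520 * (1.0455996 - 341.29061 / lam2 - 0.90230850 * lam2) / (
        1.0 + 0.0027059889 / lam2 - 85.968563 * lam2)

def mix_layer(tau_rayleigh, tau_aerosol, ssa_aerosol, g_aerosol, tau_gas=0.0):
    """Total optical thickness, single-scattering albedo and asymmetry parameter of one layer.
    Rayleigh scattering is conservative (SSA = 1) with a zero first phase-function moment."""
    tau_scat = tau_rayleigh + ssa_aerosol * tau_aerosol          # scattering optical thickness
    tau_total = tau_rayleigh + tau_aerosol + tau_gas             # extinction optical thickness
    ssa_total = tau_scat / tau_total
    g_total = ssa_aerosol * tau_aerosol * g_aerosol / tau_scat   # scattering-weighted mean
    return tau_total, ssa_total, g_total

tau_r_column = rayleigh_optical_depth(0.360)
# Share of the column in the lowest 100 m layer, scaled by the pressure drop across it
# (approximate US 1976 standard-atmosphere pressures; an assumption for illustration).
tau_r_layer = tau_r_column * (1013.25 - 1001.3) / 1013.25
layer = mix_layer(tau_r_layer, tau_aerosol=0.03, ssa_aerosol=0.925, g_aerosol=0.725)
print(f"Rayleigh column at 360 nm: {tau_r_column:.3f}; layer (tau, ssa, g): "
      + ", ".join(f"{v:.4f}" for v in layer))
```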

VLIDORT simulations were performed for the US 1976 standard atmosphere
divided into 67 layers (same as in Frieß et al., 2019), with
0.1 km layers from the surface to 4 km, 0.5 km layers from 4 to 8 km and
layers of varying width up to 60 km. Since surface reflectivity has a small effect on
ground-based MAX-DOAS measurements, we performed simulations only for a
single Lambertian albedo of 0.04. Only absorption by two gases was
considered in this study: ozone and

Aerosol types in this study are described by combinations of single-scattering albedo and asymmetry factor, with a total of 20 “types”: (1) single-scattering albedo: 0.775, 0.825, 0.875, 0.925, 0.975; (2) Henyey–Greenstein asymmetry factor: 0.675, 0.725, 0.775, 0.825. Aerosol extinction coefficient profiles were generated by combining an exponential function at the surface with a “sliding” Gaussian function above it. The total aerosol optical depth was partitioned between the exponential and Gaussian functions. Total AOD cases included 0.15, 0.3, 0.45, 0.6 and 0.75, with exponential-to-Gaussian partitioning fractions of 0.3, 0.6 and 0.9. The Gaussian peak center height was varied from 0.5 to 2 km in steps of 0.5 km, and the Gaussian peak width was set to 0.1, 0.2, 0.3 or 0.5 km. This results in 4800 aerosol cases (20 types × 5 total AODs × 3 partitioning fractions × 4 peak heights × 4 peak widths) and a total of 1 459 200 simulated measurements (sky scans). Figure 14 demonstrates the aerosol profile samples: the near-surface partial aerosol optical depth profiles are described by the exponential function, and the layers aloft are described by the Gaussian function, with various widths and heights, added to the exponential profile. While the VLIDORT simulations were performed for an atmosphere divided into 67 layers, the profiles were resampled onto only 23 layers for ML training. The new layer depths are 100 m from the surface to 1 km, 200 m from 1 to 3 km and 500 m from 3 to 4 km, plus 56 km for the last layer (4 to 60 km). The partial AODs of the new layers were obtained by adding the partial aerosol optical depths of the neighboring original layers. The ML algorithm was trained on 75 % of the simulated measurements, selected at random (1 094 400 samples), and model performance was tested on the remaining 25 %. Note that no validation data were held out from the 75 % training set for tuning the hyperparameters of our ML model, as all ML hyperparameters were kept constant across all experimental settings in this paper.
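The combinatorics and the layer resampling described above can be sketched in plain Python. The sketch builds partial AODs on a 0.1 km grid from an exponential near-surface part plus a Gaussian layer aloft, splits the total AOD between them, resamples the result onto the 23 ML layers by adding neighbouring partial AODs, and confirms the 4800 aerosol cases and the 75 % training count. The exponential scale height, the interpretation of the Gaussian width as a standard deviation, the assignment of the partitioning fraction to the exponential part and the 10 km top of the fine grid are assumptions not specified in the text.

```python
import math
from itertools import product

def aerosol_profile(total_aod, exp_fraction, peak_km, width_km,
                    scale_height_km=0.5, dz_km=0.1, top_km=10.0):
    """Partial AODs on a regular grid: exponential near-surface part plus a Gaussian layer aloft.
    scale_height_km, top_km and treating width_km as a standard deviation are assumptions."""
    mids = [(k + 0.5) * dz_km for k in range(round(top_km / dz_km))]
    expo = [math.exp(-z / scale_height_km) for z in mids]
    gauss = [math.exp(-0.5 * ((z - peak_km) / width_km) ** 2) for z in mids]
    s_exp, s_gauss = sum(expo), sum(gauss)
    # Each shape is normalized, then weighted by its share of the total AOD.
    return [total_aod * (exp_fraction * e / s_exp + (1.0 - exp_fraction) * g / s_gauss)
            for e, g in zip(expo, gauss)]

# Boundaries (km) of the 23 ML layers: 10 x 100 m, 10 x 200 m, 2 x 500 m and one 56 km top layer.
ML_BOUNDS = ([round(0.1 * i, 1) for i in range(11)]
             + [round(1.0 + 0.2 * i, 1) for i in range(1, 11)]
             + [3.5, 4.0, 60.0])

def resample_to_ml_layers(partial_aod, dz_km=0.1):
    """Add neighbouring fine-grid partial AODs into the coarser ML layers."""
    layers = []
    for lo, hi in zip(ML_BOUNDS[:-1], ML_BOUNDS[1:]):
        first, last = round(lo / dz_km), round(hi / dz_km)
        layers.append(sum(partial_aod[first:last]))   # slicing past the grid top is harmless
    return layers

ssa_values = [0.775, 0.825, 0.875, 0.925, 0.975]
g_values = [0.675, 0.725, 0.775, 0.825]
aod_values = [0.15, 0.3, 0.45, 0.6, 0.75]
fractions = [0.3, 0.6, 0.9]
peaks_km = [0.5, 1.0, 1.5, 2.0]
widths_km = [0.1, 0.2, 0.3, 0.5]
cases = list(product(ssa_values, g_values, aod_values, fractions, peaks_km, widths_km))

layers = resample_to_ml_layers(aerosol_profile(0.45, 0.6, 1.5, 0.2))
n_train = int(0.75 * 1_459_200)
print(len(cases), len(layers), round(sum(layers), 6), n_train)
```

Because resampling only adds neighbouring partial AODs, the total AOD of each profile is preserved exactly, up to floating-point rounding.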

We employ a supervised ML formulation for our problem of aerosol profile
retrieval, where the goal is to learn the mapping from input variables to
output variables given a training set of paired data instances. In our
formulation, every data instance corresponds to a single MAX-DOAS sky scan
at a fixed relative azimuth angle (RAA) and solar zenith angle (SZA), where
the inputs of the data instance comprise the following: (a) RAA scalar value, (b) SZA
scalar value and (c) a sequence of

Note that in our supervised ML formulation, there are sequences in both the
input signals and output signals, namely

Schematics of the multi-output sequence-to-sequence model for deriving aerosol optical properties from MAX-DOAS measurements.

Figure 6 illustrates the novel multi-output sequence-to-sequence model for
learning the inverse mapping from MAX-DOAS measurements to aerosol optical
properties. To extract sequence-based features from MAX-DOAS inputs, a
1D convolutional neural network (CNN;
Fukushima, 1980; LeCun et al.,
1999) is first applied to the sequence of inputs (we concatenate

Figure S1 in the Supplement shows the detailed architecture of the multi-output
sequence-to-sequence model. The CNNs consist of eight 1D convolutional
layers (

Extracted feature vector from the

We implemented our ML model in a Jupyter notebook using the Keras library, a commonly used deep learning library for Python. RMSprop (Hinton, 2012) was chosen as the optimizer, and the mean squared error was used as the loss function. We trained the model on 75 % of the dataset for 124 epochs with a batch size of 640. The RMSprop hyperparameters were a learning rate of 0.001, a decay factor of 0.9, a learning rate decay of 0 and a fuzz factor of None. We did not perform any hyperparameter tuning on a separate validation set held out from the training set, and all hyperparameters of our ML model were kept constant throughout all experiments on the test set reported in this paper. To ensure that there was no overlap between the training and testing steps, we did not use the test data, directly or indirectly, during the training phase, either for learning parameter weights or for selecting hyperparameters.
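To clarify what the listed hyperparameters control, the sketch below writes out one common form of the RMSprop update: a running average of squared gradients with decay factor ρ = 0.9 rescales each step taken with learning rate 0.001. A learning-rate decay of 0 keeps the learning rate constant, and the fuzz factor ε only guards against division by zero (the value 1e-7 used here is an assumption). The toy linear regression with a mean-squared-error loss stands in for the network and is purely illustrative. The last lines check that the 1 094 400 training samples split into exactly 1710 batches of 640 per epoch.

```python
import math

def rmsprop_step(params, grads, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: v <- rho*v + (1 - rho)*g^2, p <- p - lr*g/(sqrt(v) + eps)."""
    new_params, new_cache = [], []
    for p, g, v in zip(params, grads, cache):
        v = rho * v + (1.0 - rho) * g * g          # running mean of squared gradients
        new_params.append(p - lr * g / (math.sqrt(v) + eps))
        new_cache.append(v)
    return new_params, new_cache

# Toy problem (an assumption): fit y = w*x + b to y = 3x - 1 by minimizing the mean squared error.
data = [(i / 10, 3.0 * (i / 10) - 1.0) for i in range(-10, 11)]

def mse_and_grads(params):
    w, b = params
    errors = [(w * x + b) - y for x, y in data]
    loss = sum(e * e for e in errors) / len(data)
    grad_w = 2.0 * sum(e * x for e, (x, _) in zip(errors, data)) / len(data)
    grad_b = 2.0 * sum(errors) / len(data)
    return loss, [grad_w, grad_b]

params, cache = [0.0, 0.0], [0.0, 0.0]
loss_start, _ = mse_and_grads(params)
for step in range(5000):            # constant learning rate: learning-rate decay of 0
    _, grads = mse_and_grads(params)
    params, cache = rmsprop_step(params, grads, cache)
loss_end, _ = mse_and_grads(params)

n_train, batch_size = 1_094_400, 640
print(f"MSE {loss_start:.3f} -> {loss_end:.5f}; batches per epoch: {n_train // batch_size}")
```

Dividing by the root of the running mean makes the step size roughly independent of the raw gradient magnitude, which is why a single learning rate works for parameters with very different gradient scales.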

Evaluation of the accuracy of the ML mapping rules derived during the training stage for MAX-DOAS data inversion was done by comparing the true atmospheric aerosol properties to the ML-inverted properties. The evaluation dataset consists of 364 800 simulated MAX-DOAS sky scans that are outside the training set. The number of simulations in the evaluation dataset as a function of solar zenith angle (SZA) and relative azimuth angle (RAA) is shown in Fig. 7. Between 1100 and 1300 aerosol scenarios are present in each SZA-RAA bin.

Number of simulations in the evaluation dataset as a function of solar zenith angle (SZA) and relative azimuth angle (RAA).

The following ML-predicted aerosol properties were evaluated: (1) asymmetry
factor, (2) single-scattering albedo, (3) total aerosol optical thickness
and (4) partial aerosol optical thickness for each layer from 0 to 4 km. A
relative error

The ML-based approach shows an ability to invert aerosol asymmetry factor
with a mean error of

Asymmetry factor retrieval errors:

Single-scattering albedo retrieval errors:

Similarly high accuracy is achieved for the ML retrieval of the single-scattering
albedo, with a mean error of 0.19 %, 2 standard deviations of 3.46 %
and a nearly normal, somewhat positively skewed error distribution (Fig. 9).
Slightly higher errors are observed at RAA smaller than 60

Mean errors are also larger at small RAA and SZA

Total AOD retrieval is more challenging for the ML model than retrieval of the single-scattering albedo or asymmetry factor, especially at lower total AOD levels.
Boxplots of the total AOD error for different true total AOD values are
given in Fig. 10. In general, the ML algorithm tends to underestimate total AOD
from the mean error

Boxplots of total AOD prediction errors for each true total
AOD value. The box central mark indicates the median, and the bottom and top
edges of the box indicate the 25th and 75th percentiles, respectively. The
whiskers extend to the most extreme data points not considered outliers, and
the outliers are plotted individually using
the “

Total AOD retrieval errors:

The contribution of the partial AOD retrieval error in each atmospheric layer from 0 to 4 km to the total AOD is shown in Fig. 12. The layer partial AOD retrieval error relative to the total AOD depends on the absolute amount of aerosols and their altitude and is, on average, less than 1 % per layer. As with OEM methods, the ML method retrieves elevated aerosol layers less accurately, especially for smaller total AOD. The wider distribution of relative errors in partial AOD at 1.5 and 2 km is mainly due to the presence of elevated layers in the training data that peaked at those heights. If aerosols had also been present in meaningful amounts above those altitudes, the error distribution would have been wider above 2 km as well.
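Assuming the normalization described above is e_i = 100 % × (τ_i,retrieved − τ_i,true)/τ_total,true for layer i, where τ denotes partial AOD (our notation), the metric can be sketched as follows. Dividing by the total AOD rather than by each layer's own partial AOD keeps the metric well-behaved in nearly aerosol-free layers, where an error relative to the layer value would blow up, as the second function shows. The profiles are hypothetical.

```python
def layer_errors_relative_to_total(retrieved, true):
    """Partial-AOD error of each layer as a percentage of the true total AOD of the profile."""
    total_true = sum(true)
    return [100.0 * (r - t) / total_true for r, t in zip(retrieved, true)]

def layer_errors_relative_to_layer(retrieved, true):
    """For comparison: error relative to each layer's own partial AOD."""
    return [100.0 * (r - t) / t for r, t in zip(retrieved, true)]

true_profile = [0.05, 0.04, 0.03, 0.02, 0.0005]          # hypothetical partial AODs (top layer nearly clean)
retrieved_profile = [0.052, 0.039, 0.031, 0.018, 0.0015]
rel_total = layer_errors_relative_to_total(retrieved_profile, true_profile)
rel_layer = layer_errors_relative_to_layer(retrieved_profile, true_profile)
print([round(e, 2) for e in rel_total])
print([round(e, 1) for e in rel_layer])
```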

Mean partial layer AOD error

A linear regression analysis of the true versus the retrieved partial
AOD was performed using least-squares fitting for each layer from 0 to
2.2 km (Fig. 13). Intercepts of linear regression analysis for all layers
were zero with

Correlation between the retrieved partial AOD and the true
partial AOD for each layer from 0 to 2.2 km (

Figure 14 shows some examples of partial AOD profiles retrieved by the ML inversion model. Panels (a)–(h) in Fig. 14 contain profiles randomly selected from the test pool, while panels (i)–(l) contain some of the worst predictions. These examples show that the ML model is able to predict elevated aerosol layers and that, even in cases with large discrepancies, the model still captures the correct profile shape.

Examples of predicted partial layer AOD profiles:

To estimate the uncertainty in the retrieved aerosol properties due to the randomness of ML training, we reran the ML training stage 20 times. Mean errors and standard deviations for total AOD, single-scattering albedo and asymmetry factor for each trained model are shown in Fig. 15.

Effect of random noise in model training on the retrieved aerosol properties.

Table 2 summarizes the effect of random model training noise on the
retrieved properties. In general, most ML models result in a normal
distribution of errors with an additional bias in the mean. Since the
individual model training has a very small effect on the error distribution
(the standard deviation changes little between training runs), we
add the variation in bias and the standard deviation in quadrature to estimate
the total error of the ML model, including the random error of the training,
as follows:

Total AOD error

Single-scattering albedo error

Asymmetry factor error

Statistics of aerosol property error analysis from 20 ML models (20 different training runs).
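The quadrature combination described above can be sketched as follows, assuming that the bias variation is the standard deviation of the mean errors of the individual training runs and that the error spread is the mean of their standard deviations; these choices and the five hypothetical runs below are illustrative, not the values in Table 2. The factor of 2 corresponds to 2σ ranges, which cover about 95.4 % of a normal distribution, as the last lines verify.

```python
import math
import statistics

def combined_error(run_biases, run_stds, k=2.0):
    """Mean bias over training runs and a k-sigma error range that adds the run-to-run
    spread of the bias and the typical error standard deviation in quadrature."""
    sigma_bias = statistics.stdev(run_biases)      # variation in bias between runs
    sigma_err = statistics.mean(run_stds)          # typical within-run error spread
    return statistics.mean(run_biases), k * math.hypot(sigma_bias, sigma_err)

biases = [0.18, 0.21, 0.15, 0.20, 0.22]   # hypothetical mean errors (%) of five training runs
stds = [1.70, 1.75, 1.72, 1.74, 1.71]     # hypothetical error standard deviations (%)
bias, two_sigma = combined_error(biases, stds)

coverage = math.erf(2.0 / math.sqrt(2.0))   # P(|z| < 2) for a standard normal variable
print(f"bias = {bias:.3f} %, 2-sigma range = +/-{two_sigma:.3f} %, coverage = {100 * coverage:.1f} %")
```

Because the run-to-run bias spread is small compared with the within-run spread, the quadrature sum is dominated by the latter, consistent with the small training effect described above.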

This paper presents a fast ML-based algorithm for the inversion of

Evaluation of four retrieved aerosol properties (asymmetry factor, single-scattering albedo, total AOD and partial AOD for each layer from 0 to 4 km)
shows good performance of the ML algorithm with small biases and a normal
distribution of the errors. Overall, 95.4 % of the retrieved optical properties
have errors within the following ranges: (

Application of ML-based algorithm to real data inversion has the following
advantages:

Fast real-time data inversion of the aerosol optical properties;

Simple implementation, by loading an HDF file with the model coefficients in open-source languages such as Python;

Ability to retrieve single-scattering albedo and asymmetry factor;

Use of the ML-retrieved aerosol extinction coefficient profiles, single-scattering albedo and asymmetry factor as initial-guess inputs for more formal inversion algorithms (with radiative transfer simulations).

To make the ML model more robust, the training data should include more realistic aerosol inputs and radiative transfer simulations, including (1) rotational Raman scattering simulations to add Ring effect measurements from MAX-DOAS, (2) different surface albedos, (3) more realistic aerosol profiles (e.g., from LIVAS, a 3D multiwavelength aerosol/cloud database based on CALIPSO and EARLINET aerosol profiles; Amiridis et al., 2015) and (4) multiple wavelengths.

All data used in this study (radiative transfer simulations and ML model from a single training) are available from

The supplement related to this article is available online at:

ES conceived the original idea of the algorithm and performed radiative transfer simulations to generate training and test datasets. YD developed the machine learning (ML) algorithm, conducted training and data inversion, and performed error analysis and visualization. AK guided the design of the ML model architecture. ES supervised the project. All authors discussed the results and contributed to the final paper.

The authors declare that they have no conflict of interest.

This paper was edited by Omar Torres and reviewed by two anonymous referees.