We consider the problem of reconstructing the number size distribution (or particle size distribution) in the atmosphere from lidar measurements of the extinction and backscattering coefficients. We assume that the number size distribution can be modeled as a superposition of log-normal distributions, each one defined by three parameters: mode, width and height. We use a Bayesian model and a Monte Carlo algorithm to estimate these parameters. We test the developed method on synthetic data generated by distributions containing one or two modes and perturbed by Gaussian noise as well as on three datasets obtained from AERONET. We show that the proposed algorithm provides good results when the right number of modes is selected. In general, an overestimate of the number of modes provides better results than an underestimate. In all cases, the PM

Lidar (light detection and ranging) is a remote sensing technique similar to radar (radio detection and ranging) which uses light in the form of short laser pulses to investigate a target and obtain, through elastic and inelastic scattering processes, information on the target properties as a function of the distance from the lidar system.

In the atmospheric application, lidar systems can be used to obtain spatially resolved information about the optical properties of the atmospheric aerosols (desert dust, volcanic ash, smog and many other types of substances)

However, information on the microphysical properties of atmospheric aerosols is seldom obtained using the lidar signal alone. This information, which is essential for a complete aerosol characterization and for understanding the aerosols' effect on climate, is instead frequently obtained through the synergistic use of in situ instruments. Incidentally, these measurements also allow a validation of the lidar retrievals, but only for the values closest to the ground and for a particular aerosol typology (Saharan dust, biomass burning aerosol, etc.); alternatively, validation can be performed using synthetic data

In order to retrieve the microphysical properties of the aerosol from lidar measurements, two inverse problems must be solved in sequence: in the first, one uses the measured backscattered power to obtain an estimate of the aerosol optical parameters; in the second, one uses the estimated optical parameters (at different wavelengths), derived from the lidar observations, to obtain an estimate of the number size distribution, i.e., the density of particles as a function of the particle size. This latter problem is particularly challenging because of the limited number of data, due to the practical difficulty of measuring at many different wavelengths. Moreover, from a mathematical point of view, the microphysical parameters are generally derived from the optical ones through integral equations that cannot be solved analytically and whose numerical solution leads to a so-called ill-posed problem, characterized by a strong sensitivity of the solution to uncertainties in the input data and by the non-uniqueness of the solution. Remarkably, from a mathematical viewpoint there can be several ways to overcome ill-posedness; however, not all of them actually reflect realistic physical conditions. In addition, numerical studies have shown that a poor selection of the constraints can affect the quality of the solution and compromise the microphysical retrieval, regardless of the strength of the regularization algorithm. Therefore, in order to obtain stable and physically acceptable solutions, mathematical and physical constraints, variously combined with regularization methods, are applied

In the past decade, a number of studies have focused on the retrieval of the microphysical aerosol parameters from multiwavelength lidar measurements using the standard “

Other studies have compared the “

Despite these difficulties, the possibility of characterizing atmospheric particulate matter using the lidar instrument alone would be very advantageous, and for this reason it is currently a much-studied topic

Following the state of the art, we retrieve the particle size distribution from “

This article is organized as follows: in the “Methods” section, we provide the mathematical formulation of the problem and a description of the Monte Carlo algorithm; in the following section, we analyze the results obtained for synthetic data using five exemplar cases; in the final section we briefly summarize our conclusions.

Lidar instruments measure the backscattered light power at wavelength

The extinction and backscattering coefficients carry information on the number size distribution through the Mie scattering theory. Specifically, let

The problem we want to solve consists of retrieving the number size distribution

Solving the linear inverse problem defined by Eq. (

In this work, we set up a Bayesian model and apply a Monte Carlo technique to find the best parameters of unimodal (
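For concreteness, each log-normal mode can be written in terms of the three parameters named in the abstract (mode, width and height). The following is a minimal sketch of such a superposition; the function and parameter names are ours for illustration, not the paper's code:

```python
import numpy as np

def lognormal_mode(r, r_m, sigma, N):
    """One log-normal mode: r_m = mode (median) radius, sigma = geometric
    width (> 1), N = height (number concentration of the mode)."""
    return (N / (np.sqrt(2.0 * np.pi) * np.log(sigma) * r)
            * np.exp(-0.5 * (np.log(r / r_m) / np.log(sigma)) ** 2))

def size_distribution(r, modes):
    """Superposition of log-normal modes; `modes` is a list of (r_m, sigma, N)."""
    return sum(lognormal_mode(r, rm, s, N) for rm, s, N in modes)
```

With this normalization, integrating a single mode over all radii returns its height N, so the heights directly control the number concentrations of the fine and coarse modes.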

In the Bayesian framework, probability distributions are used to code our degree of knowledge of the values of unknown or unobservable quantities: perfect knowledge is represented by a probability distribution that is non-zero only at the correct value, and partial knowledge is represented by a probability distribution which assigns high probability to likely values and low probability to unlikely values.
The Bayesian framework is useful to combine a priori information, i.e., information available before the data are collected, with the information content of the data: a priori information is coded in the so-called

Let us define the vector

As far as the likelihood is concerned, we assume that the data are affected by Gaussian noise, which leads us to define the likelihood as
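Under this assumption of Gaussian noise, the log-likelihood is, up to an additive constant, minus half the sum of squared standardized residuals. A minimal sketch (the names and the independent per-channel noise model are our illustrative assumptions):

```python
import numpy as np

def log_likelihood(y_meas, y_model, sigma):
    """Gaussian log-likelihood (up to an additive constant) for independent
    noise with per-channel standard deviations sigma.

    y_meas  : measured optical coefficients (e.g., 2 extinction + 3 backscatter)
    y_model : coefficients predicted by the forward (Mie) model
    """
    resid = (np.asarray(y_meas, dtype=float)
             - np.asarray(y_model, dtype=float)) / np.asarray(sigma, dtype=float)
    return -0.5 * float(np.sum(resid ** 2))
```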

Finally, let us observe that in this work we assume that

The posterior distribution defined in Eq. (

The general idea of Monte Carlo methods is to

MCMC methods work by constructing a Markov chain whose invariant distribution is the target posterior distribution, i.e., the transition kernel

In this work we use the Metropolis–Hastings construction of the kernel, which has the form

In summary, the MCMC algorithm works as follows.
We start from an initial value

for each parameter within

compute the acceptance ratio

accept the proposed value with probability
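The steps above can be sketched as a component-wise random-walk Metropolis–Hastings sampler; the Gaussian proposal, step size and toy Gaussian target below are illustrative assumptions, not the paper's actual forward model or priors:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_hastings(log_post, theta0, n_iter=5000, step=0.5):
    """Component-wise random-walk Metropolis-Hastings sampler.

    log_post : function returning the log posterior density at a parameter vector
    theta0   : initial parameter vector
    step     : standard deviation of the Gaussian proposal
    """
    theta = np.array(theta0, dtype=float)
    lp = log_post(theta)
    chain = [theta.copy()]
    for _ in range(n_iter):
        for j in range(theta.size):  # propose an update for each parameter in turn
            prop = theta.copy()
            prop[j] += step * rng.standard_normal()
            lp_prop = log_post(prop)
            # symmetric proposal: the acceptance ratio reduces to the posterior ratio
            if np.log(rng.uniform()) < lp_prop - lp:
                theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.array(chain)

# toy target: standard bivariate Gaussian, started away from the mode
chain = metropolis_hastings(lambda t: -0.5 * float(np.sum(t ** 2)), [3.0, -3.0])
```

After discarding an initial burn-in, the samples approximate the target posterior; in the real problem `log_post` would combine the prior on the log-normal parameters with the Mie forward model and the Gaussian likelihood.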

We show several examples of applications of the Monte Carlo method to completely synthetic data as well as to data derived from experimental recordings, in the following denoted as

In order to perform a quantitative analysis of reconstruction accuracy, we use two different methods.

The first method is based on the deviation between the size distribution (SD) reconstructed by the inversion algorithm and the simulated exact SD.
Since the algorithm determines an SD that reproduces the set of measured parameters only up to some tolerance, to account for the presence of noise,
it is first necessary to define a method for quantitatively measuring the distance between the synthetic SD and the reconstructed SD. This can be done using the deviation defined as
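One standard choice for such a distance is a relative L2 deviation; the sketch below uses this form purely as an illustration and need not coincide with the paper's exact definition:

```python
import numpy as np

def deviation(n_true, n_rec, r):
    """Relative L2 deviation between true and reconstructed size distributions
    on the radius grid r (illustrative definition, trapezoidal integration)."""
    diff2 = (n_true - n_rec) ** 2
    num = np.sum(0.5 * (diff2[1:] + diff2[:-1]) * np.diff(r))
    den = np.sum(0.5 * (n_true[1:] ** 2 + n_true[:-1] ** 2) * np.diff(r))
    return float(np.sqrt(num / den))
```

With this normalization, a perfect reconstruction gives 0 and a null reconstruction gives 1.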

The second method of evaluating the accuracy of the solutions is based on the calculation of integral properties of the size distributions. Since our algorithm allows us to determine the size distribution expressed as

In addition, we also take into consideration the effective radius
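The effective radius is the ratio of the third to the second moment of the size distribution. A minimal numerical sketch (trapezoidal integration on a given radius grid; names are ours):

```python
import numpy as np

def effective_radius(n, r):
    """Effective radius r_eff = integral(r^3 n(r) dr) / integral(r^2 n(r) dr),
    computed by trapezoidal integration on the radius grid r."""
    m3 = np.sum(0.5 * ((r[1:] ** 3) * n[1:] + (r[:-1] ** 3) * n[:-1]) * np.diff(r))
    m2 = np.sum(0.5 * ((r[1:] ** 2) * n[1:] + (r[:-1] ** 2) * n[:-1]) * np.diff(r))
    return float(m3 / m2)
```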

We first show that the deviation measure is a good indicator of performance, in the sense that its value gives a quantitative evaluation of the distance between the true and estimated size distributions.
To this aim, we simulated in Simulation 1 a unimodal distribution with the parameters given in Table

True values of log-normal distributions used for testing purposes in Sect.

Discretization of the

Figure

Reconstructions of number size distribution that show increasingly large deviation from the true one; from

Having ascertained that the deviation is a good measure of the “closeness” of the reconstruction to the real distribution, it is obvious to assume that a reconstruction with a small value of deviation must correspond to a low value of the

The statistical nature of the Monte Carlo method implies an intrinsic instability, in the sense that repeating the calculation with the same set of input optical parameters, even in the absence of error, leads to different reconstructions. The dispersion of the reconstructions obtained with the same initial conditions is a measure of the stability of the method.

Another issue is the effect of noise on the optical input parameters. This random perturbation causes a further increase in the instability of the reconstruction, which may also prevail over the intrinsic instability of the method.

In order to obtain a quantitative evaluation of the instability of the method relative to the noise in the input parameters, we performed a statistical analysis of the discrepancy.
A bimodal distribution was simulated with the parameter values given in Table

The values of the extinction coefficients at the wavelengths of 355 and 532 nm and of the backscattering coefficients at the wavelengths of 355, 532 and 1064 nm were then determined with the Mie theory, considering homogeneous spherical particles. The values of the optical coefficients were then used as input data for the reconstruction. The reconstruction was repeated 30 times, each time perturbing the set of input parameters in order to simulate a 5 % error in each of them.
We then calculated the discrepancy of each of the 30 perturbed sets of optical parameters with respect to the unperturbed set (input discrepancy). The distribution of the input discrepancy is shown in Fig.
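The perturbation procedure can be sketched as follows; the relative Euclidean discrepancy used here is an illustrative choice, not necessarily the paper's exact definition:

```python
import numpy as np

rng = np.random.default_rng(1)

def input_discrepancy(y_true, n_rep=30, rel_err=0.05):
    """Perturb the optical parameters n_rep times with Gaussian noise of
    relative amplitude rel_err (5 % by default) and return the relative
    Euclidean discrepancy of each perturbed set w.r.t. the unperturbed one."""
    y_true = np.asarray(y_true, dtype=float)
    disc = []
    for _ in range(n_rep):
        y_pert = y_true * (1.0 + rel_err * rng.standard_normal(y_true.size))
        disc.append(np.linalg.norm(y_pert - y_true) / np.linalg.norm(y_true))
    return np.array(disc)
```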

The algorithm always provides output (predicted) optical parameters

It should be kept in mind that the input discrepancy represents the “distance” between each set of perturbed parameters and the theoretical set, while the output discrepancy represents the “distance” between the set of parameters corresponding to each reconstruction and the set of input parameters of the same reconstruction.
From Fig.

The uncertainty obviously takes into account both the instability of the method and the errors in the input parameters. In the tests conducted so far, the distribution of the optical parameters has been assumed Gaussian. However, given the simple logical structure of the algorithm, it is in principle possible to introduce arbitrary distributions to take into account, for example, the contribution of systematic errors in the input optical parameters (consider, for example, that the error in the backscattering coefficient at 1064 nm can be dominated by the uncertainty in the lidar ratio, whose value must be fixed a priori in a more or less arbitrary manner).

In the following we show the results of different tests for simulated SDs representing realistic cases. The reconstruction has been obtained by setting the number of iterations to 5000 and by running the algorithm 30 times, with noise equal to 5 %. For each run, we collect the best solution, and we provide uncertainty quantification, shown as a shaded area in the pictures below, using the standard deviation of the best solution across these runs.

Figure

Reconstructions obtained by running the inversion algorithm with a unimodal distribution, when data are generated by a unimodal distribution. Different panels from

True values of unimodal and bimodal log-normal distributions used for testing purposes in Sect.

The (a) and (b) cases simulate a unimodal SD centered on the coarse mode of a realistic bimodal distribution; the (c) and (d) cases simulate a unimodal SD which approximates a fine mode of a realistic bimodal distribution.

In Table

Performance metrics obtained by running the inversion algorithm with a unimodal distribution, when the data are generated by a synthetic unimodal distribution (see Fig.

Figure

Reconstructions obtained by running the inversion algorithm with a unimodal distribution, when data are generated by a bimodal distribution. Different panels from

Reconstructions obtained by running the inversion algorithm with a bimodal distribution, when data are generated by a unimodal distribution. Different panels from

Figures

Reconstructions obtained by running the inversion algorithm with a bimodal distribution, when data are generated by a bimodal distribution. Different panels from

Tables

Performance metrics obtained by running the inversion algorithm with a unimodal distribution, when data are generated by a synthetic bimodal distribution (see Fig.

Performance metrics obtained by running the inversion algorithm with a bimodal distribution, when data are generated by a synthetic unimodal distribution (see Fig.

Performance metrics obtained by running the inversion algorithm with a bimodal distribution, when the data are generated by a synthetic bimodal distribution (see Fig.

Constraints used in the analysis of the quasi-real data.

Details of the three experimental datasets used in Sect.

We finally validate our proposed method on three datasets that have been obtained from experimental data recorded by AERONET by applying the direct calculation of Mie functions to the size distribution reported in the AERONET database. For all three datasets, we attempt reconstruction with a unimodal and a bimodal distribution: constraints for the parameter values are reported in Table

All reconstructions were obtained with 5000 iterations, 20 repetitions and 5 % noise in the optical parameters.
In Table

The comparisons between the reconstructed and the simulated distributions, shown in Figs.

As shown in Fig.

The reconstruction of bimodal distributions, with realistic values of the parameters, by using unimodal distributions (see Fig.

Figures

Results obtained by applying our proposed method to the experimental dataset corresponding to the AERONET size distribution measured in Bucharest: in

Results obtained by applying our proposed method to the experimental dataset corresponding to the AERONET size distribution measured on Etna: in

Results obtained by applying our proposed method to the experimental dataset corresponding to the AERONET size distribution measured in Gozo: in

In the bimodal SD reconstruction (Fig.

The analysis of quasi-real data (Figs.

We observe that reconstructions obtained with bimodal distributions consistently provide better results than those obtained with unimodal distributions and that the error in assessing the PM concentrations remains at more-than-acceptable levels, particularly with bimodal distributions. This result is to be expected because all the datasets were indeed generated by bimodal distributions, including the one recorded on Etna (Fig.

Our analysis also highlights a few limitations of the proposed technique. First, the technique involves an inherent subjectivity in the choice of unimodal versus bimodal distributions; while, on average, the bimodal setting performs better, it can also produce some spurious modes such as those in Fig.

A second limitation concerns the subjectivity in the choice of the CRI, which was assumed to be known in the present study. Our analysis of quasi-real data showed how the quality of the retrieval may depend on the value of the CRI and partly deteriorates when the imaginary part grows, particularly for larger modes. This is a known issue with lidar data that can possibly be solved in a Bayesian framework by devising better priors. In addition, a full Bayesian model including the CRI among the unknowns can be devised; however, with an increased number of unknowns it will be necessary to exploit more prior information to reduce the degree of ill-posedness.

To conclude, we observe that the uncertainty quantification currently implemented seems to provide, at times, optimistic results, to the extent that the true distribution is not always included in the confidence bands.
We expect that these limitations can be overcome by using more complex but more powerful Monte Carlo sampling techniques, such as those described in

The preliminary results presented in this paper indicate that the proposed method can retrieve uni- and bimodal distributions from extinction coefficients measured at two wavelengths and backscattering coefficients measured at three wavelengths when the correct number of modes is selected. The reconstruction of three-modal distributions is more challenging, and more constraints might be necessary to obtain reliable and stable solutions. The extension of the method to three-modal distributions and variable refractive index, together with better uncertainty quantification and automatic model selection, will be the subject of future studies.

The code used in this article is not publicly accessible, as the research has been partially funded by a private company. However, the article contains sufficient information to enable reproduction of the method.

Part of the data used in this article are publicly available on the AERONET website (

ASo conceived and implemented the algorithm. VT and PC contributed to an efficient implementation of the algorithm. NS, ASa and AB performed the statistical analysis used to determine inversion constraints. NS ran the simulations and performed the analysis of experimental data. All authors contributed to writing the manuscript.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Alberto Sorrentino was partially supported by Gruppo Nazionale per il Calcolo Scientifico. This project was partially supported by the Beijing Research Institute on Telemetry (BRIT), Beijing, China.

This paper was edited by Daniel Perez-Ramirez and reviewed by three anonymous referees.