The aerosol scattering coefficient is an essential parameter for estimating aerosol direct radiative forcing and can be measured by nephelometers. However, nephelometer measurements contain small errors caused by the nonideal Lambertian light source and angular truncation, so the observed raw scattering coefficients need to be corrected. In this study, based on the random forest machine learning model and taking the Aurora 3000 as an example, we propose a new method to correct the scattering coefficient measurements of a three-wavelength nephelometer under different relative humidity conditions. The results show that the corrected values agree very well with Mie-calculated values at all three wavelengths and under all measured relative humidity conditions, with more than 85 % of the corrected values having errors of less than 2 %. The correction method yields scattering coefficients with high accuracy and requires no additional observational data.

Atmospheric aerosol particles directly impact the Earth's radiative balance by scattering or absorbing solar radiation. However, the uncertainty of aerosol direct radiative forcing varies greatly, ranging between

In order to correct the measurement errors of the nephelometer, Anderson and Ogren (1998) used a single parameter, the scattering correction factor (CF), to quantify the nonideal effects. The CF is defined as the ratio of the Mie-calculated scattering coefficient to that measured by the nephelometer and is closely related to aerosol size and chemical composition. Müller et al. (2011) summarized several methods that have been proposed to derive the CF. Initially, researchers simulated the nephelometer measurements with the Mie model: they replaced the ideal sine weighting function with the nephelometer's actual angular sensitivity function to derive the scattering coefficient under the nephelometer's light source conditions. The scattering coefficient under ideal Lambertian illumination is also obtained from the Mie model, which allows the CF to be calculated. However, this method additionally requires the particle number size distribution (PNSD), particle shape, and refractive index (Quirantes et al., 2008). Simultaneous PNSD data are often difficult to obtain because the measurement instruments are expensive and not easy to maintain.
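To make the CF definition concrete, the sketch below simulates a nephelometer by weighting an angular scattering function with a truncated angular sensitivity and compares the result with the ideal sin θ integral over 0 to 180°. It is only an illustration under assumptions: the Henyey-Greenstein phase function stands in for a Mie calculation, and the truncation limits (7° and 170°) and the optional `sensitivity` callable are illustrative choices, not Aurora 3000 specifications.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of y(x); written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def henyey_greenstein(theta, g):
    """Henyey-Greenstein phase function, normalized so 2*pi*int(P*sin(theta)) = 1.

    Used here only as a stand-in for a Mie-calculated phase function.
    """
    return (1.0 - g**2) / (4.0 * np.pi * (1.0 + g**2 - 2.0 * g * np.cos(theta)) ** 1.5)


def correction_factor(theta, phase, theta_min_deg=7.0, theta_max_deg=170.0, sensitivity=None):
    """CF = ideal scattering / simulated nephelometer scattering.

    theta: scattering angles in radians covering [0, pi].
    phase: angular scattering function on that grid (any constant factor cancels).
    sensitivity: optional callable giving the instrument's angular weighting;
        the default is the ideal sin(theta) weighting (Lambertian source), so only
        truncation outside [theta_min_deg, theta_max_deg] is simulated.
    """
    ideal = 2.0 * np.pi * trapezoid(phase * np.sin(theta), theta)
    weight = np.sin(theta) if sensitivity is None else sensitivity(theta)
    inside = (theta >= np.deg2rad(theta_min_deg)) & (theta <= np.deg2rad(theta_max_deg))
    measured = 2.0 * np.pi * trapezoid(phase * weight * inside, theta)
    return ideal / measured


theta = np.linspace(0.0, np.pi, 20001)
for g in (0.0, 0.4, 0.7):
    # Larger asymmetry parameter g means stronger forward scattering.
    print(g, round(correction_factor(theta, henyey_greenstein(theta, g)), 4))
```

As expected from the text, a more forward-peaked phase function (larger g, loosely corresponding to larger particles) loses more signal to truncation and therefore has a larger CF; a real correction would replace the stand-in phase function with Mie results for the measured PNSD.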

A popular alternative is to constrain the CF simply by the wavelength dependence of scattering, i.e., the scattering Ångström exponent (SAE). Considering that the SAE and CF both depend on particle size, Anderson and Ogren (1998) established a linear relationship between them for each wavelength of the TSI nephelometer. This method is convenient because the scattering at different wavelengths, and hence the SAE, is measured directly by the nephelometer itself. However, Bond et al. (2009) found that the SAE is also affected by the particle refractive index, whereas the CF is scarcely affected by it. This difference makes the regression method less accurate. Furthermore, the absorption properties of sampled particles can alter the wavelength dependence of scattering, causing errors in this correction method for absorbing aerosols (Bond et al., 2009). Therefore, a simple linear relationship between the CF and a single parameter, the SAE, does not provide an accurate correction.
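The regression approach described above can be sketched in a few lines: compute the SAE from two wavelengths, SAE = −ln(σ₁/σ₂)/ln(λ₁/λ₂), and fit CF = a + b·SAE by least squares. The arrays below are synthetic placeholders. The coefficients they produce are not those of Anderson and Ogren (1998), Müller et al. (2011), or this study.

```python
import numpy as np


def scattering_angstrom_exponent(sigma_1, sigma_2, wl_1, wl_2):
    """SAE from scattering coefficients sigma_1, sigma_2 at wavelengths wl_1, wl_2."""
    return -np.log(np.asarray(sigma_1) / np.asarray(sigma_2)) / np.log(wl_1 / wl_2)


def fit_cf_vs_sae(sae, cf):
    """Least-squares fit CF = a + b * SAE; returns (a, b)."""
    b, a = np.polyfit(sae, cf, deg=1)  # polyfit returns the highest power first
    return a, b


# Synthetic check: sigma proportional to wavelength**-1.5 gives SAE = 1.5.
sae_450_635 = scattering_angstrom_exponent(100.0 * 450.0**-1.5, 100.0 * 635.0**-1.5, 450.0, 635.0)

# Synthetic CF-SAE pairs with a made-up linear relationship plus noise.
rng = np.random.default_rng(1)
sae = rng.uniform(0.5, 2.0, 200)
cf = 1.25 - 0.10 * sae + rng.normal(0.0, 0.01, sae.size)
a, b = fit_cf_vs_sae(sae, cf)
residual_std = np.std(cf - (a + b * sae))
```

With real data, the residual spread around the fitted line is the part of the CF variation that the SAE alone cannot explain, which is the limitation discussed above.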

In this study, both measurement limitations, angular truncation and the nonideal Lambertian light source, are considered. To overcome the disadvantages of the methods mentioned above, we propose a new correction method for the scattering coefficient measurements of a three-wavelength nephelometer using a machine learning model, taking the Aurora 3000 as an example. The data and methodology under dry and other relative humidity conditions are described in Sect. 2. The verifications of the linear regression method and our new method are presented in Sect. 3. Finally, the conclusions are presented in Sect. 4.

Eight field observations (Table 1) were conducted at different time periods
in China, including two observations in Wuqing (39

Summary of the eight field observations used in this paper.

The number size distribution of the measured aerosol in

The SSA of field observations (1)–(8).

This paper proposes a simple and precise method of deriving the CF. Inspired by the linear relationship established between the SAE and CF (Anderson and Ogren, 1998; Müller et al., 2011), this paper first identifies additional parameters that affect the CF and can be obtained directly from nephelometer measurements. Because these parameters are interrelated and ordinary regression imposes requirements such as independent predictor variables, regression analysis is not an appropriate way to derive the relationship between the CF and these variables at each wavelength. Therefore, a random forest (RF) machine learning model from the scikit-learn library (Pedregosa et al., 2011), an effective method for classification and nonlinear regression (Breiman, 2001), is adopted. The RF model has several advantages (Zhao et al., 2018). First, it makes fewer assumptions about the dependency between observations and results than traditional regression models. Second, no strict relationship among variables needs to be specified before running the model. Third, it requires far fewer computing resources than deep learning. Finally, it has a lower risk of overfitting. Based on this machine learning model, our new method splits the above datasets into seven training datasets and one test dataset and then uses the Mie model and the training datasets to calculate the CF. The training dataset CFs, combined with parameters that affect the CF and can be obtained directly from nephelometer measurements, are used to train the machine learning model. The derived RF models are verified with the test dataset. If the verification results are credible, the RF models can be used directly in field measurements to predict the in situ CF and finally obtain the corrected scattering coefficient.
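A minimal sketch of the training and verification split described above, using scikit-learn's `RandomForestRegressor`, with seven campaigns for training and one held out for testing. The feature matrix, CF values, and campaign labels are synthetic stand-ins. `n_estimators=200` and `random_state=0` are arbitrary choices, not the hyperparameters used in this study, and one model per wavelength is assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def train_and_verify(X, cf, campaign, test_campaign, random_state=0):
    """Train on all campaigns except `test_campaign`, then predict the held-out one."""
    is_test = campaign == test_campaign
    model = RandomForestRegressor(n_estimators=200, random_state=random_state)
    model.fit(X[~is_test], cf[~is_test])
    return model, cf[is_test], model.predict(X[is_test])


# Synthetic stand-in data: 8 campaigns x 100 samples, 6 predictors (e.g., 3 HBFs + 3 SAEs)
# and a made-up nonlinear CF for a single wavelength.
rng = np.random.default_rng(42)
campaign = np.repeat(np.arange(1, 9), 100)
X = rng.uniform(0.0, 1.0, size=(campaign.size, 6))
cf = 1.0 + 0.05 * X[:, 3] + 0.03 * X[:, 0] * X[:, 4] + rng.normal(0.0, 0.002, campaign.size)

model, cf_true, cf_pred = train_and_verify(X, cf, campaign, test_campaign=8)
r = np.corrcoef(cf_true, cf_pred)[0, 1]  # agreement on the held-out campaign
```

Holding out a whole campaign, rather than random samples, tests whether the model generalizes to a new site and period, which is the situation the RF models face in field use.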

An important feature of Mie scattering is that larger particles scatter more strongly in the forward direction, so the ratio of the backscattering coefficient to the total scattering coefficient, i.e., the hemispheric backscattering fraction (HBF), becomes smaller. The HBF can therefore serve, to some extent, as a proxy for aerosol size, and this paper examines whether it can be used as a predictor of the CF. Because both the SAE and CF are related to particle size, this paper uses the datasets of field observations (1)–(7) to explore the relationship between the CF and the calculated SAE and HBF at different wavelengths (Fig. 3).

Scattering correction factors versus the scattering
Ångström exponent. Panels

Following the method of Anderson and Ogren (1998) and Müller et al. (2011), we established a linear regression equation between the CF and SAE (black dashed lines). The change in the CF is constrained by the change in the SAE to a certain extent, but the data points scatter widely around the regression line. Moreover, the larger the HBF, the steeper the slope of the CF versus the SAE. Therefore, in addition to the SAE, the HBF can provide extra information on particle size and thus help predict the CF.

Before establishing the relationship between the CF and the calculated SAE and HBF,
it is necessary to obtain the size range for which particles contribute significantly
to the variations of the SAE and HBF. This paper makes the assumption that there are three
independent types of particle composition: scattering particles, absorbing
particles, and core–shell mixed particles with a core radius of 35 nm. The
refractive index is

As shown in Fig. 4, for all three types of aerosols, scattering is mainly concentrated in the 100–1000 nm size range; particles larger than 1000 nm contribute little to the total scattering, so the SAE change of these large particles is not discussed further. For particles smaller than 1000 nm, the SAE generally decreases with increasing particle size, and the SAEs calculated for different wavelength pairs differ markedly. In particular, for particles larger than 300 nm the SAE varies strongly with particle diameter, whereas particles in the 100–300 nm range contribute little to SAE variations. Therefore, the SAE variability is mostly sensitive to the concentration of particles in the 300–1000 nm size range.

The SAE change of scattering particles

The HBF change of scattering particles

As shown in Fig. 5, for ambient aerosol particles, particles in the 100–1000 nm range also contribute most of the total backscattering. The HBF characteristics of particles larger than 1000 nm are therefore not discussed further. For particles smaller than 300 nm, all three types of aerosol particles show a noticeable HBF variation with particle size, whereas particles larger than 300 nm contribute little to HBF variations. In other words, the HBF variability is mostly sensitive to the concentration of particles in the 100–300 nm size range.

Based on the above analysis, the SAE and HBF carry information on different particle size ranges (300–1000 nm for the SAE and 100–300 nm for the HBF), so together they describe particle size over the 100–1000 nm range. Therefore, the SAE and HBF are suitable input parameters for the machine learning process.

In order to calculate accurate SAEs and HBFs, the scattering and backscattering
information must be accurate. Because these quantities are also affected by the
mass concentration of black carbon (BC) and the aerosol mixing state, both PNSD
and BC data are needed to run the Mie model. According to Ma et
al. (2012), when calculating the amount of externally mixed BC and
core–shell mixed BC,

In summary, our nephelometer correction method under dry conditions
encompasses the following procedure (Fig. 6):

Obtain information on particle number size distribution (PNSD), black carbon (BC), and mixing state (

Calculate the scattering and backscattering using the Mie model under the nephelometer light source conditions at the wavelengths of 450, 525, and 635 nm.

Calculate the hemispheric backscattering fractions (HBFs) at the three wavelengths.

Calculate the scattering Ångström exponents (SAEs) for the three wavelength combinations (

Calculate the scattering and backscattering using the Mie model under the ideal light source conditions at the wavelengths of 450, 525, and 635 nm.

Based on the results of the second and fifth steps, calculate the theoretical CF at the three wavelengths.

Use the six parameters (three HBFs and three SAEs) and the theoretical CF at each wavelength to train the machine learning model, which yields the RF predictor.

Verify the predictive validity of the trained model with the dataset of Gucheng.

Flow chart for estimating the CF under dry conditions by machine learning.
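The middle steps of the dry-condition procedure (Mie runs under both light sources, HBFs, SAEs, and theoretical CF) amount to turning Mie outputs into one row of predictors and one row of target CFs per sample. The sketch below shows that bookkeeping under stated assumptions. `mie` is a hypothetical callable standing in for a Mie code that already accounts for BC and mixing state, and it returns (total scattering, hemispheric backscattering). `toy_mie` is a toy placeholder, not Mie theory. The three SAE pairs are assumed to be 450/525, 450/635, and 525/635 nm.

```python
import numpy as np
from itertools import combinations

WAVELENGTHS_NM = (450.0, 525.0, 635.0)


def build_dry_training_rows(samples, mie, wavelengths=WAVELENGTHS_NM):
    """Return predictors (n, 6) = [3 HBFs, 3 SAEs] and targets (n, 3) = CF per wavelength.

    `mie(sample, wavelength_nm, light_source)` is a hypothetical stand-in for a Mie
    calculation; light_source is "nephelometer" or "ideal", and it must return
    (total scattering, hemispheric backscattering).
    """
    pairs = list(combinations(range(len(wavelengths)), 2))  # assumed wavelength pairs
    features, targets = [], []
    for sample in samples:
        neph = np.array([mie(sample, wl, "nephelometer") for wl in wavelengths])
        ideal = np.array([mie(sample, wl, "ideal") for wl in wavelengths])
        sp, bsp = neph[:, 0], neph[:, 1]
        hbf = bsp / sp  # hemispheric backscattering fraction per wavelength
        sae = [-np.log(sp[i] / sp[j]) / np.log(wavelengths[i] / wavelengths[j])
               for i, j in pairs]
        features.append(np.concatenate([hbf, sae]))
        targets.append(ideal[:, 0] / sp)  # CF = ideal / nephelometer scattering
    return np.array(features), np.array(targets)


def toy_mie(sample, wl, light_source):
    """Toy placeholder (not Mie theory): power-law scattering, fixed HBF, 5 % nephelometer deficit."""
    sp = sample["scale"] * (wl / 525.0) ** (-sample["alpha"])
    if light_source == "nephelometer":
        sp *= 0.95
    return sp, sample["hbf"] * sp


X, Y = build_dry_training_rows(
    [{"scale": 50.0, "alpha": 1.2, "hbf": 0.12}, {"scale": 80.0, "alpha": 0.6, "hbf": 0.15}],
    toy_mie,
)
```

Note that the predictors are computed from the nephelometer-condition scattering, because in field use they must come from the instrument's own measurements.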

Under elevated relative humidity conditions, a correction method that takes hygroscopicity into account is needed because, as relative humidity increases, the non-absorbing components of aerosol particles take up water and grow. As a result, the water content and particle size change, which alters the CF of the same aerosol population. Therefore, in addition to the SAE and HBF, parameters related to hygroscopicity should be considered when deriving the CF under elevated relative humidity conditions.
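To illustrate why particle size, and hence the CF, changes with RH, the sketch below uses the single-parameter κ-Köhler approximation for the diameter growth factor, gf = (1 + κ·RH/(1 − RH))^(1/3), with the Kelvin effect neglected. It also applies a volume-weighted mixing rule for the real refractive index of the grown particle. Both are common simplifications assumed here for illustration and are not necessarily the hygroscopicity treatment used in this study. The water refractive index of 1.33 and the dry index of 1.53 are assumed values.

```python
import numpy as np

N_WATER = 1.33  # assumed real refractive index of water at visible wavelengths


def kappa_growth_factor(rh, kappa):
    """Diameter growth factor D_wet / D_dry from kappa-Koehler theory, Kelvin term neglected.

    rh is a fraction in [0, 1); kappa is the hygroscopicity parameter.
    """
    rh = np.asarray(rh, dtype=float)
    if np.any((rh < 0.0) | (rh >= 1.0)):
        raise ValueError("rh must be a fraction in [0, 1)")
    return (1.0 + kappa * rh / (1.0 - rh)) ** (1.0 / 3.0)


def wet_refractive_index(n_dry, gf, n_water=N_WATER):
    """Volume-weighted real refractive index of a particle grown by factor gf."""
    gf3 = np.asarray(gf, dtype=float) ** 3
    return (n_dry + (gf3 - 1.0) * n_water) / gf3


rh = np.array([0.0, 0.5, 0.8, 0.9])
gf = kappa_growth_factor(rh, kappa=0.3)
n_wet = wet_refractive_index(1.53, gf)
d_wet_nm = 300.0 * gf  # a 300 nm dry particle after water uptake
```

Both higher RH and larger κ increase gf, which shifts particles toward sizes with stronger forward scattering. This is consistent with the CF behaviour reported later in the paper.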

The hygroscopicity or aerosol hygroscopic growth could be indicated by the
scattering hygroscopic growth curve

When it comes to the aerosol overall hygroscopicity, according to 24 size
distributions of

The comparison between

Therefore, this paper attempts to derive the CF under different RH conditions using
a machine learning approach similar to that for the dry state. First, we need
to identify the parameters that affect the CF under different RH conditions.
Aerosol size influences the CF, as discussed in Sect. 2.2.1, so the dry-state SAE and
HBF at the three wavelengths are needed. In addition,
hygroscopicity plays a major role.

With the PNSD information, refractive index of the dry aerosol, mixing state,
size distribution of

Flow chart for estimating the CF under different relative humidity conditions by machine learning.

To summarize, our nephelometer correction method under different
relative humidity conditions encompasses the following procedure (Fig. 8):

Obtain information on particle number size distribution (PNSD), black carbon (BC), mixing state (

Calculate the scattering and backscattering using the Mie model under nephelometer light source conditions at the wavelengths of 450, 525, and 635 nm in the dry state.

Calculate the hemispheric backscattering fractions (HBFs) at the three wavelengths under dry conditions.

Calculate the scattering Ångström exponents (SAEs) for the three wavelength combinations (

Under different relative humidity conditions and assumptions of aerosol hygroscopicity, according to the

Calculate

Calculate the fitting parameters of

Calculate the scattering and hemispheric backscattering after the hygroscopic growth under ideal light source conditions at three wavelengths.

Based on the results of the fifth and eighth steps, calculate the theoretical CF at the three wavelengths.

Use 13 parameters, including three HBFs and three SAEs, relative humidity RH, three

Verify the predictive validity of the trained model with the dataset of Gucheng.

The comparison of theoretical correction factor and predicted
correction factor calculated by the Ångström index at different
wavelengths; panels

To verify the methods introduced above, we used the Gucheng data and the derived RF predictor to predict the CF and compared the predictions with the theoretical Mie-calculated CF. First, for comparison, the Gucheng data are used to evaluate the simple linear parameterization of Anderson and Ogren (1998) and Müller et al. (2011).

The PNSD and BC data of Gucheng are used to establish linear fit relationships between the CF and the corresponding SAE at three different
wavelengths (450, 525, and 635 nm), which are, respectively, represented as:

To account for hygroscopic growth, this paper establishes separate linear
relationships between the CF and the SAE from the Gucheng data under different
relative humidity conditions. The data points
gradually become dispersed from the

The comparison of theoretical correction factor and predicted
correction factor calculated by the Ångström index at different
wavelengths; panels

Therefore, the ordinary linear regression method, which relates the CF to the SAE alone (Anderson and Ogren, 1998; Müller et al., 2011), cannot be applied reliably in most cases, especially at high relative humidity.

For our new method, the prediction performance at 450 and 525 nm is relatively good (Fig. 11): the
correlation coefficients between the predicted values and the theoretical
Mie calculations are 0.88 and 0.84, respectively. More than 90 % of the
points fall within the 2 % error range, and most of them are
concentrated near the
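The two verification statistics quoted for Fig. 11 (the correlation coefficient and the share of points within the 2 % error range) can be computed as in the sketch below. It assumes that "error" means the relative deviation |CF_pred − CF_Mie| / CF_Mie; the paper's exact definition of the error range may differ. The four-point arrays are made up for illustration.

```python
import numpy as np


def verification_stats(cf_pred, cf_mie, tolerance=0.02):
    """Pearson r between predicted and Mie CFs, and fraction within the relative tolerance."""
    cf_pred = np.asarray(cf_pred, dtype=float)
    cf_mie = np.asarray(cf_mie, dtype=float)
    r = np.corrcoef(cf_pred, cf_mie)[0, 1]
    relative_error = np.abs(cf_pred - cf_mie) / cf_mie  # assumed error definition
    return r, float(np.mean(relative_error <= tolerance))


r, frac_within = verification_stats([1.00, 1.10, 1.25, 1.30], [1.00, 1.10, 1.20, 1.30])
```

Here three of the four made-up predictions lie within 2 % of the reference values, so `frac_within` is 0.75.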

Under dry conditions, the comparison of the correction factors
calculated by our method and the theoretical Mie-calculation values at the
wavelengths of 450 nm

Using each PNSD from field observations (1)–(7) and averaging the results, Fig. 12 shows how the CF varies with relative humidity and aerosol population hygroscopicity at 450, 525, and 635 nm.

Under all relative humidity conditions, the CF at 450 nm is the largest, followed by that at 525 nm, with that at 635 nm the smallest (Fig. 12). The CFs at all three wavelengths increase with increasing relative humidity. Furthermore, at constant relative humidity, the CF also increases with aerosol hygroscopicity. This is expected, because higher ambient relative humidity and stronger aerosol hygroscopicity both increase particle size and thus the CF.

The theoretical calculation of the scattering correction factors (CFs) versus relative humidity (RH) and hygroscopicity

At different relative humidity conditions, the comparison of the
correction factors calculated by our method and the theoretical Mie-calculation
values at the wavelengths of 450 nm

Our correction method under different RH conditions takes humidity and
hygroscopicity into account. As depicted in Fig. 13, the new method predicts
the CF very well at all three wavelengths, and nearly all points at
the three wavelengths are centered near the

The aerosol scattering coefficient is an essential parameter for estimating aerosol direct radiative forcing and can be measured by nephelometers. However, nephelometer measurements are affected by a nonideal Lambertian light source and angular truncation, so the observed scattering coefficients need to be corrected. The scattering correction factor (CF), which depends on aerosol size and chemical composition, is used for this purpose. The most direct correction method combines the particle number size distribution, black carbon data, and a Mie scattering model, but it requires auxiliary measurements. As an alternative, the scattering Ångström exponent (SAE) measured by the nephelometer itself has been used to establish a linear relationship with the CF; our verification shows that this method lacks precision and accuracy. Therefore, this paper proposes a new self-correction method for nephelometers.

Under dry conditions, our analysis shows that the SAE and HBF represent different
ranges of aerosol particle size information (300–1000 nm for the SAE and 100–300 nm for the HBF). With the use of the existing observation results of PNSD, black carbon, and

Under other relative humidity conditions, the humidified nephelometer system
is used. In addition to the dry aerosol particle size information, the changes
in water content and particle size caused by hygroscopic growth must also be
considered. This paper finds that the CF increases
with increasing relative humidity and aerosol hygroscopicity.
Therefore, on the basis of

The strengths of our new method are as follows. Under dry conditions and at any other relative humidity, the CF is predicted very well at all three wavelengths. Furthermore, the CF is estimated with nearly the same accuracy at every relative humidity. All inputs can be obtained from the nephelometer's own observations, so the method achieves self-correction: the correction remains accurate without any other aerosol microphysical observations.

Owing to the limitations of Mie theory, which assumes spherical particles, our method cannot be applied to datasets that include desert and marine aerosols, and further studies are needed. In this study, the new method was developed only from datasets in the North China Plain, so applying our RF models to predict the CF in other regions may introduce errors. More field observation datasets are therefore needed to verify and improve this method, ideally leading to a database of RF models in the future.

The data and codes used in this study are available on request from the author (email: zcs@pku.edu.cn). They can also be obtained from

JQ, WT, GZ, YY, and CZ discussed the results; WT offered his help with the coding; and JQ wrote the manuscript.

The authors declare that they have no conflict of interest.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We acknowledge the support of the National Natural Science Foundation of China.

This research has been supported by the National Natural Science Foundation of China (grant no. 41590872).

This paper was edited by Paolo Laj and reviewed by three anonymous referees.