New correction method for the scattering coefficient measurements of a three-wavelength nephelometer

The aerosol scattering coefficient is an essential parameter for estimating aerosol direct radiative forcing and can be measured by nephelometers. Nephelometer measurements, however, are subject to errors caused by the non-ideal Lambertian light source and angular truncation. Hence, the observed raw scattering coefficient data need to be corrected. In this study, based on the random forest machine learning model and taking the Aurora 3000 as an example, we propose a new method to correct the scattering coefficient measurements of a three-wavelength nephelometer under different relative humidity conditions. The results show that the empirically corrected values match the Mie-calculated values very well at all three wavelengths and under all of the measured relative humidity conditions, with more than 85 % of the corrected values having less than 2 % error. The correction method yields scattering coefficients with high accuracy and requires no additional observation data.


Site description
Eight field observations (Table 1)  Wangdu, Zhangqiu, Gucheng) are located in suburban areas, representing the characteristics of regional anthropogenic aerosols on the North China Plain. The measurement in Beijing was conducted at Peking University (downtown Beijing), which is surrounded by two heavy-traffic roads, and hence it represents the typical case of urban pollution well. As shown in Fig. 1, the number size distributions of our datasets cover a wide range of 10-1000 nm, including most continental aerosol types.

Table 1. The summary of eight field observations used in this paper.

Correction under dry conditions
An important feature of Mie scattering is that the larger the particle, the stronger the forward scattering, meaning that the ratio of the backscattering coefficient to the total scattering coefficient, or hemispheric backscattering fraction (HBF), becomes smaller. Therefore, HBF can to some extent stand for aerosol size. Considering that both SAE and the correction factor relate to particle size, this paper uses the datasets of field observations (1)-(7) to explore the relationship between the scattering correction factor (hereinafter CF) and the calculated SAE and HBF at different wavelengths. CF can be constrained by SAE to a certain extent, but the two cannot be fitted with a simple linear regression equation because CF is also related to HBF (Fig. 2). On the whole, the larger the HBF, the greater the slope of CF with respect to SAE.
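As a rough illustration of the two size proxies discussed above, the sketch below computes HBF at the three wavelengths and SAE for the three wavelength pairs from total scattering and backscattering coefficients. The SAE formula used here is the standard log-ratio definition, which is an assumption because the text does not spell it out; the function names and the input values are hypothetical.

```python
from itertools import combinations
import math

# Wavelengths (nm) of the three-wavelength nephelometer discussed in the text.
WAVELENGTHS_NM = (450.0, 525.0, 635.0)


def hbf(total_sca, back_sca):
    """Hemispheric backscattering fraction: backscattering / total scattering."""
    return back_sca / total_sca


def sae(sca_a, sca_b, wl_a, wl_b):
    """Scattering Angstrom exponent between two wavelengths.

    Assumed standard definition: -ln(sca_a / sca_b) / ln(wl_a / wl_b).
    """
    return -math.log(sca_a / sca_b) / math.log(wl_a / wl_b)


def six_predictors(total_sca, back_sca):
    """Return [HBF450, HBF525, HBF635, SAE450+525, SAE450+635, SAE525+635]."""
    hbfs = [hbf(t, b) for t, b in zip(total_sca, back_sca)]
    saes = [
        sae(total_sca[i], total_sca[j], WAVELENGTHS_NM[i], WAVELENGTHS_NM[j])
        for i, j in combinations(range(3), 2)  # (0, 1), (0, 2), (1, 2)
    ]
    return hbfs + saes


# Made-up input: scattering follows a power law with exponent 1.5,
# and backscattering is 12 % of total scattering at every wavelength.
example_sca = [100.0 * (wl / 450.0) ** -1.5 for wl in WAVELENGTHS_NM]
example_bsca = [0.12 * s for s in example_sca]
predictors = six_predictors(example_sca, example_bsca)
```

With this synthetic input, every HBF equals 0.12 and every SAE recovers the power-law exponent of 1.5, which is a convenient sanity check before applying the functions to real nephelometer data.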
https://doi.org/10.5194/amt-2020-412 Preprint. Discussion started: 15 January 2021 © Author(s) 2021. CC BY 4.0 License.

Before establishing the relationship between CF and the calculated SAE and HBF, it is necessary to determine the size range represented by SAE and HBF. The paper assumes three types of particle composition: scattering particles, absorbing particles, and core-shell mixed particles with a core radius of 35 nm. Based on this assumption and the datasets mentioned above, the variation of SAE for the three wavelength combinations (450+525 nm, 450+635 nm, 525+635 nm) and of HBF at the three bands (450 nm, 525 nm and 635 nm) over the particle size range 100 nm-10 μm is calculated with the Mie model. Additionally, to identify the particle size range in which changes of SAE and HBF are clearly manifested in the overall optical properties of aerosols, the paper also calculates the ratio of size-resolved scattering and hemispheric backscattering to total scattering for the three types of assumed aerosols.
As shown in Fig. 3, for all three types of aerosols, scattering is mainly concentrated in the size range of 100-1000 nm, while particles larger than 1000 nm contribute little to the total scattering; hence the SAE change of these large particles is not discussed further. When particles are smaller than 1000 nm, SAE generally decreases with increasing particle size. In particular, when the particle is larger than 300 nm, SAE declines relatively rapidly with increasing particle size, and the SAE calculated for different band combinations differs clearly. Therefore, the SAE calculated for the three band combinations is approximately representative of particle sizes ranging from 300 to 1000 nm.

From Fig. 4, for ambient aerosol particles, the backscattering of particles in the 100-1000 nm range also contributes substantially to the total scattering, and the HBF characteristics of particles larger than 1000 nm are not discussed below. For particles smaller than 300 nm, all three types of aerosol particles show a noticeable decrease of HBF with increasing particle size. However, when the particle becomes larger than 300 nm, the HBF is almost unchanged. In other words, HBF can represent the size information of particles smaller than 300 nm.

Based on the above analysis, SAE and HBF represent different size information of aerosol particles and together provide particle size information over 100-1000 nm. Moreover, it should be noted that both SAE and HBF mentioned above are calculated from scattering and backscattering coefficients under the Aurora 3000 nephelometer light source condition.
To derive accurate scattering and backscattering information, which is also affected by the mass concentration of black carbon (BC) and its mixing state with other aerosols, not only the particle number size distribution (PNSD) but also BC data are needed to run the Mie model.
According to Ma et al. (2012), when considering the amounts of externally mixed BC and core-shell mixed BC, we use $R_{\mathrm{ext}}$ to represent the ratio of the mass concentration of the externally mixed BC ($M_{\mathrm{ext-BC}}$) to that of the total BC ($M_{\mathrm{BC}}$):

$$R_{\mathrm{ext}} = \frac{M_{\mathrm{ext-BC}}}{M_{\mathrm{BC}}} \qquad (1)$$

HBF is sensitive to $R_{\mathrm{ext}}$. Therefore, on the basis of the Mie model, we use PNSD, $M_{\mathrm{BC}}$ and an assumed $R_{\mathrm{ext}}$ value to calculate HBF. Next, the calculated HBF is compared with the HBF observed by the nephelometer. The assumed $R_{\mathrm{ext}}$ value that minimizes their difference is taken as the true value. Given the mass concentration of BC and the PNSD data, and assuming that the true $R_{\mathrm{ext}}$ is the same at each size and that the core radius does not differ among core-shell mixed particles of the same size, we can calculate the number size distributions of core-shell mixed BC and externally mixed BC. Furthermore, the refractive index can also be obtained, making it possible to derive more precise scattering and backscattering information and, from it, SAE and HBF. Details of this method of retrieving PNSD and refractive indices can be found in Ma et al. (2012).

Since it is difficult to describe the relationship between these six physical quantities (three SAE and three HBF) and CF with a linear regression equation, we adopt a random forest machine learning model from the scikit-learn machine learning library (Prettenhofer et al., 2011), an effective method that can be used for classification and nonlinear regression (Breiman, 2001). The random forest model has several advantages (Zhao et al., 2018). First, it involves fewer assumptions about the dependency between observations and results than traditional regression models. Second, there is no need for a strict relationship among variables before implementing the model simulation. Third, the model requires far fewer computing resources than deep learning. Finally, it has a lower risk of over-fitting.
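To make the regression setup concrete, here is a minimal sketch of training a scikit-learn random forest on six predictors (three SAE and three HBF) to predict CF at one wavelength. Everything numeric in it is an assumption for illustration: the training data are synthetic, the target function is made up so that the slope of CF against SAE depends on HBF, and the hyperparameters are not those of the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 2000

# Synthetic stand-ins for the six predictors: three SAE and three HBF values.
sae = rng.uniform(0.5, 2.5, size=(n_samples, 3))
hbf = rng.uniform(0.08, 0.20, size=(n_samples, 3))
X = np.hstack([sae, hbf])

# Made-up nonlinear target standing in for the Mie-calculated CF at 450 nm.
# The SAE slope depends on HBF, so a single linear fit in SAE is inadequate.
cf_450 = (
    1.30
    - 0.10 * sae[:, 0] * (1.0 + 3.0 * hbf[:, 0])
    + rng.normal(0.0, 0.002, n_samples)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, cf_450, test_size=0.25, random_state=0
)

# One regressor per wavelength would be trained in the same way.
rf_predictor = RandomForestRegressor(n_estimators=100, random_state=0)
rf_predictor.fit(X_train, y_train)

cf_predicted = rf_predictor.predict(X_test)
r_squared = rf_predictor.score(X_test, y_test)
```

In practice the rows of `X` and the target would come from Mie calculations on observed size distributions rather than from random numbers, and held-out data from a separate site would serve as the verification set.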
In brief, our correction method of nephelometers under the dry condition encompasses the following procedures (Fig. 5):
(1) Obtain information on particle number size distribution (PNSD), black carbon (BC), and mixing state ($R_{\mathrm{ext}}$) of field observation (1)-(7).
(2) Calculate the scattering and backscattering by Mie model under the conditions of the nephelometer light source at the wavelengths of 450 nm, 525 nm and 635 nm.
(3) Calculate the hemispheric backscattering fraction HBF at the three wavelengths.
(4) Calculate the scattering Ångström index SAE of the three band combinations (450+525 nm, 450+635 nm, 525+635 nm).
(6) Based on the results of the second and fifth steps, calculate theoretical CF at the three wavelengths.
(7) Use six parameters, including three HBF and three SAE, and theoretical CF of each wavelength to train the machine learning model, deriving the RF predictor.

Müller et al. (2011) established the linear fit relationships between CF and the corresponding SAE at different wavelengths.
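The theoretical CF in step (6) of the dry-condition procedure combines two Mie calculations. Assuming the conventional definition of a nephelometer correction factor, namely the ratio of the scattering coefficient under an ideal light source to that under the actual nephelometer light source (the symbols below are introduced here for illustration and do not appear in the text), it can be written as:

$$\mathrm{CF}(\lambda) = \frac{\sigma_{\mathrm{sca}}^{\mathrm{ideal}}(\lambda)}{\sigma_{\mathrm{sca}}^{\mathrm{neph}}(\lambda)}, \qquad \lambda \in \{450, 525, 635\}\ \mathrm{nm}$$

For example, with made-up values of 105 Mm$^{-1}$ for the ideal calculation and 100 Mm$^{-1}$ for the nephelometer-geometry calculation, CF would be 1.05, and a measured scattering coefficient would be multiplied by 1.05 to correct it.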

Correction under different RH conditions
Using the Gucheng data to verify the method of Müller et al. (2011), our paper finds that the predicted correction factor increasingly deviates from the theoretical Mie-calculated correction factor as the relative humidity increases, so that the statistical relationship between CF and SAE can no longer represent most cases. Especially under elevated relative humidity conditions, a correction method that takes hygroscopicity into account is needed because, as relative humidity increases, the non-absorbing component of the aerosol particle takes up water owing to its hygroscopicity and the particle grows. Accordingly, the water content and particle size may change, resulting in a change of CF for the same group of aerosol particles.
The hygroscopicity, or aerosol hygroscopic growth, can be indicated by the scattering hygroscopic growth curve f(RH) and the backscattering hygroscopic growth curve $f_b(\mathrm{RH})$. At low relative humidity, the growth due to aerosol water uptake is weak and thus the change of f(RH) and $f_b(\mathrm{RH})$ is small; as relative humidity goes up, the aerosol hygroscopic growth becomes pronounced and the particle size changes substantially. Correspondingly, the change of f(RH) and $f_b(\mathrm{RH})$ is large. Following Kuang et al. (2017) and Brock et al. (2016), the following formulas are used to describe f(RH) and $f_b(\mathrm{RH})$:

$$f(\mathrm{RH}) = 1 + \kappa_{\mathrm{sca}} \, \frac{\mathrm{RH}}{100 - \mathrm{RH}} \qquad (2)$$

$$f_b(\mathrm{RH}) = 1 + \kappa_{\mathrm{bsca}} \, \frac{\mathrm{RH}}{100 - \mathrm{RH}} \qquad (3)$$

where $\kappa_{\mathrm{sca}}$ and $\kappa_{\mathrm{bsca}}$ are fitting parameters representing the hygroscopic growth rate in aerosol scattering and backscattering.
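The fitting of the growth curves can be sketched as a one-parameter least-squares problem. The sketch assumes the single-parameter form f(RH) = 1 + κ_sca · RH / (100 − RH) from Kuang et al. (2017) and Brock et al. (2016), with RH in percent; the function names and the synthetic data are hypothetical, and the same routine would apply to the backscattering curve and κ_bsca.

```python
import numpy as np


def growth_curve(rh_percent, kappa):
    """Assumed single-parameter growth curve: 1 + kappa * RH / (100 - RH)."""
    rh = np.asarray(rh_percent, dtype=float)
    return 1.0 + kappa * rh / (100.0 - rh)


def fit_kappa(rh_percent, f_values):
    """Least-squares estimate of kappa.

    With x = RH / (100 - RH) and y = f - 1 the model is y = kappa * x,
    so the closed-form solution is sum(x * y) / sum(x * x).
    """
    rh = np.asarray(rh_percent, dtype=float)
    x = rh / (100.0 - rh)
    y = np.asarray(f_values, dtype=float) - 1.0
    return float(np.dot(x, y) / np.dot(x, x))


# Synthetic check: generate a curve with a known kappa and recover it.
rh_grid = np.arange(50.0, 91.0, 5.0)  # 50, 55, ..., 90 %
f_synthetic = growth_curve(rh_grid, 0.15)
kappa_sca = fit_kappa(rh_grid, f_synthetic)
```

Because the model is linear in κ, no iterative optimizer is needed; a nonlinear fitting routine would give the same answer for this functional form.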
According to the PNSD of field observations (1)-(7) and different assumed size distributions of κ, the theoretical Mie-calculated values are presented as scatter points in Fig. 6. The lines represent curves fitted with the above formulas under the condition of the nephelometer light source. As can be seen, for the three bands, Eq. (2) and Eq. (3) basically describe the trend of f(RH) and $f_b(\mathrm{RH})$. In other words, aerosol scattering and hemispheric backscattering hygroscopic growth can be represented by the parameters $\kappa_{\mathrm{sca}}$ and $\kappa_{\mathrm{bsca}}$. We therefore examine whether the hygroscopic growth of the scattering correction factor, C(RH), can be fitted by a formula of the same form with a parameter $\kappa_c$. The black scatter points in the figure do not lie close to the black dashed lines, and accordingly, the fit formula cannot accurately describe C(RH).

Therefore, this paper attempts to derive CF under different RH conditions with a machine learning approach similar to that described for the dry state. First of all, we need to identify the parameters that affect CF under different RH conditions. Aerosol size affects CF (see Sect. 2.2), and thereby SAE and HBF in the dry state at the three wavelengths are needed. Besides, hygroscopicity matters to a large extent; κ-Köhler theory (Petters and Kreidenweis, 2007) is thus applied, which uses the hygroscopicity parameter κ to describe the hygroscopic growth of aerosol particles under different relative humidity conditions:

$$S = \frac{D^3 - D_d^3}{D^3 - D_d^3 (1 - \kappa)} \exp\left( \frac{4 \sigma_{s/a} M_w}{R T \rho_w D} \right) \qquad (4)$$

where S is the saturation ratio; D is the diameter of the aerosol particle after hygroscopic growth; $D_d$ is the diameter of the aerosol particle in the dry state; $\sigma_{s/a}$ is the surface tension at the interface between the solution and air; T is the absolute temperature; $M_w$ is the molar mass of water; R is the universal gas constant; and $\rho_w$ is the density of water.
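As an illustration of how κ-Köhler theory translates relative humidity into particle growth, the sketch below solves the Petters and Kreidenweis (2007) equation for the wet diameter by bisection. The constants are assumed values for pure water near 298 K, the function names are hypothetical, and the solver is only meant for subsaturated conditions (0 < RH < 100 %) with κ > 0.

```python
import math

# Assumed constants for pure water near 298 K (not taken from the text).
SIGMA_SA = 0.072  # surface tension at the solution/air interface, N m-1
M_W = 0.018015    # molar mass of water, kg mol-1
R_GAS = 8.314     # universal gas constant, J mol-1 K-1
RHO_W = 997.0     # density of water, kg m-3


def saturation_ratio(d_wet, d_dry, kappa, temperature=298.15):
    """Saturation ratio S for a wet diameter (kappa-Koehler equation), SI units."""
    activity = (d_wet**3 - d_dry**3) / (d_wet**3 - d_dry**3 * (1.0 - kappa))
    kelvin = math.exp(
        4.0 * SIGMA_SA * M_W / (R_GAS * temperature * RHO_W * d_wet)
    )
    return activity * kelvin


def wet_diameter(d_dry, kappa, rh_percent, temperature=298.15):
    """Solve S(D) = RH / 100 for D by bisection on [d_dry, 20 * d_dry]."""
    if not 0.0 < rh_percent < 100.0 or kappa <= 0.0:
        raise ValueError("requires 0 < RH < 100 and kappa > 0")
    target = rh_percent / 100.0
    lo, hi = d_dry, 20.0 * d_dry  # S(lo) = 0 and S(hi) is close to 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if saturation_ratio(mid, d_dry, kappa, temperature) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative case: a 200 nm dry particle with kappa = 0.3 at 90 % RH.
d_dry_example = 200e-9
d_wet_example = wet_diameter(d_dry_example, 0.3, 90.0)
growth_factor = d_wet_example / d_dry_example
```

Applying `wet_diameter` to every size bin of a dry PNSD gives the humidified size distribution that a Mie code would need; the mixing of water into the refractive index is a separate step not shown here.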
With the information on PNSD, the refractive index of dry aerosol, mixing state, size distribution of κ, and the water refractive index of $1.33 - 10^{-7}i$ (Seinfeld and Pandis, 2006), on the basis of κ-Köhler theory (Eq. (4)), we can calculate the aerosol optical parameters at different RH, deriving f(RH) and $f_b(\mathrm{RH})$. Next, Eq. (2) and Eq. (3) are used to fit the curves of f(RH) and $f_b(\mathrm{RH})$ at each wavelength, deriving the fitting parameters $\kappa_{\mathrm{sca}}$ and $\kappa_{\mathrm{bsca}}$, which reflect the size-resolved hygroscopicity. Together with relative humidity itself, the estimation of the change of CF with relative humidity involves up to 13 physical quantities.
To summarize, our correction method of nephelometers under different relative humidity conditions encompasses the following procedures (Fig. 7):
(1) Obtain information on particle number size distribution (PNSD), black carbon (BC), mixing state ($R_{\mathrm{ext}}$), aerosol hygroscopicity parameter (κ), and relative humidity RH of field observation (1)-(7).
(9) Based on the results of the fifth and eighth steps, calculate theoretical CF at the three wavelengths.
(10) Use thirteen parameters, including three HBF and three SAE, relative humidity RH, three $\kappa_{\mathrm{sca}}$ and three $\kappa_{\mathrm{bsca}}$ at this RH, and theoretical CF of each wavelength to train the machine learning model, deriving the RF predictor.

Results and discussions
In order to verify the methods introduced above, on the basis of Gucheng data and the derived RF predictor, we have predicted CF and compared it with the theoretical Mie-calculated CF.

Under dry conditions
As can be seen from Fig. 8, for 450 nm and 525 nm, the prediction performance is relatively good: the correlation coefficients between the predicted values and the theoretical Mie-calculated values are 0.88 and 0.84, respectively, more than 90 % of the points fall within the error range of 2 %, and most of them are concentrated near the 1:1 line. For 635 nm, the result is slightly worse, with a correlation coefficient of 0.76 and 85.88 % of points in error by less than 2 %. In general, compared with the traditional correction method, our method does not need to consider whether or not the aerosol has strong or wavelength-dependent absorption, which improves the accuracy of the CF estimation in the dry state; in addition, the input parameters can be obtained from the nephelometer's own observations.
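The two verification statistics quoted above (correlation coefficient and share of points within 2 % error) can be computed as in the following sketch. It assumes that "error" means the relative deviation of the predicted CF from the Mie-calculated CF and that the correlation coefficient is Pearson's r; the function name and the sample values are made up.

```python
import numpy as np


def evaluate_cf(cf_predicted, cf_theoretical, tolerance=0.02):
    """Return (Pearson r, fraction of points with relative error < tolerance)."""
    pred = np.asarray(cf_predicted, dtype=float)
    theo = np.asarray(cf_theoretical, dtype=float)
    pearson_r = float(np.corrcoef(pred, theo)[0, 1])
    relative_error = np.abs(pred - theo) / theo
    fraction_within = float(np.mean(relative_error < tolerance))
    return pearson_r, fraction_within


# Made-up values: four of the five predictions are within 2 % of the reference.
cf_theo = np.array([1.00, 1.05, 1.10, 1.15, 1.20])
cf_pred = np.array([1.01, 1.04, 1.10, 1.19, 1.20])
pearson_r, fraction_within = evaluate_cf(cf_pred, cf_theo)
```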

Under different RH conditions
The paper uses each PNSD of the field observations (1)-(7) and averages them to plot Fig. 9, which shows how CF varies with relative humidity and aerosol population hygroscopicity at the three wavelengths of 450 nm, 525 nm and 635 nm. For the aerosol hygroscopicity, the paper takes as its basis the average of 24 size distributions of κ obtained from the Hachi field observation (Liu et al., 2014), for which the total volume-weighted κ is 0.281; next, to obtain a sequence of size distributions of κ, this basis κ is multiplied by factors from 0.05 to 2 at intervals of 0.01. Therefore, different colors in the figure indicate the overall hygroscopicity of different aerosols.

As shown in Fig. 9, under all relative humidity conditions, CF at 450 nm is the largest, followed by that at 525 nm, and that at 635 nm is the smallest. The CFs at all three bands increase with increasing relative humidity. Furthermore, at constant relative humidity, CF also increases with increasing aerosol hygroscopicity. Therefore, the results support the view that both the ambient relative humidity and the hygroscopicity of aerosols affect CF.
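The sequence of scaled hygroscopicities described above is small enough to enumerate. The sketch below builds the multipliers from 0.05 to 2 in steps of 0.01 and applies them to the volume-weighted κ of 0.281; it relies on the fact that multiplying a whole size distribution of κ by a constant scales its volume-weighted mean by the same constant. The variable names are hypothetical.

```python
import numpy as np

BASE_VOLUME_WEIGHTED_KAPPA = 0.281  # value quoted in the text

# Multipliers 0.05, 0.06, ..., 2.00 built from integers to avoid float drift.
scale_factors = np.round(np.arange(5, 201) * 0.01, 2)

# Volume-weighted kappa of each scaled size distribution.
scaled_kappa = scale_factors * BASE_VOLUME_WEIGHTED_KAPPA
```

This yields 196 hygroscopicity cases, with the volume-weighted κ ranging from about 0.014 to 0.562.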

Conclusions
The aerosol scattering coefficient is a significant parameter for estimating aerosol direct radiative forcing, and it can be measured by nephelometers. However, nephelometers suffer from a non-ideal Lambertian light source and angular truncation, and hence the observed scattering coefficient data need to be corrected. The scattering correction factor (CF), which relates to aerosol size and chemical composition, is thus put forward. The most direct calibration method is to use the particle number size distribution and the Mie scattering model to correct the scattering values at each wavelength under the actual nephelometer light source and the ideal light source condition. However, this method requires auxiliary particle number size distribution and black carbon observation data, which are expensive and difficult to acquire. Later, the scattering Ångström index (SAE) measured by the nephelometer itself was utilized to represent aerosol particle size information, and a relationship between SAE and CF was established. Our verification finds that this method lacks precision and accuracy, because SAE is affected by both particle size and refractive index, while the correction factor is scarcely affected by the refractive index. If relative humidity increases, the particle size and refractive index may also increase, accompanied by a change of aerosol water content. That is to say, the relationship between SAE and CF is complicated. Moreover, the absorption properties of sampled particles also alter the wavelength dependence of scattering, contributing to errors of this correction method for absorbing particles. Therefore, the single parameter SAE cannot predict CF very well, and our paper proposes a new method of nephelometer self-correction.

Under dry conditions, our analysis shows that SAE and HBF represent different ranges of aerosol particle size information. Using the existing observations of PNSD, black carbon and $R_{\mathrm{ext}}$ to obtain SAE and HBF, the paper applies the random forest machine learning model to determine the relationship between CF and the calculated SAE and HBF, deriving the RF predictor. Verification with the Gucheng dataset shows that this method is relatively accurate. The commonly used integrating nephelometer provides in situ scattering and backscattering coefficients at three wavelengths, from which three SAE and three HBF can be calculated. Therefore, with the derived RF predictor and the SAE and HBF calculated from the nephelometer, CF can be predicted by the nephelometer itself.
Under other relative humidity conditions, in addition to the dry aerosol particle size information, we should also consider the change in particle size and water content caused by aerosol water uptake. This paper finds that CF increases with increasing relative humidity and aerosol hygroscopicity. Therefore, on the basis of κ-Köhler theory, the existing observations of PNSD, black carbon, $R_{\mathrm{ext}}$, the aerosol hygroscopicity parameter κ, and relative humidity are used to run the Mie model, obtaining the theoretical CF and 13 quantities related to the change of CF under different RH conditions. Similarly, the random forest machine learning model is adopted to determine the relationship between CF and the 13 quantities, deriving the RF predictor. Verification with the Gucheng dataset shows that the accuracy of CF obtained by this method is very high. The humidified nephelometer system can observe scattering and hemispheric backscattering coefficients at three wavelengths under both dry and elevated RH conditions, obtaining the corresponding f(RH) and $f_b(\mathrm{RH})$ under the nephelometer light source condition. As a result, all 13 quantities, including six physical quantities of SAE and HBF representing dry aerosol size at each wavelength, six fitting parameters $\kappa_{\mathrm{sca}}$ and $\kappa_{\mathrm{bsca}}$ representing particle size-resolved hygroscopicity at each wavelength, and the relative humidity, can be obtained directly from nephelometers. Therefore, with the derived RF predictor and the above 13 quantities, CF can be predicted in situ by the nephelometer itself.

The strengths of our new method are summed up as follows. Under either dry or any other relative humidity conditions, the prediction performance of CF at three wavelengths is excellent. Furthermore, at each relative humidity, the accuracy of CF estimation is almost the same. All inputs can be obtained through the nephelometer's observation, achieving self-correction; that is, while ensuring the accuracy of the correction, there is no need for other aerosol microphysical observations.
As for the weaknesses, the new method in this study is based on datasets of continental aerosols. Moreover, due to limitations of the Mie theory, our method cannot be applied to datasets that include desert and marine aerosols, and hence further studies are needed. Errors may arise when a limited set of RF predictors is applied to predict CF all over the world. Therefore, more field observation datasets are needed to verify and improve this method, with the hope of establishing a database of RF predictors in the future.

Competing interests. The authors declare that they have no conflict of interest.