Method to retrieve cloud condensation nuclei number concentrations using lidar measurements

Determination of cloud condensation nuclei (CCN) number concentrations at cloud base is important to constrain aerosol-cloud interactions. A new method to retrieve CCN number concentrations using backscatter and extinction profiles from multiwavelength Raman lidars is proposed. The method uses the hygroscopic enhancement of backscatter and extinction with relative humidity (RH) to derive dry backscatter and extinction and humidogram parameters. Humidogram parameters, Ångström exponents, and lidar extinction-to-backscatter ratios are then linked to the ratio of CCN number concentration to dry backscatter and extinction coefficient (AR_ξ). This linkage is established based on datasets simulated by Mie theory and κ-Köhler theory with in situ measured particle size distributions and chemical compositions. CCN number concentration can thus be calculated with AR_ξ and dry backscatter and extinction. An independent theoretically simulated dataset is used to validate the new method, and the results show that the retrieved CCN number concentrations at supersaturations of 0.07 %, 0.10 %, and 0.20 % are in good agreement with theoretically calculated values. Sensitivity tests indicate that retrieval errors in CCN arise mostly from uncertainties in extinction coefficients and RH profiles. The proposed method improves CCN retrieval from lidar measurements and has great potential for deriving scarce long-term CCN data at cloud base, which benefits aerosol-cloud interaction studies.


Introduction
Anthropogenic activities have caused an increase in atmospheric aerosols, and some of the aerosol particles affect the climate by serving as cloud condensation nuclei (CCN). CCN in clouds can modify cloud-forming processes and cloud microphysical properties (Rosenfeld et al., 2014). Although numerous impacts of aerosol-cloud interactions on radiative forcing (McCoy et al., 2017; Zhou et al., 2017), precipitation (Xu et al., 2017; Fan et al., 2018), cloud electrification (Wang et al., 2018), and severe weather or hazards (Fu et al., 2017) have been discovered, constraining the relationships between aerosols and clouds is still a big challenge (Seinfeld et al., 2016). This lack of knowledge of aerosol-cloud interactions limits our ability to estimate climate forcing caused by aerosols (Boucher et al., 2013).
The aerosol CCN supersaturation activation spectrum is one of the most critical parameters to quantify aerosol-cloud interactions (Schmale et al., 2018). Although CCN number concentrations near the ground have been measured extensively worldwide (Tao et al., 2018a), ground-measured CCN may not represent the CCN at cloud base that alter clouds directly. Obtaining CCN near cloud base therefore becomes a crucial issue. Cloud base CCN can be measured in situ on aircraft platforms, but airborne measurements are limited by huge costs and discontinuity. Satellites have difficulty observing CCN at cloud base because clouds can obscure aerosol signals beneath them. Rosenfeld et al. (2016) have proposed an alternative approach for satellites to retrieve CCN concentrations using clouds as CCN chambers; however, employing CCN concentrations derived with this strategy limits our exploration of the relationship between CCN concentrations and cloud droplet concentrations in the natural environment. So far, CCN concentration data at cloud base remain scarce for aerosol-cloud interaction studies.
W. Tan et al.: Lidar retrieval of cloud condensation nuclei number concentrations. Published by Copernicus Publications on behalf of the European Geosciences Union.
Ground-based lidars can continuously provide optical properties of aerosol particles from the ground up to cloud base (Mattis et al., 2016; Li et al., 2019), suggesting great potential for deriving CCN concentrations near cloud base. Ghan and Collins (2004) propose a simple method to infer CCN profiles by combining surface in situ CCN and aerosol optical measurements. The method is only applicable when the boundary layer is well mixed from the surface to cloud base (Ghan et al., 2006). Mamouri and Ansmann (2016) investigate the potential of single-wavelength polarization lidar to retrieve CCN for three aerosol types (desert, nondesert continental, and marine). The polarization lidar can separate desert and nondesert particles by means of the particle linear depolarization ratio. Based on datasets from multiyear AErosol RObotic NETwork (AERONET) observations, valid relationships are found between particle extinction coefficients and number concentrations of particles with dry radius larger than 50 nm (for nondesert and marine) and 100 nm (for desert). CCN concentrations at different supersaturations are parameterized with the particle number concentration derived from extinction profiles according to aerosol types. The treatment of the hygroscopicity of ambient particles is empirical. In addition, a single-wavelength lidar lacks sufficient information to quantify particle number concentration, which introduces large uncertainty into the CCN retrieval.
Multiwavelength Raman lidars (MWRLs) have been increasingly used to detect aerosol vertical distributions in recent years. The principle of MWRLs allows independent retrieval of particle backscatter (β) and extinction coefficients (α), which provides more information about particle microphysical properties (Müller et al., 2016). The 3β + 2α MWRL systems (backscatter coefficients at 355, 532, and 1064 nm and extinction coefficients at 355 and 532 nm) have been widely recommended to derive particle microphysical properties (Burton et al., 2016). Existing approaches to retrieve CCN using MWRLs are based on microphysical inversion techniques. Lv et al. (2018) build a lookup table based on AERONET datasets to retrieve particle number size distributions from backscatter and extinction profiles. Assumed critical activation diameters, chosen according to aerosol type classification, are then combined with the retrieved optically equivalent particle size distributions to calculate CCN concentrations. It is worth noting that most of the foregoing methods rely on crude particle type classification to determine particle hygroscopicity.
There are three major challenges in CCN concentration retrieval with lidars. The first is the conversion of lidar-derived optical properties into particle number concentrations. High uncertainties in retrieved particle number concentrations could be an important source of CCN retrieval error. The second is the determination of particle hygroscopicity in order to evaluate the ability of particles to act as CCN. Particle hygroscopicity, which is highly related to chemical composition and the aging/coating effect, is found to cause nonnegligible variations in cloud droplet activation (Hudson, 2007; Zhang et al., 2017). The last is the influence of high relative humidity (RH) near clouds. Aerosol particles are likely to be humidified in the ambient environment, and the consequent changes in optical properties make CCN retrieval more challenging. Most studies of CCN retrieval with MWRLs focus on deriving particle number concentrations and seldom address the issue of hygroscopicity.
In recent years, several aerosol hygroscopic studies based on lidar measurements have been carried out (Fernández et al., 2018; Lv et al., 2017; Bedoya-Velásquez et al., 2018). Backscatter and extinction enhancement factors can be derived with lidar measurements and RH profiles. The enhancement factor, which is associated with both particle size and hygroscopicity (Kuang et al., 2017), is defined as

$$f_\xi(\mathrm{RH}, \lambda) = \frac{\xi(\mathrm{RH}, \lambda)}{\xi(\mathrm{RH}_\mathrm{ref}, \lambda)}, \tag{1}$$

where f_ξ is the enhancement factor of the optical property ξ (backscatter or extinction) at a specific light wavelength λ and RH, and RH_ref is the reference RH value. Many studies show that lidar-derived enhancement factors are in good agreement with in situ measurements (Wulfmeyer and Feingold, 2000; Pahlow et al., 2006; Fernández et al., 2015; Rosati et al., 2016). Feingold and Morley (2003) demonstrate that the extent of backscatter and extinction enhancements hints at the ability of particles to serve as CCN. Tao et al. (2018b) use in-situ-measured light-scattering enhancement factors to predict N_CCN at 0.07 % supersaturation, and the result shows strong consistency with the CCN counter.
In this paper, a new method to retrieve CCN number concentrations for 3β + 2α MWRL systems is proposed. Different from the foregoing approaches, which use AERONET datasets, we use in-situ-measured microphysical and chemical data in this study. Theoretical simulations based on in situ measurements are carried out to seek the relationship between CCN number concentrations and lidar-derived optical properties. The simulation implements κ-Köhler theory (Petters and Kreidenweis, 2007) to describe particle hygroscopic growth and activation processes. Mie theory (Bohren and Huffman, 2007) is utilized to calculate particle backscatter and extinction coefficients from in-situ-measured aerosol microphysical and chemical properties. The enhancements of backscatter and extinction with RH are introduced to quantify particle hygroscopicity instead of using empirical estimation according to aerosol type classification. The new method is applicable to well-mixed aerosol layers. We take datasets in the North China Plain (NCP) as an example of this method. The NCP is influenced by heavy and complex pollution, which shows strong characteristics of continental aerosols. Mineral dust and marine particles are not considered in this study.
The paper is structured as follows. The field campaigns and in situ measurements are introduced in Sect. 2.1. Section 2.2 briefly introduces the simulations to calculate CCN number concentrations, backscatter, and extinction coefficients from in-situ-measured microphysical and chemical data. The new CCN retrieval method for MWRLs is described in detail in Sect. 3.1. Sensitivity of the method to the systematic and random errors of backscatter, extinction, and RH is tested in Sect. 3.2. Results and discussions are given in Sect. 4. Section 5 summarizes the paper.

Data
Since it is not easy to accumulate large datasets of simultaneous lidar and aircraft measurements, ground-measured aerosol microphysical and chemical data are used to simulate lidar-derived backscatter and extinction coefficients and corresponding CCN number concentrations. The simulations are based on κ-Köhler theory and Mie theory. The required datasets include particle number size distribution (PNSD), black carbon (BC) mass concentrations (m_BC), mixing state of BC-containing particles, and size-resolved hygroscopicity. The simulation results are used to establish and validate the new retrieval method.

Datasets of aerosol microphysical and chemical properties
In-situ-measured aerosol properties were collected from five field campaigns at three different measurement sites in the NCP. The measurement sites are located at Wuqing (39 • 08 E, 51 m a.s.l.) in Hebei Province. The specific locations, topographical information, and pollution status of these measurement sites are shown in Fig. S1 in the Supplement. These three sites all lie inside the polluted NCP region and are highly representative of the polluted background (Xu et al., 2011; Bian et al., 2018; Sun et al., 2018). Time periods, measured parameters, and corresponding instruments of the individual campaigns are listed in Table 1. During these field campaigns, except for the measurement of size-resolved chemical compositions, ambient particles were drawn in through a PM10 inlet (16.67 L min^-1), passed through a silica gel diffusion drier, and then split among the different instruments. All instruments were operated at RH less than 30 %.
The particle number size distributions (PNSDs) were measured with the combination of a twin differential mobility particle sizer (TDMPS, IfT, Leipzig, Germany) or a scanning mobility particle size spectrometer (SMPS) and an aerodynamic particle sizer (APS, TSI, Inc., Shoreview, MN USA, model 3320 or model 3321). The statistical information about the measured PNSDs is shown in Fig. 1a. The peaks of the PNSDs are at about 100 nm (diameter in log scale), which shows strong characteristics of continental aerosols.
The black carbon (BC) mass concentrations (m_BC) were measured by a multi-angle absorption photometer (MAAP, Thermo, Inc., Waltham, MA USA, model 5012). As for the mixing state of BC, BC and other non-absorbing components were found to be both externally mixed and core-shell mixed during the campaigns (Ma et al., 2012). The mass fraction of externally mixed BC (r_ext) is defined to quantify the mixing state of BC:

$$r_\mathrm{ext} = \frac{m_\mathrm{ext\_BC}}{m_\mathrm{BC}}, \tag{2}$$

where m_ext_BC is the mass concentration of externally mixed BC. According to Ma et al. (2012), r_ext can be retrieved from hemispheric backscattering fractions (HBFs) measured by an integrating nephelometer (TSI, Inc., Shoreview, MN USA, model 3563).
Size-resolved chemical compositions all come from campaign C2. The size-resolved aerosol sampling was carried out with a 10-stage Berner low-pressure impactor (BLPI). Chemical species including inorganic ions ($\mathrm{NH_4^+}$, $\mathrm{Na^+}$, $\mathrm{K^+}$, $\mathrm{Mg^{2+}}$, $\mathrm{Ca^{2+}}$, $\mathrm{NO_3^-}$, $\mathrm{SO_4^{2-}}$, $\mathrm{Cl^-}$), elemental carbon, organic carbon, water-soluble organic carbon, and some other species such as dicarboxylic acids were analyzed from sample substrates. After transforming the ambient wet aerodynamic diameters into dry volume-equivalent diameters, size-resolved κ distributions were derived from measured size-resolved chemical compositions. The chemical compositions are found to be size dependent during campaign C2, especially the mass fraction of organic matter (Liu et al., 2014). A total of 25 typical size-resolved κ distributions in the NCP are given in Fig. 1b. The measured size-resolved κ distributions vary considerably and cover a wide range of aerosol hygroscopicity (Kuang et al., 2018). More details about the measurements can be found in Liu et al. (2014).

Datasets of CCN number concentrations and lidar-derived optical properties
In-situ-measured aerosol properties mentioned above are utilized to calculate CCN number concentrations and particle backscatter and extinction coefficients based on κ-Köhler theory and Mie theory. For each simultaneously measured PNSD, m_BC, and r_ext (16 183 sets of data), simulations are carried out with every one of the 25 size-resolved κ distributions.
CCN number concentrations can be calculated with PNSD and size-resolved κ distributions based on the κ-Köhler equation. Petters and Kreidenweis (2007) relate particle diameter to RH with a single hygroscopic parameter κ:

$$\frac{\mathrm{RH}}{100} = \frac{D^3 - D_\mathrm{dry}^3}{D^3 - D_\mathrm{dry}^3 (1 - \kappa)} \exp\left(\frac{4 \sigma_{s/a} M_w}{R T \rho_w D}\right), \tag{3}$$

where D is the particle diameter at the given RH, D_dry is particle dry diameter, σ_s/a is the surface tension of the solution-air interface, M_w is the molecular weight of water, R is the universal gas constant, T is temperature, and ρ_w is the density of water. For a specific supersaturation, the critical activation diameter can be derived with the κ-Köhler equation using size-resolved κ distributions. CCN number concentrations can thereby be calculated by integrating number concentrations of particles larger than the critical diameter. CCN number concentrations at the supersaturations of 0.07 %, 0.10 %, 0.20 %, 0.40 %, and 0.80 % are accordingly simulated. The selected supersaturations are widely used in CCN measurements.
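The activation step can be illustrated with a short sketch. It is not the simulation code of this study: it assumes a single bulk κ instead of a size-resolved κ distribution, uses the approximate analytical form of the κ-Köhler critical supersaturation from Petters and Kreidenweis (2007) (reasonable for κ above roughly 0.2), and integrates a coarse, hypothetical binned PNSD. The function names, the surface tension of pure water, and all numerical inputs are assumptions made for illustration.

```python
import math

def critical_dry_diameter(ss_percent, kappa, temperature=298.15, sigma=0.072):
    """Critical dry diameter (m) from the approximate kappa-Koehler relation
    ln(S_c)^2 = 4 A^3 / (27 D_dry^3 kappa), with A = 4 sigma M_w / (R T rho_w)."""
    m_w, r_gas, rho_w = 0.018015, 8.314, 997.0  # kg/mol, J/(mol K), kg/m^3
    a = 4.0 * sigma * m_w / (r_gas * temperature * rho_w)
    ln_s = math.log(1.0 + ss_percent / 100.0)
    return (4.0 * a ** 3 / (27.0 * kappa * ln_s ** 2)) ** (1.0 / 3.0)

def ccn_number(bin_diameters, bin_counts, d_crit):
    """Sum the number concentrations of all bins with dry diameter >= d_crit."""
    return sum(n for d, n in zip(bin_diameters, bin_counts) if d >= d_crit)

# Hypothetical binned PNSD: dry diameters (m) and number concentrations (cm^-3).
diameters = [50e-9, 100e-9, 200e-9, 400e-9]
counts = [3000.0, 2000.0, 800.0, 100.0]

d_c = critical_dry_diameter(0.20, kappa=0.3)  # assumed bulk kappa
n_ccn = ccn_number(diameters, counts, d_c)
print(f"critical diameter: {d_c * 1e9:.0f} nm, N_CCN: {n_ccn:.0f} cm^-3")
```

With a size-resolved κ distribution, the critical diameter would instead have to be found as the size at which the local κ and the diameter jointly satisfy the activation condition.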
Particle backscatter and extinction can be calculated with PNSD, m_BC, and r_ext using Mie models. Mie theory can solve light-scattering problems of homogeneous and coated spherical particles. Without the consideration of mineral dust, using the Mie model is quite reasonable because particles are likely to be spherical near clouds, where the RH could be relatively high. When simulating particle backscatter and extinction coefficients, PNSD, m_BC, r_ext, and the complex refractive index are needed. PNSD at different RHs can be calculated with the κ-Köhler equation as well. The refractive indices of BC, the non-absorbing component, and pure water are set to be $1.8 + 0.54i$ (Ma et al., 2012), $1.53 + 10^{-7}i$ (Wex et al., 2002), and $1.33 + 10^{-7}i$, respectively. Backscatter coefficients (355, 532, and 1064 nm) and extinction coefficients (355 and 532 nm) at dry conditions and at RH from 60 % to 90 % are simulated with an interval of 1 %.
The simulations are introduced in detail in Sect. S3 in the Supplement. The new method and all the analyses in this paper are based on the Mie-model-simulated datasets, and all the simulations mentioned above are implemented.

Methodology
3.1 Method to retrieve CCN number concentrations using MWRL

Overview
An optically related CCN activation ratio, AR_ξ, is introduced to bridge the gap between CCN and lidar-derived optical properties. AR_ξ is the ratio between CCN number concentration and the backscatter or extinction coefficient at dry conditions, which can be expressed as

$$\mathrm{AR}_\xi = \frac{N_\mathrm{CCN}}{\xi_\mathrm{dry}} = \frac{N_\mathrm{CCN}}{N_\mathrm{aerosol}} \cdot \frac{N_\mathrm{aerosol}}{\xi_\mathrm{dry}}, \tag{4}$$

where N_CCN is the CCN number concentration, and N_aerosol is the total number concentration of aerosol particles. AR_ξ can be divided into two parts: one is the ratio of CCN to the total particles, which is the original definition of the CCN activation ratio; the other is the ratio of total number concentration to backscatter or extinction at dry conditions. The bulk CCN activation ratio is related to particle size distribution and hygroscopicity, and the relationship between particle number concentration and optical properties is mainly controlled by size distribution. Therefore, AR_ξ could be quantified with size and hygroscopicity information. The key point of our method is to seek parameters that can indicate the size and hygroscopicity of particles from lidar measurements and use these parameters to estimate AR_ξ. In addition, deriving backscatter and extinction coefficients at dry conditions is also important.
A schematic diagram of the method to retrieve CCN number concentration is shown in Fig. 2.
Firstly, the enhancement of backscatter and extinction coefficients with RH (also called the humidogram) is derived from lidar measurements and additional ancillary data (i.e. pressure, temperature, and RH profiles). A humidogram parameter, which indicates particle hygroscopicity, can be fitted from the humidogram with a parameterization equation. Particle dry backscatter and extinction can also be inferred from the humidograms. This step is applied to all the 3β + 2α parameters. The approaches to select appropriate hygroscopic layers and fit humidogram parameters, dry backscatter, and dry extinction are described in Sect. 3.1.2.
Then, the Ångström exponent (å) and the lidar extinction-to-backscatter ratio (lidar ratio, s_a) are calculated from the inferred dry backscatter and extinction coefficients. The extinction-related Ångström exponent (å_α) is the most commonly used parameter to reveal information about the predominant size of aerosols. Generally speaking, a smaller å_α indicates a larger share of large particles. Similarly, the backscatter-related Ångström exponent (å_β) is often employed in lidar analysis (Fernández et al., 2015), and particle backscatter coefficients of different wavelengths have also been proven to have a valid Ångström exponent relationship (Komppula et al., 2012). The Ångström exponent of dry backscatter and extinction coefficients (å_ξ) between two wavelengths can be derived using Eq. (5):

$$\mathring{a}_\xi(\lambda_1, \lambda_2) = -\frac{\ln\left(\xi_{\mathrm{dry},\lambda_1} / \xi_{\mathrm{dry},\lambda_2}\right)}{\ln\left(\lambda_1 / \lambda_2\right)}, \tag{5}$$

where the subscripts 1 and 2 represent different wavelengths.
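As a minimal illustration of the Ångström exponent between two wavelengths, the sketch below uses the standard definition å = −ln(ξ₁/ξ₂)/ln(λ₁/λ₂). The function name and the input values are hypothetical and are not taken from the datasets of this study.

```python
import math

def angstrom_exponent(xi_1, xi_2, wavelength_1, wavelength_2):
    """Angstrom exponent of an optical coefficient between two wavelengths."""
    return -math.log(xi_1 / xi_2) / math.log(wavelength_1 / wavelength_2)

# Hypothetical dry extinction coefficients (Mm^-1) at 355 and 532 nm.
alpha_355 = 200.0
alpha_532 = 120.0
a_alpha = angstrom_exponent(alpha_355, alpha_532, 355.0, 532.0)
print(f"extinction-related Angstrom exponent (355/532 nm): {a_alpha:.2f}")
```

The same function applies to backscatter coefficients, for example between 532 and 1064 nm.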
Another widely used parameter to express aerosol characteristics in lidar studies is the particle lidar extinction-to-backscatter ratio (lidar ratio, s_a), which is defined as the ratio of extinction coefficient to backscatter coefficient at a specific light wavelength:

$$s_a = \frac{\alpha}{\beta} = \frac{4\pi}{\omega P(\pi)}. \tag{6}$$

As is shown in Eq. (6), the lidar ratio is determined by the scattering phase function at 180°, P(π), and the single-scattering albedo ω. P(π) is mainly influenced by particle size, and ω indicates the content and mixing state of light-absorbing components. The lidar ratio is often utilized in aerosol type classification and is proven to be very sensitive to particle size (Zhao et al., 2017). The lidar ratio can provide information on particle type and also serve as a proxy for particle hygroscopicity. Therefore, the lidar ratio of dry particles could be a reliable parameter to estimate AR_ξ.
Next, å_ξ, s_a, and humidogram parameters are utilized to estimate AR_ξ. AR_ξ of all the 3β + 2α parameters is calculated. Statistical relationships among humidogram parameters, å_ξ, s_a, and AR_ξ are used in our new method. The estimation of AR_ξ is introduced in detail in Sect. 3.1.3. The use of å_ξ and s_a is quite similar to the microphysical inversion process for particle size distribution retrieval. Microphysical inversion is a physics-based approach but brings large uncertainties in retrieving particle number concentrations. Constraining AR_ξ directly with a statistical relationship is a much simpler and more straightforward way.
Finally, after the AR_ξ values of backscatter and extinction at different wavelengths are derived, CCN number concentration can be calculated by multiplying AR_ξ by the corresponding ξ_dry. The average value of the CCN concentrations calculated from the different ξ_dry is the final retrieval result.
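The final step can be sketched as follows, assuming that an activation ratio and a dry optical coefficient are available for each lidar channel. The channel names, units, and numbers are hypothetical placeholders; only the multiply-then-average logic follows the description above.

```python
def retrieve_ccn(activation_ratio, xi_dry):
    """Average the per-channel estimates AR_xi * xi_dry over all channels."""
    estimates = [activation_ratio[ch] * xi_dry[ch] for ch in activation_ratio]
    return sum(estimates) / len(estimates)

# Hypothetical per-channel values (AR in cm^-3 per Mm^-1, xi_dry in Mm^-1).
activation_ratio = {"beta355": 100.0, "alpha532": 10.0}
xi_dry = {"beta355": 5.0, "alpha532": 52.0}

n_ccn = retrieve_ccn(activation_ratio, xi_dry)
print(f"retrieved N_CCN: {n_ccn:.0f} cm^-3")
```

In a full 3β + 2α retrieval the dictionaries would hold five channels, and the spread of the five per-channel estimates gives a rough indication of internal consistency.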

Derivation of humidogram parameters, dry backscatter, and dry extinction from lidar measurement
A constraint needs to be satisfied when quantifying the enhancements of backscatter and extinction coefficients with lidar measurements. The selected vertical layers must be well mixed, so that the variations in particle backscatter and extinction coefficients are caused by different RH and not by different aerosol types or loads. Atmospheric vertical homogeneity is fulfilled if the layer has little variability in the virtual potential temperature profile and the water vapor mixing ratio profile (Lv et al., 2017). Additional analyses can also be considered to evaluate vertical mixing of air masses, such as backward trajectories, horizontal wind velocities at different altitudes, or the third moment of the frequency distribution of vertical wind velocities (Bedoya-Velásquez et al., 2018).
Once vertical homogeneity is ensured, physical and chemical properties at dry conditions can be assumed to be uniform in the selected layer, and the number concentrations are proportional to air molecule number density. Accordingly, the relative variations in particle backscatter and extinction coefficients against different RHs can be obtained after normalizing the backscatter and extinction coefficients with air molecule number density.
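A minimal sketch of this normalization is given below. It assumes that the air molecule number density follows the ideal gas law, n = p / (k_B T), and that each coefficient is rescaled to the density of a chosen reference level in the layer. The function names and the profile values are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant, J K^-1

def air_number_density(pressure_pa, temperature_k):
    """Air molecule number density (m^-3) from the ideal gas law."""
    return pressure_pa / (K_B * temperature_k)

def normalize_to_reference(xi, pressure_pa, temperature_k, ref_index=0):
    """Rescale each coefficient to the air number density of the reference level."""
    density = [air_number_density(p, t) for p, t in zip(pressure_pa, temperature_k)]
    return [x * density[ref_index] / n for x, n in zip(xi, density)]

# Hypothetical well-mixed layer: pressure (Pa), temperature (K), extinction (Mm^-1).
pressure = [90000.0, 85000.0, 80000.0]
temperature = [285.0, 282.0, 279.0]
extinction = [100.0, 110.0, 140.0]

normalized = normalize_to_reference(extinction, pressure, temperature)
print([round(x, 1) for x in normalized])
```

After this step, remaining differences between levels are attributed to RH alone, which is what the humidogram fit requires.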
Humidogram parameterization is needed to find a representative parameter for the relationship between enhancement factor and RH. Unlike in situ measurements with controlled RH, lidar measurements have no generic reference RH for dry conditions from which to derive the enhancement factor. Inferring backscatter and extinction coefficients at dry conditions (ξ_dry) is also an important issue in CCN retrieval. Therefore, humidogram parameterization of lidar-derived optical properties should combine ξ_dry and f_ξ(RH, λ).
Many equations to parameterize enhancement factors have been proposed by previous studies (Titos et al., 2016). Two one-parameter equations are selected to test their performance in estimating ξ_dry and representing particle hygroscopic growth characteristics. The first equation is the most commonly used one, initially introduced by Kasten (1969):

$$\xi(\mathrm{RH}, \lambda) = \xi_\mathrm{dry}(\lambda) \left(1 - \frac{\mathrm{RH}}{100}\right)^{-\gamma_\xi(\lambda)}, \tag{7}$$

where the exponent γ_ξ is the fitting parameter and describes the hygroscopic behavior of the particles. The other equation is proposed based on physical understanding by Brock et al. (2016) and has been reported to perform better in describing the light-scattering enhancement factor than Eq. (7) (Yu et al., 2018):

$$\xi(\mathrm{RH}, \lambda) = \xi_\mathrm{dry}(\lambda) \left[1 + \kappa_\xi(\lambda) \frac{\mathrm{RH}}{100 - \mathrm{RH}}\right], \tag{8}$$

where κ_ξ is the fitting parameter and shows significant correlation with the bulk hygroscopic parameter κ (Kuang et al., 2017). Here, Eqs. (7) and (8) are denoted as the γ equation and κ equation, respectively. With given backscatter and extinction at different RHs, ξ_dry and γ_ξ or κ_ξ can be fitted simultaneously by means of least squares.
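The simultaneous fit can be sketched for the κ equation, assuming it takes the form ξ(RH) = ξ_dry [1 + κ_ξ RH / (100 − RH)]. With x = RH / (100 − RH) the model is a straight line, ξ = ξ_dry + (ξ_dry κ_ξ) x, so ordinary linear least squares returns both parameters. The function name and the synthetic humidogram are illustrative assumptions, not the fitting code of this study.

```python
def fit_kappa_equation(rh_percent, xi):
    """Least-squares fit of xi = xi_dry * (1 + kappa * RH / (100 - RH)).

    Returns (xi_dry, kappa). The model is linear in x = RH / (100 - RH):
    the intercept is xi_dry and the slope is xi_dry * kappa.
    """
    x = [rh / (100.0 - rh) for rh in rh_percent]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(xi) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, xi))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope / intercept

# Synthetic humidogram between 60 % and 90 % RH (hypothetical values).
rh = list(range(60, 91))
true_dry, true_kappa = 100.0, 0.3
xi = [true_dry * (1.0 + true_kappa * r / (100.0 - r)) for r in rh]

xi_dry, kappa = fit_kappa_equation(rh, xi)
print(f"xi_dry = {xi_dry:.2f}, kappa_xi = {kappa:.3f}")
```

The γ equation can be handled in the same spirit by fitting a straight line to ln ξ against ln(1 − RH/100), although that log-space fit weights the residuals differently from a direct fit in ξ.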
The performances of the γ equation and κ equation in inferring backscatter and extinction at dry conditions are compared to select the better parameterization. Four RH ranges (60 %-90 %, 60 %-70 %, 70 %-80 %, and 80 %-90 %) are selected. The fitted ξ_dry values are compared with the ξ_dry calculated by the Mie model. The slopes of linear regressions, determination coefficients (R²), and relative errors are listed in Table 2. The κ equation performs better than the γ equation for all RH ranges. Inferring ξ_dry with the γ equation underestimates it by about 10 %-30 %. This is consistent with the finding of Haarig et al. (2017) that the γ equation does not hold for RH lower than 40 %. The bias of backscatter is found to be larger than the bias of extinction.
The RH range used to fit the humidogram equations also influences the fitting results. Table 2 shows that the fitted ξ_dry values have a larger bias when the RH values increase. The fitted humidogram parameters γ_ξ and κ_ξ from different RH ranges are compared to each other, and the results are displayed in Table 3. Parameterization equations are not always perfect for the whole RH range, so humidogram parameters fitted with various RH ranges can be different. If γ_ξ and κ_ξ are used to represent the hygroscopic behavior of particles, careful attention should be paid to the RH ranges.
Based on the comparisons above, Eq. (8) (κ equation) is selected as our humidogram equation to derive ξ_dry and κ_ξ. The RH range used for parameter fitting is fixed to 60 %-90 % in the following method.

Estimation of AR ξ
Ångström exponents, lidar ratios, and optical humidogram parameters κ_ξ are used to estimate the optically related activation ratio AR_ξ. Because the Ångström exponents and lidar ratios are not independent of each other (any one of them can be calculated from the others), we reduce the set to the smallest number of parameters that still represents all the information. The selected nine parameters are listed in Table 4. One possible way to seek the relationship between the nine parameters and AR_ξ is to build a lookup table, but too many input parameters would make the lookup table too large to build and operate.
In the past few decades, machine learning has developed rapidly and found a very wide range of applications (Grange et al., 2018). Compared to traditional statistical methods, many machine learning techniques are nonparametric and do not need to fulfill many of the assumptions required for statistical methods (Immitzer et al., 2012). Random forest (RF) is an ensemble decision tree machine learning method that can be used for regression (Breiman, 2001; Tong et al., 2003). In addition to imposing few restraints on input parameters and assumptions, RF also has the advantage of being able to explain and investigate the learning process (Kotsiantis, 2013). The nine parameters listed in Table 4 are the input parameters of the RF model, and the AR_ξ values of 3β + 2α are the output parameters.
Table 4 entries for the lidar ratios and Ångström exponents:
s_a355: particle dry lidar extinction-to-backscatter ratio at 355 nm
s_a532: particle dry lidar extinction-to-backscatter ratio at 532 nm
å_α355&532: Ångström exponent of particle dry extinction coefficients between 355 and 532 nm
å_β532&1064: Ångström exponent of particle dry backscatter coefficients between 532 and 1064 nm
Some tuning parameters required by the RF model need to be specified by users. Experiments are made to determine the optimal values of the tuning parameters. The experiment results are shown in Fig. S7 in the Supplement, and the detailed settings of the RF model are listed in Table S2 in the Supplement. In this case, the results are rather insensitive to the tuning parameters. Data simulated with datasets measured from campaigns C1-C4 are utilized as the training data, and those from C5 are used as test data.

Sensitivity test
Both systematic and random errors exist in lidar-retrieved backscatter and extinction coefficients (Mattis et al., 2016). Systematic errors in backscatter and extinction can come from instrumentation setup, data processing method, and retrieval algorithm. A sensitivity test is carried out to assess the impact of systematic errors of backscatter and extinction on CCN retrieval. Errors in backscatter or extinction influence the values of the Ångström exponents and lidar ratios. The errors of individual backscatter or extinction coefficients are treated as independent, although systematic errors of different parameters are related in practice. The systematic errors are given in the range of −20 % to 20 % with an interval of 2 %. In each test, the error is only applied to one parameter, and the other parameters are error-free.
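The one-parameter-at-a-time design of the systematic error test can be sketched as below. The generator only produces the perturbed input sets; the retrieval itself is not reproduced here. Channel names and coefficient values are hypothetical.

```python
def systematic_error_cases(values, low=-20, high=20, step=2):
    """Yield (name, error_percent, perturbed_values), perturbing one parameter at a time."""
    for name in values:
        for err in range(low, high + 1, step):
            perturbed = dict(values)  # all other parameters stay error-free
            perturbed[name] = values[name] * (1.0 + err / 100.0)
            yield name, err, perturbed

# Hypothetical 3-beta + 2-alpha coefficients (arbitrary units).
optical = {"beta355": 5.0, "beta532": 3.0, "beta1064": 1.0,
           "alpha355": 200.0, "alpha532": 120.0}

cases = list(systematic_error_cases(optical))
print(len(cases), "perturbed input sets")
```

Each perturbed set would then be passed through the full retrieval chain, so that the resulting change in Ångström exponents and lidar ratios propagates to the retrieved CCN concentration.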
RH is another crucial factor in this new method to retrieve CCN. Profiles of RH derived by remote-sensing techniques are also influenced by errors. At present, RH profiles are usually obtained by combining temperature from a microwave radiometer and water vapor mixing ratio from an MWRL. Both measurements can cause systematic and random errors in RH (Bedoya-Velásquez et al., 2018). Errors in RH influence the values of ξ_dry and κ_ξ, which in turn influence all nine input parameters. Systematic errors ranging from −10 % to 10 % in intervals of 1 % are considered for RH.
Random errors in observations can be reduced by temporal averaging but cannot be eliminated. The influence of random errors in backscatter, extinction, and RH on CCN retrieval is investigated with the Monte Carlo method. Three sets of sensitivity tests for random errors are conducted. Errors obeying a Gaussian distribution are generated randomly with a mean value of zero. The standard deviation of the Gaussian distribution is fixed at 10 % for backscatter and extinction, and the standard deviation of RH is set to 5 %, 10 %, and 20 % in the respective tests. The procedure is repeated 2000 times. All the 80 575 sets of data from campaign C5 are used for the sensitivity test.
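A minimal Monte Carlo sketch of the random error test is shown below. It draws zero-mean Gaussian relative errors, applies them to the inputs, and records the relative error of a retrieval function. The retrieval used here is a stand-in (a simple average) chosen only to make the example runnable; the function names, seed, and input values are assumptions.

```python
import random
from statistics import mean, stdev

def monte_carlo_relative_errors(retrieve, inputs, rel_std, n_runs=2000, seed=0):
    """Relative retrieval errors under zero-mean Gaussian relative input noise."""
    rng = random.Random(seed)
    truth = retrieve(inputs)
    errors = []
    for _ in range(n_runs):
        noisy = {k: v * (1.0 + rng.gauss(0.0, rel_std)) for k, v in inputs.items()}
        errors.append((retrieve(noisy) - truth) / truth)
    return errors

def toy_retrieval(values):
    """Placeholder for the real retrieval chain: average of the inputs."""
    return sum(values.values()) / len(values)

# Hypothetical extinction coefficients with a 10 % random error.
inputs = {"alpha355": 200.0, "alpha532": 120.0}
errors = monte_carlo_relative_errors(toy_retrieval, inputs, rel_std=0.10)
print(f"mean relative error: {mean(errors):+.4f}, spread: {stdev(errors):.4f}")
```

Replacing `toy_retrieval` with the full chain (humidogram fit, Ångström exponents, lidar ratios, and the trained model) would give the error statistics discussed in the text.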

4 Results and discussions
4.1 Supersaturations for lidar CCN retrieval
CCN number concentrations are related to supersaturations. Critical diameters at each supersaturation calculated with the 25 size-resolved κ distributions are shown in Fig. 3a. Most of the critical diameters at a supersaturation of 0.07 % are larger than 200 nm, while critical diameters at a supersaturation of 0.80 % are around 50 nm. Suitable supersaturations for lidar CCN retrieval depend on the ability of lidar optical properties to provide information about the number and hygroscopicity of particles in the CCN-related size range.
Size-cumulative contributions to particle number for all measured particle size distributions, and to the corresponding calculated backscatter and extinction at dry conditions, are also displayed in Fig. 3a. As the cumulative contributions to particle number suggest, particles with diameters less than 100 nm dominate particle number concentrations (over 65 %). However, most backscatter and extinction come from particles larger than 200 nm (around 90 %), and almost 100 % come from particles larger than 100 nm. If the critical diameter is small, dry backscatter and extinction are insensitive to the particle diameters that contribute most to CCN concentrations.
Size-resolved enhancement contributions of backscatter and extinction are calculated to discuss the size range whose hygroscopicity the optical enhancement factor measurement is sensitive to. The enhancement contribution is defined as the difference between the optical cross sections at RH of 90 % and 60 %, and represents the proportion of each size to the enhancement in backscatter or extinction. As is shown in Fig. 3b, the contributions to the extinction enhancements are concentrated in diameters between 200 and 700 nm, and extinction enhancement at 355 nm is related to smaller particles than that at 532 nm. In contrast to their dominance in particle number, particles with diameters smaller than 100 nm contribute little to the enhancements of both backscatter and extinction.
Figure 3b also shows that different κ ξ values are sensitive to the hygroscopicity of different size.Size-dependent hygroscopicity is important to estimate CCN rather than bulk hygroscopicity information, especially for different supersaturation conditions.One humidogram may indicate the bulk hygroscopicity, but it is the hygroscopicity of small particles that influences CCN number concentrations most.Using κ ξ of all the 3β +2α values can provide some information about the hygroscopicity of small particles.
A comparison of the sizes to which the optical properties are sensitive with the critical diameters at different supersaturations shows that 3β + 2α MWRL systems have the potential to retrieve CCN number concentrations at supersaturations smaller than 0.20 %. Estimating CCN concentrations from lidar data at supersaturations larger than 0.40 % is not recommended.

CCN number concentrations retrieved with error-free data
With error-free data as input, the model-predicted extinction-related activation ratio at 532 nm (AR_α532) and the retrieved CCN number concentrations at supersaturations of 0.07 %, 0.10 %, and 0.20 % are compared to the theoretically calculated values. A total of 80 575 pairs of data calculated from campaign C5 are used for verification. The retrieval results are displayed in Fig. 4. The values of AR_α532 at a given supersaturation are spread over a wide range and can span more than an order of magnitude, indicating that the relationship between CCN and optical parameters is very complex. According to Fig. 4, the data points are distributed almost evenly on both sides of the 1 : 1 line, and the relative errors of most points are within 20 %. The determination coefficients (R²) of the CCN concentrations are all larger than 0.97, and the results do not show obvious systematic deviations. The retrieval errors grow with supersaturation. Retrieval results for higher supersaturations (i.e. 0.40 % and 0.80 %) are displayed in Fig. S8 in the Supplement. The errors are larger for supersaturations of 0.40 % and 0.80 %. Only 47.76 % of the retrieved CCN number concentrations at a supersaturation of 0.80 % have relative errors of less than 20 %. These results again demonstrate that lidars may not be sufficient to retrieve CCN number concentrations at supersaturations larger than 0.40 %.
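The final retrieval step and the comparison statistics quoted above can be sketched as follows. The CCN number concentration is the predicted activation ratio multiplied by the dry optical coefficient, as described in the abstract. The function names, the sample values, and the choice of the squared Pearson correlation for R² are assumptions made for illustration. The paper's own statistics come from the 80 575 simulated pairs, not from these numbers.

```python
import statistics


def retrieve_ccn(activation_ratio, xi_dry):
    """CCN number concentration as AR_xi times the dry optical coefficient."""
    return activation_ratio * xi_dry


def evaluate(true_vals, retrieved_vals, tolerance=0.20):
    """Return (mean relative error, SD of relative error, R^2, fraction within tolerance)."""
    rel_err = [(r - t) / t for t, r in zip(true_vals, retrieved_vals)]
    mean_t = statistics.fmean(true_vals)
    mean_r = statistics.fmean(retrieved_vals)
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(true_vals, retrieved_vals))
    var_t = sum((t - mean_t) ** 2 for t in true_vals)
    var_r = sum((r - mean_r) ** 2 for r in retrieved_vals)
    r_squared = cov**2 / (var_t * var_r)  # squared Pearson correlation (assumed definition)
    within = sum(abs(e) <= tolerance for e in rel_err) / len(rel_err)
    return statistics.fmean(rel_err), statistics.stdev(rel_err), r_squared, within


# Made-up example: dry extinction at 532 nm and activation ratios in consistent units.
alpha_dry = [100.0, 250.0, 400.0, 150.0]
ar_true = [5.0, 3.0, 2.0, 8.0]
ar_predicted = [5.5, 2.8, 2.1, 7.0]

ccn_true = [retrieve_ccn(a, x) for a, x in zip(ar_true, alpha_dry)]
ccn_retrieved = [retrieve_ccn(a, x) for a, x in zip(ar_predicted, alpha_dry)]
mean_re, sd_re, r2, frac = evaluate(ccn_true, ccn_retrieved)
print(f"RE = {mean_re:+.1%} ± {sd_re:.1%}, R^2 = {r2:.3f}, within 20 %: {frac:.0%}")
```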

Importance of size-related and hygroscopicity-related parameters
RF models can evaluate the importance of features (input parameters) by calculating the mean decrease in impurity (MDI) for each feature among all the trees in the forest. The MDIs and corresponding standard deviations of each parameter at different supersaturations are shown in Fig. 5. The importance of the nine input parameters varies with supersaturation. For 0.07 % and 0.10 %, κ_α355 and κ_β1064 are the two most important parameters, showing the impact of hygroscopicity on the relationship between CCN and optical properties. For 0.20 %, å_α355&532 becomes much more important. Among the nine input parameters, κ_ξ values are denoted as hygroscopicity-related parameters, and å_ξ values are denoted as size-related parameters. In particular, s_a can be regarded as a parameter related to both size and hygroscopicity. As is shown in Fig. […] CCN concentrations retrieved with and without κ_ξ are compared to show the importance of κ_ξ. When retrieving CCN without κ_ξ, the RF model is also trained with datasets from campaigns C1-C4, but the input data only contain Ångström exponents and lidar ratios. The retrieved CCN concentrations are all compared with datasets from campaign C5, and the results are listed in Table 5. R² of retrieved CCN decreases from 0.991 to 0.887 for a supersaturation of 0.07 %, from 0.992 to 0.857 for 0.10 %, and from 0.973 to 0.785 for 0.20 %. Retrieval errors also increase markedly, and there are significant positive systematic biases. The κ_ξ parameters, which are derived from backscatter and extinction enhancements, are therefore indispensable in CCN retrieval.

Impact of systematic and random error on CCN retrieval
Figure 6 shows the relative errors of CCN retrieved with systematic errors in backscatter and extinction. Errors of retrieved CCN increase as errors of backscatter and extinction increase, and higher supersaturations are more affected by errors in the optical parameters. Errors in the extinction coefficient at 355 nm (α_355) influence the retrieval results most. On average, a positive relative error of 20 % in α_355 causes about a 20 % overestimate in CCN number concentrations at a supersaturation of 0.07 %, about a 40 % overestimate at 0.10 %, and about a 60 % overestimate at 0.20 %. A negative error of 20 % in α_355 leads to an underestimate of CCN concentrations, and its impact is slightly smaller than that of the positive error. Errors in the extinction coefficient at 532 nm (α_532) have the opposite effect on the retrieval error to errors in α_355. Errors in α_532 do not show a significant impact at supersaturations of 0.07 % and 0.10 %, but have a very large effect at a supersaturation of 0.20 %. It is interesting to note that errors in the backscatter coefficients do not affect the results much. However, in practical applications of MWRLs, the errors in extinction are always much larger than the errors in backscatter. If the error of retrieved CCN concentrations needs to be limited to 20 % at a supersaturation of 0.20 %, the errors of the retrieved extinction coefficients should be kept within 5 %.
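One plausible contributor to the opposite signs for α_355 and α_532, offered here as an illustration rather than as the paper's explanation, is the extinction-related Ångström exponent, which is one of the nine inputs. Assuming the standard two-wavelength definition, a bias in α_355 and an equal relative bias in α_532 shift the exponent by the same amount in opposite directions. The extinction values below are made up, and the response of the RF model itself is not reproduced.

```python
import math


def angstrom_exponent(coef_short, coef_long, wl_short_nm, wl_long_nm):
    """Two-wavelength Angstrom exponent: -ln(coef_short/coef_long) / ln(wl_short/wl_long)."""
    return -math.log(coef_short / coef_long) / math.log(wl_short_nm / wl_long_nm)


alpha_355, alpha_532 = 300.0, 200.0  # made-up extinction coefficients, Mm^-1

base = angstrom_exponent(alpha_355, alpha_532, 355.0, 532.0)
# Apply a +20 % systematic error to one wavelength at a time.
biased_355 = angstrom_exponent(1.2 * alpha_355, alpha_532, 355.0, 532.0)
biased_532 = angstrom_exponent(alpha_355, 1.2 * alpha_532, 355.0, 532.0)

print(f"error-free exponent:           {base:.3f}")
print(f"shift for +20 % in alpha_355:  {biased_355 - base:+.3f}")
print(f"shift for +20 % in alpha_532:  {biased_532 - base:+.3f}")
```

The shift equals ln(1.2) / ln(532/355) in magnitude for either wavelength, independent of the extinction values chosen.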
The results of the test with systematic errors in RH are shown in Fig. 7. When RH has a negative systematic error, CCN concentrations are overestimated, and the overestimation grows as the error increases. A negative error of 10 % in RH leads to an overestimate of CCN at a supersaturation of 0.20 % by about 60 % on average, and the standard deviation is over 60 %. The effects of positive errors in RH are much smaller than those of negative errors but more complex. The standard deviations of the retrieval relative error increase with RH error, and the extreme value of the mean retrieval error appears at an RH error of 5 %. Underestimating RH causes much larger errors than overestimating it. Great care should be taken with RH profiles if enhancements of backscatter and extinction with RH are utilized.

Values of Table 5:

                   With κ_ξ                         Without κ_ξ
Supersaturation    Slope    R²       RE             Slope    R²       RE
0.07 %             0.991    0.991    −0.8 ± 6.0     0.877    0.866    4.6 ± 26.1
0.10 %             0.992    0.989    0.1 ± 6.3      0.857    0.837    5.9 ± 26.7
0.20 %             1.005    0.973    3.9 ± 9.0      0.860    0.785    11.9 ± 28.1
The relative error of retrieved CCN with random errors is presented in Table 6. The retrieval error does not change significantly as the random error of RH increases. For all the conditions tested, the mean values of the relative error are below or near zero, and the standard deviations are within 18 % to 28 %. The impact of random errors on the nine input parameters is also evaluated and is shown in Fig. 8. Random errors lead to an underestimation of κ_ξ by 30 % to 35 % on average for a 5 % RH error, 80 % to 85 % for a 10 % RH error, and 90 % to 95 % for a 20 % RH error. s_a355, s_a532, and å_β532&1064 are likely to be overestimated. As the random error of RH grows, the absolute relative error of the input parameters becomes larger.
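The tendency of random RH errors to bias κ_ξ low can be illustrated with a small Monte Carlo sketch. It assumes a one-parameter humidogram of the form ξ(RH) = ξ_dry · (1 + κ_ξ · RH / (100 − RH)), a common κ-type parameterization. The exact equation used in this paper is defined in its methods section and is not reproduced in this part of the text. The dry coefficient, the κ value, the RH grid, the clipping limits, and the relative Gaussian noise model are all assumptions, so the size of the bias is not expected to match Fig. 8.

```python
import random
import statistics

XI_DRY = 100.0  # assumed dry coefficient, arbitrary units
KAPPA = 0.3     # assumed humidogram parameter


def humidogram(rh_pct, xi_dry=XI_DRY, kappa=KAPPA):
    """Assumed kappa-type humidogram: xi_dry * (1 + kappa * RH / (100 - RH))."""
    return xi_dry * (1.0 + kappa * rh_pct / (100.0 - rh_pct))


def fit_humidogram(rh_pct, xi):
    """Least-squares fit that is linear in x = RH / (100 - RH).

    The intercept is xi_dry and the slope is xi_dry * kappa. Returns (xi_dry, kappa).
    """
    x = [rh / (100.0 - rh) for rh in rh_pct]
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(xi)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, xi))
    slope = sxy / sxx
    xi_dry = mean_y - slope * mean_x
    return xi_dry, slope / xi_dry


def median_kappa_with_rh_noise(rel_sigma, n_trials=500, seed=0):
    """Median fitted kappa when RH carries relative Gaussian noise of width rel_sigma."""
    rng = random.Random(seed)
    true_rh = [60.0 + i for i in range(31)]  # RH from 60 % to 90 % in 1 % steps
    xi = [humidogram(rh) for rh in true_rh]  # optical data kept error-free here
    fitted = []
    for _ in range(n_trials):
        # Clip noisy RH to [1, 99] % so that 100 - RH stays positive.
        noisy_rh = [min(99.0, max(1.0, rh * (1.0 + rng.gauss(0.0, rel_sigma))))
                    for rh in true_rh]
        fitted.append(fit_humidogram(noisy_rh, xi)[1])
    return statistics.median(fitted)


if __name__ == "__main__":
    for sigma in (0.05, 0.10, 0.20):
        k = median_kappa_with_rh_noise(sigma)
        print(f"RH noise {sigma:.0%}: median fitted kappa = {k:.3f} (true {KAPPA})")
```

Because the regressor RH / (100 − RH) is strongly nonlinear near saturation, noise in RH flattens the fitted slope, which is the same direction of bias as reported above.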

Summary
CCN number concentration at cloud base is a crucial and scarce parameter to constrain the relationship between aerosols and clouds. A new method to retrieve CCN number concentrations using backscatter and extinction coefficients from MWRL measurements is proposed. Enhancements of backscatter and extinction coefficients with RH are implemented to derive dry backscatter and extinction ξ_dry and the humidogram parameter κ_ξ. The ratio of CCN number concentration to dry backscatter or extinction coefficient AR_ξ, which is estimated by κ_ξ, Ångström exponents, and lidar ratios, is introduced to retrieve CCN number concentrations. The method is established and verified by theoretical simulations using Mie theory and κ-Köhler theory with in situ measured particle size distributions, mixing states, and chemical compositions. The values of AR_ξ are found to vary widely because of differences in size distribution and hygroscopicity. Theoretical analyses show that the optical properties provided by current 3β + 2α MWRL systems mainly contain size distribution and hygroscopicity information for particles with diameters larger than 100 nm, which only fits the critical diameters for supersaturations lower than 0.20 %. Accordingly, CCN number concentrations at supersaturations of 0.07 %, 0.10 %, and 0.20 % are retrieved. The performance of the new method is evaluated with error-free data, and CCN number concentrations at all three supersaturations are in good agreement with the theoretically calculated values.
Sensitivity tests are carried out to show the influence of systematic and random errors of lidar-derived optical properties and auxiliary RH profiles on CCN retrieval. Systematic errors in extinction coefficients and RH are found to have a large impact on the error in retrieved CCN. Parameters fitted from backscatter and extinction enhancements (i.e. ξ_dry and κ_ξ) are significantly influenced by RH. The uncertainty of RH profiles derived by remote-sensing techniques is a major problem in CCN retrieval. Optical properties near cloud base from lidar measurements are always influenced by high RH. Thus, transforming backscatter and extinction coefficients at ambient RH to dry conditions is essential for CCN retrieval, and accurate RH profiles are required.
The importance of the humidogram parameters κ_ξ is demonstrated by comparing the error of CCN concentrations retrieved with and without κ_ξ. Neglecting the hygroscopicity information contained in backscatter and extinction enhancements introduces large errors into CCN retrieval by lidars. The performance of two parameterization schemes for backscatter and extinction humidograms is evaluated. The κ equation performs better than the γ equation in inferring dry backscatter and extinction. The κ equation is therefore recommended to describe the hygroscopic behavior of the backscatter and extinction coefficients from lidar measurements. The fitted hygroscopic parameters are found to be sensitive to the fitting RH range when the RH range is limited and relatively high (between 60 % and 90 %). This is a critical problem for current research on aerosol hygroscopicity with lidar measurements. Great care should be taken with the RH range when evaluating the hygroscopic growth of lidar-related optical properties.
It should be noted that the theoretical analyses in this paper are based on datasets of continental aerosols, and the use of Mie theory also limits the scope of the results. The results can be applied in the North China Plain but are not suitable for sea salt and mineral dust. Studies with datasets of other aerosol types should be carried out in the future. Although the applicability of this new method is limited by large uncertainties in RH profiles, comparisons between real measured MWRL data and airborne in situ measurements should also be conducted.
This work furthers our understanding of the relationship between CCN and aerosol optical properties and provides an alternative way to retrieve CCN number concentration profiles from lidar measurements. The newly proposed method has the potential to provide long-term CCN data at cloud base for aerosol-cloud interaction studies.

Data availability. All of the datasets from field measurements and the corresponding simulated datasets can be obtained from the repository with the doi https://doi.org/10.5281/zenodo.3255086 (Tan et al., 2019).

Figure 1. (a) Box plot of particle number size distributions (PNSDs) in the datasets from five field campaigns. Each PNSD is normalized by its maximum value at the peak diameter. Green markers "+" represent the mean value of each diameter. The boxes extend from the lower to upper quartile values, with orange lines at the median. The whiskers extend from the box to the minimum-maximum values or extend from the box by 1.5 times the interquartile range. The flyers are not shown in the plot. (b) A total of 25 typical size-resolved κ distributions. Each dotted line with color represents one size-resolved κ distribution. The solid black line represents the mean value of the size-resolved κ distributions.

Figure 2. Schematic diagram of newly proposed method to retrieve cloud condensation nuclei number concentrations using multiwavelength Raman lidar.

Figure 3. (a) Cumulative contributions (accumulated from large particle size to small particle size) of particle number concentrations (measured), dry particle backscatter coefficients (simulated), and dry particle extinction coefficients (simulated). The solid and dashed lines represent the median values of five field campaigns, and the shadows cover from the lower to upper quartile values. The box plots in brown contain statistical information about the critical diameter of each supersaturation condition (right y axis). The boxes extend from the lower to upper quartile values, with lines at the median. The whiskers extend from the box to the minimum-maximum values or extend from the box by 1.5 times the interquartile range. The markers "o" are the flyers. (b) Normalized size-resolved enhancement contributions when relative humidity increases from 60 % to 90 %, which are theoretically calculated by the mean particle number size distribution, the mean black carbon mass concentration (4.717 µg m⁻³), the mean mass ratio of externally mixed black carbon (0.664 %), and the mean size-resolved κ distribution.

Figure 4. Comparison of the theoretically calculated extinction-related CCN activation ratio at 532 nm (true AR) and the model-predicted extinction-related CCN activation ratios at 532 nm (retrieved AR) at supersaturations of (a) 0.07 %, (c) 0.10 %, and (e) 0.20 %, and comparison of the theoretically calculated CCN number concentrations (true CCN number concentration) and the retrieved CCN number concentrations at supersaturations of (b) 0.07 %, (d) 0.10 %, and (f) 0.20 %. A total of 80 575 pairs of data calculated from campaign C5 are used. The solid line is the 1 : 1 line, and the dashed lines are 20 % relative difference lines. Colors represent the relative density of the data points normalized by the maximum data density of each panel. The relative error shown in the figure is mean value ± 1 standard deviation.

Figure 5. Importance of each feature (input parameter) output by the random forest model for predicting optically related CCN activation ratios at supersaturations of (a) 0.07 %, (b) 0.10 %, and (c) 0.20 %. The values of feature importance indicate the decrease in impurity for each feature. The length of the bar represents the mean values among all trees and the error bars give the standard deviations.

Figure 6. Relative errors in retrieved CCN number concentrations at supersaturations of (a) 0.07 %, (b) 0.10 %, and (c) 0.20 % as a function of systematic errors in backscatter or extinction. The markers are the mean values, and the error bars denote the standard deviations.

Figure 7. Relative errors in retrieved CCN number concentrations at supersaturations of 0.07 %, 0.10 %, and 0.20 % as a function of systematic errors in relative humidity.The markers are the mean values, and the error bars denote the standard deviations.

Figure 8. Relative errors in fitted and calculated parameters with 10 % random errors for backscatter and extinction and 5 % (blue), 10 % (orange), and 20 % (green) random error for relative humidity.The dots are the median values, and the error bars denote the 5th and 95th percentiles.The dashed red line marks the position of zero.

Table 1. Locations, time periods, parameters, and instruments of five field campaigns.

Table 2. Slopes of linear regressions, determination coefficients (R²), and relative errors (RE) between Mie-model-simulated particle dry backscatter or extinction coefficients and those inferred from humidogram functions. A total of 404 575 pairs of the simulations from the in situ dataset are used. The REs are given in the form of mean value ± 1 standard deviation.

Table 4. Lidar-derived parameters for predicting optically related CCN activation ratio AR_ξ.

Table 5. Slopes of linear regressions, determination coefficients (R²), and relative errors (RE) between theoretically calculated CCN number concentrations and CCN number concentrations retrieved with and without κ_ξ as input parameter. The relative errors are given in the form of mean value ± 1 standard deviation.

Table 6. Mean and 1 standard deviation (SD) values (mean ± SD) of relative errors in retrieved CCN number concentrations at different supersaturations with different random error conditions. The uncertainty of backscatter and extinction coefficients in all the tests is 10 %, and the uncertainties of relative humidity are 5 %, 10 %, and 20 %.
… % ± 27.8 %, −9.1 % ± 26.3 %, −5.2 % ± 18.0 %