A methodology for investigating dust model performance using synergistic EARLINET / AERONET dust concentration retrievals



Introduction
Desert dust is emitted from arid regions around the world, and in many cases it is the dominant aerosol type. Dust aerosols affect the radiation balance and temperature structure of the atmosphere by interacting with both short- and long-wave radiation (Sokolik and Toon, 1996; Pérez et al., 2006b; Balkanski et al., 2007); they also affect cloud microphysical properties and precipitation patterns by acting as cloud condensation and ice nuclei (DeMott et al., 2003; Karydis et al., 2011) and, due to their large spatial and temporal extent, have an important effect on climate (Rosenfeld et al., 2001). The main source regions of dust are located in northern Africa and western and central Asia, but due to the prevalent wind patterns dust has a significant impact on the air quality of Europe, North America, and East Asia, far away from its sources, affecting the health of large populations (Morman and Plumlee, 2014). Additionally, mineral dust aerosols are suspected to be an important source of soluble iron in marine ecosystems and, thus, an important factor in marine bio-production (Mahowald et al., 2010; Nickovic et al., 2013; Gallisai et al., 2014).
Given this complexity, dust models are an important tool for studying the complete dust cycle in the atmosphere. Such models simulate dust's life cycle, including production in arid regions, transport in the atmosphere, and wet and dry deposition (Tegen, 2003). These models simulate the complete 3-D fields of dust concentration and can be used to study the processes and sensitivities controlling the dust distribution and to compute regional and global budgets of dust. Dust models have been used, for example, to quantify the effect of dust on the air quality of Mediterranean cities (Jiménez-Guerrero et al., 2008), to study the effects of dust on weather forecasts (Pérez et al., 2006b), and to quantify the impact of lofted dust particles on cloud formation (Klein et al., 2010; Solomos et al., 2011). To perform these simulations, models rely on the physical description of atmospheric processes, on the choice of parameterization, and on the tuning of individual components in the model; consequently, model outputs need to be regularly tested against in situ and remote sensing measurements to evaluate their performance. When used as a forecasting tool, models can assimilate remote sensing measurements to improve their forecasting skill (Benedetti et al., 2009; Sekiyama et al., 2010; Wang et al., 2014).
Dust model evaluations typically include a combination of surface concentration, deposition fluxes, and remote sensing measurements (e.g., Basart et al., 2012b; Gama et al., 2015). On the remote sensing side, evaluations typically rely on observed columnar aerosol properties. For example, a typical quantity used is the aerosol optical depth (AOD) measured by the Aerosol Robotic Network (AERONET) photometers or by satellite instruments such as the Moderate Resolution Imaging Spectroradiometer (MODIS) (e.g., Pérez et al., 2011; Basart et al., 2012b). In these comparisons, the modeled dust volume concentration is converted to dust optical depth using a spherical particle approximation and a modeled size distribution. These evaluation attempts are limited by the contribution of non-dust aerosols, and so are restricted to cases or regions where dust is the dominant aerosol type (e.g., Basart et al., 2009; Cuevas et al., 2014). Usually, the dust vertical distribution is not examined even though it may affect the model performance in many aspects. An accurate representation of the dust vertical structure is needed to model dust transport and deposition processes, to capture the effects of dust-radiation and dust-cloud interactions, and to properly produce air quality forecasts (e.g., Wang et al., 2014).
The vertical distribution of dust over Europe has been studied using active remote sensing instruments such as lidars (e.g., Ansmann et al., 2003; Papayannis et al., 2005, 2008). Lidars directly measure profiles of total aerosol optical properties, i.e., backscatter and extinction coefficients, and such measurements have been used to examine dust model performance. Many such examinations have focused on a limited number of case studies (e.g., Pérez et al., 2006a; Uno et al., 2006; Müller et al., 2009; Heinold et al., 2009). In other studies, long-term observations of aerosol optical properties have been compared with modeled dust optical profiles. For example, Mona et al. (2014) have presented a systematic examination of the BSC-DREAM8b (Barcelona Supercomputing Center - Dust REgional Atmospheric Model 8 bins) modeled dust distribution over Potenza, Italy, for the 2000-2012 period, using lidar-derived backscatter and extinction profiles. Similarly, Gobbi et al. (2013) compared lidar dust extinction profiles with those modeled by BSC-DREAM8b over Rome, Italy, during the 2001-2004 period. Results from these studies indicate that the dust models adequately represented the vertical distribution of dust despite underestimating the total extinction profiles. However, these studies compare modeled dust properties to total aerosol properties, as they do not separate the contribution of dust from that of other atmospheric aerosols, such as smoke and pollution. In most cases no comparison can be made in the Planetary Boundary Layer (PBL), where the load of fine anthropogenic aerosols is expected to be high, especially at the majority of measurement sites in Europe. Depolarization lidars can overcome this problem by separating the dust from the non-dust aerosol backscatter coefficient, based on known depolarization ratios of dust and other aerosol types (Shimizu et al., 2004; Tesche et al., 2009), but these techniques have been used in only a few model evaluation studies (e.g., Heinold et al., 2011).
An alternative strategy for dust model comparison is based on the conversion of lidar backscatter signals to total aerosol volume concentration using scattering simulations (e.g., Barnaba and Gobbi, 2001, 2002). Such an approach was used to examine the performance of three dust transport models using 34 elastic lidar profiles over Rome, Italy, for the 2001-2003 period (Kishcha et al., 2005, 2007).
Recently, a number of newly developed algorithms have used the synergy of lidar and sun/sky photometer data to retrieve dust concentration profiles (e.g., Ansmann et al., 2012; Lopatin et al., 2013; Chaikovsky et al., 2015). Such algorithms can separate the contribution of dust from that of other aerosol types, so they can be used to examine dust model performance even in cases where the dust particles are mixed with, for example, smoke. These products are based on indirect observation of the aerosol size distribution, instead of relying on a modeled size distribution, which further improves the results. Up to now, the comparison of these algorithms with models has been restricted to single cases; for example, Tsekeri et al. (2013) presented a case study where the output of the BSC-DREAM8b model was compared with dust concentration retrieved using the Lidar/Radiometer Inversion Code algorithm (LIRIC) over Athens, Greece, finding satisfactory agreement. These algorithms have been implemented in many European lidar stations, opening new possibilities for dust observation on a continental scale.
In this paper, we propose a strategy for cross-examining modeled dust concentration profiles and profiles retrieved using such lidar/sun-photometer synergy. As an example, we use an observation data set produced with the LIRIC algorithm. The recent implementation of LIRIC in many advanced European Aerosol Research Lidar Network (EARLINET) remote sensing stations (Chaikovsky et al., 2012) allows the systematic examination of model performance in a wider geographical region. We present a general methodology for comparing measured and modeled vertical dust concentration, including the strategies that could be used and the caveats that should be taken into account, and we suggest appropriate metrics that could help explore the data set. Next, we apply this methodology to compare dust concentration profiles retrieved at 10 European remote sensing sites to 4 European regional dust transport models.
The four models that participate in this inter-comparison are BSC-DREAM8b v2, Nonhydrostatic Multiscale Meteorological Model on the B grid/Barcelona Supercomputing Center - Dust (NMMB/BSC-Dust), DREAMABOL, and Dust REgional Atmospheric Model - Nonhydrostatic Multiscale Meteorological Model on the E grid - Monitoring Atmospheric Composition and Climate (DREAM8-NMME-MACC). All four models contribute to the Sand and Dust Storm Warning Advisory and Assessment System (SDS-WAS) that was established by the World Meteorological Organization (http://www.wmo.int/sdswas). The SDS-WAS aims to improve the present capabilities for reliable sand and dust storm forecasts; to do this it supports the development of comprehensive, coordinated and sustained observations and modeling capabilities of these events. The SDS-WAS consists of two regional nodes, one for northern Africa, the Middle East and Europe (NA-ME-E), set in Spain, and one for Asia, set in China; each of these nodes deals with both operational and scientific aspects related to atmospheric dust monitoring and forecasting. All the models participating in the present study contribute to the NA-ME-E regional node.
Remote sensing profiling measurements can be used to improve dust modeling efforts at three different levels: diagnostic evaluation, near-real-time (NRT) evaluation, and assimilation (Seigneur et al., 2000; Sicard et al., 2015; Wang et al., 2014). In this study, we focus on the diagnostic evaluation of the model performance. We choose to study an extended time period and spatial domain, which gives us a better representation of different meteorological conditions, dust transport paths, and measurement locations. However, the considerations and metrics presented here can also be applied to the NRT evaluation scenario.
The rest of the paper is structured as follows. In Sect. 2 we present the EARLINET and AERONET remote sensing networks, provide an overview of the new retrieval algorithms, such as LIRIC, and present the four dust models used in this study. In Sect. 3 we introduce the methodology of the cross-examination and present the appropriate statistical indicators that can be used for future evaluation of dust models. In Sect. 4 we present the results obtained by applying this methodology to real measurements. Finally, in Sect. 5 we give conclusions and indicate directions for future work.

Measurement networks
The systematic observation of the vertical distribution of dust on a continental scale is possible due to the development of regional lidar remote sensing networks in main dust outflow regions, such as the European Aerosol Research Lidar Network (EARLINET, Pappalardo et al., 2014), the AD-Net in East Asia (Sugimoto et al., 2005), the Latin American Lidar Network (LALINET) in Latin America (Barbosa et al., 2014; Guerrero-Rascado et al., 2014), and the global Micropulse Lidar Network (MPLNET, Campbell et al., 2002). This study focuses on EARLINET, a lidar network that was established in 2000 with the aim of providing comprehensive information on the aerosol vertical distribution over Europe (Bösenberg et al., 2001). Currently, 27 stations participate actively in the network with regular contribution of data. The network includes 17 stations with multi-wavelength Raman systems, while 18 stations perform depolarization measurements, giving important information on the shape of the measured particles. All stations in the network perform climatological measurements three times a week, according to a predefined measurement schedule, together with extra measurements during special events, dust measurements based on an alerting system, and intensive observational measurement campaigns (Pappalardo et al., 2014). Considerable attention has been given within EARLINET to improving and homogenizing the performance of the systems, including hardware tests, algorithm tests on synthetic data, and system intercomparison campaigns (Matthias et al., 2004; Böckmann et al., 2004; Pappalardo et al., 2004). The optical products calculated from all the systems are stored in a standardized data format in a central database and are available to external users. The first volumes of the EARLINET database have been published as biannual volumes at the World Data Center for Climate (The EARLINET publishing group 2000-2010, 2014).
Similarly, regional-to-global sun/sky photometer networks like the Aerosol Robotic Network (AERONET, Holben et al., 1998), the Global Atmosphere Watch - Precision Filter Radiometer network (GAW-PFR, McArthur et al., 2003), the Skyrad Network (SKYNET, Takamura and Nakajima, 2004; Kim et al., 2008), and the China Aerosol Remote Sensing Network (CARSNET, Che et al., 2009) have also been developed. Many of these instruments are collocated with lidar systems of the corresponding lidar networks, thus allowing the development of synergistic algorithms. In this study, we use AERONET, a global network of automatic sun/sky-scanning photometers that was created in the mid-1990s in order to provide global aerosol data not provided at the time by satellites and to act as a validation platform for future satellite missions. Its current aim is to provide long-term, continuous measurements of columnar aerosol optical and microphysical properties. The network consists of standardized photometers produced by Cimel Electronique, and all participating instruments undergo regular calibration and intercomparison with reference instruments. The photometers in the AERONET network perform both direct-sun and sky-scanning almucantar measurements at several wavelengths (between 340 and 1640 nm). The output of the direct-sun measurements is the AOD at several wavelengths, while the sky-scanning measurements are also used for retrieving aerosol microphysical properties (Dubovik and King, 2000; Dubovik et al., 2006). The processing is performed centrally and the results are made public in near-real time.

Retrieval algorithms
As described in the introduction, a new class of algorithms can retrieve dust volume concentration profiles utilizing lidar profiling measurements and sun/sky photometer data. The output of these algorithms is the vertical concentration of a number of separate aerosol types. In these algorithms, dust microphysical properties are neither assumed a priori nor derived from model outputs, but are based on photometer measurements or known properties of pure dust. In this way, they address a core issue of model evaluation from remote sensing measurements: dust transport models simulate mass concentration while the main measured quantities of remote sensing instruments are optical aerosol properties; a conversion is always necessary to make the two quantities comparable. When the conversion is made on the model side, the model's mass concentration is converted to extinction profiles using a predefined volume-to-extinction ratio. If the dust transport model treats the dust size distribution in a realistic way, e.g., separating the dust concentration into many different size bins, a better conversion can be achieved using forward scattering calculations (typically based on Mie theory). The use of the synergistic algorithms allows a direct comparison of the retrieved volume concentration profiles with the model output, removing an extra source of uncertainty from our study.
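The model-side conversion described above can be illustrated with a minimal sketch. It assumes a single particle density and a single, height-independent volume-to-extinction ratio; the function name, the default density of 2.6 g cm-3, and the default ratio of 0.6 µm are illustrative assumptions, not values taken from any of the models discussed here.

```python
import numpy as np

def mass_to_extinction(mass_ug_m3, density_g_cm3=2.6, vol_to_ext_um=0.6):
    """Convert dust mass concentration (ug m-3) to extinction (Mm-1).

    Both default values are illustrative assumptions for this sketch.
    """
    mass = np.asarray(mass_ug_m3, dtype=float)
    # 1 ug m-3 divided by 1 g cm-3 equals 1 um3 cm-3 of particle volume.
    volume_um3_cm3 = mass / density_g_cm3
    # Volume (in 1e-12) over a ratio in um (1e-6 m) gives extinction in Mm-1.
    return volume_um3_cm3 / vol_to_ext_um

# Hypothetical three-level mass profile.
profile = mass_to_extinction([0.0, 130.0, 260.0], vol_to_ext_um=0.5)
print(profile)
```

Because the result scales linearly with the assumed ratio, any error in that ratio propagates directly into the converted profile, which is the extra uncertainty that a comparison in volume concentration avoids.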
In this work, we will use the LIRIC algorithm as an example to demonstrate the proposed methodology. LIRIC is used in many European remote sensing stations and takes full advantage of the remote sensing networks EARLINET and AERONET. The results we present are, nevertheless, applicable to similar data sets retrieved by other algorithms. Before presenting the algorithm's details, we present a brief overview of this class of algorithms to make clear in what aspects LIRIC can be considered a representative example.
Volume retrieval algorithms fall into two broad categories. The first category uses lidar measurements and intensive optical properties of some aerosol types to retrieve the concentration of these types in the atmosphere. The aerosol intensive properties used can be derived from past observations, laboratory measurements, model data or a combination of the above. When the range of such input values is too wide for a reliable retrieval, photometer measurements are sometimes used as a proxy for the missing parameter. For example, the polarization lidar photometer networking (POLIPHON) algorithm (Ansmann et al., 2011, 2012) is based on the dust depolarization and extinction-to-backscatter coefficient ratio (aerosol lidar ratio) observed during the Saharan Mineral Dust Experiment (SAMUM) and on long-term EARLINET measurements of dust transport events over Europe. In addition, POLIPHON uses the volume-to-AOD ratio derived from the photometer to approximate the variable volume-to-extinction ratio for dust and smoke aerosols. Extending this approach, Mamouri and Ansmann (2014) use laboratory measurements of the fine and coarse dust depolarization ratio to further separate these two sub-classes of dust. In a similar approach, Nemuc et al. (2013) derive the volume-to-extinction ratio of different aerosol types from the Optical Properties of Aerosols and Clouds (OPAC) database (Hess et al., 1998). Other approaches combining lidar measurements with airborne measurements or complex AERONET processing have also been developed (Cuesta et al., 2008; Lewandowski et al., 2010).
The second category of algorithms pursues a tighter integration of lidar and photometer data. Specifically, the volume concentration profiles are calculated to optimally fit the lidar and photometer measurements (Dubovik, 2005). In the case of the Generalized Aerosol Retrieval from Radiometer and Lidar Combined data algorithm (GARRLiC, Lopatin et al., 2013), the optimal fit of the lidar and photometer measurements is found using a multi-term least square approach. Similarly, LIRIC (Chaikovsky et al., 2015) uses the AERONET inversion products to derive the intensive properties of fine and coarse aerosols; consequently, the algorithm finds the optimal profiles of these types based on lidar measurements and total-column volume concentration profiles provided by AERONET. The higher integration of the photometer and lidar comes at a price. These algorithms require simultaneous lidar and photometer measurements, and this limits the available measurements, especially because photometer sky-scanning measurements require cloud-free conditions and are performed only during daytime. They also typically require more complex lidar systems that perform multi-wavelength measurements, which limits the lidar systems to which they can be applied. Moreover, simulating the complete atmospheric column makes the algorithms sensitive to the conditions near the ground, where typical lidar systems cannot observe. On the other hand, their benefit is that they can distinguish coarse spherical and non-spherical particles, separating, for example, dust from marine particles.
In this paper, we use results from the LIRIC algorithm to show the benefit of using such algorithms for dust model evaluation. The details of LIRIC can be found in Chaikovsky et al. (2004, 2012), Wagner et al. (2013), and Chaikovsky et al. (2015), so only a brief overview is given here.
LIRIC uses as input elastic lidar signals at three wavelengths (355, 532, 1064 nm) and aerosol microphysical properties retrieved from the AERONET inversion algorithm. It can optionally also use depolarization measurements at 532 nm. LIRIC assumes that atmospheric particles can be separated into fine, coarse spherical, and coarse spheroid modes. It calculates the microphysical properties of these three modes using the AERONET retrieval of columnar size distribution, refractive index and sphericity. It separates the fine and coarse size distributions by finding the minimum concentration value in the 0.194-0.576 µm range. The algorithm calculates the intensive properties (e.g., volume-to-extinction coefficient) at all lidar wavelengths using the same sphere and spheroid kernel functions as AERONET (Dubovik et al., 2006). Additionally, it calculates the total volume concentration of each mode by integrating the size distribution and using the sphericity parameter to separate the coarse-mode volume into spherical and spheroid components. LIRIC assumes that the properties of these modes do not change with altitude, but the concentration of each mode, C_m(z), can vary freely. The algorithm uses pre-processed lidar signals as input. The signal time series is averaged to achieve a good signal-to-noise ratio. The signals are normalized to a reference altitude z_n and are also cut at the altitude of full overlap z_O.
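The fine/coarse separation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the LIRIC implementation: the function names are hypothetical, the input is a synthetic bimodal size distribution on 22 log-spaced radii, and the search range is interpreted as a radius range in µm.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of y over x (local helper)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def find_split_radius(radii_um, dv_dlnr, r_lo=0.194, r_hi=0.576):
    """Radius of the size-distribution minimum inside [r_lo, r_hi]."""
    radii_um = np.asarray(radii_um, dtype=float)
    dv_dlnr = np.asarray(dv_dlnr, dtype=float)
    idx = np.flatnonzero((radii_um >= r_lo) & (radii_um <= r_hi))
    if idx.size == 0:
        raise ValueError("no radius bins inside the search range")
    return float(radii_um[idx[np.argmin(dv_dlnr[idx])]])

def mode_volumes(radii_um, dv_dlnr, r_split):
    """Integrate dV/dln(r) over ln(r) below and above the split radius."""
    radii_um = np.asarray(radii_um, dtype=float)
    dv_dlnr = np.asarray(dv_dlnr, dtype=float)
    ln_r = np.log(radii_um)
    fine = radii_um <= r_split
    coarse = radii_um >= r_split
    return (trapezoid(dv_dlnr[fine], ln_r[fine]),
            trapezoid(dv_dlnr[coarse], ln_r[coarse]))

def lognormal_mode(r, amplitude, r_mode, sigma):
    """Log-normal shaped mode used only to build synthetic test input."""
    return amplitude * np.exp(-0.5 * (np.log(r / r_mode) / sigma) ** 2)

# Synthetic columnar size distribution (hypothetical values).
radii = np.geomspace(0.05, 15.0, 22)
dv = lognormal_mode(radii, 0.02, 0.15, 0.45) + lognormal_mode(radii, 0.10, 2.0, 0.60)

r_split = find_split_radius(radii, dv)
v_fine, v_coarse = mode_volumes(radii, dv, r_split)
print(r_split, v_fine, v_coarse)
```

Sharing the split point between the two integrals keeps the sum of the mode volumes equal to the total integrated volume.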
LIRIC finds the volume concentration profiles C_m(z) for the three modes by optimizing (a) the fit to the lidar signals, (b) the fit to the AERONET columnar volume concentration, and (c) user-defined smoothness constraints that act as a regularization parameter. The relative importance of these three constraints is selected by the user through appropriate weighting factors. The optimization is performed using a multi-term least square algorithm. The concentration below the full overlap height is considered constant, i.e., C_m(z) = C_m(z_O) for z < z_O. LIRIC's final output is the volume concentration profiles of fine, coarse spherical and coarse spheroid particles. If depolarization measurements are not available, the coarse mode is not separated into two components, and the final output is the concentration of only the fine and coarse modes.
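Two of the ingredients above, the constant concentration below the full overlap height and the columnar volume that is matched to AERONET, can be sketched in a few lines. The function names and the example profile are hypothetical, and the sketch does not reproduce the least-square optimization itself.

```python
import numpy as np

def extend_below_overlap(z_m, conc, z_overlap_m):
    """Set C(z) = C(z_O) for all z < z_O; z_m must be ascending."""
    z_m = np.asarray(z_m, dtype=float)
    out = np.array(conc, dtype=float)  # work on a copy
    above = np.flatnonzero(z_m >= z_overlap_m)
    if above.size == 0:
        raise ValueError("overlap height is above the top of the profile")
    first = above[0]
    out[:first] = out[first]
    return out

def column_volume_um(z_m, conc_um3_cm3):
    """Column-integrated volume in um3 um-2 from a profile in um3 cm-3."""
    z_m = np.asarray(z_m, dtype=float)
    c = np.asarray(conc_um3_cm3, dtype=float)
    integral = np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(z_m))
    # 1 um3 cm-3 integrated over 1 m corresponds to 1e-6 um3 um-2.
    return float(integral * 1e-6)

# Hypothetical profile with unreliable values below a 1000 m overlap height.
z = np.arange(0.0, 3000.0, 500.0)
raw = [5.0, 5.0, 20.0, 30.0, 10.0, 0.0]
filled = extend_below_overlap(z, raw, z_overlap_m=1000.0)
print(filled, column_volume_um(z, filled))
```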
LIRIC includes several underlying assumptions. First, each aerosol mode is considered to have constant microphysical properties with altitude, with only its concentration varying. If two aerosol types are averaged in one mode, e.g., when smoke and urban particles are both present in the atmosphere, this assumption will introduce some errors. When no depolarization measurements are present, and consequently the algorithm does not separate the spherical and non-spherical components, the coarse mode could include both marine and dust particles, but this will affect mainly the PBL. With depolarization measurements available, LIRIC retrieves the coarse spheroid mode, and this could incorporate more than one aerosol type if the atmosphere includes desert and volcanic dust, or even dust from two very different sources. These cases are rare and will have a small effect in a statistical comparison. We cannot exclude, however, that they can become important for specific cases. Secondly, the aerosol complex refractive index and sphericity parameter are considered to be size-independent, i.e., the same for fine and coarse-mode aerosols. The effect of this assumption on the retrieved volume concentration has not been thoroughly studied, but it has been addressed in the GARRLiC algorithm (Lopatin et al., 2013). Thirdly, LIRIC assumes that aerosol scattering properties can be represented by the AERONET spherical and spheroid kernels. This assumption could be problematic because the AERONET kernels were not developed to represent the phase function in the backscattering direction. Less importantly, the spheroid particle aspect ratio is adapted to represent coarse-mode dust particles and could be inappropriate for fine-mode particles. The fourth assumption, as mentioned before, is that aerosols below the full overlap height, z_O, are well mixed. This will not be true if the PBL height is lower than this altitude. Consequently, the effect of this assumption will depend on the atmospheric conditions and will be different from case to case. Finally, if the photometer and lidar measurements are not simultaneous, the retrieval assumes that columnar intensive and extensive aerosol properties did not change between the measurements. Again, the effect of this variability will be different in each case but could be checked using available ancillary measurements, e.g., from a direct-sun photometer or a collocated ceilometer (Wiegner et al., 2014; Madonna et al., 2014). Note that these assumptions will mostly affect the total value of the concentration profiles. The shape of the profile is mostly determined by the lidar measurements of the spectral dependence of the backscatter and of the depolarization coefficient.
A full uncertainty analysis of LIRIC retrievals is still an open topic. The output of LIRIC has been validated against POLIPHON retrievals that do not rely on a specific aerosol model (Wagner et al., 2013); the comparison indicates that the spheroid model that represents non-spherical particles does not induce significant errors in the retrieval. A further source of uncertainty is the choice of user-defined parameters for each retrieval; such parameters include, for example, the minimum and maximum altitude, the altitude of an aerosol-free region, and the regularization parameters used in the inversion. Granados-Muñoz et al. (2014) show that the retrieval is stable with respect to the choice of these parameters, but further work is needed to generalize these results; in the examples shown in that paper, the resulting retrieval errors remain below 20 %.

Dust models
Dust transport modeling has been a topic of intense research since the 1990s, and several global and regional models have been developed (Tegen and Fung, 1994; Nickovic and Dobricic, 1996; Benedetti et al., 2014). In this study, we focus on regional transport models set up over the domain of North Africa and Europe; these models are frequently used to predict dust transport over Europe and to explore the effects of dust in the European atmosphere.
As mentioned in the introduction, the four models used for the demonstration of the described methodology are BSC-DREAM8b v2, NMMB/BSC-Dust, DREAMABOL, and DREAM8-NMME-MACC. Being part of the SDS-WAS program, all models undergo near-real-time evaluation against satellite- and ground-based columnar observations. The Dust Regional Atmospheric Model (DREAM; Nickovic et al., 2001) is based on the Euler-type partial differential nonlinear equation for dust mass continuity and is driven by NCEP/Eta. It assumes a viscous sublayer between the smooth desert surface and the lowest model layer (Janjic, 1994; Nickovic et al., 2001). The updated version of the model is the BSC-DREAM8b v2 model (Pérez et al., 2006a, b; Basart et al., 2012b), which is developed and operated at the Barcelona Supercomputing Center, Spain (BSC; http://www.bsc.es/projects/earthscience/BSC-DREAM/). It includes a set of updates, such as an approximation of the dust size distribution by eight size bins, improved source representation, and updated wet and dry deposition schemes. The model has been extensively evaluated against observations (e.g., Pay et al., 2010; Basart et al., 2012a, b).
The DREAMABOL model is an online integrated regional mineral dust model developed at the Institute of Atmospheric Sciences and Climate, Bologna, Italy, as part of the atmospheric composition and meteorology model BOLCHEM (Mircea et al., 2008; Maurizi et al., 2011). The meteorological component is the BOLAM primitive equation hydrostatic model (Buzzi et al., 2003). The dust model part is inspired by DREAM (Nickovic et al., 2001) but is completely rewritten and includes different assumptions on the model source and on the wet removal (Maurizi and Monti, 2015). DREAMABOL has provided data to the SDS-WAS since June 2014 and has participated since then in the near-real-time evaluation.
The DREAM8-NMME-MACC model is developed and operated at the South East European Virtual Climate Change Center (SEEVCCC; http://www.seevccc.rs/), Serbia. The DREAM8 model is embedded in the NCEP Nonhydrostatic Mesoscale Model (NMM) on the E-grid (Janjic et al., 2001), while initial and boundary conditions are taken from the ECMWF global forecast. This version of DREAM8 assimilates the ECMWF dust analysis in the initial dust field, with dust sources defined from Ginoux et al. (2001). DREAM8-NMME-MACC provides daily dust forecasts available at the SEEVCCC website.
Finally, the NMMB/BSC-Dust model is a regional to global dust forecast system designed and developed at BSC in collaboration with NOAA NCEP, the NASA Goddard Institute for Space Studies and the International Research Institute for Climate and Society (IRI) (Pérez et al., 2011). It is an online multi-scale atmospheric dust model fully embedded into the NMM on the B-grid (Janjic et al., 2011). As with DREAM, this model assumes a viscous sublayer between the smooth desert surface and the lowest model layer, while it includes a physically based dust emission scheme, which explicitly takes into account saltation and sandblasting processes (White, 1979; Marticorena and Bergametti, 1995; Marticorena et al., 1997).
The NMMB/BSC-Dust model has been evaluated at regional and global scales (Pérez et al., 2011; Haustein et al., 2012). It provides operational dust forecasts for the Barcelona Dust Forecast Center (BDFC; http://dust.aemet.es/), the first specialized center of the WMO for dust prediction.
While each model has a different setup, they use a common description of the dust size distribution using eight size bins between 0.1 and 10 µm (Pérez et al., 2011), with intervals taken from Tegen and Lacis (1996) and Pérez et al. (2006a). Dust within each transport bin is assumed to have a time-invariant log-normal distribution (Zender et al., 2003), with the shape of the distribution fixed to a mass median diameter of 2.524 µm (Shettle, 1986) and a geometric standard deviation of 2.0 (Schulz et al., 1998). The dust mass in each bin depends on model processes. Many other subcomponents are shared between some of the models.
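To make the log-normal shape concrete, the following sketch computes the fraction of mass that a log-normal distribution with the stated mass median diameter and geometric standard deviation places between two sizes. The function name and the log-spaced bin edges are hypothetical (the actual intervals come from the cited works), and all sizes are treated as diameters in this sketch, which is an assumption.

```python
import math

def lognormal_mass_fraction(d_lo_um, d_hi_um, mmd_um=2.524, gsd=2.0):
    """Fraction of a log-normal mass distribution between two diameters."""
    def cdf(d_um):
        # Cumulative mass fraction below diameter d_um.
        arg = math.log(d_um / mmd_um) / (math.sqrt(2.0) * math.log(gsd))
        return 0.5 * (1.0 + math.erf(arg))
    return cdf(d_hi_um) - cdf(d_lo_um)

# Hypothetical edges: nine log-spaced values between 0.1 and 10 um.
edges = [0.1 * 10.0 ** (2.0 * k / 8.0) for k in range(9)]
fractions = [lognormal_mass_fraction(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
for lo, hi, f in zip(edges[:-1], edges[1:], fractions):
    print(f"{lo:6.3f}-{hi:6.3f} um: {f:.4f}")
```

By construction, half of the mass lies below the mass median diameter, which offers a quick sanity check on any implementation of this kind.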
In the present analysis, various model output fields at 3-hourly resolution are compared. The research teams at the modeling centers configured their model experiments independently, not necessarily following the setup of their respective daily operational forecasts. The spatial resolution, domain size, and initial and boundary conditions all differ, as do the physical parameterizations implemented in the models, which are summarized in Table 1.

Methodology
In this section we present the considerations for constructing the remote sensing data set and for choosing statistical indicators that can be used for the model and measurement cross-examination. Special attention is given to selecting a representative data set, avoiding possible biases due to the geographical restrictions of the measurement location, the selection of vertical resolution, and the effect of local dust sources in the study of the PBL. The considerations that guided our choices are given below.
www.atmos-meas-tech.net/8/3577/2015/ Atmos. Meas. Tech., 8, 3577-3600, 2015
As discussed in Sect. 2.2, synergistic retrieval algorithms avoid possible comparison biases caused by the presence of aerosol mixtures, by separating the dust contribution from that of other aerosol types. However, direct comparison with dust models should be done carefully, because the part of the aerosol identified as dust could differ depending on the selected algorithm. For example, in the case of LIRIC, dust is assumed to be a particle component larger than ∼ 0.5 µm in radius. On the other hand, the total dust load predicted by the models also includes smaller particle sizes in the first few bins of the dust size distribution. The contribution of these small particles to the total aerosol volume should typically be low, especially near the source (d'Almeida, 1987; Mahowald et al., 2014), but could become more important in a few cases of long-range dust transport where the larger particles have been gravitationally removed (Mamouri and Ansmann, 2014). When using a statistical approach that includes different locations and transport paths, as in the present study, these few cases are expected to have a small effect on the overall comparison. The exact amount of fine-mode transported dust is an open issue and should be further investigated. The fine-mode contribution, however, is expected to be important when performing a case study evaluation, and then only specific bins from the model output should be used instead.
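For a case study, restricting the model output to the size range seen by the retrieval could look like the sketch below. The bin edges, the function name, and the rule of dropping any bin that straddles the threshold are assumptions made for illustration; they are not taken from the models' documentation.

```python
import numpy as np

# Illustrative radius bin edges in um (assumed for this sketch).
BIN_EDGES_UM = np.array([0.1, 0.18, 0.3, 0.6, 1.0, 1.8, 3.0, 6.0, 10.0])

def coarse_dust(conc_by_bin, bin_edges_um=BIN_EDGES_UM, r_min_um=0.5):
    """Sum concentration over the bins lying entirely above r_min_um.

    conc_by_bin has shape (n_bins, n_levels). A bin that straddles the
    threshold is dropped, which is a deliberate simplification.
    """
    conc_by_bin = np.asarray(conc_by_bin, dtype=float)
    lower_edges = np.asarray(bin_edges_um, dtype=float)[:-1]
    if conc_by_bin.shape[0] != lower_edges.size:
        raise ValueError("number of bins does not match the bin edges")
    keep = lower_edges >= r_min_um
    return conc_by_bin[keep].sum(axis=0)

# Hypothetical model output: 8 bins, 3 vertical levels, unit concentration.
model = np.ones((8, 3))
print(coarse_dust(model))
```

A finer treatment could weight the straddling bin by the portion of its sub-bin distribution above the threshold, at the cost of one more assumption about that distribution.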
In the case of statistical model evaluation, the selected measurement profiles should also be independent in order to give a correct representation of the model performance. Specifically, the measurements used from each station should not sample the same event multiple times, but should instead measure independent dust transport events. This consideration is less important when using data from automatic instruments; in the case of EARLINET, however, the available data set could contain data from long observation periods and intensive measurement campaigns, as described in Sect. 2.1. Ideally, only a climatological data set would be used, but the number of available cases would be limited by the measurement frequency, the sporadic nature of dust transport episodes, and, when using synergistic algorithms, the availability of AERONET data. In this study, we consider measurements separated by at least 24 h to sample independent dust transport events, a separation compatible with the expected variability of tropospheric aerosols (Anderson et al., 2003a, b).
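The 24 h screening can be illustrated with a short sketch. It assumes a simple greedy rule that keeps the earliest measurement and then every measurement at least 24 h after the last one kept; the text does not state which profile of a close pair was retained, so this choice, the function name, and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def select_independent(times, min_gap=timedelta(hours=24)):
    """Keep measurements separated by at least `min_gap` (greedy, earliest first).

    Assumption: the earliest profile of each close group is the one retained.
    """
    kept = []
    for t in sorted(times):
        if not kept or t - kept[-1] >= min_gap:
            kept.append(t)
    return kept


# Hypothetical measurement times for one station
times = [
    datetime(2011, 4, 11, 9, 0),
    datetime(2011, 4, 11, 18, 0),   # 9 h after the first: same event, dropped
    datetime(2011, 4, 12, 10, 0),   # 25 h after the first: kept
    datetime(2011, 4, 14, 8, 0),
]
selected = select_independent(times)
```

The screening is applied per station, so the same dust event may still be sampled by different stations.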
The vertical resolution of lidar and dust model profiles should be taken into account during their comparison. The lidar signals have a raw vertical resolution of a few meters, and the final products have an effective resolution of a few hundred meters depending on the filtering procedures and smoothness constraints used in the retrieval (Pappalardo et al., 2004). The vertical resolution of the models, on the other hand, is typically coarser but depends on the vertical resolution of the meteorological driver (Simmons and Burridge, 1981; Mesinger, 1984). When performing a statistical comparison, the different vertical resolutions are less important, as the features of individual dust transport cases will be smoothed. When comparing aerosol extensive properties (both optical properties and concentrations), the remote sensing profiles should be upscaled to the model resolution. When, however, the aim of the comparison is to evaluate the dust-layer geometrical properties and values at a specific location, e.g., the peak concentration values, the finer-resolution remote sensing profiles should be used. In this study, in order to facilitate the comparison of models of different vertical resolutions, we interpolate all available profiles to a common 100 m vertical resolution. We used this resolution to examine the geometrical properties and peak concentration value of dust layers, but used 500 m averages to calculate the statistics on the vertical profiles presented in the next section. The models simulate the dust concentration profiles on a specified horizontal grid, so bilinear interpolation was used to estimate the concentration values at the exact location of each station. Linear interpolation was also used to estimate the concentration profiles at the exact time of the available measurements.
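The regridding steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the processing code of the study: the function names, the grid starting at 0 m, and the example values are assumptions.

```python
def interp_linear(x, xp, fp):
    """Linear interpolation of fp(xp) at x; None outside the sampled range."""
    if x < xp[0] or x > xp[-1]:
        return None
    for i in range(len(xp) - 1):
        if xp[i] <= x <= xp[i + 1]:
            w = (x - xp[i]) / (xp[i + 1] - xp[i])
            return (1 - w) * fp[i] + w * fp[i + 1]


def regrid(z, c, step=100.0, top=10000.0):
    """Resample profile c(z) onto a regular grid from 0 m to `top` metres."""
    grid = [k * step for k in range(int(top / step) + 1)]
    return grid, [interp_linear(zk, z, c) for zk in grid]


def layer_means(grid, values, thickness=500.0):
    """Average the valid values inside consecutive layers of given thickness."""
    layers = {}
    for zk, v in zip(grid, values):
        if v is not None:
            layers.setdefault(int(zk // thickness), []).append(v)
    return {k * thickness: sum(v) / len(v) for k, v in sorted(layers.items())}


def bilinear(tx, ty, f00, f10, f01, f11):
    """Bilinear interpolation inside one model grid cell; tx, ty in [0, 1]
    are the fractional positions of the station between the cell corners."""
    return ((1 - tx) * (1 - ty) * f00 + tx * (1 - ty) * f10
            + (1 - tx) * ty * f01 + tx * ty * f11)


# Hypothetical coarse model profile: altitude (m), concentration (ug m^-3)
z_model = [0.0, 1000.0, 2000.0]
c_model = [0.0, 100.0, 0.0]
grid, fine = regrid(z_model, c_model, step=100.0, top=2000.0)
coarse = layer_means(grid, fine, thickness=500.0)

# Station in the middle of a cell with hypothetical corner values
station_value = bilinear(0.5, 0.5, 0.0, 10.0, 20.0, 30.0)
```

The same `interp_linear` helper could be reused for the interpolation in time between two consecutive model output steps.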
Correct representation of the dust mixing in the PBL can impact the forecasted air quality and also affect the removal processes of dust in the model. In this process, dust is mixed with locally produced aerosols, so lidar optical profiles cannot be used to directly study the dust effect. Mass retrieval algorithms, like LIRIC, are able to separate the dust component in the PBL and give some insight into this process, even though several limitations remain. Firstly, local dust sources could contribute to the dust load in the PBL (Korcz et al., 2009), although the exact effect of such sources on the vertical dust distribution has, to our knowledge, not been systematically studied. Secondly, as dust comes in contact with other types of particles and high relative humidity, some of the assumptions of the retrieval algorithms could be invalid. For example, it is reasonable to assume that a polluted and humid PBL will lead to dust being coated and to a water layer forming on the dust particles, changing their optical properties (Levin et al., 1996; Kumar et al., 2011b; Perry et al., 2004). Such an effect could be important for the exact quantitative characterization of dust but does not completely prevent studying the mixing of dust in the PBL. Lastly, most lidar systems reach full overlap only at a considerable height and can therefore only detect the initial mixing of dust in the upper parts of the PBL. Given these factors, the mixing process is better studied in specific case studies. If a statistical approach is followed, the data set should be large enough to give significant results, as only a few profiles cannot capture this dynamical mixing phenomenon.
The direct output of all the synergistic retrieval algorithms mentioned before is volume concentration profiles of fixed aerosol types. These can be converted to mass concentration profiles, the typical output of dust transport models, by using the aerosol bulk density. In the case of dust, the typically used value is 2.6 g cm−3 (Köpke et al., 1997; Ansmann et al., 2012), while the actual bulk density could differ by location (e.g., Todd et al., 2007). In the case of dust model evaluation, however, selecting a value of 2.6 g cm−3 is compatible with the assumptions of most dust transport models (Tegen and Fung, 1994; Nickovic et al., 2001; Yumimoto et al., 2012), and thus a further reason for discrepancies is removed from this study. We perform the comparison firstly by examining single statistical indicators of each measurement case and secondly by looking into indicators at different altitude ranges. This approach allows assessing both the total performance of the models and the detailed performance across the profile. The single parameters examined are center of mass, total concentration, peak concentration value, and dust-layer thickness. For the profile parameters, apart from the average profiles, we examine the mean bias error, correlation coefficient, root mean square error, and fractional gross error. This set of parameters was chosen because it can provide a detailed view of performance while remaining compatible, as much as possible, with the metrics already in use in the SDS-WAS columnar evaluation.
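The volume-to-mass conversion is a single multiplication once the units are fixed. The sketch below assumes the volume concentration is expressed in µm³ cm⁻³ (an assumption about the retrieval output units), in which case the unit factors cancel and the density in g cm⁻³ gives the mass concentration directly in µg m⁻³.

```python
DUST_DENSITY_G_CM3 = 2.6  # bulk dust density used in the text


def volume_to_mass(volume_um3_cm3, density_g_cm3=DUST_DENSITY_G_CM3):
    """Convert volume concentration [um^3 cm^-3] to mass concentration [ug m^-3].

    1 um^3 cm^-3 = 1e-12 (dimensionless volume fraction), and
    1 g cm^-3 = 1e12 ug m^-3, so the two factors cancel.
    """
    return [v * density_g_cm3 for v in volume_um3_cm3]


# Hypothetical coarse spheroid mode profile in um^3 cm^-3
mass_profile = volume_to_mass([0.0, 10.0, 25.0])
```

With these units, a retrieved volume concentration of 25 µm³ cm⁻³ corresponds to 65 µg m⁻³ of dust.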
An important indicator for model vertical profiles is the center of mass (CoM), a parameter that gives in a single number an indication of the altitude of the dust distribution. In cases where a single aerosol layer is present in the atmosphere, the CoM gives an indication of its mean altitude; in the case of multiple layers, however, the CoM could be located in areas without any considerable dust load (Mona et al., 2006, 2014).
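A minimal sketch of the CoM calculation follows, assuming a regular vertical grid so that the concentration-weighted mean altitude reduces to a weighted sum; the function name and the profiles are hypothetical. The second example reproduces the multi-layer caveat mentioned above.

```python
def center_of_mass(z, c):
    """Concentration-weighted mean altitude (regular vertical grid assumed)."""
    total = sum(c)
    if total <= 0:
        raise ValueError("profile contains no dust")
    return sum(zi * ci for zi, ci in zip(z, c)) / total


z = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]  # altitude in m

# Single lofted layer: the CoM sits at the layer's mean altitude
com_single = center_of_mass(z, [0.0, 10.0, 40.0, 10.0, 0.0])

# Two separate layers: the CoM falls at 3000 m, where there is no dust
com_double = center_of_mass(z, [50.0, 0.0, 0.0, 0.0, 50.0])
```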
The second single-value measure to compare is the dust total concentration, C, calculated across the altitude range where both measured and model profiles provide valid results. This comparison therefore differs slightly from a direct comparison of columnar values, as in the case of comparing photometer and total column model values. In the latter case, the range used includes the lowest few hundred meters of the profile, thus including the contribution of local dust sources to the total column aerosol load and possibly producing a bias in the measurements.
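The restriction to the common altitude range can be sketched as below. The trapezoidal rule, the use of `None` for invalid levels, and all names and values are assumptions made for illustration.

```python
def common_range_load(z, obs, mod):
    """Integrate both profiles (ug m^-3) over the altitude range where both
    are valid (not None), using the trapezoidal rule. Returns g m^-2."""
    obs_load = mod_load = 0.0
    for i in range(len(z) - 1):
        ends = (obs[i], obs[i + 1], mod[i], mod[i + 1])
        if any(v is None for v in ends):
            continue  # skip segments not covered by both profiles
        dz = z[i + 1] - z[i]
        obs_load += 0.5 * (obs[i] + obs[i + 1]) * dz
        mod_load += 0.5 * (mod[i] + mod[i + 1]) * dz
    return obs_load * 1e-6, mod_load * 1e-6  # ug m^-2 -> g m^-2


# Hypothetical case: the lidar profile is missing below 1 km and above 2 km
z = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]
obs = [None, 100.0, 200.0, 100.0, None]
mod = [50.0, 50.0, 100.0, 50.0, 25.0]
c_obs, c_mod = common_range_load(z, obs, mod)
```

In this example the model dust below 1 km and above 2 km is ignored, so both totals refer to the same 1 to 2 km range.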
A third metric examined is the peak value of the profile, P. In cases where the main dust mass is located near the ground, the lidar system can fail to detect the true maximum and instead show a maximum value at the lowest point of the profile, i.e., the first point of full overlap. In these cases we took as the maximum value the peak of the first lofted layer, located as the first peak after the first local minimum of the concentration profile.
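The peak selection rule can be sketched as follows. The handling of a monotonically decreasing profile (returning the lowest point because no lofted layer exists) is an assumption, as the text does not cover that case; the names and values are hypothetical.

```python
def peak_value(c):
    """Peak concentration of a profile ordered from the lowest altitude upward.

    If the maximum sits at the lowest point, return instead the first peak
    found after the first local minimum (first lofted layer).
    """
    i_max = max(range(len(c)), key=c.__getitem__)
    if i_max != 0:
        return c[i_max]
    # Descend from the lowest point to the first local minimum
    i = 0
    while i + 1 < len(c) and c[i + 1] <= c[i]:
        i += 1
    # Climb from that minimum to the first peak above it
    j = i
    while j + 1 < len(c) and c[j + 1] >= c[j]:
        j += 1
    if j == i:
        return c[0]  # assumption: no lofted layer, keep the lowest point
    return c[j]


lofted = peak_value([10.0, 30.0, 70.0, 20.0])                       # ordinary maximum
near_ground = peak_value([80.0, 40.0, 20.0, 35.0, 60.0, 30.0, 10.0])  # lofted peak
decreasing = peak_value([50.0, 30.0, 10.0])                          # fallback case
```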
The fourth metric examined is the dust-layer thickness, l. It is defined here as the region where the dust concentration exceeds a certain limit, here chosen as 5 µg m−3. In previous studies the layer thickness was defined using the derivative of the lidar signal (e.g., Mona et al., 2014). We use a threshold approach to overcome limitations related to the smoothing included in many volume retrieval algorithms.
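On a regular grid, the threshold definition amounts to counting the levels above the limit. The sketch below assumes a 100 m grid and sums all levels above the threshold, including those in separate layers; whether disjoint layers were summed in the study is not stated, so this is an assumption.

```python
def layer_thickness(c, step_m=100.0, threshold=5.0):
    """Total thickness (m) of levels with concentration above `threshold`
    (ug m^-3) on a regular grid with spacing `step_m`."""
    return step_m * sum(1 for v in c if v is not None and v > threshold)


# Hypothetical profile in ug m^-3 on a 100 m grid
thickness = layer_thickness([2.0, 6.0, 20.0, 40.0, 12.0, 4.0, 1.0])
```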
Based on these metrics, we quantify the performance of each model by calculating the correlation coefficient r and fractional bias F_B for all the available cases. To make the values more robust, we exclude outliers that could strongly affect them. Specifically, for each point we calculate the difference between model and observation and exclude points where this difference is more than 4 standard deviations from the mean difference.
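The outlier screening and the two summary statistics can be sketched as below. The fractional bias is written in a commonly used form, 2(M̄ − Ō)/(M̄ + Ō), and the population standard deviation is used for the screening; both are assumptions, and the data are synthetic.

```python
from math import sqrt


def mean(x):
    return sum(x) / len(x)


def exclude_outliers(model, obs, n_sigma=4.0):
    """Drop pairs whose model-observation difference lies more than
    `n_sigma` standard deviations from the mean difference."""
    diff = [m - o for m, o in zip(model, obs)]
    mu = mean(diff)
    sigma = sqrt(sum((d - mu) ** 2 for d in diff) / len(diff))
    keep = [abs(d - mu) <= n_sigma * sigma for d in diff]
    return ([m for m, k in zip(model, keep) if k],
            [o for o, k in zip(obs, keep) if k])


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fractional_bias(model, obs):
    """Fractional bias, assumed form: 2 (mean(M) - mean(O)) / (mean(M) + mean(O))."""
    return 2.0 * (mean(model) - mean(obs)) / (mean(model) + mean(obs))


# Synthetic example: 30 cases, one of them with a gross model error
obs = [float(i) for i in range(1, 31)]
model = [o + (0.1 if i % 2 else -0.1) for i, o in enumerate(obs)]
model[10] += 50.0
model_ok, obs_ok = exclude_outliers(model, obs)
r = pearson_r(model_ok, obs_ok)
fb = fractional_bias(model_ok, obs_ok)
```

Note that with N points the largest possible deviation is √(N − 1) standard deviations, so a 4σ criterion can only remove points when at least 18 cases are available.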
Figure 1 sketches the steps used to perform the comparison of a model and an observation profile. The example measurements were performed at Potenza, Italy (40.60° N, 15.72° E) on 11 April 2011, when a strong lofted dust layer was observed at 3-5 km. The LIRIC retrieval is performed based on the input of raw lidar signals and the AERONET microphysical retrieval (left plots). The retrieval outputs are volume concentration profiles for fine, coarse spherical, and coarse spheroid modes (center plot). In this specific case, the coarse spherical mode concentration is almost zero at all altitudes. Dust mass concentration profiles are calculated using the retrieved coarse spheroid mode concentration and assuming a bulk dust density of 2.6 g cm−3. These profiles are compared with model profiles that are interpolated to the station location and measurement time using linear interpolation (right plot). The right panel of the figure includes the described statistical indicators that summarize the similarities and differences of the two profiles.
Profile statistical indicators are calculated by first averaging the compared profiles at 500 m resolution and then computing a set of statistics for each altitude range. This resolution was chosen as a trade-off between detailed aerosol structure and the signal noise of the lidar measurements. This value, however, needs to be determined in each study based on the number of available profiles. Apart from the mean value profiles, the first set of metrics used are the mean bias and the root mean square error (RMSE); being expressed in units of concentration, these values are suitable for the intercomparison of models but can be misleading when assessing how model performance changes with altitude. In addition, RMSE is strongly dominated by the largest values, due to the squaring operation, so in cases where prominent outliers occur, RMSE becomes less useful and its interpretation more difficult. These limitations are addressed using a second set of statistical indicators, including the correlation coefficient, fractional bias, and fractional gross error. Fractional bias is a normalized measure of the mean bias and indicates only systematic errors that lead to under- or overestimation of the measured values. Similarly, the fractional gross error is a positively defined indicator that gives the same weight to under- and overestimation. Definitions of the statistical indicators used are given in Table 2.
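The behavior of these indicators can be illustrated with a short sketch for a single altitude layer. The formulas are the commonly used forms of the mean bias, RMSE, and fractional gross error and are assumed here to correspond to the definitions of Table 2; the values are hypothetical.

```python
from math import sqrt


def mean_bias(model, obs):
    """Mean of model minus observation, in concentration units."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def rmse(model, obs):
    """Root mean square error, dominated by the largest differences."""
    return sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def fractional_gross_error(model, obs):
    """Assumed form: (2 / N) * sum(|M_i - O_i| / (M_i + O_i)), range 0 to 2."""
    return 2.0 / len(obs) * sum(abs(m - o) / (m + o) for m, o in zip(model, obs))


# Hypothetical 500 m layer averages (ug m^-3) for three cases
model = [20.0, 40.0, 60.0]
obs = [40.0, 40.0, 120.0]

mb = mean_bias(model, obs)
err = rmse(model, obs)
fge = fractional_gross_error(model, obs)

# A factor-of-two underestimate and overestimate give the same value
fge_under = fractional_gross_error([50.0], [100.0])
fge_over = fractional_gross_error([200.0], [100.0])
```

The last two lines show the symmetry mentioned in the text: the mean bias of the two single-case examples differs in sign and size, while the fractional gross error is identical.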

Results and discussion
In this section we apply the described methodology to simulations performed by the four models described in Sect. 2.3. The aim is not to perform a full model evaluation. As described in the introduction, this would require the use of a set of complementary remote sensing and in situ measurements. Instead, this section aims to demonstrate the potential of the new concentration retrieval algorithms in future model evaluation activities.
Ten European remote sensing stations contributed data to this intercomparison, mainly concentrated in the Mediterranean area, as shown in Fig. 2. Their locations and the data supplied are listed in Table 3. All stations are part of the EARLINET and AERONET networks, a fact that guarantees that the provided data are of uniform quality. The participating stations provided, in total, 55 LIRIC retrievals of dust profiles for an agreed time period from January 2011 to February 2013. The number of measurements is limited by the sporadic nature of dust transport events, the requirement for simultaneous lidar and AERONET observations, and the available manpower for manual analysis of each case. Each station selected the cases and performed the LIRIC retrievals independently, based on the available measurements. For each station, the selected profiles were screened for having at least 24 h time distance, as described before, to consider only measurements of different dust transport events. The time difference between lidar and photometer measurements was kept as small as possible (65 % within 1 h, 87 % within 3 h). In all cases, attention was given to having stable atmospheric conditions between the measurements of the two instruments. A set of quality checks was performed to ensure the consistency of these measurements. The AOD difference between the time of photometer and lidar measurements was kept below 30 %, with the average difference for the data set being 0.36 %, as shown in the first panel of Fig. 3. The AOD was calculated using mainly the AERONET direct sun measurements or the lidar Raman extinction retrieval if available. A similar check was performed for the aerosol fine-mode fraction (FMF), to detect possible changes in the aerosol mixture. In all cases, the FMF difference was kept below 20 % and the average difference of the data set is 0.44 %, as shown in the second panel of Fig. 3. These values indicate that, on average, the AOD and FMF changes are not expected to introduce any bias in the data set. Additionally, for each case we performed back-trajectory analysis for both photometer and lidar measurements and checked qualitatively for any significant changes in the air-mass origin. All inversions were made using either level 1.5 (cloud-screened) or level 2 (cloud-screened and quality-assured) AERONET data. We have used photometer measurements with AOD greater than 0.1 at 440 nm, as shown in the right panel of Fig. 3. This value is lower than the AERONET level 2 quality limit; nevertheless, we used it as a compromise to allow the study of weaker dust transport events.
The majority of cases occur during the spring and summer periods (see left panel of Fig. 4), when most Saharan dust transport episodes occur over Europe and cloud-free conditions, needed for the measurements, are usually found (Mona et al., 2006; Papayannis et al., 2008). The selection of cloud-free sky could bias our sampling towards meteorological conditions and transport paths that favor such cloud-free weather. The actual number of available measurements varies with altitude, as shown in the right panel of Fig. 4. At the lower altitudes, the number is limited by the ground level altitude of the stations and the incomplete measurement range of the instruments. At the higher altitudes, the lidar profiles were cut at the point where no further dust was detected. The observational data set was selected to include only dust cases and the results should be interpreted accordingly. For example, if models represent the correct amount of dust but predict its arrival at a different time, this would appear as a negative model bias in our comparison. For evaluating the simulated aerosol burden, an observational strategy with systematic measurements should be followed. The four examined dust transport models were run for the given period and the output was stored at 3 h intervals.
The comparison based on center of mass (CoM) reveals that models correctly track the main vertical location of transported dust. The first row of Fig. 5 presents this comparison for the four models, and shows that the models perform well when simulating the dust CoM in almost all cases. The difference between predicted and measured CoM exceeds 1 km in only 2 cases (4 %) for BSC-DREAM8b v2, 3 cases (5 %) for DREAMABOL, 8 cases (15 %) for NMMB/BSC-DUST, and 6 cases (11 %) for DREAM8-NMME-MACC. The BSC-DREAM8b v2 and DREAMABOL models show almost zero bias, tracking the location of dust almost perfectly except in a few outlying cases. These are cases where the model practically does not predict the transport event, and the CoM is determined by some residual concentration in the profile. In contrast, NMMB/BSC-DUST and DREAM8-NMME-MACC overestimate the center of mass altitude, especially in cases with observed CoM above 3 km; the fractional bias values for NMMB/BSC-DUST and DREAM8-NMME-MACC are 0.16 and 0.14, respectively. The correlation coefficient values for the four models are 0.67 for BSC-DREAM8b v2, 0.81 for DREAMABOL, 0.74 for NMMB/BSC-DUST, and 0.83 for DREAM8-NMME-MACC.
Our examination indicates that the four models systematically simulate a lower total amount of dust relative to the LIRIC profiles. The second row of Fig. 5 presents the comparison of the dust concentration integrated across the common altitude range for each case. The mass concentration from the four models shows significant correlation with the measured one, but in general it is underestimated. For high concentration cases (values greater than ∼ 0.3 g m−2), NMMB/BSC-DUST and DREAM8-NMME-MACC predict the concentration values sufficiently well, while the other two models tend to underestimate them. For low concentration values (less than 0.3 g m−2), all models apart from DREAM8-NMME-MACC underestimate the dust concentration in many cases. This could be caused by insufficient dust source strength, overestimated deposition and wet scavenging parameters, or a combination of both; the current data set is not sufficient to discriminate the exact factor affecting the comparison from the model point of view. It is believed, however, that using the present approach as part of a complete, multi-sensor evaluation exercise would help in investigating possible model limitations. The improved performance of DREAM8-NMME-MACC could be attributed to the assimilation scheme used only by this model. The total fractional bias values for the models range from −1.00 to −0.22, while correlation coefficients range from 0.51 to 0.83.
The third row of Fig. 5 shows the relationship between the peak simulated values for each profile and the measured ones. Also in this case, the models underestimate the maximum value of each profile. The fractional bias for the four models ranges from −0.85 to −0.27, while the correlation coefficient has smaller values than before (from 0.61 to 0.78). This result can only partly be explained by the overall concentration underestimation that was noted before. The lower original resolution of the models, compared to the lidar, could lead to a "smoothing" effect on individual peak values in the compared cases. A similar effect could be caused by the mixing of the dust over the whole volume of the model's grid cell. The last row of Fig. 5 compares the dust-layer thickness parameter, i.e., regions where the dust concentration is above 5 µg m−3. All models show good performance in predicting the dust layer, but there are individual differences. The BSC-DREAM8b v2 and DREAMABOL models systematically underpredict the dust-layer thickness, probably due to the underrepresentation of dust concentration. DREAM8-NMME-MACC systematically overpredicts the dust-layer thickness, as it spreads the observed dust to higher altitudes and in many cases does not correctly reproduce the top-layer boundary. The effect of our sampling strategy (only cases with observed dust) is apparent in the low values of these plots: in several cases the models do not predict dust transport, while we miss the cases where models predict dust when none is observed. The fractional bias ranges from −0.45 to 0 and the correlation coefficient from 0.56 to 0.70. A summary of the aforementioned statistical indicators for all the examined models is given in Table 4.
In summary, the current study indicates that the examined dust models represent the altitude of transport well, while the total concentration is predicted to be lower than measured, with sharp peaks smoothed out. The performance of models in specific cases, however, can vary significantly. Figure 6 summarizes the performance of all models on a case-by-case comparison. For each model-measurement pair we calculate the vertical correlation coefficient of the volume concentration profiles as well as the fractional bias, and the results are plotted in a scatterplot. We assess the model variability for each case by calculating the same parameters for model profiles at −3 and +3 h from the observations, and depict the range of values with the error bars. The ideal model would have a correlation of 1, i.e., it would predict perfectly the shape of the dust profile, and a fractional bias of 0, i.e., it would predict correctly the quantity of transported dust. While individual cases show a large variability, each model shows a characteristic pattern. For BSC-DREAM8b v2 and DREAMABOL, most cases have high correlation but negative fractional bias, i.e., the models can often predict correctly the shape of the dust profile but underestimate the concentration. In contrast, NMMB/BSC-DUST and DREAM8-NMME-MACC have a fractional bias distribution near 0 but a wider spread of correlation values. For all models there is a considerable spread of values for the specific comparisons, a further argument for the need for a statistical evaluation of dust model performance.
These results are further supported by directly comparing the profile data provided by the models, indicating that models not only capture the general altitude of dust transport but, on average, also predict correctly the shape of the dust profile. In Fig. 7 the mean measured concentration profile for all 55 cases is compared with the corresponding profiles of the four models. The profiles show good agreement in the predicted shape of the dust concentration, but have a wider spread in the absolute values. BSC-DREAM8b v2 and DREAMABOL predict the maximum dust concentration in the region 2-3 km a.s.l., in agreement with the observations, while the other two models have the maximum value at a slightly higher altitude of 3-4 km. DREAM8-NMME-MACC overestimates the concentration of dust at altitudes above ∼ 5 km; specifically, while the observed values of dust are below 10 µg m−3 above 6 km, the model predicts these values only above 8 km. The concentration values show a wider discrepancy: while the peak value of the mean profiles is retrieved at ∼ 65 µg m−3, the model peak values range from ∼ 30 to ∼ 50 µg m−3. The increased concentration at high altitudes in some models could be related to misrepresentation of the tropopause (Janjic, 1994; Mona et al., 2014), which normally limits the maximum altitude of dust transport. At higher altitudes, the main removal mechanism of dust is sedimentation, and the removal of any dust reaching high altitudes is slower, allowing the artificial accumulation of dust. When examining the profile data, we can observe the differences between high and low concentration cases that were described before, as shown in Fig. 8. NMMB/BSC-DUST and DREAM8-NMME-MACC have particularly good agreement in the high concentration cases. As noted before, such findings highlight the importance of a statistical comparison approach and indicate that this trend should be investigated in a future complete evaluation study.
The above results are further explored in Fig. 9. The top left panel presents the mean bias of the four studied models. All models show negative bias below 4 km, while above that altitude NMMB/BSC-DUST has almost 0 bias and DREAM8-NMME-MACC has positive bias values. At the altitude range where most dust is located, i.e., from 2 to 4 km a.s.l., the biases range from −46 to −5 µg m−3. Such bias is compatible with a similar comparison against lidar extinction profiles (Mona et al., 2014). The top right panel shows the root mean square error; in the same altitude range, the mean values range from 40 to 70 µg m−3, with the maximum value reached by DREAMABOL at 2 km a.s.l. The profiles of the correlation coefficient for the four models are shown in the bottom left panel. All four models show significant correlation for altitudes ranging from 1 to 6 km, which is the region where most dust particles are typically observed (Mona et al., 2006). The mean values range from 0.52 for DREAMABOL to 0.68 for NMMB/BSC-DUST. Finally, the bottom right panel shows the fractional gross error profiles. The minimum values for F_E, ranging from 0.73 to 1.09, are observed at 2-4 km. At higher altitudes, the F_E values are higher, with values ranging from 1.18 to 1.56 at 6 km a.s.l. A summary of the different behavior of the four models is given in Fig. 10 using Taylor diagrams (Taylor, 2001). The data of the models and measurements were averaged in 1 km thick altitude bins (from 1 to 2 km, from 2 to 3 km, etc.) before calculating the statistics, to give an overview of the model performance at different altitudes. Four Taylor diagrams are presented, for the altitude range from 1 to 5 km. DREAM8-NMME-MACC seems to capture correctly the range of values of the dust events in all altitude ranges, a property that can partly be attributed to the use of data assimilation. NMMB/BSC-DUST shows similarly good performance, especially from 3 to 5 km. As observed before, the other two models underestimate the variability of dust consistently with altitude. The model simulations have correlations from 0.4 to 0.8 in all four altitude ranges.
The presented results depend on regional and seasonal variations. The number of available cases is not sufficient to perform a seasonal analysis or to study in detail a regional (or even a per-station) performance. However, we consider that they can still be used to get a hint of the insight that can be gained from a regional evaluation of the model performance. With this aim, the available stations were divided into two clusters, a west and an east one. The west cluster of stations, including Évora, Granada, and Barcelona, is affected by dust events arriving only after a few days of transport. The east cluster, including Potenza, Lecce, Athens, Thessaloniki, and Bucharest, is affected by longer transport of dust from both the west and central Sahara. The top row of Fig. 11 presents the regional comparison of the mean dust concentration profiles. The average profiles indicate that the dust is transported at different altitudes, with the maximum value observed around 2 km a.s.l. for the west cluster and around 3 km a.s.l. for the east cluster, a behavior that is well captured by all models. The correlation coefficient at all altitudes is higher for the east than for the west cluster, as shown in the bottom row of the same figure. Specifically, the average correlation in the altitude range from 2 to 5 km ranges from 0.46 to 0.72 for the west cluster and from 0.56 to 0.82 for the east cluster. This difference can be attributed to the strong effect of orography on the west cluster, as the Atlas Mountains and the orography of the Iberian Peninsula make the prediction of the dust transport difficult, while the transport to the east cluster takes place, for a large part of the transport path, over the Mediterranean Sea. Misrepresentation of wet convection events in the region of the Atlas Mountains, a known problem of regional dust models, can also contribute to this discrepancy (Reinfried et al., 2009; Solomos et al., 2012). Additionally, the longer transport to the east cluster, typically 1-2 days longer according to back-trajectory analysis, homogenizes the dust transport event and makes small inconsistencies in space and time less relevant. These preliminary results indicate that the regional aspects in the prediction of the vertical distribution of dust should be further studied.

Conclusions
A methodology for the examination of dust model data using volume concentration profiles retrieved from the synergy of lidar and sun photometer has been presented. The proposed approach adapts previous experience from SDS-WAS to the use of dust volume concentration profiles. The methodology was applied to the examination of four dust models using 55 dust concentration profiles retrieved from EARLINET/AERONET stations across Europe using the LIRIC algorithm.
This first comparison indicated that dust models correctly represent the dust structure on average, but their performance in simulating individual event structure and the exact amount of dust should be further explored. The four models can individually predict different aspects of dust transport, but show considerable differences in their performance despite many similarities in their setup, including the number of dust size bins and deposition processes. Previous studies, examining an older version of the DREAM model (Kishcha et al., 2007), have indicated a good agreement between model and lidar volume estimates, and this indicates the need for further investigation of both algorithms and models. Understanding the causes of model and observation discrepancies should be the topic of future evaluation studies including a variety of sensors, e.g., AERONET photometers, satellite AOD measurements, and in situ measurements from PM10 monitoring stations, to explore different aspects of dust modeling systems. Overall, the study hints that an ensemble of dust model products would better simulate the dust observations, even if some discrepancies would remain.
Additionally, the results point towards future developments needed in the observational infrastructure and remote sensing algorithms used. The number of available remote sensing measurements should be increased to allow better characterization of regional and seasonal aspects of model performance. For this to happen, automatic retrieval algorithms and continuously operating lidar systems should be developed and used. This would also allow the near-real-time evaluation of dust models, providing important feedback both to modelers and end-user communities. A further step needed from the perspective of the retrieval algorithms is a better characterization of the error, both at the statistical and systematic levels. This will allow distinguishing more subtle effects in different model setups. Such improvements are actively pursued in the framework of ACTRIS-2 and other projects across Europe.
Overall, we believe that this study is an important step toward the systematic use of remote sensing atmospheric profiling measurements in model evaluation studies. The increased availability of advanced profiling data from multiwavelength lidars and sun photometers will form a solid base to improve dust model performance and lead to a better understanding of the effect of dust on air quality, weather, and climate.

Figure 2. Map of the ACTRIS/EARLINET remote sensing stations providing data for testing the proposed methodology.

Figure 3. Quality analysis of the LIRIC data set: (a) difference of AOD between lidar and photometer measurements, (b) difference of fine-mode fraction between lidar and photometer measurements, (c) histogram of photometer AOD for all cases. The red lines in (a) and (b) indicate the mean value of the data set.

Figure 4. Number of available measurements (a) per month and (b) per altitude.

Figure 5. Comparison of single statistical indicators (rows) for the four models (columns) against LIRIC retrievals. First row shows the center of mass (CoM), second row the total concentration (C), third row the peak concentration (P), and fourth row the dust-layer thickness (l). The model error bars represent the value for −3 and +3 h from the time of measurements. LIRIC error bars show indicative error values of 30 % for concentration, 10 % for center of mass, 30 % for peak concentration, and 20 % for dust-layer thickness. These values are only approximate, as the full characterization of LIRIC uncertainties is still an open issue.

Figure 6. Scatter plot of vertical correlation and fractional gross error. Black dots represent the ideal performance (0, 1). Each point on the plot corresponds to a pair consisting of one LIRIC and one model profile. The error bars represent the value for model profiles −3 and +3 h from the time of measurements. The bars on the axis indicate the univariate distribution of the data for each variable.

Figure 7. Average profile comparison as simulated by four models and retrieved by LIRIC. Shaded areas indicate the standard deviation of the mean values.

Figure 8. Comparison of average profiles simulated by all four models for low and high concentration cases, separated at 0.3 g m−2. Shaded areas indicate the standard deviation of the mean values.

Figure 9. Comparison of profile statistics for the four models against LIRIC measurements. (a) Mean bias, (b) root mean square error, (c) correlation coefficient r, (d) fractional gross error F_E. Gray shading indicates altitude ranges with fewer than 15 profiles available.

Figure 11. Comparison of west and east station cluster performance. The top row shows mean concentration profiles for the two station clusters. Shaded areas indicate the standard deviation of the mean value. The bottom row shows correlation coefficient profiles for the two clusters. Gray shading indicates altitude ranges with fewer than 15 profiles available.

Table 2. Definition, symbol, value range, and ideal score for the statistical performance indicators used in the systematic examination of dust model concentration profiles. c denotes the concentration at altitude z. M_i and O_i represent modeled and observed profiles, respectively, for the ith measurement pair. Altitude dependence is omitted for brevity.

Table 3. The following 10 stations provided dust concentration profiles retrieved by the LIRIC algorithm. Three measurements of the Évora station do not include depolarization information. The provided references give further information for each station and the measurement instruments.
Columns: location (° N, ° E), altitude (m), lidar channels, no. of profiles, reference.

Table 4. Correlation coefficient (r) and fractional bias (F_B) for single value metrics of the compared profiles.