MAX-DOAS observations of aerosols, formaldehyde and nitrogen dioxide in the Beijing area Comparison of two profile retrieval approaches

. A 4-year data set of MAX-DOAS observations in the Beijing area (2008–2012) is analysed with a focus on NO 2 , HCHO and aerosols. Two very different retrieval methods are applied. Method A describes the tropospheric proﬁle with 13 layers and makes use of the optimal estimation method. Method B uses 2–4 parameters to describe the tropospheric proﬁle and an inversion based on a least-squares ﬁt. For each constituent (NO 2 , HCHO and aerosols) the retrieval outcomes are compared in terms of tropospheric column densities, surface concentrations and “characteristic proﬁle heights” (i.e. the height below which 75 % of the ver-tically integrated tropospheric column density resides). We ﬁnd best agreement

butions of these near-surface concentrations show however a quite good agreement, and this indicates that near-surface concentrations derived from MAX-DOAS are certainly useful in a climatological sense. A major difference between the two methods is the dynamic range of retrieved characteristic profile heights which is larger for method B than for method A. This effect is most pronounced for HCHO, where retrieved profile shapes with method A are very close to the a priori, and moderate for NO 2 and aerosol extinction which on average show quite good agreement for characteristic profile heights below 1.5 km.
One of the main advantages of method A is the stability, even under suboptimal conditions (e.g. in the presence of clouds). Method B is generally more unstable and this explains probably a substantial part of the quite large relative differences between the two methods. However, despite a relatively low precision for individual profile retrievals it appears as if seasonally averaged profile heights retrieved with method B are less biased towards a priori assumptions than those retrieved with method A. This gives confidence in the result obtained with method B, namely that aerosol extinction profiles tend on average to be higher than NO 2 profiles in spring and summer, whereas they seem on average to be of the same height in winter, a result which is especially relevant in relation to the validation of satellite retrievals.
Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
Multi-Axis Differential Optical Absorption Spectroscopy (MAX-DOAS) is a ground-based passive remote sensing technique that is used to detect tropospheric trace gases such as nitrogen dioxide (NO 2 ), formaldehyde (HCHO), sulfur dioxide (SO 2 ), nitrous acid (HONO), iodine oxide (IO), glyoxal (CHOCHO), bromine oxide (BrO) and aerosols (aerosol extinction) (e.g. Wittrock et al., 2004;Wagner et al., 2004Wagner et al., , 2009Irie et al., 2011;Coburn et al., 2011;Hendrick et al., 2014;Wang et al., 2014). MAX-DOAS instruments take spectral measurements of scattered sunlight in the ultraviolet (UV) and visible (Vis) part of the electromagnetic spectrum. Profile information is obtained from a scan which comprises spectral measurements at different elevation angles but in the same azimuthal direction. The main retrieval products are tropospheric column densities, concentrations near the surface and estimates of the vertical profile shape.
Because of this versatility MAX-DOAS is complementary to ground-based in situ observations (in a spatial sense) as well as to satellite observations (in a temporal and spatial sense, i.e. the vertical) and it can play an important role in bridging the gap between those techniques . Knowledge of the relationship between surface concentrations and integrated tropospheric column densities (in urban, suburban and rural regions) is important for the use of satellite observations in studies of air quality (e.g. Boersma et al., 2009;Mendolia et al., 2013).
MAX-DOAS has great potential to be used in regional or global networks similar to the AERONET (sun photometer) and EARLINET (lidar) networks because of its versatility, the relatively low cost per instrument, the fact that a radiometric calibration is not required, and the fact that instruments can operate autonomously. Long-term data sets can be used for e.g. air quality monitoring, validation of chemical transport models, validation of satellite tropospheric column density retrievals and potentially as input in data assimilation systems for air quality forecasts. With respect to satellite validation it is interesting to note that MAX-DOAS can provide not only tropospheric trace gas column densities for direct comparison, but also profile shape estimates for trace gases and aerosol extinction. These can replace the a priori profile shapes assumed for the satellite retrieval, such that one can assess the impact of the a priori profile shape assumption (both for aerosols and for the trace gas of interest) on the satellite retrieval accuracy (Rodgers and Connor, 2003). Proper knowledge of the accuracy of the profile shape assumptions that are used in the satellite retrieval is crucial for a realistic estimate of the potential biases in the retrieved tropospheric column density.
Mostly in the last decade, much progress has been made with respect to the quantitative interpretation of MAX-DOAS observations (e.g. Wagner et al., 2007;Roscoe et al., 2010), and MAX-DOAS instrumentation or similar (like PANDORA Herman et al., 2009) has been used for a wide range of gases and applications. In comparison to surface concentrations and profile shapes, tropospheric column densities are the most robust retrieval product. Several MAX-DOAS data sets have been used for validation of satellite observations of tropospheric column densities, predominantly for NO 2 (e.g. Irie et al., 2008b;Halla et al., 2011;Ma et al., 2013;Lin et al., 2014;Kanaya et al., 2014). Nearsurface concentrations are generally associated with higher uncertainties (primarily because of the quite limited vertical resolution of MAX-DOAS measurements), but nevertheless some studies have shown promising comparisons compared to independent ground-based in situ instrumentation, (see e.g. Wagner et al., 2011;Li et al., 2013). Most challenging is the retrieval of vertical tropospheric profiles, and also their validation.
Quite some groups have developed algorithms for the vertical profiles of aerosol extinction and trace gases (e.g. Frieß et al., 2006;Irie et al., 2008a;Clémer et al., 2010;Li et al., 2010;Wagner et al., 2011;Vlemmix et al., 2011;Sinreich et al., 2013). Especially in relation to satellite validation there is a great need for simultaneously measured trace gas and aerosol extinction profiles, and MAX-DOAS is one of the few remote sensing methods which can satisfy this need. At the same time it is well known that the MAX-DOAS profiles are only first-order estimates, due to the fact that the information content of MAX-DOAS observations with respect to the vertical distribution of aerosols and trace gases is very limited: the degrees of freedom for signal typically varies from 1 to 3, see Sect. 2.5 and Vlemmix et al. (2011).
Comparatively few studies have been published however which directly address the quality of MAX-DOAS tropospheric profiles obtained from real observations. This is largely due to the fact that suitable long-term (multi-year) data sets which can serve as golden standard in a comparison (e.g. profiles measured with high vertical resolution) do not exist. In turn, the lack of a thorough validation of MAX-DOAS profiles limits their usefulness in validation studies where MAX-DOAS itself would be the reference.
The present study is highly motivated by the need for further assessment of the quality of MAX-DOAS profiles. Our approach is based on three pillars. First, the use of two very different profile retrieval algorithms, both run with various a priori profile shape assumptions. Second, the use of a 4-year data set covering a wide range of conditions (e.g. pollution levels, seasons, meteorological conditions). Third, analysis of profiles for three different components: formaldehyde, NO 2 and aerosols (aerosol extinction profile). With this we address in this work the following specific questions: how consistent are the retrievals of individual profiles with different algorithms? How consistent are the retrievals on average? Do the column densities and profiles -on averageshow a diurnal and seasonal variation? How strong or weak is the dependence on a priori assumptions? Which atmospheric conditions most critically limit the quality of the profile re- The two profile retrieval methods that are compared in this study do not retrieve profiles on the same vertical grid. One way to perform a profile comparison would be to interpolate profiles retrieved with both methods to a common vertical grid. A comparison performed in this way would give results for all layers which define the common grid. Such an approach would probably be favourable if the vertical resolution of the measurements (and therefore the DOFS) was high, but this is not the case for MAX-DOAS measurements, as noted above. Because of the low DOFS (1-3), it was decided to derive from each profile three suitable quantities and to compare the two profile retrieval methods based on those quantities: tropospheric column density, near-surface concentration and "characteristic profile heights" H 75 . The latter quantity is defined as the height below which 75 % of the integrated profile resides (75 % of the tropospheric column density). An advantage of this quantity (a scalar) is that it allows a first-order description of the profile shape of pollutants which reside primarily in the atmospheric boundary layer.
The paper is structured as follows: in Sect. 2 we describe the data set of MAX-DOAS observations that is used: the instrument characteristics and measurement sites; the settings of the DOAS fitting procedures for the UV and Vis; the two MAX-DOAS profile retrieval algorithms, both of which are run with different "internal" settings to test the dependence on a priori assumptions. The last part of this section describes the criteria that are applied to select data with sufficient quality. Results for selected days and the statistical analysis based on the entire data set are shown and discussed in Sect. 3. Section 4 contains a discussion, and the conclusions are in the last section.

MAX-DOAS measurements and profile retrieval algorithms
The retrieval of vertical profiles from spectral measurements with MAX-DOAS typically consists of three steps. First, differential slant column densities (of O 4 , NO 2 and HCHO) are derived by applying the DOAS spectral fitting technique to the measured spectra. Second, differential slant column densities of O 4 are used as input for the aerosol extinction profile retrieval algorithms. Third, differential slant column densities of the trace gas of interest (in this work: NO 2 and HCHO) are used as input for the trace gas profile retrieval algorithm, together with the estimated aerosol extinction profile. In this section each of those steps is described in more detail.

Instrument and measurement site
The MAX-DOAS instrument used in this study has been designed and assembled by the Belgian Institute for Space Aeronomy (BIRA-IASB), see Clémer et al. (2010). It consists essentially of a telescope mounted on a sun-tracker (which can point at any elevation and in any azimuthal direction) combined with two spectrographs: one for the UV (300-390 nm), and one for the Vis (400-720 nm). Although the instrument is also capable of taking direct sun observations, we use here only the scattered sunlight observations taken towards the north.

DOAS retrieval of differential slant column densities
The DOAS spectral fitting method (Platt and Stutz, 2008) is applied to the spectra measured with the UV and Vis spectrometers. The DOAS analysis is performed with the QDOAS software that has been developed at BIRA (Fayt et al., 2011). Table 1 gives for some relevant parameters the values used in both of the channels. More details of the DOAS settings used can be found in  for the UV channel, and Hendrick et al. (2014) for the Vis channel. Note that a scaling factor of 0.8 is applied to the measured differential slant O 4 column densities (see Clémer et al., 2010) in order to obtain sufficient agreement between simulations and measurements. This scaling factor is used for both methods A and B. After the DOAS analysis the differential slant column densities corresponding to each elevation are linearly interpolated in time (with a 20 min sampling), such that as input for the profile retrieval code one "scan" can be provided, as if the measurements were performed at the same time. Since the DOAS analysis is performed with the zenith-noon spectrum as a reference, the (interpolated) zenith differential slant column densities of a scan is subtracted from all the differential slant column densities. This procedure reduces the sensitivity to trace gases in the stratosphere and upper part of the free troposphere to almost zero.

Method A -algorithm developed at BIRA
The first algorithm (method A) has been developed at BIRA Hendrick et al., 2014;Wang et al., 2014) and makes use of the optimal estimation method (Rodgers, 2000). Forward simulations of differential slant column densities and weighting functions are performed using the LI-DORT radiative transfer model (Spurr, 2008). Trace gas and aerosol extinction profiles are described by partial columns of 13 layers in a fixed altitude grid: the first 10 layers (below 2 km) each have a vertical extent of 200 m, between 2 and 3 km there are two layers of 0.5 km, and the uppermost layer of the profile goes from 3 to 4 km. An important input parameter for retrieval model A is the a priori profile, which is the initial profile from which the profile retrieval code iteratively searches for a more optimal solution. Retrieved profile shapes can in principle be very different from the a priori, but only if the information content of the measurements is sufficiently high (depending on trace gas and measurement conditions). If this is not the case, the retrieved profile shape will be very similar to the a priori. In the original implementation of the retrieval code  this a priori profile concentration profile n(z) was described by an exponential function which is characterized by a specific a priori scale height H prior s : For trace gases, the profile shape is scaled such that the integrated profile corresponds to the first-order estimate of the tropospheric trace gas column density (N V prior ), namely the differential slant column density measurement at 30 • elevation. The corresponding geometrical differential air mass factor (see e.g. Brinksma et al., 2008) is equal to one. For aerosols the initial column estimate (the AOD, aerosol optical depth) was set to 0.15 for all retrievals.
A second important input parameter is the a priori error estimate for each layer. Tests have shown that setting this value high -this would give the algorithm most flexibility to realize diverse profile shapes -leads to frequent retrievals of profile shapes showing oscillations that are not likely to be realistic. For this reason a relatively low value (20 %) of the a priori is chosen, although this limits the potential of the algorithm to deviate significantly from the a priori, see also the discussion in Sect. 4.2.
Tests performed prior to the study presented here have shown that the interplay between the a priori profile and its error estimate, combined with the fact that the sensitivity of MAX-DOAS decreases with altitude, leads to an undesired effect for relatively high a priori scale heights (> 1.5 km), namely that the retrieved tropospheric trace gas column or AOD in the case of aerosol extinction retrieval is systematically too high.
This unwanted mechanism works as follows: for a priori scale heights higher than 1.5 km, the exponentially decreasing a priori profile does not go to (almost) zero in the upper part of the altitude grid (4 km). Because above approximately 1.5 km the information content of the observation is low, the retrieval will have a tendency to stay close to the a priori and not be allowed to go to zero. As a consequence, the retrieved profiles will have a considerable fraction of the partial column above ∼ 1.5 km, even when this is not the case in reality. This effect will lift up the mean profile height, and this goes together with a systematic overestimation of the integrated trace gas column (or AOD).
By modifying the definition of the a priori profile shape such that it decreases to zero at the top of the altitude grid, the overestimation of columns and AOD is greatly reduced. The following profile shape definition is forced to low values above 1.5 km and even zero at the top of the altitude grid: (2) Figure 2 shows a priori profile shapes obtained with this definition, for H prior s = {0.5, 1.0, 1.5} km. Note that the range in terms of H 75 is different: {0.6, 1.0, 1.3} km.
The impact of the a priori profile shape on the retrieved profile can be quite high. For this reason the profile retrieval with method A is performed with three different a priori scale heights (H prior s = {0.5, 1.0, 1.5}), leading to three versions: A1, A2 and A3. The final product that is compared to method B is a composite of the retrievals with these three a priori: for each retrieval quantity (see Sect. 2.6) the mean of the values obtained with A1-A3 is taken as the solution, and the difference between the maximum and minimum as the uncertainty. The reason to follow this approach is that the impact of the a priori is substantial and there is no external information available instead which justifies the choice for one specific a priori. Tests have indicated that errors estimated in this way are in general considerably larger than the smoothing error, a commonly used parameter in the optimal estimation framework to quantify the impact of the a priori on the retrieval error (Rodgers, 2000). The relative smoothing error per layer is typically less than 20 %, for both the aerosol extinction and the trace gas retrieval. The a priori based error is about 1.6 times higher in case of the aerosol extinction retrieval and about 4.8 times higher in case of the NO 2 retrieval (both numbers are median values).

Method B -algorithm developed at KNMI
The profile retrieval approach of method B (Vlemmix et al., 2011) is quite different from method A: it makes use of a profile shape parameterization with just a few (2-4) free parameters; forward simulations are performed by making use of a look-up table which has been created with the Doubling Adding KNMI (DAK) radiative transfer model (De Haan et al., 1987;Stammes et al., 1989); a standard least-squares algorithm is used, without any form of regularization. The main reason to use a low number of free parameters is that the information content of MAX-DOAS observations with respect to the vertical distribution of aerosols and trace gases is quite limited (see Fig. 3). With a suitable choice of free parameters a sufficiently wide range of possible profile shapes can be retrieved, especially in combination with the ensemble approach described below. Compared to the description in Vlemmix et al. (2011) the algorithm has been modified in the following ways: the profile shape parameterization is slightly different, this is described below; the look-up table is compiled to allow for more extreme aerosol optical thicknesses (τ ) needed in China with τ = 3.2 as maximum; the lookup table is expanded with a UV component (central wave- length: 360 nm); no correction is applied to compensate for the temperature dependence of the differential cross-section of NO 2 (similar strategy for method A) -a fixed temperature is used (296 K). This will affect the accuracy of both retrieval methods similarly. The up to four free parameters that are used to parameterize the profile (see Fig. 2) are: (i) the tropospheric column density for trace gases and the AOD in case of aerosols; (ii) the top height of the mixing layer; (iii) the "shape parameter", which determines the linear increase or decrease of the trace gas concentration or aerosol extinction in the mixing layer; (iv) the fraction of the total trace gas column density which resides (uniformly distributed) in the layer starting at the top of the mixing layer up to 2 km above. The vertical extent of this layer varies with parameter (ii). Parameter (iv) replaces the free tropospheric layer which in the earlier version of the algorithm (Vlemmix et al., 2011) was put at a fixed altitude. Parameter (iii), already tested and introduced as part of a sensitivity study in Vlemmix et al. (2011) is also newly applied here. An important characteristic of this profile shape parameterization is that with parameter (ii) it can mimic the dynamic behaviour of the cloud-free boundary layer, which can be very shallow in the morning (especially after a cold, cloudfree night with little wind) and become quite deep during the day, especially in summer. Parameter (iii) is included especially to allow for profile shapes which peak at higher altitudes (e.g. somewhere near the top of the mixing layer). With parameter (iv) elevated trace gas concentrations at higher altitudes can be described. From Vlemmix et al. (2011) it is known that the accuracy of this part of the profile is generally low. For the aerosol extinction profile retrieval, parameter (iv) is not used for practical reasons (computation time). As a consequence it is not possible to perform accurate aerosol extinction profile retrievals under measurement conditions with elevated aerosol layers above the mixing layer -aerosol extinction profiles which peak near the top of the mixing layer can be described with the shape parameter (iii). Such cases are however indicated by high values of χ 2 and can therefore be flagged (or excluded), see below. The cost function used for method B is defined as where N S α i is the measured differential slant column density for elevation i, N S α i is the simulated differential slant column density and α i is the error estimate for the differential slant column density. Due to the low number of free parameters used in method B (2 to 4), it is more difficult to get optimal agreement between simulations and measurements (i.e. to obtain low residuals) than with method A (13 profile layers). Therefore, and also because there is no a priori to fall back on, the individual retrievals with method B tend to be more unstable with respect to one or more retrieval parameters.
It is important to note that this instability is not necessarily an unwanted effect: it is an expression of the fact that (under some conditions) the MAX-DOAS observations contain very limited information about the profile shape. For such conditions it is desirable to have a good estimate of the uncertainty. This is obtained by making use of an ensemble approach: the retrieval code is run 50 times, each time with slightly different input. The differential slant column density measurements are perturbed by adding Gaussian noise with a standard deviation corresponding to 10 % of the original differential slant column density (obtained with the semisimultaneous zenith measurement). For each scan an ensemble of solutions is obtained, and for each retrieval quantity the median is taken as the final result. The width of the distribution for each parameter (e.g. described by the end of the first and beginning of the fourth quartile) provides an estimate for the retrieval uncertainty. Note however, that this retrieval uncertainty does not account for the uncertainty with respect to the profile shape parameterization. For this reason the retrieval is run for several profile shape parameterizations at the same time (see Table 2) and a composite retrieval product is constructed a posteriori. A posteriori selection of plausible profile shape parameterizations (among B1-B4) is done by considering the distribution of the reduced χ 2 (χ 2 ν ). This parameter is defined as  I, II  B2  I, II, III  B3  I, II, IV  B4 I, II, III, IV where N is the number of observations (differential slant column densities at various elevations) minus the number of model parameters (i.e. 2 to 4). If the median value of the χ 2 ν distribution (after 50 runs) for a certain profile shape parameterization is approximately equal to one, then the selected retrieval model is capable of producing simulations that agree with the observations within the estimated measurement error.
After the algorithm is run 50 times for all four models, it is determined which models are included in the a posteriori composite retrieval product, namely all models which have a median χ 2 ν < 1.5. For each model individually the retrieval outcomes for a certain quantity (e.g. surface concentration) is defined as the median value of the distribution (after 50 runs) for that particular quantity. The lower limit of the corresponding uncertainty estimate is defined as the value which marks the transition from the first to the second quartile of the distribution. The upper limit is defined similarly as the value which marks the transition from the third to the fourth quartile of the distribution. This implies that 50 % of the retrievals is within the error bar. The composite product is constructed simply by averaging the medians of the selected models, and the error bars are constructed by averaging the lower limits and upper limits separately. The procedure that is followed here (including all models among B1-B4 which have sufficiently low median of χ 2 ν ) yields a more realistic uncertainty estimate than if only the model with lowest median χ 2 ν would be used, because it takes into account the uncertainty with respect to the profile shape.

Selection criteria and uncertainty estimates
Comparison of methods A and B is done only for profile pairs which satisfy three criteria: they should pass the quality control criteria of method A, and those of method B, and they should coincide with AERONET observations. The third criterion provides an indirect way of selecting cloud-free periods. MAX-DOAS profiles are only included in the comparison if at least three AERONET level 2.0 (cloud screened,  quality controlled) measurements are taken within an hour around the MAX-DOAS measurement. Quality control for method A is based on two quantities: the size of the residual of the profile fit and the degrees of freedom for signal (DOFS, see Rodgers, 2000). The residuals are defined as the sum of squared differences between simulations and measurements, divided by the simulated differential slant column density for an elevation of 30 • (this quantity provides a first-order estimate of the tropospheric vertical column density): Figures 3 and 4 show the histograms of these two parameters before the quality control is applied. These figures illustrate clearly that in general profile retrieval is more challenging for HCHO than for NO 2 : the DOFS for HCHO are often well below 2, whereas for NO 2 the DOFS are often > 2. Also the residuals for HCHO are considerably higher for a considerable fraction of all data (note that Fig. 4 shows the logarithm of the residual). The same is illustrated in Fig. 5: the averaging kernels for HCHO are lower than for NO 2 and are less orthogonal with respect to one another. Profile pairs of A and B are excluded from the comparison if the minimum value of the DOFS is < 1 for one or more of the models A1-A3. Also they are excluded if the maximum residual of A1-A3 is larger than 0.1. Quality control for method B consists of selecting only those profiles where the median value of the reduced χ 2 ν for the profile fit of method B is < 1.5. For aerosol 0.1 km 0.3 km 0.5 km 0.7 km 0.9 km 1.1 km Figure 5. Examples of averaging kernels for retrievals performed with method A, both for the UV (left column) and Vis (right column). The upper row shows averaging kernels for low AOD (representative for the winter season), the bottom row for high AOD (representative for the summer season).
extinction only the median χ 2 ν of the aerosol extinction profile fits is considered, for the trace gas retrieval also the median χ 2 ν of the trace gas profile fits. The impact of the quality control criteria defined above is discussed in Sect. 4.1.

Retrieval quantities
We compare results mostly based on three quantities: the tropospheric vertical column densities (N V ), the concentration (n surf ) or volume mixing ratio (X surf ) of trace gases near the surface and the characteristic height (or H 75 , see Sect. 1) of the retrieved profile. Similar quantities are used in the case of aerosols: aerosol optical depth (τ aer ), aerosol extinction near the surface e surf , and H 75 . A fourth quantity that is used is the a posteriori scale height (not to be confused with the scale height of the a priori profile of method A (H for aerosols. The reason to consider this first-order profile height estimate in addition to H 75 is that, as will be shown in Sect. 3.2, for method A it depends less on the a priori Figure 6. Examples of NO 2 profiles retrieved with method A -grey (H s = 0.5 km) and black (H s = 1.5 km) -and method B -parameterizations B1-B4 (see Fig. 1 and Table 2) shown in red, blue, orange and light blue respectively.
than H 75 . This indicates to some extent that the measurements contain information about the profile height that is not extracted in an optimal manner in this particular retrieval setup.

Example day
Figures 6-8 show retrieval results for 19 May 2012. The individual profiles obtained with method A and B (Fig. 6) show good agreement in the sense that in the morning they are all quite low, and in the afternoon they are all quite high. Nevertheless, this example also illustrates that retrieved profile shapes can be very different: not only do the four versions of method B show considerable differences but so do the two versions of method A, especially those retrieved after 12:00 p.m. LT (bottom row). Time lines for the different quantities that can be derived from the profile retrieval are shown in Fig. 7 for the same example day. Both for the aerosol extinction retrieval (left column) and the NO 2 retrieval (right column) the χ 2 ν are quite low for most of the day, indicating a good quality of fit. On this particular day, the retrievals agree quite well for most quantities, especially for the column densities (row 1) and surface concentrations (row 3). Agreement is worst for H NO 2 75 and H NO 2 s in the afternoon, where method B occasionally shows much higher values. This is a consequence of the fact that the retrieval is not regularized, in combination with relatively low surface concentrations in the afternoon. Because the surface concentration is the denominator in Eq. (6), one can understand that a small change (error) in the surface con-centration can lead to a much larger change in H NO 2 s . This figure also clearly demonstrates the potential impact of different profile shape assumptions on H NO 2 75 . Figure 8 shows results for the same day, but this time for the aerosol extinction and HCHO retrieval in the UV. In general there is much more disagreement compared to NO 2 . There is on this day almost no retrieval where the agreement is good for all quantities at the same time. The agreement between most quantities is especially low between 10:00 a.m. and 4:00 p.m. LT. High values for χ 2 ν in the aerosol extinction retrieval indicate that the retrieval with method B is not successful and therefore this period is flagged with grey bars on top of each figure. Quite remarkable is the disagreement in terms of HCHO column densities in the remaining part of the day (before 10:00 a.m. and after 4:00 p.m.). In the morning of this day the higher column densities (for method B compared to A) seem to go along with higher H HCHO 75 .

Statistical analysis
Results of both retrieval methods are compared for 16 quantities in terms of correlation, slope and intercept of linear fit, and median, mean and standard deviation of relative differences, see Table 3. The comparison of 12 of these quantities is also shown in Figs. 10, 14, 15 and 21. We will discuss these results separately in terms of tropospheric column densities (AOD for aerosols), profile heights and surface concentrations (or aerosol extinction). sities. Note that measurements before 2010 are made in Beijing. From 2010 onwards, the observations are made in Xianghe. A clear seasonal cycle with a winter minimum of about 5 × 10 15 molec cm −2 and a summer maximum roughly five times as high can be seen for HCHO. Compared to NO 2 and aerosols, the variability per month is quite small. A weaker, but similar seasonal cycle can be seen for aerosols, with typical winter values around 0.2 and a summer median between 0.5 and 1.0. For NO 2 the seasonal cycle of monthly median values is quite weak as well. Winter medians are roughly between 20 × 10 15 and 30 × 10 15 molec cm −2 , summer medians between 10 × 10 15 and 20 × 10 15 molec cm −2 . Noteworthy is the fact that especially the peak values in winter can be high with values above 100 × 10 15 . Peaks in tropospheric NO 2 column densities in midsummer do not exceed 30 × 10 15 . Figure 10 and Table 3 show that very good agreement is found for tropospheric NO 2 column densities. The standard deviation of relative differences is however considerable: almost 10 %. The third and fourth columns of Fig. 10 show that the relative difference increases with in-creasing tropospheric column density and with increasing profile height. Also for tropospheric HCHO column densities the agreement is good. Relative differences are however considerably larger than for NO 2 . The dependence of relative differences on the tropospheric column density itself (second row, third column) shows opposite behaviour as for NO 2 , whereas the dependence of relative differences on the profile height shows a similar increase as for NO 2 . Despite the quite high correlation found for the AOD, the agreement between method A and B is moderate, with slopes 1.20 (UV) and 1.39 (Vis) and substantial mean relative differences. Figure 10 (bottom row) shows for the Vis that these differences in AOD are strongly related to the difference in aerosol extinction profile height, but also tend to increase with the AOD itself. The agreement between method B and AERONET is much better, which provides confidence in the AOD retrievals obtained with method B. The frequency distributions of AERONET and AODs retrieved with method B show good agreement and differ with respect to method A in the fact that they include much more cases with AOD between 1.5 and  Figure 8. Example of quantities derived from the profile retrieval in the UV (Xianghe, 19 May 2012). The left column shows the results of the aerosol retrieval, the right column shows results for the HCHO retrieval. Grey horizontal bars above each panel indicate periods that are flagged because of high values of χ 2 in the aerosol or in the trace gas retrieval.  3.5. But for the highest 25 % of characteristic profile heights, method B seems to overestimate the AOD systematically by about 20 %. Figure 11 shows for NO 2 and HCHO the relation between AOD (as measured by AERONET) and tropospheric trace gas column densities for different seasons. There are clear seasonal differences with largest differences for NO 2 vs. AOD between summer and winter. The two models show good agreement, with only moderate systematic differences for HCHO column densities in spring and summer. This is in line with the example day (Fig. 8) which shows considerable differences between tropospheric HCHO column densities retrieved by methods A and B. Note that on this example day the AOD is high, and the differences in characteristic HCHO profile height are considerable. The quite linear relationship between NO 2 and AOD, and HCHO and AOD illustrate that trace gas emissions are often accompanied by aerosol emissions. From that perspective the flattening of the curves for high AODs (mainly in summer and autumn) is remarkable. Possibly this is related to aerosols from natural sources (dust), emissions of which do not go along with emissions of trace gases. Another explanation would be that high AODs cause systematic underestimation of the tropospheric column density, but the flattening of the curves is not seen in winter and spring, even for higher AODs.

Profile heights
Figures 12 and 13 show for methods A and B monthly median values of characteristic heights, with a distinction between retrievals before 10:00 a.m. (red) and retrievals after 12:00 p.m. (blue). For all three species and both retrieval methods, we find higher profiles in the afternoon than in the morning, especially with method B. Only for H HCHO 75 obtained with method A are the differences between morning and afternoon negligible. This is most likely an artefact, which is also seen in Fig. 14, and which is discussed in Sect. 4.2. The morning-afternoon differences found in all other cases are qualitatively in agreement with the expected diurnal variation in the mixing layer height, and provide a first-order check to see if the algorithms behave as expected. Variability per month and between months is however much larger with method B. Highest monthly median characteristic profile heights are found with method B for aerosol extinction profiles in summer. This is in agreement with the general expectation that mixing layers are more shallow in winter and grow deeper in summer (see e.g. Luo et al., 2014). That this effect is weaker for NO 2 might be related to the shorter lifetime of NO 2 in summer which limits the effective transport of NO 2 from the surface to the higher parts of the mixing layer, see also Halla et al. (2011) and Mendolia et al. (2013). Figure 14 and Table 3 show that the agreement between the two methods in terms of profile heights is considerably lower than the agreement in terms of column densities, which is to be expected because the information content of MAX-DOAS with respect to the vertical distribution is limited. The best agreement is found again for NO 2 , with for the profile heights correlation 0.76, slope 1.44, intercept −0.20 km, and mean relative difference 6.51 %. The standard deviation of relative differences is high: 33.18 %. It can be seen in Fig. 14 that the dynamic range of NO 2 profile heights found with method A is somewhat lower than with method B. In particular, the fraction of profiles with height above 1 km is significantly higher with method B. For the HCHO profile height we have correlation 0.62. This is quite surprising because the dynamic range of profile heights found with method A is very small compared to method B and this also explains the exceptional slope (7.47) and intercept (−5.33 km). Even though no independent data are available, it is quite safe to conclude that this very limited dynamic range is unrealistic, and therefore these HCHO pro-files should be used and interpreted with great care. As a result of this effect, it is difficult to judge the quality of the HCHO profile heights obtained with method B. One can see that here the dynamic range is comparable to that of NO 2 and aerosols, but the mode of the histogram has shifted to higher altitudes compared to NO 2 . Also for aerosols (Vis) the dynamic range found with method A is limited compared to method B. The correlation (0.62) is somewhat lower than for NO 2 and HCHO, but the slope and intercept are less extreme than for HCHO (2.82 and −1.31 respectively). Figure 15 demonstrates that in terms of the other estimator of the average profile height (H post s ) the agreement between methods A and B is better. Especially for HCHO, the slope is less extreme for H post s than for H 75 which is in line with the higher dynamic range seen for method A in Fig. 15    A different view on the quality of the profile height retrieval obtained with both methods is given by Fig. 16. From the left panel we can conclude that for both methods the internal consistency (UV vs. Vis) of aerosol extinction profile heights below 1.5 km is quite good, especially for method A. Above 1.5 km we have only very few cases with method A, and all of these cases show a strong bias between UV and Vis. For method B the bias appears to be quite constant over the entire range, with UV profiles that are approximately 25 % lower than profiles in the Vis. The middle panel of Fig. 16 shows a comparison of NO 2 and aerosol extinction profiles. In contrast to aerosols, we do not expect a strong agreement beforehand. What we hope to see, and this is partially the case, is that the general pattern is similar for both methods. Below 1.5 km the agreement is remarkably good, and this is certainly a confirmation that the results obtained with both methods make some sense. As mentioned before, the limited dynamical range of method A makes it almost impossible to draw conclusions on the reliability of profile heights above 1.5 km found with method B. Nevertheless, a possible expla-  . The three rows refer NO 2 , HCHO and aerosols respectively. Left column: frequency distributions obtained with method A (blue) and B (red). Second column: frequency distribution of relative differences (B minus A). Lines in orange indicate the quartiles. nation for the bias between NO 2 and aerosol extinction profile heights in this regime is the same as mentioned earlier: higher aerosol extinction profiles occur in summer, but then the lifetime of NO 2 can be very short, which leads to more shallow NO 2 profiles. Figure 17 shows for different seasons the characteristic aerosol extinction, NO 2 and HCHO profile heights as a function of the AOD. For aerosol extinction profile heights, we see a much stronger seasonal cycle with method B than with method A. In principle a seasonal cycle is also expected: higher boundary layers occur in summer, when the thermal convection is strongest. A possible interpretation of the results seen on the top row (decline of H aer 75 with increasing Method A Method B Figure 16. Three binned scatter plots: aerosol profile heights retrieved in the Vis, vs. profile heights retrieved in the UV (left), aerosol profile height retrieved in the Vis vs. NO 2 profile heights (middle), aerosol profile heights retrieved in the UV vs. HCHO profile heights (right). Note that models A and B have very different frequency distributions of characteristic profile heights for the three constituents, see Fig. 14.
AOD) is that growth of the boundary layer through convection is weakened by the presence of high aerosol loadings (see also Barbaro et al., 2013). Without independent simultaneous observations with other techniques, it can however not be excluded that this effect is related to the measurement technique itself (i.e. a retrieval artefact). Method B shows a weaker seasonal variation in NO 2 than in aerosol extinction profile heights and highest NO 2 profiles occur in spring. This might be due to the fact that in spring the NO 2 lifetime is not as short as in summer (allowing more time for vertical transport), whereas at the same time vertical transport through convection is stronger than in winter. Results for HCHO are more difficult to interpret. Because the lifetime is longer than for NO 2 , and because formaldehyde sources can be biogenic and anthropogenic (the relative contribution varies by season) the profile shapes can be very different from those of NO 2 . A quantity that is especially important in the context of satellite validation and satellite retrievals is the relative difference in NO 2 and aerosol extinction profile height. The impact of the relative characteristic profile heights on the slant column density measurement can be high, and lead to systematic biases if not accounted for in the retrieval. This quantity is shown for both methods as a function of season in Fig. 18 (also for HCHO). Similar as for the characteristic heights themselves, we see in Fig. 18 a higher dynamic range for method B than for method A. This is partly explained by the lower stability of method B, but also by the ability to retrieve a wider range of profile heights. Both methods detect in spring higher characteristic aerosol than NO 2 profile heights. In summer method B finds systematically higher values for H aer 75 − H NO 2 75 than method A. In winter and autumn, the systematic bias between H aer 75 and H NO 2 75 is smaller. As argued above, results for HCHO are more difficult to interpret (because of the artefact affecting the retrieval with method A). However, based on the results obtained with method B it appears as if aerosol extinction profiles are higher than HCHO profiles in spring and summer, and lower in fall and winter.

Aerosol extinction and trace gas volume mixing ratios near the surface
Seasonal variations of volume mixing ratios and aerosol extinction near the surface are shown in Figs. 19 and 20 for methods A and B respectively. For NO 2 a systematic difference is seen between morning and afternoon values, and this is clearly related to the dynamics of the mixing layer. For aerosols a similar effect is found. For HCHO however, this contrast is almost absent. This is related to fact that HCHO profiles shapes retrieved with method A show almost no deviation from the a priori (Sect. 3.2.2). As a consequence, the main driver of the surface concentration is the tropospheric column density of HCHO. This explains why for HCHO retrieval with method A the seasonal variation in volume mixing ratios is so similar to the seasonal variation in column densities. For method B (not shown) the results are quite different in winter months, when morning values are about three to four times higher than afternoon values of the HCHO volume mixing ratio. In summer months, this effect appears to be less pronounced, unlike for NO 2 . It is difficult to draw conclusions based on method B only, but this weaker diurnal variation in HCHO surface volume mixing ratios compared to winter could indicate that in summer local emissions on the surface have a relatively small impact. Based on this data set only, it can however not be excluded that absence of a strong morning-afternoon contrast for HCHO volume mixing ratios in summer is an artefact of the retrieval. ure 21 shows the results of the comparison of methods A and B in terms of trace gas volume mixing ratios and aerosol extinction near the surface (lowest profile layer). In contrast with the results found for the profile heights, the agreement is reasonable, with quite similar histograms for all three constituents. Nevertheless, the systematic relative differences are considerable. For NO 2 we have a mean relative difference of −7.92 % with a standard deviation of 32.97 %. For HCHO the relative differences are larger (−20.70, ±27.71 %) which is mostly explained by the differences in profile shape, because in terms of column densities the relative difference is smaller and of opposite sign (9.43 %). With respect to aerosol extinction near the surface, the agreement between methods A and B is good, with correlation 0.93, slope 1.33 and intercept −0.08. The mean relative difference is however considerable (10.27 %) and the standard deviation of relative differences is high: 30.43 %.

Discussion
In this section we address the main question of this paper: what can be concluded on the quality of aerosol extinction and trace gas profiles retrieved from MAX-DOAS observations. We begin with a discussion of strength and weaknesses of both profile retrieval methods, draw conclusions, and then give recommendations for improvements and use.

Impact of quality control
The ideal selection of high-quality data for this comparison study would be based on a validated cloud screening method which performs well under a wide range of aerosol conditions. Such a method was not available when this study was started (in the meantime promising results have been pub-lished by Wagner et al., 2014, andGielen et al., 2014). Therefore a pragmatic approach was chosen, see Sect. 2.5. A disadvantage of this approach is that a high number of retrievals is rejected. For example, there are many cases where the trace gas retrieval is rejected (despite a proper χ 2 ν ) because the χ 2 ν in the aerosol extinction retrieval is not sufficiently low. The criterion used might be more appropriate for a quality control intended for profiles -and for that reason it is used in this work -but it is probably too strict for a quality control intended for column densities only. Several tests have been performed to check the robustness of findings reported in this paper after changing the selection criteria. For example, the criterion on χ 2 ν has been relaxed to χ 2 ν < 5 and the number of AERONET observations in the same hour is lowered from 3 or more to 2 or more. This leads roughly to two times more aerosol extinction profile pairs (see second column of Table 3) and roughly two and a half and three times more profile pairs for HCHO and NO 2 respectively. The impact of these relaxed settings is considerable for the aerosol extinction retrieval (e.g. mean relative difference in H aer 75 increases from 2.74 to 12.35 %), but quite small for the trace gas retrieval. For example, the mean relative difference in H NO 2 75 increases from 8.71 to 9.85 %, and the mean relative difference in the volume mixing ratio for NO 2 decreases from −7.92 to −4.4 %, which is a small change compared to the standard deviation (32.97 %). There are no sign changes for quantities in Table 3 that are significantly different from zero. It should be noted that the results for the aerosol extinction retrieval obtained with these relaxed constraints are clearly considered to be less representative for ideal clear sky conditions. With every set of quality criteria, the results presented here will change slightly (largely due to a different the sampling of the full data set), however the settings used here are considered to be a reasonable balance between maintain- ing sufficient data pairs and rejection of data pairs which are likely to be affected through clouds.

Strength and weaknesses
In Sect. 3.2 it was shown that both methods show good agreement in terms of tropospheric NO 2 and HCHO column densities: the correlation is high, the slopes of linear fit are close to 1 and the intercepts are relatively close to zero. The agreement of characteristic profile heights is reasonable for NO 2 and aerosols, despite clear biases, especially above 1.5 km. The main strength of method A is its robustness (stability). This is a clear advantage especially when differential slant column densities are close to the detection limit, or when the assumptions that are made about fixed parameters (or cloudfree conditions) do not hold. In such cases, the retrieval can rely on the a priori. The characteristic profile height retrieval with method B is only stable under cloud-free conditions, and if assumptions about fixed parameters are not too far from the truth. The number of profiles which passes the quality control (Sect. 2.5) for method B is significantly smaller than for method A. A disadvantage of method A is that the combination of a profile parameterization based on 13 layers and a relatively low information content of the MAX-DOAS observations forces one to take measures to stabilize the retrieval. These measures are: (1) a relatively conservative estimate of the a priori error (for each profile layer 20 % of the a priori profile estimate) and (2) a profile which decreases to zero rapidly above 1 km. A consequence of this approach is that the absolute values of the a priori error estimate become very low above 1 km. This is believed to be the main reason why it is almost impossible to retrieve profiles with a characteristic height (H 75 ) much higher than that of the a priori. In most cases sufficient agreement between observations and simulations can be achieved by modifying the profile shape (compared to the a priori) only below 1.0 km. This explains why for retrieved NO 2 profiles reduction of H 75 compared to the a priori is seen much more often than increase. This is not seen for HCHO. For HCHO it appears that the information content is too low to obtain profiles (with method A) which deviate much from the a priori. A strong aspect of method B is that it can realize a high range of quite different profile shapes, with just a few free parameters. It can more easily realize profiles which have a characteristic height (H 75 ) well above 1.0 km. In this study, it is however not possible to fully judge the quality of these profiles because these cannot be retrieved with method A. Nevertheless, the monthly averaged morning to afternoon difference in profile height and the seasonal cycle of aerosol extinction profile heights (Fig. 13) correspond to the expected behaviour and this is at least an indirect indication of the quality of the profiles obtained with method B. Independent (e.g. lidar) observations at the same measurement site would be needed to say more about the quality of individual profiles.
This study also makes clear that the main disadvantage of method B is its instability, despite the limited number of free parameters and the ensemble approach. Note however that the retrieval is certainly not always unstable, see for example the retrievals in the Vis on 15 May 2012 (Fig. 7). The advantage of the ensemble approach taken with method B is that most often the instabilities go along with high-uncertainty estimates, and this provides a means for additional quality control. Unlikely retrievals with a low-uncertainty estimate occur also, but these can most often be excluded based on high values for χ 2 , either in the aerosol or in the trace gas part of the retrieval.
Atmos. Meas. Tech., 8, 941-963, 2015 www.atmos-meas-tech.net/8/941/2015/ Figure 21. Statistics of near-surface concentration retrievals. The three rows refer to volume mixing ratios of NO 2 (row 1) and HCHO (row 2) and aerosol extinction (row 3). Left column: frequency distributions obtained with method A (blue) and B (red). Second column: frequency distribution of relative differences (B minus A). Lines in orange indicate the quartiles. Column 3: relative differences sorted as a function of the volume mixing ratio (rows 1 and 2) and aerosol extinction (row 3). Column 4: relative difference sorted as a function of H 75 where the three bins refer to the lowest 25 %, middle 50 % and highest 25 % respectively.

Recommendations for algorithm improvements and further validation
Both profile retrieval algorithms have specific strengths and weaknesses, as described above. The challenge for improved algorithms is to combine the stability and precision of method A with the ability of method B to retrieve a high dynamic range of characteristic profile heights. A possible but not so practical solution could be to use method B to obtain an initial estimate of the a priori scaling height for method A and then as a next step to perform a retrieval with method A. This will work however only under strict cloudfree conditions because of the limitations for method B. Also it would transfer the impact of instabilities (for individual cases) from method B to method A. An alternative is to use the profile parameterization of method B in the framework of the optimal estimation method. Such a retrieval algorithm could be better capable of retrieving a wide range of profile heights and at the same time be more stable than the present implementation of method B. This would also lead to an algorithm which is considerably faster because there would be no need for an ensemble approach. Improving the stability of the retrieval by making use of a priori data (in combination with the optimal estimation approach) brings a certain risk, which is that systematic biases in the a priori climatology remain present in the a posteriori climatology. An advantage of more simple retrieval schemes (e.g. method B) is that they are predominantly driven by the observations themselves and therefore less prone to inheritance of systematic biases in the a priori, despite a low precision. It is almost impossible to make a choice which combines the best of both worlds: a very stable retrieval (i.e. precision of individual profiles) without introducing systematic biases in a climatological sense. Stability is important for comparison with satellite observations if the number of available cases is very limited; accuracy over a wide range of profile heights is important if MAX-DOAS would be used to provide a climatology of profile heights for better a priori estimates in the satellite retrieval. The recent work by Hartl and Wenig (2013) provides indications that the Phillips-Tikhonov regularization method can be used for MAX-DOAS profile retrievals which are more stable and at the same time (potentially) less biased in a climatological sense. To our best knowledge, their method has not yet been applied to a long data set of real observations with a similar focus on the ability to retrieve accurate (first order) profile height estimates. The present study has demonstrated the benefit of having a large data set covering a wide range of measurement conditions. Based on a small data set it would have been very difficult to entangle differences in accuracy and precision. More thorough validation requires simultaneous co-located observations with other techniques (lidar, NO 2 sonde). Such validation efforts are especially useful if a sufficiently large data set is available. In the presence of large differences in spatial representativity (this is very different for satellite, MAX-DOAS and in situ techniques) and a high variability in possible NO 2 and aerosol extinction profile shapes, it is almost impossible to draw conclusions about the accuracy of MAX-DOAS profile shapes based on a quite limited number of co-located observations, even if the precision and accuracy of the other techniques are high.

Summary and conclusions
A 4-year data set of MAX-DOAS observations in the Beijing area is analysed with two different methods for the retrieval of tropospheric NO 2 , HCHO and aerosol extinction profiles. The objective of this study is firstly to assess for each constituent (NO 2 , HCHO, aerosols) and retrieval quantity (AOD or tropospheric column density, characteristic profile height (H 75 ), aerosol extinction or surface concentration) the mutual consistency of the retrievals with both methods, and secondly to identify the mechanisms causing the differences. The two profile retrieval methods differ in many respects. Method A uses a profile parameterization with 13 layers (up to 4 km), on-line forward simulations with the LI-DORT radiative transfer model and an inversion based on optimal estimation. Method B uses a profile parameterization based on 2 to 4 parameters to describe the profile shape and a look-up table created with the DAK radiative transfer model. The inversion is based on a least-squares minimization and an ensemble approach is used to improve stability of the solutions and to estimate uncertainties. In the following we summarize the results of the comparison, first in a qualitative sense, then quantitatively. The strength of method A is the stability of the profile shape retrieval, even under cloudy conditions, which is a consequence of the relatively conservative estimate of the uncertainty of the a priori profile. The choice for stability is advantageous for the retrieval of tropospheric column densities and volume mixing ratios near the surface. A negative side effect of this conservative estimate of the uncertainty of the a priori appears to be that the retrieved characteristic profile heights have a relatively small dynamic range. This is most evident for the HCHO profiles retrieved in the UV, but also for aerosol extinction and NO 2 retrieved in the Vis. Method B is generally less stable, and this affects the precision of individual retrievals. The tropospheric column density is least sensitive to instabilities in the profile retrieval, whereas the characteristic profile height and volume mixing ratio near the surface are most sensitive. The most pronounced difference with method A is the higher dynamic range of retrieved profile heights for aerosols, HCHO and NO 2 . Although the higher dynamic range is partly a consequence of the instability of the retrieval (and therefore not necessarily meaningful), diurnal and seasonal patterns that show up after averaging many profiles give some confidence that the retrievals are meaningful. For example, we see low characteristic profile heights in the morning, and higher values in the afternoon, especially for aerosol extinction in summer. This can be related to the periodical cycles of the boundary layer. Also we find in spring and summer lower aerosol extinction profile heights with decreasing aerosol optical thickness. Although it cannot be excluded that this is a retrieval artefact, this might also be real (and therefore add to the credibility of method B), namely that higher aerosol loads reduce the thermal convection in the boundary layer and therefore lead to lower aerosol extinction profile heights. More quantitatively, we find best agreement for the tropospheric NO 2 column densities (correlation 0.99), with almost no systematic bias (slope 1.03, intercept −0.56×10 15 molec cm −2 ) and comparatively small relative differences (mean 0.25 % and standard deviation 9.3 %). For formaldehyde column densities we find a high correlation (0.95) and slope close to one (1.02), but also find that method B is systematically higher than method A: mean relative difference is 9.4 % and standard deviation of relative differences is 12.1 %. Relative differences in formaldehyde column densities are found to be related to differences in profile height: overestimations of the tropospheric column density (for method B compared to method A) often correspond to overestimations of the characteristic profile height (for method B compared to method A). Volume mixing ratios near the surface are systematically lower for method B compared to method A: ∼ 8 % relative difference for NO 2 and ∼ 21 % for HCHO. The differences can again be related to the differences in profile heights between methods A and B. The standard deviation of relative differences of surface volume mixing ratios is much higher than for tropospheric column densities: 33 % for NO 2 and 28 % for HCHO. Characteristic profile heights are systematically higher for method B than for method A. The mean relative differences are 6.5 % for NO 2 , 15.7 % for HCHO and 2.7 % for aerosols (Vis). The high standard deviation of relative differences (33, 37 and 44 % for NO 2 , HCHO and aerosols respectively) shows that the precision of characteristic profile heights is low. We find with method B that in spring and summer aerosol extinction profiles are systematically higher than NO 2 profiles. Also we find that in winter and summer mornings HCHO profiles are systematically higher than aerosol extinction profiles, and vice versa in summer afternoons. Note however that these findings are only indicative, because the limitations with method A prevent confirmation of the results obtained with method B. Altogether, Atmos. Meas. Tech., 8, 941-963, 2015 www.atmos-meas-tech.net/8/941/2015/ this study gives some indications about the quality of tropospheric column densities, surface concentrations and profile heights retrieved with MAX-DOAS. Since this study is based solely on MAX-DOAS observations the scope is limited and a more thorough validation is needed. In order to obtain robust validation results which can entangle differences related to accuracy and/or precision for a wide range of pollution and sky conditions, it is recommended to station MAX-DOAS instruments close to continuously monitoring surface in situ monitors (e.g. for NO 2 ), sun photometers and lidars which are sufficiently sensitive to boundary layer aerosols.