TROPOMI/S5P formaldehyde validation using an extensive network of ground-based FTIR stations

TROPOMI (the TROPOspheric Monitoring Instrument), on board the Sentinel-5 Precursor satellite, has been monitoring the Earth’s atmosphere since October 2017 with an unprecedented horizontal resolution (initially 7×3.5 km², upgraded to 5.5×3.5 km² in August 2019). Monitoring air quality is one of the main objectives of TROPOMI, through measurements of important pollutants such as nitrogen dioxide, carbon monoxide, and formaldehyde (HCHO). In this paper we assess the quality of the latest HCHO TROPOMI products (version 1.1.[5-7]), using ground-based solar-absorption FTIR (Fourier Transform Infrared) measurements of HCHO from twenty-five stations around the world, including high-, mid-, and low-latitude sites. Most of these stations are part of the Network for the Detection of Atmospheric Composition Change (NDACC), and they provide a wide range of observation conditions from very clean remote sites to those with high HCHO levels from anthropogenic


Introduction
TROPOMI (the TROPOspheric Monitoring Instrument), on board the Sentinel-5 Precursor (S5P) satellite, has been monitoring the column amounts of atmospheric constituents since October 2017 at the unprecedented horizontal resolution of 7×3.5 km², upgraded to 5.5×3.5 km² in August 2019. This huge amount of data, delivered to the public and the scientific community, represents a big step towards improving our knowledge of chemical and dynamical processes in the atmosphere. It is crucial to validate the quality of these new satellite data in order to trust and benefit from their scientific exploitation. This paper focuses on the first quality assessment of the latest publicly available TROPOMI HCHO data products (v.1.1.[5-7]).
to 2019-12-31). The version numbers and their dates of change are given in Table 1, and further details are given in the Readme file¹. The Near-Real-Time (NRTI) product, for the same versions 1.1.[5-7], covers December 2018 to December 2019 (last access). This product has also been validated but, the results being very similar to the RPRO+OFFL validation, we do not show them in detail in this paper. The S5P HCHO retrieval algorithm is based on the DOAS method and is directly inherited from the OMI QA4ECV product retrieval algorithm (https://doi.org/10.18758/71021031). It consists of a three-step method (slant column retrieval, air mass factor calculation, and conversion to tropospheric column), fully described in De Smedt et al. (2018). The retrieval of the slant columns ($N_s$) is performed in the UV part of the spectra (in TROPOMI channel 3), in a fitting interval of 328.5-359 nm. The HCHO cross-section is from Meller and Moortgat (2000). Together with the HCHO cross-section, the absorptions of NO₂, BrO, O₃ (at two temperatures) and O₄ are fitted. A Ring cross-section and two pseudo-cross-sections to account for non-linear O₃ absorption effects are also included in the fit. References are given in De Smedt et al. (2018). All cross-sections have been pre-convolved for every row separately with an instrumental slit function adjusted just after launch. The DOAS reference spectrum is updated daily with an average of Earth radiances selected in the Equatorial Pacific region on the previous day. The result of the fit is therefore a differential slant column, showing increases over continental sources compared to the remote background. The conversion from slant to tropospheric columns ($N_v$) is performed using a look-up table of vertically resolved air mass factors ($M$) calculated at 340 nm with the radiative transfer model VLIDORT v2.6 (Spurr, 2008).
Parameters for each ground pixel are the observation geometry, the surface elevation and reflectivity, including the clouds (which are treated as reflecting surfaces), and a priori tropospheric profiles. The surface albedo is taken from the monthly OMI albedo climatology (minimum Lambertian equivalent reflectivity; Kleipool et al., 2008) at a spatial resolution of 1°×1°. A priori vertical profiles are specified using the TM5-MP daily forecast, at the same spatial resolution (Williams et al., 2017). Cloud properties are provided by the S5P operational product in its CRB mode (Cloud as Reflecting Boundary; Loyola et al., 2018). A cloud correction based on the independent pixel approximation (Boersma et al., 2004) is applied for cloud fractions larger than 0.1. In order to correct for any remaining global offset and stripes, a background correction is applied based on HCHO slant columns from the 5 previous days in the Pacific Ocean ($N_{s,0}$), as described in De Smedt et al. (2018). Finally, the background vertical column of HCHO, due to methane oxidation, is taken from the TM5 model in the reference region ($N_{v,0}^{\mathrm{CTM}}$). The equation of the tropospheric HCHO vertical column can be written as follows:

$$N_v = \frac{N_s - N_{s,0}}{M} + \frac{M_0}{M}\, N_{v,0}^{\mathrm{CTM}} \tag{1}$$

with $M_0$ the average of the air mass factors $M$ of the slant columns selected in the reference sector, the Pacific Ocean ($N_{s,0}$).

¹ http://www.tropomi.eu/sites/default/files/files/publicSentinel-5P-Formaldehyde-Readme_20191213.pdf
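As a numerical illustration of the slant-to-vertical conversion with background correction, the following minimal sketch assumes the form $N_v = (N_s - N_{s,0})/M + (M_0/M)\,N_{v,0}^{\mathrm{CTM}}$ described in De Smedt et al. (2018). The function name and all input values are hypothetical and are not taken from the operational processor.

```python
def tropospheric_column(n_s, n_s0, m, m0, n_v0_ctm):
    """Tropospheric vertical column N_v in molec/cm2 (assumed form, see lead-in).

    n_s      : slant column of the pixel
    n_s0     : mean slant column in the reference sector (background)
    m        : air mass factor of the pixel
    m0       : mean air mass factor in the reference sector
    n_v0_ctm : model background vertical column in the reference sector
    """
    if m <= 0:
        raise ValueError("air mass factor must be positive")
    # Differential slant column converted to a vertical column,
    # plus the model background rescaled by the air mass factor ratio.
    return (n_s - n_s0) / m + (m0 / m) * n_v0_ctm


# Made-up numbers, chosen only to make the arithmetic easy to follow.
n_v = tropospheric_column(n_s=9.0e15, n_s0=1.0e15, m=1.0, m0=1.2, n_v0_ctm=3.0e15)
print(f"N_v = {n_v:.3e} molec/cm2")
```

With these inputs the differential part contributes 8.0×10¹⁵ molec/cm² and the background part 3.6×10¹⁵ molec/cm².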
Several diagnostic variables are provided together with the measurements. Quality assurance (QA) values are defined to allow a quick selection of the observations. QA>0.5 filters out most observations presenting an error flag, a solar zenith angle (SZA) larger than 70°, a cloud radiance fraction at 340 nm larger than 0.6, or an air mass factor smaller than 0.1. The product Readme file reports that in the current version, the QA values are not always correctly set over snow/ice regions or above 75° of SZA. They also need to be further checked over cloudy scenes. In the forthcoming S5P version 2, QA values will be refined and will exclude data with surface albedo larger than 0.2 and a snow/ice warning, as well as remaining data with SZA larger than 75°.
The tropospheric column uncertainty is divided into random (precision) and systematic (accuracy) components, and is provided per pixel. It varies with the observation conditions. Over remote regions at moderate solar zenith angle, the precision of an individual observation is about 5×10¹⁵ molec/cm². This value agrees with the standard deviation of the columns in the same region for a particular day. The random uncertainty is dominated by the random error on the slant columns. The tropospheric column accuracy is the combined systematic uncertainty resulting from the slant column, the air mass factor and the background correction errors. It varies between 30 and 60% of the columns. The column averaging kernel and the a priori profiles are provided for every observation.

We show in Fig. 1 a map of the ground-based FTIR stations used in this TROPOMI validation. The background image represents the global TROPOMI monthly mean tropospheric columns for September 2018, illustrating the different HCHO levels sampled by the ground-based network: from clean Arctic and oceanic sites to very high-concentration sites such as Porto Velho, in the Amazon basin. Table 2 lists the ground-based FTIR stations, their coordinates and altitude, the spectrometer type, the retrieval code, and the team involved in the measurements and/or the retrievals of HCHO. For more details on the monitoring of FTIR solar absorption spectra at these stations, we refer to Vigouroux et al. (2018) and references therein, and, for the FTIR retrieval principles, to e.g. Vigouroux et al. (2009).

Ground-based FTIR HCHO data
The same retrieval settings are used at all the stations to avoid introducing possible biases in the HCHO total columns between the stations and inconsistent comparisons with the satellite. Details are given in Vigouroux et al. (2018). The main settings that might be responsible for internal biases within the network are the spectroscopic database and the fitted spectral windows, the spectroscopic parameters being the main source of the FTIR HCHO systematic uncertainties. The HCHO spectral signatures  (Rothman et al., 2013) for HCHO, which used the work of Jacquemart et al. (2010).

The retrieval codes used in the FTIR NDACC community are PROFFIT9 (Hase et al., 2006) and SFIT4.0.9.4 (updated from SFIT2; Pougatchev et al., 1995), which are both based on the optimal estimation method (Rodgers, 2000). A past comparison exercise has shown very good agreement between the retrieved products obtained with these two codes (Hase et al., 2004).
Based on a priori profile information (from the WACCM model; Garcia et al., 2007) and an L1 Tikhonov regularization matrix (Tikhonov, 1963), low-vertical-resolution profiles can be retrieved in principle, as well as total columns. However, as described in Vigouroux et al. (2018), the degrees of freedom for signal are very low for HCHO (median value of 1.1 for all FTIR sites), meaning that we essentially have one piece of information. The FTIR total column averaging kernel shows a decrease of the sensitivity at the surface, which is quite similar to the TROPOMI sensitivity. This can be seen in Fig. 2, as an example for the Maïdo station. We also show in Fig. 2 the FTIR a priori profile at Maïdo, which is based on a climatology (1980-2020) from the WACCM model calculated at Maïdo. A single profile is used for the whole time series at a specific station (Vigouroux et al., 2018), while TROPOMI uses daily a priori profiles from TM5 (Sect. 2). An example is shown in Fig. 2 for 18 January 2019.

The FTIR uncertainty budget is calculated following the formalism of Rodgers (2000) and is described in Vigouroux et al. (2018). It is separated into random and systematic components. The random uncertainty is dominated at all sites by the measurement noise uncertainty, which can vary from site to site depending on the spectrometer. The uncertainty on the retrieved FTIR total columns for individual sites is given in Vigouroux et al. (2018)  Tsukuba, Palau, and Xianghe, respectively.
The median systematic uncertainty due to the forward model parameters on the HCHO FTIR total columns is 13% in the network described by Vigouroux et al. (2018). As already mentioned, the dominating systematic uncertainty sources are the spectroscopic parameters: the line intensities and the pressure broadening coefficients of the fitted HCHO absorption lines. We use 10% for the three parameters: the line intensity, and the air- and self-broadening coefficients. The systematic uncertainty can be larger (up to 21-26%) at the stations using the PROFFIT9 retrieval code, due to an assumed uncertainty on the channeling that is not yet taken into account in the SFIT4 code. However, this channeling uncertainty can also be negligible at some sites (it depends on each instrument), and more investigation is needed at each station to avoid its under- or overestimation. The median smoothing systematic uncertainty is 3.4%. For the five added sites, the median total systematic uncertainty is 13% (Jungfraujoch, Tsukuba, Palau) or 14% (Rikubetsu, Xianghe), commensurate with the other sites.

Collocation criteria
The precision of a single-pixel TROPOMI HCHO measurement is expected to be below 1.2×10¹⁶ molec/cm² (pre-launch requirements) or even better, about 5×10¹⁵ molec/cm² for remote areas (post-launch uncertainty analysis, see Sect. 2). These values are quite large compared to the measured levels of HCHO (from around 1.5×10¹⁵ molec/cm² for very clean sites to, e.g., around 9×10¹⁵ molec/cm² for a city such as Paris). It is therefore necessary to average several pixels in order to reduce the random uncertainty of the TROPOMI mean HCHO data, improve the detection level and increase the TROPOMI sensitivity to day-to-day variability. For this reason, we choose to average the TROPOMI pixels located within 20 km of the FTIR station.
Once we filter out the TROPOMI pixels that do not reach the recommended quality criteria (QA flag > 0.5; see Sect. 2), we obtain a median value of 34 pixels to average. In cloudy conditions, this number can be smaller. A collocation pair is kept when at least 10 pixels can be averaged. A higher number of pixels can be averaged for Arctic stations (around 45-60), which is useful given the very low HCHO levels to be detected there. At sub-tropical and tropical stations, the median number of pixels is around 20-29. The higher number of pixels in the Arctic is due to the fact that each FTIR measurement is collocated with all S5P pixels that match the collocation criteria, even if these pixels originate from different orbits, with different overpass times.
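The pixel selection described above (QA > 0.5, distance to the station below 20 km, at least 10 pixels) can be sketched as follows. This is a minimal illustration, not the processing code used for the paper: the dictionary keys, the function names, the great-circle distance formula and the station coordinates are assumptions made for the example.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for a 20 km criterion


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def collocated_mean(pixels, st_lat, st_lon, max_km=20.0, qa_min=0.5, min_pixels=10):
    """Return (mean column, number of pixels), or None if too few pixels remain."""
    kept = [
        p["hcho"]
        for p in pixels
        if p["qa"] > qa_min
        and haversine_km(p["lat"], p["lon"], st_lat, st_lon) <= max_km
    ]
    if len(kept) < min_pixels:
        return None
    return mean(kept), len(kept)


# Hypothetical station and synthetic pixels (columns in molec/cm2).
station = (-21.0, 55.0)
pixels = [{"lat": -21.0 + 0.01 * i, "lon": 55.0, "qa": 0.9, "hcho": 2.0e15} for i in range(12)]
pixels.append({"lat": -20.0, "lon": 55.0, "qa": 0.9, "hcho": 9.0e15})  # about 111 km away
pixels.append({"lat": -21.0, "lon": 55.0, "qa": 0.3, "hcho": 9.0e15})  # rejected by QA
result = collocated_mean(pixels, *station)
print(result)
```

Only the twelve nearby, good-quality pixels enter the average; the distant pixel and the low-QA pixel are discarded.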
Before choosing the 20 km collocation criterion, we tested several distances (10, 20, 30, 40, and 50 km). The 10 km criterion was discarded because the small number of remaining coincidences led to less robust statistics. The 20 to 50 km criteria give similar biases between TROPOMI and FTIR. The standard deviations of the comparisons usually decrease slightly with increasing collocation distance due to a smaller TROPOMI random uncertainty (more pixels to average), except at the most polluted sites. However, the ratio between the standard deviations and the random uncertainty budgets increases with the collocation distance at all sites, pointing to an increased random error due to the collocation. We therefore choose the 20 km distance to reduce the random spatial collocation error.
The time coincidence criterion is set to ±3 hours. This choice is a compromise to obtain a significant number of coincidences between TROPOMI and FTIR data, noting that the median FTIR measurement frequency is 5 per day (with a range of 3 to 10 depending on the station increased number of pixels (improved TROPOMI precision on the mean) in the 6 h collocation, mainly at Arctic sites with increased number of multiple orbits. Despite the smaller standard deviations usually obtained with a 6 h criterion, we finally choose 3 h to reduce the possible impact of some passing plumes and of the HCHO diurnal cycle on the comparisons. The diurnal cycle at most of the FTIR stations can be found in Vigouroux et al. (2018) and its Supplement. At many stations no significant diurnal cycle was observed but, in some cases, mainly at polluted sites, we obtained a maximum around noon-1 p.m., close to the TROPOMI overpass time. At the Mexico City station, where the diurnal cycle amplitude is the greatest, the effect of the collocation time (6 h vs 3 h) on the statistical bias is 4%.
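The ±3 h time coincidence criterion amounts to a simple window test on the measurement times. The sketch below is only illustrative: the function name, the overpass time and the FTIR measurement times are invented for the example.

```python
from datetime import datetime, timedelta


def coincident_ftir(ftir_times, overpass_time, max_hours=3.0):
    """Keep the FTIR measurement times within +/- max_hours of the overpass."""
    window = timedelta(hours=max_hours)
    return [t for t in ftir_times if abs(t - overpass_time) <= window]


# Hypothetical overpass and FTIR measurement times on one day.
overpass = datetime(2019, 1, 18, 13, 30)
ftir_times = [datetime(2019, 1, 18, h, 0) for h in (8, 11, 13, 16, 18)]
kept = coincident_ftir(ftir_times, overpass)
print([t.strftime("%H:%M") for t in kept])
```

Widening `max_hours` to 6 would also keep the 08:00 and 18:00 measurements, which is the trade-off between more coincidences and a larger influence of the diurnal cycle discussed above.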

Building inter-comparable products
Some manipulation of the original data products is needed before looking at the differences between TROPOMI and FTIR data.
Both measurements provide total columns (for FTIR) or tropospheric columns (for TROPOMI) that have a lower sensitivity near the ground (see Fig. 2), and their retrievals use a priori profile information that has been chosen differently (TROPOMI: daily a priori profiles from TM5; FTIR: single a priori profile from a WACCM climatology). To correct for this, for each individual S5P pixel collocated with each FTIR measurement, we use the comparison method described in Rodgers and Connor (2003). First, the a priori substitution is applied, using the S5P a priori profile $x_{S,a}$ as the common a priori profile. For this, the S5P a priori profile is regridded to the FTIR retrieval grid ($x_{S,a/F}$) using a mass conservation algorithm (Langerock et al., 2015). In the rare situation where the satellite pixel elevation is above the FTIR site, the S5P a priori profile is extended to the FTIR instrument's altitude. The regridded S5P a priori $x_{S,a/F}$ is then substituted following Rodgers and Connor (2003), and we finally use the corrected FTIR retrieved profile $x'_F$ in the comparisons:

$$x'_F = x_F + (A_F - I)\,(x_{F,a} - x_{S,a/F}) \tag{2}$$

where $x_F$ is the original FTIR retrieved profile, $A_F$ is the FTIR averaging kernel matrix, $I$ is the unit matrix, and $x_{F,a}$ is the FTIR a priori profile.
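The a priori substitution can be illustrated on a toy two-layer profile. The sketch assumes the Rodgers and Connor (2003) form $x'_F = x_F + (A_F - I)(x_{F,a} - x_{S,a/F})$, with all profiles already on the FTIR grid. Plain Python lists are used to keep the example dependency-free (an array library would be the usual choice), and the function names and numbers are hypothetical.

```python
def matvec(matrix, vector):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def substitute_a_priori(x_f, a_f, x_fa, x_sa_on_f):
    """Corrected FTIR profile after replacing the FTIR a priori by the S5P one.

    x_f       : original FTIR retrieved profile
    a_f       : FTIR averaging kernel matrix
    x_fa      : FTIR a priori profile
    x_sa_on_f : S5P a priori profile regridded to the FTIR grid
    """
    delta = [fa - sa for fa, sa in zip(x_fa, x_sa_on_f)]
    a_delta = matvec(a_f, delta)
    # (A - I) * delta = A * delta - delta
    return [x + ad - d for x, ad, d in zip(x_f, a_delta, delta)]


# Toy two-layer example in arbitrary units.
x_f = [4.0, 2.0]
a_f = [[0.5, 0.0], [0.0, 0.5]]
x_fa = [3.0, 1.0]
x_sa_on_f = [5.0, 2.0]
x_f_corr = substitute_a_priori(x_f, a_f, x_fa, x_sa_on_f)
print(x_f_corr)
```

Two limiting cases are a useful check: with a perfect averaging kernel ($A_F = I$) the profile is unchanged, and with no sensitivity ($A_F = 0$, retrieval equal to its a priori) the corrected profile becomes the new a priori.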
The next step, following Rodgers and Connor (2003), is to smooth the corrected FTIR profile with the S5P column averaging kernel $a_S$. For that purpose we regrid the corrected FTIR profile $x'_F$ to the S5P column averaging kernel grid ($x'_{F/S}$) and apply the smoothing equation:

$$c_F^{\mathrm{smoo}} = c_{S,a} + a_S^T\,(x'_{F/S} - x_{S,a}) \tag{3}$$

with $c_{S,a}$ the S5P a priori column derived from the S5P a priori profile. We obtain a smoothed FTIR column $c_F^{\mathrm{smoo}}$ associated with a collocated TROPOMI pixel. In the case of mountain sites where the pixel altitude is below the instrument's height, the regridding of the FTIR profile $x'_{F/S}$ is done such that the FTIR profile is extended with the S5P a priori profile (such an extension is invariant under the smoothing equation). Note that this FTIR regridding to the satellite grid also has the advantage that only the FTIR profile up to the top altitude of the satellite product (which is only a tropospheric column) remains in the regridded column: we therefore finally compare tropospheric columns in both products.
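The smoothing step reduces to a weighted sum over layers. The sketch below assumes the form $c_F^{\mathrm{smoo}} = c_{S,a} + a_S^T (x'_{F/S} - x_{S,a})$ with profiles expressed as partial columns on the S5P grid; the function name and the three-layer values are hypothetical.

```python
def smooth_column(c_sa, a_s, x_f_on_s, x_sa):
    """Smoothed FTIR column for one TROPOMI pixel (molec/cm2).

    c_sa     : S5P a priori column
    a_s      : S5P column averaging kernel, one value per layer
    x_f_on_s : corrected FTIR profile regridded to the S5P grid (partial columns)
    x_sa     : S5P a priori profile (partial columns)
    """
    return c_sa + sum(a * (xf - xa) for a, xf, xa in zip(a_s, x_f_on_s, x_sa))


# Toy three-layer example: the FTIR profile differs from the a priori only in
# the lowest layer, where the satellite sensitivity is reduced (kernel 0.5).
x_sa = [3.0e15, 1.0e15, 0.5e15]
c_sa = sum(x_sa)
a_s = [0.5, 1.0, 1.5]
x_f_on_s = [4.0e15, 1.0e15, 0.5e15]
c_smoo = smooth_column(c_sa, a_s, x_f_on_s, x_sa)
print(f"{c_smoo:.2e}")
```

The FTIR partial column exceeds the a priori by 1.0×10¹⁵ molec/cm² near the surface, but only half of that excess appears in the smoothed column, mimicking what the satellite would see.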
Next, we need to take into account that, for mountain stations, the difference between satellite columns and the original ground-based columns can be significant. To bring both the satellite and the smoothed FTIR column $c_F^{\mathrm{smoo}}$ (which is calculated as a column valid at the satellite pixel surface) to the scale of the original FTIR columns, we apply a scaling factor $f$ representative of the fraction of the partial column between the satellite pixel altitude and the FTIR station altitude. This scaling factor is derived from the satellite a priori profile and is defined as:

$$f = 1 - \frac{c_{S,a}^{\Delta z}}{c_{S,a}} \tag{4}$$

where $c_{S,a}^{\Delta z}$ denotes the partial column derived from the S5P a priori profile between the pixel surface and the FTIR station. The TROPOMI column $c_S$ and its random and systematic uncertainties are also scaled with the same factor, so that finally the collocated products are all expressed at the altitude of the FTIR site (and not of the pixel surface). For mountain stations, the scaling factor $f$, calculated for each satellite pixel, can reach a minimum of 0.5 for stations located about 2 km above the satellite pixel surface (Maïdo, Izaña, or Altzomoni), or even 0.3 at the higher sites Jungfraujoch and Zugspitze, while at sea-level sites it is of course close to 1.0. In the rare cases where the satellite pixel is above the FTIR station, we apply the conversion factor $f = 1 + c_{S,a}^{\Delta z}/c_{S,a}$, where the satellite a priori profile is extrapolated to the station surface in order to calculate the partial column of the a priori between both altitudes.

The final step is to average the individual smoothed and scaled FTIR columns $c_F^{\mathrm{smoo}} \times f$ that are taken within 3 h, and the individual TROPOMI pixel columns $c_S \times f$ that are available within 20 km (which can belong to different orbits), to form the collocated pair FTIR$_i$ and TROP$_i$ used in the next section.
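A minimal sketch of the altitude scaling factor follows. It assumes $f = 1 - c_{S,a}^{\Delta z}/c_{S,a}$ when the pixel surface lies below the station, which is consistent with the description above but is a reconstruction, and uses the stated $f = 1 + c_{S,a}^{\Delta z}/c_{S,a}$ when the pixel lies above the station. The function name and numbers are hypothetical.

```python
def altitude_scaling(c_dz, c_sa, pixel_above_station=False):
    """Scaling factor f bringing a column at pixel altitude to station altitude.

    c_dz : a priori partial column between pixel surface and station (molec/cm2)
    c_sa : a priori column of the pixel (molec/cm2)
    """
    if c_sa <= 0:
        raise ValueError("a priori column must be positive")
    ratio = c_dz / c_sa
    return 1.0 + ratio if pixel_above_station else 1.0 - ratio


# Mountain station: half of the a priori column lies below the instrument.
f_mountain = altitude_scaling(c_dz=3.0e15, c_sa=6.0e15)
# Rare case: the pixel surface is above the station.
f_above = altitude_scaling(c_dz=0.6e15, c_sa=6.0e15, pixel_above_station=True)
print(f_mountain, f_above)
```

Because the same factor multiplies both the TROPOMI and the smoothed FTIR columns, it cancels in relative differences but not in absolute ones.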

Estimation of the TROPOMI accuracy and precision
In Sect. 5.1, we assess whether the TROPOMI accuracy is compliant with pre-launch requirements (40-80%, as reported in the ESA official document S5P-RS-ESA-SY-164, 2014, Table 3, p. 19). The accuracy of the TROPOMI HCHO measurements is estimated by deriving the median of the relative differences (BIAS) between the collocated TROP$_i$ and the reference FTIR$_i$ data at each station:

$$\mathrm{BIAS} = \mathrm{med}\left(\frac{\mathrm{TROP}_i - \mathrm{FTIR}_i}{\mathrm{FTIR}_i}\right) \tag{5}$$

Note that the applied scaling factor $f$ (see previous section) does not affect the BIAS estimation, even at high-mountain stations, because it cancels in the division.
For robust statistics, the median is preferred to the mean due to the presence of outliers (a few remaining TROPOMI outliers after the QA filter, and some very small FTIR values that give very large relative differences after the division in Eq. 5). The impact of TROPOMI outliers is minimized by using the median, but they should ideally be removed by the QA filter. An improvement of the QA value is foreseen in the next product version, which should improve, e.g., the filtering at Arctic sites (SZA>75°).
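The effect of choosing the median over the mean can be seen on a small synthetic example, using BIAS = med((TROP$_i$ − FTIR$_i$)/FTIR$_i$) expressed in %. The function name and the data are invented for illustration; the last pair plays the role of a TROPOMI outlier.

```python
from statistics import mean, median


def relative_bias_percent(trop, ftir):
    """Median of the relative differences (TROP - FTIR) / FTIR, in percent."""
    rel = [(t - f) / f for t, f in zip(trop, ftir)]
    return 100.0 * median(rel)


# Synthetic collocated pairs (arbitrary units); the last TROPOMI value is an outlier.
trop = [8.0, 7.0, 6.0, 30.0]
ftir = [10.0, 10.0, 10.0, 10.0]

bias = relative_bias_percent(trop, ftir)
naive = 100.0 * mean((t - f) / f for t, f in zip(trop, ftir))
print(f"median-based BIAS: {bias:.1f} %, mean-based: {naive:.1f} %")
```

The three regular pairs all indicate a negative bias, which the median preserves, while a single outlier is enough to flip the sign of the mean.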
In the next section, we also compare the obtained BIAS with the systematic uncertainty on the difference $\sigma_{\mathrm{syst}}$, to evaluate the TROPOMI uncertainty budget:

$$\sigma_{\mathrm{syst}} = \sqrt{\sigma_{S,\mathrm{syst}}^2 + a_S^T\, S_{F,\mathrm{syst}}\, a_S + a_S^T (I - A_F)\, S_{\mathrm{var,syst}}\, (I - A_F)^T a_S} \tag{6}$$

where $\sigma_{S,\mathrm{syst}}$ is the systematic uncertainty of the TROPOMI columns, as provided in the public release database (but scaled for altitude, see Sect. 4.2), $a_S$ is the TROPOMI total column averaging kernel, and $S_{F,\mathrm{syst}}$ is the FTIR systematic covariance matrix, provided in vmr² in the standardized GEOMS format and converted to partial column units. The last term is the impact of the different low-vertical-resolution profile measurements (the smoothing error) on the comparisons (see Eq. 27 in Rodgers and Connor (2003)), where for the systematic uncertainty part, we account for a possible bias on $x_{S,a}$ by following von Clarmann (2014):

$$S_{\mathrm{var,syst}} = (x_{S,a} - \langle x \rangle)\,(x_{S,a} - \langle x \rangle)^T$$

The difference $x_{S,a} - \langle x \rangle$ is not known and we follow Vigouroux et al. (2018), with $x_{S,a} - \langle x \rangle$ = -50%, -20%, -10%, +10%, +8%, +5% for the ground-4 km, 4-8 km, 8-13 km, 13-25 km, 25-40 km, and 40-120 km layers, respectively (expressed in molec/cm²).
The last term of Eq. 6 is found to be of the order of a few percent, and is therefore negligible in $\sigma_{\mathrm{syst}}$. In practice, the systematic uncertainty on the difference $\sigma_{\mathrm{syst}}$ is dominated by the TROPOMI systematic uncertainty of about 40%, FTIR having a median systematic uncertainty of only 13% with a maximum of 26% (see Sect. 3).
Similarly, the precision of the TROPOMI HCHO products is estimated in Sect. 5.2, not with the usual standard deviation, which is not robust in the presence of outliers, but with the median absolute deviation (MAD; see Huber, 1981) of the differences $x_i = \mathrm{TROP}_i - \mathrm{FTIR}_i$:

$$\mathrm{MAD} = k \times \mathrm{med}\left(\left|x_i - \mathrm{med}(x_i)\right|\right) \tag{7}$$

where $k = 1.4826$ for a correspondence with the 1-σ standard deviation for a normal distribution without outliers.
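A minimal sketch of the scaled median absolute deviation, MAD = k × med(|x$_i$ − med(x$_i$)|) with k = 1.4826, applied to synthetic differences. The function name and data are illustrative only.

```python
from statistics import median, pstdev

K_NORMAL = 1.4826  # makes the MAD comparable to 1-sigma for normal data


def scaled_mad(diffs):
    """Scaled median absolute deviation of a list of differences."""
    centre = median(diffs)
    return K_NORMAL * median([abs(d - centre) for d in diffs])


# Synthetic TROP - FTIR differences (arbitrary units) with one large outlier.
diffs = [-1.0, 0.0, 1.0, 2.0, 50.0]
mad = scaled_mad(diffs)
std = pstdev(diffs)
print(f"MAD = {mad:.4f}, standard deviation = {std:.1f}")
```

The outlier inflates the standard deviation by more than an order of magnitude, whereas the MAD still reflects the spread of the four regular values.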

In Sect. 5.2, we compare the obtained MAD to the random uncertainty on the differences $\sigma_{\mathrm{rand}}$, which is calculated following Rodgers and Connor (2003):

$$\sigma_{\mathrm{rand}} = \sqrt{\sigma_{S,\mathrm{rand}}^2 + a_S^T\, S_{F,\mathrm{rand}}\, a_S + a_S^T (I - A_F)\, S_{\mathrm{var,rand}}\, (I - A_F)^T a_S} \tag{8}$$

where $\sigma_{S,\mathrm{rand}}$ is the random uncertainty of the TROPOMI columns, as provided in the public release database (but scaled for altitude, see Sect. 4.2), $S_{F,\mathrm{rand}}$ is the FTIR random covariance matrix, and $S_{\mathrm{var,rand}}$, which takes into account the impact of the low vertical resolution in the random part of the uncertainty, is the natural variability matrix chosen to be 50%, 50%, 40%, 35%, 30%, 30%, 10% for the ground-4 km, 4-8 km, 8-13 km, 13-25 km, 25-40 km, and 40-120 km layers, respectively (expressed in molec/cm²). As for the systematic uncertainty part, the random uncertainty on the difference is dominated by the TROPOMI random uncertainty (median of about 1.1×10¹⁵ molec/cm² for TROP$_i$ within 20 km), while FTIR$_i$ has a median random uncertainty of 2.0×10¹⁴ molec/cm². The last term of Eq. 8 is comparable to the FTIR one (median value of 2.4×10¹⁴ molec/cm²).

We can use the MAD as an upper limit of the TROPOMI precision, since the collocation in space and time of the sounded air masses is never perfect. It is compared in the next section to the pre-launch precision requirement. The MAD estimation is influenced by the scaling factor $f$, which is important only for high-altitude sites (Sect. 4.2). It should be interpreted as an estimation of the precision of a TROPOMI column that would be measured at the altitude of the FTIR site. The random uncertainty on the differences is also expressed at the altitude of the FTIR site, so that the comparison between MAD and $\sigma_{\mathrm{rand}}$ is always valid.
The observed BIAS between TROPOMI and the reference FTIR data is statistically significant if it exceeds its statistical error: $\mathrm{ERR}_B = 2 \times \mathrm{MAD}/\sqrt{n}$ (with $n$ the number of coincidences).
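The significance test can be written in a few lines. The sketch below uses ERR_B = 2 × MAD/√n and compares it with the absolute value of the BIAS; taking the absolute value is an assumption made here so that negative biases are handled, and the numbers are invented.

```python
import math


def bias_error(mad, n):
    """Statistical error on the median bias, ERR_B = 2 * MAD / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive number of coincidences")
    return 2.0 * mad / math.sqrt(n)


def is_significant(bias, err_b):
    """A bias is called significant when its magnitude exceeds ERR_B."""
    return abs(bias) > err_b


# Hypothetical station: MAD of the relative differences of 20 %, 100 pairs.
err_b = bias_error(mad=20.0, n=100)
print(err_b, is_significant(-30.0, err_b), is_significant(3.0, err_b))
```

With these numbers ERR_B is 4 %, so a BIAS of -30 % is significant while a BIAS of +3 % is not.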

Validation results
In this section, we provide a table and plots for the offline (RPRO+OFFL) HCHO TROPOMI product. We do not show detailed results for the near-real-time (NRTI) product (version 1.1.[5-7]) because they are very similar to those of the offline version. Numbers for the main conclusions will be given in the text for this NRTI product.

TROPOMI observed BIAS and accuracy
In Table 3, we provide, at each individual FTIR station, the mean of the FTIR HCHO total columns (mean FTIR), the obtained median of the relative differences BIAS (in %, to compare with the pre-launch TROPOMI accuracy requirements of 40-80%; Eq. 5), the error on the BIAS ($\mathrm{ERR}_B$), and the number of collocated pairs $n$. The systematic uncertainty on a single difference is also given (in %, calculated from Eq. 6 where each term has been expressed in % by dividing by each instrument's individual HCHO column).
We have ordered the stations, not in decreasing latitudes as in Table 2, but in increasing mean HCHO FTIR columns.
The reason is that we observe a tendency in the BIAS between TROPOMI and FTIR: while the BIAS is always (with the exception of Eureka) positive or not significant (if BIAS<$\mathrm{ERR}_B$) for very clean to clean sites with HCHO mean levels lower than 6.5×10¹⁵ molec/cm², it is negative and very consistent for the stations with higher HCHO levels, ranging from 8.7 to 28.6×10¹⁵ molec/cm² (-29 to -36%), with a small error on the bias (2 to 6%). Note that the BIAS is also consistent at Paramaribo (-26%) but with a larger error (14%), due to the small number of collocations. This dependence of the TROPOMI bias on the HCHO concentration levels can be visualized in Fig. 3, where the BIAS at each station is plotted as a function of the mean FTIR columns. It is therefore not appropriate to use the median bias obtained using the data from all stations together ( The different TROPOMI BIAS at different HCHO levels points to the presence of two kinds of bias: a constant one and a proportional one. They can be obtained from the scatter plot of the two instruments, shown in Fig. 4: the constant bias is the intercept of the linear relationship between TROPOMI and FTIR, while the proportional bias is given by its slope. But this has to be done carefully: a usual linear regression by ordinary least squares (OLS) is not statistically robust and can give spurious results in the presence of outliers and/or heteroscedasticity. We are confronted with both problems in our scatter plot: we do have outliers and the uncertainty increases with HCHO levels. Therefore, we use the robust Theil-Sen estimator (Sen, 1968), where the slope $s$ of the scatter plot is the median of the slopes of the lines through all pairs of data points, (TROP$_j$ − TROP$_i$)/(FTIR$_j$ − FTIR$_i$). The intercept $b$ is then the median of (TROP$_i$ − $s$×FTIR$_i$). Using this robust estimator, we obtain the relation: TROP = 0.64 × FTIR + 1.10×10¹⁵ molec/cm².
We have calculated the uncertainties in $s$ and $b$ using 2×MAD/√n, with MAD the median absolute deviation of the slopes and intercepts of the pairs of data points, and $n$ the number of pairs. We obtain uncertainties of 0.03 and 0.05×10¹⁵ molec/cm² for $s$ and $b$, respectively. Therefore, both the constant (1.10±0.05×10¹⁵ molec/cm²) and proportional (0.64±0.03) biases are significant.
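The Theil-Sen estimator is simple enough to be written out directly: the slope is the median of all pairwise slopes, and the intercept is the median of the residuals TROP$_i$ − s×FTIR$_i$. The following sketch is not the code used for the paper; the function name and the synthetic data (a line of slope 0.5 and intercept 1.0 with one outlier) are illustrative.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Robust linear fit y = s * x + b (median of pairwise slopes)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with identical abscissa
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    s = median(slopes)
    b = median([yi - s * xi for xi, yi in zip(x, y)])
    return s, b


# Synthetic data on the line y = 0.5 x + 1, except for the last point.
ftir = [1.0, 2.0, 3.0, 4.0, 5.0]
trop = [1.5, 2.0, 2.5, 3.0, 10.0]
slope, intercept = theil_sen(ftir, trop)
print(slope, intercept)
```

Six of the ten pairwise slopes are exactly 0.5, so the single outlier does not move the median; an ordinary least-squares fit on the same data would be pulled strongly towards the outlier.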
Using the scatter plot to derive the constant and proportional biases is very sensitive to the range of observed values. As an example, if one would only use HCHO FTIR data >8.5×10¹⁵ molec/cm², one would obtain a slope of 0.51 and an intercept of 3.2×10¹⁵ molec/cm², which would point to a strong overestimation and underestimation of the constant and proportional

Table 3. Validation of TROPOMI RPRO+OFFL. Please note that the sites are ordered by increasing mean HCHO column. For each station: mean of the HCHO FTIR total columns (in molec/cm²), median of the relative differences BIAS=med((TROP$_i$−FTIR$_i$)/FTIR$_i$) and its error $\mathrm{ERR}_B$ (in %, see text), number of collocated pairs $n$, systematic uncertainty on a single difference $\sigma_{\mathrm{syst}}$ (in %, Eq. 6), median absolute deviation (MAD, in molec/cm², Eq. 7), random uncertainty on a single difference $\sigma_{\mathrm{rand}}$ (in molec/cm², Eq. 8), and pre-launch TROPOMI precision requirement associated with the choice of 20 km around the station, Requ=1.2×10¹⁶/√$n_{\mathrm{pix}}$ molec/cm², with $n_{\mathrm{pix}}$ the mean number of pixels averaged in the collocated TROPOMI data. The Pearson correlation coefficient R is given for individual coincidences (±3 h) and for monthly means of coincident data.

The BIAS values given in Table 3 are a combination of the constant and proportional biases, and can be used to statistically assess the overall TROPOMI HCHO accuracy. We can easily see from Table 3 that all BIAS values are within the upper limit of the pre-launch requirement of 80%, and they are within the 40% lower limit of the requirement for 20 of the 25 stations. The five stations exceeding a 40% BIAS are clean (Arctic or mountain) sites, with mean HCHO columns below 2.5×10¹⁵ molec/cm². But these are sites where the systematic uncertainties on the differences (see Table 3 and Eq. 6) are usually also the largest, leading to a good correspondence between the higher observed BIAS and the higher calculated uncertainty for 3 of these 5 stations (Zugspitze, Mauna Loa, and Kiruna).

Therefore, we can conclude that the TROPOMI HCHO accuracy satisfies the pre-launch requirements and that the systematic uncertainty budget is in very good agreement with the observed bias, except at a very few stations (Ny-Ålesund 43>41%, Altzomoni 71>42%, and Porto Velho 36>31%). At most of the other stations, the reported systematic uncertainty tends to be larger than the BIAS. We reach the same conclusions on the TROPOMI accuracy when making comparisons with the NRTI products.

We can list some known difficulties of the satellite product:
- The negative bias over sites with high HCHO levels (biomass burning or megacities) could be due to aerosol effects. There is no plan to include a correction for aerosols in the operational product, but specific studies are foreseen to check its impact in a scientific product.
- The positive bias over clean polar sites could be due to the solar zenith angle (SZA) dependency of the slant column fit results (because of spectral interferences with ozone and BrO). As explained in Sect. 2, the QA values need to be improved at large SZA, which is foreseen in the next version.
- The current albedo climatology is too coarse for TROPOMI, which could be a problem especially for polar, mountain or coastal sites. A climatology based on TROPOMI measurements is under development.
- It is also foreseen to test a regional model at higher spatial resolution to improve the a priori HCHO profiles. This should improve the TROPOMI retrieved product, especially at polluted sites. However, the validation presented here already takes the a priori information and averaging kernels into account. We therefore do not expect the improved a priori profiles to have an important effect on the validation results.

For discussing the observed TROPOMI precision, we provide in Table 3 the MAD for each station (in absolute value, to compare with the pre-launch precision requirement of 1.2×10¹⁶ molec/cm² for a single pixel; Eq. 7). The pre-launch precision requirement is provided at each site taking into account the mean number of pixels $n_{\mathrm{pix}}$ involved in the collocated TROPOMI data (Requ.=1.2×10¹⁶ molec/cm²/√$n_{\mathrm{pix}}$). We see that for all the cleanest sites (<2.5×10¹⁵ molec/cm²), where an additional collocation uncertainty is expected to be small, the MAD is well within the pre-launch requirements. The MAD for these cleanest sites has a median of 1.3×10¹⁵ molec/cm² and a minimum of 0.9×10¹⁵ molec/cm². This is a good estimate of the precision that TROPOMI can reach in remote conditions. For a single pixel, the best TROPOMI precision in remote conditions is therefore 5-8×10¹⁵ molec/cm².
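The link between the precision of an averaged column and that of a single pixel can be made explicit with a short calculation. The sketch assumes purely random, uncorrelated pixel errors, so that the uncertainty of a mean of $n_{\mathrm{pix}}$ pixels scales as 1/√$n_{\mathrm{pix}}$; the value $n_{\mathrm{pix}}$ = 36 is an illustrative choice close to the reported median of 34, not a number from Table 3.

```python
import math

REQ_SINGLE_PIXEL = 1.2e16  # pre-launch single-pixel requirement, molec/cm2


def requirement_for_mean(n_pix):
    """Precision requirement for a mean over n_pix pixels (molec/cm2)."""
    return REQ_SINGLE_PIXEL / math.sqrt(n_pix)


def single_pixel_precision(mad_of_mean, n_pix):
    """Single-pixel precision implied by the MAD of n_pix-pixel means."""
    return mad_of_mean * math.sqrt(n_pix)


n_pix = 36  # illustrative number of averaged pixels
requ = requirement_for_mean(n_pix)
best = single_pixel_precision(0.9e15, n_pix)     # minimum MAD at clean sites
typical = single_pixel_precision(1.3e15, n_pix)  # median MAD at clean sites
print(f"Requ = {requ:.1e}, single pixel: {best:.1e} to {typical:.1e} molec/cm2")
```

Under these assumptions the clean-site MAD values translate into a single-pixel precision of roughly 5 to 8×10¹⁵ molec/cm², below the 1.2×10¹⁶ molec/cm² requirement.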
It must be noted that the pre-launch HCHO precision requirements were chosen based on pre-launch requirements for the instrument signal-to-noise ratio (equivalent to OMI). The actual signal-to-noise ratio of the measurements appears to be better than the requirements, especially in the HCHO wavelength fitting range. Furthermore, the good quality of the recorded spectra made it possible to increase the size of the TROPOMI HCHO fitting spectral interval just after launch, further improving the precision of the slant columns. Indeed, as seen in Table 3, only at the three highest HCHO level sites (Xianghe, Mexico City, and Porto Velho) are the provided random uncertainties as high as the pre-launch requirements. The actual provided random uncertainties are smaller, and we can see that, even for clean sites, the observed MAD is larger than the random uncertainty on the differences by a factor of 1.6. This factor increases to 1.8 if we take into account all the stations, but this is expected due to a collocation uncertainty that should have more impact at high-level sites (the factor rises to 2.3 for high HCHO levels, >8.0×10¹⁵ molec/cm²). Our comparisons suggest that the TROPOMI random uncertainty is underestimated by at least a factor of 1.6 and up to a maximum of 2.3 (if one would assume the collocation uncertainty to be smaller than the TROPOMI uncertainty). This underestimation could be due to the fact that currently the uncertainties associated with the air mass factor calculation and with the background correction step are assumed to be fully systematic. The discrimination between the random and systematic parts of the uncertainties might be refined in the future, based on such validation results.

Observed TROPOMI monthly variability
The Pearson correlation coefficient is very good for the collocated monthly means of TROPOMI and FTIR data (0.91, see Table 3 and Fig. 4), and is usually good for individual sites. However, the Pearson correlation is not robust and can lead to a wrong conclusion when only a few data are coincident, especially when outliers are present. We have 17 months of coincident TROPOMI and FTIR measurements in the best cases, but only 4 for the newest stations, Palau and Porto Velho. We therefore verify that the TROPOMI precision allows the seasonal variability to be well captured, even at very clean sites that can be at the limit of the satellite detection, by plotting the individual monthly mean time series in Fig. 5.
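The sensitivity of the Pearson coefficient to a single outlier in a short series can be illustrated with a small Python sketch. The monthly values below are synthetic and in arbitrary units; they are not station data, and the helper function is only a textbook implementation of the coefficient.

```python
import math


def pearson(x, y):
    """Textbook Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic monthly means (arbitrary units), NOT measurements from any station.
ftir = [1.0, 1.5, 2.0, 2.5, 3.0, 2.0]
satellite = [1.2, 1.6, 2.1, 2.4, 3.1, 6.0]  # the last month is an outlier

r_all = pearson(ftir, satellite)
r_without_outlier = pearson(ftir[:-1], satellite[:-1])
print(f"all months: {r_all:.2f}, outlier removed: {r_without_outlier:.2f}")
```

With only six months of data, one outlying month is enough to pull the coefficient well below the value obtained from the remaining five, which is why the individual time series are inspected in addition to the correlation.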
The seasonal variability, with a maximum in July-August, is well observed at all the Arctic sites (Eureka, Ny-Ålesund, Thule, Kiruna, and Sodankylä). The monthly mean correlation is better than 0.69, except at Eureka and Ny-Ålesund. It can be seen in Fig. 5 that September 2019 is very high in TROPOMI data at Ny-Ålesund, and only 1 coincidence is found for this month. Removing this last outlier gives a correlation coefficient of 0.76 at this station. The northern mid-latitude clean sites (mountains: Jungfraujoch, Zugspitze, Izaña) also display a seasonal variability in very good agreement, with correlation coefficients higher than 0.70. The Japanese clean site Rikubetsu shows a poorer correlation (0.60), but only a few data are in coincidence. The stations where we find the poorest correlations are the oceanic sites. The poorest one is Mauna Loa, but this is expected due to the very small seasonal variability there, combined with a small number of coincidences. A similar situation is observed at the other recent  Table 3).

The higher HCHO level sites show a TROPOMI seasonal variability in very good agreement with FTIR, with correlations larger than 0.90 for Boulder, Wollongong, Toronto, Xianghe, and Porto Velho. At Tsukuba, removing the November 2018 outlier, which is based on a single coincidence, increases the correlation to 0.93. The poorest correlation (0.14) is found at the coastal site Paramaribo, where usually only one coincidence per month is found. Looking at the highest HCHO level sites, these monthly mean time series also confirm that TROPOMI has more difficulty reproducing the months with the highest enhancements, which is responsible for the significant negative bias (-31%) found in the previous section for high HCHO levels (>8.0×10^15 molec/cm^2).

Conclusions
We have used a network of twenty-five FTIR stations, most of them affiliated to NDACC, to validate the latest TROPOMI HCHO tropospheric columns (v.1.1.[5-7]  both constant and proportional biases, was also recently observed (although with fewer FTIR sites involved) in another nadir satellite product, the formic acid observed by IASI (Supporting Information in Franco et al. (2020)). The NDACC FTIR network, which covers a large number of atmospheric species over wide ranges of concentrations, is a powerful source of reference data to detect such biases in nadir satellite products.
Although significant, the observed overestimation and underestimation of TROPOMI are within the lower limits of the pre- count aerosol effects over polluted sites, improving the QA values at high SZA, and using an albedo climatology and a priori HCHO profiles at the TROPOMI spatial resolution. Except for the former, these improvements are foreseen in next versions of operational TROPOMI data.
The precision of TROPOMI OFFL products is estimated by the median absolute deviation (MAD) at the clean sites, where the collocation effect is expected to be small. For FTIR HCHO levels lower than 2.5×10^15 molec/cm^2, the MAD is 1.3×10^15 molec/cm^2, corresponding to a single-pixel precision of 7×10^15 molec/cm^2 (5 to 8×10^15 at individual sites), which is well below the pre-launch precision requirement of 1.2×10^16 molec/cm^2. The provided TROPOMI random uncertainties (after launch) were indeed found to be better than the pre-launch requirements, but they are too small by a factor of 1.6 compared to the MAD at the clean sites. There is a factor of 2.3 difference between the MAD and the random uncertainty on the comparisons (dominated by the TROPOMI random uncertainty) at the high-level sites, where an additional collocation effect might play a role as well. The underestimation of the TROPOMI random uncertainty could be due to a random effect of the uncertainty associated with the air mass factor calculation that is not currently included in the budget. This would also explain the larger underestimation of the random error at high-level sites (factor 2.3 vs 1.6 at clean sites). Also, a systematic uncertainty component on a short time scale (and thus not included in the TROPOMI random uncertainty) can have a random effect on our longer-term comparisons.
We have shown that the TROPOMI data capture the HCHO seasonal variability very well, even at very clean sites. The Pearson correlation coefficient for monthly mean coincident data is 0.91. Although we have found room for a refinement of the TROPOMI random uncertainty estimation and for an improvement of the QA values for a better filtering of the few remaining outliers and negative columns (exceeding the expected statistical distribution), this validation work has demonstrated the very good quality of the TROPOMI HCHO product, which is well within the pre-launch requirements for both accuracy and precision.
This work has also shown the high value of the FTIR HCHO network, providing harmonized and well-characterized data covering a wide range of HCHO columns. These ground-based FTIR data are continuously extended by new measurements and will be used in the coming years for the routine S5P validation within the ESA dedicated validation server (https://mpcvdaf-server.tropomi.eu/). The FTIR network will also be used in the near future for the validation of previous satellite missions such as OMI or GOME-2. New FTIR measurements are continuously performed and can be used in the coming years for the  January 2020) depending on each PI decision. Please pay attention to the NDACC data policy. The whole data set used in this publication can be provided upon request to Corinne Vigouroux (corinne.vigouroux@aeronomie.be) and data per station can be requested from the individual PIs.

Author contribution: Corinne Vigouroux and Bavo Langerock performed the validation using HCHO TROPOMI and FTIR data at all sites. They are also involved in the FTIR measurements at Maïdo and Porto Velho. Corinne Vigouroux analyzed the Maïdo, Porto Velho, Sodankylä and Xianghe data. Isabelle De Smedt is the TROPOMI HCHO product lead and participated in the paper (Section 2 and discussions). Michel Van Roozendael and Zhibin Cheng are also responsible for the TROPOMI HCHO prototype algorithm and operational processor, respectively. Gaia Pinardi was involved in the validation method section through her expertise in validation using UV-Visible techniques, which is part of the projects TROVA and TROVA-2 that funded this work. All other co-authors provided the FTIR HCHO data for the station(s) they are responsible for.

atmos-chem-phys-discuss.net/acp-2019-1117/). Could it be valuable to add it to the list of aircraft-based validation efforts?
Indeed. This reference has been added in the manuscript.

TROPOMI HCHO data.
The description of TROPOMI data and versions is very complete, but after reading this section the question remains: of all the options (RPRO, OFFL and NRTI), which one has been used? If several were used, depending on the station and the period of time, that should also be explained.
The text in our AMTD version was: "At the time of writing this paper, the latest product versions 1.1. Indeed, the referee is right: it is not clear in this TROPOMI section which products are used in this paper (RPRO + OFFL, or NRTI). Actually, we performed the validation on the two sets of data. But, in this paper the tables and figures focus on the RPRO+OFFL data set. The NRTI validation results are so similar that we preferred avoiding giving details on them. We only give a summary of the NRTI biases in Sect. 5.1.
At all sites, the TROPOMI data set that we used is a combination of RPRO and OFFL products, from v.1.1.5 to v.1.1.7, versions 5 to 7 being consistent retrieved HCHO products. The version numbers correspond to different periods of time, but we did not find it relevant to detail them since the products are consistent among these versions. The details of the dates are in the Readme file (more precisely in its Table 2), for which we gave the reference. For the convenience of the referee and readers, we provide them here, and we will repeat them in a table in the next version of the manuscript:
- From 2018-05-14 to 2018-11-28: RPRO v.1.1.5
- From 2018-11-28 to 2019-03-28: OFFL v.1.1.5
- From 2019-03-28 to 2019-04-23: OFFL v.1.1.6
- From 2019
The validation of the NRTI products (results only summarized in one sentence in Sect. 5.1) is using:
We have also repeated in the new table (on request of referee #2) the information on the differences between the versions that is in the Readme file.
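For readers who want to tag their own collocations with the product version, the periods listed above can be encoded as a small lookup. The Python sketch below is only an illustration: it contains the three complete periods given here and nothing else, the function name is hypothetical, and treating each period as including its start date and excluding its end date is an assumption made to resolve the shared boundary dates.

```python
from datetime import date

# Only the three complete periods listed in the response; later periods are not reproduced.
PERIODS = [
    (date(2018, 5, 14), date(2018, 11, 28), "RPRO v.1.1.5"),
    (date(2018, 11, 28), date(2019, 3, 28), "OFFL v.1.1.5"),
    (date(2019, 3, 28), date(2019, 4, 23), "OFFL v.1.1.6"),
]


def product_for(day):
    """Return the product label for a date, or None if the date is outside the listed periods.

    Assumption: each period includes its start date and excludes its end date.
    """
    for start, end, label in PERIODS:
        if start <= day < end:
            return label
    return None


print(product_for(date(2018, 6, 1)))
print(product_for(date(2019, 4, 1)))
```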
We have added to the manuscript (in italic): "At the time of writing this paper, the latest product versions 1.
Given the unprecedented TROPOMI spatial resolution, the surface elevation could play a bigger role in explaining biases for some locations with complicated topography. What is the source of the TROPOMI surface elevation information?
Yes, we agree that topography could play a significant role if not taken into account carefully, both for the quality of the product and for the comparison between satellite and ground-based quantities. However, we considered it in both cases. For S5P L2 products, the digital elevation map is from GMTED2010 (Danielson et al., 2011), and an average over the ground pixel area is considered. Furthermore, as explained in the HCHO Algorithm Theoretical Basis Document (ATBD, De Smedt et al. 2018): "To reduce the errors associated to topography and the lower spatial resolution of the model compared to the TROPOMI 3.5x7 km2 spatial resolution, the a priori profiles need to be rescaled to effective surface elevation of the satellite pixel. The TM5 surface pressure is converted by applying the hypsometric equation and the assumption that temperature changes linearly with height". Finally, as described in Sect. 4.2, the difference between the altitude of the ground-based station and the surface elevation of the satellite pixel is taken into account. We believe that the positive bias usually observed at mountain stations is related to the constant bias of TROPOMI for small HCHO columns, because it is also observed at clean sites that have an altitude close to sea level (Kiruna, Ny-Ålesund).
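The pressure conversion quoted from the ATBD can be sketched with the textbook hypsometric relation for a temperature that varies linearly with height. The Python example below is not the operational implementation: the function name, the dry-air gas constant, and the default lapse rate of 6.5 K/km are assumptions made for illustration, and the ATBD should be consulted for the exact formulation.

```python
G = 9.80665     # m/s^2, standard gravity
R_DRY = 287.05  # J/(kg K), specific gas constant of dry air (assumed)


def rescale_surface_pressure(p_model, z_model, z_pixel, t_model, lapse_rate=0.0065):
    """Pressure at the pixel elevation from the model surface pressure.

    Assumes the temperature decreases linearly with height at `lapse_rate` (K/m)
    from `t_model` (K) at the model surface height `z_model` (m).
    The result has the same unit as `p_model`.
    """
    t_pixel = t_model - lapse_rate * (z_pixel - z_model)
    return p_model * (t_pixel / t_model) ** (G / (R_DRY * lapse_rate))


# Illustrative values: model surface at sea level, pixel at 1000 m.
p_pixel = rescale_surface_pressure(p_model=1013.25, z_model=0.0, z_pixel=1000.0, t_model=288.15)
print(f"{p_pixel:.1f} hPa")
```

A pixel above the model surface gets a lower surface pressure, and a pixel at the model surface height leaves the pressure unchanged.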
Together with the HCHO cross-section, the absorptions of NO2, BrO, O3 (at two temperatures) and O4 are fitted. A Ring cross-section and two pseudo-cross sections to account for non-linear O3 absorption effects are also included in the fit. References are given in De Smedt et al. (2018). This more detailed description has been added in the new manuscript.
The operational algorithm does not have the capability to fit the slit functions directly; this has to be done offline. Up to now, the TROPOMI slit functions have been stable. No update of the pre-convolved cross-sections is planned, but this is monitored.

Page 4, line 20. How is M0 calculated? Is it the average of the AMFs of the slant columns considered in the calculation of N(s,0)?
Yes; M0 is an average of the air mass factors (M) of the slant columns selected in the reference sector, the Pacific Ocean (N(s,0)). This has been added in the text.

Ground-based FTIR HCHO data
Figure 1 caption could be expanded to provide some information about the spatial resolution of the averaged TROPOMI data shown. What kind of averaging algorithm was used to generate the background data?
The spatial resolution used for this map is 0.2°x0.2°. We use the HARP v1.5 tool, which can be found at https://atmospherictoolbox.org. This information has been added in the Fig.1 caption, as suggested by the referee. Done, as suggested.
Page 7, line 25: Please clarify: it looks as if stations using the PROFFIT9 retrieval code can have a bigger systematic uncertainty due to an uncertainty on the channeling that is not taken into account yet in the SFIT4 code. If the SFIT4 code does not include this channeling uncertainty in the budget, does it just mean that a systematic error is introduced for those stations?
The channeling is due to (possible) imperfections in the instrument that may (or may not) lead to artefacts in the interferogram. This error is included in the PROFFIT9 code, and not yet in SFIT4. However, whether channeling is present in the spectra (which obviously depends on each instrument) has not yet been measured at each site. Such an exercise was initiated after the Vigouroux et al. (2018) paper for a set of stations (by T. Blumenstock, KIT, co-author of the present paper), but has not been done systematically at each site. For the sites that have been tested, we found that a non-negligible channeling is indeed present at some sites, but not at all sites. Therefore, introducing such an additional error in the theoretical calculation without knowing whether it is indeed present may also lead to an overestimation of the systematic uncertainty. In the next version of SFIT4, the random and systematic errors on the target species due to channeling will be included, but their correct estimation will be possible only at the sites where the channeling itself is estimated. This is ongoing work within the IRWG (InfraRed Working Group) of NDACC.
In the present validation, the systematic biases between TROPOMI and the FTIR stations are very consistent among the stations (see Fig. 3), except for Eureka, which is the only clean site with a negative TROPOMI bias. However, Eureka was one of the sites participating in the channeling exercise, and the channeling was found to be very small for this instrument. So the channeling error does not explain the different bias there. For the other stations, the good consistency of the TROPOMI bias at the different stations (which depends on the HCHO levels, and not on individual sites) shows that the bias is dominated by the TROPOMI systematic error, and that the channeling error should have a smaller impact.
To clarify that the channeling is not always underestimated at the SFIT4 stations, and can be overestimated at some PROFFIT9 stations, we have adapted the text: "The systematic uncertainty can be larger (up to 21-26%) at the stations using the PROFFIT9 retrieval code, due to an assumed uncertainty on the channeling that is not taken into account yet in the SFIT4 code. However, this channeling uncertainty can also be negligible at some sites (it depends on each instrument), and more investigation is needed at each station to avoid its under- or over-estimation."

Page 8, line 3: Why the smoothing systematic uncertainty (on the total column) is significantly bigger for the 5 added sites?
We think the referee has misinterpreted the sentence. The 13% and 14% for the 5 added sites, are for the total systematic uncertainty (dominated by the spectroscopy), and not for the smoothing part only. To avoid the confusion, we have changed the sentence to : "For the five added sites, the median total systematic uncertainty is 13% (Jungfraujoch, Tsukuba, Palau), or 14% (Rikubetsu, Xianghe), commensurate with the other sites."

Collocation criteria
What is the effect of reducing/increasing the TROPOMI/FTIR collocation radius (currently set at 20km)? Is there a radius threshold/range where no improvement is achieved in the comparisons?
Before choosing the 20 km collocation radius, we indeed tested several distances: 10, 20, 30, 40, and 50 km. We provide in this discussion a plot of the median relative differences (bias) at each station (Fig. 1) for the different collocation distances. Please note that the numbers are not the same as in the AMTD paper, because this work on collocation distances was done in the course of the project (not at the time of writing the paper), so the time series were shorter and the collocation time window was 6 h (it is now set to 3 h). In Fig. 1, we see that the biases are usually similar for the 20 to 50 km criteria, especially for sites with mid-range HCHO levels. For clean sites, we usually observe slightly smaller biases with the 30 km criterion than with the 20 km one. For the most polluted sites, UNAM (Mexico City) and Porto Velho, the bias increases with the distance. The 10 km collocation leads to fewer than half as many coincidences (at some stations, five times fewer).
Therefore, the median biases obtained with this criterion were less robust, and the 10 km choice was discarded. Because the median biases are usually similar for the different collocation distances, they were not very useful for choosing the collocation criterion. We therefore looked at the MAD (median absolute deviation, see Eq. 6 for the complete definition) to guide the choice. Figure 2 shows the MAD at each station for the different collocation distances. We see from the figure that the MAD usually decreases with increasing distance, except in a few cases (the polluted cases, as expected: Porto Velho, UNAM = Mexico City, …). However, we cannot conclude that the comparisons are "improved": while the MAD decreases due to the averaging of more TROPOMI pixels, the random uncertainties of the comparisons also decrease. If the random error were perfectly determined, we would have a constant ratio MAD / RandErr (no dependence on the collocation distance), equal to 1 if there is no collocation effect (so expected to be 1 at clean sites). If we plot this ratio (Figure 3), we see that it increases with the distance, pointing to an additional random error due to the collocation.
Figure 3: The ratio between the MAD and the random uncertainty on the differences between TROPOMI and FTIR.
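The ratio discussed here can be computed along the following lines. This Python sketch uses synthetic numbers and makes two assumptions that should be checked against Eq. 6 and Eq. 7 of the paper: the MAD is taken as the median absolute deviation scaled by 1.4826 (so that it is comparable to a standard deviation for Gaussian data), and the random uncertainty of a station is summarized by the median of the per-coincidence values. The function names are hypothetical.

```python
import statistics

MAD_SCALE = 1.4826  # assumed normalisation factor (Gaussian consistency)


def scaled_mad(values, k=MAD_SCALE):
    """Scaled median absolute deviation around the median."""
    values = list(values)
    center = statistics.median(values)
    return k * statistics.median([abs(v - center) for v in values])


def mad_to_random_ratio(differences, random_uncertainties):
    """Ratio of the observed scatter to the expected random uncertainty."""
    return scaled_mad(differences) / statistics.median(random_uncertainties)


# Synthetic satellite-minus-FTIR differences and random uncertainties (1e15 molec/cm^2).
differences = [-2.4, -1.2, 0.0, 1.2, 2.4]
random_uncertainties = [1.0, 1.1, 1.2, 1.1, 1.0]

ratio = mad_to_random_ratio(differences, random_uncertainties)
print(f"MAD / random uncertainty = {ratio:.2f}")
```

A ratio close to 1 would indicate that the reported random uncertainty explains the observed scatter; values above 1 point to an underestimated uncertainty, an additional collocation error, or both.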
As no clear threshold provides an improvement of the comparisons, we decided to use the 20 km collocation criterion, a good compromise between the number of coincidences and the best correspondence between the MAD and the random uncertainty budget. It also avoids an increasing MAD over the sites with the highest HCHO levels (UNAM, Porto Velho).
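A collocation filter of the kind described in this answer can be sketched as follows. This is an illustrative Python example, not the code used for the paper: it assumes that the distance is measured between the pixel centre and the station along a great circle, that the time criterion is a symmetric window around the FTIR measurement, and that pixels are stored as dictionaries with hypothetical keys `lat`, `lon`, and `time`.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def collocated_pixels(pixels, station_lat, station_lon, ftir_time, max_km=20.0, max_hours=3.0):
    """Keep pixels whose centre is within max_km of the station and within max_hours of ftir_time."""
    window = timedelta(hours=max_hours)
    return [
        p for p in pixels
        if haversine_km(p["lat"], p["lon"], station_lat, station_lon) <= max_km
        and abs(p["time"] - ftir_time) <= window
    ]


# Synthetic example: one pixel passes both criteria, the two others fail one each.
t0 = datetime(2019, 7, 1, 12, 0)
pixels = [
    {"lat": 46.6, "lon": 8.0, "time": t0 + timedelta(hours=1)},  # about 11 km away, 1 h apart
    {"lat": 46.9, "lon": 8.0, "time": t0},                       # about 44 km away
    {"lat": 46.5, "lon": 8.0, "time": t0 + timedelta(hours=5)},  # too far in time
]
selected = collocated_pixels(pixels, 46.5, 8.0, t0)
print(len(selected))
```

Changing `max_km` to 10, 30, 40, or 50 reproduces the kind of sensitivity test described above, with the number of selected pixels growing with the radius.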
In the new manuscript, we summarize this study by adding the following text: "Before choosing the 20 km collocation criterion, we have tested several distances (10, 20, 30, 40, and 50 km). The 10 km criterion was discarded because of the small number of remaining coincidences, leading to less robust statistics. The 20 to 50 km criteria give similar biases between TROPOMI and FTIR. The standard deviations of the comparisons usually decrease slightly with increasing collocation distance due to a smaller TROPOMI random uncertainty (more pixels to average), except at the most polluted sites. However, the ratio between the standard deviations and the random uncertainty budgets is increasing with the collocation distance at all sites, pointing to an increased random error due to the collocation. We therefore choose the 20 km distance to reduce the random spatial collocation error."
For each station, after co-adding, what is the median TROPOMI detection limit and random uncertainty? That would be an interesting fact to know.
In Table 2 of the AMTD paper, we give rand for each station. This value is the random uncertainty on the differences between TROPOMI and FTIR. It is fully defined by Eq. 7. In the text (Sect. 4.3), we explain that since the other terms of Eq. 7 are much smaller, rand is dominated by the TROPOMI random error budget S,rand. Therefore, the rand in Table 2 is, to a first approximation, the number that the referee is asking for (~TROPOMI random uncertainty, S,rand). Please note that there was an error in Eq. 7 of the AMTD version: the matrix for the FTIR random uncertainty was called Ss,rand instead of SF,rand. It is now corrected.
The detection limit is then usually defined as 3*S,rand, so it is easily determined at each station from Table 2 by approximating S,rand with the provided rand and multiplying by 3. For all stations together, we obtain 3.6×10^15 molec/cm^2 as the TROPOMI detection limit (for an average of about 34 pixels), and thus 2.1×10^16 molec/cm^2 for a single pixel.
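The arithmetic behind these two numbers can be checked with a few lines of Python. The sketch assumes, as the text implies, that the averaged value scales with the square root of the number of pixels; the random uncertainty of 1.2×10^15 molec/cm^2 is simply back-computed from the quoted detection limit (3.6×10^15 / 3), and the function names are hypothetical.

```python
import math


def detection_limit(random_uncertainty):
    """Detection limit defined as three times the random uncertainty."""
    return 3.0 * random_uncertainty


def single_pixel_equivalent(value, n_pix):
    """Scale a value valid for an n_pix average to a single pixel (independent noise assumed)."""
    return value * math.sqrt(n_pix)


rand_all_stations = 1.2e15  # molec/cm^2, back-computed from the quoted detection limit
n_pix = 34                  # approximate mean number of averaged pixels

limit_average = detection_limit(rand_all_stations)
limit_single = single_pixel_equivalent(limit_average, n_pix)
print(f"{limit_average:.1e} molec/cm^2 for the average, {limit_single:.1e} for a single pixel")
```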

Building inter-comparable products
Equation 2 could have a dimension problem: the S5P averaging kernel aS is defined on the S5P vertical grid according to line 16, page 9, while x'F and xS,a are defined on the FTIR vertical grid.
Actually, we said in the text above Eq. 2 that x'F has been regridded to the satellite grid before applying Eq. 2. But the referee is right that this is not clear enough, because we kept the same names for x'F and xS,a on both grids (to keep the number of variable names small). So, for clarity, we now introduce different names for different grids: we now call xS,a the S5P a priori profile on the original satellite grid and keep x'F for the FTIR profile on the original FTIR grid, and we call xS,a/F the S5P a priori profile regridded to the FTIR grid, and x'F/S the FTIR profile regridded to the satellite grid.
The new text becomes: "First, the a priori substitution is applied, using the S5P a priori profile as the common a priori profile. For this, the S5P a priori profile xS,a is regridded to the FTIR retrieval grid (xS,a/F) using a mass conservation algorithm (Langerock et al., 2015). In the rare situation where the satellite pixel elevation is above the FTIR site, the S5P a priori profile is extended to the FTIR instrument's altitude. The regridded S5P a priori xS,a/F is then substituted following Rodgers and Connor (2003), and we finally use the corrected FTIR retrieved profile x′F in the comparisons: x′F=xF+ (AF−I)(xF,a−xS,a/F), where …" And also below: "For that purpose we regrid the corrected FTIR profile x′F to the S5P column averaging kernel grid (x'F/S) and apply the smoothing equation: cF smoo =cS,a+aS(x′F/S−xS,a) with cS,a the S5P a priori column derived from the S5P a priori profile. We obtain a smoothed FTIR column cF smoo associated with a collocated TROPOMI pixel. In the case of mountain sites where the pixel altitude is below the instrument's height, the regridding of the FTIR profile x′F/S is done…"
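The two equations in the revised text can be written out as a short Python sketch. It is an illustration only: the regridding steps (the mass conservation algorithm of Langerock et al., 2015) are not reproduced and the inputs are assumed to be already on the correct grids; the profiles are assumed to be partial columns per layer, so that the a priori column cS,a is the sum of the a priori profile; and the function names are hypothetical.

```python
def apriori_substitution(x_f, x_f_apriori, x_s_apriori_on_f, a_f):
    """x'_F = x_F + (A_F - I)(x_F,a - x_S,a/F), all quantities on the FTIR grid.

    a_f is the FTIR averaging kernel matrix given as a list of rows.
    """
    n = len(x_f)
    delta = [x_f_apriori[j] - x_s_apriori_on_f[j] for j in range(n)]
    corrected = []
    for i in range(n):
        correction = sum((a_f[i][j] - (1.0 if i == j else 0.0)) * delta[j] for j in range(n))
        corrected.append(x_f[i] + correction)
    return corrected


def smoothed_column(x_f_on_s, x_s_apriori, a_s):
    """c_F^smoo = c_S,a + a_S (x'_F/S - x_S,a), all quantities on the S5P grid.

    Assumes partial-column profiles, so that c_S,a is the sum of x_S,a.
    """
    c_s_apriori = sum(x_s_apriori)
    return c_s_apriori + sum(a * (xf - xa) for a, xf, xa in zip(a_s, x_f_on_s, x_s_apriori))


# Synthetic three-layer example (arbitrary units).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_corrected = apriori_substitution([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [1.5, 2.5, 2.0], identity)
c_smoothed = smoothed_column(x_corrected, [2.0, 2.0, 2.0], [0.5, 1.0, 1.5])
print(x_corrected, c_smoothed)
```

Two limiting cases are useful sanity checks: with an identity FTIR averaging kernel the substitution leaves the retrieved profile unchanged, and with a column averaging kernel equal to 1 in every layer the smoothed column is simply the sum of the regridded FTIR profile.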

Validation results
As mentioned above, including a table showing the period of time each one of the products (RPRO, OFFL) has been used in the calculations will assure full reproducibility of the results shown.
We followed the referee's suggestion by adding such a Table (now Table 1).

TROPOMI observed BIAS and accuracy
Page 12, line 10: This sentence is confusing: "...it is negative for higher levels and very consistent for the stations from 8.7 to 28.6×10^15 ...". This is my interpretation: "...it is negative and very consistent for stations with higher levels, ranging from 8.7 to 28.6×10^15 ...", but maybe it is the HCHO level that is 8.7 to 28.6×10^15.
Page 12, line 10: Lower levels are defined in the abstract and below, at page 12, line 21, as 2.5×10^15 molec/cm^2. What is the meaning of 6.5×10^15 molec/cm^2?
We meant that the biases were always negative above 6.5×10^15 (including Tsukuba and Bremen), and that they are consistent "only" above 8.7×10^15 (because the bias at Bremen, -5%, is lower). The 6.5×10^15 limit appeared in Table 2 (AMTD version) as a limit between positive/non-significant trends (below) and always negative trends (above). However, because the limit of 8.0×10^15 was chosen for the "high levels" median bias calculation, we did not put a separation line at 6.5×10^15, which seems to be a source of confusion. We decided to simplify the sentence as suggested by the referee.

Do the authors have suggestions on how to link/explain the constant and proportional biases to different instrumental, algorithm, or geophysical parameters?
This validation exercise could not identify a specific problem in the instrument itself or in the satellite retrieval algorithm. We will add the following text to the new manuscript (end of Sect. 5.1) in order to give some possible explanations to the observed biases (that are, however, in agreement with the systematic uncertainty budget).
The systematic uncertainties leading to the observed constant and proportional biases of our study have been calculated as described in Sect. 3 of De Smedt et al. (2018). From the error propagation of the HCHO TROPOMI columns (equation of Nv in Sect. 2 of our AMTD paper, now numbered Eq. 1 in the new manuscript), it can be found that the proportional bias is more likely due to the air mass factor (M) uncertainties M, while the constant bias is more likely due to the slant column uncertainties N,S and to the uncertainty of the background correction of the slant columns. This can be seen in Eq. 13 of De Smedt et al. (2018), where M is proportional to Ns-Ns,0.
We can list some known difficulties of the satellite product:
- The negative bias over sites with high HCHO levels (biomass burning or megacities) could be due to aerosol effects. There is no plan to include a correction for aerosols in the operational product, but specific studies are foreseen to check its impact in a scientific product.
and BrO). As explained in the paper, the QA values need to be improved at large SZA, which is foreseen in the next version.
- The current albedo climatology is too coarse for TROPOMI, which could be a problem especially for polar, mountain, or coastal sites. A climatology based on TROPOMI measurements is under development.
- It is also foreseen to test a regional model at higher spatial resolution to improve the a priori HCHO profiles. This should improve the TROPOMI retrieved product, especially at polluted sites. However, the validation presented here already takes the a priori information and averaging kernels into account. We therefore do not expect a large effect of the improved a priori profiles on the validation results.
In the conclusion, we have added the following summary: Possible improvements in the TROPOMI biases could be achieved by taking into account aerosol effects over polluted sites, improving the QA values at high SZA, and using an albedo climatology and a priori HCHO profiles at the TROPOMI spatial resolution. Except for the former, these improvements are foreseen in next versions of the operational TROPOMI data.