NDACC harmonized formaldehyde time series from 21 FTIR stations covering a wide range of column abundances

Among the more than 20 ground-based FTIR (Fourier transform infrared) stations currently operating around the globe, only a few have provided formaldehyde (HCHO) total column time series until now. Although several independent studies have shown that the FTIR measurements can provide formaldehyde total columns with good precision, the spatial coverage has not been optimal for providing good diagnostics for satellite or model validation. Furthermore, these past studies used different retrieval settings, and biases as large as 50 % can be observed in the HCHO total columns depending on these retrieval choices, which is also a weakness for validation studies combining data from different ground-based stations. For the present work, the HCHO retrieval settings have been optimized based on experience gained from past studies and have been applied consistently at the 21 participating stations. Most of them are either part of the Network for the Detection of Atmospheric Composition Change (NDACC) or under consideration for membership. We provide the harmonized settings and a characterization of the HCHO FTIR products. Depending on the station, the total systematic and random uncertainties of an individual HCHO total column measurement lie between 12 % and 27 % and between 1 Published by Copernicus Publications on behalf of the European Geosciences Union. 5050 C. Vigouroux et al.: Harmonized formaldehyde time series from 21 FTIR stations and 11× 1014 molec cm−2, respectively. The median values among all stations are 13 % and 2.9× 1014 molec cm−2 for the total systematic and random uncertainties. This unprecedented harmonized formaldehyde data set from 21 ground-based FTIR stations is presented and its comparison with a global chemistry transport model shows consistency in absolute values as well as in seasonal cycles. The network covers very different concentration levels of formaldehyde, from very clean levels at the limit of detection (few 1013 molec cm−2) to highly polluted levels (7× 1016 molec cm−2). Because the measurements can be made at any time during daylight, the diurnal cycle can be observed and is found to be significant at many stations. These HCHO time series, some of them starting in the 1990s, are crucial for past and present satellite validation and will be extended in the coming years for the next generation of satellite missions.

Abstract. Among the more than 20 ground-based FTIR (Fourier transform infrared) stations currently operating around the globe, only a few have provided formaldehyde (HCHO) total column time series until now. Although several independent studies have shown that the FTIR measurements can provide formaldehyde total columns with good precision, the spatial coverage has not been optimal for providing good diagnostics for satellite or model validation. Furthermore, these past studies used different retrieval settings, and biases as large as 50 % can be observed in the HCHO total columns depending on these retrieval choices, which is also a weakness for validation studies combining data from different ground-based stations.
For the present work, the HCHO retrieval settings have been optimized based on experience gained from past studies and have been applied consistently at the 21 participating stations. Most of them are either part of the Network for the Detection of Atmospheric Composition Change (NDACC) or under consideration for membership. We provide the harmonized settings and a characterization of the HCHO FTIR products. Depending on the station, the total systematic and random uncertainties of an individual HCHO total column measurement lie between 12 % and 27 % and between 1

Introduction
Through reactions with hydroxyl radical (OH) and NO x (NO + NO 2 ), the volatile organic compounds (VOCs) exert a strong influence on the oxidizing capacity of the atmosphere. These reactions produce ozone and secondary organic aerosols, which affect air quality and global climate. Given their short lifetimes (from a few minutes to a few hours for the most reactive ones, Kesselmeier and Staudt, 1999) and their different sources depending on geographical locations, it is very difficult to derive a global atmospheric burden for most of the VOCs from current measurements. The observation of formaldehyde (HCHO), which is an intermediate product of the degradation of many nonmethane VOCs (NMVOCs) and has a lifetime of only a few hours, allows us to constrain the NMVOCs emissions and to test our understanding of the complex and still uncertain degradation mechanisms of these NMVOCs . The use of satellite HCHO measurements in combination with tropospheric chemistry transport models to derive NMVOCs emissions has been the subject of several past studies (e.g. Palmer et al., 2003;Millet et al., 2008;Stavrakou et al., 2009;Fortems-Cheiney et al., 2012;Barkley et al., 2013;Marais et al., 2014). The past and present HCHO satellite data sets include those from GOME (1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003), SCIAMACHY (2003SCIAMACHY ( -2012, OMI (2004-present), GOME-2A (2006-present), OMPS (2011-present), GOME-2B (2012-present), and very recently TROPOMI (2017present). The NMVOCs emissions derived from top-down approaches using these satellite data sets rely on the accuracy of the measurements. An indirect way to test these accuracies is to compare the emission budgets obtained using two different satellite data sets as in Barkley et al. (2013) for SCIAMACHY and OMI or in Stavrakou et al. (2015) for OMI and GOME-2. While the global emission budgets are in general consistent , there are large differences in the top-down estimates on a regional scale, e.g. differences up to nearly 50 % are observed over Amazonia between SCIAMACHY and OMI (Barkley et al., 2013) and up to nearly 25 % between GOME-2 and OMI . Unambiguously concluding whether these differences are due to biases in the satellite products (due to retrieval settings, vertical sensitivities, horizontal resolution, etc.) or to the diurnal cycle of formaldehyde (SCIAMACHY and GOME-2 measuring in the morning and OMI in the afternoon) requires validation with independent and accurate ground-based measurements (Barkley et al., 2013;De Smedt et al., 2015;Stavrakou et al., 2015).
At present, validation studies of HCHO satellite products have taken place at a few locations only, mainly using aircraft data (Martin et al., 2004;Barkley et al., 2013;Zhu et al., 2016), the MAX-DOAS (Multi-Axis Differential Optical Absorption Spectroscopy) technique (Wittrock et al., 2006;De Smedt et al., 2015) and the FTIR (Fourier transform infrared) technique (Jones et al., 2009;Vigouroux et al., 2009;De Smedt et al., 2015). This is not sufficient to provide a good picture of the accuracy of the satellites, especially given the high geographical variability of formaldehyde. A lot of effort is therefore currently underway to increase the number of ground-based stations providing HCHO data, using the DOAS or the FTIR technique, initiated in view of the TROPOMI Cal/Val activities. This paper presents the work accomplished in this direction using FTIR measurements at most of the NDACC (Network for the Detection of Atmospheric Composition Change) stations, and including some more recent observing stations that will also be part of the NDACC in the near future.
Up to now, time series of HCHO total columns have been studied at only six FTIR stations out of more than 20 FTIR sites currently in operation: Ny-Ålesund (Notholt et al., 1997), Wollongong (Paton-Walsh et al., 2005), Lauder (Jones et al., 2009), Reunion Island (Vigouroux et al., 2009), Eureka (Viatte et al., 2014), and Jungfraujoch (Franco et al., 2015). We note that HCHO has also been measured by the JPL MkIV instrument (Toon, 1991) at various ground-based sites since 1985 (see http://mark4sun.jpl.nasa.gov/ground. html, last access: 5 September 2018), although these data are not used in this work due to their very different acquisition and analysis procedures. The main reasons for having so few FTIR HCHO data available are that (1) it is challenging to find robust retrieval settings for this species that has weak absorption signatures in the infrared, which are, in addition, surrounded by strong lines from interfering gases; (2) HCHO is not part of the NDACC FTIR target species (which are O 3 , HNO 3 , HF, HCl, CO, CH 4 , N 2 O, ClONO 2 , HCN, and C 2 H 6 , publicly available at http://www.ndsc.ncep.noaa.gov/ clickmap/, last access: 5 September 2018). In the above-cited studies, different retrieval settings are used, although the retrieved HCHO total columns can be very sensitive to some of them: e.g. a positive bias of 25 % or even 50 % is found at Reunion Island if the spectral micro-windows of Franco et al. (2015) or Jones et al. (2009) are used, respectively, instead of those from Vigouroux et al. (2009). Although these high biases are consistent with the uncertainty budgets, it is important to facilitate the interpretation of a satellite or model validation to harmonize the settings among the stations. Therefore, in the present work, we have set up common retrieval settings that can be used at any ground-based site, even under very humid conditions or low HCHO concentrations. These settings will be described in Sect. 2 together with a characterization of the retrieved HCHO products, i.e. their averaging kernels and uncertainty budget. The complete time series of HCHO total columns obtained at the 21 participating stations are shown in Sect. 3, as well as the diurnal cycles and a short assessment of the long-term trends. We then use the chemistry transport model IMAGES , which provides data for the 2003-2016 period, to show consistency in our harmonized FTIR data sets: comparisons between FTIR and IMAGES monthly mean time series and seasonal cycle at the 21 stations are presented in Sect. 4.
2 Ground-based FTIR HCHO data: description and characterization 2.1 FTIR HCHO monitoring Table 1 lists the ground-based FTIR stations included in this study, while Fig. 1 shows their geographical distribution. These stations take regular solar absorption measurements under clear-sky conditions, using either the highresolution spectrometers Bruker 120 M, 125 M, 120 HR, and/or 125 HR, which can achieve a spectral resolution of 0.0035 cm −1 or better, or the Bomem DA8, which can achieve a spectral resolution of 0.004 cm −1 . The only lower spectral resolution spectrometer (0.06 cm −1 ) used in this study is the Bruker Vertex at Mexico City. This instrument is not accepted by the NDACC FTIR standards at present; therefore Mexico City is the only site in this study that will not be part of NDACC. The formaldehyde spectral signatures used in groundbased infrared measurements lie in the 3.6 µm region and belong to the ν 1 and ν 5 bands. This implies that, for HCHO, a CaF 2 or KBr beamsplitter and a nitrogen-cooled InSb detector are used together with an optical filter which usually covers the 2400-3310 cm −1 region (called an NDSC-3 filter; see e.g. Senten et al., 2008). At St Petersburg a broader filter is used (1700-3400 cm −1 ). The spectral resolution can be reduced in order to increase the signal-to-noise ratio (SNR). In practice, the spectra used in the present study have a resolution between 0.0035 and 0.009 cm −1 , except for Mexico city (0.075 cm −1 ).
HBr or N 2 O cell measurements are regularly taken to verify the alignment of the instruments. The instrument line shape (ILS) can be obtained by analysing these cell measure- ments using the LINEFIT programme (Hase et al., 1999). This ILS can impact the shape of gas absorption lines, and its determination by LINEFIT can be used as an input parameter in the forward model of the retrieval codes (Sect. 2.2).

Harmonized retrieval strategy
We refer to Pougatchev et al. (1995) and/or Hase et al. (2004) for more details on the FTIR retrieval principles. Total columns of atmospheric gases, but also volume mixing ratio vertical profiles, are obtained from their pressureand temperature-dependent absorption lines. As seen in Table 1, two retrieval algorithms are used in the NDACC FTIR community: PROFITT9 (Hase et al., 2006), and SFIT2 (Pougatchev et al., 1995), which has been updated to SFIT4 09.4.4. It has been demonstrated in Hase et al. (2004) that the profiles and total column amounts retrieved from these two different algorithms under identical conditions are in excellent agreement.
We summarize the forward-model and retrieval parameters that have been harmonized in Table 2. The forward model uses pressure and temperature profiles from NCEP (National Centers for Environmental Prediction) for each site, except that the temporal resolution can vary depending on the retrieval team from daily means, from 6-hourly ones to even hourly interpolated ones.
The dominant source of systematic uncertainty being the spectroscopic parameters, it is crucial that all stations use the same spectroscopic database. We use the compilation from Geoffrey Toon (JPL), the atm16 line list, which is available at http://mark4sun.jpl.nasa.gov/toon/ linelist/linelist.html (last access: 5 September 2018). In this atm16 line list, the HCHO and N 2 O lines correspond to the HITRAN 2012 database (Rothman et al., 2013). This HITRAN 2012 database includes the latest improved HCHO parameters (broadening coefficients, Jacquemart et al., 2010), which complement the release in HITRAN 2008 (Rothman et al., 2009) of new HCHO line intensities from the same group . The spectroscopic parameters for the lines of water vapour and its isotopologues in To avoid any bias between the stations due to different spectroscopic parameters, it is also mandatory to harmonize the spectral micro-windows (MWs) containing the HCHO signatures. The challenge of the HCHO retrievals is that this species has very weak absorption signatures in the infrared (below 1 %), and it is therefore very important to minimize the impact of the interfering gases with more intense signatures, either by avoiding MWs with strong interfering lines when feasible or by including them only if they are very well fitted (e.g. no large residuals remain due to bad spectroscopic or incorrect ILS parameters). In past studies, while the micro-window spectral widths differ, some common HCHO signatures were used: the two more intense signatures at about 2778.5 and 2781.0 cm −1 were used in all previous studies (Notholt et al., 1997;Paton-Walsh et al., 2005;Viatte et al., 2014;Jones et al., 2009;Vigouroux et al., 2009), except in Franco et al. (2015), who discarded the  2781.0 cm −1 signature because of the bad residuals due to poorly fitted CH 4 lines (from HITRAN 2008, Rothman et al., 2009). In Vigouroux et al. (2009), in which HITRAN 2004(Rothman et al., 2005 was used, the MWs containing these two stronger signatures were quite narrow (2778.20-2778.59; 2780.80-2781.15 cm −1 ) in order to minimize residuals due to neighbouring CH 4 lines. With the empirically improved CH 4 spectroscopy in atm16, we can use larger windows (see Table 2 and Fig. 2), with the advantage of fixing the background and the interfering species more, leading to improved precision and accuracy in the HCHO total columns. We keep the two narrow MWs used in Vigouroux et al. (2009) andFranco et al. (2015) at about 2763.5 and 2765.8 cm −1 , which contain less absorption from interfering gases, but the gain in information in degrees of freedom for signal (DOFS; see Rodgers, 2000) is relatively small (0.1-0.2, compared to about 1.0 to 1.5 from the two main windows).
In Fig. 2 we give an example of a spectrum calculated from the retrieval using a spectrum recorded on the 12 February 2014 at Maïdo and corresponding to a retrieved HCHO total column of 2.48 × 10 15 molec cm −2 , a DOFS of 1.1, and a root mean square (rms) of 0.11, which compares well to the mean obtained for all measurements at Maïdo of 2.00 × 10 15 molec cm −2 , 1.2, and 0.12 for columns, DOFS and rms. The corresponding residuals (calculated − observed spectra) are shown in Fig. 3 little improvement seen in this MW is due to the better fitting of the other MWs, which allows better-calculated profiles for all gases. The CH 4 line in MW 3 is poorly fitted using the HITRAN 2012 line list, and the improvement in the atm16 is due to a change in several spectroscopic parameters (line position, line intensity, etc.). The two more intense CH 4 lines in MW 4 have also been improved by using the atm16 line list. However, to further improve the fits, one CH 4 line and one O 3 line were empirically de-weighted (see Table 2). The comparison of these two line lists shows the crucial need for good spectroscopic parameters in order to obtain precise amounts of atmospheric gases. As seen in Fig. 3 (right panel), the residuals are not perfect and there is still room for further improvement in the forward-model parameters. The atm line list created by Geoffrey Toon (JPL) is updated every 4 years, when HITRAN provides a new release, so that when the HI-TRAN line list is improved and provides either similar or better residuals than the atm line list, the empirical parameters of atm are changed by the preferred official database. In SFIT4 and PROFFIT retrieval codes, based on optimal estimation, a priori information (profile and regularization matrix) needs to be provided. In this work, the a priori HCHO profile, as well as all interfering species except water vapour and its isotopologues, were provided for each station from the v4 of the model WACCM (Garcia et al., 2007). A single profile for each species is used in the time series retrievals and corresponds to the mean of the model profiles calculated at each station from 1980 to 2020. For H 2 O and HDO, which have a high atmospheric variability, it is usually preferred (except at the stations Lauder, Mexico City, and Altzomoni) not to use a single a priori profile: for each individual spectrum, the water vapour a priori profiles are taken either from the 6-hourly vertical profiles provided by NCEP or from independent preliminary profile retrievals. The H 2 O absorption being very weak in the chosen MWs and the HDO profile being retrieved simultaneously with HCHO, the impact of using a single a priori profile at the three cited stations is assumed to be small. For the regularization matrix R, we followed Vigouroux et al. (2009) andSussmann et al. (2011) and used ad hoc Tikhonov (Tikhonov, 1963) L1 regularization as described, for example, in Steck (2002), for the reason that we do not have a realistic a priori covariance matrix S var from other measurements sources, especially with good vertical resolution. The regularization matrix R = αL T 1 L 1 is used in most cases for the determination of HCHO low vertical resolution profiles but also for profile retrievals of the interfering species when improvement is observed compared to the fit of a single scaling factor (which is applied to the a priori profiles). This is the case for HDO and CH 4 , for which profile retrievals are made, and at some stations for O 3 . For the stations Kiruna, Izaña, Zugspitze, and Paris, a scaling of HCHO a priori profiles is preferred to a Tikhonov regularization, but due to the low DOFS available for this species (see Sect. 2.3), this has little influence on the retrieved total columns (below 2 % when tested at Maïdo). For the other stations, the α values are site dependent, since they can depend, for example, on the HCHO amounts or the SNR of the spectra. Note that the SNR value may be the "real" one from the inherent noise in each spectrum but can also be chosen to be an "effective" SNR that is used as well as a regularization parameter. This effective SNR is smaller than the real one, since the residuals in a spectral fit do not only come from pure measurement noise but also from uncertainties in the forward-model parameters. The regularization choice (α and SNR if an effective one is used) is made at each station in order to obtain stable retrievals (no overfitting) with a sig-nificant decrease in the residuals (no underfitting), as in the well-known L-curve method (Hansen , 1992).
It is worth noting that another important forward-model parameter is the instrumental line shape (ILS) since it impacts the gases absorption line shapes. The treatment of ILS in the retrievals has not been harmonized yet among the stations because the stability and quality of the alignment is site dependent and/or the instrument's PIs have their own preferences. This is, however, another step toward full harmonization that should be done in the future within NDACC. At present, there are three options for considering the ILS, and we refer to Vigouroux et al. (2015) for more details. In the present work, the NIWA, NCAR and University of Bremen stations use a constant and ideal ILS (both modulation efficiency and phase error); i.e. the spectrometers are perfectly aligned. This is a valid approximation based on a LINEFIT ILS analysis of HBr cell spectra measurements (Sect. 2.1). The IMK-ASF, LERMA and UNAM stations use fixed ILS parameters that are previously retrieved using the cell measurements and the LINEFIT code (Hase et al., 1999). For the other stations, the effective apodization parameter is retrieved simultaneously with the target species profiles, while the phase error parameter is assumed to be ideal.

Characterization: averaging kernels and uncertainty budget
The vertical resolution and sensitivity of the retrieved HCHO products can be characterized by the averaging kernel matrix A (Rodgers, 2000): where K is the weighting function matrix that links the measurement vector y to the state vector x: y = Kx + , with representing the measurement error. In our retrievals, we assume S to be diagonal, with the diagonal elements being the inverse square of the SNR. R is the regularization matrix, which, in this work, has been chosen as the Tikhonov L1 matrix (see Sect. 2.2).
We give the trace of this averaging kernel matrix A for the elements corresponding to the HCHO profiles, called the DOFS, in Table 3 for each station. The DOFS range from 1.0 to 1.6, meaning that we can not provide more than one piece of information on the vertical profile. This is the reason that only total columns of HCHO are discussed in this paper and not vertical profiles. In Fig. 4 we show (upper panels) the averaging kernels (AKs, rows of A) for four different stations, with DOFS ranging from 1 (only scaling) to 1.4. Similar averaging kernels are obtained for the other stations with similar DOFS (not shown). We can observe that, in each case, the AKs peak at about the same altitude (8 km) with full width at half maximum of about 16-18 km, showing that we have limited vertical resolution, and that we are mainly sensitive to the whole troposphere, and to a lesser extent to the lowermost Table 3. Mean of the HCHO total columns (TC) in 10 14 molec cm −2 and degrees of freedom for signal (DOFS) obtained at each FTIR station. The stations with strictly 1 DOFS (Kiruna, Izaña, Zugspitze, and Paris) only make a scaling of the HCHO a priori profile; i.e. no change in the vertical shape of the a priori profile is allowed. We give, in 10 14 molec cm −2 , the mean of (1) the random uncertainties (Rand) that were calculated for each individual HCHO total column (excluding the smoothing part); (2) the smoothing random error (Smoo Rand); (3) the total random error (Total Rand = Rand 2 + Smoo Rand 2 ). We also provide the total random error in % for completeness. We give the mean of the systematic uncertainties in %: first without the smoothing part (Syst), then the smoothing systematic error (Smoo Syst), and the total systematic error (Total Syst = Syst 2 + Smoo Syst 2 ). If the Rodgers and Connor (2003) methodology is used in model-instrument comparisons, only the Rand and Syst uncertainties need to be taken into account (not the total errors). In addition, we provide the mean differences between two subsequent FTIR measurements taken within 30 min (Diff30) in both absolute (10 14 molec cm −2 ) and percent units relative to mean TC. The PROFFIT stations are indicated with ( * ). stratosphere. The total column averaging kernel (TotAK), associated with the FTIR-retrieved total columns, is plotted as well. The associated a priori profiles are also shown in Fig. 4 (lower panels) for completeness, together with the mean and standard deviation of the retrieved profiles. As expected by the low DOFS, the shape of the retrieved profiles is very similar to the shape of the a priori profiles. The uncertainty budget is calculated following the formalism of Rodgers (2000) and can be divided into three different sources: the measurement noise uncertainty (purely random), the forward-model parameter uncertainties (random and systematic), and the smoothing error expressing the uncertainty due to the limited vertical resolution of the retrieval (random and systematic). At each station, the random uncertainty (square root of sum of squares of the measurement noise error and of all the random forward-model errors) and the systematic uncertainty (square root sum of the squares of all systematic errors) are calculated for each single measurement. Except for a few cases (NCAR stations and Wollongong), for which a typical smoothing error is given, and St Petersburg, for which the mean value for 2013 is given, the smoothing uncertainty is also calculated for each individual measurement. In Table 3 we give the mean of the random and systematic uncertainties, the smoothing uncertainties (both random and systematic parts), and the total random or systematic uncertainties (square root sum of the squares of the random or systematic error and the smoothing random or systematic error), obtained using the FTIR complete time series at each station.
The random uncertainty given in Table 3 is dominated at all sites by the measurement noise with an error covariance matrix S n calculated as follows: where S is assumed to be diagonal, with the square of the inverse of the SNR as diagonal elements, and G y denotes the contribution matrix A = G y K. In this calculation of the measurement noise error, the SNR must be the real one from the noise in the spectra and not a regularization one as can be chosen in the retrieval process (as in Eq. 1; see also Sect. 2.2). For the HCHO spectra used in this study, this SNR can vary between 100 for the worst cases and 3000, with a mean of about 700-1000 for the Bruker 120/5 HR instruments and 500 for the Bomem DA8. The forward-model parameters error covariance matrices S f are calculated according to the following: where S b is the covariance matrix of b, the vector of forwardmodel parameters. For each individual forward-model parameter, the K b sensitivity matrix is mostly calculated by using analytic derivatives, while the covariance matrix S b is an estimate of the uncertainty in the model parameter itself. Effort has been made in this study to harmonize the uncertainty budget at all sites. This is done by calculating the errors from the same forward-model parameters (solar zenith angle, temperature, spectroscopic line parameters, baseline, etc.) across the network and by choosing the same S b matrix for relevant parameters (i.e. when they are not site or instrument dependent, e.g. for the spectroscopic line parameters). However, some differences remain between the SFIT4 and PROFFIT codes that result in small differences that still occur between the two groups of users, despite the use of harmonized parameters. For the SFIT4 users, the random uncertainty given in Table 3 is dominated by the measurement noise (Eq. 2). We see from Table 3 that the random error is between 1.0 and 3.6 × 10 14 molec cm −2 for the SFIT4 stations equipped with the high-resolution Bruker spectrometers 120/5 HR or M (the higher values coming from the 120/125 M instruments at Paramaribo and PortoVelho), while it can reach 5.1 × 10 14 molec cm −2 with the Bomem DA8 in Toronto. For the PROFFIT users, the random uncertainty is calculated to be a little bit larger (from 3.5 to 5.3 × 10 14 molec cm −2 ) for the sites with highresolution spectrometers, and 11.1 × 10 14 molec cm −2 with the low-resolution spectrometer Bruker Vertex 80 at Mexico City. The main difference between SFIT4 and PROFFIT is the additional error calculated at the PROFFIT stations due to the channelling of the spectra. However, in Table 3 we also give the mean differences between two subsequent FTIR measurements taken within 30 min (Diff30) as an upper limit of the total random uncertainty: this difference can be larger than the error budget if HCHO has faster variability than 30 min, but with enough statistics, the mean differences should not be lower than the total random errors. We see that this empirical upper estimation of total random uncertainty has a median value (2.8×10 14 molec cm −2 ) very close to the median total random uncertainty obtained by error propagation theory (2.9 × 10 14 molec cm −2 ), which gives confidence in the overall FTIR error estimation. At all the PROFFIT sites, except the highly polluted one (Mexico city), the total random uncertainty is larger than the Diff30, which could be an indicator that the uncertainty calculated in PROFFIT is slightly too conservative, probably due to this additional channelling error that would be estimated to be too large. For SFIT4 users, the Diff30 values are usually close, within 0.5 × 10 14 molec cm −2 of the calculated total random uncertainty, with the exceptions of Ny-Ålesund and Lauder, where the small calculated errors of 1.9 and 1.6 × 10 14 molec cm −2 might be a little bit optimistic, and with the exceptions of Toronto, Wollongong, and Paramaribo, where differences of 7 to 13 × 10 14 molec cm −2 are observed between the Diff30 values and the total random errors.
After the measurement noise error (and the channelling for PROFFIT users), the largest contributions to the random uncertainty due to the forward-model parameters come from the temperature, the interfering species, and the off-set baseline. For temperature, the S b matrix has been estimated using the differences between an ensemble of NCEP and sonde temperature profiles at Reunion Island, leading to 2 to 4 K in the troposphere and 3 to 6 K in the stratosphere. This matrix is currently used by all SFIT4 users, while for the PROF-ITT users, these chosen values are smaller (1 K in the troposphere, 2 K up to the middle-upper stratosphere, and 5 K for the highest levels). For each interfering species, the associated S b matrix is the covariance matrix obtained with the WACCM v4 climatology. At some stations, the ILS also has a high contribution to the random error budget.
If one uses the FTIR HCHO measurements to validate a model or a satellite with a fine vertical resolution, considering the random and systematic uncertainties (without smoothing) in Table 3 (4th and 7th columns) is sufficient to make correct comparisons, because the smoothing error due to the low vertical resolution of the FTIR measurements vanishes if one takes into account the FTIR averaging kernels and a priori profile in the comparisons (Rodgers and Connor, 2003). However, if one wants to have a better knowledge of the real precision of the FTIR data themselves, this smoothing uncertainty can be estimated for the random part using the smoothing error covariance S rand s (Rodgers, 2000): where S var should represent the natural variability of the target molecule. For HCHO, this S var variability matrix is unfortunately not well known due to the poor number of vertically resolved measurements. In Table 3, the smoothing errors have been calculated taking the covariance matrices obtained using the WACCM v4 profiles at each station as an approximation of the S var matrices. However, models usually underestimate the variability, and we expect that the smoothing errors provided here may be underestimated, especially in locations where HCHO is expected to have stronger vertical gradient variability than in the model. As an example, in the study by Vigouroux et al. (2009), the S var was taken from aircraft measurements PEM-Tropics-B, and led to a smoothing error estimation of 14 % at Saint-Denis, while the present estimation based on the WACCM model gives about only 2 % for this station. However, the S var matrix constructed from PEM-Tropics-B showed from 33 % to 70 % of HCHO variability which seems too high compared to what is observed at Reunion Island from the FTIR measurements (about 20 %). This illustrates that, ideally, the S var matrix should be reevaluated at the sites whenever better model data or correlative measurements become available. Since the FTIR data sets always include their associated averaging kernel matri-ces, this re-evaluation can be done a posteriori by future users using Eq. (4). The smoothing systematic uncertainty, reflecting the bias that would occur on the retrieved profile if the a priori x a is biased compared to the real expected profile < x >, is calculated following von Clarmann (2014): The x a − < x > is obviously not known (otherwise, < x > would be chosen as the correct a priori in the retrievals). Therefore, we have chosen to use x a − < x>= − 50 %, −20 %, −10 %, +10 %, +8 % and +5 % for the ground-4, 4-8, 8-13, 13-25, 25-40 and 40-120 km layers, respectively. The values have to vary with altitude to induce a different a priori profile shape: if 50 % is used at all altitudes, the a priori profile is then different from < x > by a simple scaling factor, and the systematic smoothing error is close to zero. Using the above values, we obtain smoothing systematic errors from 1 to 9 % (median value of 3.4 %), which is small compared to the other systematic error sources (Table 3). These values assume that the model WACCM profile shapes are not too far from reality, which should be the case: due to the known short lifetime of HCHO and its production at or near the surface, we expect that the mean profile peaks at the ground. This is, as for the random smoothing part, only an estimate of the smoothing systematic error. As discussed in von Clarmann (2014), one would even prefer to not give these smoothing errors at all. We prefer to give them to provide the reader with at least an idea of the impact of the smoothing in the precision and accuracy of our FTIR HCHO measurements. However, when making model or instrument comparisons, the appropriate use of the averaging kernel and a priori profile information, following Rodgers and Connor (2003), allows the user to implicitly take the smoothing uncertainty into account. This means that, for satellite or model comparison, if the methodology of Rodgers and Connor (2003) is used, there cannot be some different systematic biases at different stations due to different x a − < x >. The dominating systematic uncertainty sources are the spectroscopic parameters: the line intensities and the pressure broadening coefficients of the absorption lines present in our MWs. For the HCHO spectroscopic parameters, the line list in atm16 follows HITRAN 2012 (Rothman et al., 2013), which used the work of Jacquemart et al. (2010), and we use 10 % for the three parameters (line intensity, air-, and self-broadening coefficients). The larger error source is then the HCHO line intensity parameter and to a lesser extent the HCHO air-broadening coefficient. In addition, the uncertainties in HCHO columns due to the interfering species spectroscopic parameters are calculated. The dominant ones were found to be due to the pressure broadening coefficients of CH 4 , HDO, and N 2 O, with an order of magnitude of about 20 % of the error due to the HCHO line intensity.
The other systematic error sources due to forward-model parameters are lower or within a few percent (ILS, tempera-ture), except for the PROFFIT channelling source (from 7 % to 17 %), which also has a systematic component. We see from Table 3 that the total systematic uncertainty is between 12 % and 15 % at the SFIT4 stations. For the PROFFIT stations, it lies between 12 % and 27 %.
3 Complete FTIR individual HCHO column data sets 3.1 A network sampling very low to highly polluted levels of HCHO In Fig. 5 we show the individual HCHO total columns obtained at each station for a single year only (2016, except for Saint-Denis: 2011), in order to better see the day-to-day variability. The complete time series at each station are shown in the Supplement (Fig. S1). The error bars in Figs. 5 and S1 are the total random uncertainties; i.e. we do not include the systematic errors in order to better visualize the precision of the FTIR measurements compared to the observed day-to-day variability. The FTIR network samples a wide range of concentrations. Indeed, we can first distinguish the "clean" sites (shown with the same vertical axis with maximum 15 × 10 15 molec cm −2 ) such as the Arctic stations (Eureka, Ny-Ålesund, Thule, Kiruna, Sodankyla), the marine stations (Izaña, Mauna Loa, Maïdo, Saint-Denis, and Lauder, the former three also being high-altitude stations), and the high-mountain sites (Zugspitze and Altzomoni). These clean sites can have HCHO concentrations at the limit of detection (few 10 13 molec cm −2 ) with mean values of 10-25 × 10 14 molec cm −2 (Table 3), except for Saint-Denis, which reaches a mean of 39 × 10 14 molec cm −2 . Second, we show the intermediate concentration sites (with the same vertical axis with maximum 30 × 10 15 molec cm −2 ) such as the tropical coastal site Paramaribo and the midlatitude polluted sites in or close to cities and/or vegetation (Peterhof close to St Petersburg, Bremen, Paris, Boulder). These intermediate sites have mean HCHO total columns of 58-73 × 10 14 molec cm −2 . The sites with the highest levels of HCHO (vertical axis with maximum 70 × 10 15 molec cm −2 ) are Toronto and Mexico City, where large anthropogenic emissions are indeed expected (means of 95 and 221 × 10 14 molec cm −2 ), and places which are also affected by large biogenic emissions such as Wollongong (mean of 79×10 14 molec cm −2 ) and the new site of Porto Velho, located at the edge of the Amazon rainforest (mean of 190 × 10 14 molec cm −2 ).

HCHO diurnal cycle
As explained in the introduction, to reconcile the different results obtained using satellites observing at different times (e.g. SCIAMACHY and GOME-2 measuring in the morning and OMI in the afternoon), it is crucial to have groundbased observations of the HCHO diurnal cycles (Barkley et al., 2013;De Smedt et al., 2015;Stavrakou et al., 2015). The diurnal cycle is also important for model validation, since emissions, chemistry and other processes depend on the time of the day. Our FTIR data set is now able to provide the diurnal cycles at 21 different locations. To separate the effect of the strong seasonal cycle (shown in the next section), we give the diurnal cycle in four different seasons in Fig. 6 for a selection of the sites, while the other ones are provided in the Supplement (Fig. S2). As seen from Figs. 6 and S2, the diurnal cycles are often site and season dependent. While there is no clear diurnal cycle at the Arctic sites and at some of the midlatitude cities during winter (St Petersburg, Bremen, Toronto), we usually see an increase from the morning, which is often more pronounced in June-July-August (and December-January-February in the Southern Hemisphere) at most of the stations (in the cities but also at marine sites). A maximum is often found around midday (St Petersburg, Mexico City, Izaña, Saint-Denis, and Wollongong) or much later in the afternoon (16:00-18:00 local time, LT), as in Bremen, Paris, Toronto, Lauder, and Altzomoni. Only in a few cases is a minimum found at midday (St Petersburg in SON, Zugspitze in MAM-SON, Sodankyla in MAM). The marine sites at high altitudes (free of local pollution) have a small minimum at about 08:00 LT (Izaña, Maïdo). This diversity in the FTIR diurnal cycles is also observed with MAX-DOAS at other sites : a very weak diurnal cycle at OHP (southern France) in winter and spring; a minimum around midday at Beijing and Xianghe in spring and autumn, and a constant increase in summer (as observed with FTIR for Bremen, Toronto, and Paris). The diurnal cycles observed at the Jungfraujoch station from FTIR measurements (Franco et al., 2015) show, for all months of the year, a midday maximum, which is very different from the ones observed at our closest station Zugspitze. The IMAGES model shows diurnal cycles in phase agreement with our FTIR measurements at Zugspitze, except for the summer, for which the model diurnal cycle is very weak (not shown). However, two sites very close together can indeed observe different diurnal cycles (as seen for Saint-Denis and Maïdo). More investigation is needed to understand the different diurnal cycles at these two mountain sites.
We see from Fig. 6 that the FTIR measurements at Porto Velho do not show a clear pattern, in particular if one is interested in the 09:30 and 13:30 LT differences between the overpasses of two different satellites . From the 1-year data available at present at this new site, it seems that the diurnal cycle cannot help reconciling the differences observed over Rondônia between GOME-2 and OMI . In contrast, the diurnal cycles observed over cities confirm that the observation of a positive bias between OMI (13:30 LT) and GOME-2 (09:30 LT) over urban areas can be indeed explained, at least partly, by the diurnal cycle.

Long-term HCHO trends
The length of the HCHO time series allows trends to be derived for some stations. We have calculated the trends at each station using the monthly mean time series Y m (t) with a simple model, including a fit of the seasonal cycles: Y m (t) = A 0 + A 1 · cos(2π t/12) + A 2 · sin(2π t/12) + A 3 · cos(4π t/12) + A 4 · sin(4π t/12) with A 5 as the annual trend. It turned out that, due to the very high variability of HCHO, the uncertainties in the trends are often too large  (Fig. S1). The clean, intermediate, and high-level HCHO sites are shown using blue, orange, and red colours. The error bars are the total random uncertainty. When the altitude of the station is higher than 1.5 km, it is explicitly shown.
to obtain significant values. A more sophisticated multiregression model might be able to reduce the uncertainties, but this is beyond the scope of this paper. However, for a few stations, significant trends are found. They are mainly negative: at St Petersburg (−3.9 ± 3.3 % decade −1 ), Mexico City (−9.6±5.1 % decade −1 ), Wollongong (−18.8± 10.8 % decade −1 ), and close to significance at Zugspitze (−7.7 ± 7.7 % decade −1 ). Only the marine sites Izaña and Saint-Denis show positive significant trends (+17.3 ± 15.2 and +15.8 ± 5.2 % decade −1 ). Note that at Maïdo, the trend is not significant. A careful combination of the measurements at both Reunion Island sites (Saint-Denis and Maïdo) could be carried out in the future. For the longest time series, we observe a very good agreement with previous studies in general. The negative trends observed over the European stations St Petersburg and Zugspitze are in agreement with the negative trends observed by OMI (2004OMI ( -2014 over St Petersburg and Germany . At the Jungfraujoch station, a negative trend (−6.1 ± 2.6 % decade −1 ) was also observed for the 1996-2015 period (Franco et al., 2016). Note that the calculation of the uncertainties in the trends in our study takes into account the autocorrelation in the residuals, following Santer et al. (2000), which increases the uncertainties. For Zugspitze, the uncertainty without correcting for this autocorrelation, as in Franco et al. (2016, would be 4.9 % (instead of 7.7 %), showing then a more significant trend, in agreement with these studies. The non-significant trends observed at the northern European station (Kiruna), and the midlatitude American stations (Toronto, Boulder) are in agreement with . In the Southern Hemisphere, the negative trend  (Fig. S2). The error bars are the standard errors of the mean: 2 × σ/ √ n, with σ the standard deviation and n the number of measurements at a given time. If n < 8, the hourly value is not shown. The time is the local standard time meridian (LSTM). observed at Wollongong was also found in De , as well as a positive trend at Madagascar, which is near Reunion Island, in agreement with the high positive trend observed at Saint-Denis. At Lauder, OMI also shows a non-significant trend .

HCHO FTIR and IMAGES model comparisons
In this study, we do not aim to validate the model input parameters or attribute different emission sources at the different stations. We use the model to assess the internal consistency of the network using harmonized retrieval settings. This means that we expect that, for the same latitude regions and/or type of sites (polluted; clean), the comparisons with the model will give consistent biases. In the Supplement we provide a global map of IMAGES climatological daytime HCHO columns (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015) together with the mean columns observed at the FTIR stations (Fig. S3). This map illustrates the very different levels covered by the FTIR stations and the overall good agreement with the calculated levels of IMAGES. However, Fig. S3 can only provide a qualitative comparison due to the different measurement periods covered. We give quantitative comparisons in the present section.

IMAGES model description
The IMAGESv2 global model calculates the distribution of 170 chemical compounds gases with a time step of 6 h at 2 • × 2.5 • resolution, with 40 hybrid (σ pressure) levels in the verticals between the surface and the lower stratosphere (44 hPa level). The model calculates daily averaged concentrations of chemical compounds. The effect of diurnal variations is accounted for through correction factors on the photolysis and kinetic rates obtained from a full diurnal cycle simulation using a time step of 20 min. The same model simulation also stores the diurnal shapes of formaldehyde columns required for the comparison with FTIR data on files. Meteorological fields (winds, temperature, humidity, 3-dimensional cloud cover, solid and liquid cloud water content, large-scale and convective precipitation rates, visible downward radiation, convective updraught fluxes, boundary layer diffusivities, snow depth, sea ice fraction, surface roughness lengths, surface sensible heat flux, friction velocity, etc.) are obtained from ERA-Interim analyses of the European Centre for Medium-range Weather Forecasts (ECMWF).
Anthropogenic emissions of NO x , CO, SO 2 , and NMVOC are provided by the Hemispheric Transport of Air Pollution data set version 2 (HTAPv2) (Janssens-Maenhout et al., 2015), with the NMVOC speciation provided by the emission inventory of the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP) (Lamarque et al., 2010). Emissions from open vegetation fires are taken from the last version of the Global Fire Emissions Database, GFED4s, which includes the contribution of small fires (Randerson et al., 2012;Giglio et al., 2013). The GFED data are available at daily frequency at 0.25 • × 0.25 • from 1997 to the present (http://www.globalfiredata.org, last access: 5 September 2018). The vertical distribution of these emissions follows Sofiev et al. (2013). Isoprene and monoterpene emissions are obtained from the MEGAN-MOHYCAN model (Müller et al., 2008;Stavrakou et al., 2014;Guenther et al., 2012) for all years of the study period at a resolution of 0.5 • × 0.5 • (http://tropo.aeronomie.be/models/isoprene.htm, last access: 5 September 2018). Methanol biogenic emissions are obtained from the inverse modelling study of Stavrakou et al. (2011). Besides the dependence on temperature, visible radiation, leaf area, and leaf age, the model accounts for the inhibition of isoprene emissions under drought conditions through a dimensionless soil moisture activity factor (γ SM ). However, the parameterization of γ SM is very uncertain, as discussed in Bauwens et al. (2016), and we assume γ SM = 1 in this study. The average global annual emissions are 419 Tg yr −1 isoprene, 100 Tg yr −1 methanol, and 103 Tg yr −1 monoterpenes.
The chemical degradation mechanism of pyrogenic NMVOCs is described in Bauwens et al. (2016). The oxidation mechanism for isoprene is also based on Bauwens et al. (2016), with a few updates. It accounts for the revised kinetics of isoprene peroxy radicals according to the Leuven Isoprene Mechanism version 1 (LIM1) (Peeters et al., 2014) and is further modified to account for laboratory findings (Teng et al., 2017;Bates et al., 2016). The formaldehyde yield in isoprene oxidation by OH is close to 2.4 mol mol −1 in high NO x (1 ppbv NO 2 , after 2 months of simulation) and 1.9 mol mol −1 at low NO x (0.1 ppbv NO 2 ). The chemical mechanism for monoterpenes is simplified, with product yields of formaldehyde, acetone, methylglyoxal and glyoxal based on box model calculations using the αand β-pinene oxidation mechanism from the Master Chemical Mechanism (MCM) (Saunders et al., 2003). The overall formaldehyde yield is 4.2 HCHO per monoterpene oxidized, reducing to 2.3 after subtracting the contributions of acetone and methylglyoxal oxidation. This yield is further reduced by 45 % to account for wet deposition of intermediate and secondary organic aerosol formation. This fraction of 45 % is higher but of the same order as the estimated overall impact of deposition on the average HCHO yield from isoprene oxidation (28 %), based on IMAGES model calculations. The higher fraction for monoterpenes is intended to account for the impact of the more complex chemistry and larger number of oxygenated intermediates involved in their oxidation compared to isoprene. The large deposited fraction is uncertain but appears justified by the larger number and lower volatility of intermediates involved in formaldehyde formation from monoterpene oxidation.
The calculation of the model columns at the FTIR stations accounts for its location in the horizontal (nearest model pixel) for the FTIR a priori profiles and averaging kernels as prescribed in Rodgers and Connor (2003), as well as for the station altitude above sea level. The model column is calculated from the calculated formaldehyde profile, between the altitude of the station and the uppermost model level (approximately 20 km), and from the a priori FTIR profile, above that level. When the model surface lies higher than the station, the model column is increased by a partial column assuming a constant mixing ratio between the two altitudes, taken equal to the value at the lowermost model level. The monthly averaged formaldehyde columns are calculated by accounting for the temporal sampling of the observations at each site and month. Also, the local time of each observation is taken into account by rescaling the daily averaged concentration using the formaldehyde diurnal shape factors calculated by the model with a time step of 20 min.

HCHO monthly means and seasonal cycle comparisons
We compare the monthly means of FTIR HCHO total columns at each station with the IMAGES columns calculated for the 2003-2016 period. The time series of both products are shown in Fig. 7. Since the random uncertainty of the FTIR monthly means is divided by the square root of the number of measurements within each month, the dominant contribution to the FTIR error bars in Fig. 7 is the systematic uncertainty (estimated at 11-26 %. The smoothing uncertainty is not included in model comparisons using Rodgers and Connor, 2003). Except for very few cases (Mexico City and Paramaribo), the model is in overall good agreement with the FTIR measurements in terms of absolute levels (Fig. 7) and seasonal cycle (Fig. 8).
For each station the correlation, the bias and the standard deviation (SD) of the statistical comparisons between the monthly means, mean(IMAGES(smoothed) − FTIR)/mean(FTIR), are summarized in Table 4. The median correlation between FTIR and IMAGES for the 21 stations is very high (0.81), with weaker values at the Mexican stations (0.4/0.5) and at Mauna Loa (0.10). The median standard deviation for all comparisons is 25 % (ranging from 11 % to 41 %). This agreement is good considering the FTIR variability (i.e. the SD) of HCHO monthly means (median of 35 %). The standard deviation of the comparisons can be explained partly by the lower variability of the model monthly means (31 %) compared to FTIR, as seen in Fig. 7. In addi- tion, the variability of the model data within a month is also much smaller (median of about 11 %; this SD within a month is shown as magenta error bars in Fig. 7) than the FTIR one (median of about 28 %). The median of IMAGES and FTIR differences is small (−15 %) and within the FTIR systematic uncertainty estimated at 11 %-26 %. However, the biases range from −64 % to +51 %, which requires an investigation of their possible causes. The main source of systematic uncertainty is the spectroscopic parameters, which have been harmonized in this work, with each station using the same line parameters database, and the same spectral MWs. Therefore, it is expected that all FTIR stations should provide consistent HCHO total columns within 5 %-17 % (systematic errors due to other sources than spectroscopic ones). To check this, we divide the FTIR stations according to their concentrations levels and latitudes, and use the model for comparisons.

Clean Arctic sites
We distinguish two groups of Arctic sites, Eureka, Ny-Ålesund and Thule, which are very remote (77-80 • N), and two European sites, Kiruna and Sodankyla (67-68 • N). As seen in Table 4 Note that the Arctic sites do not record measurements during polar nights, so the winter months basically correspond to February (Fig. 8).

High-mountain sites
The mountain sites are more difficult to model, especially when they are close to cities. They are often very clean sites, but the model cannot reproduce this at the current resolution (2 • × 2.5 • ) when they are surrounded by emission sources in the same pixel. This seems to be the case at Altzomoni, which lies in the same model pixel as Mexico City, lead-   (Franco et al., 2015). It is therefore not possible at present to compare the biases obtained at these two close stations. At the mountain site of Izaña, located in a clean marine area, the model and FTIR are in overall good agreement (−3 %), with a negative bias in summer (−19 %) and a pos-itive one in winter (+22 %), as a result of the weak seasonal amplitude in the model. A moderate positive model bias is calculated at Mauna Loa (+13 %), which is more pronounced in winter (+24 %), and a good agreement is seen between the model and FTIR mean seasonal cycle (Fig. 8). The observed variability (34 %) is, however, important at this site and similar to the clean Arctic sites (Fig. 7), with values ranging from 0.5 to 2.5 × 10 15 molec cm −2 . This is not reproduced by the model values lying within 1-1.5 × 10 15 molec cm −2 . The causes of the pronounced observed variability are unclear at present. Table 4. Correlation (Corr), bias ± standard deviation (SD stat ) of the statistical comparisons between the monthly means, mean(IMAGES(smoothed)−FTIR)/mean(FTIR). Also given are the mean of the standard deviations in the IMAGES and the FTIR monthly means, i.e. the variability within a month (SD m ), and the standard deviation of the whole FTIR and IMAGES monthly mean time series (SD all ). All numbers, except the correlations, are given in %.

Station
Corr bias ± SD stat bias ± SD stat bias ± SD stat SD m IMAGES / FTIR SD all IMAGES / FTIR

Central and South American sites
The model falls short in reproducing the enhanced HCHO levels observed at Mexico City (ca.2 × 10 16 molec cm −2 ), mainly due to the coarse-model resolution (2 • ×2.5 • ), as suggested by the strong negative bias (−64 %), which is almost constant across the year.
Comparison at two sites in South America, the coastal site of Paramaribo and the Porto Velho site at the edge of the Amazon rainforest, indicates a consistent model overestimation (+51 %/+41 %). At Porto Velho, this overestimation is more significant during the dry season (August-September, Fig. 8), which corresponds to the maximum of fire intensity in Amazonia. An overestimation of biogenic (isoprene) and biomass-burning emissions in Amazonia was already found in IMAGES in the study of Bauwens et al. (2016).

Southern Hemisphere 21-45 • S sites
The two marine sites at Reunion Island (Saint-Denis at sea level, and Maïdo at 2.2 km altitude) show a small model bias (−7 %) and standard deviation, especially at Maïdo (11 %). At these sites, HCHO shows the lowest variability in the monthly means (18-20 %), and the model reproduces the seasonal cycle quite well. As shown in Fig. 8, the largest seasonal bias is not found in austral summer (DJF) as seen in the Northern Hemisphere sites, but during September-November months, which correspond to the maximum of the biomass-burning period in southern Africa and Madagascar, close to Reunion Island. The biomass-burning source at this location might be underestimated, whereas it was overestimated in South America.
The Wollongong site shows the same behaviour as most of the Northern Hemisphere sites: an overall underestimation of the model (−26 %), which is larger in austral summer (−29 %). A first look at the Lauder comparison gives a similar annual bias (−25 %), which remains constant over the year, as seen in Table 4 and Fig. 8. However, Fig. 7 shows that, during the austral winters (JJA) 2012 to 2015, the FTIR time series presents unusually high columns. By limiting the comparison to the first years of the period, a better agreement with the model in winter is obtained at Lauder, as for many other sites.
Since the time series at Saint-Denis, Wollongong and Lauder have been published in the past using different retrieval strategies (Vigouroux et al., 2009;Jones et al., 2009;Zeng et al., 2015), here we report the bias observed at these stations between the previous and present data sets. The bias is only 1.4 % at Saint-Denis, the present HCHO columns being smaller than the previous data set in Vigouroux et al. (2009), in which the a priori profile and the spectroscopy were different (mostly for interfering species, the HCHO spectroscopic intensity parameters being from the same work of Perrin et al., 2009), and the MWs were smaller than in the present work. Therefore, the comparisons with MAX-DOAS shown in Vigouroux et al. (2009) would still provide good agreement between the two techniques. Concerning Lauder and Wollongong, where the previous retrieval strategy was from Jones et al. (2009), the present HCHO columns are 49 % smaller than the previous data sets. Therefore, the new data set is in much closer agreement with the simulation of four different models that were all found 50 % lower than the old Lauder and Wollongong data sets (Zeng et al., 2015). From sensitivity tests, this high bias between the two strategies is very likely mostly due to the 2869.65-2870.0 cm −1 window used in Jones et al. (2009).

Conclusions
Only five NDACC FTIR sites have delivered HCHO time series until now (Paton-Walsh et al., 2005;Jones et al., 2009;Vigouroux et al., 2009;Viatte et al., 2014;Franco et al., 2015), using different retrieval settings. The small number of stations and the differences in bias associated with the different retrieval strategies made it difficult to use the FTIR network as a coherent tool for satellite or model validation. In this study, we have designed a harmonized HCHO retrieval strategy to derive total columns at 21 stations, at locations characterized by very different concentrations, from very clean Arctic sites where HCHO is at the limit of detection (a few 10 13 molec cm −2 ) to highly polluted sites such Mexico City or Porto Velho, near the Amazon rainforest, where columns up to 7×10 16 molec cm −2 have been observed. This network includes well-established NDACC stations, as well as several new sites (Sodankyla, Boulder, Paris, Porto Velho) that aim to be affiliated with NDACC. The FTIR network is also growing, with new sites such as Hefei in China, which will again expand its spatial coverage.
We have presented the retrieval settings that have been optimized for this challenging species, and the FTIR HCHO products have been characterized by their averaging kernels and their uncertainty budget. The systematic uncertainty of an individual HCHO total column measurement lies between 12 % and 27 %, with some differences remaining between the SFIT4 code users (12 %-15 %) and the PROFFIT users (12 %-27 %), which needs to be investigated in the future within the NDACC InfraRed Working Group. The random uncertainty lies between 1 and 11 × 10 14 molec cm −2 , with a median value of 2.9 × 10 14 molec cm −2 . The high maximum value is due to the lower quality of the Bruker Vertex compared to the high-resolution ones (Bruker 120/5 M or 120/5 HR).
In addition to the well-defined seasonal cycles, the diurnal cycles were presented at each site. These observations are crucial for interpreting the differences observed between satellites measuring at different local times. For example, the diurnal cycle at Porto Velho which shows insignificant variations suggests that the negative bias observed over Rondônia between OMI (13:30 LT) and GOME-2 (09:30 LT)  is unlikely due to the diurnal cycle. In contrast, the FTIR diurnal cycles in the cities confirm that the positive bias between OMI and GOME-2 over urban areas is likely due, at least partly, to the diurnal cycle.
The monthly mean time series as well as the seasonal cycles have been compared to the IMAGES model. We did not aim to evaluate the model but show that the FTIR network provides coherent absolute values and seasonal cycles. We observed an overall good agreement with IM-AGES, which usually (but not always) underestimated the HCHO total columns (median bias ± standard deviation of −15 % ± 25 %), with a more pronounced bias during summer (−19 % ± 19 %). The similar biases obtained at stations under similar conditions (clean Arctic sites, urban sites, marine sites) strengthen our confidence in the harmonization of the HCHO products within the network. When the model showed different behaviours for some of the stations, we could explain it by either the oversized model pixel (2.0 • × 2.5 • ), especially for high-altitude sites such as Zugspitze, Altzomoni, and Mexico City, or an overestimation of the biogenic and biomass-burning sources in South America (Paramaribo, Porto Velho), which was already pointed out in Bauwens et al. (2016). However, for a few sites, the behaviour of the model remained unexplained (positive biases at Kiruna and Sodankyla, the too-low model variability at Mauna Loa).
These HCHO time series, harmonized and well characterized, provide an important data set for past and present satellites, and model validation. They are continuously extended by new measurements and will be used in the coming years for the validation of new satellites, such as Sentinel 5P and Sentinel 4. Data availability. The FTIR data sets can be provided in the public NDACC repository (ftp://ftp.cpc.ncep.noaa.gov/ndacc/station/, last access: 5 September 2018) depending on each PI decision. Please pay attention to the NDACC data policy. The whole data set used in this publication can be provided upon request to Corinne Vigouroux (corinne.vigouroux@aeronomie.be) and data per station can be requested from the individual PIs.