A tropopause-based a priori for IASI-SOFRID Ozone retrievals: improvements and validation

Abstract. The Metop/IASI instruments have provided data for operational meteorology and have documented atmospheric composition since 2007. IASI ozone (O3) data have been used extensively to characterize the seasonal and interannual variabilities and the evolution of tropospheric O3 at the global scale. SOFRID (SOftware for a Fast Retrieval of IASI Data) is a fast retrieval algorithm that provides IASI O3 profiles for the whole IASI period. Until now, SOFRID O3 retrievals (v1.5 and v1.6) were performed with a single a priori profile, which resulted in large biases and probably too low a variability. For the first time, we have implemented in SOFRID-O3 v3.5 a dynamical a priori profile for spaceborne O3 retrievals that takes the pixel location, time and tropopause height into account. In the present study, we validate SOFRID-O3 v1.6 and v3.5 against ECC ozonesonde profiles from the global WOUDC database for the 2008-2017 period. Our validation is based on a thorough statistical analysis using Taylor diagrams. Furthermore, we compare our retrievals with ozonesonde profiles both raw and smoothed by the IASI averaging kernels. This methodology is essential to evaluate the inherent usefulness of the retrievals for assessing O3 variability and trends. The use of a dynamical a priori largely improves the retrievals in two main respects: (i) it corrects high biases in regions of low tropospheric O3, such as the southern hemisphere; (ii) it increases the retrieved O3 variability, leading to better agreement with ozonesonde data. For UTLS and stratospheric O3, the improvements are smaller and the biases are very similar for both versions. The SOFRID Tropospheric Ozone Columns (TOC) display no significant drifts (< 2.5%) for the northern hemisphere and significant negative ones (9.5% for v1.6 and 4.3% for v3.5) for the southern hemisphere.
We have compared our validation results with those of the FORLI retrieval software from the literature, for smoothed ozonesonde data only. This comparison highlights three main differences: (i) FORLI retrievals contain more theoretical information about tropospheric O3 than SOFRID; (ii) RMSDs are smaller and correlation coefficients are higher for SOFRID than for FORLI; (iii) in the northern hemisphere, no significant temporal drift is detected in SOFRID, unlike FORLI (∼8%).

This technique was previously used for O3 total column retrievals from FTIR spectra at the Jungfraujoch station (De Maziere et al., 1999). It was shown that the retrieved O3 columns were largely improved when the tropopause was taken into account in the choice of the a priori. In SOFRID-O3 v3.5, we compute the tropopause height from the interpolated ECMWF temperature profiles using the WMO lapse-rate definition. The a priori profile is then selected from the TpO3 climatology according to month, latitude and tropopause height.
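The WMO lapse-rate criterion used to locate the tropopause can be sketched as follows. This is a minimal illustration, not the SOFRID code: it assumes the temperature profile has already been interpolated to the pixel and expressed on strictly increasing altitudes in km, and the `z_min_km` cut-off (used to skip surface inversions) is our own practical assumption rather than part of the WMO definition.

```python
def wmo_tropopause_km(z_km, t_k, max_lapse=2.0, depth_km=2.0, z_min_km=5.0):
    """Return the WMO lapse-rate tropopause altitude (km), or None if not found.

    WMO (1957): lowest level at which the lapse rate -dT/dz falls to
    max_lapse (2 K/km) or less, provided the average lapse rate between
    this level and every higher level within depth_km (2 km) does not
    exceed max_lapse either.

    z_km: strictly increasing altitudes (km); t_k: temperatures (K).
    """
    n = len(z_km)
    for i in range(n - 1):
        if z_km[i] < z_min_km:
            continue  # assumption: ignore low-level (e.g. surface) inversions
        lapse = -(t_k[i + 1] - t_k[i]) / (z_km[i + 1] - z_km[i])
        if lapse > max_lapse:
            continue
        # Check the average lapse rate over the 2 km layer above the candidate.
        ok = True
        for j in range(i + 1, n):
            dz = z_km[j] - z_km[i]
            if dz > depth_km:
                break
            if (t_k[i] - t_k[j]) / dz > max_lapse:
                ok = False
                break
        if ok:
            return z_km[i]
    return None


if __name__ == "__main__":
    # Idealised profile: 6.5 K/km up to 11 km, isothermal above.
    z = [float(k) for k in range(21)]
    t = [288.0 - 6.5 * min(zk, 11.0) for zk in z]
    print(wmo_tropopause_km(z, t))  # 11.0
```

For a real pixel, the ECMWF profile is given on pressure levels, so converting it to altitude first (or testing the lapse rate in log-pressure coordinates) is a further step this sketch leaves out.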

Information content and retrieval error
A remote sensing instrument is not equally sensitive to the different atmospheric layers. Its vertical sensitivity depends on its instrumental characteristics and on local parameters. In the case of a thermal infrared nadir sounder such as IASI, surface parameters such as surface emissivity, surface temperature and the thermal contrast between the surface and the first atmospheric layer are key parameters determining the vertical sensitivity, especially in the lower troposphere (Barret et al., 2005; Boynard et al., 2016). The vertical sensitivity of a remote sensing instrument is characterised by the so-called Averaging Kernel (AK) matrix.
For each retrieval layer, the retrieved quantity is the result of the convolution of the whole real profile with the corresponding averaging kernel (row of the AK matrix), plus a contribution from the a priori profile (x_a) and a noise (ε) contribution (see Eq. 1):

$$\hat{x} = \mathbf{A}\,x + (\mathbf{I} - \mathbf{A})\,x_a + \epsilon \qquad (1)$$

In an ideal case, the AK matrix (A) would be the identity matrix (I), and the real (x) and retrieved (x̂) profiles would be identical apart from the noise (ε) contribution. In a real case, the AKs are bell-shaped functions that peak at an altitude which may differ from the nominal altitude and whose width gives an indication of the vertical resolution of the retrieval.

The Degrees of Freedom for Signal (DFS) of a retrieval, which describe the number of independent pieces of information provided by the measurement, are given by the trace of the AK matrix (Rodgers, 2000). We have divided the atmosphere into five layers, which are described in Table 1. The Troposphere 2 layer has been selected for comparison with Boynard et al. (2016, 2018), who did not compute a tropopause-based TOC for their validation (see section 5). The DFS corresponding to these different layers is displayed in Figure 1 for v1.6 and v3.5, averaged over the validation dataset. The total DFS ranges from 2.4 to 3.3 for v3.5 and is about 0.2 lower for v1.6. The DFS for the troposphere (WMO lapse rate), UTLS and stratosphere are almost identical for both versions. The tropospheric DFS is lowest (0.3-0.5) at high latitudes, where surface temperature, thermal contrast and tropopause height are lowest, and highest in the tropics (about 1.5), where surface temperature and tropopause height are highest. At mid-latitudes, the tropospheric DFS is about 0.6. Therefore, except in the tropics, SOFRID retrievals provide less than one independent piece of information in the troposphere. In the UTLS (resp. stratosphere), the DFS ranges from 0.7 to 1 (resp. from 0.9 to 1.5), which means that SOFRID provides around one independent piece of information in each of these layers.
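Since the DFS is the trace of the AK matrix, a partial DFS for one layer can be obtained by summing only the diagonal elements of the levels inside that layer. The sketch below assumes a square AK matrix on a pressure grid; the layer bounds and numbers in the example are hypothetical and do not reproduce Table 1.

```python
def total_dfs(ak):
    """Degrees of freedom for signal: trace of the averaging kernel matrix."""
    return sum(ak[i][i] for i in range(len(ak)))


def layer_dfs(ak, p_hpa, layers):
    """Partial DFS per layer.

    ak: square AK matrix (list of rows) on the retrieval levels.
    p_hpa: pressure of each retrieval level (hPa).
    layers: {name: (p_top, p_bottom)}; a level belongs to a layer
            when p_top < p <= p_bottom.
    """
    return {
        name: sum(ak[i][i] for i, p in enumerate(p_hpa) if p_top < p <= p_bot)
        for name, (p_top, p_bot) in layers.items()
    }


if __name__ == "__main__":
    # Hypothetical 5-level diagonal AK, for illustration only.
    diag = [0.2, 0.3, 0.4, 0.5, 0.6]
    ak = [[diag[i] if i == j else 0.0 for j in range(5)] for i in range(5)]
    p = [1000.0, 700.0, 400.0, 200.0, 50.0]
    layers = {"troposphere": (300.0, 1013.0), "above 300 hPa": (0.0, 300.0)}
    print(total_dfs(ak), layer_dfs(ak, p, layers))
```

When the layers tile the whole retrieval grid, the partial values add up to the total DFS, which is how a per-layer breakdown such as the one in Fig. 1 stays consistent with the total.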
The retrieval error is the sum of the measurement and smoothing errors (Rodgers, 2000). Uncertainties in auxiliary parameters (temperature and humidity profiles, surface properties, etc.) are also responsible for errors. Coheur et al. (2005) and Barret et al. (2005) have shown that, in the case of O3 and CO retrievals from thermal infrared satellite sensors, the dominant source of error is the smoothing error. The retrieval errors for SOFRID-O3 v1.6 and v3.5 are displayed in Fig. 1. V1.6 displays slightly larger errors than v3.5 but the same behaviour. For the total and stratospheric columns, the errors decrease from high latitudes (9-12 DU) to the tropics (6-8 DU). UTLS errors behave similarly, with lower values (4 to 6 DU). For the TOC, errors are larger in the tropics (5 DU) than at middle and high latitudes (4 DU).
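Column errors such as those in Fig. 1 follow from a profile error covariance through a column operator. The sketch below is a generic Rodgers (2000) style illustration under our own assumptions, not the SOFRID error code: profiles are in ppmv on layers of known pressure thickness, the smoothing error covariance is written as (A − I) S_a (A − I)^T with a hypothetical a priori covariance S_a, and 1 ppmv of O3 over 1 hPa is converted to about 0.789 DU for dry air.

```python
import math

# 1 ppmv of O3 in a layer 1 hPa thick ~ 0.789 DU (dry air, g = 9.80665 m s-2)
DU_PER_PPMV_HPA = 0.789


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def smoothing_error_cov(ak, s_a):
    """(A - I) S_a (A - I)^T: error due to the limited vertical resolution."""
    n = len(ak)
    a_mi = [[ak[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return matmul(matmul(a_mi, s_a), transpose(a_mi))


def add_cov(s1, s2):
    """Total error covariance as the sum of independent terms (e.g. smoothing + measurement)."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(s1, s2)]


def column_operator(dp_hpa):
    """Weights w such that column (DU) = sum(w_i * vmr_i [ppmv])."""
    return [DU_PER_PPMV_HPA * dp for dp in dp_hpa]


def column_du(vmr_ppmv, dp_hpa):
    return sum(w * v for w, v in zip(column_operator(dp_hpa), vmr_ppmv))


def column_error_du(cov_ppmv2, dp_hpa):
    """1-sigma column error sqrt(w^T S w) for a covariance S in ppmv^2."""
    w = column_operator(dp_hpa)
    n = len(w)
    return math.sqrt(sum(w[i] * cov_ppmv2[i][j] * w[j] for i in range(n) for j in range(n)))
```

Restricting `dp_hpa` (and the matching rows and columns of the covariance) to the levels of one layer gives the partial-column error for that layer, e.g. the TOC error.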

Global distributions of tropospheric ozone columns
The global distributions of TOC from SOFRID v1.6 and v3.5 for July and December 2017 are displayed in Fig. 2. The global TOC structures are similar for both versions. Both clearly show the highest TOC over the NH mid-latitudes in summer, with a large export region over the North Pacific off the Chinese coast, and the summertime TOC maximum over the Eastern Mediterranean already documented with the GOME-2 sensor (Richards et al., 2013). The tropical wave-one pattern (Thompson et al., 2003; Sauvage et al., 2006)  latitudes in winter. We will show in the validation part of the paper that this is an important improvement of the SOFRID-O3 retrievals. The agreement is better in regions of high TOC such as NH mid-latitudes in summer or the tropical Atlantic.
The use of a dynamical a priori is responsible for visible stripes along the 10° latitude bands. These stripes generally indicate a discontinuity of 2.5 to 5 DU between two adjacent latitude bands with different a priori profiles. They are clearly caused by the impact of the a priori on the retrieval, which is taken into account in the retrieval error (see Eq. 1). The latitudinal discontinuities are therefore consistent with our retrieval errors (4-5 DU) shown in Fig. 1.
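The origin of the stripes can be illustrated with a lookup that mimics selecting an a priori by month, latitude band and tropopause height. The keying below is hypothetical: we assume 10° latitude bands and 1 km tropopause-height bins, and a plain dictionary stands in for the TpO3 climatology. It only shows why two neighbouring pixels on either side of a band edge receive different a priori profiles.

```python
LAT_BAND_DEG = 10.0  # assumed latitude-band width
TP_BIN_KM = 1.0      # hypothetical tropopause-height bin width
N_LAT_BANDS = int(180.0 / LAT_BAND_DEG)


def apriori_key(month, lat_deg, tp_km):
    """Hypothetical climatology key: (month 1-12, latitude-band index, tropopause bin)."""
    lat_idx = min(int((lat_deg + 90.0) // LAT_BAND_DEG), N_LAT_BANDS - 1)  # fold 90N into last band
    tp_idx = int(tp_km // TP_BIN_KM)
    return (month, lat_idx, tp_idx)


def select_apriori(climatology, month, lat_deg, tp_km):
    """Return the a priori profile stored under the pixel's key."""
    return climatology[apriori_key(month, lat_deg, tp_km)]


if __name__ == "__main__":
    # Toy climatology: two adjacent bands with different a priori profiles (ppmv).
    clim = {
        (7, 11, 12): [0.04, 0.06, 0.10],
        (7, 12, 12): [0.05, 0.07, 0.12],
    }
    # Pixels 0.2 degrees apart but on either side of the 30N band edge.
    print(select_apriori(clim, 7, 29.9, 12.3))
    print(select_apriori(clim, 7, 30.1, 12.3))
```

Because the key jumps at each band edge, the a priori contribution to the retrieval (the (I − A) x_a term of the standard Rodgers formulation) also changes abruptly there, which is the mechanism behind the stripes.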

Ozonesonde data
Ozonesonde data come from the WOUDC database (https://www.woudc.org/). For consistency, we have chosen to use data from ECC sondes only. For the 10-year IASI period (

Coincidence criteria
The spatiotemporal coincidence criteria are ±1° latitude, ±1° longitude and ±12 hours. They are similar to those used in Barret

Because we compare sondes with IASI morning data only and most sonde launches are performed in the morning, using a 6 or 12 h coincidence window does not introduce significant differences. We have computed statistics for 9 latitude bands: the whole globe, the two hemispheres and six 30°-wide latitude bands. For each band, the monthly mean is computed if there are more than 3 coincident profiles. Pixels are selected according to 3 quality criteria. First, we keep pixels for which convergence is achieved, meaning a positive Jcost output from the 1DVar (based on the gradient and on the evolution of Jcost between the last two iterations). Second, we have set an upper limit (1.0) for the retrieval cost in order to eliminate pixels with poor-quality fits. Third, only pixels with a total DFS > 2.0 are selected. Using these criteria, we have kept about 9.0 × 10^5 pixels out of 1.1 × 10^6.
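The coincidence and quality screening described above can be sketched as three small predicates plus the monthly-mean threshold. The field names (`converged`, `cost`, `dfs`) are hypothetical stand-ins for the 1DVar outputs; treating the cost limit as inclusive and handling the dateline when comparing longitudes are our own choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Pixel:
    lat: float        # degrees north
    lon: float        # degrees east
    time: datetime    # observation time (UTC)
    converged: bool   # hypothetical flag: 1DVar convergence test passed
    cost: float       # hypothetical: final retrieval cost
    dfs: float        # total DFS


def is_coincident(p, sonde_lat, sonde_lon, sonde_time,
                  dlat=1.0, dlon=1.0, dt_hours=12.0):
    """±1° latitude, ±1° longitude and ±12 h around the sonde launch."""
    dlon_abs = abs((p.lon - sonde_lon + 180.0) % 360.0 - 180.0)  # dateline-safe
    return (abs(p.lat - sonde_lat) <= dlat
            and dlon_abs <= dlon
            and abs(p.time - sonde_time) <= timedelta(hours=dt_hours))


def passes_quality(p, max_cost=1.0, min_dfs=2.0):
    """Convergence achieved, cost <= 1.0 and total DFS > 2.0."""
    return p.converged and p.cost <= max_cost and p.dfs > min_dfs


def monthly_mean(values, min_count=4):
    """Mean only when there are more than 3 coincident profiles, else None."""
    return sum(values) / len(values) if len(values) >= min_count else None
```

A pixel enters the statistics only if both `is_coincident` and `passes_quality` hold; the monthly means of each latitude band are then built from the surviving pairs.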

Comparison with raw and smoothed data
To compare remotely sensed profiles with in-situ or modelled profiles, it is important to apply Eq. 1 to the in-situ or simulated profile (Rodgers, 2000; Barret et al., 2002). This procedure allows us to check the quality of the retrieval while taking its degraded vertical resolution and sensitivity into account.
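Applying Eq. 1 to a sonde profile amounts to computing x_s = x_a + A (x_sonde − x_a), i.e. replacing the real profile by the sonde profile and dropping the noise term. A minimal sketch, assuming the sonde profile has already been interpolated onto the retrieval levels, completed above the balloon burst altitude (for instance with the a priori), and expressed in the same units as the AKs; the numbers in the example are hypothetical.

```python
def smooth_with_ak(x_sonde, x_a, ak):
    """Return x_a + A (x_sonde - x_a): the sonde profile as the retrieval would see it.

    x_sonde, x_a: profiles on the retrieval levels (same units as the AKs).
    ak: averaging kernel matrix as a list of rows (row i = kernel of level i).
    """
    n = len(x_a)
    dev = [x_sonde[j] - x_a[j] for j in range(n)]
    return [x_a[i] + sum(ak[i][j] * dev[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    x_a = [40.0, 60.0, 200.0]      # hypothetical a priori (ppbv)
    sonde = [30.0, 80.0, 220.0]    # hypothetical regridded sonde (ppbv)
    ak = [[0.3, 0.2, 0.0],
          [0.1, 0.5, 0.1],
          [0.0, 0.1, 0.8]]
    print(smooth_with_ak(sonde, x_a, ak))
```

In this toy case, the lowest level of the smoothed profile (about 41 ppbv) ends up above the a priori although the sonde value is below it, because its kernel also weights the larger positive deviation of the level above: this is the inter-layer mixing of information discussed in the next paragraph.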
Nevertheless, for validation purposes, it is also necessary to compare the retrieved profiles with raw (not smoothed by the AKs) in-situ profiles in order to perform a fully informative validation. This is of particular importance when the satellite data are used to study the ozone seasonal to interannual variabilities (Wespes et al., 2017; Peiro et al., 2018) or to document long-term tropospheric ozone tendencies (Gaudel et al., 2018; Wespes et al., 2018; Dufour et al., 2018). Indeed, the application of Eq. 1 mixes information between the different layers. Therefore, the variabilities and the drifts computed from raw and smoothed sonde data may differ and need to be documented. Raw ozonesonde data were compared to IASI retrievals in a few studies at the beginning of the IASI period (Barret et al., 2011; Dufour et al., 2012) but have been disregarded in more recent validation work (Boynard et al., 2016, 2018). The importance of raw-data validation for analyses of seasonal and interannual variabilities and trends will be highlighted in detail in section 4.

Taylor diagram
In order to validate remote sensing data with reference in-situ observations, we need to determine how well they reproduce the same behaviour. Four statistical indicators have to be computed: (i) the absolute difference or bias, which documents the accuracy, (ii) the root mean square of the differences (RMSD), which tells whether the bias is significant or not, Based on the relationship between correlation coefficients, RMSDs and variances of the reference (validating) and test (validated) datasets, Taylor developed the Taylor diagram, initially for the evaluation of climate models (Taylor, 2001). It displays all of these parameters (except the biases) in a more convenient and synthetic way than tables of numbers. Each experiment or observation to be validated corresponds to a point placed within a quarter circle. The reference is located in the middle of the X-axis (see Fig. 4, 5). The correlation coefficient between the reference and test datasets is given by the azimuthal position of the point. The centered (bias-removed) RMSD is proportional to the distance between the test and reference points. Finally, the radial distance from the origin is proportional to the standard deviation of the experiment. Both RMSDs and standard deviations are normalised by the standard deviation of the reference (see Taylor (2001)  For the different latitude bands, the statistics from the comparisons between ozonesondes and SOFRID data are presented in Table 2 for the biases and corresponding RMSDs. Taylor diagrams are displayed in Fig. 4 for the TOC and lower tropospheric SOFRID retrievals and smoothed sonde profiles is better than with raw sondes. An important exception is the large UTLS oscillations in both the NH and SH tropics, for both v1.6 and v3.5. Therefore, contrary to expectations, this important discrepancy between retrievals and sonde data does not result from the use of a single a priori profile too far from the real profile.
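The geometry of the Taylor diagram rests on the identity E'² = σ_t² + σ_r² − 2 σ_t σ_r R between the centered RMSD E', the standard deviations of the test (σ_t) and reference (σ_r) series and their correlation R (Taylor, 2001). The sketch below computes these statistics, normalised by σ_r as in the text, together with the Cartesian position of the test point; the function names and example values are ours.

```python
import math


def taylor_stats(test, ref):
    """Bias, correlation, normalised std and normalised centered RMSD."""
    n = len(ref)
    mt, mr = sum(test) / n, sum(ref) / n
    dt = [t - mt for t in test]
    dr = [r - mr for r in ref]
    sd_t = math.sqrt(sum(d * d for d in dt) / n)
    sd_r = math.sqrt(sum(d * d for d in dr) / n)
    corr = sum(a * b for a, b in zip(dt, dr)) / (n * sd_t * sd_r)
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dt, dr)) / n)
    return {
        "bias": mt - mr,         # not shown on the diagram
        "corr": corr,            # sets the azimuthal angle
        "sigma": sd_t / sd_r,    # radial distance from the origin
        "crmsd": crmsd / sd_r,   # distance to the reference point (1, 0)
    }


def taylor_point(stats):
    """Cartesian position of the test point; the reference sits at (1, 0)."""
    theta = math.acos(max(-1.0, min(1.0, stats["corr"])))
    return stats["sigma"] * math.cos(theta), stats["sigma"] * math.sin(theta)


if __name__ == "__main__":
    sonde = [28.0, 31.0, 35.0, 40.0, 37.0, 30.0]   # hypothetical monthly TOC (DU)
    iasi = [30.0, 32.0, 34.0, 38.0, 36.0, 31.0]
    s = taylor_stats(iasi, sonde)
    x, y = taylor_point(s)
    # Law of cosines: distance to (1, 0) equals the normalised centered RMSD.
    print(s, math.hypot(x - 1.0, y), s["crmsd"])
```

Because the RMSD on the diagram is centered, the bias is removed before plotting, which is why biases have to be reported separately (Table 2).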
The differences between v3.5 and v1.6 are largely reduced when sondes are smoothed. For instance, the large tropospheric biases for v1.6 in the SH disappear when the smoothing is applied to the sonde profiles.
For all latitude bands, RMSD profiles display the largest values around the tropopause (below 60% in the extratropics and up to 100% in the NH tropics), as expected, because this is the altitude range with the largest relative variability. RMS differences between retrievals and smoothed data are generally much lower than with raw data. This is also expected, since the smoothing error is the largest source of error in IASI retrievals (see Barret et al. (2011); Dufour et al. (2012)). RMS differences with smoothed sondes in the troposphere are somewhat larger for v3.5 than for v1.6, especially in the SH. This is an indication of the increased sensitivity and decreased smoothing of v3.5. This is also evident in the Taylor diagrams, which show that tropospheric variabilities are larger and in better agreement with sonde data (raw and smoothed) for v3.5 (see Fig. 4).
https://doi.org/10.5194/amt-2020-5 Preprint. Discussion started: 28 February 2020 © Author(s) 2020. CC BY 4.0 License.
teresting to bring insight about the general statistics discussed in the previous sections and to identify possible drifts of the data.

Time series of tropospheric columns
The time series of IASI and sonde monthly TOCs are presented in Fig. 8 (resp. 9) for v1.6 and in Fig. 10 (resp. 11) for v3.5 for the northern (resp. southern) hemisphere. We present both raw and smoothed sonde data to highlight the impact of smoothing on the agreement between IASI and sondes. This impact is particularly obvious for SOFRID v1.6 at mid-latitudes. At northern mid-latitudes, the bias between SOFRID v1.6 and raw sonde TOCs displays large seasonal variations, from -(5-10)% in summer to 10-20% in winter, resulting in a negligible 2±15% average bias (Table 2). When sonde data are smoothed by IASI AKs, the sonde variability is largely reduced, and the bias varies from 5% in winter to -5% in summer. icantly biased high (29±22%) relative to raw sonde data (Table 2). This was explained by the fact that the single a priori used in v1.6 is biased towards northern mid-latitude O3 (Emili et al., 2014). When the sonde data are smoothed by IASI AKs, the agreement is much better and the bias becomes insignificant (5±9%) as a result of taking the a priori contribution into account (Eq. 1). The largest significant bias (56±26%) is found at SH high latitudes for v1.6 TOCs (Table 2), with large seasonal variations from 20% in winter to 120% in summer. The large bias variabilities at mid- and especially high latitudes of the SH result from the very low seasonal variability of the retrieved columns (see Fig. 4(a)).
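Average biases quoted as mean ± spread (e.g. 2±15%) can be computed from paired monthly columns. The sketch assumes that relative differences are taken as 100 (IASI − sonde)/sonde for each pair and that the spread is the standard deviation of those differences (a centered RMSD); the text does not spell out either choice, and the significance flag |bias| > spread is likewise only our reading of the statement that the RMSD tells whether a bias is significant.

```python
import statistics


def relative_bias(iasi, sonde):
    """Mean and spread (%) of 100 * (IASI - sonde) / sonde over paired values."""
    rel = [100.0 * (i - s) / s for i, s in zip(iasi, sonde)]
    bias = statistics.mean(rel)
    spread = statistics.pstdev(rel)  # centered RMSD of the relative differences
    return bias, spread, abs(bias) > spread  # assumed significance criterion


if __name__ == "__main__":
    sonde_toc = [25.0, 28.0, 32.0, 36.0]   # hypothetical monthly TOC (DU)
    iasi_toc = [32.0, 35.0, 40.0, 46.0]
    print(relative_bias(iasi_toc, sonde_toc))
```

With the hypothetical values above, the bias is large compared with its spread, mimicking the high-bias situation described for v1.6 at low TOC.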
For v3.5, the use of a dynamical a priori profile clearly improves the retrievals at mid-latitudes. At northern mid-latitudes, the seasonal bias variation is reduced to between -10 and 0% and the average bias remains small (-6±14%). When smoothing is applied, the seasonal variability almost disappears and the bias is only -3±9%. At southern mid-latitudes, the agreement is very good and very similar for raw and smoothed sonde data, with no clear seasonal signature and an average bias close to 0%.
At tropical latitudes, the situation is quite different. First, the seasonal variability is less pronounced and less regular, and the difference between raw and smoothed sondes is smaller than at mid-latitudes. Furthermore, the behaviours of v1.6 and v3.5 are close, even though v3.5 is in better agreement with sonde data (see section 4.1). In the southern tropics there is a noticeable variation At high northern latitudes, for both v1.6 and v3.5, the drifts are large (> 10 and > 4.5%.decade−1 for raw and smoothed data, resp.) and significant at the 95% level. For mid and tropical latitudes, drifts are between 0.9 and -3.4%.decade−1 but are not significant. The NH mid-latitude drift with raw sonde data is reduced from -3.2 with v1.6 to -0.6%.decade−1 with v3.5. For the whole NH, the drifts are not significant and change from -2.2 with v1.6 to 0.7%.decade−1 with v3.5 for raw sonde data.
In the SH tropics, drifts are ∼-5 and ∼-3%.decade−1 for raw and smoothed sonde data, resp., and are only significant for v3.5 compared with raw data. These drifts are linked to the large negative biases of the 2011-2014 period resulting from missing data (see above). For v1.6, a large but non-significant drift (-8%) also occurs at high latitudes, which is largely reduced for v3.5.
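Drifts in %.decade−1 and their 95% significance can be estimated with an ordinary least-squares fit of the monthly relative differences against time. This is a simplified sketch: it uses a normal approximation (1.96 standard errors) and ignores autocorrelation and seasonality in the residuals, which a careful trend analysis would have to account for; it is not claimed to be the exact method behind the numbers above.

```python
import math


def drift_per_decade(t_years, rel_diff_pct, z=1.96):
    """OLS drift of relative differences (%) in %/decade, its ~95% half-width, significance."""
    n = len(t_years)
    mt = sum(t_years) / n
    my = sum(rel_diff_pct) / n
    sxx = sum((t - mt) ** 2 for t in t_years)
    sxy = sum((t - mt) * (y - my) for t, y in zip(t_years, rel_diff_pct))
    slope = sxy / sxx                      # % per year
    intercept = my - slope * mt
    resid = [y - (intercept + slope * t) for t, y in zip(t_years, rel_diff_pct)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    drift = 10.0 * slope                   # % per decade
    half_width = 10.0 * z * se
    return drift, half_width, abs(drift) > half_width


if __name__ == "__main__":
    # Hypothetical 10-year monthly series drifting by -0.5 %/yr.
    ts = [2008 + m / 12.0 for m in range(120)]
    ys = [-0.5 * (t - 2008) for t in ts]
    print(drift_per_decade(ts, ys))
```

Reporting `drift ± half_width` gives values in the same form as the -9.5±4.7 quoted for the SH; accounting for autocorrelation would widen the interval.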
For the whole SH, we found a significant negative drift (relative to raw sonde data) of -9.5±4.7 for v1.6 which reduces to (11600) of ozone sonde profiles than in the present study, and their comparison methodology is close to the one we have used (spatio-temporal coincidence criteria set to 100 km and ±6 h). We have collected the correlation coefficients (r2), the biases and the RMSDs from B18. We have also collected the DFS of the retrievals and the slopes of the linear fit between the smoothed sondes and retrievals from B18. There are two main limitations to the comparison between the validation of our SOFRID retrievals and the FORLI validation from B18. First, B18 do not document the sonde and IASI variabilities, which makes it impossible to draw Taylor diagrams with their data. Second, they limited their comparisons to smoothed sonde data and do not provide results from comparisons between raw sonde and FORLI data. In Fig. 14, we have plotted the biases and RMSDs from SOFRID v1.6 and v3.5 and from FORLI for the layers selected by B18 (1013-300 hPa, 300-150 hPa and 150-25 hPa) and sphere, both SOFRID and FORLI products are positively biased. The largest differences between the two retrieval algorithms are found at extratropical southern latitudes, with FORLI biases larger than SOFRID ones. In the 60-90°S latitude band, FORLI biases reach about 40%, against about 5% for SOFRID.
With a view to better quantifying tropospheric O3 evolution and the TOAR results (Gaudel et al., 2018), it is also important to compare the drifts between sonde and retrievals. B18 present and discuss the drift between FORLI and sonde data for different layers in the whole NH. The SOFRID NH tropospheric drifts discussed in section 4.3 are smaller than, and opposite in sign to, the significant -8.6±3.4%.decade−1 drift between FORLI and smoothed sonde data in the NH troposphere presented in B18. As B18 computed a surface-300 hPa column instead of a tropospheric column, we have computed the drifts based on the same layer (see Fig. 15). Drifts for surface-300 hPa columns are slightly (0.1 to 0.4%) smaller than for TOCs and are not significant in either case. The comparison of the NH drift with B18 is therefore not dependent on the definition of the tropospheric layer. For v1.6 and v3.5 compared with raw and smoothed sonde data, the surface-300 hPa column drifts range from -2.0 to 1.3%.decade−1 (see Fig. 15), values that are much smaller than in B18. These authors attribute their significant NH tropospheric drift to an abrupt change or jump detected in 2010 in FORLI, which was already detectable in the previous version (v20141022) of FORLI (Boynard et al., 2016). No significant change occurring around 2010 is detectable in the SOFRID v1.6 (Fig. 8(h)) and v3.5 (Fig. 10(h)) NH time series. The difference could be linked to the use of EUMETSAT L2 products and of ECMWF analyses for FORLI and SOFRID retrievals, respectively. As mentioned previously, referring to B18, EUMETSAT L2 RMSDs and increases the r2 correlation coefficients and the amplitude of the retrieved variability. The high TOC biases of v1.6 in low-O3 regions are also corrected with v3.5. This is of particular importance in the SH extratropics, where the very large biases almost disappear. In the NH, lower TOC are retrieved in winter, leading to a better seasonal cycle.
In the UTLS and stratosphere, the improvements are smaller. In particular, both versions are affected by positive biases in the UTLS (18% at NH mid-latitudes) and stratospheric (<7%) columns at extratropical latitudes, which were already discussed in Dufour et al. (2012). In the tropics, large profile oscillations around the tropopause result in negative biases in the UTLS (21% in the SH) and positive biases (< 14%) in the stratospheric columns.
Concerning the TOC drifts, we have shown that there are no significant differences between v1.6 and v3.5. There are no significant drifts except at high northern latitudes (increase of 9-13%.decade−1) and at southern tropical latitudes (decrease of 4-5%.decade−1). In the southern tropics, the apparent decrease is probably linked to a sampling weakness at different stations, which makes the time series inhomogeneous.
Our study has also demonstrated the importance of making comparisons with both raw and smoothed in-situ data. Comparing only with smoothed data could lead to the conclusion that the satellite data are better than they really are. For instance, the high bias for low TOC with v1.6 is almost completely corrected when smoothing is applied. The real improvement of v3.5 relative to v1.6 is only sizeable when we compare SOFRID retrievals with raw sonde data.