The complete data fusion (CDF) method is applied to ozone profiles obtained from simulated measurements in the ultraviolet and in the thermal infrared in the framework of the Sentinel 4 mission of the Copernicus programme. We observe that the quality of the fused products is degraded when the fusing profiles are either retrieved on different vertical grids or referred to different true profiles. To address this shortcoming, a generalization of the complete data fusion method, which takes into account interpolation and coincidence errors, is presented. This upgrade overcomes the encountered problems and provides products of good quality when the fusing profiles are both retrieved on different vertical grids and referred to different true profiles. The impact of the interpolation and coincidence errors on the number of degrees of freedom and on the errors of the fused profile is also analysed. The approach developed here to account for the interpolation and coincidence errors can also be followed to include other error components, such as forward model errors.

Many remote sensing observations of vertical profiles of atmospheric variables are obtained with instruments operating on space-borne and airborne platforms, as well as from ground-based stations. Recently, the complete data fusion (CDF) method (Ceccherini et al., 2015) was proposed for use in the combination of independent measurements of the same profile in order to exploit all the available information and obtain a comprehensive and concise description of the atmospheric state. This is an a posteriori method that uses standard retrieval products. The CDF method is simple to implement, yet its products are equivalent to those of a simultaneous retrieval, which is considered to be the most comprehensive way of exploiting different observations of the same quantity (Aires et al., 2012) but has a greater computational complexity. So far, however, the data fusion method has mainly been applied to measurements performed by the same instrument while sounding the same air sample.

Limited tests were conducted on measurements performed by different instruments, for which inconsistencies due to differences in the observed true profiles (because of the non-perfect coincidence of the space–time location of the measurements) could degrade the optimal performance of the simultaneous retrieval. Regarding the fusion of data provided by different instruments, it has been proved (Ceccherini, 2016) that the CDF method is completely equivalent to the measurement space solution (MSS) data fusion method (Ceccherini et al., 2009). The latter was successfully applied to the data fusion of MIPAS-ENVISAT and IASI-METOP measurements (Ceccherini et al., 2010a, b) and of MIPAS-STR and MARSCHALS measurements (Cortesi et al., 2016). However, since in these cases the measurements to be fused (referred to as fusing profiles hereafter) carried information about basically complementary altitude ranges, their possible inconsistency did not result in unrealistic fused profiles.

The first applications of data fusion were made with profiles retrieved on the same vertical grid. A first analysis of the effect of different grids on the quality of the fused products was performed and presented by Ceccherini et al. (2016). In this case, the individual profiles were first obtained on grids optimally defined according to the information content of the individual observations. Then, the CDF method was performed using averaging kernel matrices (AKMs) interpolated to a common grid optimized for the data fusion product. Compared to the case in which the individual retrievals are obtained directly on the grid optimized for the data fusion, the number of degrees of freedom (DOF) is reduced by about a quarter with this approach. Thus, in data fusion applications the choice of the retrieval grid can lead to an information content loss that cannot be restored with interpolation.

Here, we consider the general problem posed by the application of the CDF
method to measurements performed by different instruments that are retrieved
on different vertical grids and refer to different true profiles (which
corresponds to the case of fusing profiles measured at different
geolocations). The analysis of this problem suggests a modification of the
CDF method, taking into account interpolation and coincidence errors. We
determine the expressions of these errors and show how they enter the CDF
formula. The study is performed using simulated measurements of ozone
profiles obtained in the ultraviolet and in the thermal infrared in the
framework of the Sentinel 4 (S4) mission (ESA, 2017) of the Copernicus
programme (

The paper is organized as follows: Section 2 presents an account of the problems that occur when the CDF method is applied to vertical profiles retrieved on different vertical grids and referring to different true profiles. In Sect. 3, we theoretically analyse the problems discussed in Sect. 2 and show how the CDF method can be modified to overcome them. In Sect. 4, we show how the solution proposed in Sect. 3 solves the problems discussed in Sect. 2. In Sect. 5, we describe how to deal with forward model errors. Conclusions are drawn in Sect. 6.

The future atmospheric Sentinel missions of the Copernicus programme
(

In order to evaluate the effect of the variability of vertical grids and of
true profiles, three cases are considered:

1. The simulated measurements refer to the same true profile and are retrieved on the same vertical grid.

2. The simulated measurements refer to the same true profile but are retrieved on different vertical grids.

3. The simulated measurements refer to different true profiles and are retrieved on the same vertical grid.

For a meaningful comparison of the quality of fusing and fused profiles, it is necessary to have common a priori profiles and common a priori covariance matrices (CMs). Therefore, the a priori information of the fusing profiles, which are produced with individual a priori assumptions, has been modified using the method described in Ceccherini et al. (2014). In the comparisons, the same a priori profiles provided by the McPeters and Labow climatology (McPeters and Labow, 2012) are used for all fusing and fused profiles. The diagonal elements of the a priori CMs are obtained from the standard deviation of the McPeters and Labow climatology when its value is larger than 20 % of the a priori profile, and from a value of 20 % of the a priori profile otherwise. The off-diagonal elements are calculated considering a correlation length of 6 km. The correlation length is used to reduce oscillations in the retrieved profile, and the value of 6 km is typically used for nadir ozone profile retrieval (Liu et al., 2010; Kroon et al., 2011; Miles et al., 2015).
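The construction of the a priori CM described above can be illustrated with a minimal sketch. The function name `a_priori_cm` and its parameter names are hypothetical, and the exponential decay of the correlation with vertical separation is an assumption made here for illustration: the text only specifies the 20 % floor and the 6 km correlation length, not the functional form of the correlation.

```python
import numpy as np

def a_priori_cm(z_km, x_a, clim_std, rel_floor=0.20, corr_length_km=6.0):
    """Build an a priori covariance matrix on the altitude grid z_km (km).

    x_a      : a priori profile
    clim_std : climatological standard deviation on the same grid
    """
    z_km = np.asarray(z_km, dtype=float)
    x_a = np.asarray(x_a, dtype=float)
    clim_std = np.asarray(clim_std, dtype=float)

    # Standard deviation: the climatological value, but never smaller
    # than 20 % of the a priori profile.
    sigma = np.maximum(clim_std, rel_floor * np.abs(x_a))

    # ASSUMPTION: correlations decay exponentially with the vertical
    # distance between levels, with an e-folding length of 6 km.
    dz = np.abs(z_km[:, None] - z_km[None, :])
    corr = np.exp(-dz / corr_length_km)

    # Covariance = sigma_i * sigma_j * correlation_ij
    return np.outer(sigma, sigma) * corr
```

With this choice the matrix is symmetric by construction, and the diagonal holds the squared standard deviations; a different correlation shape would only change the line that defines `corr`.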

The results obtained in the three test cases are reported in Figs. 1–3. These figures show the true profiles in panel (a), the mean value of the true profiles and the profiles obtained from the measurements (TIR, UV and data fusion) in panel (b) and the residuals in panel (c), i.e. the differences between the three estimated profiles and the mean value of the true profiles.

We observe that, while in case 1 the differences between the profile obtained from the fusion and the mean of the true profiles are smaller than, or comparable to, those of the profiles obtained from the TIR and UV measurements, in cases 2 and 3 these differences are significantly larger. Therefore, in cases 2 and 3 the fusion provides a product of poorer quality than that of the single products.

These tests show that the CDF algorithm and the equivalent simultaneous retrieval work well in case 1, while they have problems in cases 2 and 3, where the profiles are retrieved on different vertical grids and are referred to different true profiles, respectively.

As Fig. 1 but for case 2.

As Fig. 1 but for case 3.

The problem encountered in case 2 is due to the fact that the data fusion is
made using estimates of the AKMs on the fusion grid (see Sect. 3.1)
obtained by interpolation of the original AKMs (Ceccherini et al., 2016),
which are only an approximation of the real AKMs on the fusion grid. We
refer to this effect as

In this section, a theoretical analysis is performed to overcome the problems highlighted in the previous section. In Sect. 3.1, we recall the formulas of the CDF method in order to establish the formalism subsequently used in Sect. 3.2, where an upgrade of the method is proposed.
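As a point of reference for the formalism, the following is a minimal sketch of the CDF combination as we understand it from Ceccherini et al. (2015), assuming that all fusing profiles are already represented on the fusion grid and that their noise CMs are invertible. The function and argument names are hypothetical. The optional `extra_cms` argument, which simply adds a further covariance term to each noise CM, is our assumption about how additional error components could be included and is not taken from the equations of this paper.

```python
import numpy as np

def complete_data_fusion(x_hats, akms, noise_cms, x_as, x_a, s_a, extra_cms=None):
    """Fuse retrieved profiles given their AKMs and noise CMs.

    x_hats    : list of retrieved profiles
    akms      : list of averaging kernel matrices
    noise_cms : list of noise covariance matrices (assumed invertible)
    x_as      : list of a priori profiles used in the individual retrievals
    x_a, s_a  : a priori profile and a priori CM used for the fused product
    extra_cms : optional list of extra CMs added to the noise CMs (ASSUMPTION)
    """
    n = len(x_a)
    s_a_inv = np.linalg.inv(s_a)
    fisher = np.zeros((n, n))   # sum of A_i^T S_i^-1 A_i
    rhs = s_a_inv @ x_a         # a priori contribution

    for i, (x_hat, akm, cm, xa_i) in enumerate(zip(x_hats, akms, noise_cms, x_as)):
        if extra_cms is not None:
            cm = cm + extra_cms[i]
        # alpha_i: retrieved profile with its a priori contribution removed
        alpha = x_hat - (np.eye(n) - akm) @ xa_i
        weight = akm.T @ np.linalg.inv(cm)
        fisher = fisher + weight @ akm
        rhs = rhs + weight @ alpha

    gain = np.linalg.inv(fisher + s_a_inv)
    x_f = gain @ rhs                 # fused profile
    a_f = gain @ fisher              # AKM of the fused profile
    s_f = gain @ fisher @ gain       # CM from propagating the errors of alpha_i
    return x_f, s_f, a_f
```

For a single measurement with an identity AKM and unit noise and a priori variances, the sketch returns the average of the measurement and the a priori profile, as expected for an equally weighted combination.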

Let us assume to have

The vectors

The CDF solution for the considered profiles is given by (see Ceccherini et
al., 2015)

We note that the vector

The fused profile has a CM, obtained by propagating the errors of

Let us first consider the interpolation error. The vectors

Equation (8) shows that in the presence of different vertical grids the CDF
method combines measurements with sensitivity to the true profile expressed
by

We can explicitly introduce this error in the expression of

Substituting Eqs. (11) and (12) in Eq. (10), one obtains

An estimate of the quantity

For the estimate of the interpolation error, we use the a priori CM

The test cases of fusion 2 and 3 shown in Sect. 2 are here repeated with the modified method described in Sect. 3.2.

In Figs. 4 and 5, we report the noise errors, the interpolation errors and
the coincidence errors related, respectively, to case 2 and case 3, for both
TIR and UV measurements. These errors are calculated as the square root of
the diagonal elements of

Figures 6 and 7 show the fused profiles and the residuals obtained with the modified algorithm compared with the same quantities reported in panels (b) and (c) of Figs. 2 and 3, respectively. In both tests, the modified method provides residuals that are significantly smaller than those obtained with the original CDF method.

Noise errors (red lines), interpolation errors (green lines) and coincidence errors (blue lines) in case 2 for TIR and UV measurements.

As Fig. 4 but for case 3.

These tests show that the upgrade of the CDF method proposed in Sect. 3.2 solves the problems observed in Sect. 2 that occur when either the fusing profiles are retrieved on different vertical grids or they refer to different true profiles. The modified method is a generalization of the CDF that allows its application to a wide range of cases.

The fused profile and the residual error obtained with the modified algorithm (magenta lines) compared with the same quantities of Fig. 2b and c.

The fused profile and the residual error obtained with the modified algorithm (magenta lines) compared with the same quantities of Fig. 3b and c.

We now look at the effect of the generalized method on the errors and on the
number of DOF. Figures 8 and 9 show the errors of the fused profile when we
use either the original or the modified method for cases 2 and 3,
respectively. These errors are calculated as the square root of the diagonal
elements of

The introduction of the interpolation error (case 2) does not significantly modify the errors and causes a decrease of about 1 in the number of DOF of the fused profile. The introduction of the coincidence error (case 3) causes a significant increase in the errors and a small decrease of about 0.5 in the number of DOF of the fused profile. However, in both cases the number of DOF of the fused profile obtained with the modified method is larger than the number of DOF of the individual fusing profiles, proving the information gain provided by the fusion.

Errors of the fused profile when we use the original (black line) and the generalized (magenta line) CDF for case 2.

Errors of the fused profile when we use the original (black line) and the generalized (magenta line) CDF for case 3.

From the analysis of the errors and of the number of DOF we deduce that the interpolation error has the largest impact on the vertical resolution, while the coincidence error has the largest impact on the errors. However, these numerical results depend on the values that the interpolation and coincidence errors take in each individual case.
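The two diagnostics used in this comparison can be computed as in the short sketch below. It relies on the standard definition of the number of DOF as the trace of the AKM and on the errors being the square roots of the diagonal elements of a CM; the function names are hypothetical.

```python
import numpy as np

def degrees_of_freedom(akm):
    """Number of degrees of freedom: trace of the averaging kernel matrix."""
    return float(np.trace(np.asarray(akm, dtype=float)))

def profile_errors(cm):
    """Errors at each level: square root of the diagonal of a covariance matrix."""
    return np.sqrt(np.diag(np.asarray(cm, dtype=float)))
```

Applying both functions to the fusing and fused products on the same grid gives the quantities needed to judge whether the fusion has improved the total error or the number of DOF.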

Number of DOF of the profiles obtained with the TIR measurement, the UV measurement, the original fusion method and the modified fusion method for each of the three cases described in Sect. 2.

In this paper, we considered simulated measurements, which generally do not
include all the error components that are present in real measurements. When
real measurements are considered, there are other important error sources
that can cause inconsistency among the fusing profiles, such as forward
model errors, due, for example, to approximations in the model and
uncertainties in atmospheric and instrumental parameters. When performing
data fusion, these errors can also lead to a loss of quality and cause problems
similar to those described in Sect. 2. These problems can be avoided by
accounting for these errors in the CDF formulation. In particular, Eq. (21) can be
modified to account for an extra CM term,

We analysed the problem posed by the application of the CDF method to vertical profiles obtained with different instruments, which use different retrieval grids and observe different true profiles. For this purpose, we studied simulated ozone profile measurements expected from the MTG payload for the S4 mission of the Copernicus programme: namely, those provided by the IRS in the thermal infrared and by the UVN spectrometer in the ultraviolet. The study showed that the CDF algorithm works well when the fusing profiles are represented on the same vertical grid and refer to the same true profile; otherwise the algorithm provides unsatisfactory results because the fused profile differs from the mean of the true profiles significantly more than the fusing profiles do. In the latter case, the CDF method, which uses all the existing information for the determination of the best fused profile, treats the differences due to the inconsistency of the measurements as useful information and provides unrealistic fused profiles.

In order to overcome this new problem, we performed a theoretical analysis that led to a generalization of the CDF method to the cases in which interpolation and coincidence errors occur. The interpolation error is present when the vertical grids of the fusing profiles differ from the fusion grid, meaning that an interpolation of the AKMs is necessary. In this case, the interpolated AKMs are only an approximation of the real AKMs on the fusion grid. The coincidence error is a consequence of the fact that the fusing profiles are not generally co-located in space and time, thus referring to different true profiles.

The generalized algorithm allows for these inconsistencies and provides fused profiles that are in better agreement with the true profiles than those obtained with the original CDF algorithm.

With the new algorithm, the fusion generally provides fused profiles that are also better than the fusing profiles in terms of total error and number of DOF. However, the error budget of the fused profile is now more comprehensive, because coincidence and interpolation errors do not have to be considered for the individual fusing profiles, and this may even cause the fused profile to have larger errors than the fusing profiles. If neither of the two quality indicators (total error and number of DOF) is improved, the fusion process is not justified.

An approach similar to that used to account for interpolation and coincidence errors can also be useful to include other error components, such as forward model errors, in the fusion process.

The data of the simulations presented in the paper are available from the authors upon request.

SC deduced the expression of the interpolation and coincidence errors and wrote the draft version of the paper. BC suggested the idea to introduce the interpolation and coincidence errors and contributed to the interpretation of the results. NZ wrote the Python code of the complete data fusion. CT and SDB performed the simulation of the infrared measurements. JK performed the simulation of the ultraviolet measurements. UC put together the team of authors and coordinated its activity. RD performed a detailed revision of the manuscript.

The authors declare that they have no conflict of interest.

The results presented in this paper arise from research activities
conducted in the framework of the AURORA project
(