Articles | Volume 15, issue 23
https://doi.org/10.5194/amt-15-7039-2022
Research article | 07 Dec 2022

An improved formula for the complete data fusion

Simone Ceccherini, Nicola Zoppetti, and Bruno Carli
Abstract

The complete data fusion is a method that combines independent measurements of atmospheric vertical profiles. Recently a new formula for the complete data fusion has been proposed; it does not contain matrices that can be singular and thus avoids the generalized inverse approximation needed when singular matrices have to be inverted. We show that the new formula is a generalization of the original one and analyze the analytical relationship between the two formulas when generalized inverse matrices are used for the inversion of singular matrices. We extend the new formula to include interpolation and coincidence errors, which must be considered when the profiles to be fused are measured on different vertical grids and at different times and/or locations. Finally, we use a real measurement of the Infrared Atmospheric Sounding Interferometer (IASI) instrument to show the improved performance of the new formula with respect to the original one.

1 Introduction

The complete data fusion (CDF) was first introduced in Ceccherini et al. (2015) as a new data processing method that allows for the combination of several independent measurements of an atmospheric vertical profile and more generally of any vectorial quantity that is retrieved using the optimal estimation method (Rodgers, 2000). It is called “complete” for its capability of considering all the features of the measurements that are being combined, that is, not only their errors but also their vertical resolution. The inputs of the method are the profiles retrieved from the individual measurements using the optimal estimation method together with their a priori profiles, averaging kernel matrices (AKMs) and noise covariance matrices (CMs), as well as an a priori profile with its CM used to constrain the fused profile. The output of the method is a single profile (the fused profile) with its AKM and CM. The a priori information used to constrain the fused profile can be freely chosen, independently of the a priori information used in the retrievals of the individual profiles so that the method can also be used to change the a priori of a retrieved product. When in the variability range of the results of the individual retrievals the linear approximation of the forward models is appropriate, the method is equivalent to the simultaneous retrieval of all the measurements that are combined (see the Appendix of Ceccherini et al., 2015 for proof). The implementation of the simultaneous retrieval requires the integration of the different radiative transfer models that simulate the measurements of the different sensors into a single inversion system and access to the different (Level 1) measurements, implying the use of large computational resources specifically developed for each fusion operation. The CDF overcomes these complications by combining the Level 2 products separately supplied by the different retrieval processors.

The method has been extended to fuse profiles retrieved on different vertical grids for which an interpolation on a common grid is needed and to deal with measurements obtained either at different times or from different platforms and therefore referred to different true profiles. This extension required the introduction of interpolation and coincidence errors in the fusion process (Ceccherini et al., 2018).

The performance of the method has been studied on ozone profiles retrieved from simulated measurements in the ultraviolet, visible and thermal infrared spectral ranges for the Sentinel-4 and Sentinel-5 missions of the Copernicus program (Tirelli et al., 2020; Zoppetti et al., 2021). The results of these studies show that the CDF is able to provide products of improved quality with respect to the input products in terms of reduced errors and an increased number of degrees of freedom.

A problem with the application of the CDF formula is that it contains the inverse matrices of the noise CMs of the input profiles, so that it can be rigorously applied only when the noise CMs are nonsingular. When the profiles are retrieved by solving ill-posed inverse problems (which is a very common case), this condition is not satisfied. In this case, we can still apply the CDF formula by replacing the inverse matrices of the noise CMs with the generalized inverse matrices (Kalman, 1976), but the result is an approximation. Furthermore, the use of generalized inverse matrices raises a practical problem: a threshold must be defined such that the eigenvalues smaller than this threshold have their inverses replaced with zeros. Too small a threshold introduces significant numerical noise in the products, whereas too large a threshold causes a loss of useful information.

Recently, following the approach of the Kalman filter (Kalman, 1960; Rodgers, 2000) as done in Schneider et al. (2022), a different formula for the CDF has been derived (Ceccherini, 2022) for the fusion of two profiles. This formula contains the inverse matrices of the retrieval error CMs, which include both the noise and the smoothing errors, instead of the inverse matrices of the noise CMs. Differently from the noise CMs, the retrieval error CMs are always nonsingular matrices, and the new formula can be used without having to resort to the use of generalized inverse matrices.

In this paper we extend the new formula to the fusion of any number of profiles and show that it is a generalization of the original CDF formula given in Ceccherini et al. (2015). Furthermore, we analytically analyze the differences between the new formula and the original one when the generalized inverse matrices are used for the inverse of the noise CMs. Since in the application of the CDF to real measurements it is common practice to interpolate between different grids and to consider imperfect coincidence of the fusing profiles, the new formula is also used to derive the operational expression that takes into account interpolation and coincidence errors.

Finally, we use a measurement of the Infrared Atmospheric Sounding Interferometer (IASI) instrument (Clerbaux et al., 2009) to show the improved performance of the new formula with respect to the original one in the case of real data.

In Sect. 2, we show that the new formula is a generalization of the original one and extend it to handle the cases where coincidence and interpolation errors are present. In Sect. 3, we compare the performance of the two formulas using an IASI measurement, and in Sect. 4 we draw the conclusions.

2 Theoretical analysis of the CDF formula

2.1 The new formula as a generalization of the original one

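We assume that we have $N$ profiles $\hat{\mathbf{x}}_i$ retrieved on the same vertical grid with the optimal estimation method (Rodgers, 2000) from $N$ independent measurements of a true atmospheric profile $\mathbf{x}_t$. The profiles $\hat{\mathbf{x}}_i$ are characterized by the AKMs $\mathbf{A}_i=\partial\hat{\mathbf{x}}_i/\partial\mathbf{x}_t$, which determine the sensitivities of the profiles $\hat{\mathbf{x}}_i$ to $\mathbf{x}_t$, and the CMs $\mathbf{S}_i$, which determine the retrieval errors.

Before introducing the new formula for the CDF, let us recall some useful relationships. The quantities $\mathbf{A}_i$ and $\mathbf{S}_i$ can be written as a function of the two quantities that characterize the retrievals, that is, the Fisher information matrices (Fisher, 1935) $\mathbf{F}_i=\mathbf{K}_i^{T}\mathbf{S}_{nyi}^{-1}\mathbf{K}_i$ ($\mathbf{K}_i$ being the Jacobian matrices of the forward models and $\mathbf{S}_{nyi}$ the CMs of the noise errors of the measured radiances $\mathbf{y}_i$), which characterize the measurements, and the a priori CMs $\mathbf{S}_{ai}$ used in the retrievals, which characterize the constraints. The expressions of $\mathbf{A}_i$ and $\mathbf{S}_i$ as a function of these two quantities are

$$\mathbf{A}_i=\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}\mathbf{F}_i, \tag{1}$$

$$\mathbf{S}_i=\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}. \tag{2}$$

We also recall that the $\mathbf{S}_i$ are the sum of two contributions: $\mathbf{S}_{ni}$, the CMs of the noise errors, and $\mathbf{S}_{si}$, the CMs of the smoothing errors, which are respectively equal to

$$\mathbf{S}_{ni}=\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}\mathbf{F}_i\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}, \tag{3}$$

$$\mathbf{S}_{si}=\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}\mathbf{S}_{ai}^{-1}\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1} \tag{4}$$

and, as we can see from Eq. (2), the inverse matrices of $\mathbf{S}_i$ always exist.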
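As a minimal numerical sketch of Eqs. (1)–(4), the following NumPy snippet builds the four matrices for a synthetic retrieval. The Jacobian `K`, the noise level and the a priori CM are made-up values chosen only for illustration (they are not taken from the paper); fewer channels than levels are used so that the Fisher matrix is singular.

```python
import numpy as np

rng = np.random.default_rng(0)
n_levels, n_channels = 5, 3            # fewer channels than levels: F is singular

K = rng.standard_normal((n_channels, n_levels))   # hypothetical Jacobian
S_ny = 0.1 * np.eye(n_channels)                   # hypothetical radiance noise CM
S_ai = np.eye(n_levels)                           # hypothetical retrieval a priori CM

F = K.T @ np.linalg.inv(S_ny) @ K                 # Fisher information matrix
S = np.linalg.inv(F + np.linalg.inv(S_ai))        # Eq. (2)
A = S @ F                                         # Eq. (1)
S_n = S @ F @ S                                   # Eq. (3)
S_s = S @ np.linalg.inv(S_ai) @ S                 # Eq. (4)

# The retrieval error CM is the sum of noise and smoothing contributions.
sum_ok = np.allclose(S, S_n + S_s)

# An explicit tolerance is used because rounding noise in S_n can exceed
# the default tolerance of matrix_rank.
rank_S_n = np.linalg.matrix_rank(S_n, tol=1e-10)
rank_S = np.linalg.matrix_rank(S, tol=1e-10)
print(sum_ok, rank_S_n, rank_S)
```

In this synthetic case the noise CM has the rank of the Fisher matrix, while the retrieval error CM is full rank, which is the property exploited in the rest of the section.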

Using the Kalman filter (Kalman, 1960; Rodgers, 2000) the new formula for the CDF was obtained in Ceccherini (2022) in the case of the fusion of two profiles. With an iterative procedure that adds the extra profiles to the fused product one by one (see Appendix A), it can be generalized to the fusion of $N$ retrieved profiles $\hat{\mathbf{x}}_i$ and expressed by the following formula:

$$\mathbf{x}_f=\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\boldsymbol{\alpha}_i+\mathbf{S}_a^{-1}\mathbf{x}_a\right), \tag{5}$$

where $\mathbf{x}_f$ is the fused profile, $\mathbf{x}_a$ and $\mathbf{S}_a$ represent the a priori profile and its CM used to constrain the fused profile, and

$$\boldsymbol{\alpha}_i=\hat{\mathbf{x}}_i-\mathbf{x}_{ai}+\mathbf{A}_i\mathbf{x}_{ai}, \tag{6}$$

with $\mathbf{x}_{ai}$ being the a priori profiles used in the retrievals of the individual $\hat{\mathbf{x}}_i$. In general, the a priori profiles $\mathbf{x}_{ai}$ can differ from one another and from $\mathbf{x}_a$. In the following we refer to the CDF formula given in Eq. (5) as CDF (2022).
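Equations (5) and (6) translate almost line by line into code. The sketch below is one possible NumPy implementation, not the authors' code; the function name `cdf_2022` and the synthetic, noise-free test case are assumptions made for illustration.

```python
import numpy as np

def cdf_2022(x_hat, x_ai, A, S, x_a, S_a):
    """Fuse N retrieved profiles following Eq. (5).

    x_hat, x_ai, A, S are lists with one entry per retrieval;
    x_a and S_a are the a priori profile and CM of the fusion.
    """
    S_a_inv = np.linalg.inv(S_a)
    lhs = S_a_inv.copy()          # accumulates sum_i S_i^-1 A_i + S_a^-1
    rhs = S_a_inv @ x_a           # accumulates sum_i S_i^-1 alpha_i + S_a^-1 x_a
    for xh, xa_i, A_i, S_i in zip(x_hat, x_ai, A, S):
        alpha_i = xh - xa_i + A_i @ xa_i      # Eq. (6)
        S_i_inv = np.linalg.inv(S_i)
        lhs += S_i_inv @ A_i
        rhs += S_i_inv @ alpha_i
    return np.linalg.solve(lhs, rhs)


# Synthetic, noise-free test case (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n = 5
x_t = rng.standard_normal(n)                  # "true" profile
x_hat, x_ai, A, S = [], [], [], []
for _ in range(3):
    K = rng.standard_normal((2, n))           # 2 channels: F_i is singular
    F = K.T @ K / 0.1
    S_i = np.linalg.inv(F + np.eye(n))        # Eq. (2) with S_ai = I
    A_i = S_i @ F                             # Eq. (1)
    xa_i = rng.standard_normal(n)
    x_hat.append(xa_i + A_i @ (x_t - xa_i))   # linear, noise-free retrieval
    x_ai.append(xa_i)
    A.append(A_i)
    S.append(S_i)

# With a fusion a priori equal to the truth, the fused profile returns the truth.
x_f = cdf_2022(x_hat, x_ai, A, S, x_a=x_t, S_a=np.eye(n))
print(np.allclose(x_f, x_t))
```

Only the retrieval error CMs are inverted, so the function works even though every Fisher matrix in the example is singular.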

From Eqs. (1)–(3) we see that we can express $\mathbf{S}_{ni}$ in terms of $\mathbf{A}_i$ and $\mathbf{S}_i$:

$$\mathbf{S}_{ni}=\mathbf{S}_i\mathbf{A}_i^{T}=\mathbf{A}_i\mathbf{S}_i \tag{7}$$

and, under the hypothesis that the CMs of the noise errors are nonsingular matrices, we can obtain $\mathbf{S}_i^{-1}$:

$$\mathbf{S}_i^{-1}=\mathbf{A}_i^{T}\mathbf{S}_{ni}^{-1}. \tag{8}$$

Substituting Eq. (8) in Eq. (5) we obtain the original formula for the CDF given in Ceccherini et al. (2015):

$$\mathbf{x}_f=\left(\sum_{i=1}^{N}\mathbf{A}_i^{T}\mathbf{S}_{ni}^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{A}_i^{T}\mathbf{S}_{ni}^{-1}\boldsymbol{\alpha}_i+\mathbf{S}_a^{-1}\mathbf{x}_a\right), \tag{9}$$

which, differently from Eq. (5), holds only in the case that the CMs of the noise errors $\mathbf{S}_{ni}$ are nonsingular matrices. Therefore, Eq. (5) is more general than Eq. (9). In the following we refer to the CDF formula given in Eq. (9) as CDF (2015).
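The identities in Eqs. (7) and (8), which link the two formulas, can be checked numerically. The sketch below uses a synthetic retrieval with more channels than levels, so that the Fisher matrix and hence the noise CM are nonsingular; all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
K = rng.standard_normal((6, n))            # more channels than levels
F = K.T @ K / 0.1                          # nonsingular Fisher matrix
S_ai = 2.0 * np.eye(n)                     # hypothetical a priori CM

S = np.linalg.inv(F + np.linalg.inv(S_ai))  # Eq. (2)
A = S @ F                                   # Eq. (1)
S_n = S @ F @ S                             # Eq. (3)

# Eq. (7): both factorizations of the noise CM agree.
eq7 = np.allclose(S_n, S @ A.T) and np.allclose(S_n, A @ S)
# Eq. (8): valid here because S_n is nonsingular.
eq8 = np.allclose(np.linalg.inv(S), A.T @ np.linalg.inv(S_n))
print(eq7, eq8)
```

Eq. (7) holds for any Fisher matrix, whereas the check of Eq. (8) would fail at the call to `np.linalg.inv(S_n)` (or return numerical garbage) if the number of channels were smaller than the number of levels.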

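As already stated, the output of the CDF is not only the fused profile, but also its AKM and CM. The AKM and the CM of the fused profile calculated using Eq. (9) also contain the inverse of $\mathbf{S}_{ni}$ in the formulas (Ceccherini et al., 2015). We can now calculate these quantities for the products of Eq. (5), aiming at obtaining expressions that do not contain the inverse of matrices that may be singular. From Eq. (5) the AKM of $\mathbf{x}_f$ is given by

$$\mathbf{A}_f=\frac{\partial\mathbf{x}_f}{\partial\mathbf{x}_t}=\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\sum_{i=1}^{N}\mathbf{S}_i^{-1}\frac{\partial\boldsymbol{\alpha}_i}{\partial\mathbf{x}_t}=\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i, \tag{10}$$

where we have used Eq. (6) for the calculation of the derivatives.

The noise CM of $\mathbf{x}_f$ is obtained by exploiting the fact that the noise CMs of $\boldsymbol{\alpha}_i$ are $\mathbf{S}_{ni}$; therefore,

$$\mathbf{S}_{nf}=\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{S}_{ni}\mathbf{S}_i^{-1}\right)\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}. \tag{11}$$

Substituting $\mathbf{S}_{ni}$ given in Eq. (7) in Eq. (11), we obtain

$$\mathbf{S}_{nf}=\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i\right)\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}. \tag{12}$$

The CM of $\mathbf{x}_f$ is obtained adding to Eq. (12) the CM of the smoothing errors:

$$\mathbf{S}_{sf}=\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\mathbf{S}_a^{-1}\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}, \tag{13}$$

obtaining

$$\mathbf{S}_f=\mathbf{S}_{nf}+\mathbf{S}_{sf}=\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}. \tag{14}$$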
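The diagnostics of the fused product in Eqs. (10) and (12)–(14) can be sketched as follows. The helper name `fused_diagnostics` and the synthetic inputs are hypothetical; the example only illustrates that the noise and smoothing CMs add up to the total CM and that, when the fusion uses the same a priori CM as the retrievals, the trace of the fused AKM (the number of degrees of freedom) is not smaller than that of any input.

```python
import numpy as np

def fused_diagnostics(A, S, S_a):
    """Return A_f, S_f, S_nf, S_sf of the fused profile for lists A, S."""
    S_a_inv = np.linalg.inv(S_a)
    info = sum(np.linalg.inv(S_i) @ A_i for A_i, S_i in zip(A, S))
    S_f = np.linalg.inv(info + S_a_inv)     # Eq. (14)
    A_f = S_f @ info                        # Eq. (10)
    S_nf = S_f @ info @ S_f                 # Eq. (12)
    S_sf = S_f @ S_a_inv @ S_f              # Eq. (13)
    return A_f, S_f, S_nf, S_sf


# Two synthetic retrievals (illustrative values only), each with S_ai = I.
rng = np.random.default_rng(2)
n = 5
A, S = [], []
for _ in range(2):
    K = rng.standard_normal((2, n))
    F = K.T @ K / 0.1
    S_i = np.linalg.inv(F + np.eye(n))
    A.append(S_i @ F)
    S.append(S_i)

A_f, S_f, S_nf, S_sf = fused_diagnostics(A, S, S_a=np.eye(n))
dof_inputs = [np.trace(A_i) for A_i in A]
dof_fused = np.trace(A_f)
print(np.allclose(S_f, S_nf + S_sf), dof_inputs, dof_fused)
```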

In this section, following the recent results published in the literature, we started from the formula CDF (2022) and demonstrated that it is a generalization of CDF (2015). An alternative line of thought can also be followed. One can start from the formula CDF (2015), valid only in the hypothesis that the noise CMs are nonsingular matrices, and using Eq. (8) derive the formula CDF (2022). Noticing that the use of this formula does not require the hypothesis that the CMs of the noise errors are nonsingular matrices anymore, one can assume its general validity. The correctness of this assumption is then confirmed by the fact that CDF (2022) can also be obtained using the Kalman filter as shown in Ceccherini (2022).

In Appendix B we rewrite some equations in a way that better highlights their physical meaning, although Eqs. (5) and (9) remain the CDF equations that can be used operationally.

2.2 Relationship between CDF (2022) and CDF (2015) with generalized inverse matrices

In the introduction we mentioned that, using the approximation of the generalized inverse matrices (Kalman, 1976), the original formula CDF (2015) can also be used in the case of singular $\mathbf{S}_{ni}$. Therefore, in this section, we investigate the differences between CDF (2022) and CDF (2015) when in the latter the generalized inverse matrices of $\mathbf{S}_{ni}$ are used. In Eq. (9) we replace the matrices $\mathbf{S}_{ni}^{-1}$ with the generalized inverse matrices $\mathbf{S}_{ni}^{\#}$:

$$\mathbf{x}_f=\left(\sum_{i=1}^{N}\mathbf{A}_i^{T}\mathbf{S}_{ni}^{\#}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{A}_i^{T}\mathbf{S}_{ni}^{\#}\boldsymbol{\alpha}_i+\mathbf{S}_a^{-1}\mathbf{x}_a\right). \tag{15}$$

The matrices $\mathbf{S}_{ni}^{\#}$ appear in two terms. For the first term it has already been demonstrated in the Appendix of Ceccherini et al. (2012) that

$$\mathbf{A}_i^{T}\mathbf{S}_{ni}^{\#}\mathbf{A}_i=\mathbf{F}_i=\mathbf{S}_i^{-1}\mathbf{A}_i, \tag{16}$$

where the second equality follows from Eqs. (1) and (2). Therefore, the first term is equal in the two CDF formulas.

We can elaborate the second term using Eqs. (1)–(3):

$$\mathbf{A}_i^{T}\mathbf{S}_{ni}^{\#}=\mathbf{F}_i\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}\mathbf{S}_{ni}^{\#}=\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}\mathbf{F}_i\left(\mathbf{F}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}\mathbf{S}_{ni}^{\#}=\mathbf{S}_i^{-1}\mathbf{S}_{ni}\mathbf{S}_{ni}^{\#}, \tag{17}$$

which, in general, are different from $\mathbf{S}_i^{-1}$, because $\mathbf{S}_{ni}\mathbf{S}_{ni}^{\#}$ are different from the identity matrices when $\mathbf{S}_{ni}$ are singular matrices.

Therefore, in the case of singular $\mathbf{S}_{ni}$, the CDF (2015) used with the generalized inverse matrices of $\mathbf{S}_{ni}$, Eq. (15), is equivalent to

$$\mathbf{x}_f=\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{S}_{ni}\mathbf{S}_{ni}^{\#}\boldsymbol{\alpha}_i+\mathbf{S}_a^{-1}\mathbf{x}_a\right). \tag{18}$$

This equation shows that the CDF (2015) used with the generalized inverse matrices is an approximation of the more rigorous CDF (2022), and the quality of the approximation depends on how close $\mathbf{S}_{ni}\mathbf{S}_{ni}^{\#}$ is to the identity matrix.
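To make the role of $\mathbf{S}_{ni}\mathbf{S}_{ni}^{\#}$ concrete, the following sketch evaluates it for a synthetic, rank-deficient noise CM using `np.linalg.pinv`. The matrices and the cutoff `rcond=1e-8` are illustrative assumptions. For a symmetric matrix, the product with its Moore–Penrose inverse is the orthogonal projector onto its range, so its Frobenius distance from the identity is the square root of the number of discarded directions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_levels, n_channels = 6, 3                 # rank-3 Fisher matrix on 6 levels

K = rng.standard_normal((n_channels, n_levels))   # hypothetical Jacobian
F = K.T @ K / 0.1
S = np.linalg.inv(F + np.eye(n_levels))     # Eq. (2) with S_ai = I
A = S @ F                                   # Eq. (1)
S_n = S @ F @ S                             # Eq. (3), singular

# Moore-Penrose generalized inverse; rcond is the relative cutoff below
# which singular values are treated as zero.
S_n_pinv = np.linalg.pinv(S_n, rcond=1e-8)

P = S_n @ S_n_pinv                          # projector, not the identity
deviation = np.linalg.norm(P - np.eye(n_levels))

# Eq. (16): the first term of the two formulas coincides.
eq16 = np.allclose(A.T @ S_n_pinv @ A, F)
print(deviation, eq16)
```

The nonzero `deviation` is what separates Eq. (18) from Eq. (5) in this synthetic case, while Eq. (16) still holds.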

2.3 The new formula in the presence of coincidence and interpolation errors

We know that in the applications of the CDF to real measurements it is often necessary to fuse vertical profiles measured on different grids and at either different times or locations so that interpolation and coincidence errors must also be considered. The expression of the CDF with interpolation and coincidence errors, which can be called the operational CDF, was calculated in Ceccherini et al. (2018) and was derived from the CDF (2015) that, as we have seen above, is not valid when there are singular matrices. In this section, we show how the expression of the operational CDF can be written in a more general form, using the CDF (2022) and exploiting the equivalence of CDF (2015) and CDF (2022) in the case that the CMs of the noise errors are nonsingular.

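We start from the formula that deals with interpolation and coincidence errors, given in Ceccherini et al. (2018), based on the CDF (2015) and equal to

$$\mathbf{x}_f=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\mathbf{A}_i^{T}\tilde{\mathbf{S}}_{ni}^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\mathbf{A}_i^{T}\tilde{\mathbf{S}}_{ni}^{-1}\tilde{\boldsymbol{\alpha}}_i+\mathbf{S}_a^{-1}\mathbf{x}_a\right), \tag{19}$$

where $\mathbf{R}_i$ are the generalized inverse matrices of the interpolation matrices $\mathbf{H}_i$, which interpolate the profiles from the retrieval grids to the fusion grid. Furthermore,

$$\tilde{\boldsymbol{\alpha}}_i=\boldsymbol{\alpha}_i-\mathbf{A}_i\left(\mathbf{C}^{(i)}-\mathbf{R}_i\mathbf{C}^{(f)}\right)\mathbf{x}_{a,\mathrm{fine}}, \tag{20}$$

$$\tilde{\mathbf{S}}_{ni}=\mathbf{S}_{ni}+\mathbf{A}_i\left(\mathbf{C}^{(i)}-\mathbf{R}_i\mathbf{C}^{(f)}\right)\mathbf{S}_{a,\mathrm{fine}}\left(\mathbf{C}^{(i)}-\mathbf{R}_i\mathbf{C}^{(f)}\right)^{T}\mathbf{A}_i^{T}+\mathbf{A}_i\mathbf{C}^{(i)}\mathbf{S}_{\mathrm{coin}}\mathbf{C}^{(i)T}\mathbf{A}_i^{T}, \tag{21}$$

where $\mathbf{x}_{a,\mathrm{fine}}$ is the a priori profile used to constrain the data fusion, represented on a fine grid that includes all the levels of the fusion grid and of the $N$ retrieval grids. $\mathbf{C}^{(i)}$ and $\mathbf{C}^{(f)}$ are the sampling matrices from this fine grid to the grid of the $i$th retrieval and to the fusion grid, respectively. $\mathbf{S}_{a,\mathrm{fine}}$ and $\mathbf{S}_{\mathrm{coin}}$ are respectively the fusion a priori CM and the CM describing the variability of the true profiles related to the measurements that we fuse: both CMs are represented on the fine grid. The same limitation as for Eq. (9) also applies to Eq. (19), which evidently can be written only under the hypothesis that $\tilde{\mathbf{S}}_{ni}$ are nonsingular matrices.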

In order to write an equation similar to Eq. (7) for $\tilde{\mathbf{S}}_{ni}$, we define the matrix $\tilde{\mathbf{S}}_i$:

$$\tilde{\mathbf{S}}_i=\mathbf{S}_i+\mathbf{A}_i\left(\mathbf{C}^{(i)}-\mathbf{R}_i\mathbf{C}^{(f)}\right)\mathbf{S}_{a,\mathrm{fine}}\left(\mathbf{C}^{(i)}-\mathbf{R}_i\mathbf{C}^{(f)}\right)^{T}+\mathbf{A}_i\mathbf{C}^{(i)}\mathbf{S}_{\mathrm{coin}}\mathbf{C}^{(i)T} \tag{22}$$

and from Eqs. (7), (21) and (22) we see that the following equation holds:

$$\tilde{\mathbf{S}}_{ni}=\tilde{\mathbf{S}}_i\mathbf{A}_i^{T}. \tag{23}$$

We observe that the matrix $\tilde{\mathbf{S}}_i$ is not symmetric and, therefore, does not represent a CM. However, this only concerns the physical meaning of the quantities and does not interfere with the validity of the equations. On the other hand, we can see from Eq. (21) that $\tilde{\mathbf{S}}_{ni}$ is symmetric and, therefore, equal to its transpose so that the following equation also holds:

$$\tilde{\mathbf{S}}_{ni}=\mathbf{A}_i\tilde{\mathbf{S}}_i^{T}. \tag{24}$$

We substitute Eq. (23) in Eq. (19) and obtain

$$\mathbf{x}_f=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\mathbf{A}_i^{T}\left(\tilde{\mathbf{S}}_i\mathbf{A}_i^{T}\right)^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\mathbf{A}_i^{T}\left(\tilde{\mathbf{S}}_i\mathbf{A}_i^{T}\right)^{-1}\tilde{\boldsymbol{\alpha}}_i+\mathbf{S}_a^{-1}\mathbf{x}_a\right). \tag{25}$$

From Eq. (23) we see that the hypothesis of nonsingular $\tilde{\mathbf{S}}_{ni}$ implies that $\mathbf{A}_i$ and $\tilde{\mathbf{S}}_i$ are also nonsingular; therefore, from Eq. (25) we obtain the new formula for the operational CDF, which no longer contains the inverse of matrices that can be singular:

$$\mathbf{x}_f=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\tilde{\boldsymbol{\alpha}}_i+\mathbf{S}_a^{-1}\mathbf{x}_a\right). \tag{26}$$

It is simple to see that in the absence of interpolation and coincidence errors (that is, all the vertical grids coincide and $\mathbf{S}_{\mathrm{coin}}$ is zero), Eq. (26) becomes Eq. (5). Therefore, Eq. (26), which coincides with the operational CDF of Eq. (19) when $\tilde{\mathbf{S}}_{ni}$ are nonsingular and coincides with the CDF (2022) in the absence of interpolation and coincidence errors, can be used as the new operational CDF that is rigorously valid also when the noise CMs of the retrieved products are singular matrices.
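The algebra leading from Eq. (19) to Eq. (26) can be verified numerically for a single retrieval. In the sketch below every matrix is a random stand-in: the grid sizes, the sampling matrices `C_i` and `C_f`, the interpolation matrix `H` and the two fine-grid CMs are assumptions chosen only so that the shapes are consistent, and the Fisher matrix is taken nonsingular so that Eq. (19) is applicable.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ret, n_fus, n_fine = 3, 4, 6

def random_spd(n, scale):
    """Random symmetric positive definite matrix (illustrative CM)."""
    m = rng.standard_normal((n, n))
    return scale * (m @ m.T + n * np.eye(n))

# Single retrieval with nonsingular Fisher matrix, so A_i is invertible.
K = rng.standard_normal((5, n_ret))
F = K.T @ K / 0.1
S = np.linalg.inv(F + np.eye(n_ret))       # Eq. (2) with S_ai = I
A = S @ F                                  # Eq. (1)
S_n = S @ F @ S                            # Eq. (3)

C_i = np.eye(n_fine)[[0, 2, 4]]            # sampling: fine grid -> retrieval grid
C_f = np.eye(n_fine)[[0, 1, 3, 5]]         # sampling: fine grid -> fusion grid
H = rng.random((n_fus, n_ret))             # stand-in interpolation matrix H_i
R = np.linalg.pinv(H)                      # generalized inverse R_i

S_a_fine = random_spd(n_fine, 1.0)
S_coin = random_spd(n_fine, 0.01)

D = C_i - R @ C_f
extra = D @ S_a_fine @ D.T + C_i @ S_coin @ C_i.T
S_n_tilde = S_n + A @ extra @ A.T          # Eq. (21)
S_tilde = S + A @ extra                    # Eq. (22)

eq23 = np.allclose(S_n_tilde, S_tilde @ A.T)
# Information term of Eq. (19) versus the same term in Eq. (26).
term_19 = R.T @ A.T @ np.linalg.inv(S_n_tilde) @ A @ R
term_26 = R.T @ np.linalg.inv(S_tilde) @ A @ R
print(eq23, np.allclose(term_19, term_26))
```

The check covers only the nonsingular case, where both forms exist; the point of Eq. (26) is that its right-hand side remains computable when the Fisher matrix is rank deficient.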

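We can also calculate the AKM and the CMs of the fused profile obtained using Eq. (26). The AKM of $\mathbf{x}_f$ is given by

$$\mathbf{A}_f=\frac{\partial\mathbf{x}_f}{\partial\mathbf{x}}=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\frac{\partial\tilde{\boldsymbol{\alpha}}_i}{\partial\mathbf{x}}=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i, \tag{27}$$

where $\mathbf{x}$ is the unknown profile estimated by the data fusion, which for example can be the mean value of the true profiles of the measurements that are fused. The value of the derivative $\partial\tilde{\boldsymbol{\alpha}}_i/\partial\mathbf{x}=\mathbf{A}_i\mathbf{R}_i$ is obtained from Eq. (17) of Ceccherini et al. (2018).

Exploiting the fact that the CMs of $\tilde{\boldsymbol{\alpha}}_i$ due to noise, interpolation and coincidence errors are $\tilde{\mathbf{S}}_{ni}$ (Ceccherini et al., 2018), the corresponding CM of $\mathbf{x}_f$ is equal to

$$\mathbf{S}_{nf}=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\tilde{\mathbf{S}}_{ni}\left(\tilde{\mathbf{S}}_i^{-1}\right)^{T}\mathbf{R}_i\right)\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\mathbf{A}_i^{T}\left(\tilde{\mathbf{S}}_i^{-1}\right)^{T}\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}. \tag{28}$$

In order to simplify this equation, we consider the symmetric matrix given by the product $\tilde{\mathbf{S}}_i^{-1}\tilde{\mathbf{S}}_{ni}(\tilde{\mathbf{S}}_i^{-1})^{T}$ and use Eq. (24):

$$\tilde{\mathbf{S}}_i^{-1}\tilde{\mathbf{S}}_{ni}\left(\tilde{\mathbf{S}}_i^{-1}\right)^{T}=\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\tilde{\mathbf{S}}_i^{T}\left(\tilde{\mathbf{S}}_i^{-1}\right)^{T}=\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i=\mathbf{A}_i^{T}\left(\tilde{\mathbf{S}}_i^{-1}\right)^{T}, \tag{29}$$

where the last equality is obtained by taking the transpose and exploiting the fact that the matrix $\tilde{\mathbf{S}}_i^{-1}\tilde{\mathbf{S}}_{ni}(\tilde{\mathbf{S}}_i^{-1})^{T}$ is symmetric.

Using Eq. (29) in Eq. (28), the CM $\mathbf{S}_{nf}$ becomes

$$\mathbf{S}_{nf}=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i\right)\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}. \tag{30}$$

The smoothing error CM of $\mathbf{x}_f$ is equal to

$$\mathbf{S}_{sf}=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}\mathbf{S}_a^{-1}\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1} \tag{31}$$

and the CM of $\mathbf{x}_f$, obtained adding to the $\mathbf{S}_{nf}$ given in Eq. (30) the smoothing error CM given in Eq. (31), is equal to

$$\mathbf{S}_f=\mathbf{S}_{nf}+\mathbf{S}_{sf}=\left(\sum_{i=1}^{N}\mathbf{R}_i^{T}\tilde{\mathbf{S}}_i^{-1}\mathbf{A}_i\mathbf{R}_i+\mathbf{S}_a^{-1}\right)^{-1}. \tag{32}$$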

This is a useful new equation that was not considered in Ceccherini et al. (2018).

3 Performance comparison of the original and the new formula using an IASI measurement

In this section, we show an example of the error that we make using CDF (2015) instead of CDF (2022) on real data, using a Metop-B IASI ozone measurement acquired at 43.45° latitude and 10.77° longitude at 08:45:56 UTC on 18 October 2021.

In Fig. 1 we report the retrieved ozone profile with its a priori profile, errors and averaging kernels obtained with the Fast Optimal Retrieval on Layers for IASI (FORLI), described in Hurtmans et al. (2012) and Astoreca et al. (2014). This product was downloaded from the web page IASI Combined Sounding Products – Metop. FORLI retrieves the ozone profiles by means of the optimal estimation method, and the radiative transfer calculation is performed using tabulated absorption cross sections at various pressures and temperatures in order to speed up the calculation time. The derivatives of the direct model with respect to the state vector are computed analytically. The retrieval spectral range is 1025–1075 cm−1 and the a priori information relies on the McPeters–Labow–Logan climatology of ozone profiles (McPeters et al., 2007). The ozone product of FORLI is a profile retrieved on 40 layers between surface and 40 km, with an extra layer from 40 km to the top of the atmosphere.

https://amt.copernicus.org/articles/15/7039/2022/amt-15-7039-2022-f01

Figure 1. Panel (a) shows the retrieved ozone profile and the a priori profile, panel (b) shows the errors and panel (c) shows the averaging kernels of the IASI measurement. The dots in panel (c) represent the diagonal values of the AKM.


From Fig. 1 we can see that the profile used in this study is a typical product of the optimal estimation method, in which most of the information is provided by the a priori: the number of degrees of freedom, given by the sum of the diagonal values of the AKM, is equal to 3.3, which is much smaller than the number of retrieved points.

In Fig. 2, we report the eigenvalues of $\mathbf{S}_i$ and $\mathbf{S}_{ni}$ for this IASI measurement calculated with the `linalg.eigvals` function of the NumPy Python 3 module version 1.20.2 (NumPy, 2022a).

https://amt.copernicus.org/articles/15/7039/2022/amt-15-7039-2022-f02

Figure 2. Eigenvalues of the CMs $\mathbf{S}_i$ and $\mathbf{S}_{ni}$ of the IASI measurement.

As expected, the eigenvalues of $\mathbf{S}_i$ are all different from zero; on the other hand, only 6 eigenvalues of $\mathbf{S}_{ni}$ have large values, while the others have values smaller than the numeric noise. The distribution of the eigenvalues of $\mathbf{S}_{ni}$ is due to the fact that the AKM and the retrieval error CM provided to the users are compressed (Astoreca et al., 2017) and are reconstructed using the 6 largest eigenvalues of the Fisher information matrix.
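The qualitative behaviour of the two spectra can be reproduced with synthetic matrices. The sketch below does not use IASI data: the Jacobian is random, the a priori CM is the identity, and the compression is mimicked (as an assumption) by keeping only the 6 leading eigenpairs of the Fisher matrix. It uses `np.linalg.eigvalsh`, the symmetric-matrix variant of the `eigvals` function mentioned in the text, so that the eigenvalues are returned as real numbers.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 41                                     # 40 layers plus one extra layer

K = rng.standard_normal((60, n))           # random stand-in Jacobian
w, V = np.linalg.eigh(K.T @ K)             # eigenvalues in ascending order
F = (V[:, -6:] * w[-6:]) @ V[:, -6:].T     # keep the 6 leading eigenpairs

S = np.linalg.inv(F + np.eye(n))           # Eq. (2) with S_ai = I
S_n = S @ F @ S                            # Eq. (3)
S_n = 0.5 * (S_n + S_n.T)                  # remove rounding asymmetry

eig_S = np.sort(np.linalg.eigvalsh(S))[::-1]
eig_S_n = np.sort(np.linalg.eigvalsh(S_n))[::-1]

# Count the eigenvalues of S_n that stand clearly above rounding noise.
n_significant = int(np.sum(eig_S_n > 1e-8 * eig_S_n[0]))
print(n_significant, eig_S.min())
```

All eigenvalues of the synthetic retrieval error CM are strictly positive, whereas only 6 eigenvalues of the noise CM stand above the rounding level, with an abrupt drop after the sixth.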

This product is used to perform a consistency check using the two CDF formulas, as described below.

The CDF formula can also be used to estimate, in the linear approximation, how the retrieved profile $\hat{\mathbf{x}}_i$ changes when the a priori profile $\mathbf{x}_{ai}$ and its CM $\mathbf{S}_{ai}$ are changed. This operation, explained in detail in Ceccherini et al. (2014), consists of using the CDF formula with a single input retrieved profile $\hat{\mathbf{x}}_i$, obtained with its a priori profile $\mathbf{x}_{ai}$ and a priori CM $\mathbf{S}_{ai}$, and with the application of a new constraint $\mathbf{x}_{ai}'$ and $\mathbf{S}_{ai}'$. The new profile $\hat{\mathbf{x}}_i'$, which is the original measurement with a new constraint, can be obtained using either CDF (2022) or CDF (2015):

$$\mathbf{x}_i^{\mathrm{CDF(2022)}}=\left(\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_{ai}'^{-1}\right)^{-1}\left(\mathbf{S}_i^{-1}\boldsymbol{\alpha}_i+\mathbf{S}_{ai}'^{-1}\mathbf{x}_{ai}'\right), \tag{33}$$

$$\mathbf{x}_i^{\mathrm{CDF(2015)}}=\left(\mathbf{A}_i^{T}\mathbf{S}_{ni}^{\#}\mathbf{A}_i+\mathbf{S}_{ai}'^{-1}\right)^{-1}\left(\mathbf{A}_i^{T}\mathbf{S}_{ni}^{\#}\boldsymbol{\alpha}_i+\mathbf{S}_{ai}'^{-1}\mathbf{x}_{ai}'\right), \tag{34}$$

where in the expression derived from CDF (2015) we have used the generalized inverse matrices of $\mathbf{S}_{ni}$ to deal with the most general case in which $\mathbf{S}_{ni}$ is singular.

When in Eqs. (33) and (34) we use a new constraint that is equal to the original one, $\mathbf{x}_{ai}'=\mathbf{x}_{ai}$ and $\mathbf{S}_{ai}'=\mathbf{S}_{ai}$, the formulas should provide the retrieved profile $\hat{\mathbf{x}}_i$. This is a check that we use to validate the self-consistency of the input data and that we can use here to assess the differences between the two CDF formulas.

Substituting $\boldsymbol{\alpha}_i$ from Eq. (6) in Eq. (33) and using Eqs. (2) and (16), we obtain that indeed

$$\mathbf{x}_i^{\mathrm{CDF(2022)}}\left(\mathbf{x}_{ai}'=\mathbf{x}_{ai},\mathbf{S}_{ai}'=\mathbf{S}_{ai}\right)=\hat{\mathbf{x}}_i. \tag{35}$$

On the other hand, substituting $\boldsymbol{\alpha}_i$ from Eq. (6) in Eq. (34) we obtain

$$\mathbf{x}_i^{\mathrm{CDF(2015)}}\left(\mathbf{x}_{ai}'=\mathbf{x}_{ai},\mathbf{S}_{ai}'=\mathbf{S}_{ai}\right)=\hat{\mathbf{x}}_i+\left[\left(\mathbf{A}_i^{T}\mathbf{S}_{ni}^{\#}\mathbf{A}_i+\mathbf{S}_{ai}^{-1}\right)^{-1}\mathbf{A}_i^{T}\mathbf{S}_{ni}^{\#}-\mathbf{I}\right]\left(\hat{\mathbf{x}}_i-\mathbf{x}_{ai}\right), \tag{36}$$

where $\mathbf{I}$ is the identity matrix. The second term of Eq. (36) measures the error made using the generalized inverse and, using Eqs. (1), (3) and (16), we see that, in the case that $\mathbf{S}_{ni}$ is nonsingular, it is equal to zero.

We have calculated the difference $\mathbf{x}_i^{\mathrm{CDF(2015)}}(\mathbf{x}_{ai}'=\mathbf{x}_{ai},\mathbf{S}_{ai}'=\mathbf{S}_{ai})-\hat{\mathbf{x}}_i$ for several values of the threshold used to determine the eigenvalues that are neglected in the calculation of the generalized inverse matrix of $\mathbf{S}_{ni}$. In Fig. 3 we report the consistency test provided by this difference in the case of 3 values of the threshold that correspond to selecting, respectively, the 5, 6 and 7 largest eigenvalues. The generalized inverse matrices are calculated with the `linalg.pinv` function of the NumPy Python 3 module version 1.20.2 (NumPy, 2022b), which calculates the Moore–Penrose pseudo inverse of a matrix using the singular value decomposition and a threshold on the singular values (which coincide with the eigenvalues for a symmetric positive semidefinite matrix such as a CM).

https://amt.copernicus.org/articles/15/7039/2022/amt-15-7039-2022-f03

Figure 3. Results of the consistency test with CDF (2015) considering only the 5, 6 and 7 largest eigenvalues in the calculation of the generalized inverse matrix of $\mathbf{S}_{ni}$.

We can see that the smallest differences are obtained for the case of 6 eigenvalues, as expected from the distribution of the eigenvalues. The case of 5 eigenvalues is affected by the loss of useful information; on the other hand, the case of 7 eigenvalues is affected by the amplification of the numeric noise. In this case, the choice of the threshold value can simply be made by looking at Fig. 2, where the abrupt variation of the eigenvalues clearly indicates the threshold. In a general case, in which the variation of the eigenvalues is smooth, this test can be used to define the threshold for the eigenvalues, choosing the value that minimizes the difference $\mathbf{x}_i^{\mathrm{CDF(2015)}}(\mathbf{x}_{ai}'=\mathbf{x}_{ai},\mathbf{S}_{ai}'=\mathbf{S}_{ai})-\hat{\mathbf{x}}_i$.
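The consistency test can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the processing applied to the IASI product: the retrieval is simulated with a rank-6 Fisher matrix, and the generalized inverse is built by an explicit truncated eigendecomposition (the hypothetical helper `pinv_k`) rather than by passing a threshold to `np.linalg.pinv`, so that the number of retained eigenvalues is controlled directly.

```python
import numpy as np

def pinv_k(M, k):
    """Generalized inverse of a symmetric matrix keeping its k largest eigenvalues."""
    w, V = np.linalg.eigh(0.5 * (M + M.T))
    idx = np.argsort(w)[::-1][:k]
    return (V[:, idx] / w[idx]) @ V[:, idx].T

def consistency_2015(x_hat, x_ai, A, S_n, S_ai, k):
    """Eq. (34) with an unchanged constraint, minus x_hat."""
    S_n_pinv = pinv_k(S_n, k)
    S_ai_inv = np.linalg.inv(S_ai)
    alpha = x_hat - x_ai + A @ x_ai                  # Eq. (6)
    lhs = A.T @ S_n_pinv @ A + S_ai_inv
    rhs = A.T @ S_n_pinv @ alpha + S_ai_inv @ x_ai
    return np.linalg.solve(lhs, rhs) - x_hat

def consistency_2022(x_hat, x_ai, A, S, S_ai):
    """Eq. (33) with an unchanged constraint, minus x_hat."""
    S_inv = np.linalg.inv(S)
    S_ai_inv = np.linalg.inv(S_ai)
    alpha = x_hat - x_ai + A @ x_ai                  # Eq. (6)
    lhs = S_inv @ A + S_ai_inv
    rhs = S_inv @ alpha + S_ai_inv @ x_ai
    return np.linalg.solve(lhs, rhs) - x_hat


# Synthetic retrieval with a rank-6 Fisher matrix (illustrative values only).
rng = np.random.default_rng(5)
n, rank, sigma2 = 12, 6, 1.0
K = rng.standard_normal((rank, n))
F = K.T @ K / sigma2
S_ai = np.eye(n)
S = np.linalg.inv(F + np.linalg.inv(S_ai))           # Eq. (2)
A = S @ F                                            # Eq. (1)
S_n = S @ F @ S                                      # Eq. (3)
x_t, x_ai = rng.standard_normal(n), rng.standard_normal(n)
eps = np.sqrt(sigma2) * rng.standard_normal(rank)    # radiance noise
x_hat = x_ai + A @ (x_t - x_ai) + S @ K.T @ eps / sigma2

err_2022 = np.linalg.norm(consistency_2022(x_hat, x_ai, A, S, S_ai))
err_2015 = {k: np.linalg.norm(consistency_2015(x_hat, x_ai, A, S_n, S_ai, k))
            for k in (5, 6)}
print(err_2022, err_2015)
```

In this idealized double-precision setting the mismatch with 6 eigenvalues is at rounding level, because the synthetic matrices are exactly consistent; with real products stored at finite precision a residual remains, as shown in the figures. Retaining more eigenvalues than the true rank would invert rounding noise and is therefore left out of the example.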

Using the optimum number of 6 eigenvalues for CDF (2015), in Fig. 4 we compare the differences $\mathbf{x}_i^{\mathrm{CDF(2022)}}(\mathbf{x}_{ai}'=\mathbf{x}_{ai},\mathbf{S}_{ai}'=\mathbf{S}_{ai})-\hat{\mathbf{x}}_i$ and $\mathbf{x}_i^{\mathrm{CDF(2015)}}(\mathbf{x}_{ai}'=\mathbf{x}_{ai},\mathbf{S}_{ai}'=\mathbf{S}_{ai})-\hat{\mathbf{x}}_i$ of the consistency test for the two CDF formulas with the retrieval error of the profile estimated by the square root of the diagonal elements of the CM $\mathbf{S}_i$.

As expected, the consistency test provides zero differences using CDF (2022), and detectable differences, although much smaller than the retrieval errors, are present when using CDF (2015). These differences are an estimate of the errors introduced by CDF (2015) in the fusion process with respect to the results of CDF (2022). The comparison between Fig. 3 and Fig. 4 shows that using a number of eigenvalues that differs from the optimum value by one produces an error comparable with the retrieval error; therefore, it is very important to identify the optimum number of eigenvalues with the test described above.

https://amt.copernicus.org/articles/15/7039/2022/amt-15-7039-2022-f04

Figure 4. Results of the consistency test applied to the IASI measurement for the two formulas CDF (2015) and CDF (2022) compared with the retrieval error of the profile.


The errors introduced by CDF (2015) depend on the compression used to represent the matrices in the files provided to the users. If less compression were applied to the data, a greater number of eigenvalues could be considered in the calculation of the generalized inverse matrix of $\mathbf{S}_{ni}$, and the errors introduced by CDF (2015) would be further reduced.

When no compression is applied, the errors introduced by CDF (2015) are due to the numerical precision with which the data are provided, because the eigenvalues smaller than the numerical precision of the largest eigenvalue will usually only contribute to the noise of the generalized inverse. Therefore, less compression and improved numerical precision can reduce the approximation introduced by CDF (2015).

4 Conclusions

The original CDF (2015) formula requires the calculation of the inverse matrices of the noise CMs $\mathbf{S}_{ni}$ of the input profiles and, therefore, can be rigorously applied only when these CMs are nonsingular. In the other cases, the CDF (2015) can still be used, replacing the inverse matrices of the noise CMs with the generalized inverse matrices, but the result is an approximation. Furthermore, this operation involves a free parameter: a threshold has to be identified that determines how many eigenvalues are used in the calculation of the generalized inverse matrices.

A new formula CDF (2022) has been presented that contains the inverse matrices of the retrieval error CMs (the CMs that include both the noise and the smoothing errors), instead of the inverse matrices of the noise CMs. Since the retrieval error CMs are always nonsingular matrices, the new formula can be used without resorting to generalized inverse matrices.

We deduced the analytical relationship between the two formulas and observed that the quality of the approximation provided by the old formula depends on how close $\mathbf{S}_{ni}\mathbf{S}_{ni}^{\#}$ is to the identity matrix.

Furthermore, we have obtained the expression of the operational CDF (2022), which can handle interpolation and coincidence errors. The operational CDF (2022) is indispensable for the application of the CDF to real measurements, which are often measured on different vertical grids and at different times and/or locations.

Finally, we have introduced a consistency check that can be used to define the threshold for the eigenvalues of the noise CMs and applied it to a real IASI measurement to evaluate the errors made using CDF (2015) instead of CDF (2022). We observed that in practice the errors introduced by the use of CDF (2015) are much smaller than the retrieval errors and depend on the data compression and numerical precision with which the data are provided to the users.

The errors made with the old CDF (2015) do not appear to be too large, even in the case of a significant data compression; however, the use of the new CDF (2022) and operational CDF (2022) is recommended for data fusion processing.

Appendix A

In this Appendix, we prove that Eq. (5) is the generalization to N profiles of the new formula for the CDF obtained in Ceccherini (2022) in the case of the fusion of two profiles using the Kalman filter.

At the basis of this proof there is the consideration that the product of the CDF is characterized by the same quantities that characterize the retrieval product: CMs, AKM and a priori information; therefore, it can be used as input for successive fusion operations.

Here we demonstrate that if Eq. (5) is valid for N it is valid also for N+1 and, since we know that it is valid for N=2, using the induction principle, we deduce that it is valid for any N.

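Suppose that we have fused $N$ profiles so that, by hypothesis, we have obtained the profile $\mathbf{x}_f$ given by Eq. (5). Now we fuse $\mathbf{x}_f$ with another profile $\hat{\mathbf{x}}_{N+1}$ using the Kalman filter. From Eq. (16) of Ceccherini (2022), we obtain the new fused profile given by

$$\mathbf{x}_f^{N+1}=\left(\mathbf{S}_f^{-1}\mathbf{A}_f+\mathbf{S}_{N+1}^{-1}\mathbf{A}_{N+1}+\mathbf{S}_a^{-1}\right)^{-1}\left(\mathbf{S}_f^{-1}\boldsymbol{\alpha}_f+\mathbf{S}_{N+1}^{-1}\boldsymbol{\alpha}_{N+1}+\mathbf{S}_a^{-1}\mathbf{x}_a\right), \tag{A1}$$

where $\boldsymbol{\alpha}_f$ is given by

$$\boldsymbol{\alpha}_f=\mathbf{x}_f-\mathbf{x}_a+\mathbf{A}_f\mathbf{x}_a. \tag{A2}$$

Using Eqs. (10) and (14) we derive that

$$\mathbf{S}_f^{-1}\mathbf{A}_f=\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i \tag{A3}$$

and using Eqs. (5), (10), (14) and (A2) we derive that

$$\mathbf{S}_f^{-1}\boldsymbol{\alpha}_f=\sum_{i=1}^{N}\mathbf{S}_i^{-1}\boldsymbol{\alpha}_i. \tag{A4}$$

Substituting Eqs. (A3)–(A4) in Eq. (A1) we obtain

$$\mathbf{x}_f^{N+1}=\left(\sum_{i=1}^{N+1}\mathbf{S}_i^{-1}\mathbf{A}_i+\mathbf{S}_a^{-1}\right)^{-1}\left(\sum_{i=1}^{N+1}\mathbf{S}_i^{-1}\boldsymbol{\alpha}_i+\mathbf{S}_a^{-1}\mathbf{x}_a\right), \tag{A5}$$

which is Eq. (5) written for the fusion of $N+1$ profiles. Therefore, as anticipated above, using the induction principle, we can state that Eq. (5) is valid for any $N$.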
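The induction step can be checked numerically by comparing a batch fusion of three profiles with a sequential fusion in which the product of the first two is fed back as an input, following Eqs. (A1)–(A4). The function names and the synthetic retrievals below are assumptions made for this sketch.

```python
import numpy as np

def fuse(profiles, x_a, S_a):
    """Fuse a list of (x_hat, x_ai, A, S) tuples; return x_f, A_f, S_f."""
    S_a_inv = np.linalg.inv(S_a)
    info = np.zeros_like(S_a_inv)
    rhs = S_a_inv @ x_a
    for x_hat, x_ai, A, S in profiles:
        S_inv = np.linalg.inv(S)
        info = info + S_inv @ A
        rhs = rhs + S_inv @ (x_hat - x_ai + A @ x_ai)   # S_i^-1 alpha_i
    S_f = np.linalg.inv(info + S_a_inv)                 # Eq. (14)
    return S_f @ rhs, S_f @ info, S_f                   # Eqs. (5), (10), (14)

def synthetic_retrieval(rng, x_t):
    """Noise-free linear retrieval with a singular Fisher matrix."""
    n = x_t.size
    K = rng.standard_normal((2, n))
    F = K.T @ K / 0.1
    S = np.linalg.inv(F + np.eye(n))
    A = S @ F
    x_ai = rng.standard_normal(n)
    return x_ai + A @ (x_t - x_ai), x_ai, A, S


rng = np.random.default_rng(6)
n = 5
x_t = rng.standard_normal(n)
x_a, S_a = np.zeros(n), np.eye(n)
p1, p2, p3 = (synthetic_retrieval(rng, x_t) for _ in range(3))

# Batch fusion of the three profiles.
x_batch, A_batch, S_batch = fuse([p1, p2, p3], x_a, S_a)

# Sequential fusion: the product of the first step has a priori x_a.
x_12, A_12, S_12 = fuse([p1, p2], x_a, S_a)
x_seq, A_seq, S_seq = fuse([(x_12, x_a, A_12, S_12), p3], x_a, S_a)

print(np.allclose(x_batch, x_seq), np.allclose(S_batch, S_seq))
```

The intermediate product enters the second step with its own AKM and CM and with the fusion a priori as its a priori profile, which is the property stated at the beginning of this appendix.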

Appendix B

In this Appendix, we rewrite some equations of the CDF presented in the paper in a way that better highlights their physical meaning.

If we expand the relationships between the retrieved profiles x^i and the true profile xt to the first order around the a priori profiles xai, we obtain

(B1) x ^ i = x a i + A i x t - x a i + G i ε i = A i x t + I - A i x a i + G i ε i ,

where Gi=KiTSnyi-1Ki+Sai-1-1KiTSnyi-1 are the gain matrices and εi are the noise errors of the measured radiances yi. Using Eqs. (6) and (B1) we can rewrite αi as

(B2) α i = A i x t + G i ε i ,

that is, αi is the true profile smoothed by the averaging kernels of the ith measurement plus the error. Therefore, αi can be interpreted as a measurement of the true profile performed with the weighting functions given by the rows of Ai.

Substituting Eq. (B2) in Eq. (5) we obtain

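(B3) $\mathbf{x}_{\mathrm{f}} = \left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i + \mathbf{S}_{\mathrm{a}}^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{S}_i^{-1}\mathbf{A}_i\mathbf{x}_{\mathrm{t}} + \mathbf{S}_{\mathrm{a}}^{-1}\mathbf{x}_{\mathrm{a}} + \sum_{i=1}^{N}\mathbf{K}_i^T\mathbf{S}_{\mathrm{ny}i}^{-1}\boldsymbol{\varepsilon}_i\right)$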

and using Eq. (16), Eq. (B3) becomes

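(B4) $\mathbf{x}_{\mathrm{f}} = \left(\sum_{i=1}^{N}\mathbf{F}_i + \mathbf{S}_{\mathrm{a}}^{-1}\right)^{-1}\left(\sum_{i=1}^{N}\mathbf{F}_i\mathbf{x}_{\mathrm{t}} + \mathbf{S}_{\mathrm{a}}^{-1}\mathbf{x}_{\mathrm{a}} + \sum_{i=1}^{N}\mathbf{K}_i^T\mathbf{S}_{\mathrm{ny}i}^{-1}\boldsymbol{\varepsilon}_i\right).$

Equation (B4) shows that, apart from the noise term, the CDF profile is the weighted mean of the true profile, weighted N times with the Fisher information matrices $\mathbf{F}_i$ of the N different measurements, and of the a priori profile, weighted with the matrix $\mathbf{S}_{\mathrm{a}}^{-1}$.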

Using this formalism we can also rewrite Eqs. (10) and (14), which give the expressions of the AKM and CM of the fused profile:

(B5) $\mathbf{A}_{\mathrm{f}} = \left(\sum_{i=1}^{N}\mathbf{F}_i + \mathbf{S}_{\mathrm{a}}^{-1}\right)^{-1}\sum_{i=1}^{N}\mathbf{F}_i,$

(B6) $\mathbf{S}_{\mathrm{f}} = \left(\sum_{i=1}^{N}\mathbf{F}_i + \mathbf{S}_{\mathrm{a}}^{-1}\right)^{-1}.$

Equation (B4) is equivalent to Eq. (5) and reveals the physical meaning of the CDF as a weighted mean of a set of measurements. However, while Eq. (5) is expressed using the retrieval products (the $\boldsymbol{\alpha}_i$ quantities obtained from the retrieved profiles, AKMs and CMs) and can therefore be used operationally, the same does not apply to Eq. (B4), which is expressed using unknown quantities (such as the true profile and the errors).
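The equivalence between Eq. (5) and Eq. (B4) can be illustrated with synthetic data, as in the sketch below. All inputs are hypothetical (random Jacobians and covariances, a linear forward model), and the sketch again assumes that $\mathbf{S}_i = (\mathbf{F}_i + \mathbf{S}_{\mathrm{a}i}^{-1})^{-1}$ and that $\boldsymbol{\alpha}_i = \hat{\mathbf{x}}_i - (\mathbf{I} - \mathbf{A}_i)\mathbf{x}_{\mathrm{a}i}$. Fewer channels than levels are used so that each individual $\mathbf{F}_i$ is singular.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N = 5, 3, 4     # levels, channels per measurement, number of measurements

def random_spd(size):
    """Random symmetric positive-definite matrix (stand-in for a CM)."""
    b = rng.normal(size=(size, size))
    return b @ b.T + size * np.eye(size)

x_t = rng.normal(size=n)             # true profile (unknown in practice)
x_a, S_a = rng.normal(size=n), random_spd(n)
S_a_inv = np.linalg.inv(S_a)

sum_SinvA = np.zeros((n, n))         # sum of S_i^-1 A_i (retrieval products)
sum_Sinv_alpha = np.zeros(n)         # sum of S_i^-1 alpha_i
sum_F = np.zeros((n, n))             # sum of Fisher information matrices F_i
sum_noise = np.zeros(n)              # sum of K_i^T S_nyi^-1 eps_i

for _ in range(N):
    K = rng.normal(size=(m, n))
    S_ny_inv = np.eye(m) / 0.1
    x_ai, S_ai = rng.normal(size=n), random_spd(n)
    eps = rng.normal(scale=np.sqrt(0.1), size=m)

    F = K.T @ S_ny_inv @ K                       # rank <= m < n, hence singular
    S_i = np.linalg.inv(F + np.linalg.inv(S_ai))
    G = S_i @ K.T @ S_ny_inv
    A_i = G @ K
    x_hat = x_ai + G @ (K @ x_t + eps - K @ x_ai)    # linear retrieval
    alpha_i = x_hat - (np.eye(n) - A_i) @ x_ai       # assumed form of Eq. (6)

    S_i_inv = np.linalg.inv(S_i)
    sum_SinvA += S_i_inv @ A_i
    sum_Sinv_alpha += S_i_inv @ alpha_i
    sum_F += F
    sum_noise += K.T @ S_ny_inv @ eps

# Eq. (5): uses only quantities available from the retrievals.
x_f_eq5 = np.linalg.solve(sum_SinvA + S_a_inv, sum_Sinv_alpha + S_a_inv @ x_a)
# Eqs. (B4)-(B6): use the true profile and the noise, unknown in practice.
S_f = np.linalg.inv(sum_F + S_a_inv)
A_f = S_f @ sum_F
x_f_b4 = S_f @ (sum_F @ x_t + S_a_inv @ x_a + sum_noise)

assert np.allclose(x_f_eq5, x_f_b4)
```

The two expressions agree numerically even though the true profile and the noise never enter the computation based on Eq. (5).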

As a final consideration, we note that the CDF can be traced back to the general approach outlined in Sect. 4.1.1 of Rodgers (2000) once the new linearized independent measurements $\boldsymbol{\alpha}_i$ have been introduced. Indeed, if in Eq. (4.20) of Rodgers (2000) we replace the measurements $\mathbf{y}_i$ with $\boldsymbol{\alpha}_i$, the Jacobians $\mathbf{K}_i$ with $\mathbf{A}_i$ and the CMs $\mathbf{S}_{\varepsilon i}$ with $\mathbf{S}_{\mathrm{n}i}$, we obtain the CDF formula in the formalism of Eq. (9), apart from the difference that in Eq. (9) the a priori is made explicit. Therefore, the CDF can be interpreted as an optimal estimate obtained from all the considered measurements linearized around the individual solutions. However, the general formalism presented in Sect. 4.1.1 of Rodgers (2000) cannot be directly applied to the profiles retrieved with the optimal estimation method, because these are affected by the bias of the a priori. The merit of the CDF is the identification of the $\boldsymbol{\alpha}_i$ quantities, which overcome this limitation.

Data availability

The IASI data used in the paper are available on the web page IASI Combined Sounding Products – Metop: https://navigator.eumetsat.int/product/EO:EUM:DAT:METOP:IASSND02 (last access: 21 October 2022; EUMETSAT, 2010). The results of the analysis performed on these data are available from the authors upon request.

Author contributions

SC derived the new formula of the CDF and extended it to the case in which coincidence and interpolation errors are present. He wrote the draft version of the paper. NZ contributed to the method extension, implemented the formulas in Python code and performed the test on the IASI measurement. BC contributed to the interpretation of the results and made extensive revisions, giving a coherent structure to the paper. All the authors revised the paper.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

The authors thank EUMETSAT for producing and distributing the IASI data used in the paper.

Review statement

This paper was edited by Alyn Lambert and reviewed by two anonymous referees.

References

Astoreca, R., Hurtmans, D., and Coheur, P.-F.: Fast Optimal Retrieval on Layers for IASI, Algorithm Theoretical Basis Document, The EUMETSAT Network of Satellite Application Facilities, 14 pp., https://acsaf.org/docs/atbd/Algorithm_Theoretical_Basis_Document_IASI_CO_Feb_2014.pdf (last access: 21 October 2022), 2014. 

Astoreca, R., Coheur, P., Hurtmans, D., Hadji-Lazaro, J., George, M., and Clerbaux, C.: Product User Manual, Near real-time IASI CO, The EUMETSAT Network of Satellite Application Facilities, 28 pp., https://www.eumetsat.int/media/41287 (last access: 21 October 2022), 2017. 

Ceccherini, S.: Comment on “Synergetic use of IASI profile and TROPOMI total-column level 2 methane retrieval products” by Schneider et al. (2022), Atmos. Meas. Tech., 15, 4407–4410, https://doi.org/10.5194/amt-15-4407-2022, 2022. 

Ceccherini, S., Carli, B., and Raspollini, P.: Quality quantifier of indirect measurements, Opt. Express, 20, 5151–5167, 2012. 

Ceccherini, S., Carli, B., and Raspollini, P.: The average of atmospheric vertical profiles, Opt. Express, 22, 24808–24816, 2014. 

Ceccherini, S., Carli, B., and Raspollini, P.: Equivalence of data fusion and simultaneous retrieval, Opt. Express, 23, 8476–8488, 2015. 

Ceccherini, S., Carli, B., Tirelli, C., Zoppetti, N., Del Bianco, S., Cortesi, U., Kujanpää, J., and Dragani, R.: Importance of interpolation and coincidence errors in data fusion, Atmos. Meas. Tech., 11, 1009–1017, https://doi.org/10.5194/amt-11-1009-2018, 2018. 

Clerbaux, C., Boynard, A., Clarisse, L., George, M., Hadji-Lazaro, J., Herbin, H., Hurtmans, D., Pommier, M., Razavi, A., Turquety, S., Wespes, C., and Coheur, P.-F.: Monitoring of atmospheric composition using the thermal infrared IASI/MetOp sounder, Atmos. Chem. Phys., 9, 6041–6054, https://doi.org/10.5194/acp-9-6041-2009, 2009. 

EUMETSAT: IASI Combined Sounding Products – Metop, EUMETSAT, https://navigator.eumetsat.int/product/EO:EUM:DAT:METOP:IASSND02 (last access: 21 October 2022), 2010. 

Fisher, R. A.: The logic of inductive inference, J. Roy. Stat. Soc., 98, 39–54, 1935. 

Hurtmans, D., Coheur, P., Wespes, C., Clarisse, L., Scharf, O., Clerbaux, C., Hadji-Lazaro, J., George, M., and Turquety, S.: FORLI radiative transfer and retrieval code for IASI, J. Quant. Spectrosc. Ra., 113, 1391–1408, https://doi.org/10.1016/j.jqsrt.2012.02.036, 2012. 

Kalman, R. E.: A New Approach to Linear Filtering and Prediction Problems, J. Basic Eng.-T. ASME, 82, 35–45, https://doi.org/10.1115/1.3662552, 1960. 

Kalman, R. E.: Algebraic aspects of the generalized inverse of a rectangular matrix, in: Generalized Inverse and Applications, edited by: Nashed, M. Z., Academic Press, San Diego, CA, 111–124, https://doi.org/10.1016/B978-0-12-514250-2.50006-8, 1976.  

McPeters, R. D., Labow, G. J., and Logan, J. A.: Ozone climatological profiles for satellite retrieval algorithms, J. Geophys. Res., 112, D05308, https://doi.org/10.1029/2005JD006823, 2007. 

NumPy: numpy.linalg.eigvals: https://numpy.org/doc/1.20/reference/generated/numpy.linalg.eigvals.html, last access: 21 October 2022a. 

NumPy: numpy.linalg.pinv: https://numpy.org/doc/1.20/reference/generated/numpy.linalg.pinv.html, last access: 21 October 2022b. 

Rodgers, C. D.: Inverse Methods for Atmospheric Sounding: Theory and Practice, Vol. 2 of Series on Atmospheric, Oceanic and Planetary Physics, World Scientific, Singapore, 2000. 

Schneider, M., Ertl, B., Tu, Q., Diekmann, C. J., Khosrawi, F., Röhling, A. N., Hase, F., Dubravica, D., García, O. E., Sepúlveda, E., Borsdorff, T., Landgraf, J., Lorente, A., Butz, A., Chen, H., Kivi, R., Laemmel, T., Ramonet, M., Crevoisier, C., Pernin, J., Steinbacher, M., Meinhardt, F., Strong, K., Wunch, D., Warneke, T., Roehl, C., Wennberg, P. O., Morino, I., Iraci, L. T., Shiomi, K., Deutscher, N. M., Griffith, D. W. T., Velazco, V. A., and Pollard, D. F.: Synergetic use of IASI profile and TROPOMI total-column level 2 methane retrieval products, Atmos. Meas. Tech., 15, 4339–4371, https://doi.org/10.5194/amt-15-4339-2022, 2022. 

Tirelli, C., Ceccherini, S., Zoppetti, N., Del Bianco, S., Gai, M., Barbara, F., Cortesi, U., Kujanpää, J., Huan, Y., and Dragani, R.: Data fusion analysis of Sentinel-4 and Sentinel-5 simulated ozone data, J. Atmos. Ocean. Tech., 37, 573–587, https://doi.org/10.1175/JTECH-D-19-0063.1, 2020. 

Zoppetti, N., Ceccherini, S., Carli, B., Del Bianco, S., Gai, M., Tirelli, C., Barbara, F., Dragani, R., Arola, A., Kujanpää, J., van Peet, J. C. A., van der A, R., and Cortesi, U.: Application of the Complete Data Fusion algorithm to the ozone profiles measured by geostationary and low-Earth-orbit satellites: a feasibility study, Atmos. Meas. Tech., 14, 2041–2053, https://doi.org/10.5194/amt-14-2041-2021, 2021. 

Short summary
We discuss a new formula for the complete data fusion that, unlike the original one, does not contain matrices that can be singular. We show that the new formula is a generalization of the original one and derive, both analytically and numerically (using a real IASI ozone measurement), the errors made with the old formula when the generalized inverse of singular matrices is used. An operational version of the new formula that includes interpolation and coincidence errors is also provided.