An improved formula for the Complete Data Fusion
 Istituto di Fisica Applicata “Nello Carrara” del Consiglio Nazionale delle Ricerche, Via Madonna del Piano 10, 50019 Sesto Fiorentino, Italy
Abstract. The Complete Data Fusion is a method that combines independent measurements of an atmospheric vertical profile. Recently, a new formula for the Complete Data Fusion has been proposed that does not contain matrices that can be singular and therefore overcomes the generalized inverse approximation used when singular matrices have to be inverted. We show that the new formula is a generalization of the original one and analyze the analytical relationship between the two formulas when generalized inverse matrices are used for singular matrices. We extend the new formula to include interpolation and coincidence errors, which must be considered when the profiles to be fused are measured on different vertical grids and at either different times or locations. Finally, we use a real measurement of the IASI instrument to show the improved performance of the new formula with respect to the original one.
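A toy numerical illustration of why the retrieval noise error CM can be singular. All inputs are synthetic and the dimensions are hypothetical; the relations used are the standard linear OE ones (Fisher information F = K^T S_y^{-1} K, total error CM S_hat = (F + S_a^{-1})^{-1}, gain G = S_hat K^T S_y^{-1}, AKM A = G K, noise error CM S_n = G S_y G^T). When a retrieval carries fewer independent measurements than there are profile levels, S_n is rank-deficient while S_hat stays invertible. The helper numerical_rank is an assumption of this sketch.

```python
import numpy as np

def numerical_rank(S, rel_tol=1e-10):
    """Count eigenvalues of a symmetric matrix above rel_tol times the largest one."""
    w = np.linalg.eigvalsh(S)
    return int(np.sum(w > rel_tol * w.max()))

# Hypothetical toy retrieval: n profile levels but only m < n measurements.
rng = np.random.default_rng(0)
n, m = 8, 3
K = rng.standard_normal((m, n))            # linear forward model (Jacobian)
S_y = np.diag(rng.uniform(0.5, 1.5, m))    # measurement noise CM
S_a = 4.0 * np.eye(n)                      # a priori CM

S_y_inv = np.linalg.inv(S_y)
F = K.T @ S_y_inv @ K                      # Fisher information matrix (rank m)
S_hat = np.linalg.inv(F + np.linalg.inv(S_a))  # total retrieval error CM
G = S_hat @ K.T @ S_y_inv                  # gain matrix
A = G @ K                                  # averaging kernel matrix
S_n = G @ S_y @ G.T                        # retrieval noise error CM

# S_n has rank m (singular for m < n), S_hat has full rank n
print(numerical_rank(S_n), numerical_rank(S_hat))
```

With m < n, S_n cannot be inverted directly, which is where a generalized inverse enters.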
Simone Ceccherini et al.
Status: open (until 03 Oct 2022)

RC1: 'Comment on amt-2022-182', Anonymous Referee #1, 12 Sep 2022
General
In this work, Ceccherini et al. provide an improved formalism for the complete data fusion (CDF) of OE satellite retrievals, which they developed earlier. The derivation of this new formalism is rigorous, but it could be better situated with respect to the CDF development in previous work. Moreover, the demonstration of the ‘improvement’ in Section 3 looks quite poor. I therefore suggest major revisions, in line with the comments below.
Specific comments
Abstract and throughout the text: It is misleading to state that the CDF “combines independent measurements of an atmospheric vertical profile” (thus a single one), given that the extent of this profile in space and time is not specified and that the formalism is later extended “to include interpolation and coincidence errors, which must be considered when the profiles to be fused are measured on different vertical grids and at either different times or locations.” The latter additionally implies that the notion of a single profile does not hold, in agreement with lines 26-28.
Eqs. (5), (10), and (12) are straightforwardly obtained by inserting Eq. (8) of this work into Eqs. (5), (6), and (7) of Ceccherini et al. (2015), after which the need for nonsingular matrices (or for generalized inverses of singular matrices) can be dropped from the CDF formalism. This more intuitive line of reasoning should be discussed, with appropriate reference to what has already been achieved in Ceccherini (2022).
Given Sections 2.1 and 2.3, the conclusion of Section 2.2 is quite trivial and could be dropped.
A welcome addition with respect to previous work is the compact expression for the full CM of the CDF given in Eq. (14) and its extension to the presence of coincidence and interpolation errors in Eq. (32). Following up on the previous comment, these findings could be better situated with respect to the equations in Ceccherini et al. (2018).
Several elements are missing from Section 3 for it to be complete. It requires at least (1) a paragraph on the IASI data retrieval (lines 150-151), (2) information on the definition of the IASI profiles (cf. Figure 1: partial columns for which layers?), (3) a view of the AKM and CM shapes for the IASI retrieval and a discussion of how representative these are for OE retrievals, and (4) expressions for how eigenvalues and generalized inverses are calculated.
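For point (4), one common way to build a generalized inverse of a symmetric positive semidefinite CM (assumed here, not necessarily the authors' procedure) is to keep only its k largest eigenpairs. The minimal sketch below uses a hypothetical helper truncated_inverse and checks that, when k equals the rank of the matrix, the result matches the Moore-Penrose pseudoinverse returned by numpy.linalg.pinv. The matrix is an arbitrary synthetic example.

```python
import numpy as np

def truncated_inverse(S, k):
    """Generalized inverse of a symmetric PSD matrix from its k largest eigenpairs.

    Hypothetical helper: all eigenvalues below the k-th largest are treated as zero.
    """
    w, V = np.linalg.eigh(S)            # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]     # indices of the k largest eigenvalues
    Vk, wk = V[:, order], w[order]
    return (Vk / wk) @ Vk.T             # sum_j v_j v_j^T / w_j

# Toy rank-3 covariance matrix on 6 levels (values are arbitrary).
rng = np.random.default_rng(1)
B = rng.standard_normal((6, 3))
S = B @ B.T

S_plus = truncated_inverse(S, 3)
S_mp = np.linalg.pinv(S, rcond=1e-10)   # Moore-Penrose pseudoinverse
print(np.allclose(S_plus, S_mp))
```

Choosing k below the rank discards information, while choosing k above it divides by eigenvalues that are zero up to rounding.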
Given that (lines 159-161) “The distribution of the eigenvalues of S_ni is due to the fact that the AKM and the retrieval error CM provided to the users are compressed (Astoreca et al., 2017) and are reconstructed using the 6 largest eigenvalues of the Fisher information matrix,” the result shown in Figure 3 does not come as a surprise. Figure 4 is then created for “the optimum number of eigenvalues” (please state six explicitly). The most important conclusion of this work, however, is the following (last sentence of Section 4): “The use of the new CDF(2021) and operational CDF(2021) is recommended for data fusion processing, but the errors made with the old CDF(2015) do not appear to be important, even in the case of a significant data compression.” This statement should be supported by numbers. To do so, Figure 4 should be reproduced for a range of generalized inverses (or numbers of eigenvalues), including the popular Moore-Penrose pseudoinverse, to quantify the effect of this choice on the error made by using CDF(2015). Finally, how does this error relate to the full CM S_f upon CDF of an increasing number of retrieved profiles? That should be the full merit of Section 3.
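A synthetic sketch of the kind of sensitivity test requested here, under explicitly assumed conditions: two linear OE retrievals on a common 10-level grid, each with only 6 independent measurements (loosely mimicking a product reconstructed from 6 eigenvalues); an old-style fusion that weights each profile with A_i^T times a k-eigenvalue generalized inverse of its noise CM; and, as reference, a fusion with total-CM weights S_i^{-1} (so that S_i^{-1} A_i = F_i). In this exactly linear toy setting the two agree to rounding when k equals the rank and differ for smaller k. The helper names are hypothetical and the numbers say nothing about IASI.

```python
import numpy as np

def truncated_inverse(S, k):
    """Generalized inverse from the k largest eigenpairs (hypothetical helper)."""
    w, V = np.linalg.eigh(S)
    order = np.argsort(w)[::-1][:k]
    return (V[:, order] / w[order]) @ V[:, order].T

rng = np.random.default_rng(2)
n, m = 10, 6                                 # levels; independent measurements per retrieval
x_true = rng.standard_normal(n)
x_a, S_a = np.zeros(n), np.eye(n)            # a priori used by the fusion (assumed)

retrievals = []
for _ in range(2):                           # two hypothetical instruments
    K = rng.standard_normal((m, n))
    S_y = 0.1 * np.eye(m)
    y = K @ x_true + rng.multivariate_normal(np.zeros(m), S_y)
    xa_i, Sa_i = 0.5 * np.ones(n), 2.0 * np.eye(n)  # retrieval prior, not the fusion prior
    S_hat = np.linalg.inv(K.T @ np.linalg.solve(S_y, K) + np.linalg.inv(Sa_i))
    G = S_hat @ K.T @ np.linalg.inv(S_y)
    retrievals.append({"x": xa_i + G @ (y - K @ xa_i), "xa": xa_i, "A": G @ K,
                       "S_hat": S_hat, "S_n": G @ S_y @ G.T})

def fuse(weight):
    """x_f = (sum_i W_i A_i + S_a^-1)^-1 (sum_i W_i alpha_i + S_a^-1 x_a), W_i = weight(r)."""
    lhs, rhs = np.linalg.inv(S_a), np.linalg.solve(S_a, x_a)
    for r in retrievals:
        W = weight(r)
        alpha = r["x"] - (np.eye(n) - r["A"]) @ r["xa"]  # remove a priori contribution
        lhs = lhs + W @ r["A"]
        rhs = rhs + W @ alpha
    return np.linalg.solve(lhs, rhs)

x_ref = fuse(lambda r: np.linalg.inv(r["S_hat"]))  # total-CM weights: S_i^-1 A_i = F_i

errors = {}
for k in range(1, m + 1):  # k > m would divide by eigenvalues that are zero up to rounding
    x_old = fuse(lambda r: r["A"].T @ truncated_inverse(r["S_n"], k))
    errors[k] = np.max(np.abs(x_old - x_ref))
    print(k, errors[k])
```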
Technical corrections
Lines 23-25: “The method is equivalent to the simultaneous retrieval of all the measurements that are combined when the linear approximation of the forward models is appropriate in the variability range of the results of the individual retrievals.” A proof or reference is needed for this statement.
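For the purely linear case, the statement can at least be checked numerically (an illustration, not the requested proof). The sketch assumes two hypothetical linear instruments, individual OE retrievals with their own a priori, and a fusion that uses the total retrieval CMs S_i as weights (so that S_i^{-1} A_i = F_i) together with a common fusion a priori (x_a, S_a). It compares the fused state and CM with a single OE retrieval that ingests both measurement vectors with the same (x_a, S_a); taking the fused CM as (sum_i F_i + S_a^{-1})^{-1} is an assumption of the sketch.

```python
import numpy as np

def oe_retrieval(K, S_y, y, xa, Sa):
    """Linear optimal estimation retrieval: returns state, AKM and total error CM."""
    S_hat = np.linalg.inv(K.T @ np.linalg.solve(S_y, K) + np.linalg.inv(Sa))
    G = S_hat @ K.T @ np.linalg.inv(S_y)
    return xa + G @ (y - K @ xa), G @ K, S_hat

rng = np.random.default_rng(3)
n = 7                                        # number of profile levels (hypothetical)
x_true = rng.standard_normal(n)
x_a, S_a = np.zeros(n), np.eye(n)            # fusion / joint-retrieval a priori (assumed)

Ks, Sys, ys, products = [], [], [], []
for m in (4, 5):                             # two hypothetical instruments
    K = rng.standard_normal((m, n))
    S_y = np.diag(rng.uniform(0.1, 0.3, m))
    y = K @ x_true + rng.multivariate_normal(np.zeros(m), S_y)
    xa_i = rng.standard_normal(n)            # each retrieval uses its own a priori
    x_hat, A, S_hat = oe_retrieval(K, S_y, y, xa_i, 3.0 * np.eye(n))
    Ks.append(K)
    Sys.append(S_y)
    ys.append(y)
    products.append((x_hat, xa_i, A, S_hat))

# Fusion of the individual products with total-CM weights (S_i^-1 A_i = F_i).
info = np.linalg.inv(S_a)                    # sum_i S_i^-1 A_i + S_a^-1
vec = np.linalg.solve(S_a, x_a)              # sum_i S_i^-1 alpha_i + S_a^-1 x_a
for x_hat, xa_i, A, S_hat in products:
    alpha = x_hat - (np.eye(n) - A) @ xa_i   # remove the a priori contribution
    info += np.linalg.solve(S_hat, A)
    vec += np.linalg.solve(S_hat, alpha)
S_f = np.linalg.inv(info)
x_f = S_f @ vec

# Simultaneous retrieval of both measurement vectors with the fusion a priori.
S_y_all = np.block([[Sys[0], np.zeros((4, 5))], [np.zeros((5, 4)), Sys[1]]])
x_joint, _, S_joint = oe_retrieval(np.vstack(Ks), S_y_all, np.concatenate(ys), x_a, S_a)

print(np.allclose(x_f, x_joint), np.allclose(S_f, S_joint))
```

The agreement relies on exactly linear forward models; with nonlinear forward models it can only hold to the extent that linearization around the individual solutions is adequate, which is the condition quoted above.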
References to (Ceccherini, 2021) should be updated to (Ceccherini, 2022).
Line 63: Describing the AKM and CM as a ‘measure’ of sensitivities and retrieval errors, respectively, does not sound correct. These are retrieval quantities that define or determine the sensitivities and errors.
Line 73: “With an iterative procedure that adds one by one the extra profiles to the fused product…” A proof or reference is needed for this statement.
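In the linear case, one way to support such an iterative statement is the information-filter (Kalman-type) argument: the fused information matrix and vector are sums of per-profile terms F_i = S_i^{-1} A_i and b_i = S_i^{-1} alpha_i (alpha_i being the retrieved profile minus its a priori contribution), so adding profiles one at a time, each time using the previous fused state as the prior, reproduces the batch result. It is an assumption that the manuscript's procedure reduces to this form, and the values of F_i and b_i below are random placeholders rather than retrieval output.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_prof = 6, 4                              # levels and number of profiles (hypothetical)
S_a_inv, x_a = np.eye(n), 0.2 * np.ones(n)    # fusion a priori (assumed)

def random_psd(rank):
    """Placeholder for a rank-deficient Fisher information matrix F_i."""
    B = rng.standard_normal((n, rank))
    return B @ B.T

F = [random_psd(3) for _ in range(n_prof)]
b = [F_i @ rng.standard_normal(n) for F_i in F]   # placeholder b_i in the range of F_i

# Batch fusion: x_f = (sum_i F_i + S_a^-1)^-1 (sum_i b_i + S_a^-1 x_a)
x_batch = np.linalg.solve(S_a_inv + sum(F), S_a_inv @ x_a + sum(b))

# Sequential fusion: the previous fused state and its information act as the prior.
P_inv, x_seq = S_a_inv.copy(), x_a.copy()
for F_i, b_i in zip(F, b):
    P_inv_new = P_inv + F_i
    x_seq = np.linalg.solve(P_inv_new, P_inv @ x_seq + b_i)
    P_inv = P_inv_new

print(np.allclose(x_seq, x_batch))
```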
The IASI data providers are neither coauthors nor acknowledged, which seems inappropriate.

RC2: 'Comment on amt-2022-182', Anonymous Referee #2, 12 Sep 2022
GENERAL COMMENTS
================
The paper is well written and describes the employed methods in sufficient detail.
With the last section, the method is applied to a practical example and the differences
are examined. The topic fits the journal well.
The method, as extended in Sect. 2.3, is particularly useful when joining
satellite measurements taken on different grids and with colocation errors,
where only certain diagnostic matrices are provided.
I suggest publication after the CDF method is properly positioned as a multivariate
inverse-CM-weighted mean and the major and specific comments below are addressed.
MAJOR COMMENTS
==============
line 76
I think it would help the understanding to introduce here the relation
\hat{x} = A x_true + (I - A) x_a + G \epsilon
as this shows more readily the nature of the formula: a weighted average of the
true state transformed by the different measurement characteristics of the involved instruments:
x_f = (\sum_i S_i^{-1} A_i + S_a^{-1})^{-1} (\sum_i S_i^{-1} (A_i x_{true,i} + G_i \epsilon_i) + S_a^{-1} x_a),
which also leads quite naturally to the derivation of the aggregated averaging kernel matrix.

The formula above, as well as Eqs. (10)-(12), can also be simplified drastically by exploiting that
S_i^{-1} A_i = F_i,
where F_i is the Fisher information matrix of instrument i. This is mathematically very reasonable and fits well into the general framework of optimal
estimation and Kalman filtering.

Is the whole method, in its given form, not fully identical to a "simple/straightforward"
linear optimal estimation/maximum likelihood estimate of all involved instruments *linearized*
around the individual solutions? That is indeed very reasonable, but not really a "new" method.
The new mathematical description makes this quite obvious, in contrast to the
original, more convoluted formula.

The given mathematical notation can be justified by the information
supplied by typical retrieval products, but both forms, the "standard" form
using the (inverse) Fisher information matrix as weight in a weighted mean and
the given form, should be described and compared against each other.

The authors should discuss this and how it differs (or not) from the method described, e.g., by
Rodgers in Sect. 4.1.1.
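The relations used in this comment can be verified numerically for a linear OE retrieval. The sketch assumes that S_i in S_i^{-1} A_i = F_i denotes the total retrieval error CM (F_i + S_a^{-1})^{-1}; for the noise-only CM S_n, the corresponding identity is A^T S_n^{-1} A = F, which requires S_n to be invertible (hence more measurements than levels in this toy case). It also checks the decomposition of the retrieved state into A x_true + (I - A) x_a + G epsilon. All inputs are synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 5, 8                                    # more measurements than levels, so S_n is invertible
K = rng.standard_normal((m, n))
S_y = 0.2 * np.eye(m)
x_a, S_a = 0.3 * np.ones(n), np.eye(n)
x_true = rng.standard_normal(n)
eps = rng.multivariate_normal(np.zeros(m), S_y)  # measurement noise realization
y = K @ x_true + eps

F = K.T @ np.linalg.solve(S_y, K)              # Fisher information matrix
S_tot = np.linalg.inv(F + np.linalg.inv(S_a))  # total retrieval error CM
G = S_tot @ K.T @ np.linalg.inv(S_y)           # gain matrix
A = G @ K                                      # averaging kernel matrix
S_n = G @ S_y @ G.T                            # noise-only retrieval error CM
x_hat = x_a + G @ (y - K @ x_a)                # linear OE retrieval

check_total = np.allclose(np.linalg.solve(S_tot, A), F)       # S^-1 A = F
check_noise = np.allclose(A.T @ np.linalg.solve(S_n, A), F)   # A^T S_n^-1 A = F
check_split = np.allclose(x_hat, A @ x_true + (np.eye(n) - A) @ x_a + G @ eps)
print(check_total, check_noise, check_split)
```

When S_n is singular, only the total-CM form remains directly usable, which is one practical reason to write the weights with the total CM.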
SPECIFIC COMMENTS
=================
line 33
You stated that the method delivers the same result as a simultaneous retrieval,
so in what respect, or relative to what, can its quality be better?

line 48
Didn't you just state that the formula was introduced by Ceccherini (2021)? So it
isn't introduced here but "only" discussed in greater detail? In fact, Ceccherini (2021) seems to suggest that the formula was introduced by
Schneider (2021)? I think the historical development and relationship between the papers and
methods should be discussed in slightly more detail than given here, taking into
account in particular other people's contributions.

line 233
Is the Python code with a reference implementation available? That is, can the results
of Section 3 be reproduced?
MINOR REMARKS
=============
line 7
Who has proposed it?

line 30
"Performances ... have" -> "performance has"

line 39/43
I would say that while Rodgers provides a very useful discussion on the use of
Kalman filters for the use case at hand, it is not a suitable reference without
also giving (Kalman, 1960; see Rodgers). Are the references in lines 39 and 43
switched?

line 66
The readability of the formulas could be greatly improved if the "^{-1}" superscript
of the involved matrices were placed above the index, not after it.