Application of the Complete Data Fusion algorithm to the ozone profiles measured by geostationary and low-Earth-orbit satellites: a feasibility study

The new platforms for Earth observation from space are characterized by measurements made at high spatial and temporal resolution. While this abundance of information makes it possible to detect and study localized phenomena, it may be difficult to manage this large amount of data for the study of global and large-scale phenomena. A particularly significant example is the use by assimilation systems of Level 2 products that represent gas profiles in the atmosphere. The models on which assimilation systems are based are discretized on spatial grids with horizontal dimensions of the order of tens of kilometres, within which tens or hundreds of measurements may fall in the future. A simple procedure to overcome this problem is to extract a subset of the original measurements, but this involves a loss of information. Another option is the use of simple averages of the profiles, but this approach also has some limitations that we will discuss in the paper. A more advanced solution is to resort to the so-called fusion algorithms, capable of compressing the size of the dataset while limiting the information loss. A novel data fusion method, the Complete Data Fusion algorithm, was recently developed to merge a set of retrieved products into a single product a posteriori. In the present paper, we apply the Complete Data Fusion method to ozone profile measurements simulated in the thermal infrared and ultraviolet bands in a realistic scenario. The fused products are then compared with the input profiles; the comparisons show that the output products of data fusion generally have smaller total errors and higher information content. The comparisons of the fused products with the fusing products are presented both at the scale of a single fusion grid box and with a statistical analysis of the results obtained on large sets of fusion grid boxes of the same size.
We also evaluate the impact of the grid box size, showing that the Complete Data Fusion method can be used with different grid box sizes, even if this possibility is connected to the natural variability of the considered atmospheric molecule.


Fusion of 1000 pixels in coincidence
Here, the CDF is applied to 1000 coincident L2 measurements that refer to the same true profile, the same AK matrix and the same CM but have different (noise) errors δi randomly generated according to Eq. (3). The 1000 products have been simulated according to the specifications of the GEO platform and the thermal infrared (TIR) band. Note that the particular type of product is of secondary importance in this example, which aims to evaluate the behaviour of the fusion of many coincident measurements of the same type that differ only by the random error. In the left panel of Figure S1 the profile obtained by fusing 1000 coincident L2 products is compared with their arithmetic average, with the true profile and with the a priori profile. Since in this case the 1000 pixels are coincident in space and time, no coincidence error δcoinc,i was added in the CDF formulas of Eqs. (6).
In the right panel of Figure S1, the deviations of the fused profile (hereafter indicated with FUS), of the average value of the L2 measurements (indicated as <L2>) and of the a priori profile from the true profile are shown. The same panel also shows three error estimates: the total error standard deviation σtotal that characterizes each of the 1000 L2 profiles (calculated as the square root of Stotal, Eq. (5)), the total error standard deviation of the FUS profile σf total (calculated as the square root of Sf total, Eqs. (6)) and the total error standard deviation of the average of the L2 measurements (calculated by dividing σtotal by √1000, as if no bias were present). It is worth noting that σtotal/√1000 is much smaller than the observed (<L2> minus true) differences, suggesting the presence of a bias. These differences clearly resemble the shape of the (a priori minus true) profile, indicating a link between this bias and the a priori information. The fused profile instead provides a better representation of the true profile, with residuals that are consistent with the estimated errors, although these are much larger than σtotal/√1000.
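The comparison between FUS and <L2> can be reproduced qualitatively with a toy numerical sketch. The code below assumes the fusion formula in the form commonly given for the CDF, x_f = (Σ A_i^T S_i^-1 A_i + S_a^-1)^-1 (Σ A_i^T S_i^-1 α_i + S_a^-1 x_a) with α_i = x̂_i − (I − A_i) x_a, which may differ in notation from Eqs. (6). The five-level profile, the diagonal AK matrix, the noise level, the a priori CM and the function name cdf_fuse are illustrative assumptions and do not correspond to the GEO TIR simulation used for Figure S1.

```python
import numpy as np

def cdf_fuse(x_hats, A, S_noise, x_a, S_a):
    """Fuse coincident retrievals that share one AK matrix A and one noise CM.

    Assumed form (hypothetical helper, no coincidence error):
      F   = N A^T S^-1 A + S_a^-1
      x_f = F^-1 (A^T S^-1 sum_i alpha_i + S_a^-1 x_a)
      alpha_i = x_hat_i - (I - A) x_a
    """
    n_meas, n_lev = x_hats.shape
    Sa_inv = np.linalg.inv(S_a)
    W = A.T @ np.linalg.inv(S_noise)               # A^T S^-1
    F = n_meas * (W @ A) + Sa_inv
    alphas = x_hats - (np.eye(n_lev) - A) @ x_a    # broadcast over the rows
    rhs = W @ alphas.sum(axis=0) + Sa_inv @ x_a
    x_f = np.linalg.solve(F, rhs)                  # fused profile
    A_f = np.linalg.solve(F, n_meas * (W @ A))     # AK matrix of the fused profile
    return x_f, A_f

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
x_true = np.array([1.0, 4.0, 8.0, 5.0, 2.0])
x_a = np.array([1.5, 3.0, 6.5, 5.5, 2.5])
A = np.diag([0.2, 0.5, 0.8, 0.6, 0.3])
S_noise = np.diag(np.full(5, 0.3**2))
S_a = np.eye(5)

# 1000 coincident L2 products: same truth, AK and CM, different noise.
noise = rng.multivariate_normal(np.zeros(5), S_noise, size=1000)
x_hats = x_a + (x_true - x_a) @ A.T + noise

x_f, A_f = cdf_fuse(x_hats, A, S_noise, x_a, S_a)
err_avg = np.abs(x_hats.mean(axis=0) - x_true).max()
err_fus = np.abs(x_f - x_true).max()
print("max |<L2> - true|:", err_avg)
print("max |FUS  - true|:", err_fus)
print("trace(A), trace(A_f):", np.trace(A), np.trace(A_f))
```

In this toy case the averaged profile keeps the a priori bias of the individual retrievals, while the fused profile has an AK matrix much closer to the identity and therefore a much smaller deviation from the truth.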

Figure S1
Left panel: Vertical Ozone Volume Mixing Ratio (VMR) profiles. The blue line represents the true profile; the black line indicates the a priori used in the L2 simulations (the same for all the 1000 L2 products), which is also the one used to constrain the fused profile, represented in dark red. The green line represents the average of the 1000 simulated profiles and the grey shaded area centred on the green line represents the standard deviation of the L2 total error (the same for all the 1000 L2 products). Right panel: the green and dark-red lines represent the differences of the <L2> average and of the FUS profile, respectively, from the true profile. The magenta dash-dotted profile represents the difference between the a priori and the true profiles. The dotted black line represents the standard deviation of the total error estimate of each of the 1000 L2 measurements. The dash-dotted dark-red line represents the standard deviation of the total error estimate of the FUS product. The dotted green line is the standard deviation of the L2 total error estimate divided by √1000.

Recalling Eq. (8), it is the term (I − A)(xa − xt) that causes the bias observed in the right panel of Figure S1. Figure S2 compares the amplitude of the bias term (I − A)(xa − xt) with the mean total error. For illustration, the mean total errors computed when considering only 5 or 10 individual measurements are also plotted. As can be seen, the mean total error tends to the bias term as the number of profiles increases. When a large number of profiles is considered (of the order of 1000), the mean total error substantially coincides with the bias itself.

Figure S2: Comparison of the bias term in Eq. (8) and the difference between the <L2> average and the true profile (i.e. the mean total error) for different numbers of averaged L2 profiles.
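The convergence of the mean total error to the bias term can be illustrated with a short sketch. It assumes the usual linear retrieval model x̂ = xa + A(xt − xa) + δ, from which x̂ − xt = (I − A)(xa − xt) + δ. The profile values, the diagonal AK matrix, the noise level and the helper name simulate_l2 are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_l2(x_true, x_a, A, S_noise, n, rng):
    """Simulate n linear retrievals x_hat = x_a + A (x_true - x_a) + noise."""
    noise = rng.multivariate_normal(np.zeros(len(x_true)), S_noise, size=n)
    return x_a + (x_true - x_a) @ A.T + noise

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
x_true = np.array([1.0, 4.0, 8.0, 5.0, 2.0])
x_a = np.array([1.5, 3.0, 6.5, 5.5, 2.5])
A = np.diag([0.2, 0.5, 0.8, 0.6, 0.3])
S_noise = np.diag(np.full(5, 0.3**2))

# Bias term (I - A)(x_a - x_t): independent of the number of measurements.
bias = (np.eye(5) - A) @ (x_a - x_true)

# Largest deviation of the mean total error from the bias term, per sample size.
max_dev = {}
for n in (5, 10, 1000):
    mean_err = simulate_l2(x_true, x_a, A, S_noise, n, rng).mean(axis=0) - x_true
    max_dev[n] = np.abs(mean_err - bias).max()
    print(n, max_dev[n])
```

The noise contribution to the mean shrinks roughly as 1/√N, so for N of the order of 1000 only the bias term remains visible, which is why dividing σtotal by √1000 underestimates the error of the averaged profile.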

Single grid-box analysis (1° × 1°)
A single 1° × 1° cell is considered (Figure S3). Comparing Figure S4 with the corresponding left panel of Figure 2 in the paper, which refers to the 0.5° × 0.625° cell, it can be observed that while the FUS total error decreases as the number of fused measurements increases, the FUS-<true> differences show a slight increase in their maximum value but still remain much lower than the individual L2-true differences.

Figure S4: differences between L2 profiles and their true profiles (green lines), difference between the fused profile and the average of the true profiles (dark red continuous line), average of the total errors of the L2 measurements (black dash-dotted lines), total error of the fused product (dark red dash-dotted lines).
It is also interesting to look at the diagonals of the AK matrices of Figure S7. From Figure S5 it is evident that the contribution of the LEO UV measurements to the fusion is significant at higher altitudes, where their information content is appreciably higher than that of the other kinds of L2 products.
Figure S6 shows the SF DOF obtained in the case of a 1° × 1° resolution (Table 4). The objective of this analysis is to test the flexibility of the data fusion procedure and, for simplicity, the same coincidence error used for the higher resolution grid is adopted. In Figure S6 the SF DOF increases linearly with the logarithm of the number of fusing L2 profiles, as in Figure 6, and with a similar rate of growth, so that Figure S6 looks like an extrapolation of Figure 6 to greater values of N. This is because the same types of L2 measurements as in the previous case are being fused.
Figure S7 shows the SF AK and SF ERR now computed for the coarse resolution grid and for the 775 FUS products considered in Table 4. The greater number of fusing observations with respect to Figure 7 produces a general improvement in both the AK diagonal values and the total error, although in the figures the former is difficult to detect because of the logarithmic scale.
The CDF method can be used with a wide range of grid-box sizes and data compression levels, and the quality of the products generally improves with larger cells. An upper limit to the grid-box size is imposed by the need to include a coincidence error, which degrades the quality of the fused product.