I reviewed the previous version of this paper. My main issue with the previous submission was that there were too many comparisons and too many similar, unclear acronyms, which made it hard to find a coherent message. In this version the authors have reworked and streamlined the paper; I appreciate these efforts and think they make some aspects easier to follow. However, I think some more work is needed. Some figures are still hard to understand (lots of nearly-overlapping similar dots, and large numbers of panels that tend to look alike). Some of the text is still unclear, out of order, or not needed (e.g. on line 166 there are 8 references for the simple statement that the AHI retrieval uses the Cox-Munk ocean surface reflectance; none of these references are to the Cox and Munk papers describing the actual equations). So it is still difficult for me to pick out a main message other than that merging seems better than not merging. The relative merits of fusion over just doing the simple bias correction step should also be explored. From the figures, in general, it is difficult to pick out the key points and main message.
I therefore recommend some more revisions for clarity. I would be happy to review the revised version. I do not want to discourage the authors: this will be a good paper for the journal, and as I said in my previous review it is important that aerosol products from these geostationary sensors are given more attention. It is just not quite there yet, in terms of text and graphics.
Line 170: the use of 0.02 mg/m3 Chl still does not seem reasonable to me. I see the text has been changed from the previous submission, but “less contaminated” does not make sense.
Lines 172-190: most of this text is a discussion of the relative merits of the MRM and ESR techniques. As such it probably belongs in the introduction, where these techniques are discussed, rather than in the algorithm description section.
Lines 207-208: “the accuracy of GOCI, according to NDVI, has a negative bias for V1 and mostly a positive bias for V2” I don’t understand what “according to NDVI” means here. Can this be reworded?
Line 263: The Sayer (2013) paper here in fact says the opposite of what the authors cite it for: it says that dAOD is not Gaussian, at least for Deep Blue data. See Figure 5(b) of that work. That is why the Deep Blue team have to define the expected error envelope and normalize to make it Gaussian. I think the authors are doing something similar here (in Figure 2) but the wording of the text implies the opposite.
Figure 2: it would be good to add a line showing the theoretical QQ relationship for a Gaussian distribution, to make it easier to see how close the data sets are to Gaussian. Also, it is not clear from the text/caption exactly what is plotted here: the text says "dAOD" but the caption says "z score". Is it dAOD divided by RMSE? This inconsistency should be fixed.
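To illustrate the kind of reference line I mean, here is a minimal sketch (assuming the plotted quantity is indeed dAOD divided by the RMSE; the data and variable names here are synthetic placeholders, not the authors' matchups):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
daod = rng.normal(0.0, 0.05, 500)   # synthetic placeholder for retrieval-minus-AERONET AOD
rmse = 0.05
z = daod / rmse                     # z-score, if that is what Figure 2 shows

# Sample quantiles against theoretical standard-normal quantiles
z_sorted = np.sort(z)
probs = (np.arange(1, z.size + 1) - 0.5) / z.size
theory = stats.norm.ppf(probs)

fig, ax = plt.subplots()
ax.plot(theory, z_sorted, ".", label="data")
ax.plot(theory, theory, "k--", label="Gaussian reference (1:1 line)")  # the line I am suggesting
ax.set_xlabel("Theoretical Gaussian quantiles")
ax.set_ylabel("Sample quantiles of z")
ax.legend()
plt.show()
```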
Lines 275-282: I think this text is saying that, for each NDVI bin and hour of day bin (Figure 1), the mean bias is calculated and subtracted from the retrievals. Is that right? If so, can this paragraph be streamlined? If that is not right, can what is done be written more clearly?
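If my reading is correct, the correction amounts to something like the following (a pandas sketch with hypothetical column names and synthetic placeholder data, written only to check my understanding, not to represent the authors' actual code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
# Synthetic placeholder matchups; column names are hypothetical.
df = pd.DataFrame({
    "aod_aeronet": rng.gamma(2.0, 0.1, n),
    "ndvi_bin": rng.integers(0, 5, n),
    "hour_bin": rng.integers(0, 8, n),
})
df["aod_sat"] = df["aod_aeronet"] + rng.normal(0.03, 0.05, n)

# Mean bias per (NDVI bin, hour-of-day bin), as in Figure 1, subtracted from the retrievals
df["bias"] = df["aod_sat"] - df["aod_aeronet"]
mean_bias = df.groupby(["ndvi_bin", "hour_bin"])["bias"].transform("mean")
df["aod_corrected"] = df["aod_sat"] - mean_bias
```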
Section 4: this first part (page 9) uses the word “overestimate” a lot. However what the authors seem to mean is “this satellite combination is higher than that satellite combination”. There is no discussion of the AERONET ground truth here so it is impossible to say whether one (or both) things being compared is overestimating or underestimating. It would be better to say that the two are “offset” relative to each other, as that only implies a difference, while “overestimate” implies an error.
Figures 4, 5: these have 11 panels. Do we really need all of these comparisons? Is there a better way to convey the intended message? It is hard to know exactly which details I am supposed to focus on here. I feel the figure mixes comparisons of the individual products with results from the individual merges. Maybe it would be best just to show campaign averages of the 4 baseline satellite products here so we can see the differences between them, or to show panels a-e (i.e. FM1 and the satellite products) in one figure and compare the other merges some other way. I am sorry; I know it is difficult to present this much information, and I am not sure of the best way myself.
Figure 6: this is interesting and, to me, shows that the MLE method helps at this site at low AODs. However, it is hard to see some details because there are lots of near-overlapping dots from the roughly hourly measurements, and large gaps at night. Perhaps this plot could show the points at daily scale instead? Or else add an additional figure focusing on a short period (a day or a few days) so the x-axis is zoomed more effectively and we can see how the products resolve the diurnal variability.
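For example, something as simple as the following would show the daily-scale behaviour without the over-plotted hourly dots (a sketch with a synthetic placeholder time series; the "AOD" column name and dates are hypothetical):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic placeholder: roughly hourly AOD over two months
times = pd.date_range("2016-05-01", "2016-06-30 23:00", freq="1H")
rng = np.random.default_rng(0)
ts = pd.DataFrame({"AOD": np.abs(rng.normal(0.2, 0.08, len(times)))}, index=times)

daily = ts.resample("1D").mean()       # daily means instead of hourly points
daily.plot(marker="o", linestyle="-")
plt.ylabel("AOD (daily mean)")
plt.show()
```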
Lines 345-357: I agree that the fused (especially MLE) products are better here, but it is not clear to me how much of this difference is due to the fusion and how much is due to the bias correction step. Unless I have misunderstood what is done, there are really two separate steps here: (1) going from the satellite products to bias-corrected satellite products, and (2) going from the bias-corrected satellite products to the fused bias-corrected products. There is obviously value in doing a bias correction (step 1), but the added value of the next step is less clear. Can the authors somehow separate this in the analysis?
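For example, reporting the same validation statistic at each step would make the split clear (a minimal sketch with synthetic placeholder arrays and a toy "fusion"; this is not a prescription for the authors' exact workflow):

```python
import numpy as np

rng = np.random.default_rng(0)
aeronet = rng.gamma(2.0, 0.1, 1000)                       # placeholder "truth"
raw = aeronet + rng.normal(0.05, 0.08, 1000)              # placeholder raw satellite product
bias_corrected = raw - np.mean(raw - aeronet)             # step 1: bias correction
fused = 0.5 * (bias_corrected + (aeronet + rng.normal(0.0, 0.06, 1000)))  # step 2: toy merge

def rmse(x, truth):
    return float(np.sqrt(np.mean((x - truth) ** 2)))

for label, product in [("raw", raw),
                       ("bias-corrected (step 1)", bias_corrected),
                       ("fused (step 2)", fused)]:
    print(f"{label}: RMSE = {rmse(product, aeronet):.3f}")
```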
Lines 373-374: why compare to the MODIS DT EE when this is the expected uncertainty for a different algorithm and a different sensor? If the authors just want to provide a common reference uncertainty to benchmark against, that is fine, but then the paper has to be clear that this is not an "expected" error for any of the data sets (or merges) used here, and so is something of an arbitrary reference. Perhaps the GCOS goal (the greater of 0.03 or 10% of AOD) would be a better comparison point, since that is an international target not tied to one algorithm and sensor.
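For reference, the GCOS envelope I have in mind is simply the larger of the two terms; a sketch of the corresponding metric is below (the arrays are synthetic placeholders):

```python
import numpy as np

def within_gcos(aod_sat, aod_aeronet):
    """Fraction of matchups meeting the GCOS goal: |dAOD| <= max(0.03, 0.10 * AOD)."""
    envelope = np.maximum(0.03, 0.10 * aod_aeronet)
    return float(np.mean(np.abs(aod_sat - aod_aeronet) <= envelope))

# Example with synthetic placeholder data
rng = np.random.default_rng(0)
truth = rng.gamma(2.0, 0.1, 1000)
sat = truth + rng.normal(0.0, 0.05, 1000)
print(f"fraction within GCOS goal: {within_gcos(sat, truth):.2f}")
```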
Section 5: it is a little strange to have the AERONET data and matchups described here when they were first used in Section 3.4 for the bias correction step. Some of this material should probably be moved earlier in the paper.
Figures 7, 9: again, with 11 panels, it is hard to know what I should be looking for. What is the main message of these figures? Are they necessary? Maybe they would be better rearranged, with all "raw" satellite products on one row, all "averaged fused" products on the middle row, and all "MLE" products on the bottom row. Or maybe they could be replaced with plots of overall dAOD (combined into a smaller number of panels), where we would hopefully see that the distribution of dAOD is narrower for the MLE results than for the others. I am not sure of the best approach, but I think this should be streamlined somehow; I am not sure we need to see 11 (mostly similar) scatter plots.
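As one possible alternative, overlaid dAOD distributions would put several products in a single panel (a sketch with synthetic placeholder data and hypothetical product labels, just to show the idea):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic placeholder dAOD (satellite minus AERONET) for three illustrative products
daod = {"raw": rng.normal(0.04, 0.09, 2000),
        "simple mean": rng.normal(0.01, 0.07, 2000),
        "MLE": rng.normal(0.00, 0.05, 2000)}

bins = np.linspace(-0.3, 0.3, 61)
for label, values in daod.items():
    plt.hist(values, bins=bins, histtype="step", density=True, label=label)
plt.xlabel("dAOD")
plt.ylabel("Density")
plt.legend()
plt.show()
```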
Figures 8, 10: the same comment about 11 panels applies; it is hard to know what main message I should be extracting here. I think the authors need to consider what they are trying to show. If it is that MLE has a higher fraction within the EE than the others, then make a plot showing that more directly. As it is, there are a lot of colored dots, a lot of white space, and no easy way to direct the reader to the panel(s) they should compare to tell whether something is better or worse.
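For instance, a single bar chart of the fraction of matchups within the EE per product would make the comparison immediate (a sketch; the product names and numbers are placeholders, not the paper's results):

```python
import matplotlib.pyplot as plt

# Placeholder fractions within EE; in practice these come from the matchup statistics
products = ["GOCI", "AHI", "F1", "F2", "FM1", "FM2"]   # hypothetical selection
frac_in_ee = [0.55, 0.52, 0.60, 0.62, 0.68, 0.66]      # hypothetical numbers

plt.bar(products, frac_in_ee)
plt.axhline(2.0 / 3.0, color="k", linestyle="--", label="~2/3 reference level (assumption)")
plt.ylabel("Fraction of matchups within EE")
plt.legend()
plt.show()
```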
Figures 11, 12: why is Figure 11 shown as dots with horizontal envelope lines while Figure 12 is dots with vertical bars? I understand that the lines and bars convey the same information, but it would be better to present the two figures consistently (especially as the Figure 12 caption says "as Figure 11"). I personally find the style of Figure 12 easier to read and to extract the main message from (the bias-corrected merged products are flatter, with smaller errors). However, it would be good to pick different colors for the top panels (raw data) than for the other rows. The eye naturally compares the same color across rows, but that is not relevant here because the colors of the satellite products in the top row do not correspond directly to the merges in the other rows. It makes sense to match colors between rows two and three, because F1-F3 are equivalent to FM1-FM3, just not with the top row.
Section 6: if I understand correctly, the authors recommend MLE over the simple mean as the merging technique. This should probably be stated in the Abstract.
Section 6: in the Conclusions I would also appreciate some simpler, higher-level statements rather than a repetition of the numbers about EE, etc. mentioned a few pages earlier in the paper (there is no need to state them twice). For example, what do the differences among the simple-mean fusion experiments F1-F4 tell us, and what do the differences among the individual MLE experiments FM1-FM3 tell us? The fact that FM1 is better than FM2 but F1 is worse than F2 is interesting. Comparing F1 with F2 (or FM1 with FM2) should tell us something about the quality of the NRT vs. non-NRT data, so it is interesting that opposite results are obtained with the simple mean and MLE approaches.