I reviewed a previous version of this manuscript. In this version the authors have made significant rewrites to better describe their methods and results, and I appreciate these efforts. I now better understand the analysis methods. Unfortunately, this better understanding confirms some (in my view) shortcomings of the approach, particularly in the way uncertainty is handled, which gives an overly optimistic view of how well the method would work (and is not really discussed). I therefore recommend further revisions. I would be happy to review the next version if the Editor would like.
1. Line 5: I would add here that the method requires prior assumptions about effective radius, single scattering albedo, and asymmetry factor. These are non-trivial constraints and the current title and abstract imply that just spectral AOD is needed, which is misleading. I would even consider adding “and optical properties” or something in the title.
2. Line 11: “In the total retrieval uncertainty, the forward model contributes less than 10%, confirming its robustness.” I am still uneasy about this statement because it depends strongly on the relative uncertainty of AOD (which depends on the absolute AOD) and on the strengths of the prior and model parameter constraints. I suggest deleting it.
3. Line 81: “compositions” here should be singular “composition”.
4. Section 2: I think there should be a section for MERRA2 here, since this is an input to the retrieval. Part of my confusion with the previous version of the manuscript concerned the role of MERRA2; having it up front in this Data section would make things clearer and the method more reproducible. Additionally, as there are so many file types and variable names, the specific file types and variables used should be written out (i.e. the five components, how they are converted from optical depths to number concentrations, and then the simple fraction definition).
5. Line 105: at which wavelength(s) are SSA and AF used? From Table 1 I think just 440 nm; this should be specified here as well.
6. Line 138: I think this sentence as written is too optimistic, and I would say that these are “sometimes” available from “ground-based” remote sensing observations. They are rarely available robustly from satellite retrievals, especially for the bulk of scenes where AOD is low. From ground-based remote sensing like AERONET, they are also much more limited than direct-Sun data due to a less frequent measurement cadence and various other scene requirements (e.g. azimuthal symmetry, which limits applicability in cases of plumes where the aerosol is spatially heterogeneous).
7. Line 191: I still don’t see why using MOPSMAP directly should result in worse results than the NN emulator. To me it suggests a code bug or an issue with e.g. minimization settings (maybe something numerical which becomes less of an issue in the NN normalization process). The underlying reality is the same and if the code is written well then it should not care whether it is being fed MOPSMAP or an emulator. So I am still suspicious that the NN emulator is needed at all, though it is of course useful as a speedup.
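To illustrate the point, here is a minimal toy sketch (my own construction, not the authors' code; the forward model and all numbers are invented) showing that a correctly implemented Gauss-Newton retrieval converges to the same state whether it is fed the "direct" model or a mathematically equivalent emulator that normalizes its inputs internally:

```python
import numpy as np

def forward_direct(x):
    # toy nonlinear forward model standing in for MOPSMAP (purely illustrative)
    return np.array([np.exp(-x[0]) + x[1] ** 2, x[0] * x[1]])

MU, SIG = np.array([1.0, 2.0]), np.array([0.5, 0.5])

def forward_emulator(x):
    # same physics evaluated through a normalize/denormalize wrapper,
    # mimicking the input scaling of a (perfect) NN emulator
    z = (x - MU) / SIG           # normalize
    return forward_direct(z * SIG + MU)  # denormalize and evaluate

def gauss_newton(fwd, y_obs, x0, n_iter=20, eps=1e-6):
    x = x0.astype(float)
    for _ in range(n_iter):
        y = fwd(x)
        # finite-difference Jacobian, one column per state element
        J = np.column_stack([(fwd(x + eps * e) - y) / eps for e in np.eye(x.size)])
        x = x + np.linalg.lstsq(J, y_obs - y, rcond=None)[0]
    return x

x_true = np.array([0.8, 1.5])
y_obs = forward_direct(x_true)
x_a = gauss_newton(forward_direct, y_obs, np.array([0.5, 1.0]))
x_b = gauss_newton(forward_emulator, y_obs, np.array([0.5, 1.0]))
# both retrievals should recover the same state
```

If the real code disagrees between the two paths, that points to the minimizer settings, gradients, or scaling rather than to any intrinsic advantage of the emulator.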
8. Equation 2: I would move lines 200-202 here instead of where they currently are. I was initially confused because Table 1 gives not just AOD but also Reff, SSA, and AF as outputs, whereas in the sense of this section they are inputs. But then line 200 explains where the 9 inputs come from and what the outputs are.
9. Line 205: the citation for Adam optimization should be given. Yes, it’s common, but the ReLU citation is given and that is also common...
10. Equation 12 and sections 3.5 and 4.3: it looks like uncertainties in the model parameters (theta, i.e. SSA, AF, Reff, RH) are not accounted for in the cost function or in Sy. This essentially assumes that the constraint on these (from AERONET or MERRA2) is perfect, which is obviously unrealistic. For example, even in good conditions the AERONET SSA uncertainty is about 0.03. Given that most of the time SSA varies from about 0.8 to 1, this is not a small fraction of the range. Looking at e.g. the early Dubovik 2002 aerosol type paper (https://doi.org/10.1175/1520-0469(2002)059<0590:VOAAOP>2.0.CO;2) one can see that an envelope of +/- 0.03 can encompass e.g. smoke, dust, urban, and marine aerosol, depending on where it falls. The asymmetry factor also shows little distinction between types at 440 nm, so it may have limited value. This means that the error budget is incomplete and, as a result, analyses based on the propagated uncertainties will be overconfident (the results seem better than they really are). This should at minimum be clearly acknowledged in the manuscript; otherwise it is misleading about the fidelity of the results. Ideally, these uncertainties should be accounted for. One method to do so would be to perturb the four parameters in theta to simulate realistic errors (which might be different from MERRA2) and propagate the perturbations through the MOPSMAP emulator to see how they affect the calculated AOD, incorporating the resulting spread into Sy. Incorporating model parameter error like this into Sy is common within the OEM. Alternatively, this could be added as an additional uncertainty term within Equation 16.
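To make the perturbation suggestion concrete, here is a minimal sketch of the Monte Carlo route into Sy. The emulator below is a toy stand-in for the MOPSMAP NN, and all coefficients and 1-sigma values are illustrative assumptions (apart from the ~0.03 SSA figure), not numbers from the manuscript:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_emulator(theta):
    # toy stand-in mapping model parameters (SSA, AF, Reff, RH) to spectral
    # AOD at four wavelengths; coefficients are invented for illustration
    ssa, af, reff, rh = theta
    wl = np.array([0.44, 0.67, 0.87, 1.02])  # microns
    return 0.3 * (0.44 / wl) ** (1.2 * (1 - ssa) + 0.5 * af) * (1 + 0.2 * rh) * reff

theta0 = np.array([0.90, 0.68, 0.35, 0.5])       # nominal SSA, AF, Reff (um), RH
sigma_theta = np.array([0.03, 0.02, 0.05, 0.1])  # assumed 1-sigma parameter errors

# Monte Carlo: perturb theta and collect the induced spread in modeled AOD
y_samples = np.array([
    toy_emulator(theta0 + rng.normal(0, sigma_theta)) for _ in range(5000)
])
S_theta = np.cov(y_samples, rowvar=False)  # parameter-error covariance in y-space

# add to the measurement-error covariance (diagonal AOD uncertainty here)
S_y = np.diag(np.full(4, 0.02 ** 2)) + S_theta
```

The inflated Sy would then enter Equation 12 in place of the measurement-only covariance, and the resulting posterior uncertainties would reflect the imperfect theta constraint.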
11. Line 248: what is the justification for these prior uncertainties? It seems arbitrary. If the numbers are changed then the uncertainties and averaging kernel will change as well. If the end use is to take e.g. MERRA2 component fractions as input, then it should be determined by analysis of MERRA2 component fraction uncertainty (which I would imagine is a function of location, among other things).
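As a sketch of what I mean (the numbers below are synthetic stand-ins for actual MERRA2 fields), a climatological prior mean and covariance could be estimated directly from a gridbox time series of component fractions rather than asserted:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic stand-in for a daily time series of the five MERRA2 component
# AOD fractions (dust, sea salt, sulfate, OC, BC); real data would replace this
n_days, n_comp = 365, 5
raw = rng.gamma(shape=[5.0, 3.0, 4.0, 2.0, 1.0], scale=1.0, size=(n_days, n_comp))
fractions = raw / raw.sum(axis=1, keepdims=True)  # rows sum to 1

x_a = fractions.mean(axis=0)           # location-specific prior mean
S_a = np.cov(fractions, rowvar=False)  # climatological prior covariance

# note: because the fractions sum to 1, each row of S_a sums to ~0 (the
# covariance is rank-deficient), which itself argues for a data-driven,
# correlated prior rather than a single hand-picked diagonal value
```

A prior built this way would also vary by location, consistent with my expectation that MERRA2 component fraction uncertainty is a function of location, among other things.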
12. Line 252: Note that the AERONET team reports a direct-Sun AOD uncertainty of 0.02 at 440 nm and shorter wavelengths; the 0.01 used here applies to the longer visible wavelengths. So this should be updated.
13. Section 4.5: if I understand correctly, in this section Reff, SSA, AF, and RH are switched around to be retrieved rather than assumed. But none of these results are shown, and this is inconsistent with the rest of the retrieval development and analysis in the paper. It is also not clear why the authors chose this instead of just using auxiliary inputs as they did elsewhere in the paper. Line 381 honestly reads like an excuse, as if the authors did not want to download the MERRA2 data needed for the case study. I also previously raised concerns about the reasonableness of using monthly data for this purpose (it ignores sampling issues and real sub-monthly variability). In my view a simple monthly plot and then showing dust AOD is not convincing enough, so I do not think this section as presented is very useful in the context of the paper. My recommendation would be to keep the same retrieval methodology (i.e. auxiliary Reff, SSA, AF, and RH) as elsewhere in the paper, and to do the analysis using daily instead of monthly data. The data could then be compared on a daily basis with the GEOS fields (sampled around the early-afternoon satellite overpass time), and a commonly-sampled dust AOD (and AOD for the other aerosol types) could also be presented. This would be a much more direct and meaningful demonstration of the method.
14. Appendix A: is this only supposed to be one paragraph? I am not sure why it is an Appendix, and, as mentioned earlier, I think it would be better as a subsection of Section 2.
This study presents a novel hybrid machine learning (ML) and physics-based framework for retrieving aerosol composition from multi-wavelength Aerosol Optical Depth (AOD) observations. The approach effectively bridges ML and physical modeling, offering a scalable solution with significant scientific and environmental applications. The research is well-conducted, and the results appear robust. I recommend acceptance after minor revisions addressing the following points:
1) The paper should clarify whether Single Scattering Albedo (SSA), asymmetry parameter (ASY), and relative humidity (RH) are necessary for the retrieval process. If these parameters are required, please briefly discuss how they can be obtained (e.g., from ancillary datasets, reanalysis products, or simultaneous measurements).
2) The manuscript claims that AOD in infrared (IR) wavelengths provides additional information on aerosol composition. Please elaborate on this point—for example, by explaining how IR absorption features are linked to specific aerosol types (e.g., dust, organic carbon) or how they complement visible/UV observations.
3) The text refers to MOPSMAP as a "radiative transfer model," but it appears to be a bulk aerosol optical property calculator based on size distribution and refractive index inputs. Please correct this terminology. Additionally, the study relies solely on Mie scattering, neglecting non-spherical scattering methods (e.g., T-matrix for dust). Since dust aerosols are often non-spherical, this simplification may introduce errors. A brief discussion of this limitation and its potential impact should be included.
4) The Optimal Estimation Method (OEM) requires prior information and its associated covariance matrix. The manuscript should clarify (a) whether prior estimates are sourced from MERRA-2 or other datasets, and (b) how the covariance matrix of the prior is defined (e.g., based on climatological variability, instrument uncertainty, or empirical assumptions).
These revisions would strengthen the manuscript’s clarity and methodological rigor.