Volume 17, issue 9
https://doi.org/10.5194/amt-17-2625-2024
© Author(s) 2024. This work is distributed under the Creative Commons Attribution 4.0 License.

U-Plume: automated algorithm for plume detection and source quantification by satellite point-source imagers
Download
- Final revised paper (published on 06 May 2024)
- Preprint (discussion started on 25 Aug 2023)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2023-1343', Anonymous Referee #1, 08 Sep 2023
  - AC2: 'Reply on RC1', Jack Bruno, 20 Feb 2024
- RC2: 'Comment on egusphere-2023-1343', Anonymous Referee #2, 23 Jan 2024
  - AC3: 'Reply on RC2', Jack Bruno, 20 Feb 2024
- RC3: 'Comment on egusphere-2023-1343', Anonymous Referee #3, 24 Jan 2024
  - AC1: 'Reply on RC3', Jack Bruno, 20 Feb 2024
Peer review completion
AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Jack Bruno on behalf of the Authors (20 Feb 2024)
  - Author's response
  - Author's tracked changes
  - Manuscript
ED: Referee Nomination & Report Request started (26 Feb 2024) by Natalya Kramarova
RR by Anonymous Referee #2 (13 Mar 2024)

RR by Anonymous Referee #3 (14 Mar 2024)
RR by Anonymous Referee #1 (18 Mar 2024)

ED: Publish subject to technical corrections (19 Mar 2024) by Natalya Kramarova

AR by Jack Bruno on behalf of the Authors (19 Mar 2024)
  - Manuscript
The manuscript presents a novel method to infer CH4 point-source rates, using the U-Net architecture for image segmentation followed by either a convolutional neural network (CNN) or integrated mass enhancement (IME) to estimate the point-source rate. The authors find the approach to be successful across a range of source rates and background noise levels and propose a general functional relationship for point-source observability based on source rate, wind speed, pixel size, and background noise. However, the manuscript lacks some important details about the methodology, specifically data normalization, model training, and model evaluation, which inhibits assessment and reproducibility of the work. Additionally, some claims are broader than the presented results support; these claims should either be rephrased with more balanced language, or further work should be conducted to better substantiate them. Detailed comments are provided below.
1. A total of 6870 scenes were used for training. For ML, this is considered a very small data set, and small data sets often lead to shortcomings in the trained models when compared to identical models trained on larger data sets drawn from the same distribution. How is model performance affected when the data set size is adjusted by a factor of 2 in each direction (i.e., halved or doubled)?
2. It is stated that 90% of the images were used for training and 10% for testing. In ML applications, a validation set is used to monitor for overfitting during training. Was a validation set used during training? If so, please state that along with associated information (e.g., what fraction of the data set was used for validation, what type of cross validation was used, and any early stopping criteria contingent on the validation loss). If not, please state that and justify why it was not used.
If instead what the authors refer to as the testing set is actually the validation set, please correct the language accordingly, and please introduce a testing set to statistically evaluate the generalization of the model beyond the training and validation data.
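For reference, the kind of three-way split being asked about could be reported in a form as simple as the following sketch (the 80/10/10 fractions and the NumPy-based indexing are illustrative assumptions, not the authors' setup):

```python
import numpy as np

# Illustrative train/validation/test split of the 6870 synthetic scenes;
# the 80/10/10 fractions are assumptions, not the authors' actual choice.
rng = np.random.default_rng(seed=0)
indices = rng.permutation(6870)

n_train, n_val = int(0.8 * 6870), int(0.1 * 6870)
train_idx = indices[:n_train]                # fits the model weights
val_idx = indices[n_train:n_train + n_val]   # monitors overfitting / early stopping
test_idx = indices[n_train + n_val:]         # held out for final evaluation only
```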
3. Related to the above, what loss functions are minimized over which data set (training vs. validation), and what learning rate policies and stopping criteria (if any) were used when training the ML models? How many epochs were used to train each model?
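For concreteness, the requested training details amount to roughly the information captured in a sketch like the one below (PyTorch, with a placeholder model, dummy data, and arbitrary hyperparameters; this is not the configuration used in the manuscript):

```python
import torch
from torch import nn

# Placeholder model and data; only the structure of the training details matters here.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)
loss_fn = nn.MSELoss()

x_train, y_train = torch.randn(64, 1, 32, 32), torch.randn(64, 1)  # dummy data
x_val, y_val = torch.randn(16, 1, 32, 32), torch.randn(16, 1)

best_val, patience, wait = float("inf"), 10, 0
for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)   # training loss minimized over the training set
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()  # monitored on the validation set
    scheduler.step(val_loss)                  # learning-rate policy: reduce LR on plateau
    if val_loss < best_val - 1e-6:
        best_val, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:                  # early stopping criterion
            break
```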
4. Please add details on data normalization for the inputs/outputs of the U-Net and CNN models used in this investigation. This information is necessary to include in the manuscript as it is crucial for both assessment and reproducibility.
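As an example of the kind of statement being requested, one common (here purely hypothetical) normalization choice would be per-scene standardization of the enhancement images and log-scaling of the source rates:

```python
import numpy as np

# One common normalization choice, shown only to illustrate what the manuscript
# should specify; the scheme actually used by the authors is unknown.
def normalize_scene(xch4, rate_kg_h):
    """Standardize a methane-enhancement image and log-scale the source rate."""
    x = (xch4 - xch4.mean()) / (xch4.std() + 1e-8)   # per-scene zero mean, unit variance
    y = np.log10(rate_kg_h)                          # compress the wide range of source rates
    return x, y

scene = np.random.rand(64, 64) * 50.0   # dummy XCH4 enhancement field
x, y = normalize_scene(scene, rate_kg_h=1500.0)
```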
5. Please explicitly define the neural network architectures used in this work. The U-Net architecture is clearly defined in the source reference, so it is only necessary to state any deviations from that architecture. The CNN architecture is poorly described here, with no details on the number of layers, the number of convolutional feature maps, kernel sizes, pooling sizes, the number of nodes in the fully connected layers, or the activation functions. This information must be included in the manuscript for completeness.
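To illustrate the level of detail expected, a complete specification is roughly equivalent to the following sketch (the architecture shown is hypothetical, not the one used in the manuscript):

```python
import torch
from torch import nn

# Hypothetical CNN regressor, included only to illustrate the requested level of
# architectural detail (feature maps, kernel/pool sizes, FC widths, activations).
class SourceRateCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, 1),          # predicted (normalized) source rate
        )

    def forward(self, x):
        return self.regressor(self.features(x))

out = SourceRateCNN()(torch.randn(4, 1, 64, 64))   # expects 64x64 single-channel scenes
```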
6. Related to the above, how was the CNN architecture determined for this problem? Simple grid search, Bayesian optimization, or something else? Please add details about this in the manuscript.
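Even a simple grid search, sketched below with a hypothetical evaluate() placeholder, would be sufficient to report:

```python
from itertools import product

# Minimal grid-search sketch over architecture hyperparameters; evaluate() is a
# hypothetical placeholder that would train a candidate CNN and return its
# validation loss.
def evaluate(n_filters, kernel_size, n_fc):
    return 0.0  # placeholder

grid = product([16, 32, 64],      # first-layer feature maps
               [3, 5],            # convolution kernel size
               [64, 128, 256])    # fully connected layer width
best = min(grid, key=lambda cfg: evaluate(*cfg))
print(best)
```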
7. Lines 157-158: Which specific Intel Core i7 CPU was used for this quoted benchmark? The clock speed of i7 CPUs spans a factor of ~4 depending on architecture and TDP, ranging from under 1.1 GHz to over 4 GHz, not to mention other variables such as cache sizes. Additionally, please specify whether file I/O was included in this benchmark, and whether the data were all loaded into RAM at once or batches were loaded on the fly, as there may be significant performance differences between these scenarios.
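A benchmark report of this kind would ideally separate the contributions explicitly, e.g. (load_scene() and predict() are hypothetical placeholders, and the file name is illustrative):

```python
import platform
import time

# Sketch of an unambiguous benchmark report: record the exact CPU and time the
# inference step separately from file I/O.
def load_scene(path):
    return None        # placeholder: read one scene from disk

def predict(scene):
    return None        # placeholder: run segmentation + source-rate model on one scene

print("CPU:", platform.processor() or platform.machine())

t0 = time.perf_counter()
scene = load_scene("scene_0001.nc")          # hypothetical file name
t_io = time.perf_counter() - t0

t0 = time.perf_counter()
predict(scene)
t_infer = time.perf_counter() - t0
print(f"I/O: {t_io * 1e3:.1f} ms, inference: {t_infer * 1e3:.1f} ms")
```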
8. This is more a general comment on the testing of the presented models. The test set solely consists of synthetic images produced in the same manner as the training data; that is, the test set does not include any real images where the predicted source rate could be compared with an existing data product. For completeness, the authors should perform some comparisons using real GHGSat-C1 scenes that contain an obvious CH4 plume whose source rate has been estimated by one or more existing methods, and statistically summarize the differences between U-Plume and those methods. Ideally, many such cases should be included, if feasible, to better illustrate any trends in biases/deviations between this U-Plume approach and more traditional methods.
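The requested statistical summary could be as simple as paired bias/RMSE statistics over the matched scenes, e.g. (the values below are made-up placeholders):

```python
import numpy as np

# Paired source-rate estimates from U-Plume and an existing method for the same
# real scenes; the numbers are illustrative placeholders, not actual results.
uplume = np.array([1200.0, 850.0, 2400.0])      # kg/h, hypothetical
reference = np.array([1100.0, 900.0, 2600.0])   # kg/h, hypothetical

diff = uplume - reference
print(f"mean bias: {diff.mean():.0f} kg/h")
print(f"RMSE: {np.sqrt((diff ** 2).mean()):.0f} kg/h")
print(f"mean relative difference: {100 * (diff / reference).mean():.1f} %")
```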
9. Lines 230-235: There are contradictory statements. deltaB < 5% is described as both "high background noise" and "low background noise", but only one of these can be true. Please correct this typo so that readers may understand what the authors consider to be low/high background noise.
10. Lines 239-242: Can you comment more on these false positives? How are they distributed with respect to the variables considered in this study? Were any other approaches considered to address them? Are the estimated source rates from these false positives generally small and could be filtered that way rather than based on number of pixels in the mask?
Applying the 5-pixel mask filter loses 1.6% of the true positive detections; can you comment more on this? Are these generally small source rates at low wind speeds or cases with high background noise, or are they more uniformly distributed throughout the domain of interest?
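For clarity, the two filtering options being contrasted are sketched below with illustrative thresholds and placeholder values:

```python
import numpy as np

# Two filtering options for candidate detections: discard those with small
# segmentation masks, or discard those with small estimated source rates.
# Thresholds and arrays are illustrative placeholders.
mask_pixels = np.array([3, 12, 40, 2, 75])                  # pixels per detected plume mask
est_rate = np.array([150.0, 900.0, 2500.0, 80.0, 4000.0])   # kg/h

keep_by_mask = mask_pixels >= 5      # mask-size filter (as in the manuscript)
keep_by_rate = est_rate >= 200.0     # alternative: filter on estimated source rate
print(keep_by_mask, keep_by_rate)
```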
11. Line 262: It says that "Detection probability is 10% for O_ps = 0.2". In Figure 4, O_ps has a minimum value of 0.5, at which it has a detection probability of 0%. O_ps = 2 looks to be closer to the 10% detection probability mentioned. Please correct this typo.
12. Lines 291-298: It is mentioned that the CNN method is biased towards the mean, that is, it overestimates small sources and underestimates large sources. What is the distribution of source rates in the training set used in this investigation? If that distribution is concentrated around its mean, that could also explain the CNN's reported behavior. If that is the case, the authors should either address this imbalance to improve model performance at the extrema (whether in the data set itself or in the loss function used to train the model) or use more balanced language when discussing this limitation.
This bias towards the mean can also be a consequence of how the CNN was trained (though the manuscript lacks sufficient detail on how these models were trained to determine the likelihood of this being the case; see comments above).
While the CNN is likely to still perform poorly when extrapolating regardless of the data set, the authors have not conclusively ruled out bias in the training data set or the particular training methodology as the reason for the CNN's worse performance relative to the IME method over the domain on which the CNN was trained. Furthermore, it should be mentioned that expanding the training data set down to 100 kg/h source rates would likely enable the CNN to more accurately recover those scenarios.
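One possible way to address such an imbalance in the loss function, offered only as an illustration of the suggestion above, is to upweight samples far from the mean of the target variable:

```python
import torch

# Sketch of a weighted MSE loss that emphasizes rare, extreme source rates;
# the targets, predictions, and weighting scheme are illustrative placeholders.
def weighted_mse(pred, target, weights):
    return (weights * (pred - target) ** 2).mean()

target = torch.tensor([[0.2], [1.5], [4.0]])   # normalized source rates (dummy)
pred = torch.tensor([[0.8], [1.4], [3.1]])
weights = 1.0 + (target - target.mean()).abs() # upweight samples far from the mean
print(weighted_mse(pred, target, weights).item())
```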
13. Lines 380-381: What CPU was used for this quoted benchmark? Please be specific. (See also comment #7 above.)
14. Lines 385-387: It states that "Evaluation with an independent dataset ...", but based on the methodology described earlier in the manuscript, it is misleading to describe the test set as "an independent dataset" given that it is drawn from the same distribution as the training data. It is only independent in the sense that it was not part of the training process, but the manuscript has not conclusively demonstrated that the model generalizes to truly independent data (real measurements). Please use more balanced language here, or perform the tests suggested above in comment #8 to better substantiate this claim.
15. Figure 7: The orange line looks to be biased by the outliers at O_ps ~ 2, as the line is above the vast majority of the data for O_ps < 10. Given that O_ps > 30 is omitted from the fit due to non-linearity (the error bottoms out around 10%, as mentioned), the authors may wish to consider also omitting O_ps < 3 from the fit due to non-linearity.
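Restricting the fit domain is straightforward, e.g. (the data and the log-log linear form below are placeholders, not the actual Figure 7 fit):

```python
import numpy as np

# Fit only within 3 <= O_ps <= 30, excluding both non-linear ends; the data
# points and the fitted functional form are illustrative placeholders.
o_ps = np.array([0.6, 1.2, 2.0, 4.0, 8.0, 15.0, 28.0, 45.0, 90.0])
rel_error = np.array([2.5, 1.8, 1.1, 0.6, 0.35, 0.2, 0.13, 0.1, 0.1])

sel = (o_ps >= 3) & (o_ps <= 30)
slope, intercept = np.polyfit(np.log10(o_ps[sel]), np.log10(rel_error[sel]), deg=1)
print(slope, intercept)
```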
16. Figure 8: In the left and middle panels, some of the plotted lines are covered by the legend. Please relocate the legend in these panels to avoid this behavior. In the left panel, it looks like it could be placed in the center-left, while the middle panel could relocate the legend to the top-left or bottom-right of the plot.
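For reference, relocating the legends is a one-line change per panel in matplotlib (the panels and data below are placeholders):

```python
import matplotlib.pyplot as plt

# Minimal illustration of moving a legend so it does not cover plotted lines.
fig, (ax_left, ax_mid) = plt.subplots(1, 2)
ax_left.plot([0, 1], [0, 1], label="example")
ax_mid.plot([0, 1], [1, 0], label="example")
ax_left.legend(loc="center left")
ax_mid.legend(loc="upper left")   # or loc="lower right"
plt.close(fig)
```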