Articles | Volume 18, issue 20
https://doi.org/10.5194/amt-18-5637-2025
© Author(s) 2025. This work is distributed under the Creative Commons Attribution 4.0 License.
Cloud fraction estimation using random forest classifier on sky images
Download
- Final revised paper (published on 21 Oct 2025)
- Supplement to the final revised paper
- Preprint (discussion started on 19 May 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2024-3364', Anonymous Referee #2, 06 Jun 2025
  - AC1: 'Reply on RC1', Sougat Kumar Sarangi, 31 Jul 2025
- RC2: 'Comment on egusphere-2024-3364', Anonymous Referee #1, 16 Jun 2025
  - AC2: 'Reply on RC2', Sougat Kumar Sarangi, 31 Jul 2025
Peer review completion
AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Sougat Kumar Sarangi on behalf of the Authors (09 Aug 2025)
ED: Referee Nomination & Report Request started (19 Aug 2025) by Alyn Lambert
RR by Anonymous Referee #2 (23 Aug 2025)
RR by Anonymous Referee #1 (05 Sep 2025)
ED: Publish as is (05 Sep 2025) by Alyn Lambert
AR by Sougat Kumar Sarangi on behalf of the Authors (05 Sep 2025)
General comments
================
The paper presents cloud fraction estimates obtained with a random forest (RF) classifier applied to sky images. The authors compare the model predictions against semantically annotated sky images from multiple observation sites around the world. Overall, the paper is well structured and clearly explained, which makes it easy for the reader to follow. The authors highlight the limitations of the study: (1) the white-balance / colour calibration of datasets acquired with different sensors; (2) the air pollution levels that prevent more accurate predictions for the India and Australia data; (3) artifacts in the images (sun glare, cirrus) that make it difficult to extract information on sky properties. Despite these limitations, they demonstrate that the RF classifier outperforms traditional methods at high-pollution sites. Minor corrections are needed to improve readability, structure and syntax.
Specific comments
=================
Introduction:
The introduction lacks a paragraph outlining the structure of the remaining sections of the paper, for example: "The paper is structured as follows. In Section 2 we present..."
Section 2
Consider merging Section 2 "Observing sites" and Section 3 "Data Generation and Preprocessing" into a single section called "Data", which would give the reader a clearer structure:
2. Data
2.1 Observing sites and datasets
2.2 Preprocessing
2.3 Selection
2.4 Ground truth masks
Section 3
Add a two-panel figure showing a raw image next to its preprocessed version, to illustrate where the dead zones are located in the image and what the circular mask does.
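To make concrete what such a figure would illustrate, here is a minimal sketch of a circular mask applied to an image. It is not the authors' preprocessing code: the function names, the image representation (rows of RGB tuples), the default centre and the black fill colour are all assumptions made for illustration.

```python
def circular_mask(height, width, radius, center=None):
    """Return a height x width grid of booleans, True inside the circle.

    Assumption: the circle is centred on the image unless `center`
    (row, column) is given; a real imager may be off-centre.
    """
    if center is None:
        center = ((height - 1) / 2.0, (width - 1) / 2.0)
    cy, cx = center
    r2 = radius * radius
    return [
        [(y - cy) ** 2 + (x - cx) ** 2 <= r2 for x in range(width)]
        for y in range(height)
    ]


def apply_mask(image, mask, fill=(0, 0, 0)):
    """Replace every pixel outside the mask with `fill`.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Usage: blank out the corners of a tiny 5 x 5 test image.
image = [[(10, 20, 30)] * 5 for _ in range(5)]
masked = apply_mask(image, circular_mask(5, 5, radius=2))
```

Pixels outside the circle are overwritten rather than cropped, so the masked image keeps the original dimensions and a side-by-side comparison with the raw frame stays aligned.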
Section 4
Figure 1 must be cited explicitly in the text and lacks a caption sentence.
Section 5
Do you have any idea or hypothesis as to why the model performs worse in Canada and Australia? You gave some explanations for the Indian data, but not for these. Why does the German dataset have the best results?
Section 5.1
The first paragraph should be moved earlier in the paper, since it explains what the cloud fraction is and what the goal of the model is; for example, to the beginning of Section 4, or even earlier.
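For readers unfamiliar with the quantity, cloud fraction is commonly computed as the share of valid sky pixels that are classified as cloud. The sketch below shows that calculation under stated assumptions: the label convention (1 = cloud, 0 = clear sky), the boolean validity mask and the function name are hypothetical and not taken from the manuscript.

```python
def cloud_fraction(labels, valid_mask):
    """Fraction of valid sky pixels labelled as cloud.

    labels     : rows of integers, assumed 1 = cloud and 0 = clear sky
    valid_mask : rows of booleans, True where the pixel shows sky
                 (False for dead zones, the shadow band, and so on)
    """
    cloudy = 0
    valid = 0
    for label_row, mask_row in zip(labels, valid_mask):
        for label, keep in zip(label_row, mask_row):
            if keep:
                valid += 1
                if label == 1:
                    cloudy += 1
    if valid == 0:
        raise ValueError("no valid sky pixels in the mask")
    return cloudy / valid


# Usage: three valid pixels, two of them cloudy.
labels = [[1, 0], [1, 1]]
valid = [[True, True], [True, False]]
cf = cloud_fraction(labels, valid)
```

Excluding masked pixels from the denominator keeps dead zones from biasing the estimate towards clear sky.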
Section 5.2
- First paragraph: is accurate white-balance or colour calibration required, then? Is there a practical solution to this sensor uniformization issue?
- Add a figure or subfigure showing cases where the predictions are really accurate, instead of only showing the shortcomings and outliers.
Technical corrections
=====================
Abstract:
- "vary" is used too often in three consecutive sentences; use a synonym
- use "efficiency" instead of "efficacy"
Introduction:
- line 27-28: instead of "we need", use "the scientific community requires specific devices"
- line 31: "capture data at high temporal resolutions"
- line 36: problem with "Many researchers have adopted this (Chauvin et al., 2015; Chow et al., 2011; Ghonima et al., 2012; Kuhn et al., 2018);(Lothon et al., 2019)", which is followed by a comma and then the sentence ends. Something is missing. Replace with:
The clear sky (CSL) threshold method, as outlined by Shields et al. (2009), uses spectral information—particularly from the red and blue bands—to differentiate between cloudy and clear-sky conditions. This technique has been widely adopted by researchers (e.g., Chauvin et al., 2015; Chow et al., 2011; Ghonima et al., 2012; Kuhn et al., 2018; Lothon et al., 2019). However, a notable limitation is that the threshold value can vary across an image, influenced by the relative distance between the sun and each image pixel.
- line 40: use citet or citealt, not citep, for in-sentence citations.
- line 46 and so on: same citation issue
- line 57: "distinguish clouds from clear sky"
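The red/blue threshold idea in the replacement text suggested for line 36 can be sketched as follows. This is a simplified illustration, not the method of Shields et al. (2009) or of the manuscript: the function names are hypothetical, the threshold of 0.6 is a placeholder, and a single fixed threshold is used even though, as noted above, the appropriate value can vary across an image.

```python
def red_blue_ratio(rgb):
    """Red-to-blue ratio of one (R, G, B) pixel.

    The blue value is floored at 1 to avoid division by zero on
    very dark pixels.
    """
    r, _, b = rgb
    return r / max(b, 1)


def is_cloud(rgb, threshold=0.6):
    """Classify a pixel as cloud when its red/blue ratio exceeds `threshold`.

    Assumption: 0.6 is an illustrative value only. Clear sky scatters
    more blue than red, so it tends to have a low ratio, while clouds
    are closer to grey or white and have a ratio nearer to 1.
    """
    return red_blue_ratio(rgb) > threshold


# Usage: a blue-sky pixel and a grey cloud pixel.
sky_pixel = (60, 100, 200)
cloud_pixel = (200, 200, 210)
flags = [is_cloud(sky_pixel), is_cloud(cloud_pixel)]
```

A fixed threshold of this kind is the limitation the suggested text points out: near the sun the ratio rises even under clear sky, which can lead to false cloud detections.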
Section 2
- line 78: rephrase "All these sites have the common make of instrument - a Total Sky Imager (TSI) that takes the sky images."
- line 81: syntax and repetition in "A shadow band is also placed that continuously rotates as it tracks the sun. This shadow band blocks the intense direct sun that can saturate the images."
Section 3.1
- line 108: "this would be essential", not "instrumental"
- line 110: validation of the models training.
Section 3.2
- MATLAB image labeller app: add a footnote link or a citation
Section 4
- lines 130-131: replace "no." with "the number of"
- line 131: Mfeat is not explained
- line 137: rephrase, for example:
"The primary limitation of Random Forest (RF) models lies in their interpretability. Because RF relies on an ensemble of numerous decision trees to generate predictions, it can be challenging to trace and explain the rationale behind a specific prediction."
Section 5
- line 161: no comma before "were used to train a random forest classifier"
- line 162: no need for "(n_estimators=100)" and "(random_state=42)"
- line 165: call it the "test" or "validation" dataset?
- line 170 : "Table 1 shows the metrics for each dataset location"
- line 175 : "Overall, ..."
Section 5.1
- Fig. 2: the last two sentences should be moved into the main text after the reference to the figure, or removed, to avoid repetition in the paper.
- Fig. 2: to improve clarity, use a different symbol for each model, e.g. dots, crosses and triangles. Note that the plots of Fig. 2 do not appear in colour in the preprint version.
- Fig. 3, line 205: "first column are" is grammatically incorrect
- Fig. 3: "nicely" should not be used in the text; use a more objective synonym
- line 215: do you have a citation for a study of this phenomenon?
- line 224: use "the horizontal axis" instead of "x axis", and likewise for the vertical axis; write the months January to December in full
- line 229: write the "Germany and Canada" datasets, to be precise
- Fig. 4: rephrase the caption, for example:
"Figure 4: Median Cloud Fraction (CF) heatmaps for four regions—Australia, Germany, Canada, and India—comparing CF estimates from TSI data, RF classifier output, and their percentage difference. The horizontal axis denotes the months (January to December), and the vertical axis indicates the local time of day (06:00–18:00). Distinct regional patterns emerge: TSI tends to overestimate CF in Australia (January–June) and in Germany and Canada, while underestimating CF in India."
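The month-by-hour median heatmaps described in the suggested Figure 4 caption could be assembled along the following lines. This is a sketch under assumptions: the record layout `(month, hour, cf)`, the function names, and the reading of "percentage difference" as the TSI median minus the RF median (with CF expressed in percent) are all hypothetical, since the review does not specify how the manuscript computes it.

```python
from collections import defaultdict
from statistics import median


def median_cf_by_month_hour(records):
    """Group CF values into (month, hour) bins and take the median of each.

    records : iterable of (month, hour, cf) tuples, with cf in percent.
    Returns a dict mapping (month, hour) to the median CF of that bin.
    """
    bins = defaultdict(list)
    for month, hour, cf in records:
        bins[(month, hour)].append(cf)
    return {key: median(values) for key, values in bins.items()}


def median_difference(tsi_medians, rf_medians):
    """TSI median minus RF median for every bin present in both inputs.

    Assumption: a positive value means TSI reports a higher CF than RF.
    """
    shared = tsi_medians.keys() & rf_medians.keys()
    return {key: tsi_medians[key] - rf_medians[key] for key in shared}


# Usage with a few made-up values (not data from the paper).
tsi = [(1, 6, 40.0), (1, 6, 60.0), (1, 6, 50.0), (2, 12, 10.0)]
rf = [(1, 6, 30.0), (1, 6, 50.0), (2, 12, 20.0)]
diff = median_difference(median_cf_by_month_hour(tsi),
                         median_cf_by_month_hour(rf))
```

Each key of the resulting dictionaries corresponds to one heatmap cell, with the month on the horizontal axis and the local hour on the vertical axis.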
Conclusion
- line 274: use "it has numerous...", not "the data", since the sentence is about the CF