Marine cloud base height retrieval from MODIS cloud properties using machine learning

Lenhardt, Julien; Quaas, Johannes; Sejdinovic, Dino

doi:https://doi.org/10.5194/amt-17-5655-2024

Articles | Volume 17, issue 18

© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 26 Sep 2024

Final revised paper (published on 26 Sep 2024)
Preprint (discussion started on 07 Feb 2024)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-327', Anonymous Referee #1, 19 Mar 2024
Dear editor, dear authors,

This manuscript presents a novel cloud base height retrieval algorithm based on Aqua MODIS level 2 cloud properties. Cloud base and cloud vertical extent are essential to determine the Earth's energy budget and reducing uncertainties would help to constrain studies of cloud regime dependent processes and to evaluate how clouds are represented in models. Therefore, the presented contribution is very welcome and in general fits the scope of AMT very well. Moreover, the use of machine learning poses a new perspective on satellite-based CBH retrievals which could help to overcome the challenge of determining subadiabaticity of clouds.

The presented approach utilizes a CNN-based autoencoder to reduce the dimensionality of input fields of size 128 km x 128 km which feature cloud top height, cloud optical thickness and cloud water path. In a second step an ordinal regression is performed on the latent feature vector produced by the encoder. To this end, marine CBH observations from surface-based ceilometers, which are available in discrete bins, are utilized as ordinal reference data. The study is limited to ocean only to exclude potential complexity related to topography over land and the authors limit the retrieval to scenes with at least 30% cloud cover. In principle, I agree with the authors' reasoning that spatial patterns of CTH, COT and CWP could be exploited to retrieve CBH. Overall, this could be an approach that further reduces uncertainties for satellite-based CBH retrievals while at the same time providing an interesting example of how to utilize machine learning for performance gains in satellite remote sensing of clouds. However, while I appreciate the compact form of the manuscript, I would like to encourage the authors to address the following major and minor concerns before considering its publication.
Major points:
I) Autoencoder setup:

The autoencoder (AE) role is to reduce the dimensionality of the input fields. The latent feature vector created thereby is then used as input for an ordinal regression onto ground-based reference CBH retrievals which are available in discrete height bins. While it seems promising to exploit the information inherent in the spatial distribution of CTH, COT, CWP using CNNs, I have the following concerns regarding the AE:
The information on how the AE is trained and evaluated is distributed across different parts of the manuscript (i.e. Section 2.4, Appendix B, Appendix C), which makes it difficult to follow the data processing steps clearly. According to Section 2.4, the AE is trained on data from the year 2008. It is not clear which data is used for validation (e.g. in Fig. C.1). Furthermore, the performance on another test data set is not provided specifically for the trained AE which is ultimately used for the CBH retrieval model. This should be included in Section 2 to show the capabilities of the utilized AE. Maybe a figure showing the original and reconstructed CTH, COT, and CWP for one exemplary tile could be helpful to illustrate the process together with panels similar to Fig. B.2 but for training, validation and test data for the actual AE that is used later on.

The selected architecture with the 5 convolutional blocks seems arbitrary. As this study proposes a novel retrieval algorithm, describing the algorithm development (together with validation) should be considered most relevant. Therefore, it is necessary to elucidate decisions, such as architecture, activation functions, loss function, etc., in more detail. The following issues should be addressed:

- Why was this particular architecture chosen (i.e. the 5 Conv. blocks with 3 conv. layers and a max pooling layer each, Table C.1)? Were other combinations tested?

- Why are LeakyReLU and MSE chosen as activation and loss function? Have others been tested? So far, the choices appear arbitrary.

- The validation data should be used to tune the architecture (i.e. number and type of layers, loss function, activation functions, learning rate, etc.) with the goal of minimizing some performance measure (e.g. MSE). The test data should then show whether the final setup generalizes to independent data.

The advantage/benefit of the AE in leveraging the spatial information and its sensitivity to the input tile size has not been shown so far. The AE reduces an array of 3x128x128 to a vector of 256 values. The size was chosen "to give enough spatial information to the AE" (l. 166). But what is enough spatial information and how sensitive is the reconstruction error and more importantly the performance of the ultimate CBH retrieval to this size? The authors argue that similar spatial scales were used in other studies. However, for instance, Mülmenstädt et al. (2018), and Lu et al. (2021) extrapolated the signal from thinner clouds into a larger domain and Böhm et al. (2019) inferred the CBH from the distribution of CTH within a cloud field. While the assumption of a homogeneous CBH within a certain area is the same, the objective to utilize a certain spatial scale differs from this study. Here the question is, how much spatial information is needed to reproduce the input cloud properties.

How about using CTH, COT, CWP fields from 9x9 tiles around the reference CBH to perform the ordinal regression directly without use of an AE? That would have a similar dimensionality (3x9x9=243). Thereby, a potential benefit of the AE could be assessed making a stronger argument for the choice. Together with some statistically based arguments on the tile size, the method development will be more stringent.
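The input to the suggested AE-free baseline could be built roughly as follows. This is only a minimal sketch: the function name `flatten_center_patch` is hypothetical, the tile is random placeholder data rather than MODIS retrievals, and it assumes the reference observation sits at the tile centre, which the manuscript does not state explicitly.

```python
import numpy as np

def flatten_center_patch(tile, size=9):
    """Return the central size x size patch of a (channels, H, W) tile as a 1-D vector."""
    _, height, width = tile.shape
    top = (height - size) // 2
    left = (width - size) // 2
    patch = tile[:, top:top + size, left:left + size]
    return patch.reshape(-1)

# Placeholder tile standing in for the CTH, COT and CWP channels (random values).
rng = np.random.default_rng(0)
tile = rng.random((3, 128, 128))

features = flatten_center_patch(tile)
print(features.shape)  # 3 * 9 * 9 values, comparable to the 256-dimensional latent vector
```

The resulting vector would then be passed to the same ordinal regression as the AE latent features, so that any difference in skill can be attributed to the encoder.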

To exploit the input features most efficiently in problems with sparse reference labels, I would argue, that a self-supervised pretraining with subsequent finetuning could result in better overall performance. That way, the trained AE could actually be geared towards CBH retrieval and not just towards reconstructing the input fields. Fabel et al. (2022, https://doi.org/10.5194/amt-15-797-2022) show an example for cloud layer classification (low/medium/high) from all-sky cameras. While they do not consider the ordinal nature of their classes, there are also some works that find targeted neural network solutions for ordinal target data (e.g. Lazaro and Figueiras-Vidal, 2023, https://doi.org/10.1016/j.patcog.2023.109303). I do not expect the authors to change to such methods, as long as it can be shown that the use of an AE actually adds value and that there is some sensitivity towards the tile size which would imply that the AE actually leverages the spatial structure of the input fields leading to an improved CBH retrieval.

II) Ordinal regression:
The manuscript does not state which parameters are fitted by minimizing the loss. Furthermore, it is unclear which data set (time period, region) is used to train the OR model. Next to the performance on the test data set (Table 2), the performance should also be shown for the training data to see if there is overfitting.

The authors investigate the ability of the AE to generalize on unseen spatial and temporally independent data. While this is interesting, it is further and even more necessary to show the generalizability also for the final CBH retrieval model which includes both encoder and OR. The authors argue that testing the spatial generalizability is challenging due to sparse distribution of collocated retrievals and that "Limiting the training dataset to a selected area would greatly hinder the representativeness notably because the different labels display diverse spatial patterns." (l. 283-284). Would it be possible to train on the northern hemisphere and test on the southern hemisphere and evaluate the performance for each CBH bin separately? That should ensure that the training data includes sufficient samples from all CBH bins and that a potential different CBH distribution in the test data does not yield an apparent performance difference. If more data were required, an option would be to obtain MYD06 data directly from NASA's Level-1 and Atmosphere Archive & Distribution System Distributed Active Archive Center (https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/) which goes beyond the 2 years included in CUMULO.

To obtain a reference, the authors adapt CBH retrieval methods based on an adiabatic cloud profile (adapted from Goren et al., 2018) and based on a statistical relationship between cloud geometrical thickness on one hand and CTH and CWP on the other (adapted from Noh et al., 2017). While they could show that their retrieval can achieve lower errors (Table 2), it remains unclear whether the method outperforms a trivial approach. While the overall bias is close to zero, the method underestimates higher CBHs and overestimates lower CBHs (Fig. 3), which indicates that the algorithm favors CBH values closer to the mean of the reference CBH distribution. Therefore, two things should be investigated to really show the benefit of the proposed method:

- I would argue, that the utilized error metrics are not sufficient to investigate the skill of an algorithm. A model with a high MAE could still have skill if the ordering of the retrieved CBH resembles the ordering of the reference CBHs. While I agree that standard correlation indices are not appropriate for ordinal data, there are some metrics proposed, such as the ordinal classification index (Cardoso and Sousa, 2011, https://doi.org/10.1142/S0218001411009093) which was updated by Silva et al. (2018, https://doi.org/10.1109/IJCNN.2018.8489327) that could help assess the skill and compare different methods.

- How does the algorithm compare against a trivial (always choose the majority bin) and semi-trivial retrieval (always choose the bin for which the MAE or RMSE are minimized when this bin is chosen for all samples)?
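The two reference retrievals named in the last point can be sketched in a few lines. The bin values and the training labels below are toy numbers chosen for illustration, not the ceilometer bins of the manuscript, and the function names are hypothetical.

```python
import numpy as np

def majority_bin(y_train):
    """Trivial retrieval: the most frequent CBH bin in the training labels."""
    values, counts = np.unique(y_train, return_counts=True)
    return values[np.argmax(counts)]

def best_constant_bin(y_train, bins, metric="mae"):
    """Semi-trivial retrieval: the single bin that minimizes MAE or RMSE
    when it is predicted for every sample."""
    y_train = np.asarray(y_train, dtype=float)
    errors = []
    for b in bins:
        diff = y_train - b
        if metric == "mae":
            errors.append(np.mean(np.abs(diff)))
        else:
            errors.append(np.sqrt(np.mean(diff ** 2)))
    return bins[int(np.argmin(errors))]

# Toy CBH bins and labels in metres (illustrative values only).
bins = np.array([100.0, 300.0, 600.0, 1000.0, 1500.0])
y_train = np.array([100, 100, 100, 100, 300, 600, 600, 1000, 1500])

trivial = majority_bin(y_train)
semi_trivial_mae = best_constant_bin(y_train, bins, metric="mae")
semi_trivial_rmse = best_constant_bin(y_train, bins, metric="rmse")
```

With a skewed label distribution the three constants can differ, since the MAE-optimal constant tracks the median and the RMSE-optimal constant tracks the mean. Any retrieval with skill should beat all of them on the same test set.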

III) Global distribution:

The developed CBH retrieval algorithm is applied to assess the global distribution of CBH for the whole year of 2016. However, for this application the authors should use the previously trained and evaluated model, for which error statistics are known, instead of retraining the model with different data.
Minor points:
In the introduction (specifically l. 61ff), the authors should state more clearly what their goal is (i.e. CBH retrieval with reduced uncertainty) instead of saying the "developed ML model aims to draw on the spatial information [...]" (l.62). They should state why they expect their ML approach to be superior, e.g. through a hypothesis that the spatial pattern of a cloud field holds information on the CBH and the potentially non-linear relationship to satellite observations can be exploited through a convolutional neural network.

The structure of Section 2 (Data and methods) should be improved. I suggest:

- Briefly describe the overarching idea of the approach in the introduction:

- building a CBH retrieval using level 2 satellite data (CTH, COT, CWP)

- self supervised training of a CNN applying an autoencoder

- subsequent ordinal regression using ground-based marine CBH retrievals

- Take out the first 2 paragraphs in Section 2 (the text before Section 2.1)

- Instead, put all details on the ground-based and satellite data as well as the methods in the corresponding section (Section 2.1 to 2.5)

- Information from Appendix C should be merged into the main text (i.e. into Section 2.4). It is partly repeating points already mentioned in Section 2.4.

It should be clarified what the reconstruction error actually is. In l. 214 it is mentioned that an l2-norm would be a common choice. It should be clarified that this is also applied here. Next to Table C.2, which states that an MSE is used as loss function, terms such as "reconstruction error ratio" (Fig. B.1), "reconstruction relative error" (Table B.2) and "reconstruction error" (Fig. B.2) are used. Providing an equation to define the reconstruction error as a variable would avoid confusion.
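One possible form of such a definition is given below as a sketch. The symbols are introduced here for illustration and are not taken from the manuscript: x is an input tile with C = 3 channels and H = W = 128 pixels, and \hat{x} is the AE output. The "ratio" and "relative" variants used in Appendix B are deliberately left undefined, since their definitions cannot be inferred from the text.

```latex
% Reconstruction error as a mean squared error over all channels and pixels.
% It equals the squared l2-norm of the residual divided by the number of elements.
\[
  \mathcal{L}_{\mathrm{rec}}(x, \hat{x})
  = \frac{1}{C H W} \sum_{c=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W}
    \left( x_{c,i,j} - \hat{x}_{c,i,j} \right)^{2}
  = \frac{\lVert x - \hat{x} \rVert_{2}^{2}}{C H W}
\]
```

If x is normalized per channel before training, this quantity is dimensionless. Otherwise it mixes the squared units of CTH, COT and CWP.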

"the binning process can lead to an underestimation of the actual CBH" (l. 113) - Has this been shown? Would not an overestimation also be possible?

Use of the term "swath": The term "swath" refers to the width of the instrument's view of the Earth surface (e.g. MODIS swath is 2330 km). I think the authors should replace their usage of "swath" with "granule". A MODIS granule is the information stored in one MODIS file and it covers 2330 km x 2000 km.

Satellite data description should include more details on the MODIS retrieval of CTH, COT, CWP. For instance, it should be mentioned that they require additional input such as temperature, water vapor and ozone profiles from NCEP GDAS (e.g. Platnick et al., 2003, https://doi.org/10.1109/TGRS.2002.808301; or Baum et al., 2012, https://doi.org/10.1175/JAMC-D-11-0203.1) as this has implications on potential uncertainties in particular in remote marine regions with sparse observations available for assimilation.

The study focuses on low clouds for which the CO2 slicing technique fails. It should be outlined, that the CTH retrieval is then based on the 11µm brightness temperature combined with simulated BTs based on vertical profiles from GDAS using surface temperature together with a monthly averaged lapse rate (Baum et al., 2012, https://doi.org/10.1175/JAMC-D-11-0203.1). It would be helpful, if the authors could comment briefly on potential impacts on seasonally and regionally changing biases that might relate to such derived MODIS CTHs.

MODIS tile around surface observation: The description in Section 2.3 implies that the 128 km x 128 km MODIS tile is selected so that the surface observation is located at the center of the tile but it is not explicitly mentioned. Or could the surface observation also be closer towards the edge of the tile? Maybe that could be clarified.

For the tile size, the authors "compromise between considering all the relevant information while not discarding too many samples which might fall outside of the distance limit." (l. 168-169). Does that mean the 128x128 tile is big enough to include relevant spatial information but small enough that not too many samples have to be discarded because the tile size would exceed the region covered by the MODIS granule? Maybe rephrase the sentence to be clear.

Cloud cover filter: The authors filter for MODIS tiles with cloud cover >= 30% and mention that a lowering of this filter does not improve the performance. Would one not aim for a filter as low as possible to be able to apply the developed algorithm to as many scenes as possible? In other words, it would be more interesting to see if algorithm performance would decrease if the threshold is lowered. Ideally, the authors would find a minimum value for which the performance still holds but which would allow more scenes to be included.

The authors mention that the aim of the cloud cover filter is to avoid missing values. If that is the case, why not filter for scenes with a certain minimum portion of valid retrievals directly? Anyhow, it should be stated clearly how scenes with missing values in the CTH, COT, CWP are treated.
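A direct filter on the share of valid retrievals could look like the sketch below. It assumes that missing values are encoded as NaN and that a pixel counts as valid only when all three channels are finite. Both are assumptions, as are the function names and the 0.3 default, which merely mirrors the 30% figure quoted above.

```python
import numpy as np

def valid_fraction(tile):
    """Fraction of pixels with a finite value in every channel of a (channels, H, W) tile."""
    valid = np.isfinite(tile).all(axis=0)
    return float(valid.mean())

def keep_tile(tile, min_valid=0.3):
    """Keep a tile only if enough of its pixels carry valid retrievals."""
    return valid_fraction(tile) >= min_valid

# Small example: a 3 x 4 x 4 tile with one row missing in the first channel.
example = np.ones((3, 4, 4))
example[0, 0, :] = np.nan
fraction = valid_fraction(example)
```

Sweeping `min_valid` and reporting the retrieval error at each value would show directly where performance starts to degrade.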

To illustrate the functionality of the AE, it would be helpful to show an example of an original MODIS tile and the corresponding reconstruction. This could possibly be combined in a multipanel plot together with the training and test loss (Fig. C.1) and be placed in the subsection with the AE development (Section 2.4).

In Section 3.1 the AE-OR CBH retrieval is compared to methods developed by Goren et al. and Noh et al. The respective input data and how these methods are implemented here should be stated more clearly and be placed somewhere in Section 2. Furthermore, potential differences from the original implementations of those methods should be mentioned. For instance, Noh et al. applied their method to VIIRS.

Phrases like "fails at predicting with good accuracy" (l. 254) should be avoided. It is not clear what is "good" or "bad" accuracy. The question is, is one retrieval more suitable for a desired application than another.

The authors could also comment (maybe in the conclusion) if they would consider the AE-OR CBH retrieval to work on instruments other than AQUA MODIS.

Equations should be numbered (e.g. in Appendix D) so they can be referenced in the text

2nd Equation in Appendix D: alpha_y -> alpha_i (c.f. Rennie et al. Equation 13)

Additional specific/technical comments
l. 59: "Subsequent uncertainties" -> Subsequently, uncertainties

l. 59: "can then relate to uncertainties" -> propagate into uncertainties

l. 62: "using an innovative machine learning (ML) model" -> using a machine learning (ML) model

l. 64ff: "As the CBH is typically derived from the surface": Consider rephrasing. It should be stated that more accurate CBH retrievals are obtained through ground-based remote sensing which are only available at isolated locations but can serve as reference data to develop satellite-based retrieval algorithms.

"we focus on lower clouds in particular as the retrieval quality is generally higher": I guess, this refers to higher accuracy for ground-based CBH retrievals compared to satellite-based estimates but it is not clearly stated.

l. 107ff: "At the beginning of meteorological [...]" - Due to its content, this sentence should already start the 2nd paragraph which describes the ground-based CBH retrieval.

l. 114-115: "[...] the surface-based observations specify quantities like temperature, humidity and wind speed [...]" - If the listed quantities are not used in the study, this sentence should be removed. Else it should be stated, what they are used for.

l. 116-117: This paragraph turns back to geographic location of the ground-based CBH observations -> move to 1st paragraph of Section 2.1

l. 130: "from the AQUA satellites" - from the AQUA satellite

l. 168-169: "compromise" - the two things the authors "compromise" between seem to pull in the same direction: make the scene larger -> consider more information + obtain more collocations -> that is not what is meant (the trade-off is the representativeness of the ground-based CBH retrieval) -> maybe rephrase

l. 173: "The extracted tile is then filtered" -> start new paragraph

l. 174-175: "The latter condition is primarily aimed at retrievals of poor quality" - maybe change to: "The latter condition is primarily aimed at avoiding retrievals of poor quality"

l. 182-185: "future avenues of research could consider directly modelling unmatched datasets" - This is rather vague. Consider removing this paragraph.

l. 213: "The main goal of the AE is to minimise the loss function" - The main goal is the dimensionality reduction. Maybe state "The main goal of the AE training [...]"

l. 215: "A common choice for the reconstruction metric is the ℓ 2- norm" - The authors should state that this is also what they chose to apply here.

l. 252-253: "later on we refer to the overall conceived method [...] as OR + AE, interchangeably as OR or as the prediction model" - This could be simplified by giving the model one name and staying with it. How about ORABase?

l. 311: "2B-CLDCLASS-LIDAR retrievals closer to the surface are not well captured" -> I think it is supposed to mean, the CBH is not well captured by the retrieval

l. 317: "prediction model achieves similar performance as presented in Table 2" -> What does "similar" mean? The performance should be stated quantitatively.

l. 335-336: "We then spatially aggregate the predictions over the year" -> It should be mentioned what the target pixel size is for the averaging. Furthermore, it is also a temporal averaging, so not just spatial aggregation.

l. 338: "more than 100 CBH retrievals over the year are displayed thus impacting mostly coastal and polar regions" -> displayed where? If you refer to Fig. 5, it seems that values are displayed for all ocean pixels, not just coastal and polar.


"Using the spatially-resolved information of cloud fields with passive satellites allows to properly quantify lower cloud bases, more specifically avoiding the noisy retrievals of active satellites closer to the surface." -> I think what the authors are trying to say is: Using the herein described CNN-based autoencoder to process spatially-resolved CTH, COT, CWP results in more accurate CBH retrievals compared to the 2B-CLDCLASS-LIDAR product. I mean, using observations from active sensors could probably also be applied to train some retrieval algorithm which performs better than products currently available. But that is not part of this paper. Therefore, such phrasing needs to be revised. Also "avoiding the noisy retreivals" is not precise. It should be stated that certain performance measures are better for the developed retrieval method compared to other products which have been considered for this study.
l. 364-365: "A CNN proves to be valuable to leverage spatial information" - This suggests that the convolutions play a key role. However, that has not really been shown. For instance, one could train a multi-layer perceptron to process pixel-based information to retrieve the CBH for each pixel and then calculate a field average or regress the px values onto the ground truth. Maybe that would produce similar results. All that has been shown was, using a CNN in an AE to reduce the dimensionality with a subsequent OR results in a better CBH retrieval compared to previous methods.

l. 465: "The correct interval for this this sample is then (α ¿ ¿ y−1 , α y )¿" -> 2x "this" and formatting of the interval need to be revised

Fig. 5: MAD CBH shows a white (i.e. missing value) pixel in the South Atlantic (bottom panel), which is filled for the mean CBH (top panel)

Fig. D.1: Is it correct that the right panel shows the same loss for y=4 and y=5?

Citation: https://doi.org/10.5194/egusphere-2024-327-RC1
- AC2: 'Reply on RC1', Julien Lenhardt, 31 May 2024
  
  Please find our response to the referees in the supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2024-327-AC2
RC2:
'Comment on egusphere-2024-327', Anonymous Referee #2, 03 May 2024

This is a useful and straightforward paper that develops a new algorithm for estimating marine cloud base height from MODIS data, employing a machine learning technique. Evaluations against surface ceilometer observations and CALIPSO data demonstrate its superior performance over previous methods for cloud-base height retrieval. Furthermore, the resulting cloud-base height products are made publicly available on Zenodo, facilitating their utilization by other researchers within the community.
While the methodology and results are robust and convincing, the paper's presentation suffers from several shortcomings. There is a lack of coherence between sentences and paragraphs, making it challenging for readers to follow the logical flow. Additionally, the frequent use of phrases like "It is to be noted that" disrupts the clarity of the text. Grammar errors, such as the phrase "allow to properly quantify" in Line 363, further detract from the overall quality of the paper.
Therefore, I recommend that the authors undertake a comprehensive revision of the language to improve coherence, eliminate ambiguous phrasing, and rectify grammar errors. This revision will enhance the paper's suitability for publication in ACP.

Citation: https://doi.org/10.5194/egusphere-2024-327-RC2
- AC1: 'Reply on RC2', Julien Lenhardt, 31 May 2024
  
  Please find our response to the referees in the supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2024-327-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Julien Lenhardt on behalf of the Authors (31 May 2024)

ED: Referee Nomination & Report Request started (03 Jun 2024) by Peer Nowack

RR by Anonymous Referee #1 (05 Jul 2024)

Suggestions for revision or reasons for rejection

Dear Editor, dear authors,

The authors made a substantial effort to improve the manuscript and to support their results. My previously raised points have been addressed properly and I appreciate the thorough analysis. I recommend accepting the manuscript for publication after the following few minor/technical points have been addressed:

It should be explained why the reconstruction error, introduced as MSE (l. 211), does not carry a unit. If it is calculated as MSE between output and input field of the AE, it should carry the squared unit of the input variable. Or does it refer to the normalized input? This should be clarified.

The line numbers refer to the tracked changes document.

l. 23-25: This implies that the test data set consists of both ceilometer and CALIOP CBH retrievals. I guess, that is not what is meant. Furthermore, the phrase "performs well on both datasets" is not helpful. A more quantitative statement is necessary.

l. 71-73: "The objective of the developed method is primarily to produce CBH retrievals with reduced uncertainty, and additionally to extrapolate CBH retrievals from local surface observations to a wider spatial and temporal coverage" - Sounds like the ground-based observations are used as input and then together with satellite data extrapolated into space. However, they are only used to train the algorithm.

l. 129-130: One sentence alone should not constitute its own paragraph. Possibly connect it to the following paragraph.

l. 131: Right before, it is stated that earlier reports are based on human observers. Now it is stated "The CBH is derived using a ceilometer". Whether the authors utilize only CBHs retrieved by ceilometers should be clearly stated.

l. 136-141: From Fig. 2b the binning of the utilized ground-based CBH data set becomes clear. However, according to this text passage, the authors use the minimum of the bin range for their evaluation. Other than introducing a negative bias, it is unclear how this procedure affects the results.

l. 148-149: Table 1 caption - references to Section 2.1 and 2.2 seem to be mixed up.

l. 187-189: "The level 2 product [...]" - sentence can be removed.

l. 190: "in particular" - these words can be removed.

l. 207: "Appendix B" -> "Appendix C" (I suppose is meant here)

l. 265-278: The structure of this new paragraph should be improved. First, they say, data are only taken for the year 2008. Then they say, data are only taken for a single year to avoid correlation between training and test data. Then they say they obtain 500.000 samples that are split into training, validation and testing based on retrieval date. So are all these 500.000 samples from the year 2008? And then how are they split into train, validation and test sets? Adding to the confusion is the statement regarding another test set using data over land for the year 2016. Please, clearly state, which period and location are used for training, validation and test, respectively.

l. 354: It should be added that the 9x9-tile-simple method still features the OR. However, instead of using the AE feature vector as input, it uses the 3 cloud properties within the 9x9 tile simply as a flattened vector as input. At least, I assume that is what is done.

l. 357-361: The phrasing appears a bit complicated. The term "the baseline model" is not really defined. There appear 3 "baseline" models if you want (2 trivial, one 9x9 tile) so using this term is a bit confusing. And the error metrics for the two trivial methods should be identical since both always predict the 600m bin. Maybe it would be easier to just state the MA-MAE, and MA-RMSE for the developed AE-OR method and then in increasing order the errors for the other (9x9 method, trivial methods).
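The difference between a plain and a macro-averaged error can be illustrated with a short sketch. It assumes that MA-MAE denotes the macro-averaged MAE, i.e. the unweighted mean of the per-bin MAE. The labels are toy values and the function name is hypothetical.

```python
import numpy as np

def macro_averaged_mae(y_true, y_pred):
    """Mean of the per-bin MAE, so that every reference bin has equal weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_bin = [np.mean(np.abs(y_pred[y_true == b] - b)) for b in np.unique(y_true)]
    return float(np.mean(per_bin))

# Toy example: a constant prediction of the majority bin (values in metres).
y_true = np.array([100, 100, 100, 600])
y_pred = np.array([100, 100, 100, 100])

plain_mae = float(np.mean(np.abs(y_pred - y_true)))
ma_mae = macro_averaged_mae(y_true, y_pred)
```

Here the plain MAE rewards the constant prediction because the majority bin dominates, whereas the macro-averaged value penalizes the missed minority bin. This is why two trivial methods that predict the same bin must also share identical macro-averaged scores.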

l. 431-434: First, it is stated that the AE-OR method with immediate-threshold setup (IT) has similar (low) skill compared to the other retrieval. It is also stated, that the AE-OR method with all-threshold setup (AT) performs much better "on par with the other retrievals". If IT and AT methods differ, they cannot both be similar to the other retrieval method. Consider rephrasing this part.
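For reference, the two threshold constructions can be written as below, following the general form given by Rennie and Srebro and using the α notation of Appendix D. This is a sketch under assumptions: z is the scalar model output, y ∈ {1, ..., K} the reference bin, α_1 < ... < α_{K-1} the thresholds with α_0 = -∞ and α_K = +∞, and f a margin penalty such as the hinge or logistic loss. The manuscript's exact choice of f and any normalization may differ.

```latex
% Immediate-threshold (IT): only the two thresholds bounding the correct bin are penalized.
\[
  \ell_{\mathrm{IT}}(z, y) = f\!\left(z - \alpha_{y-1}\right) + f\!\left(\alpha_{y} - z\right)
\]

% All-threshold (AT): every threshold contributes, so the penalty grows
% with the number of bins separating the prediction from the reference.
\[
  \ell_{\mathrm{AT}}(z, y) = \sum_{i=1}^{y-1} f\!\left(z - \alpha_{i}\right)
                           + \sum_{i=y}^{K-1} f\!\left(\alpha_{i} - z\right)
\]
```

Because the AT loss accumulates a term for each threshold crossed, it is expected to discourage predictions far from the reference bin more strongly than the IT loss, which is consistent with the two setups showing different skill.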

l. 525 - "pacific" -> Pacific

ED: Publish as is (15 Aug 2024) by Peer Nowack

AR by Julien Lenhardt on behalf of the Authors (16 Aug 2024)

Short summary

Clouds play a key role in the regulation of the Earth's climate. Aspects like the height of their base are of essential interest to quantify their radiative effects but remain difficult to derive from satellite data. In this study, we combine observations from the surface and satellite retrievals of cloud properties to build a robust and accurate method to retrieve the cloud base height, based on a computer vision model and ordinal regression.