the Creative Commons Attribution 4.0 License.
Segmentation-based multi-pixel cloud optical thickness retrieval using a convolutional neural network
Vikas Nataraja
Sebastian Schmidt
Hong Chen
Takanobu Yamaguchi
Jan Kazil
Graham Feingold
Kevin Wolf
Hironobu Iwabuchi
Download
- Final revised paper (published on 14 Sep 2022)
- Preprint (discussion started on 15 Feb 2022)
Interactive discussion
Status: closed
-
RC1: 'Comment on amt-2022-45', Anonymous Referee #1, 04 Mar 2022
Comment to “Segmentation-Based Multi-Pixel Cloud Optical Thickness Retrieval Using a Convolutional Neural Network” by Nataraja et al.
This study develops a new machine learning approach to retrieve cloud optical thickness (COT) fields from visible passive imagery. It takes the spatial context of a pixel into account, and thereby reduces artifacts arising from net horizontal photon transfer, commonly known as independent pixel (IP) bias. It demonstrates how the CNN's retrieval performance differs across various locations, and provides a baseline for future implementations of the CNN in COT retrievals for different regions. In addition, the paper is well written. It is worthy of publication after the necessary modifications.
Line 17-18, a few references regarding the importance of COT might be helpful, with Zhao and Garrett (2015, doi: 10.1002/2014GL062015) suggested.
Line 23, the inhomogeneity issue exists in both the spatial and temporal dimensions.
Line 46, IP bias is defined in the abstract but not yet in the main text.
Line 51-53, I appreciate the information here. However, I wonder if the satellite spatial resolution is high enough to ensure that the optimum occurs at a scale of about 1 km.
Line 63, “distinguished”
Figure 2, It seems to me that the difference (IPA COT) has a very good linear relationship with true COT, making me think that the IPA COT could be substantially improved by simply correcting it with this linear relationship. If this is true, why not use this simple method?
Line 207, what are the three aerosol number concentrations?
Line 214-217, why do the authors only use two daytime periods?
Equation (5), In my understanding, this equation calculates the water vapor amount instead of the liquid water content. Could the authors help explain?
Line 393-394, It seems to me that this sentence needs to be modified to make it clearer.
Line 444-446, could this selection introduce some uncertainties to the results?
Line 536, “50%”?
Figures 11 and 13, the unit of CF should be "%", or the values should be less than 1.
-
AC1: 'Reply on RC1', Vikas Nataraja, 20 Apr 2022
Dear Reviewer,
Thank you for your feedback on our manuscript, and we appreciate you taking the time to do so. Our responses to your comments are detailed below:
Line 17-18, a few references regarding the importance of COT might be helpful, with Zhao and Garrett (2015, doi: 10.1002/2014GL062015) suggested.
-> Thank you for pointing that out. We will follow this advice in the revised version. Instead of including the haze-specific paper the reviewer suggested, we chose the more general cloud climatology paper on ISCCP by Rossow and Schiffer (1991, doi: 10.1175/1520-0477(1991)072<0002:ICDP>2.0.CO;2) where COT features prominently in Figure 4. We will change the text as follows:
"Cloud optical thickness (COT) is important for the shortwave CRE, and it is therefore a key parameter in cloud climatologies (e.g., Rossow and Schiffer, 1991, Figure 4). Deriving the COT accurately from satellite imagery will help to improve our understanding of the energy budget."
Line 23, the inhomogeneity issue exists in both spatial and temporal.
-> We agree, but since we are proposing a spatial-based solution with the Convolutional Neural Network (CNN), we limited our discussions to spatial-related problems.
Line 46, IP bias is not defined yet in the main text, while defined in the abstract.
-> Thank you for pointing this out, we will make this change in time for the next stage of the review.
Line 51-53, I appreciate the information here. However, I wonder if the satellite spatial resolution is high enough to ensure that the optimum occurs at a scale of about 1 km.
-> This is a good point. However, the optimum was determined with synthetic data (Figure 13 in Davis et al., 1997, https://doi.org/10.1175/1520-0469(1997)054<0241:TLSBIS>2.0.CO;2, cited in the paper). In this case, artificial clouds representing stratocumulus were generated with a fractal cloud algorithm and fed into 3D RT calculations to generate synthetic radiance data. From these, COT was derived, and the optimum resolution of an imager was determined. These (and related) findings ended up guiding the choice of the spatial resolution of EOS imagers (at least they were one factor).
Line 63, “distinguished”
-> We will address this typo in time for the next stage of the review.
Figure 2, It seems to me that the difference (IPA COT) has a very good linear relationship with true COT, making me think that the IPA COT could be substantially improved by simply correcting it with this linear relationship. If this is true, why not use this simple method?
-> This is a very valid and appropriate observation. In fact, the IPA dependence of (retrieved - true) COT on the COT is even more linear than for the CNN retrieval. We will add this observation to the revised text. It is indeed possible to parameterize this effect as a 3D correction, and this has been done in the past. We described these approaches in Section 1.2; see for example Equations (2), (3), (4), and the related literature citations. Iwabuchi and Hayasaka (2002) introduced a more complex statistical parameterization. The problem is that the parameters in all of these are fixed, and derived for very specific cloud fields in a multivariate fitting manner. What we do instead is use the spatial context to drive the 3D correction in a flexible and generalizable manner.
To be more specific, for the simplest parameterization (fitting a linear regression line to the IPA results from Figure 2), the problem is that the slope varies from scene to scene. You can see this in Figure 8, where the slope of the IPA retrieval is plotted as a function of cloud fraction and other scene parameters. In other words, a single slope parameter does not allow the correction of IPA retrievals. The CNN technique can be regarded as a more complex form of "fitting", where a parameterized correction of 3D effects in COT retrievals is done, in part, as a function of the structure of the cloud field.
Still, we appreciate this comment as it made us think about the distinction between simple linear parameterizations and more complex CNN approaches. Once we hear back from the complete review committee, we will add a statement to the revised manuscript. Here is a draft of what that might look like:
"It is worth noting that the IPA in Fig. 2 does appear to have a linear relationship with the (retrieved - true) COT which would imply that it is indeed possible to parameterize this effect as a 3D correction. Furthermore, as we discuss in Sect. 1.2, there have been approaches that have attempted to do so, including Iwabuchi and Hayasaka (2002) who introduced a more complex statistical parameterization. However, the underlying problem with such a method is that the parameters are fixed, and derived for very specific cloud fields using multivariate fitting. By contrast, with our proposed CNN (and future iterations of it), the intention is to utilize the existing spatial context in cloud imagery to learn the underlying features that can then be generalized and applied to correct 3D radiative/net horizontal photon transport effects."
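The scene-to-scene slope problem described in this reply can be illustrated with a minimal sketch (an illustration only, not the paper's method; the scene data, slopes, and function names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ipa_correction(cot_true, cot_ipa):
    """Fit a per-scene linear map cot_ipa ~ a * cot_true + b and invert it
    to correct the IPA retrieval. Returns the corrected COT field."""
    a, b = np.polyfit(cot_true, cot_ipa, deg=1)
    return (cot_ipa - b) / a

# Two synthetic "scenes" whose IPA slope differs (the point of the reply:
# a single fixed slope cannot correct both).
cot_true = rng.uniform(0.0, 40.0, 1000)
for slope in (0.6, 0.9):  # slope varies with cloud fraction, scene structure
    cot_ipa = slope * cot_true + rng.normal(0.0, 1.0, cot_true.size)
    corrected = fit_ipa_correction(cot_true, cot_ipa)
    print(np.polyfit(cot_true, corrected, 1)[0])  # ~1 after per-scene fit
```

A per-scene fit restores a slope near one, but it requires knowing the truth for that scene; a fixed, pre-derived slope does not generalize, which is the motivation for letting the CNN infer the correction from spatial context.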
Line 207, what are the three aerosol number concentrations?
-> Per the Yamaguchi et al. (2009) paper that conducted the study over the Sulu Sea (and cited in this paragraph), the three aerosol concentrations were 35, 150 and 230 mg-1. We will add this to the revised manuscript.
Line 214-217, why do the authors only use two daytime periods?
-> The Lagrangian LES conducted by Kazil et al. (2021) focus on a cloud state transition (closed- to open-cell stratocumulus) from the first to the second daytime period. The simulations capture the cloud state transition in its entirety, which is sufficient for the work's objectives. The cloud deck dissipates shortly after the end of the simulations (as seen in satellite imagery) and longer simulations would not provide additional data. Finally, the LES in Kazil et al. (2021) use very large domains and sectional (bin) cloud microphysics, which makes them expensive.
Equation (5), In my understanding, this equation calculates the water vapor amount instead of the liquid water content. Could the authors help explain?
-> Thank you very much for pointing out this error, we missed this. The equation is indeed supposed to use mrcloud rather than mrwater. We will make this change in the revised manuscript.
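The distinction at issue can be sketched in a few lines (a generic illustration; the variable names are hypothetical, and the actual equation is the paper's Eq. 5 with the cloud-water mixing ratio):

```python
import numpy as np

def liquid_water_content(mr_cloud, rho_air):
    """Liquid water content (kg m^-3) from the cloud (liquid) water mixing
    ratio mr_cloud (kg/kg) and the air density rho_air (kg m^-3).
    Passing the water *vapor* mixing ratio here would instead give the
    vapor density, which is the error the reviewer caught."""
    return np.asarray(mr_cloud) * np.asarray(rho_air)

print(liquid_water_content(5e-4, 1.1))  # 5.5e-4 kg m^-3
```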
-
RC2: 'Comment on amt-2022-45', Anonymous Referee #2, 05 May 2022
Overview:
The authors have developed an efficient way of taking a continuous monochromatic radiance field (6.4 x 6.4 km^2 image) formed by sunlight reflected off clouds and mapping it to the underlying cloud optical thickness (COT) field at the pixel scale (0.1 km). More precisely, each pixel is assigned to one of 36 predefined classes of COT defined as intervals. This is done using a convolutional neural net (CNN) trained on synthetic imagery of LES clouds using a forward 3D RT model. The LES clouds of course come with known ground truth in COT. As is customary in machine learning (ML), 1/5 of the synthetic data (radiances and COTs) is set aside for CNN performance evaluation. Specifically, the authors ask how much better the CNN does compared to the independent pixel approximation (IPA).
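The customary 4:1 train/evaluation split mentioned above can be sketched as follows (an illustration only; the scene count, seed, and variable names are assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical pool of synthetic scenes; 1/5 is held out for evaluation.
n_scenes = 1000
rng = np.random.default_rng(0)

idx = rng.permutation(n_scenes)        # shuffle scene indices
n_eval = n_scenes // 5                 # 1/5 set aside for evaluation
eval_idx, train_idx = idx[:n_eval], idx[n_eval:]
print(len(train_idx), len(eval_idx))   # 800 200
```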
Although this is not the first study to use ML to go from radiance fields to COT maps, it is still an emerging method, and the authors are commended for pushing this envelope in a new direction by using a discretized ("segmented") COT scale. The paper will likely become an important contribution to the emerging literature in cloud remote sensing where the 3D variability of clouds is embraced rather than ignored. However, to get there I recommend a major revision of the current manuscript.
Major concerns:
As a reader/reviewer with very little knowledge about ML, I approached this paper with a strong desire to learn more, and about how ML can be applied to an endeavor that I care about. The fact that none of the authors come with an affiliation in computer science made my expectations even greater. I was, however, disappointed. The key Section 3 was not easy to read. I came out of it feeling that there was either too much or not enough detail. In particular, what I guess is ML jargon was often not explained.
I strongly recommend that the authors make that Section 3 into an Appendix with improvements suggested below (mostly more details), and leave in the main text (thinking of it as "mandatory reading") a well-crafted high-level summary. That summary of what's going on in the "black box" should be just enough to leap into the interesting results presented in Section 4.
Sequential comments:
* Fig. 1: The radiance scale is missing (best to use "BFR" units, pi I / mu_0 F_0).
* Fig. 1: To better show the IPA underestimation, maybe add a 4th panel: same as (c) but with a stretched scale.
* Fig. 1 and elsewhere: Avoid the "rainbow" color scale that does not work well in B&W print, nor for color-blind persons. Hint: the default "green-yellow" scale in python avoids these pitfalls.
* Section 1.2: The history of efforts to mitigate 3D RT effects is interesting to read. Someday it would be nice to have a more exhaustive version, but here at least one approach antedates BL95 and is worth mentioning, namely:
Cahalan, R.F., 1994. Bounded cascade clouds: Albedo and effective thickness. Nonlinear Processes in Geophysics, 1(2/3), pp. 156-167.
Interestingly, Cahalan's solution involves a multiplicative prefactor, as in (4), rather than BL95's scaling exponent.
* Section 1.3: This too is an interesting read that contrasts (physics-based) cloud tomography and (statistics-based) neural nets. The former is a pretty recent development with the real breakthrough paper being:
Levis, A., Schechner, Y.Y., Aides, A. and Davis, A.B., 2015. Airborne three-dimensional cloud tomography. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3379-3387).
A paper of special interest here that uses a CNN rather than 3D RT in the cloud tomography per se is:
Sde-Chen, Y., Schechner, Y.Y., Holodovsky, V. and Eytan, E., 2021. 3DeepCT: Learning volumetric scattering tomography of clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 5671-5682).
* Fig. 2: What happens to the IPA when true COT > 40? We need to see that for a fair visual comparison with CNN.
* Fig. 2: How well does the prediction in (4) for the slope work here, in comparison with the empirical IPA vs true COT slope?
* Fig. 2: To the eye, it looks like, although biased low, the dispersion around the IPA retrieval is much smaller than around the (unbiased) CNN retrieval. Why? This looks like an opportunity to get the best of both approaches.
* lines 190-193: I couldn't comprehend the interchangeable use of "level of coarsening" and "aspect ratio" (AR) until I downloaded and browsed BL95. As I understand, BL95 is based on cloud models generated from 2D (Landsat) imagery. So, there is a user choice of how much geometric thickness (h) to assign to the clouds. In that case, talking about AR makes sense, and BL95 modulated it by varying h. But here the LES-generated clouds are inherently 3D. So, talking about coarse-graining (in the horizontal plane) makes sense. But the AR is something different, unless all that is taken from the LES is the 2D COT field and cloud thickness in the 3rd dimension is assigned and held constant, like in BL95. If that is the case, I missed it.
* line 208: Not an expert here, but I thought that LES microphysics schemes were either "bulk" or "bin" and, in the _former_ case, they can be either 1- or 2-moment. Please clarify "two-moment _bin_ microphysics".
* Section 2.1.2: Can I suggest one figure here to visualize the important differences with the Sulu Sea LES clouds? Something like Fig. 3 for the Sulu Sea simulations.
* Eq. (5): I think you mean r_cloud, not r_water, and maybe "." like in (6), not "*".
* Eqs. (5-6): Why not use the more common "q_lw" and "q_wv" for your mixing ratios? And, accordingly, the usual "Q_ext" for the Mie efficiency factor?
* line 286: Up to 9 km? Is it 6.4 x √2? If so, say so.
* Fig. 4: Great start for understanding the ML technique used here! However, still too many questions and undefined concepts for the non-cognoscente:
How do you get the 64 layers from a single one in step #1? Can it be another number? (I understand the subsequent doubling and halving.)
What is ReLU Activation?
What is Batch Normalization?
(Maybe better to have different colors for these two operations?)
* Section 3.2: "cross-entropy" is explained, but not "one-hot encoding", nor is "softmax activation".
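For readers unfamiliar with the terms in this comment, the three concepts can be sketched in a few lines (a generic ML illustration, not the paper's implementation; the 36-class example simply mirrors the paper's COT classes):

```python
import numpy as np

def one_hot(label, n_classes):
    """One-hot encoding: class index -> binary indicator vector."""
    v = np.zeros(n_classes)
    v[label] = 1.0
    return v

def softmax(logits):
    """Softmax activation: raw scores -> probabilities summing to 1."""
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    return e / e.sum()

def cross_entropy(target_one_hot, probs):
    """Cross-entropy loss between a one-hot target and predicted probs."""
    return -np.sum(target_one_hot * np.log(probs + 1e-12))

# Example with 36 COT classes, as in the paper:
target = one_hot(27, 36)
probs = softmax(np.random.default_rng(0).normal(size=36))
print(cross_entropy(target, probs))
```

The loss is near zero when the predicted probability of the true class approaches one, and grows without bound as that probability goes to zero.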
* Eq. (11): Does alpha depend on i or c? If so, which and how? (And add the appropriate subscript.) If not, it can be factored out.
* Fig. 5c: What is the top layer? Looks like a binary cloud mask resulting from all the COT classes.
* line 403: Delete "resolution" (it is the domain size).
* Below Fig. 6: Please tell us a little about the "Adams" optimizer.
* Above Eq. (16): One COT bin (#27) is finally given explicitly. What about the others? Are they linearly sampled? Logarithmically? Surely this discretization of the COT scale also has to be somehow optimized.
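If the COT classes were sampled logarithmically (an assumption made here purely for illustration; the bounds, bin count, and spacing are not taken from the paper), the edges and the per-pixel class assignment could look like:

```python
import numpy as np

# Hypothetical: 36 COT classes with log-spaced edges between 0.1 and 100
# (all values are assumptions for illustration only).
edges = np.logspace(np.log10(0.1), np.log10(100.0), num=37)

cot_field = np.array([0.5, 5.0, 50.0])
classes = np.digitize(cot_field, edges) - 1  # index of the containing bin
print(classes)
```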
* Section 4.1, and below: Better to use "coarsening factor" than "aspect ratio" (see comment above for lines 190-193).
* line 447 and Figs. ≥7: Cloud Variability is an interesting non-dimensional quantity. It seems to have an upper bound of 2, but that isn't clear from the definition. Please clarify.
* Section 4.2: Why is the number of scenes used for training described as "cloud morphology"? (That term does come up later, on p. 25 in Sect. 4.4, where clouds from different regions are contrasted.)
* Fig. 9, caption: What are the red dots?
* line 576: typo in "erroneous"
* line 605: Maybe the contradiction found here with BL95 has to do with the key difference (discussed previously) between their use of "aspect ratio" and the "coarsening level" that it is equated to here?
-
AC2: 'Reply on RC2', Vikas Nataraja, 02 Jun 2022
The comment was uploaded in the form of a supplement: https://amt.copernicus.org/preprints/amt-2022-45/amt-2022-45-AC2-supplement.pdf