This work is distributed under the Creative Commons Attribution 4.0 License.
Deep-Pathfinder: a boundary layer height detection algorithm based on image segmentation
Jasper S. Wijnands
Arnoud Apituley
Diego Alves Gouveia
Jan Willem Noteboom
Download
- Final revised paper (published on 17 May 2024)
- Preprint (discussion started on 02 May 2023)
Interactive discussion
Status: closed
-
RC1: 'Comment on amt-2023-80', Anonymous Referee #1, 20 Jul 2023
This paper addresses a very interesting topic and proposes an interesting solution, which could be applied at many ceilometer stations. However, some conceptual points need to be described or discussed more clearly.
General comments
Line 123: This is a conceptual error. The mixing layer does not exist during the night. After sunset, due to the stable regime, the ML becomes a stable boundary layer and a residual layer.
Line 126: The MLH only occurs during the daytime, because it is a convective process.
Line 127: How is it possible to identify the stable boundary layer from the RCS if such a layer has only a thermodynamic definition? The RCS provides the tops of different aerosol layers, which in some cases can coincide with a stable layer top, but this is not a rule. A temperature and/or wind speed profile is necessary to identify a stable regime.
Figure 7: The MLH only occurs during the daytime. The MLH does not decrease; it becomes the residual layer and the stable layer. It is important to present this division in the figure.
Technical comments
Line 26: “throughout the low atmosphere”
Line 26: I recommend changing “Further” to “Consequently”, because “further” is used again in the same sentence.
Figure 2: It is necessary to correct this figure, because the Mixing Layer only occurs during the daytime.
Line 119: MLH only occurs during the daytime
Line 124: nocturnal MLH : MLH only occurs during the daytime
Line 125: nocturnal MLH : MLH only occurs during the daytime
Line 139: nocturnal MLH : MLH only occurs during the daytime
Line 140: Is the figure number correct?
Line 222: This phrase is redundant because the convective boundary layer only occurs during the daytime
Line 225: nocturnal MLH : MLH only occurs during the daytime
Line 255: nocturnal MLH : MLH only occurs during the daytime
Citation: https://doi.org/10.5194/amt-2023-80-RC1
-
AC1: 'Reply on RC1', Jasper Wijnands, 25 Sep 2023
The comment was uploaded in the form of a supplement: https://amt.copernicus.org/preprints/amt-2023-80/amt-2023-80-AC1-supplement.pdf
-
RC2: 'Comment on amt-2023-80', Anonymous Referee #2, 25 Jul 2023
Comments on
"Deep-Pathfinder: A boundary layer height detection algorithm based on image segmentation"
by J. S. Wijnands et al.
The authors describe the application of a machine-learning-based computer vision algorithm to ceilometer backscatter data to derive mixing layer height (MLH). The approach of using computer vision based on neural network (NN) techniques and applying it to ceilometer backscatter data is to my knowledge new, and the approach is therefore of general interest. Nevertheless, the article lacks a deeper investigation of the abilities and shortcomings of the algorithm. The evaluation with respect to other existing MLH retrievals based on physical considerations is very short and could be expanded. I therefore recommend major revisions to this work.
The basic idea is that backscatter data from ceilometers provide all the information you need to identify the height of the mixed layer (ML). Ceilometers are automatic lidar systems originally developed to determine cloud base, which defines the so-called 'ceiling' for visible flight. They are therefore widespread at airports and constitute a rather dense measurement network. These instruments also provide - as an additional by-product - profiles of aerosol backscatter, which show, especially on more or less cloud-free, calm days, typical patterns that give the impression that it is possible to visually 'see' how the boundary layer develops and how far up turbulent mixing reaches. It is thus a logical step to use computer vision to analyse these patterns and try to identify the mixed layer (ML) or even the development state of the whole atmospheric boundary layer (ABL).
There are countless studies proposing different approaches to derive MLH from ceilometer backscatter data (for a review see Haeffelin et al 2012, or very recently Kotthaus et al 2023). The authors also give a short review here, but with a focus on methods which use multiple instruments and parameters. I am missing arguments why this study falls back to a single parameter from a single instrument. I would recommend to mention in the introduction the principal methods based on backscatter alone (gradient, wavelet transform, variance, fit of a function, ...) and give arguments why neural-network-based computer vision might perform better.
At only one point in the paper it is mentioned that backscatter data provide just a proxy, but not directly the ML. It would be very helpful to give a short description of why backscatter contains information about the state of the ABL (different aerosol content in the ML and above in the RL or the FT, ...). Such a description could then be used to argue when and why any algorithm based on aerosol backscatter alone can perform well, and when it has its shortcomings (no aerosol difference between ML and the layer above, effects of RH, ...). And of course when and why the algorithm performs well when compared to the reference algorithms (STRATfinder and the instrument's built-in algorithm). Examples for such a discussion can be found in Schween et al. (2014) or references in Kotthaus et al (2023).
These difficulties in MLH detection are strongly connected to the different phases of the diurnal development of the ABL: (1) early morning: growth of the ML into the stably stratified, unmixed nocturnal boundary layer (NBL) at the surface. (2) Hours till noon: fast growth into the more or less neutrally stratified residual layer (RL), the remainder of the ML of the previous day. (3) Noon and early afternoon: fully developed ML growing slowly into the free troposphere (FT). (4) Late afternoon: more or less sudden breakdown of convective turbulence and thus breakdown of mixing. (5) Evening and night: development of a new NBL and RL. This idealized development can be found in textbooks (Stull, 1988 or Garratt, 1992) but also in current articles (Kotthaus et al 2023). The comparison of the different NN-based algorithms has been done on three different time slots for different ABL stages (tables 3-5). But I would recommend to evaluate the performance in comparison with STRATfinder and the instrument's algorithm based on these five phases as well, i.e. expand the analysis presented around tables 1 and 2 to at least some of these BLH development phases, and discuss the results with respect to them: is it possible to detect the ML in each of these phases with equal success, or are there different problems in different phases? And can recommendations for retrieval algorithms be derived from this? A possible phase split is sketched below.
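To make this concrete, a minimal sketch of such a phase split; the phase boundaries relative to sunrise and sunset are my own illustrative choices, not taken from the paper or the references above:

```python
import pandas as pd

def abl_phase(t, sunrise, sunset):
    """Assign one of the five idealized ABL phases sketched above.
    The offsets relative to sunrise/sunset are illustrative choices."""
    if t < sunrise or t >= sunset + pd.Timedelta(hours=1):
        return "5 night (NBL/RL)"
    if t < sunrise + pd.Timedelta(hours=2):
        return "1 morning growth"
    if t < sunrise + pd.Timedelta(hours=5):
        return "2 growth into RL"
    if t < sunset - pd.Timedelta(hours=3):
        return "3 mature ML"
    return "4 afternoon decay"

# Hypothetical single mid-summer day (times in UTC, Cabauw-like).
sunrise = pd.Timestamp("2020-07-09 03:30")
sunset = pd.Timestamp("2020-07-09 20:00")
times = pd.date_range("2020-07-09 00:00", periods=120, freq="12min")
phases = pd.Series([abl_phase(t, sunrise, sunset) for t in times], index=times)
print(phases.value_counts().sort_index())
```

The MAE and correlation statistics of tables 1 and 2 could then be grouped by this phase label instead of by fixed clock-time slots.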
It is mentioned in the discussion that, in situations with multiple cloud layers, Deep-Pathfinder prefers the lower cloud, while STRATfinder tends towards the upper cloud layer. As discussed in several papers (Schween et al. 2014, Kotthaus and Grimmond 2018 and Kotthaus et al. 2023 and references therein), clouds are a challenge for MLH detection, not only because of the more conceptual problem of whether a cloud base is really the top of the ML. The evaluation of the algorithm could therefore also be restricted to situations with no or few clouds, to see whether the algorithms agree better in such cases. And again the question: can recommendations for retrievals be derived from that?
A widespread misconception is that the high backscatter in the lower levels during night identifies the NBL or even a mixing layer. The high-backscatter layer results from the fact that aerosol backscatter does not only depend on aerosol particle number concentration but also on the backscatter cross-section, which depends via Mie theory on particle size, which in turn depends on relative humidity (RH, see e.g. Tuomi 1976 or Fitzgerald 1984 for the exact mathematical formulation and example plots). When the NBL develops in the evening, air is cooling from the bottom and a very stable stratification develops (I am sure this can be seen in the tower data at Cabauw, see also van Ulden and Wieringa 1996). Stable stratification means no, or only very, very weak local vertical mixing. Decreasing temperature lets RH increase and accordingly backscatter increase, without a change in number concentration and without mixing of aerosol. This can be nicely seen in fig. 2, where RH in the lowest layers nearly reaches 100%. The increase in backscatter is not linear with RH, and only RH values above 70-80% (depending on aerosol properties and the wavelength of the lidar) lead to a significant increase in backscatter. As only the lower levels in the NBL reach high RH values above 70%, high backscatter values can only be expected in the lower part of the NBL. In summary, the increased backscatter values are a nonlinear combination of several parameters. They do not identify the upper border of the NBL and they do not mark a mixed layer. Therefore a backscatter-based identification of a surface-based layer during night has no meaning at all. I recommend to drop all the nighttime retrievals unless the authors can show, on the basis of the Cabauw tower data (potential temperature (Tpot) and water vapour mass mixing ratio (qv), but not RH), that the region of enhanced backscatter really identifies a mixed layer (Tpot and qv constant with height).
The authors write that they do not address limitations of the instrument due to its optical design. This is of course dangerous: instruments have their limitations, and any analysis should consider them. This is especially the case with the CHM15k ceilometer used here, which is known to have a rather large region of incomplete overlap and an imperfect built-in overlap correction (see Hervo et al. 2016). As a result, spurious gradients appear at heights of about 100 m and 200 m to 300 m (see fig. 2.iii, i.e. the gradient panel, but also fig. 1, fig. 5, fig. 6 and fig. 8). The no-overlap zone, i.e. the range close to the instrument where no backscatter can be measured, is mentioned, but its range is not given in the text. The instrument delivers noise in this region (see e.g. fig. 2), and the question arises whether this influences the algorithm.
Another speciality of the CHM15k is the output variable named 'beta_raw' in the original netcdf data files, which is described as 'normalized backscatter' and which I guess is used here as the 'range corrected signal' (RCS). This variable is the overlap-, background- and range-corrected signal. It lacks a correction for the temporal variability of the emitted photons and diverse state parameters of the instrument (see Schween et al. 2014). More importantly, this variable lacks a conversion to physical units of backscatter (1/(sr m)). Although this is not a full calibration, it would make the output of the instrument comparable to other ceilometers and it would allow to generalize the algorithm. All these points should be mentioned in the text.
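To illustrate how nonlinear this humidity effect is, a minimal sketch using a simple one-parameter (Hänel-type) enhancement factor; the functional form and the gamma value are illustrative assumptions, not the exact formulations of Tuomi (1976) or Fitzgerald (1984):

```python
# Sketch: nonlinear growth of aerosol backscatter with relative humidity,
# using a one-parameter enhancement factor. gamma = 0.5 is illustrative;
# real values depend on aerosol type and lidar wavelength.
def backscatter_enhancement(rh, rh_ref=0.40, gamma=0.5):
    """f(RH) = ((1 - rh_ref) / (1 - rh)) ** gamma, valid for rh < 1."""
    return ((1.0 - rh_ref) / (1.0 - rh)) ** gamma

for rh in (0.40, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"RH = {rh:4.0%}: enhancement = {backscatter_enhancement(rh):5.2f}x")
```

In this sketch the enhancement stays below ~1.5x up to RH = 70%, but reaches ~2.5x at 90% and ~8x at 99%, consistent with the statement that only the high-RH lower part of the NBL shows strongly increased backscatter.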
And one minor point:
Despite the similar name, the presented algorithm has nothing in common with the PathfinderTURB algorithm presented in Poltera et al (2017). Please clarify that in the text.
--------------------------------------------------------------------------------
Comments to text:
line 21:
"MLH is not constant, ... depending on a myriad of factors ..."The physical model of Ouwersloot and Vilà-Guerau de Arellano (2013) reduces this 'myriad of factors' to a handful: surface sensible heat flux, temperature difference over the inversion layer at top, temperature gradient of the atmosphere above the ML, subsidence.
line 39:
"... measuring the back-scattering by aerosols to eventually obtain estimates of particle concentrations"Backscatter not only depends on particle number concentration but also on backscatter cross section which depends via Mie Theory on particle size which depends on relative humidity.
line 40:
".. incomplete optical overlap of many lidar systems results in a blind spot at low altitudes,"To be precise: it is not a spot but a range of distances and this blind zone is due to no-overlap between laser beam and field of view of the receiver telescope. The incomplete overlap region can in principle be corrected (see Hervo et al 2016).
line 42:"Possibilities and limitations depend mainly on the instrument optical design, which will not be addressed here"
Some points need to be addressed (see above).
line 44:
" ... methods occasionally experience sudden jumps in the MLH profile from one layer to another."I remember several attempts to avoid this kind of jumps (see e.g. Haeffelin et al 2011).
line 48:"it is common to reduce the temporal resolution of the input data to one or two-minute segments"
There are two reasons for this: firstly, averaging reduces noise in the data, which is important for gradient-based approaches.
And secondly: mixing is an average process. It is not important how far up single convective plumes reach and how far down single 'dry tongues' from the FT reach (e.g. Couvreux et al 2007 or Traeumner et al. 2011).
line 57:"Vivone et al. (2021) used edge detection techniques ..."
A former version of the STRAT algorithm named STRAT-2D already used edge detection (Haeffelin et al 2011). And is edge detection already machine learning?
line 65 ff.: End of introduction and literature review:
All presented machine-learning-based algorithms combine several parameters. I am missing a strong argument why we now go back to the single parameter aerosol backscatter. I would also recommend to describe the different strategies to derive MLH from backscatter profiles (gradient, fit of a step function, wavelet analysis, etc., see e.g. Haeffelin et al. 2011 or Kotthaus et al. 2023 for an overview of these methods).
line 83:"... and limits manual feature engineering,"
What do you mean by feature engineering? Please give examples.
Figure 1:
I am missing a date, annotations on the axes and a colour scale.
The image also makes the overlap problems of the CHM15k ceilometer visible (change from dark blue to green in the lower fifth and to the right of the center).
line 95:
"...recorded between June 2020 and February 2022"Why do you not use full years of data but instead one year and nine months, covering two winters, one and a half summers and only one spring.
line 98: Range corrected signal:
The output of CHM15k ceilometers depends on the firmware version running on the instrument.
Please specify which output parameter of the instrument you are using (beta_raw?). Does it include corrections for laser power fluctuations, incomplete overlap (in which range), etc.?
line 99:'manufacturer’s algorithm'
There are two extended abstracts of posters and talks presented at the ISARS 2010 conference about the manufacturer's algorithm: Teschke and Poenitz (2010, TP2010) and Frey et al (2010, F2010).
My notes say that TP2010 is the algorithm used by the ceilometer from firmware version 0.719 onwards, and F2010 was the version before. But you should check that with the operator and the manufacturer of the instruments.
'benchmark'
I do not understand what the difference between 'only ... a benchmark' and an evaluation is; please clarify.
line 102:"RCS was capped to [0, 1e6] and rescaled to [0, 1]."
Scaling the RCS is of course a valid method, but as you do not provide a unit for the capping RCS value, this scaling becomes arbitrary. If you want your method to be independent of the instrument, you should provide at least an estimate in physical units (1/(sr m)).
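For illustration, the capping and rescaling as I read it (a sketch; `rcs` is a hypothetical array in uncalibrated instrument units):

```python
import numpy as np

# Hypothetical RCS values in (uncalibrated) instrument units.
rcs = np.array([-2.0e4, 3.5e5, 9.0e5, 2.3e6])

# Cap to [0, 1e6] and rescale to [0, 1]. Without physical units the cap
# value 1e6 is arbitrary, which is exactly the point made above.
rcs_scaled = np.clip(rcs, 0.0, 1.0e6) / 1.0e6
print(rcs_scaled)  # [0.   0.35 0.9  1.  ]
```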
line 105:"The software package labelme (Wada, 2022) was used for annotations"
This means annotation was done manually by visually inspecting the data ?
I recommend to point that out.
line 108:"The main location was Cabauw as this location also provided humidity
profile information..."Please give some more information - geographical coordinates, infrastructure like the tower and other instruments (former CESAR now Ruisdael Observatory etc) literature ?
Where does the humidity information come from ? (Tower, Microwave radiometer, ... ?)
line 109:"Instead of creating a specialised model for the Cabauw ceilometer, the model should generalise to new locations."
By restricting the training and analysis to Cabauw only, this sentence becomes an unproven claim. I would be glad to see how the algorithm performs at any of the other sites.
line 110:
"... several days of data from ceilometers at De Kooy and De Bilt were also annotated."
Could you give some more information (geographical coordinates, why these sites and not any others from the KNMI dataset, and how did you select these 'several days'? What made you choose them?). As far as I understand, annotated training was done with Cabauw data plus a few data from these two other sites. The analysis in section 3.3 and fig. 7 deals only with Cabauw data. Thus the starting sentence of section 2.1 ("a large dataset was downloaded...") is misleading: you used nearly only data from Cabauw. How does the algorithm perform at the other sites?
Figure 2:
Please describe more accurately what is in the top, middle and bottom plots. The enumeration (i, ii, etc.) is not helpful. Colour bars for the colour shading in the top and middle plots would be helpful (which colours code for min, middle and max; is white = cloud and brown = rain?). I see more colours in the gradient plot than you mention in the text. The x-axis annotation could be clearer (is this 'month day hour' or 'day month hour', and is the hour in CET, CEST, LT or UTC?). It would be a good idea to mark sunrise and sunset, or at least provide the times in the figure caption. Minor ticks on the x- and y-axes, e.g. every hour and at 100 m steps, would be helpful.
How do you deal with the noise in the data in the blind zone (below about 100 m)? Would it not be more honest to remove it from the data, as it cannot contain any information? The lidar is literally blind in this region: the laser beam is outside the field of view of the telescope, so the telescope cannot see the laser beam.
The top and middle plots nicely show the problems with the incomplete-overlap correction of the CHM15k: there is a step in the top colour plot and a vertical gradient in the middle plot at about 200 m. This appears as a horizontal mark starting at ~14:00 and running through until the end of the plot. Your annotated MLH follows this artificial line around 21:00, i.e. this step generates an artificial MLH.
I do not understand how you could annotate the afternoon breakdown around 17:30; I do not see any hints why it should be there (no colour change in the top plot, no gradients in the middle plot). And if there should be a faint gradient: could this not just be the end of the plume (or air mass?) with lower backscatter (light green) rising from the ground at about 17:00? Similarly for the annotated MLH from 06:00-09:00 and from 09:30 to 10:00, when there seems to be no gradient.
line 111: What are representative cases? How do you select them; what are the criteria?
line 115:"The information on potential layer boundaries was combined with other data sources ... For example, ... "
I would recommend to give all sources you used and not just examples.
line 116:"... thermodynamic MLH information from Cloudnet (ECMWF) model output (CLU, 2022) ... "
You got this MLH from the Cloudnet database for Cabauw, but it is generated in the Integrated Forecast System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF). I am sure you can find an appropriate reference for this thermodynamic MLH in the IFS and also give a short description of how it is calculated (parcel method, bulk Richardson number, flux minimum, etc.). And your reference under CLU 2022 is neither understandable nor traceable. Cloudnet provides unique identifiers and a 'how to cite' description on its website (see e.g. https://cloudnet.fmi.fi/file/8f6ab79b-4eeb-42ab-abbc-045f9e5e172e).
line 118:"MLH estimates from the manufacturer’s layer detection algorithm were included for comparison purposes."
Is it really a comparison? Or does the person doing the annotation see the manufacturer's MLH and possibly follow it if he or she thinks it is 'a good one'? In that case it would be additional information. And I am wondering what this MLH really is: in figure 2 there are at certain times several MLHs visible, i.e. there is not only one MLH. I guess the instrument provides a first, second and third candidate. Please describe in more detail what 'the MLH' from the instrument is and how you use it.
line 119:"... the nocturnal MLH ..."
This is a common misconception: there is no nighttime mixing layer except in the heat islands of cities. In a rural location like Cabauw, a strong inversion develops during night, called the nocturnal boundary layer (NBL). This is a very stably stratified layer in the lowest few hundred metres of the BL (see Stull 1988 or Garratt 1992). Stable stratification means no vertical turbulent exchange, no mixing. And because there is no mixing, the strong vertical temperature gradient can become even stronger. The surface decouples from the layers above and becomes cooler and cooler due to radiative cooling and surface evaporation. Any mixing would destroy this gradient (see van Ulden and Wieringa 1996). As temperature decreases, relative humidity increases, aerosol particles grow and backscatter increases. There is no mixing, no transport of aerosol involved in this increase in backscatter. The height of the strong backscatter layer is not connected to the NBL; it just represents the layer where, depending on the type of aerosol, RH is above 60-80%. Accordingly, the high backscatter at night near the surface is not a mixing layer, and it also does not identify the NBL.
line 120:"... the humidity profile ..."
I guess you mean relative humidity (RH).
line 122:
" ... the partially visible range of the ceilometer,..."
You mean either the 'no-overlap' or the 'incomplete-overlap' region.
In the no-overlap region the laser beam and the field of view (FOV) of the lidar telescope do not overlap. The lidar cannot see light backscattered from the laser beam; it is literally blind. In this height range there is no information in the ceilometer data about the state of the atmosphere. The incomplete-overlap region is where the laser beam partially lies within the FOV of the telescope. It is possible to correct this (see Hervo et al 2016 or Wiegner and Geiss 2012; note that the latter is about the CHM15kx, with just a note about the CHM15k).
For the CHM15k the no-overlap region ends at a range of roughly 100 m; the incomplete-overlap region ends at roughly 350 m. As very small details of the optics determine the overlap, the manufacturer uses an individual built-in overlap correction function for every ceilometer. This function is not perfect (see remarks on fig. 2). Hervo et al (2016) provided a method to correct it, including its temperature sensitivity. To get precise numbers for the overlap regions, refer to the operator or the manufacturer of the instrument.
line 124:"Humidity levels were similar when there was sufficient mixing ..."
Relative humidity (RH) is not conserved during vertical mixing processes. When RH stays constant with height, this just means that air temperature and dew point increase or decrease (both are possible, especially if specific humidity varies with height) at the same rate with height. RH is thus not a reliable indicator for the height of a mixed layer. Reliable indicators are virtual potential temperature and specific humidity.
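For completeness, the standard approximation for the first of these indicators is

$$\theta_v \approx \theta\,(1 + 0.61\,q)$$

with potential temperature $\theta$ and specific humidity $q$: in a well-mixed layer both $\theta_v$ and $q$ are approximately constant with height, whereas RH generally is not.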
Figure 3: Ceilometer in the front, tower in the background (only the lower half of the tower is visible; van Ulden and Wieringa 1996).
One can also see a scanning microwave radiometer (MWR, left), a scanning cloud radar (CR), a micro rain radar (MRR) and a second microwave radiometer (centre), a scanning Doppler wind lidar (DWL) and two disdrometers (right).
line 136:
"Fig. 2 indicates a MLH between 40 and 140 meters around a 4am UTC on 8 June, which slowly rose to 200 meters around 7am"
As explained above, there is no mixing layer during the night. An increased backscatter signal in the lower levels just represents the layer with high RH. Besides this, any information below 100 m can only be derived from the tower data; the ceilometer is blind in this region. Does it really make sense to train the algorithm with these data?
line 137:
"The transition region from the unstable diurnal to the stable nocturnal MLH may not be clear from aerosol data (Wang et al., 2012)."
You mean the transition from the daytime convective (mixed) boundary layer to a neutrally stratified residual layer (RL) with a stable NBL below. That purely backscatter-based MLH retrievals have difficulties detecting this transition has been shown on a statistical basis in Schween et al. (2014).
line 140:"Fig.3"
You mean figure 2.
line 143:"To differentiate between the stable and convective boundary layer, a night time variable was included."
Two comments on this:
1: There is no reason to identify a surface-based layer during night - there is no mixing in the rural NBL. Any increased backscatter is related to certain values of relative humidity, but not to mixing.
2: When the RL stabilizes during the night, layers with varying backscatter develop, probably due to height-dependent advection of layers with different aerosol (content and hygroscopicity). These layers persist in the morning until the growing ML reaches their respective heights (see figure 2). All algorithms have problems with this. A good example is the manufacturer's algorithm, which nicely identifies all of these backscatter layers.
I therefore would exclude the night from the analysis and, if necessary, provide a morning-hour flag, or even just a variable 'time after sunrise', such that the algorithm can learn by itself what to do when.
I would like to see some more details of the annotation process. What does the annotator see: the same as we see here in figure 2, including RCS, gradient, the manufacturer's MLH detection, the ECMWF data and the RH line plot at the bottom of fig. 2? Does the annotator follow the MLH for every time step (12 sec), or does the software used for the annotation do some interpolation?
And a provocative question: does the color scale of the backscatter image have an influence on where the annotator puts the MLH?
I would also recommend to order this section more clearly:
first describe which data you use:
1: Ceilometer from Cabauw +two other,
2: ECMWF MLH extracted from cloudnet data,
3: Ceilometer manufacturer MLH
4: Tower RH
and then describe the annotation process.
Fig. 4: I am not an expert in computer-based vision or image segmentation, and this looks more like a flow chart to me (from input to output). I guess the left branch is the decoder and the right one the encoder. I would appreciate more description: what do the downward arrows, the upward arrows, the horizontal arrows and the numbers in the boxes mean? (The first two numbers are divided by two from box to box; they start at the input with the image dimensions mentioned in the text. The third number changes in some way, but the product does not stay constant. The first two numbers in boxes connected by horizontal arrows are identical, but the third number is not...)
line 151:"latent space representation"
Please explain what this is.
line 156:"The input dimensions of the RCS image were 224×224 ..."
This means that the analysis is based on chunks of 224 * 12 sec = 44 min 48 sec - you mention that later, but here would be a good point to do so. And I understand that 224 is used because it can be split into many prime factors (224 = 2^5 * 7) and because it fits the number of pixels in the vertical - which of course was chosen for the same reason.
I learn only now that the third number in the boxes in fig. 4 is the number of features. How do you choose the number of features for every step of the decoding and encoding process?
line 164 / Section 2.4 Model Calibration:
I am missing information on which part of the data was used for this pretraining: all of it, the first year, only selected days?
line 168:"Removing the skip connections was necessary as information would otherwise flow directly from input to output without passing through"
I do not understand why it is then built into the algorithm.
line 172:"Given the temporal resolution of 12 seconds, a total of 6976 different images can be extracted from a full day of data,..."
This means you generate a new image for every time step (12 sec), and between subsequent images you have an overlap of 99.6%. Is this large overlap necessary?
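For reference, the window arithmetic implied here (a sketch; the numbers follow directly from the text):

```python
# Window geometry of the sliding 224 x 224 input images.
dt = 12                        # seconds per profile
width = 224                    # profiles per image
span = width * dt              # 2688 s = 44 min 48 s per image
overlap = (width - 1) / width  # one-step sliding window
print(f"window span: {span // 60} min {span % 60} s")
print(f"overlap of consecutive windows: {overlap:.1%}")  # 99.6%
```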
line 177:
"the model reached convergence with a reconstruction loss close to 0."
Can you explain what this means?
line 180:
"Unsupervised pre-training of encoder weights outperformed randomly initialising weights or loading ImageNet weights (results not presented)."
Based on which metric do you know that it outperformed the untrained model ?
What did the result of this pretraining look like ?
Does the network at this stage already recognize the ML ?
Why is additional training on annotated data necessary ?
I am also not sure how the terms 'pre-training' and 'model calibration' are related to each other.
Please clarify. I also do not understand how unsupervised training leads to a meaningful segmentation of the images. Please explain.
line 182:
"transfer training"
What is that ?
line 182:
Sentence starting with
"During model calibration, each batch ... "This sentence is not clear to me. Please reformulate.
If I understand right you extract from every annotated day 16 subimages of size 224x224.
These images are then used as training material. Why not use the full set of annotated images ?
line 184:image pairs are RCS image and image mask from the manual annotation ?
line 186:
Sentence starting with
"For each 45-minute sample ..."seems to be incomplete.
line 189:"... several ..."
Fig. 5 shows exactly three pairs.
line 190:"350,000 samples"
I thought that for each of the 50 days with annotated masks 16 pairs were extracted; that would give only 800 pairs or samples. Where does this large number come from?
line 191:labelled data is equal to annotated images ?
line 192:"A small validation dataset for one additional annotated day was used to tune model hyperparameters"
what are your "hyperparameters" ?
The additional day mentioned here is a 51st day which has been annotated?
How was this day selected?
How can one single day with its specific situation be used to tune and improve the results?
lines 193 - 199:
"The main selection criterion for model evaluation was the mean
accuracy of the generated masks in the validation set,..."I cannot follow:
You are talking here about a //validation dataset// which is used to //tune 'hyper'parameters//. And this set is selected because it performs good.
In my humble opinion an evaluation should use randomly selected data which has not been used for the regression or training to test the performance of a method.
Please clarify.
Figure 5: The figure shows the outcome of the visual-manual annotation process. It would therefore rather fit into section 2.2.
The letters a,b,c used in the caption are not in the figure.
An annotation of time and height would be helpful,
Are times given in LT or UTC?
These three samples are manually annotated images which are used for the training - please include this information in the caption. All three examples show cloud-free morning-to-noon hours. It would be interesting to also see other times, e.g. morning (sunrise, start of convection), early afternoon (fully developed ML), late afternoon (breakdown of convection), evening (sunset...), scenes with clouds, etc.
Subfig. a is from June 2019, although according to section 2.1 the data are from *June 2020* to February 2022.
Why does example c have the lowest MLH although it is noontime in summer?
I even have the feeling that, inside the identified ML, undulating layers with different RCS are visible. That would mean that there is no mixing.
Why is the noisy layer at the bottom of example a higher than in the other two?
In the middle of b an intrusion of light grey (= aerosol-poor air?) from above is visible. It persists for some minutes at lower levels while above it darker pixels are visible again. This is not represented by the ML mask below. What are the rules to deal with such intrusions and, similarly, with extrusions?
What would the outcome of the unsupervised training look like?
line 201:"correspond" not corresponding
line 202:"First, output masks were generated ... using one-minute intervals ..."
I thought model output is a 224x224x1 ML mask (see fig.4) representing about 45min in time.
How can the model produce one-minute intervals with a length of 5 pixels?
line 203:"... overlapping predictions for each time step, which were averaged"
If I understand right, you average the discrete black (in ML) or white (outside ML) values of the ML mask to something in between (some grey tone). Is this done by just arithmetically averaging the white and black values (0 and 1, or 0 and 255, or whatever)?
Please explain that.
line 205 - 210:
"A loss function was formulated ..."
With this text I do not understand what is done. Please be more precise.
I guess you mean that pixels are counted, this count is interpreted as a loss function, and you search for its minimum. But are only pure black and pure white pixels considered, or also grey ones lying above/below a certain threshold (0, 0.5, 1)? Where does the softmax or sigmoid function come into play? Do you use 'the sigmoid function' or 'the softmax function', or a sigmoid-like function?
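To make the question concrete, one plausible reading of the averaging-and-extraction step as a sketch (the array shapes, the 0.5 threshold and the bin size are my assumptions, not the authors' code):

```python
import numpy as np

# Hypothetical stack of overlapping binary output masks (1 = mixing layer),
# already aligned onto a common grid of 224 time steps x 224 height bins.
rng = np.random.default_rng(0)
masks = rng.integers(0, 2, size=(16, 224, 224)).astype(float)

# Arithmetic average of the 0/1 predictions -> the grayscale consensus mask.
consensus = masks.mean(axis=0)              # values in [0, 1]

# One possible extraction rule: per time step, take the highest height bin
# where the consensus exceeds 0.5. Whether the averaged mask is thresholded
# at 0.5, or sigmoid/softmax outputs are used directly, is the question above.
dz = 4500.0 / 224                           # hypothetical height bin size (m)
in_ml = consensus > 0.5                     # (time, height) boolean mask
from_top = np.argmax(in_ml[:, ::-1], axis=1)
mlh_bin = np.where(in_ml.any(axis=1), in_ml.shape[1] - 1 - from_top, 0)
mlh_m = mlh_bin * dz                        # MLH estimate per time step (m)
```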
line 211:This trailing paragraph of section 2.5 is rather an introduction to section 3.1.
line 211:
"... proprietary algorithm of the ceilometer manufacturer based on wavelet covariance transform ... "
You already mentioned the algorithm in section 2.1. Add the information about the principle there and refer here only to the respective section.
line 213:"... STRATfinder was still in active development ..."
The same strategy as for the manufacturer's algorithm: describe it in section 2 and refer here only to that section.
line 214:Acronym IPSL not explained.
line 215:"... data from the second half of 2020 at Cabauw was used as a test set,..."
I have lost track of which parts of the dataset are used for (pre-)training and annotation, and which for this intercomparison; please clarify that at the beginning, at best in section 2. The second half of 2020 means July-December, and thus 2/3 of the summer, the full autumn and the first month of winter. Why not a full year? This means calm autumn days dominate the statistics of this comparison.
line 220:
"the benchmark methods"
I am not very happy with the term 'benchmark': it has several very different meanings in geography, business and computer technology. I would rather call them 'reference methods' and introduce them under this name in section 2.
line 224:"... Fig. 6, top row"
Use letters a, b, c, ... to identify sub plots.
line 224:"... jumps between several residual layers ..."
These are not different RLs but aerosol layers which developed inside the residual of the ML of the day before. This is a typical problem of solely aerosol-based algorithms: these aerosol layers show strong gradients or steps in backscatter, which are then detected as 'the MLH' although they are the result of no mixing between the layers.
line 225:"Deep-Pathfinder and STRATfinder correctly identified the nocturnal MLH around 100-200 meters..."
I repeat: there is no mixing at night. What you can sometimes see is increased backscatter due to RH above 60-80%. What you see in the first hours of 02-10 and 06-08 is that Deep-Pathfinder sticks to the top of the noisy layer, i.e. the lowest layer where the ceilometer can detect anything. You probably annotated the top of the noisy layer as an ML during nights with low aerosol. In such situations the algorithm has to be trained that no detection is possible - as you did for precipitation.
I realize that STRATfinder provides no MLH below ~100 m (minor ticks would be helpful). If it does not find an MLH above, it seems to provide this height. Is this a valid strategy? Check the description of STRATfinder and discuss.
line 229:"All algorithms had difficulties capturing the decline in MLH around sunset, which is a typical limitation for MLH detection based on aerosol observations."
... see Wang et al 2012 or Schween et al 2014.
line 231:
"... a considerable amount of MLH estimates of the Lufft algorithm were missing due
to quality control flags. An example is provided for 9 July 2020,..."
Quality control flags should ensure that a meaningful retrieval is possible. If the quality is bad, no retrieval should be done. You could discuss which flags lead to no retrieval.
Especially 09-07 is very difficult, and I am surprised that Deep-Pathfinder provides an MLH:
The white areas below your retrieved MLH are precipitation, partly not reaching the ground (check data from the Cabauw MRR and the disdrometers). Somewhere above it was stated that in case of precipitation Deep-Pathfinder should deliver zero as an indication that no retrieval is possible. It seems this does not work here. Instead, Deep-Pathfinder as well as STRATfinder identify the cloud base as MLH (recognizable by the dark band above, a result of saturation of the ceilometer receiver, and the noise further up; you may also check the cloud radar). One may discuss whether this is a valid retrieval - I would say no.
line 237:"When a clear convective boundary layer is not apparent (e.g., 10 December 2020), Deep-Pathfinder and STRATfinder were still able to correctly track the shallow MLH
throughout the day."How do you know that this is the correct detection of a mixed layer. A shallow layer with increased backscatter during night and during calm winterdays might be just the layer with highest RH in a layer with a strong temperature inversion. Beside of that, STRATfinder sticks here over long times to its lowest possible output around 100m, and Deep-Pathfinder sticks during several moments to the top of the noise layer. You may check the potential temperature and humidity profiles from the tower whether there is really a mixed layer or stable stratification with high RH.
line 243:"STRATfinder scored an average correlation of 0.591"
This is a poor correlation considering that both algorithms use the same data source. But it is in agreement with former intercomparisons of MLH algorithms based on aerosol backscatter. The question arises whether a solely backscatter-based MLH detection algorithm is meaningful
(see Hennemuth and Lammert 2006, Muenkel et al 2007, Traeumner et al 2011, Haeffelin et al. 2011, Eresmaa et al 2012, Caicedo et al 2017, Kotthaus and Grimmond 2018, Kotthaus et al 2020).
This is an average of daily correlation coefficients for ~210 days. Why don't you provide an overall correlation? Why don't you analyse it season-wise?
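To illustrate the difference between the two statistics, and a season-wise split, a sketch with synthetic series (all column names and numbers are hypothetical):

```python
import numpy as np
import pandas as pd

# Two hypothetical MLH series sharing a common diurnal/synoptic signal.
idx = pd.date_range("2020-07-01", "2020-12-31 23:50", freq="10min")
rng = np.random.default_rng(1)
signal = np.cumsum(rng.normal(size=idx.size))
df = pd.DataFrame({"deep": signal + rng.normal(scale=5, size=idx.size),
                   "strat": signal + rng.normal(scale=5, size=idx.size)},
                  index=idx)

# Mean of per-day correlations (as reported) vs one overall correlation.
daily_r = df.groupby(df.index.date).apply(lambda d: d["deep"].corr(d["strat"]))
print("mean daily r:", round(daily_r.mean(), 3))
print("overall r:   ", round(df["deep"].corr(df["strat"]), 3))
# 'Season-wise' here simply means grouped by month.
print(df.groupby(df.index.month).apply(lambda d: d["deep"].corr(d["strat"])))
```

With a shared low-frequency signal the overall correlation is typically much higher than the mean of daily correlations, which is why both numbers are worth reporting.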
line 247:(end of section 3.2. correlation analysis)
I would like to see a separate analysis of the correlations for the different phases of the development of the ML (morning growth, mature ML, afternoon breakdown).
Possibly it is also worth looking at this during different seasons.
Figure 6:
Add identifiers like a, b, c, ... to the subplots.
Table 1: The diagonal is of course 1 and the table is symmetric about the diagonal. One could use the lower triangular part of the table to present another measure, e.g. standard deviation, mean absolute difference, etc. (see e.g. Haeffelin et al 2011).
I am missing information on how many data records were used.
Fig. 7:
These are "Mean diurnal patterns..."
I am surprised to see that the average nighttime MLH values are much larger (300-400 m) than in the 'typical' examples (e.g. 100-200 m on 02-10, 06-08, 13-10 and 10-12 in fig. 6). These plots show that no MLH retrievals are possible below ~150 m (especially for the STRATfinder algorithm, whose 25% percentile shows a sharp lower boundary).
line 254:"In contrast, nocturnal MLH conditions were more stable throughout the months in the test ..."
As I said: there is no ML during night.
Aside from this, the nighttime values in fig. 7 show that STRATfinder never goes lower than about 150 m. Similarly, Deep-Pathfinder MLHs go only occasionally below 150 m - probably due to the noise layer caused by the lack of overlap, which is at about this height. Both lower limits confine the MLH range during night.
Additionally, I am wondering how the algorithm deals with cases when there is no turbulence and accordingly no mixing, and the MLH should be zero metres (the whole atmosphere between 0 and 224 m is not mixed; the mask should be white everywhere). This never seems to be identified, neither by the traditional algorithms nor by Deep-Pathfinder. But I am sure that this happens: a strong surface inversion can only develop if there is no mixing (see van Ulden and Wieringa, 1996). Or, if you want to relate it to your own experience: imagine calm autumn nights when there is no wind.
line 256:"...a tendency to follow residual layers."
As above: these are aerosol or moist layers in the Residual layer.
line 258:"... in case of multiple cloud layers our annotations typically followed the lower layer, while STRATfinder followed the higher layer."
To investigate whether this influences the average MLH you could remove times with a certain amount of clouds (see e.g. Kotthaus et al 2023) and repeat the analysis. I recommend to do this.
line 259:
"Secondly, Deep-Pathfinder MLH estimates fluctuated more by following short-term reductions in MLH, while STRATfinder did not."
But these fluctuations are much smaller than the mean differences between the algorithms. Apart from that, I would expect that an average of the fluctuating MLH over some minutes comes at least within some tens of metres of the MLH based on a smoothed backscatter signal.
Line 262 - end of section 3.3:
The section ending here contains some hypothesized reasons (different behaviour in the case of clouds, short-term variability of the MLH) for the differences between the algorithms. It would be worth investigating this in more detail: does it help to exclude clouds from the analysis; do 1 min, 15 min and 1 h means of the algorithms still deviate from each other?
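A sketch of the suggested check with synthetic series (all names and numbers hypothetical); averaging removes the short-term fluctuations but not a systematic offset between the retrievals:

```python
import numpy as np
import pandas as pd

# One day of 12 s records: a smooth diurnal MLH cycle, one fluctuating and
# one smooth-but-offset hypothetical retrieval of it.
idx = pd.date_range("2020-07-09", periods=7200, freq="12s")
rng = np.random.default_rng(3)
base = 800 + 300 * np.sin(np.linspace(0, np.pi, idx.size))
df = pd.DataFrame({"deep": base + rng.normal(scale=80, size=idx.size),
                   "strat": base + 60.0},
                  index=idx)

for window in ("1min", "15min", "1h"):
    diff = (df["deep"].resample(window).mean()
            - df["strat"].resample(window).mean()).abs().mean()
    print(f"{window:>5}: mean |difference| = {diff:.0f} m")
```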
line 271:"Note that these experiments did not use any form of unsupervised pre-training."
Is this intercomparison of NN architectures then comparable to the pretrained version?
line 274:
"... was also not ... " instead of "... was no ..."
It would be of interest to shortly explain what the main differences between the neural network architectures are.
line 275:
"... the MLH was extracted for a full day."
For one single day, or for all days in the validation set (July - December 2020)?
line 276:"These statistics were computed with respect to the annotations for the validation set,"
Do I understand right that you use a part of the training set (annotations from 2019-2021) to validate the models in the validation set (July-December 2020)?
line 279:
"A full evaluation on six months of test data was not performed for the alternative architectures."
You try to get an idea of the quality of the different alternative NN architectures by investigating their performance on the validation set. But the validation dataset has been used to improve model performance. Is it not common technique to use an independent 'test dataset' which has not been used for training and validation of the model?
Please clarify.
line 284:
"nocturnal"
typo.
line 284:
"Mean absolute error was the lowest for the noctural MLH"Of course MAE is smallest during this time of the day as ceilometer backscatter is limited at the bottom by the no-overlap region and at the top by the position of the moist non-mixed layer.
line 284:
"MAE was substantially higher for the late afternoon decay in MLH,"Your afternoon extends into the night (16-24UTC) where differences can be expected to be small again (see fig.7). A better split of time would be 16-19UTC . And I am still convinced that a night time analysis makes no sense.
line 285:"Correlation was highest during daytime,"
You could even improve the performance by selecting the time of the fully developed ML (12-16 UTC), when the top of the ML coincides with the top of the aerosol layer, topped by the free troposphere with low aerosol content. This was at least the outcome of Schween et al (2014), who compared a wind-lidar-based retrieval with an aerosol-backscatter-based retrieval.
Table 3:
I am missing information on how many data records were used (50 independent days with in total ~350,000 records at 12 sec resolution, which are of course highly correlated; one could calculate a correlation length and from this an effective number of independent samples, see Lenschow et al 1994).
Table 4:
Is the mean squared error given in metres squared?
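Coming back to the effective number of independent samples mentioned for Table 3, a minimal sketch of a Lenschow-style estimate (using an AR(1) series as a stand-in for real MLH data):

```python
import numpy as np

def integral_timescale(x, dt):
    """Integral time scale: the autocorrelation function integrated up
    to its first zero crossing (one common convention)."""
    x = np.asarray(x, float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf /= acf[0]
    cut = np.argmax(acf <= 0) if (acf <= 0).any() else acf.size
    return acf[:cut].sum() * dt

rng = np.random.default_rng(2)
x = np.zeros(10_000)                       # ~33 h of 12 s records
for i in range(1, x.size):                 # strongly autocorrelated AR(1)
    x[i] = 0.99 * x[i - 1] + rng.normal()
dt = 12.0
tau = integral_timescale(x, dt)
n_eff = x.size * dt / (2.0 * tau)          # effective independent samples
print(f"tau ~ {tau:.0f} s, N = {x.size}, N_eff ~ {n_eff:.0f}")
```

For strongly autocorrelated series the effective sample size is orders of magnitude smaller than the raw record count, which matters for the significance of the reported statistics.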
Line 311:
"The Deep-Pathfinder methodology was robust against differences in annotation methods, leading to different results, but functioning appropriately regardless of the chosen dataset."I do not understand: a different annotation method leads to different results but was appropriate?
Please explain in more detail:
What was the difference in the annotation technique?
What was different in the results?
Why were these differences acceptable?
line 314:"For different types of ceilometers (e.g., Vaisala CL31), it is recommended to repeat the unsupervised pre-training using unlabelled data of the corresponding instrument. This should not be necessary when using Deep-Pathfinder at other locations with the same instrument type."
I would expect that in more complex environments than Cabauw (a homogeneous landscape over tens of kilometres), e.g. mountain valleys or coastal sites with a pronounced land-sea-breeze system, the Cabauw training will fail. But you have experience with other Dutch ceilometer data; you could provide evidence for your statement.
line 325:
"leads to a grayscale output mask (see Fig. 8)."
I guess this is the result of the method described following line 203.
Figure 8 could thus go to that section to illustrate the method.
I am also not sure whether I understand fig. 8: around 17-19 UTC there are two distinct MLHs; can you explain what happens here and what the final MLH would be?
(Does Deep-Pathfinder stick to the upper layer and at a certain moment realise the MLH is lower, and then from 17:45 UTC on decide that previous MLHs must also be rather at 600 m than at 1200 m?)
line 326:
"value of the loss *function*"Missing word
line 327:
"Further quality control flags could be set if rain is detected,"
I thought you set MLH to zero if rain is detected.
I would recommend the same for cloudy moments (see the discussion of clouds in Kotthaus and Grimmond 2018 and Kotthaus et al 2023). Ceilometers deliver cloud base height, which can easily be used for a cloud flag or, when averaged over longer time periods, as a cloud fraction.
line 329:
"Instead of using only two output classes (i.e., mixing layer or not), image segmentation methods are suitable for the detection of multiple classes. ..."
This is indeed an interesting possibility, and I guess you can give more details about your ideas on how to approach this from ceilometer data alone (top of the aerosol layer and residual layer), or with the help of other algorithms (e.g. PathfinderTURB, Poltera et al. 2017, or Manninen et al 2018), which of course would require a whole set of additional data which, I guess, are all available at Cabauw (surface sensible heat flux, profiles of horizontal and vertical wind, cloud bases, etc.).
Figure 9:
This is not your work. Did you ask for permission to publish it in your paper?
line 337:"U2-Net has so many skip connections that it would not be feasible to apply the pre-training approach used in our study."
I thought you cut all the skip connections for the pretraining in your model. Why not in U2-Net too?
line 348:
"One challenge for model development is that no ground truth MLH data is available,"
But there are several different methods based on different physical parameters: radiosonde-based temperature profiles, radar-wind-profiler-based turbulence profiles, Doppler-lidar-based wind profiles, etc. All of them could be used for intercomparison.
line 352:"However, manual feature engineering based on expert decisions is avoided,"
But for the NN architecture intercomparison you used only learning based on annotated data. Annotating is highly based on expert knowledge: the annotating person knows (or learns quickly) during which time of the day which behaviour of the MLH can be expected, what to do in the case of clouds or rain, etc.
line 358:"The availability of real-time MLH estimates from a large-scale ceilometer network could be used for the advancement of NWP models via data assimilation."
I am not sure whether MLH can easily be assimilated into NWP models. Do you have references for this?
--------------------------------------------------------------------------------
References:
Caicedo et al 2017: Comparison of aerosol lidar retrieval methods for boundary layer height detection using ceilometer aerosol backscatter data, Atmos. Meas. Tech., 10, 1609-1622, doi:10.5194/amt-10-1609-2017
Couvreux et al 2007: Negative water vapour skewness and dry tongues in the convective boundary layer: observations and large-eddy simulation budget analysis, Boundary-Layer Meteorol., 123, 269-294, doi:10.1007/s10546-006-9140-y
Eresmaa et al 2012: A Three-Step Method for Estimating the Mixing Height Using Ceilometer Data from the Helsinki Testbed, J. Appl. Meteorol. Climatol., 51, 2172, doi:10.1175/JAMC-D-12-058.1
Fitzgerald 1984: Effect of relative humidity on the aerosol backscattering coefficient at 0.694- and 10.6-µm wavelengths, Applied Optics, 23(3), 411-418, doi:10.1364/AO.23.000411
Frey et al 2010: Detection of aerosol layers with ceilometers and the recognition of the mixed layer depth, extended abstract of a poster presented at the International Symposium for the Advancement of Boundary Layer Remote Sensing (ISARS), June 2010, Paris, P-BLS/12, http://www.isars2010.uvsq.fr/index.php?option=com_content&view=article&id=42&Itemid=36 , http://www.isars2010.uvsq.fr/images/stories/PosterExtAbstracts/P_BLS12_Frey.pdf
Garratt 1992: The Atmospheric Boundary Layer, Cambridge University Press, ISBN 0-521-46745-4
Haeffelin et al 2011: Evaluation of Mixing-Height Retrievals from Automatic Profiling Lidars and Ceilometers in View of Future Integrated Networks in Europe, Boundary-Layer Meteorol., doi:10.1007/s10546-011-9643-z
Hennemuth and Lammert 2006: Determination of the atmospheric boundary layer height from radiosonde and lidar backscatter, Boundary-Layer Meteorol., 120, 181-200, doi:10.1007/s10546-005-9035-3
Hervo et al 2016: An empirical method to correct for temperature-dependent variations in the overlap function of CHM15k ceilometers, Atmos. Meas. Tech., 9, 2947-2959, doi:10.5194/amt-9-2947-2016
Kotthaus et al 2023: Atmospheric boundary layer height from ground-based remote sensing: a review of capabilities and limitations, Atmos. Meas. Tech., 16, 433-479, doi:10.5194/amt-16-433-2023
Lenschow et al 1994: How Long Is Long Enough When Measuring Fluxes and Other Turbulence Statistics?, J. Atmos. Oceanic Technol., 11(3), 661-673, doi:10.1175/1520-0426(1994)011<0661:HLILEW>2.0.CO;2
Muenkel et al 2007: Retrieval of mixing height and dust concentration with lidar ceilometer, Boundary-Layer Meteorol., 124, 117-128, doi:10.1007/s10546-006-9103-3
Ouwersloot and Vilà-Guerau de Arellano 2013: Analytical Solution for the Convectively-Mixed Atmospheric Boundary Layer, Boundary-Layer Meteorol., doi:10.1007/s10546-013-9816-z
Poltera et al 2017: PathfinderTURB: an automatic boundary layer algorithm. Development, validation and application to study the impact on in situ measurements at the Jungfraujoch, Atmos. Chem. Phys., 17, 10051-10070, doi:10.5194/acp-17-10051-2017
Schween et al 2014: Mixing-layer height retrieval with ceilometer and Doppler lidar: from case studies to long-term assessment, Atmos. Meas. Tech., 7, 3685-3704, doi:10.5194/amt-7-3685-2014
Stull 1988: An Introduction to Boundary Layer Meteorology, Atmospheric and Oceanographic Sciences Library, Springer Netherlands, ISBN 978-94-009-3027-8
Teschke and Poenitz 2010: On the Retrieval of Aerosol (Mixing) Layer Heights on the Basis of Ceilometer Data, extended abstract of a talk presented at the International Symposium for the Advancement of Boundary Layer Remote Sensing (ISARS), June 2010, Paris, http://www.isars2010.uvsq.fr/index.php?option=com_content&view=article&id=41&Itemid=35 , http://www.isars2010.uvsq.fr/images/stories/PosterExtAbstracts/P_RET07_Ponitz.pdf
Traeumner et al 2011: Convective Boundary-Layer Entrainment: Short Review and Progress using Doppler Lidar, Boundary-Layer Meteorol., 141, 369-391, doi:10.1007/s10546-011-9657-6
Tuomi 1976: Backscatter of light by aerosol at high relative humidity, J. Aerosol Sci., 7, 463-471, doi:10.1016/0021-8502(76)90051-3
van Ulden and Wieringa 1996: Atmospheric boundary layer research at Cabauw, Boundary-Layer Meteorol., 78, 39-69, doi:10.1007/BF00122486
Wiegner and Geiss 2012: Aerosol profiling with the Jenoptik ceilometer CHM15kx, Atmos. Meas. Tech., 5, 1953-1964, doi:10.5194/amt-5-1953-2012
Citation: https://doi.org/10.5194/amt-2023-80-RC2
-
AC2: 'Reply on RC2', Jasper Wijnands, 25 Sep 2023
The comment was uploaded in the form of a supplement: https://amt.copernicus.org/preprints/amt-2023-80/amt-2023-80-AC2-supplement.pdf