Lidar temperature series in the middle atmosphere as a reference data set – Part 1: Improved retrievals and a 20-year cross-validation of two co-located French lidars

Wing, Robin; Hauchecorne, Alain; Keckhut, Philippe; Godin-Beekmann, Sophie; Khaykin, Sergey; McCullough, Emily M.; Mariscal, Jean-François; d'Almeida, Éric

doi:https://doi.org/10.5194/amt-11-5531-2018

Articles | Volume 11, issue 10

© Author(s) 2018. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article

10 Oct 2018

Final revised paper (published on 10 Oct 2018)
Preprint (discussion started on 02 May 2018)

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

RC1: 'Review comments on Wing et al. (2018) Part A.', Anonymous Referee #1, 01 Jun 2018
- AC1: 'Response to Referee 1', Robin Wing, 16 Jul 2018
RC2: 'Review of Wing et al., Part A', Anonymous Referee #2, 04 Jun 2018
- AC2: 'Response to Referee 2', Robin Wing, 16 Jul 2018
RC3: 'Comment on “Lidar temperature series in the middle atmosphere as a reference data set. Part A: Improved retrievals and a 20 year cross-validation of two co-located French lidars” by Wing et al.', Anonymous Referee #3, 04 Jun 2018
- AC3: 'Response to Referee 3', Robin Wing, 16 Jul 2018

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

AR by Robin Wing on behalf of the Authors (16 Jul 2018) Author's response Manuscript

ED: Referee Nomination & Report Request started (18 Jul 2018) by Markus Rapp

RR by Anonymous Referee #3 (09 Aug 2018)

Suggestions for revision or reasons for rejection

Thank you for providing additional information and clarifications. Though most of my concerns were addressed adequately in your response, some issues still remain. I wrote additional comments concerning those issues below. Where relevant, I reproduced my original remarks/questions and your responses.

Original comment: Section 3.3.3: Since you do not precisely explain what the Matlab Neural Code does and how the blue trace is derived, I suggest you remove that part and shorten this section.

Author: I have had several discussions at NDACC lidar meetings and at the last IRLC about using machine learning and MatLab’s neural network toolbox to estimate lidar profile backgrounds. I think that this plot will be interesting to several people.

Reviewer: I do not question potential interest within the community in machine learning and using Matlab’s toolbox. Machine learning is a complex and much hyped topic, but many issues regarding e.g. reproducibility of the results are not yet well understood, as results often depend on how the particular network was trained and what training data were used.

If you do not provide any information about how you set up the neural network and how it was trained, nobody will ever be able to reproduce your findings or compare with other results. Thus, publishing those results will be worthless for the science community.

You may show results obtained with the Matlab Neural Network tool in your paper. However, you should briefly explain what you did. Just saying “I used this toolbox and that is what I got” is no scientific work. On the other hand, you could say “I took the Fourier transform of time series x and here is the spectrum”, as the Fourier transform is a well-defined algorithm which will always produce the same results when applied to the same data sets. With neural networks many things can go wrong because their behavior is not so well-defined. In your case that is especially important given your statement “the software requires an exhaustive set of example bad profiles which we cannot supply”. So how was the network trained without a representative training data set? Did you use any trick nobody else knows?

Coming up with a precise description of what the neural network did will likely be difficult. For that reason, and because the neural network part is not essential for your paper’s main results, I suggested to remove this part. However, you may insist on showing these results. In that case I will insist on a description of the neural network. That description may be brief, but all essential information which allows someone to repeat your steps must be given.

Line 245: “bad scans” -> “bad profiles”?

Original comment: Lines 267-269: What is the reason for choosing “the point where the signal to noise equals one in the density profile”?

Author: It seems like a reasonable choice for deciding on an arbitrary starting point. We were motivated by getting temperatures in the UMLT. However, the point is well taken. I’m aware that other groups use other definitions. It would be good to see a study devoted specifically to this topic.

Reviewer: I may sound harsh, but your answer “it seems like a reasonable choice” does not convey any scientific value. To be more precise, why is that a reasonable choice? If you did not investigate this question, it is ok to say so (given the limited time, no study can be perfect). I am aware of different groups using different definitions for the starting point, and I was just curious whether you have any convincing arguments for a particular definition.
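The starting-point criterion under discussion can be illustrated with a minimal sketch. It assumes a constant background and pure Poisson counting noise (noise = square root of the raw counts); the function names and the synthetic profile are hypothetical and are not taken from the manuscript, whose exact definition of signal to noise in the density profile is not given here.

```python
import math

def snr_profile(counts, background):
    """Signal-to-noise ratio per altitude bin.

    Assumptions (not from the manuscript): the background is constant,
    signal = counts - background, and noise = sqrt(counts) (Poisson).
    """
    snr = []
    for n in counts:
        signal = n - background
        noise = math.sqrt(n) if n > 0 else float("inf")
        snr.append(signal / noise)
    return snr

def first_altitude_below_snr(altitudes_km, counts, background, threshold=1.0):
    """Lowest altitude where the SNR is at or below the threshold, else None."""
    for z, s in zip(altitudes_km, snr_profile(counts, background)):
        if s <= threshold:
            return z
    return None

# Synthetic profile: the signal decays with altitude onto a background of 100 counts.
altitudes_km = [60, 70, 80, 90, 100, 110]
counts = [10100.0, 1100.0, 200.0, 110.0, 101.0, 100.0]
background = 100.0

top = first_altitude_below_snr(altitudes_km, counts, background)
print(top)
```

Changing `threshold` in such a sketch is one simple way to see how far the starting altitude moves under a different definition, which is the sensitivity the reviewer is asking about.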

Original comment: Section 3.5.3: You should not attempt to “correct” signal induced noise. It is fundamentally impossible to characterize properly signal induced noise in lidar signals because the noise is superposed on the atmospheric signal. Determining the signal induced noise from the background signal above the lidar signal is bound to fail because you are essentially observing the noise at different times outside the period where you actually are interested in. Signal induced noise is highly non-linear and therefore it is impossible to properly correct it. The data should be regarded as corrupt and not be used in lidar analysis. Besides, significant signal induced noise (e.g. blue trace in Figure 9) indicates that detectors are operated outside safe limits or there is a general technical problem with the lidar. If you insist on using the questionable data, you should assess how the retrieved temperature profile changes when you tweak your model representing the signal induced noise (e.g. cubic versus linear). How do your retrieved profiles compare to independent observations e.g. radiosondes at lower altitudes?

Author: I disagree with the conclusion that we should not make the attempt at a SIN correction. You are quite right that a perfect correction might be impossible. However, we have found that a correction of the sort described in the paper, for the types of signal induced noise that we see at OHP, can be adequately applied for the purposes of our temperature retrievals. The effect of this signal induced noise in our profiles, when uncorrected, is to warm the upper altitude regions of the temperature profiles. Conveniently, we have two measurement channels (the high and low gain channels) which make coincident measurements in this region. Typical count rates within this region are well within the linear response regime of the high gain channel; therefore dead time correction is not required at these altitudes, and we can believe the high gain channel temperature profile in this region. The quadratic correction for signal induced noise in the low gain channel brings the resulting low gain temperatures into agreement with those from the high gain channel at these high altitudes.
While it would be wonderful to eliminate every stray source of noise in the lidar, we cannot do this for the measurements going back 40 years and more - which form a valuable data set. We also point out that the effect of this quadratically-characterized signal induced noise is negligible at low altitudes: For example, in Fig. 9, the SIN contribution at 30 km is less than 100 counts, compared to a bg + signal value in the tens of MHz (see fig03). In terms of contribution to temperature, this is so small as to not be observable.
I did some initial quality testing between my 3 channel lidar temperature retrieval and the radiosondes launched from the station at Nimes (~150 km west) and the results are reasonable. There are some expected differences, but the results can be very good when the sonde travels directly east. That said, the focus of this paper is above 30 km, and a full radiosonde comparison study with calculated air mass trajectories would be a good project for the next student.

Reviewer: I agree, we can’t change the past and need to work with the data at hand. Above you mention the agreement between high and low channel temperatures, which improves when the quadratic correction is applied. That is very valuable information, because it gives credibility to your approach, and should be mentioned in your manuscript as well. On the other hand, your validation works only for the lower channel. In the absence of any validation for the upper channel, which shows a completely different behavior (linear versus cubic), you could at least provide an estimate of the magnitude of the correction, e.g. x K at 75 km, where x is the difference between temperature profiles retrieved with and without correction. If x is sufficiently large (e.g. >1 K), that should be acknowledged as a potential source of error, as signal induced noise is a dynamic phenomenon which commonly depends on several factors (e.g. peak intensity, average intensity, particular type of detector) and thus likely varies over a broad range of time scales (from pulse-to-pulse to months). The problem is that signal induced noise causes a non-Gaussian error, so integrating longer does not help you. Your correction most likely helps alleviate the problem, but it won’t be perfect. How well it really works, we don’t know. E.g. you may unknowingly overcorrect the noise, resulting in a cold bias, or undercorrect and still retain a warm bias. Only the comparison with an independent data set can tell whether your correction is working as it is supposed to. However, if your x is small (I think it is. Unless both lidar systems show exactly the same behavior, I would expect larger differences in Fig. 15 for a large x.), you may argue that the effect of signal induced noise on temperature is small as well.
Because of the problems it causes, most groups try to avoid signal induced noise by limiting the peak count rate to safe levels.
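The magnitude estimate requested above (x K at a given altitude, flagged when it exceeds about 1 K) amounts to a simple difference between two retrievals. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the sign convention (uncorrected minus corrected) is chosen here for convenience, and the temperatures are invented placeholder values, not results from the paper.

```python
def correction_magnitude(altitudes_km, t_uncorrected_k, t_corrected_k, flag_above_k=1.0):
    """Return (altitude, x, flagged) per level.

    x = T_uncorrected - T_corrected, so a positive x means the correction
    cools the profile. 'flagged' marks |x| above flag_above_k (1 K by default,
    following the reviewer's example threshold).
    """
    rows = []
    for z, t_raw, t_cor in zip(altitudes_km, t_uncorrected_k, t_corrected_k):
        x = t_raw - t_cor
        rows.append((z, x, abs(x) > flag_above_k))
    return rows

# Placeholder profiles (synthetic values for illustration only).
altitudes_km = [55, 65, 75]
t_uncorrected_k = [250.0, 220.0, 200.0]
t_corrected_k = [249.5, 218.0, 195.0]

for z, x, flagged in correction_magnitude(altitudes_km, t_uncorrected_k, t_corrected_k):
    print(f"{z} km: x = {x:+.1f} K{'  (potential error source)' if flagged else ''}")
```

Running the same comparison with two different noise models (for example linear versus quadratic) in place of corrected versus uncorrected would give the model-sensitivity estimate raised in the original comment.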

Original comment: Figure 13: It is hard to estimate absolute temperature differences. I suggest you use a segmented color bar with 6-10 different colors. Can you provide a plot showing combined temperature error estimates of both lidar data sets? There is a period in mid 2001 with distinct blue color (negative temperature differences) between 30 and 55 km altitude. Could these observations also have been affected by misalignment? A similar area can be found right after the last marked region in 2011.

Author: The same information is already presented in a more compact way in Fig. 14. I’ve added the following text: ‘For reference, a typical LTA temperature profile with an effective vertical resolution of 2 km has an uncertainty due to statistical error of 0.2 K at 40 km; 0.4 K at 50 km; 0.6 K at 60 km; 0.7 K at 70 km; 1.8 K at 80 km; and 602 K at 90 km. For reference, a typical LiO3S temperature profile with an effective vertical resolution of 2 km has an uncertainty due to statistical error of 0.3 K at 40 km; 0.5 K at 50 km; 1.0 K at 60 km; 2.7 K at 70 km; and 10 K at 80 km.’ I cannot account for the blue regions in Fig. 13 based on either lidar uncertainty budget or through geophysical explanations. Yes, you’re correct: the blue bias between 30-50 km is likely due to misalignment. Given 5 mirrors in LTA and 4 mirrors in LiO3S, there are many possible ways to be misaligned, and the severity of the misalignment can vary as well.

Reviewer: If blue (or red) biases outside the boxes may be caused by misalignment, then misalignment is obviously a major source of error which ultimately limits the accuracy (and, depending on time scale, also precision) of your measurements.

RR by Anonymous Referee #2 (20 Aug 2018)

Suggestions for revision or reasons for rejection

This review is on the revised manuscript of the Wing et al. paper. The authors addressed the comments of the previous review in the extensive (appreciated!) reply and in the new manuscript. The paper is improved and much more comprehensible now. Nevertheless, there are still some remarks, either repeated from the first review or concerning new topics that come up with the new version.
The general challenge of the paper is the combination of different aspects (or goals of the paper) and their proper description in the text, which I see better now after reading the review reply. I find three goals: i) the introduction of the long-term data set and the instrumental changes, ii) treatment of this heterogeneous data set – or quality assurance - for the use in the accompanying paper, iii) improvement of the temperature algorithm and reduction of the bias compared to satellite soundings. Of course, these goals cannot be completely separated from each other, but they often affect different altitude sections. I recommend making these goals clearer, subdividing the text appropriately, or referring each (sub)section to its particular goals. As an example, the instrumental achievements for background reduction (Sec. 3.4) are dedicated to the first goal, but the deadtime correction (3.5.1) to the second.

Line 45 – end: Please refer “first part” (second, third) to the Section numbers used in the manuscript.
l. 72: Please explain shortly how your Mie channel is working if it uses a similar filter as the Rayleigh channels.
Table 1: Please check the dimension of the mirror diameters.
Section 3.2: I got confused (and the general reader may also) about the altitude resolutions, integration times, and potential smoothing. I find raw data with 75 m resolution, temperature data with 300 m and 2 km, nightly means and individual profiles. I recommend showing raw data only at the resolution used for the quality control, and temperatures as nightly means with the altitude resolution used for the accompanying paper.
l. 184: I suggest writing “overestimation of the background due to localized signal contaminations” (noise should not be confused with background)
l. 185: “warmer temperatures” should read “higher temperatures”
l. 185-187: I suggest removing these sentences, because signal contaminations will normally not result in underestimation of the true background (detector noise, moonlight etc.). The sentences may be moved to Section 3.5.3.
l. 240 to …: Please accept that the reader may not be able to identify three or more groups, but only “high background at low signal” and vice versa. Please explain in more detail or (preferred!) just state after introducing the red lines in Fig. 6 that this case may be simple but you are seeking for a flexible solution for 20 years of data (instrumental changes) and various conditions. Furthermore, express that you are searching for the outliers within a single night, not bad-signal profiles in general.
l. 249-254: I still recommend to delete this section and the blue line. This is not a project report. I doubt that you want to state that the Matlab software cannot be used in general. Therefore this section may at most be interesting for the NDACC people you talked to, but the general reader will be confused.
l. 260-267: I am sorry but I still do not understand how the MWW rank-sum test is applied here. As far as I understand, the test checks which of two distributions is larger. Do you simply want to check whether the background is larger than the signal at 35-45 km? Then you may not need the cumulative sum. How is the rank sum calculated, and how does this compare to the cumulative sum of either background or signal? Why are the first 13 profiles discarded? From my point of view this test is a central method of the quality control procedures and therefore should be clearly described.
l. 263: As mentioned above, please clearly distinguish between “background” (the average count rate above the usable range, i.e. at 120-153 km) and “noise” (the statistical uncertainty of the count rate at a given height). The background count rate has its noise, but so does the signal. Colloquially, the two terms are often confused, but they must not be in a publication.
Section 3.4: You should make clear that this Section is of different “character” than the other ones. Here you describe some (important) instrumental changes that have been done in the past, not your recent software developments. The removal of SIN is described in Section 3.5.3, i.e. either discard this paragraph in 3.4 or add a reference.
l. 346: It should read “X to Y km”, but the text in brackets can also be removed. The remaining text is clear enough.
l. 423: Please explain (in Section 2?) how you combine the N2 Raman channel with the Rayleigh channel. How is the aerosol correction done? Is the Raman channel simply treated as the molecular channel below 30 km?
Fig. 10: The paper mainly deals with nightly mean profiles; therefore I recommend to show a mean profile here. Furthermore you should indicate the transitions between the channels. I generally doubt that an error of 30 % is useful, even for statistical analyses. I recommend using much smaller error margins to avoid unrealistic temperature gradients.
l. 452: Please explain in some more detail the differences between the standard NDACC retrieval and yours. I assume that also NDACC has some quality assurance measures like removal of SIN contamination, removal of TES, or deadtime correction.
Fig. 11: This Figure somewhat demonstrates the confusion about the goals of the paper. The improved retrieval is effective mainly above 75 km. The validation by the co-located lidar is made below 75 km. The companion paper uses data up to 80 km. I do not criticize this figure, but would like to see some clarifications throughout the manuscript, which particular topic is addressed. This would help the reader finding the context of this long paper.
Figure 11 (second topic): Is the variance of the SABER data relevant? I assume it shows mainly geophysical variation of the temperatures in the data set. Median errors of all profiles are more important. How many profiles contribute to the ensemble, what are the criteria for temporal and spatial matching?
l. 480: It is confusing that you stress the advantages of your retrieval compared to NDACC in the first part of this paper, but then use the NDACC code for comparison between the lidars.
l. 514: Before, a resolution of 300 m has been mentioned.
l. 516: Do you really mean 602 K?
l. 543: Here, 300 m resolution is used again. Please explain or unify.
There are a lot of typos and odd formulations throughout the manuscript. I recommend consulting advice.
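
Two of the remarks above (l. 260-267 and l. 263) concern well-defined quantities that a short sketch can make concrete: the difference between "background" and "noise", and how a rank sum is computed. The sketch below is generic and is not the authors' quality-control code. The function names and sample values are hypothetical, the 120-153 km window is the one named in the remark on l. 263, and Poisson counting statistics are assumed for the uncertainties.

```python
import math

def background_and_noise(altitudes_km, counts, z_min=120.0, z_max=153.0):
    """Background = mean count in the window; also return its uncertainty.

    Assuming Poisson statistics, the standard error of the mean is
    sqrt(total counts in the window) / number of bins.
    """
    window = [n for z, n in zip(altitudes_km, counts) if z_min <= z <= z_max]
    background = sum(window) / len(window)
    background_noise = math.sqrt(sum(window)) / len(window)
    return background, background_noise

def rank_sum(sample_a, sample_b):
    """Return (W, U) for sample_a in the pooled, ranked data.

    W is the sum of the ranks of sample_a (ties receive the average rank);
    U = W - n_a * (n_a + 1) / 2 is the Mann-Whitney statistic.
    """
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    w_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        w_a += mean_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a = len(sample_a)
    return w_a, w_a - n_a * (n_a + 1) / 2.0

# Synthetic counts: only the 120-150 km bins fall inside the background window.
alts = [110, 120, 130, 140, 150, 160]
cnts = [130.0, 100.0, 104.0, 96.0, 100.0, 90.0]
bg, bg_noise = background_and_noise(alts, cnts)

# Noise of a single bin (Poisson): sqrt of its count, distinct from the background level.
bin_noise = math.sqrt(cnts[0])

w, u = rank_sum([5, 6, 7], [1, 2, 3, 6])
print(bg, bg_noise, w, u)
```

U ranges from 0 to n_a * n_b; values near the upper end indicate that sample_a tends to be larger than sample_b, which is the sense in which the test "checks which of two distributions is larger".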

ED: Publish subject to minor revisions (review by editor) (04 Sep 2018) by Markus Rapp

AR by Robin Wing on behalf of the Authors (26 Sep 2018) Author's response Manuscript

ED: Publish as is (27 Sep 2018) by Markus Rapp

AR by Robin Wing on behalf of the Authors (27 Sep 2018)

Short summary

The objective of this work is to minimize the errors at the highest altitudes of a lidar temperature profile which arise due to background estimation and a priori choice. The systematic method in this paper has the effect of cooling the temperatures at the top of a lidar profile by up to 20 K – bringing them into better agreement with satellite temperatures. Following the description of the algorithm is a 20-year cross-validation of two lidars which establishes the stability of the technique.