Quality Control for Second-Level Radiosonde Data Based on Bezier Curve Fitting

Lai, Huixia; Chen, Lian; Zhang, Hualin; Tian, Ye; Zhang, Weijie; Wang, Bo; Zhang, Shi

doi:https://doi.org/10.5194/amt-2024-164

Preprints

06 Nov 2024

Status: this preprint was under review for the journal AMT but the revision was not accepted.

Abstract. Balloon-borne radiosonde observations provide high-resolution profiles of pressure, temperature, relative humidity, and winds from the surface to the middle stratosphere. These observations help validate space-based data and are used in climate research and weather forecasting. Manual quality control (QC) of the large volume of second-level radiosonde data is tedious and time-consuming. Furthermore, varying experience and different judgment standards may lead to inconsistent judgments of abnormal data. To address these issues, we propose a two-stage QC method for second-level radiosonde data based on Bezier curve fitting. In stage QC1, gross errors are filtered out according to the measurement range of the sensors, the change rates, and extreme temperature values based on pressure segmentation. In addition, the longest descending sequence (LDS) algorithm is used to identify the moment of sounding termination and eliminate items after that moment. In stage QC2, we score each item with deviations calculated using Bezier curve fitting, and then use a decision tree model, CART, to identify anomalies in second-level radiosonde data. The experimental results first demonstrate the efficacy of QC at each step, and then validate the rationality of our method by comparing the statistical characteristics before and after QC. After QC, the error items are greatly reduced, and the percentile profile distributions of temperature, pressure, and relative humidity become more reasonable. The overlap of items identified by manual QC and automatic QC reaches 86 %, verifying the effectiveness of our method. This research significantly boosts QC efficiency and unifies the QC standards, providing quality assurance for various applications.

Received: 21 Sep 2024 – Discussion started: 06 Nov 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1:
'Comment on amt-2024-164', Anonymous Referee #2, 20 Nov 2024

For the large amount of second-level radiosonde data, manual QC is tedious and time-consuming, and it also brings some technical problems, such as inconsistent judgments and missed errors. This manuscript proposes a two-stage QC method for second-level radiosonde data based on Bezier curve fitting. The gross errors are filtered out in QC1, and some other errors are filtered out in QC2. The basic idea is novel and technically sound, and the experiments are solid, but there are some problems related to technical details, evaluation, and writing.

1. The overlap of items identified by manual QC and automatic QC reaches 86%. The algorithmic QC does not seem that good.

2. LDS is used to identify the moment of sounding termination. Why only use LDS to determine the moment of sounding termination, instead of outputting the longest decreasing sequence?

3. Is this method applicable to radiosonde data collected from other upper-air sounding stations?

4. Some typos, such as “equipmenfaultslt” in line 31 and “toeliminateg” in line 34.

Citation: https://doi.org/10.5194/amt-2024-164-RC1
- AC1: 'Reply on RC1', Shi Zhang, 23 Nov 2024
  
  We thank the reviewer for the valuable comments. In the following, we reply to the comments one by one; we will revise the manuscript later.
  1. The overlap of items identified by manual QC and automatic QC reaches 86%. The algorithmic QC does not seem that good.
  Response: At present, the quality control of second-level radiosonde data is carried out manually to screen out erroneous data. The operational specification released by CMA, "Operational specification for routine upper air meteorological observation", does not propose quantitative criteria for filtering out errors. It only provides some guiding opinions, and the understanding of the criteria varies from person to person. Four quality control staff are in charge of the QC task. Therefore, it is hard to reach a consensus on the QC of the radiosonde data, especially for some micrometeorological errors, random errors, and systematic errors.
  The experimental data span 3 years, and the QC task was completed by the 4 QC staff individually. We think that an overlap rate of 86% is a good result. In addition, the experimental results show that the overlap rate for gross error judgment in the QC1 stage reached 98.39%.
  2. LDS is used to identify the moment of sounding termination. Why only use LDS to determine the moment of sounding termination, instead of outputting the longest decreasing sequence?
  Response: During its ascent, a balloon may fluctuate up and down because of turbulence. If we output the longest decreasing sequence and filter out the other data, the final sequence may not truly reflect the complete process of the balloon from release to burst, and the moment of sounding termination may be earlier than the real one.
  3. Is this method applicable to radiosonde data collected from other upper-air sounding stations?
  Response: Yes, our method can be applied to radiosonde data collected at other upper-air sounding stations. It needs to be trained with the historical data of the specific sounding station before it is applied. The training process, which includes collecting statistics and training the decision tree, customizes the model so that it adapts to the data characteristics and QC requirements of different sounding stations.
  4. Some typos, such as “equipmenfaultslt” in line 31 and “toeliminateg” in line 34.
  Response: Thanks for the reviewer’s comment. “equipmenfaultslt” should be “equipment faults”, and “toeliminateg” should be “to eliminate”. They are caused by our carelessness. We have revised the manuscript carefully, and corrected them in the revised manuscript.
  
  Citation: https://doi.org/10.5194/amt-2024-164-AC1
RC2:
'Comment on amt-2024-164', Anonymous Referee #3, 21 Nov 2024

Comment Summary
The manuscript proposes an automatic quality control (QC) method for second-level radiosonde data including the utilization of Bezier curve fitting. The topic is relevant, as improving QC methods for radiosonde data is essential for meteorological applications, and the attempt to automate QC to reduce inconsistencies in manual methods is well-motivated and aligns with the need for standardization. However, there are significant flaws in its design and execution that hinder its contribution to the field. Below is a detailed review of the manuscript, including critical points of concern.
Major Concerns
1. Lack of Comparison with Existing QC Methods
Several automated QC programs for radiosonde data already exist. The lack of comparison undermines the novelty and relevance of the proposed method. The manuscript does not compare the proposed QC1+QC2 method with other established QC methods, or automated QC software like Aspen developed at NCAR EOL. Without highlighting differences or advantages, the study fails to justify why this method is necessary or superior, or at least suitable for future application.
2. Misuse of Bezier Curves
Bezier curves are primarily used for creating smooth, visually pleasing lines. The high-resolution nature of upper-air radiosonde data is often important for weather and climate. Bezier curve fitting does not seem suitable for filtering radiosonde data, especially as only 2nd-order fitting is applied in this study. It is unclear why this approach was chosen.
3. Insufficient Instrument Details
The authors do not adequately describe the radiosonde’s hardware, factory specs, sensor measurement range and error, and its existing systematic QC capabilities. Error range data and laboratory calibration results are missing, making it difficult to evaluate how the proposed QC addresses hardware limitations.
4. Overly Complex Methodology
The QC method is unnecessarily complex for the problem it aims to solve. QC1 simply deals with filtering outliers and determining the termination point. Such filtering tasks can be conducted with simple random error detection such as that provided by NIST (https://www.itl.nist.gov/div898/handbook/pmd/section2/pmd214.htm). For termination point detection, the authors claim that the LDS method is consistent with manual labeling and that a minimum pressure-based method is not appropriate. However, the manual labeling process is unclear, and the manuscript does not clearly explain the differences between LDS termination and minimum pressure-based termination.
Finally, it is unclear how the method handles sensor response times or time-lag corrections, which are among the main causes of error in radiosonde data.
5. Inconsistent Use of Data
There is no explanation of whether the method accounts for known differences between day and night radiosonde data, particularly since solar radiation-induced errors are a common main cause of error in radiosonde data.
And as stated in the manuscript, nearly 19.2% of the data was manually filtered prior to applying the QC method. This pre-processing step undermines the claim of automation and raises concerns about scalability and repeatability. How did this manual process perform?
6. Limited Validation and Generalizability
The method is evaluated using a dataset from a few stations, and its effectiveness for other stations, weather conditions, or seasons is not addressed. Specifically, the historical data used for calibration are not described, limiting the transparency of the validation process. How were the historical data generated? What do the historical data look like in P, T, and RH? How trustworthy are the historical data as a reference base for the QC process?
7. Unclear Standards for Manual QC
The authors mention overlaps with manual QC and mainly show the comparison between the proposed QC method and the manual process. However, the study fails to describe the standards or processes used for manual labeling, making it challenging to interpret the results. How did the manual labeling perform? What kinds of thresholds does the process use? And essentially, how trustworthy is the manual labeling, given that the authors claim the proposed QC process is feasible when the two are consistent?

Citation: https://doi.org/10.5194/amt-2024-164-RC2
- AC2: 'Reply on RC2', Shi Zhang, 29 Nov 2024
  
  I would like to thank the reviewer for the comments. In the following, we respond to them one by one.
  Comment 1: Lack of Comparison with Existing QC Methods
  Several automated QC programs for radiosonde data already exist. The lack of comparison undermines the novelty and relevance of the proposed method. The manuscript does not compare the proposed QC1+QC2 method with other established QC methods, or automated QC software like Aspen developed at NCAR EOL. Without highlighting differences or advantages, the study fails to justify why this method is necessary or superior, or at least suitable for future application.
  Response 1: The L-band (Type 1) upper-air sounding system includes hardware and data processing software. The software can carry out basic QC, but many errors remain that need attention. There are indeed some relevant studies, which are introduced in Section 1, where we also analyze the limitations of these methods. No matter which QC method is ultimately used, all modifications require final confirmation from QC personnel. Therefore, we consider that manual QC should be our ultimate comparison standard. Our method is trained on the historical manual QC results. The more consistent our QC results are with the manual QC results, the more reliable our method is. The comparison results are shown in Section 4.
  Each country has its own upper-air sounding system, and the data formats differ. We will also try Aspen, developed at NCAR EOL, and make comparisons later.
  Comment 2: Misuse of Bezier Curves
  Bezier curves are primarily used for creating smooth, visually pleasing lines. The high-resolution nature of upper-air radiosonde data is often important for weather and climate. Bezier curve fitting does not seem suitable for filtering radiosonde data, especially as only 2nd-order fitting is applied in this study. It is unclear why this approach was chosen.
  Response 2: Indeed, Bezier curves are used for creating smooth lines. We also agree that a Bezier curve cannot directly fit the sequences of temperature, pressure, and relative humidity in the radiosonde data. It is even impossible to find a single curve that describes the data sequence. However, temperature, pressure, and relative humidity are continuous variables, and this continuity of change is crucial for manual QC. The smooth Bezier curves reflect the continuous variation of meteorological variables in the atmosphere. In our method, we first perform Bezier curve fitting based on control points, and then evaluate the degree of deviation using the difference between the fitted value and the observed value. That is, the fitting is used as a quantitative tool, and we evaluate the error probability from the degree of deviation using a decision tree. We do not make a judgment directly from the fitting results.
  Through experiments, we found that local changes in meteorological variables are better suited to fitting with 2nd-order Bezier curves. One deviation is generated for each fitting. A kth-order Bezier fitting generates k-1 deviations, which may make the judgment more complex.
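  One way to picture the deviation scoring described in Response 2 is the sketch below. It is not the manuscript's implementation: it assumes that three consecutive observations serve as the control points of a 2nd-order Bezier curve, and that the deviation is the gap between the middle observation and the curve evaluated at t = 0.5. The function names and the sample values are hypothetical.

```python
from typing import List, Sequence


def quadratic_bezier(p0: float, p1: float, p2: float, t: float) -> float:
    """Evaluate a 2nd-order Bezier curve with control values p0, p1, p2 at t in [0, 1]."""
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2


def bezier_deviations(values: Sequence[float]) -> List[float]:
    """Return one deviation per observation (assumed scoring scheme).

    For each interior point i, the neighbours i-1 and i+1 and the point itself
    are used as control values; the deviation is observed minus fitted at t = 0.5.
    The two end points have no full neighbourhood and are given a deviation of 0.
    """
    deviations = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        fitted = quadratic_bezier(values[i - 1], values[i], values[i + 1], 0.5)
        deviations[i] = values[i] - fitted
    return deviations


# Synthetic series with a spike at index 3 (not real sounding data).
series = [0.0, 1.0, 2.0, 10.0, 4.0, 5.0]
scores = bezier_deviations(series)
```

  Under these assumptions the deviation reduces to minus one quarter of the local second difference, so a smooth local trend scores near zero while an isolated spike scores high. In the method described above, such deviations are inputs to the decision tree rather than a verdict on their own.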
  Comment 3: Insufficient Instrument Details
  The authors do not adequately describe the radiosonde’s hardware, factory specs, sensor measurement range and error, and its existing systematic QC capabilities. Error range data and laboratory calibration results are missing, making it difficult to evaluate how the proposed QC addresses hardware limitations.
  Response 3: Thank you for the reminder. Many countries have their own upper-air sounding systems. The radiosonde data we study are generated by the L-band (Type 1) upper-air sounding system produced in China. The software used by the quality control personnel is the L-band (Type 1) upper-air sounding system data processing software (6.0.0.20200101), which can carry out basic QC.
  The L-band (Type 1) upper-air sounding system consists of the GFE (L)-1 secondary wind-finding radar and the GTS-1 digital sonde. The meteorological variables sounded include pressure, temperature, relative humidity, wind speed, and wind direction. The sampling period is 1.2 seconds, that is, about 50 samples per minute.
  The measurement ranges of the sensors are: temperature from 50 ℃ to -90 ℃; relative humidity from 1 % to 100 %; air pressure from 1050 hPa to 1 hPa.
  The absolute values of the measurement errors are as follows:
  temperature: ≤ 0.5 ℃ when air pressure ≥ 100 hPa;
  temperature: ≤ 1.0 ℃ when 100 hPa ≥ air pressure ≥ 5 hPa;
  relative humidity: ≤ 5 % RH;
  air pressure: ≤ 1 hPa.
  More details can be found in the specification, which can be downloaded online.
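  The measurement ranges listed above lend themselves to a simple gross-error check of the kind the abstract attributes to QC1. The sketch below is illustrative only: the field names and the record layout are assumptions, and only the numeric limits are taken from this reply.

```python
from typing import Dict, List, Tuple

# Sensor limits quoted in the reply; the dictionary keys are hypothetical field names.
SENSOR_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature_c": (-90.0, 50.0),
    "humidity_pct": (1.0, 100.0),
    "pressure_hpa": (1.0, 1050.0),
}


def out_of_range_fields(record: Dict[str, float]) -> List[str]:
    """Return the names of the fields whose value lies outside the sensor range."""
    return [
        name
        for name, (low, high) in SENSOR_RANGES.items()
        if not (low <= record[name] <= high)
    ]


# Synthetic records, not real sounding data.
good = {"temperature_c": 15.2, "humidity_pct": 63.0, "pressure_hpa": 1002.4}
bad = {"temperature_c": 60.0, "humidity_pct": 40.0, "pressure_hpa": 0.5}
```

  A record with an empty result passes this check; it says nothing about change rates or pressure-dependent temperature extremes, which the abstract lists as separate QC1 criteria.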
  Comment 4: Overly Complex Methodology
  The QC method is unnecessarily complex for the problem it aims to solve. QC1 simply deals with filtering outliers and determining the termination point. Such filtering tasks can be conducted with simple random error detection such as that provided by NIST (https://www.itl.nist.gov/div898/handbook/pmd/section2/pmd214.htm). For termination point detection, the authors claim that the LDS method is consistent with manual labeling and that a minimum pressure-based method is not appropriate. However, the manual labeling process is unclear, and the manuscript does not clearly explain the differences between LDS termination and minimum pressure-based termination.
  Finally, it is unclear how the method handles sensor response times or time-lag corrections, which are among the main causes of error in radiosonde data.
  Response 4: This type of task can perhaps be conducted with simple random error detection such as that provided by NIST, and we will try this in the future. Because of balloon turbulence and various errors in the radiosonde data, the minimum pressure value may occur during the sounding before the balloon bursts, so it is not correct to determine the termination time simply from the minimum value. It is difficult to describe the burst moment in a single clear paragraph, because atmospheric convection may produce many unexpected phenomena. We tried many methods and found that LDS outputs results very close to those of manual QC. Until a better reference is found, we consider the QC personnel reliable. Our work did not take sensor response times or time-lag corrections into account.
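  The difference between the two termination rules discussed in Response 4 can be illustrated with a small sketch. It rests on assumptions that this discussion does not confirm: LDS is read here as the longest strictly decreasing subsequence of pressure, ties are broken by taking the earliest index at which the maximum length is reached, and the pressure values are synthetic.

```python
from bisect import bisect_left
from typing import List, Sequence


def lds_termination_index(pressure: Sequence[float]) -> int:
    """Index where a longest strictly decreasing subsequence of pressure ends.

    Uses the O(n log n) patience method on negated values. The index is
    recorded only when the subsequence length grows, so the earliest index
    that reaches the maximum length is returned. Returns -1 for empty input.
    """
    tails: List[float] = []  # tails[k]: smallest negated tail of a length k+1 subsequence
    end_index = -1
    for i, p in enumerate(pressure):
        key = -p
        pos = bisect_left(tails, key)
        if pos == len(tails):
            tails.append(key)
            end_index = i
        else:
            tails[pos] = key
    return end_index


def min_pressure_index(pressure: Sequence[float]) -> int:
    """Index of the first occurrence of the minimum pressure (non-empty input)."""
    return min(range(len(pressure)), key=lambda i: pressure[i])


# Synthetic ascent with one erroneous low value at index 3, then descent after index 7.
profile = [1000.0, 900.0, 800.0, 5.0, 700.0, 600.0, 500.0, 400.0, 450.0, 500.0]
lds_end = lds_termination_index(profile)   # index 7
min_end = min_pressure_index(profile)      # index 3
```

  In this synthetic profile the minimum-pressure rule stops at the erroneous value, whereas the subsequence-based rule skips it and ends where the pressure actually stops decreasing, which is the behaviour the response describes.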
  Comment 5: Inconsistent Use of Data
  There is no explanation of whether the method accounts for known differences between day and night radiosonde data, particularly since solar radiation-induced errors are a common main cause of error in radiosonde data.
  And as stated in the manuscript, nearly 19.2% of the data was manually filtered prior to applying the QC method. This pre-processing step undermines the claim of automation and raises concerns about scalability and repeatability. How did this manual process perform?
  Response 5: The station carries out routine upper-air meteorological sounding observations at 08:00 and 20:00 LT every day using an operational L-band radiosonde system. Solar radiation does affect the measurements. However, our task is not to correct the temperature but to identify erroneous data in the data sequence. The correction is completed automatically by the L-band (Type 1) upper-air sounding system data processing software (6.0.0.20200101).
  We collected data over the past three years and found that the data filtered out by manual QC account for 19.2% of the original data. The S file we received contains not only the raw data but also the manual QC labels. We do not perform automatic QC on the manual QC results; we perform it on the raw data and compare the outcome with the manual QC results. The manual QC process mainly follows the operational specification released by CMA, "Operational specification for routine upper air meteorological observation", which can also be found and downloaded online.
  Comment 6: Limited Validation and Generalizability
  The method is evaluated using a dataset from a few stations, and its effectiveness for other stations, weather conditions, or seasons is not addressed. Specifically, the historical data used for calibration are not described, limiting the transparency of the validation process. How were the historical data generated? What do the historical data look like in P, T, and RH? How trustworthy are the historical data as a reference base for the QC process?
  Response 6: The raw radiosonde data are confidential, and all of our experiments were conducted at the Fujian Provincial Meteorological Information Center. Indeed, we did not conduct experiments on the dataset from 120 stations across the country. However, the radiosonde dataset used in this paper includes observations from January 1, 2020 to December 31, 2022, with a total of 2275 effective balloon-borne soundings. This dataset provides a rich data source for evaluating and validating the proposed quality control method.
  Also, as mentioned in Section 4, our QC method needs to be trained with the historical manual QC data of the specific sounding station before it is applied. The training process, which includes collecting statistics and training the decision tree, customizes the model so that it adapts to the data characteristics and QC requirements of different sounding stations.
  The upper-air sounding process is described in detail in Section 1. A data segment is shown in Figure 1.
  All operators in China carry out upper-air sounding strictly in accordance with the CMA's specification. The QC personnel have undergone rigorous training, and the equipment meets the requirements of WMO. Therefore, we have full confidence in the data used.
  Comment 7: Unclear Standards for Manual QC
  The authors mention overlaps with manual QC and mainly show the comparison between the proposed QC method and the manual process. However, the study fails to describe the standards or processes used for manual labeling, making it challenging to interpret the results. How did the manual labeling perform? What kinds of thresholds does the process use? And essentially, how trustworthy is the manual labeling, given that the authors claim the proposed QC process is feasible when the two are consistent?
  Response 7: As mentioned in the manuscript, CMA released an operational specification, "Operational specification for routine upper air meteorological observation", to guide the data processing. The specification does not propose quantitative criteria for filtering out anomalies; it only provides some guiding opinions. At present, the quality control (QC) of second-level radiosonde data mainly relies on manual screening.
  No threshold is described in the specification, because the atmosphere is a complex system and it is impossible to determine which data are correct using simple thresholds, unless the threshold covers a wide range.
  Finally, I would like to point out that these QC personnel are experienced; some of them have worked in this field for more than ten years. We agree that quality control personnel may also make mistakes. But if we do not trust them, what else can we trust more? There is no absolute correctness, and that is not a reason to do nothing. If our work allows even a small step forward, we think that is its value.
  
  Citation: https://doi.org/10.5194/amt-2024-164-AC2
RC3:
'Comment on amt-2024-164', Anonymous Referee #1, 27 Nov 2024

This manuscript deals with a proposed method for quality control (QC) of the thermodynamic variables (i.e. pressure, temperature and humidity) for soundings made in China. The mathematical approach for the QC is very elegant and it may deserve to be published. However, besides many points addressed directly to the manuscript itself, I would like to mention that a) a good description of the sensors used in this radiosonde is necessary, especially for the thermodynamic variables, and b) the practical use of this new QC, especially in China, should be explained. Only 3 stations have been checked. Another question is associated with the winds. They have not been analysed, and I have understood that the thermodynamic data come through the L-band channel (but this should be emphasized).

Citation: https://doi.org/10.5194/amt-2024-164-RC3
- AC3: 'Reply on RC3', Shi Zhang, 29 Nov 2024
  
  Thanks for your comments.
  Response A :
  The radiosonde data we study are generated by the L-band (Type 1) upper-air sounding system produced in China. The software used by the quality control personnel is the L-band (Type 1) upper-air sounding system data processing software (6.0.0.20200101), which can carry out basic QC.
  The L-band (Type 1) upper-air sounding system consists of the GFE (L)-1 secondary wind-finding radar and the GTS-1 digital sonde. The meteorological variables sounded include pressure, temperature, relative humidity, wind speed, and wind direction. The sampling period is 1.2 seconds, that is, about 50 samples per minute.
  The measurement ranges of the sensors are: temperature from 50 ℃ to -90 ℃; relative humidity from 1 % to 100 %; air pressure from 1050 hPa to 1 hPa.
  The absolute values of the measurement errors are as follows:
  temperature: ≤ 0.5 ℃ when air pressure ≥ 100 hPa;
  temperature: ≤ 1.0 ℃ when 100 hPa ≥ air pressure ≥ 5 hPa;
  relative humidity: ≤ 5 % RH;
  air pressure: ≤ 1 hPa.
  More details can be found in the specification, which can be downloaded online. I can also send it to you by mail, but it is in Chinese.
  Response B:
  Indeed, we did not conduct experiments on the dataset from 120 stations across the country. As mentioned in Section 4, our QC method needs to be trained with the historical manual QC data of the specific sounding station before it is applied. The training process, which includes collecting statistics and training the decision tree, customizes the model so that it adapts to the data characteristics and QC requirements of different sounding stations.
  Response C:
  The complete upper-air sounding data include sequences of temperature, pressure, and relative humidity, as well as the balloon positions, which can be used to calculate wind speed and wind direction. In this manuscript, we only perform QC on the temperature, pressure, and humidity sequences; QC of the wind data will be a separate task.
  Lastly, thank you for the reminder. We will emphasize in the revised manuscript that the thermodynamic data come through the L-band channel.
  
  Citation: https://doi.org/10.5194/amt-2024-164-AC3

Status: closed

RC1:
'Comment on amt-2024-164', Anonymous Referee #2, 20 Nov 2024

For the large amount of second-level radiosonde data, manual QC is tedious and time-consuming, and also bring some technical problems, such as inconsistent judgments and missing errors. This manuscript proposed a two-stage QC method for second-level radiosonde data based on the Bezier curve fitting. The gross errors are filtered out in the QC1, and some other errors are filtered out in QC2. The basic idea is novel and technically sound, and the experiments are solid, but there are some problems related to technical details, valuation and writing.

1. The overlap of items identified by manual QC and automatic QC reaches 86%. It seems that the algorithmic QC doesn't seem that good.

2. LDS is used to identify the moment of sounding termination. Why only use LDS to determine the moment of sounding termination, instead of outputting the longest decreasing sequence?

3. Is this method applicable to radiosonde data collected from other upper-air sounding stations?

4. Some typos. Such as : “equipmenfaultslt” in line 31; “toeliminateg” in line 34.

Citation: https://doi.org/10.5194/amt-2024-164-RC1
- AC1: 'Reply on RC1', Shi Zhang, 23 Nov 2024
  
  Thanks for the reviewer’s valuable comments. In the following, we will reply the comments one by one, and revise the manuscript later.
  1. The overlap of items identified by manual QC and automatic QC reaches 86%. It seems that the algorithmic QC doesn't seem that good.
  Response: At present, the quality control of second-level radiosonde data is manually carried out to screen out erroneous data. The operational specification, "Operational specification for routine upper air meteorological observation", released by CMA does not propose quantitative criteria for filtering out errors. It only provides some guiding opinions. The understanding of the criteria varies from person to person. There are 4 quality control people in charge of the QC task. Therefore, the QC on the radiosonde data is hard to reach a consensus, especially for some micrometeorological errors, random errors, and systematical errors.
  The experimental data spans 3 years. The QC task is completed by the 4 QC persons individually. We think that the overlap rate of 86% is a good result. At the same time, the experiment results also show that the overlap rate of the QC1 stage results for gross error judgment reached 98.39%.
  2. LDS is used to identify the moment of sounding termination. Why only use LDS to determine the moment of sounding termination, instead of outputting the longest decreasing sequence?
  Response. During the ascent of a balloon, the balloon may fluctuate up and down because of turbulence. If we output the longest decreasing sequence and filter out other data, the final sequence may not truly reflect the complete process of the balloon from releasing to exploding. The moment of sounding termination may be earlier than the real one.
  3. Is this method applicable to radiosonde data collected from other upper-air sounding stations?
  Response: Yes, our method can be applied to radiosonde data collected at other upper-air sounding stations. Our method needs to be trained with the historical data on the specific sounding station before applying. The training process, including collecting statistics and training the decision tree, is to customize the model, which enables the model to adapt to the data characteristics and QC requirements of different sounding stations.
  4. Some typos. Such as : “equipmenfaultslt” in line 31; “toeliminateg” in line 34.
  Response: Thanks for the reviewer’s comment. “equipmenfaultslt” should be “equipment faults”, and “toeliminateg” should be “to eliminate”. They are caused by our carelessness. We have revised the manuscript carefully, and corrected them in the revised manuscript.
  
  Citation: https://doi.org/10.5194/amt-2024-164-AC1
RC2:
'Comment on amt-2024-164', Anonymous Referee #3, 21 Nov 2024

Comment Summary
The manuscript proposes an automatic quality control (QC) method for second-level radiosonde data including the utilization of Bezier curve fitting. The topic is relevant, as improving QC methods for radiosonde data is essential for meteorological applications, and the attempt to automate QC to reduce inconsistencies in manual methods is well-motivated and aligns with the need for standardization. However, there are significant flaws in its design and execution that hinder its contribution to the field. Below is a detailed review of the manuscript, including critical points of concern.
Major Concerns
1. Lack of Comparison with Existing QC Methods
Several automated QC programs for radiosonde data already exist. The lack of comparison undermines the novelty and relevance of the proposed method. The manuscript does not compare the proposed QC1+QC2 method with other established QC methods, or automated QC software like Aspen developed at NCAR EOL. Without highlighting differences or advantages, the study fails to justify why this method is necessary or superior, or at least suitable for future application.
2. Misuse of Bezier Curves
Bezier curves are primarily used for creating smooth, visually pleasing lines. A lot of time the high-resolution nature of upper-air radiosonde data is important for weather and climate. Bezier curves fitting seems unnecessarily suitable for filtering out radiosonde data especially only 2^nd-order fitting is applied in this study. It is unclear why this approach was chosen.
3. Insufficient Instrument Details
The authors do not adequately describe the radiosonde’s hardware, factory specs, sensor measurement range and error, and its existing systematic QC capabilities. Error range data and laboratory calibration results are missing, making it difficult to evaluate how the proposed QC addresses hardware limitations.
4. Overly Complex Methodology
The QC method is unnecessarily complex for the problem it aims to solve. QC1 simply filters outliers and determines the termination point. Such filtering tasks can be conducted with simple random error detection such as that provided by NIST (https://www.itl.nist.gov/div898/handbook/pmd/section2/pmd214.htm). For termination point detection, the authors claim that the LDS method is consistent with manual labeling and that the minimum pressure-based method is not appropriate. However, the manual labeling process is unclear, and the manuscript does not clearly explain the differences between LDS termination and minimum pressure-based termination.
Finally, it is unclear how the method handles sensor response times or time-lag corrections, which are among the main causes of error in radiosonde data.
5. Inconsistent Use of Data
There is no explanation of whether the method accounts for known differences between day and night radiosonde data, particularly considering solar radiation-induced errors being the common main cause of error in radiosonde data.
And as stated in the manuscript, nearly 19.2% of the data was manually filtered prior to applying the QC method. This pre-processing step undermines the claim of automation and raises concerns about scalability and repeatability. How did this manual process perform?
6. Limited Validation and Generalizability
The method is evaluated using a dataset from a few stations, and its effectiveness for other stations, weather conditions, or seasons is not addressed. Specifically, the historical data used for calibration is not described, limiting the transparency of the validation process. How was the historical data generated? What does the historical data look like in P, T, and RH? How trustworthy is the historical data as a reference base for the QC process?
7. Unclear Standards for Manual QC
The authors mention overlaps with manual QC and show mainly the comparison between the proposed QC method and the manual process. However, this study fails to describe the standards or processes used for manual labeling, making it challenging to interpret the results. How did the manual labeling perform? What kind of thresholds does the process use? And essentially, how trustworthy is the manual labeling, given that the authors claim the proposed QC process is feasible when the two are consistent?

Citation: https://doi.org/10.5194/amt-2024-164-RC2
- AC2: 'Reply on RC2', Shi Zhang, 29 Nov 2024
  
  I would like to thank the reviewer for the comments. Below, we respond to them one by one.
  Comment 1: Lack of Comparison with Existing QC Methods
  Several automated QC programs for radiosonde data already exist. The lack of comparison undermines the novelty and relevance of the proposed method. The manuscript does not compare the proposed QC1+QC2 method with other established QC methods, or automated QC software like Aspen developed at NCAR EOL. Without highlighting differences or advantages, the study fails to justify why this method is necessary or superior, or at least suitable for future application.
  Response 1: The L-band (Type 1) upper-air sounding system includes hardware and data processing software. The software can carry out basic QC, but many errors remain that need attention. There are indeed some relevant studies, which are introduced in Section 1, where we also analyze the limitations of these methods. Whatever QC method is ultimately used, all modifications require final confirmation from QC personnel. Therefore, we consider manual QC to be our ultimate comparison standard. Our method is trained on the historical manual QC results: the more consistent our QC results are with the manual QC results, the more reliable our method is. The comparison results are shown in Section 4.
  Each country has its own upper-air sounding system, with its own data format. We will also try Aspen, developed at NCAR EOL, and make comparisons later.
  Comment 2: Misuse of Bezier Curves
  Bezier curves are primarily used for creating smooth, visually pleasing lines. The high-resolution nature of upper-air radiosonde data is often important for weather and climate applications. Bezier curve fitting does not seem necessarily suitable for filtering radiosonde data, especially since only 2nd-order fitting is applied in this study. It is unclear why this approach was chosen.
  Response 2: Indeed, Bezier curves are used for creating smooth lines. We also agree that a Bezier curve cannot directly fit the sequences of temperature, pressure, and relative humidity in the radiosonde data. It is even impossible to find a curve that can describe the data sequence. However, temperature, pressure, and relative humidity are continuous variables, and this continuity of change is crucial for manual QC. The smooth Bezier curves reflect the continuous variation of meteorological variables in the atmosphere. In our method, we first perform Bezier curve fitting based on control points, and then evaluate the degree of deviation using the difference between the fitted value and the observed value. That is, the fitting is used as a quantitative tool, and we evaluate the error probability from the degree of deviation using a decision tree. We do not make a judgment directly from the fitting results.
  Through experiments, we find that the local changes in meteorological variables are better suited to fitting with 2nd-order Bezier curves. One deviation is generated for each such fit. A k-th-order Bezier fit generates k-1 deviations, which may make the judgment more complex.
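The deviation idea in Response 2 can be illustrated with a minimal sketch. The reply does not say how the control points are chosen, so the code below assumes (hypothetically) that three consecutive observations serve as the control points of a 2nd-order Bezier curve, that the curve is evaluated at t = 0.5, and that the deviation is the observed middle value minus the fitted value. The function names are illustrative and are not taken from the manuscript.

```python
def quadratic_bezier(p0, p1, p2, t):
    """Evaluate a 2nd-order Bezier curve with control values p0, p1, p2 at t in [0, 1]."""
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2


def bezier_deviations(values):
    """Deviation of each interior sample from a local 2nd-order Bezier fit.

    Assumption (not from the manuscript): the control points are the
    samples at i-1, i, i+1 and the curve is evaluated at t = 0.5.
    """
    deviations = []
    for i in range(1, len(values) - 1):
        fitted = quadratic_bezier(values[i - 1], values[i], values[i + 1], 0.5)
        deviations.append(values[i] - fitted)
    return deviations


# A smooth (linear) segment gives zero deviation; an isolated spike does not.
smooth = bezier_deviations([1.0, 2.0, 3.0, 4.0])
spiky = bezier_deviations([0.0, 0.0, 4.0, 0.0, 0.0])
```

Under these assumptions each fit yields exactly one deviation, which could then be passed to a classifier as a score rather than used as a direct accept/reject decision.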
  Comment 3: Insufficient Instrument Details
  The authors do not adequately describe the radiosonde’s hardware, factory specs, sensor measurement range and error, and its existing systematic QC capabilities. Error range data and laboratory calibration results are missing, making it difficult to evaluate how the proposed QC addresses hardware limitations.
  Response 3: We thank the reviewer for the reminder. Many countries have their own upper-air sounding systems. The radiosonde data we study is generated by the L-band (Type 1) upper-air sounding system produced in China. The software used by the quality control personnel is the L-band (Type 1) upper-air sounding system data processing software (6.0.0.20200101). The data processing software can carry out basic QC.
  The L-band (Type 1) upper-air sounding system consists of the GFE (L)-1 secondary wind finding radar and the GTS-1 digital sonde. The measured meteorological variables include pressure, temperature, relative humidity, wind speed, and wind direction. The sampling period is 1.2 seconds, so the sampling frequency is about 50 times per minute.
  The measurement ranges of the sensors are: temperature from 50 ℃ to -90 ℃; relative humidity from 1 % to 100 %; air pressure from 1050 hPa to 1 hPa.
  The absolute values of the measurement errors are as follows.
  Temperature: ≤ 0.5 ℃ when air pressure ≥ 100 hPa;
  Temperature: ≤ 1.0 ℃ when 100 hPa ≥ air pressure ≥ 5 hPa;
  Relative humidity: ≤ 5 % RH;
  Air pressure: ≤ 1 hPa.
  More details can be found in the specification, which can be downloaded online.
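The sensor limits quoted in Response 3 translate directly into a gross range check of the kind a first QC stage might apply. The sketch below is only an illustration: the field names (`temperature_c`, `humidity_pct`, `pressure_hpa`) and the record layout are hypothetical, and at exactly 100 hPa, where the two quoted temperature error bounds overlap, the tighter bound is chosen as an assumption.

```python
# Sensor measurement ranges quoted in the reply above, as (low, high).
SENSOR_RANGES = {
    "temperature_c": (-90.0, 50.0),
    "humidity_pct": (1.0, 100.0),
    "pressure_hpa": (1.0, 1050.0),
}


def out_of_range_fields(record):
    """Return the names of fields whose value lies outside the sensor range."""
    return [
        name
        for name, (low, high) in SENSOR_RANGES.items()
        if name in record and not (low <= record[name] <= high)
    ]


def temperature_error_limit(pressure_hpa):
    """Quoted absolute temperature error bound (degrees C) at a given pressure."""
    if pressure_hpa >= 100.0:
        return 0.5  # assumption: the tighter bound applies at exactly 100 hPa
    if pressure_hpa >= 5.0:
        return 1.0
    return None  # no bound is quoted below 5 hPa


flagged = out_of_range_fields(
    {"temperature_c": 55.0, "humidity_pct": 40.0, "pressure_hpa": 900.0}
)
```

A record with any flagged field would be a candidate gross error; records that pass this check still need the finer screening discussed elsewhere in the replies.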
  Comment 4: Overly Complex Methodology
  The QC method is unnecessarily complex for the problem it aims to solve. QC1 simply filters outliers and determines the termination point. Such filtering tasks can be conducted with simple random error detection such as that provided by NIST (https://www.itl.nist.gov/div898/handbook/pmd/section2/pmd214.htm). For termination point detection, the authors claim that the LDS method is consistent with manual labeling and that the minimum pressure-based method is not appropriate. However, the manual labeling process is unclear, and the manuscript does not clearly explain the differences between LDS termination and minimum pressure-based termination.
  Finally, it is unclear how the method handles sensor response times or time-lag corrections, which are among the main causes of error in radiosonde data.
  Response 4: I think this type of task could perhaps be handled with simple random error detection such as that provided by NIST, and we will try this in the future. Because of balloon turbulence and various errors in radiosonde data, the minimum pressure value may occur during the sounding, before the balloon bursts, so it is not correct to determine the termination time simply from the minimum value. It is difficult to describe the burst moment precisely in a short paragraph, because atmospheric convection may produce many unexpected phenomena. We tried many methods and found that LDS outputs results very close to those of manual QC. We do regard the QC personnel as reliable until a better reference is found. Our work did not take sensor response times or time-lag corrections into account.
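The difference between the two termination rules can be shown on a toy pressure series. The reply does not spell out how LDS is mapped to a termination time, so this sketch assumes (hypothetically) that the termination index is the last element of a longest strictly descending subsequence of pressure, and compares it with the index of the minimum value. The function names and the sample values are illustrative only.

```python
from bisect import bisect_left


def longest_descending_indices(pressures):
    """Indices of one longest strictly descending subsequence (O(n log n))."""
    tails = []      # tails[k]: smallest negated pressure ending a run of length k + 1
    tail_idx = []   # index in `pressures` of that tail element
    prev = [-1] * len(pressures)
    for i, p in enumerate(pressures):
        key = -p  # descending in p is ascending in -p
        k = bisect_left(tails, key)
        if k == len(tails):
            tails.append(key)
            tail_idx.append(i)
        else:
            tails[k] = key
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    # Walk the predecessor links back from the end of the longest run.
    out = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        out.append(i)
        i = prev[i]
    return out[::-1]


# Toy profile (hPa): a spurious low value at index 2, burst after index 6.
pressures = [1000, 900, 5, 800, 700, 600, 500, 650, 800]
lds = longest_descending_indices(pressures)
lds_termination = lds[-1]
min_termination = pressures.index(min(pressures))
```

On this toy series the minimum-value rule stops at the spurious sample, while the end of the longest descending subsequence falls at the last point of the ascent, which is the behaviour the reply describes.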
  Comment 5: Inconsistent Use of Data
  There is no explanation of whether the method accounts for known differences between day and night radiosonde data, particularly considering solar radiation-induced errors being the common main cause of error in radiosonde data.
  And as stated in the manuscript, nearly 19.2% of the data was manually filtered prior to applying the QC method. This pre-processing step undermines the claim of automation and raises concerns about scalability and repeatability. How did this manual process perform?
  Response 5: The station carries out routine upper-air meteorological sounding observations at 08:00 and 20:00 LT every day using an operational L-band radiosonde system. Solar radiation does affect the measurements. However, our task is not to correct the temperature but to identify erroneous data in the data sequence. The correction is completed automatically by the L-band (Type 1) upper-air sounding system data processing software (6.0.0.20200101).
  We collected data over the past three years and found that the data filtered out by manual QC account for 19.2% of the original data. The S file we received contains not only the raw data but also the manual QC labels. Our task is not to perform automatic QC on top of the manual QC results, but to perform it on the raw data and compare the outcome with the manual QC results. The manual QC process mainly follows the operational specification released by the CMA, "Operational specification for routine upper air meteorological observation", which can also be found and downloaded online.
  Comment 6: Limited Validation and Generalizability
  The method is evaluated using a dataset from a few stations, and its effectiveness for other stations, weather conditions, or seasons is not addressed. Specifically, the historical data used for calibration is not described, limiting the transparency of the validation process. How was the historical data generated? What does the historical data look like in P, T, and RH? How trustworthy is the historical data as a reference base for the QC process?
  Response 6: The raw radiosonde data is confidential, and all of our experiments were conducted at the Fujian Provincial Meteorological Information Center. Indeed, we did not conduct experiments on data from all 120 stations across the country. However, the radiosonde dataset used in this paper includes observations from January 1, 2020 to December 31, 2022, with a total of 2275 effective balloon-borne soundings. This dataset provides a rich data source for evaluating and validating the proposed quality control method.
  Also, as mentioned in Section 4, our QC method needs to be trained with the historical manual QC data of the specific sounding station before it is applied. The training process, which includes collecting statistics and training the decision tree, customizes the model so that it can adapt to the data characteristics and QC requirements of different sounding stations.
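To make the station-specific training step more concrete, the sketch below shows the core operation of a CART classifier: choosing the threshold on one feature that minimizes the weighted Gini impurity of manually labeled items. This is a depth-1 illustration only. The feature (a single deviation score per item), the labels (1 for items rejected by manual QC, 0 otherwise) and the sample values are hypothetical, and the tree actually trained in the manuscript may use more features and more levels.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(scores, labels):
    """Return (threshold, weighted_gini) of the best split `score <= threshold`.

    The threshold is None when no split improves on the unsplit impurity.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot separate identical scores
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
            best = (threshold, impurity)
    return best


# Hypothetical training data for one station: deviation scores and manual labels.
scores = [0.1, 0.2, 0.3, 2.5, 3.0]
labels = [0, 0, 0, 1, 1]
threshold, impurity = best_split(scores, labels)
```

Because the threshold is learned from each station's own manual QC history, two stations with different noise characteristics would end up with different split points, which is the customization the reply refers to.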
  The upper-air sounding process is described in detail in Section 1. A data segment is shown in Figure 1.
  All operators in China carry out upper-air soundings strictly in accordance with the CMA's specification. The QC personnel have undergone rigorous training, and the equipment meets the requirements of the WMO. Therefore, we have absolute confidence in the data used.
  Comment 7: Unclear Standards for Manual QC
  The authors mention overlaps with manual QC and show mainly the comparison between the proposed QC method and the manual process. However, this study fails to describe the standards or processes used for manual labeling, making it challenging to interpret the results. How did the manual labeling perform? What kind of thresholds does the process use? And essentially, how trustworthy is the manual labeling, given that the authors claim the proposed QC process is feasible when the two are consistent?
  Response 7: As mentioned in the manuscript, the CMA released an operational specification, "Operational specification for routine upper air meteorological observation", to guide the data processing. The specification does not propose quantitative criteria for filtering out anomalies; it only provides general guidance. At present, the quality control (QC) of second-level radiosonde data relies mainly on manual screening.
  No threshold is described in the specification, because the atmosphere is a complex system and it is impossible to determine which data are correct using simple thresholds, unless the thresholds cover a wide range.
  Finally, I would like to point out that these QC personnel are experienced, and some of them have worked in this field for more than ten years. I agree that QC personnel may also make mistakes. But if we do not trust them, what can we trust more? There is no absolute correctness. Should we then do nothing? Even if our work takes only a small step forward, I think that is its value.
  
  Citation: https://doi.org/10.5194/amt-2024-164-AC2
RC3:
'Comment on amt-2024-164', Anonymous Referee #1, 27 Nov 2024

This manuscript deals with a proposed method for quality control (QC) of the thermodynamic variables (i.e. pressure, temperature and humidity) for soundings made in China. The mathematical approach for the QC is very elegant and it may deserve to be published. However, besides many points noted directly in the manuscript itself, I would like to mention: a) a good description of the sensors used in this radiosonde is necessary, especially for the thermodynamic variables; b) what is the practical use of this new QC, especially in China? Only 3 stations have been checked. Another question is associated with the winds. They have not been analysed, and I understand that the thermodynamic data come through the L-band channel (but this should be emphasized).

Citation: https://doi.org/10.5194/amt-2024-164-RC3
- AC3: 'Reply on RC3', Shi Zhang, 29 Nov 2024
  
  Thanks for your comments.
  Response A :
  The radiosonde data we study is generated by the L-band (Type 1) upper-air sounding system produced in China. The software used by the quality control personnel is the L-band (Type 1) upper-air sounding system data processing software (6.0.0.20200101). The data processing software can carry out basic QC.
  The L-band (Type 1) upper-air sounding system consists of the GFE (L)-1 secondary wind finding radar and the GTS-1 digital sonde. The measured meteorological variables include pressure, temperature, relative humidity, wind speed, and wind direction. The sampling period is 1.2 seconds, so the sampling frequency is about 50 times per minute.
  The measurement ranges of the sensors are: temperature from 50 ℃ to -90 ℃; relative humidity from 1 % to 100 %; air pressure from 1050 hPa to 1 hPa.
  The absolute values of the measurement errors are as follows.
  Temperature: ≤ 0.5 ℃ when air pressure ≥ 100 hPa;
  Temperature: ≤ 1.0 ℃ when 100 hPa ≥ air pressure ≥ 5 hPa;
  Relative humidity: ≤ 5 % RH;
  Air pressure: ≤ 1 hPa.
  More details can be found in the specification, which can be downloaded online. Alternatively, I can send it to you by mail, although it is in Chinese.
  Response B:
  Indeed, we did not conduct experiments on data from all 120 stations across the country. As mentioned in Section 4, our QC method needs to be trained with the historical manual QC data of the specific sounding station before it is applied. The training process, which includes collecting statistics and training the decision tree, customizes the model so that it can adapt to the data characteristics and QC requirements of different sounding stations.
  Response C:
  The complete upper-air sounding data includes sequences of temperature, pressure, and relative humidity, as well as the balloon positions, which can be used to calculate wind speed and wind direction. In this manuscript, we only perform QC on the temperature, pressure, and humidity sequences; QC of the wind data is left as a separate task.
  Lastly, thank you for the reminder. We will emphasize in the revised manuscript that the thermodynamic data comes through the L-band channel.
  
  Citation: https://doi.org/10.5194/amt-2024-164-AC3

Viewed

Total article views: 586 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
271	162	153	586	18	28

Views and downloads (calculated since 06 Nov 2024)

Month	HTML	PDF	XML	Total
Nov 2024	135	55	12	202
Dec 2024	39	16	0	55
Jan 2025	21	11	2	34
Feb 2025	17	7	1	25
Mar 2025	6	17	21	44
Apr 2025	14	13	45	72
May 2025	17	11	48	76
Jun 2025	19	29	24	72
Jul 2025	3	3	0	6

Latest update: 12 Jul 2025

Short summary

Radiosonde data contain unrealistic observations. A two-stage quality control (QC) method is proposed. In stage 1, gross errors are screened out. In stage 2, some random errors, systematic errors, and micro meteorological errors are filtered out. After QC, errors are reduced and the percentile profiles of the 3 variables are more reasonable. The overlap ratio between automatic and manual QC results reaches 86 %. The method reduces the human resource cost and helps unify the QC standards.

