https://doi.org/10.5194/amt-2022-310
18 Jan 2023
Status: this preprint is currently under review for the journal AMT.

A data-driven persistence test for robust (probabilistic) quality control of measured environmental time series: constant value episodes

Najmeh Kaffashzadeh
  • Institute of Geophysics, University of Tehran, Tehran, Iran

Abstract. Robust quality control is a prerequisite and an essential component of any data application. This is especially important for time series of environmental observations, such as air quality measurements, because of their dynamic and irreversible nature. One of the common issues in these data is constant value episodes (CVEs), in which a set of consecutive data values remains constant over a given period. Although CVEs are often considered an indicator of sensor failure or other measurement errors and are removed during quality control procedures, there are situations in which CVEs reflect natural environmental phenomena and should not be removed from the data or the analysis. Assessing whether CVEs are erroneous data or valid observations is a challenge. As no formal procedure has been established for this, their classification relies on subjective judgement and is therefore uncertain and irreproducible. This paper presents a novel test procedure, the constant value test, to estimate the probability of CVEs being valid data. The theoretical foundation of the test is based on statistical characteristics and probability theory, and it takes into account the numerical precision of the data values. The test is a data-driven (parametric) approach, which makes it usable for time series analysis in different environmental research domains, as long as the data are serially dependent and their distribution is not too different from Gaussian. The robustness of the test was demonstrated with sensitivity studies using synthetic data with different distributions. Example applications to measured air temperature and ozone mixing ratio data confirm the versatility of the test.
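The abstract defines a CVE as a set of consecutive values that remains constant over a period and notes that the test accounts for the numerical precision of the data. As a minimal sketch of the detection step only (not the author's probabilistic constant value test, whose details are not given here), the Python function below flags runs of values that are identical after rounding to an assumed reporting precision. The function name `find_constant_value_episodes` and its `precision` and `min_length` parameters are hypothetical, chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def find_constant_value_episodes(
    values: Sequence[float],
    precision: float,
    min_length: int = 3,
) -> List[Tuple[int, int, float]]:
    """Return (start_index, length, first_value) for every run of at least
    `min_length` consecutive values that round to the same multiple of
    `precision`. A NaN (missing value) ends a run and never starts one.

    Hypothetical helper for illustration; not the paper's method."""
    if precision <= 0:
        raise ValueError("precision must be positive")

    def bucket(x: float) -> Optional[int]:
        # Round to the nearest multiple of the reporting precision, so that
        # floating-point noise far below that precision does not split a run.
        return None if math.isnan(x) else round(x / precision)

    episodes: List[Tuple[int, int, float]] = []
    start, n = 0, len(values)
    while start < n:
        key = bucket(values[start])
        end = start + 1
        # Extend the run while the next value falls in the same bucket.
        while end < n and key is not None and bucket(values[end]) == key:
            end += 1
        if key is not None and end - start >= min_length:
            episodes.append((start, end - start, values[start]))
        start = end
    return episodes
```

For example, with hourly air temperatures reported to 0.1 °C:

```python
temps = [20.1, 20.1, 20.1, 20.1, 19.8, 19.9, 19.9, float("nan"), 18.0, 18.0, 18.0]
print(find_constant_value_episodes(temps, precision=0.1))
# → [(0, 4, 20.1), (8, 3, 18.0)]
```

This step only locates candidate episodes. Deciding whether each one is valid or erroneous, which is the purpose of the constant value test, requires the probability estimate described in the paper and is not attempted here.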

Discussion status: open (until 23 Feb 2023)


Viewed

Total article views: 146 (including HTML, PDF, and XML)
  • HTML: 119
  • PDF: 23
  • XML: 4
  • Total: 146
  • BibTeX: 2
  • EndNote: 2
Views and downloads calculated since 18 Jan 2023.

Viewed (geographical distribution)

Total article views: 141 (including HTML, PDF, and XML), of which 141 have a defined geographic origin and 0 are of unknown origin.
Latest update: 30 Jan 2023
Short summary
Although quality control is a well-known issue in data applications, research initiatives and organizations still apply existing methods based on traditional techniques (ad hoc thresholds and manual checks). These approaches are not only error-prone but also unsuitable for large volumes of data. The method proposed in this paper uses a new concept, probability, as an intuitive indicator and relies on the characteristics of the data per se, which makes it applicable to a wide variety of data and makes it easier to judge whether data are “fit for purpose”.