Preprints
https://doi.org/10.5194/amt-2022-310
18 Jan 2023
Status: a revised version of this preprint was accepted for the journal AMT and is expected to appear here in due course.

A data-driven persistence test for robust (probabilistic) quality control of measured environmental time series: constant value episodes

Najmeh Kaffashzadeh

Abstract. Robust quality control is a prerequisite for, and an essential component of, any data application. It is especially important for time series of environmental observations, such as air quality, because of their dynamic and irreversible nature. One common issue in these data is constant value episodes (CVEs), in which a set of consecutive data values remains constant over a given period. Although CVEs are often considered an indicator of sensor failure or other measurement errors and are removed during quality control procedures, there are situations in which CVEs reflect natural environmental phenomena and should not be removed from the data or the analysis. Assessing whether CVEs are erroneous data or valid observations is a challenge. Because no formal procedures have been established for this, their classification relies on subjective judgement and is therefore uncertain and irreproducible. This paper presents a novel test procedure, the constant value test, to estimate the probability that CVEs are valid data. The theoretical foundation of the test is based on statistical characteristics and probability theory, and it takes into account the numerical precision of the data values. The test is a data-driven (parametric) approach, which makes it usable for time series analysis in different environmental research domains, as long as the data exhibit serial dependency and their distribution is not too different from Gaussian. The robustness of the test was demonstrated with sensitivity studies using synthetic data with different distributions. Example applications to measured air temperature and ozone mixing ratio data confirm the versatility of the test.
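The CVE definition above (consecutive values that stay constant over a given period) can be made concrete with a small run-detection helper. This is a minimal sketch, not the constant value test from the paper: it only locates candidate episodes that such a test would then assess. The function name `find_constant_episodes`, the `min_length` threshold, the optional `precision` used to compare values at their reporting resolution, and the convention that missing values (NaN or None) break an episode are all illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def find_constant_episodes(
    values: Sequence[float],
    min_length: int = 3,
    precision: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (start, stop) index pairs, stop exclusive,
    for runs of at least `min_length` consecutive identical values.

    If `precision` is given, values are compared after rounding to that
    resolution. NaN or None values break a run (an assumed convention).
    """

    def key(v):
        if v is None or math.isnan(v):
            return None  # missing value: never part of an episode
        return round(v / precision) if precision else v

    episodes = []
    start = 0
    n = len(values)
    for i in range(1, n + 1):
        # Close the current run at the end of the series or when the key changes.
        if i == n or key(values[i]) is None or key(values[i]) != key(values[start]):
            if key(values[start]) is not None and i - start >= min_length:
                episodes.append((start, i))
            start = i
    return episodes


series = [1.0, 1.0, 1.0, 2.0, 2.0, float("nan"), 3.0, 3.0, 3.0, 3.0]
print(find_constant_episodes(series))  # [(0, 3), (6, 10)]
```

Passing a `precision` matters because values that differ only below the reporting resolution (for example 20.0, 20.04 and 19.98 at a resolution of 0.1) would be stored as the same number and therefore form a CVE.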

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on amt-2022-310', Anonymous Referee #1, 10 Feb 2023
    • AC1: 'Reply on RC1', Najmeh Kaffashzadeh, 02 Apr 2023
  • RC2: 'Comment on amt-2022-310', Anonymous Referee #2, 16 Mar 2023
    • AC2: 'Reply on RC2', Najmeh Kaffashzadeh, 02 Apr 2023

Viewed

Total article views: 290 (including HTML, PDF, and XML)
  • HTML: 214
  • PDF: 61
  • XML: 15
  • BibTeX: 5
  • EndNote: 5
Views and downloads calculated since 18 Jan 2023.

Latest update: 21 May 2023
Short summary
Although quality control is a well-known issue in data applications, research initiatives and organizations apply methods based on traditional techniques (ad hoc thresholds and manual inspection). These approaches are not only error-prone but also unsuitable for large volumes of data. The method proposed in this paper is based on a new concept, probability, as an intuitive indicator, and on the characteristics of the data per se, which makes it applicable to a wide variety of data and eases the assessment of fitness for purpose.
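To illustrate why a probability can be more informative than a fixed run-length threshold, the toy Monte Carlo sketch below simulates a stationary Gaussian AR(1) series and counts how often windows of values, rounded to a given precision, are constant. It is not the paper's constant value test. The AR(1) model, the helper names `simulate_ar1` and `constant_window_fraction`, and all parameter values are hypothetical choices made only to show how serial dependency and numerical precision change how often CVEs arise naturally.

```python
import numpy as np


def simulate_ar1(n, rho, sigma=1.0, mean=0.0, seed=0):
    """Hypothetical helper: stationary Gaussian AR(1) series with marginal
    standard deviation `sigma` and lag-1 autocorrelation `rho` (|rho| < 1)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = mean + rng.normal(0.0, sigma)
    # Innovation scale chosen so the marginal variance stays sigma**2.
    innov_sd = sigma * np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        x[t] = mean + rho * (x[t - 1] - mean) + rng.normal(0.0, innov_sd)
    return x


def constant_window_fraction(x, window, precision):
    """Hypothetical helper: fraction of length-`window` windows whose values
    are all equal after rounding to the reporting `precision`."""
    if window < 2:
        raise ValueError("window must be at least 2")
    bins = np.round(np.asarray(x) / precision).astype(np.int64)
    repeats = (bins[1:] == bins[:-1]).astype(int)  # 1 where a value repeats the previous one
    # A window of `window` points is constant iff its window - 1 steps all repeat.
    counts = np.convolve(repeats, np.ones(window - 1, dtype=int), mode="valid")
    return float(np.mean(counts == window - 1))


if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99):
        series = simulate_ar1(20_000, rho)
        frac = constant_window_fraction(series, window=4, precision=0.5)
        print(f"rho={rho}: fraction of constant 4-point windows = {frac:.3f}")
```

In this toy setting, stronger lag-1 autocorrelation or coarser precision makes constant windows more common, so a run length that looks suspicious for one series can be unremarkable for another; a single ad hoc threshold cannot capture that difference.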