Contrary to the statements put forward in “Evaluation of measurement data – Guide to the expression of uncertainty in measurement”, edition 2008 (GUM-2008), issued by the Joint Committee for Guides in Metrology, the error concept and the uncertainty concept are the same. Arguments to the contrary have been analyzed and found not to be compelling. No evidence is presented in GUM-2008 that “errors” and “uncertainties” define different relations between the measured and true values of the variable of interest, nor does the document refer to a Bayesian account of uncertainty beyond the mere endorsement of a degree-of-belief-type conception of probability.
It has long been recognized that a quantitative characterization of the reliability of a measurement is essential for drawing quantitative conclusions from the measured data. Various, often contradictory, methods and terminologies have emerged over the years. The Towards Unified Error Reporting (TUNER) activity aims at unifying the reporting of errors in estimates of atmospheric state variables retrieved from satellite measurements.
In this paper we critically assess some of the claims made in GUM-2008 and, as part of the TUNER activity, discuss its applicability to remote sensing of
the atmosphere. We start by analyzing GUM's claim about the differences between error and uncertainty (Sect.
GUM-2008 endorses a new terminology compared to that of traditional error
analysis. In the context of the work undertaken by the TUNER activity,
a project aiming at unification of error reporting of satellite data
According to GUM-2008, the concept of uncertainty analysis should replace the
concept of error analysis. The International Vocabulary of Metrology document
A key claim of GUM-2008 is that the “concept of
Already in the pre-GUM language there were subtle linguistic differences between the terms “error” and “uncertainty”. The error was conceived as an attribute of a measurement or an estimate, while the term “uncertainty” was used as an attribute of the true state or, more precisely, of an agent's knowledge about the true state. We know the result of our measurement perfectly well – even if it is erroneous – but we are uncertain about the true value.
Because of the measurement error there is an uncertainty as to what the true
value is. The uncertainty thus describes the degree of ignorance about the
true value, while the estimated error describes the degree to which the measurement is thought to deviate from the true value. In this use of language,
both terms still relate to the same concept. This notion seems, as far as we
can judge, to be consistent with the language widely used in the pre-GUM
literature since
It must, however, be noted that the term “error” is an equivocation. It has been used for both the unknown and unknowable signed actual difference between the measured value and the true value of the measurand and for a statistical estimate of it. The statistical estimate is mostly understood to be the square root of the variance of the probability density function of the
error. When we use variances and standard deviations, we do not mean sample variances and sample standard deviations but simply the second central moment of a distribution or its square root. In accordance with GUM-2008, this distribution can represent a probability in the sense of personal belief and thus can also include systematic effects (see also Sect. ). Other estimates are also used, e.g., robust ones like the interquartile range.
One of the first major documents where the term “
More recently, GUM-2008 presented a narrower definition of how we should conceive the term “error” and stipulated a new terminology, in which the term “measurement uncertainty” is used in situations where one would previously have said “measurement error”. According to GUM-2008 (p. 2), the uncertainty of a measurement is defined as “a parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand”. Conversely, GUM-2008 (Annex B.2.19) allows the term “error” only the connotation of a “signed difference”, but their use of the terms “error” and “error analysis” in the first sentence of their Sect. 0.2 or “possible error” in their Sect. 2.2.4 only makes sense if the statistical meaning of the term “error” is conceded.
In spite of the explicit definition in GUM-2008, there seems to be no
unified stance among GUM-2008 endorsers as to what “error” is. For example,
The use of the term “uncertainty” in GUM-2008 seems inconsistent: the general GUM-2008 concept seems to be that the “error” has to include all error sources and thus cannot be known; “uncertainty” is weaker: it is only an estimate of quantifiable errors, excluding the unknown components. This view is supported by the following quotation (GUM-2008, p. viii): “It is now widely recognized that, when all of the known or suspected components of error have been
evaluated and the appropriate corrections have been applied, there still
remains an uncertainty about the correctness of the stated result, that is, a
doubt about how well the result of the measurement represents the value of the quantity being measured.” It is not fully clear what this means. One possible reading is that they use the term “error” in the redefined sense, i.e., as a quantity which measures the actual deviation from the true value. Then this
statement would be a mere truism, just saying that after all correction and
calibration activities there is still a need for error (in the error concept
terminology) estimation. The only other possible reading is that they want to
say that, due to unknown (unrecognized and/or recognized but not quantified) error sources, the stated uncertainty cannot capture the full deviation from the true value. (Rigorously speaking, within the concept of subjective probability, recognized but unquantified uncertainties should not exist.)
The introduction of the term “uncertainty of measurement” seems to us a mere linguistic revision of an established terminology which does not connect to any further insights. The issue of whether the term “error” should also be used for a statistical estimate cannot be judged on scientific grounds. It is a matter of stipulation, although in the main body of GUM-2008 this stipulation is presented as if it were a factual statement (“In this Guide, great care is taken to distinguish between the terms `error' and `uncertainty'. They are not synonyms, but represent completely different concepts; they should not be confused with one another or misused.”, Sect. 3.2, Note 2). The synonymity of “error” and “uncertainty” is thus neither true nor false but adequate or inadequate. Instead of quibbling about words, we will, in the next section, concentrate on the concepts behind these terms.
Although GUM-2008 (Sect. 0.2) claims the “concept of uncertainty as a
quantifiable attribute to be a relatively new concept in the history of
measurement”, we uphold the view that it has long been recognized that
the result of a measurement remains to some degree uncertain even when a
thorough measurement procedure and error evaluation are performed. Already in the 19th century, investigators realized that measurement results always carry errors.
GUM-2008 not only presents traditional error analysis in a revised language, but also suggests that there is more to it. That is to say, the entire concept is claimed to be replaced (see, e.g., GUM-2008, Sect. 3.2.2, Note 2). We understand that GUM-2008 grants that the classical concept of error analysis deals with statistical quantities, but these are statistical estimates of the difference between the measured or estimated value and the true value. We take GUM-2008 to be saying that the reference of even this statistical quantity to the true value poses certain problems, because the true value is unknown and unknowable. As a solution of this problem, the uncertainty concept is introduced, which allegedly makes no reference to the true value of the measurand and is thus hoped to avoid related problems. GUM-2008 (particularly Sect. 2.2.4) unfortunately leaves room for multiple interpretations, but our reading is that an error distribution is understood by GUM-2008 to be a distribution whose dispersion is the estimated statistical error and whose expectation value is the true value, while an uncertainty distribution is understood to be a distribution whose dispersion is the estimated uncertainty and whose expectation value is the measured or estimated value.
GUM-2008 (p. 5) characterizes error as “an idealized concept” and states that “errors cannot be known exactly”. This is certainly true, but it has never been claimed that errors can be known exactly. Since not all relevant error sources are necessarily known, any error estimate remains fallible, but it is and has always been the goal of error analysis to provide error estimates that are as realistic as possible. To use the statistical conception of “error” (conceding the fallibility of its estimated value), it is not necessary to know the true value. It is only necessary to know the chief mechanisms that can make the measured value deviate from the true value and to have estimates of the uncertainties of the input values to these mechanisms.
Some GUM-2008 endorsers
The conceptual differences between error analysis and uncertainty analysis seem to come down to the different relations between the measured and true values of the measurand. In GUM-2008 (pp. 3 and 5), the claim is made that the uncertainty concept can be construed without reference to the unknown and unknowable true value while the error concept cannot (GUM-2008, p. 3) and that the uncertainty concept is more adequate because there can always exist unknown error sources which entail that an error budget can never be guaranteed to be complete (GUM-2008, p. viii). It is stated that the uncertainty concept is not inconsistent with the error concept (GUM-2008, p. 2/3). There are, however, certain inconsistencies and shortcomings, which are discussed in the following.
One of the major purposes of making scientific observations, besides triggering ideas on possible relations between quantities, is to test predictions based on theories on the real world
On page 3, GUM-2008 says that the attribute “true” is intentionally not used within the uncertainty concept because truth is not knowable. In GUM-2008 (p. 59) it is claimed that the uncertainty concept “uncouple[s] the often confusing connection between uncertainty and the unknowable quantities `true' value and `error' ”. The term “measurand” in their definition, however, is defined as the quantity intended to be measured
Subjective probability reflects the personal degree of belief (GUM-2008, p. 39). Thus, a knowledge-dependent concept of probability is used in GUM-2008. This approach has been chosen to allow the
treatment of systematic errors as dispersions, although the systematic error
does not vary and cannot thus be characterized by a distribution in a
frequentist sense (GUM-2008, p. 60). If we construe “estimated error” and
“estimated value” as parameters of a distribution assigning to each possible value the probability (in a Bayesian context) or the likelihood (in a maximum likelihood context; see Sect.
There is nothing wrong with the subjectivist concept of probability, nor are we attacking the possibility of combining random and systematic errors in a single distribution. This concept, however, makes the knowledge of the true value and the true error unnecessary, and still the estimated error can be conceived as a statistical estimate of the absolute difference between the measured value and the true value. We consider it untenable and inconsistent to refer to the concept of subjective probability when it comes in handy and to deny it when it would solve the conflict between the error and uncertainty concepts.
Our skepticism about the possibility of dispensing with the concept of the
true value is shared by, e.g.,
In GUM-2008 (p. 2/3), it is claimed that the concept of uncertainty “is not
inconsistent with other concepts of uncertainty of measurement, such as a
measure of the possible error in the estimated value of the measurand as
provided by the result of a measurement [or] an estimate characterizing the
range of values within which the true value of the measurand lies”. It is not clear how this can be achieved without explicit consideration of the Bayes theorem.
Interestingly enough, early documents of the history of GUM
The answer to the question of terminological differences was found to be contingent upon the underlying stipulation: any statement about equivalence or difference that makes no reference to a definition is a futile pseudo-statement. The answer to the question of conceptual differences is less trivial and deserves some deeper scientific discussion. The main question still seems to be how the true value, the error or uncertainty, and the measured value are related to each other. This question will be addressed in the following section.
The alleged key problem of the error concept is, in our reading of GUM-2008,
that the true value of the measurand is not known and that this true value must appear neither in the definition of any term nor in the recipes to estimate it. To better understand this key problem, we decompose it into four sub-problems.
(1) Quantities whose value cannot be determined must not appear in definitions. (2) The error distribution must not be conceived as a probability density distribution of a value being the true value. (3) Nonlinearity issues pose problems in error estimation if the true value is not known, at least approximately. (4) One can never know that the uncertainty budget is complete, because it can always happen that a certain source of uncertainty has been overlooked; thus, the full error estimate is an unachievable ideal, and the estimated error does not provide a link between the measured value and the true value.
Some of these sub-problems are in some way formulated in GUM-2008, but it is not exactly specified there why the fact that the true value of the measurand is unknowable poses a problem to the scientist applying traditional error
estimation. The others we have formulated as devil's advocate theses, intended to serve as working hypotheses for a critical discussion of the error and uncertainty concepts in the context of indirect measurements. In the following we will scrutinize these theses one after the other.
GUM-2008 tries to avoid using the true value of the measurand in the definition of the term “uncertainty”. This strategy is employed because the true value of the measurand is “not knowable” (GUM-2008, p. 3). It may be puzzling why it should be necessary to know the value of a quantity in order to use it in the definition of a term. The heights of the Colossus of Rhodes or the Lighthouse of Alexandria are well-defined quantities, although we have no chance to measure them today (we owe this illustrative example to
In GUM-2008 it is claimed that the definition of “uncertainty”
is an operational one (p. 2). An operational definition defines a quantity by
stipulating a procedure by which a value is assigned to this quantity. The
concept of operational definitions was suggested by
GUM-2008's claim that the uncertainty concept is based on an operational
definition leads to two further inconsistencies. First, no unambiguous
operation is stipulated on which the definition can be based, but multiple
operations are proposed, which might give different uncertainty estimates.
Thus, the definition is void. Our critical attitude with respect to
operationalism in the context of GUM-2008 is shared, e.g., by
The other problem with the operational definition is the following: in GUM-2008 (pp. 2–3), it is claimed that the uncertainty concept is not inconsistent with the error concept, and a few lines later it says that “an uncertainty component is always
In summary, the fact that the true value of the measurand is unknowable is a problem for the definition of the term “error” and its statistical estimates only if we commit ourselves to the doctrine that only operational definitions may be used. If we abandon this dogma, there is nothing wrong with conceiving of the estimated error as a statistical estimate of the difference between the measured or estimated value and the true value, and the problem is restricted to the assignment of a value to this quantity. Related issues are investigated in the following.
Many conceptions of measurement models exist which relate the measured value to the true value and, depending on the context, one can be more adequate than another
The causal error points from the true value to the measured signal. Thus, the
estimation of the true value from a measured value can be conceived as an inverse process. An argument along this line of thought, but in a context wider than that of remote sensing of the atmosphere, has been put forward
by
With a transfer function
Conversely, for a given measurement
The non-consideration of the Bayes theorem goes under the name of the “base-rate fallacy”: 50 % of people suffering from Covid-19 have fever
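To make the base-rate dependence concrete, the following sketch applies the Bayes theorem to the fever example; the 50 % figure is taken from the text, while the prevalence and the fever rate among the non-infected are invented for illustration:

```python
# Hypothetical numbers illustrating the base-rate fallacy:
# P(fever | covid) alone does not determine P(covid | fever);
# the prior probability (base rate) enters via the Bayes theorem.

def posterior(p_fever_given_covid, p_covid, p_fever_given_no_covid):
    """P(covid | fever) computed with the Bayes theorem."""
    p_fever = (p_fever_given_covid * p_covid
               + p_fever_given_no_covid * (1.0 - p_covid))
    return p_fever_given_covid * p_covid / p_fever

# 50 % of Covid-19 patients have fever (as stated in the text);
# assumed: 1 % prevalence and a 5 % fever rate among the non-infected.
print(posterior(0.5, 0.01, 0.05))  # about 0.09, far below 0.5
```

The point is that the likelihood P(fever | covid) = 0.5 says little about the posterior P(covid | fever) until the base rate is specified, which is exactly the gap between a likelihood distribution and a posterior probability distribution.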
The first solution is to apply a retrieval scheme that is based on
a Bayesian estimator. Examples are found, e.g., in
The second solution is the application of the principle of indifference, as
applied, e.g., by
The third solution is the likelihood interpretation, which has been
introduced by
We concede that the interpretation of a measured value as the most probable true value is problematic. This implies that the interpretation of the error estimate as the width of a distribution around the true value is also not generally valid. These problems could justify some reluctance with regard to the concept of the true value. This argument, involving the base-rate fallacy, however, is not invoked in GUM-2008.
Some interpretations take GUM-2008 to be Bayesian. Many of the methods presented in GUM-2008, including their “Type-A evaluation (of uncertainty)”, which is the “method of evaluation of uncertainty by the statistical analysis of series of observations” (p. 3), are from the frequentist toolbox. If the uncertainty concept were indeed based on a Bayesian framework, it would be astonishing that it does not in the first place require one to apply the Bayes theorem to convert the likelihood distributions into a posteriori probability distributions. The methodology proposed in GUM-2008 is uncertainty
propagation. This is a mere forward (or direct) problem: given that
Interestingly enough,
Thus, we reject the hypothesis that the uncertainty concept as presented in GUM-2008 is a Bayesian concept. Bayesianism does not help to understand the claimed differences between the error concept and the uncertainty concept.
The uncertainty concept relies on the possibility of evaluating uncertainties
caused by measurement errors and “systematic effects” without knowledge of
the true value. This is certainly granted for linear problems. Here the
uncertainty estimates do not depend on the value of the measurand. This is
because in the linear case Gaussian error propagation of the type
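For illustration, the following sketch (with an invented linear forward model and invented input uncertainties) shows that for a linear model the numerically evaluated Jacobian, and hence the propagated covariance, is the same at any assumed state:

```python
import numpy as np

# Hypothetical linear model y = K x + b (all numbers invented).
K = np.array([[1.0, 0.5],
              [0.2, 2.0]])        # constant sensitivity matrix (Jacobian)
b = np.array([3.0, -1.0])
S_x = np.diag([0.1**2, 0.3**2])   # covariance of the input quantities x

def jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian of f evaluated at x."""
    m = len(f(x))
    J = np.empty((m, len(x)))
    for i in range(len(x)):
        dx = np.zeros(len(x))
        dx[i] = h
        J[:, i] = (f(x + dx) - f(x)) / h
    return J

f = lambda x: K @ x + b

# The Jacobian, and thus S_y = J S_x J^T, does not depend on x:
J0 = jacobian(f, np.zeros(2))
J1 = jacobian(f, np.array([10.0, -5.0]))
S_y = J0 @ S_x @ J0.T
print(np.sqrt(np.diag(S_y)))      # propagated standard uncertainties
```

This is why, in the linear case, no assumption about the true value of the measurand is needed: the sensitivities, and hence the propagated uncertainties, are the same everywhere.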
For nonlinear problems the situation is more complicated because
The endorser of the uncertainty concept has a problem if they want to stay consistent with their doctrine. Since knowledge of the true value is denied, it is not clear how Gaussian error estimation can be applied to the propagation of uncertainties, because it is not clear for which value of the measurand the required partial derivatives shall be evaluated.
On the face of it, Monte Carlo error estimation or other variants of
ensemble-based sensitivity studies can serve as an alternative. These,
however, also invoke the nonlinear model that links the measured signal with
the measurand, and uncertainty estimates thus still depend on the choice of
the estimate that represents the true value; any choice of this value which is not closely related to the true value of the measurand will produce uncertainty
estimates that resist any meaningful interpretation. Monte Carlo and
related methods, however, are apt for the estimation of the error budget, including the systematic effects if
In summary, the evaluation of uncertainties in the case of nonlinearity poses a problem to the scientist who denies approximate knowledge of the true value of the measurand, because the uncertainty estimate depends on the assumed value of the measurand, and it must be assumed that it represents the true value reasonably well. Within the framework of error analysis this assumption is allowed, and measurement errors as well as systematic effects can thus also be evaluated for nonlinear inverse problems.
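The dependence on the assumed state can be illustrated with a small Monte Carlo sketch; the exponential forward model and all numbers are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x):
    """Hypothetical nonlinear forward model (invented for illustration)."""
    return np.exp(x)

def mc_uncertainty(x_assumed, u_x, n=100_000):
    """Monte Carlo spread of the model output around an assumed state."""
    ensemble = forward(x_assumed + u_x * rng.standard_normal(n))
    return ensemble.std()

# The same input uncertainty yields very different output spreads
# depending on which value is taken to represent the true state:
print(mc_uncertainty(0.0, 0.1))  # roughly 0.1
print(mc_uncertainty(3.0, 0.1))  # roughly 2, i.e., about 20 times larger
```

An ensemble drawn around a state far from the true one would thus yield an uncertainty estimate with no useful interpretation, which is the point made above.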
The arguments put forward above are based on the supposition that the error budget is complete. Beyond measurement noise, the total error budget includes systematic effects in the measured signal, uncertainties in parameters other than the measurand that affect the measured signal, and effects due to the chosen inverse scheme. If our reading of GUM-2008 is correct, then the most severe criticism by GUM-2008 of the “error concept” is that one can never be sure that the error budget is indeed complete and that the error estimate thus does not characterize the difference between the value estimated from the measurement value and the true value.
The precision of a measurement is a well-behaved quantity in the sense that it is testable in a straightforward way: from at least three sets of collocated measurements of the same quantity, where each set is homogeneous with respect to the expected precision of its measurements, the variances of the differences provide unambiguous precision estimates (see, e.g.,
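A synthetic sketch of this scheme (data, noise levels, and sample size invented) shows how the variances of the pairwise differences of three collocated data sets with independent errors disentangle the individual precisions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
truth = rng.normal(0.0, 5.0, n)          # common (unknown) true values

# Three collocated measurement sets with independent noise (invented sigmas).
sigmas = (0.5, 1.0, 1.5)
m1, m2, m3 = (truth + rng.normal(0.0, s, n) for s in sigmas)

# For independent errors, Var(m_i - m_j) = sigma_i^2 + sigma_j^2;
# the true values cancel in the differences.
v12 = np.var(m1 - m2)
v13 = np.var(m1 - m3)
v23 = np.var(m2 - m3)

# Solving the three equations for the individual noise variances:
s1 = np.sqrt((v12 + v13 - v23) / 2)
s2 = np.sqrt((v12 + v23 - v13) / 2)
s3 = np.sqrt((v13 + v23 - v12) / 2)
print(s1, s2, s3)                        # close to 0.5, 1.0, 1.5
```

No knowledge of the true values is needed here; they cancel in the differences, which is what makes precision straightforwardly testable.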
A falsificationist
In this section we identify issues where GUM-2008 clashes with the needs of error or uncertainty estimation in the field of remote sensing of atmospheric constituents and temperature. These issues are (1) that, since the atmospheric state varies quasi-continuously in space and time, the measurand is not well defined, and (2) that there are applications of atmospheric data for which the total uncertainty estimate alone does not help.
On macroscopic scales, atmospheric state variables vary continuously in
space and time. On microscopic scales, the typical target quantities, such as concentrations or temperature, are not even defined. A typical example of this problem is the volume mixing ratio (VMR) of a certain
species at a point in the atmosphere. In GUM-2008 this problem is recognized, but no solution is offered; the term “definitional uncertainty” is introduced in this context but is not applied in practice.
One of the positive aspects of GUM-2008 is that it breaks with the misleading concept of characterizing systematic errors with “safe bounds”
The qualification not
In this context it is important to note that, in contrast to some older conceptions,
Benevolent readers of GUM-2008 take the GUM authors to be saying only that the aggregation of estimated errors to give the total error budget follows the same rules for systematic and random errors and that the criticized statement is not meant to deny the importance of distinguishing between random and systematic errors beyond the mere aggregation process. If this reading is correct, we agree, but here GUM-2008 leaves room for interpretation.
We have mentioned above that the uncertainty concept depends on the acceptance of the subjective probability in the sense of degree of rational belief. Without that, an error budget including systematic effects would make no sense because systematic effects cannot easily be conceived as probabilistic in a frequentist sense; that is to say, the resulting error cannot be conceived as a random variable in a frequentist sense. Being forced to adopt the concept of probability as a degree of rational belief, it makes perfect sense to conceive, after consideration of the Bayes theorem (see Sect.
The denial that a valid connotation of the term “error” is a statistical characterization of the difference between a measured or estimated value and the true value of the measurand would be an attempt to brush away centuries of scientific literature. This is, however, a matter of stipulation or convention and thus beyond the reach of scientific argument. We thus take GUM-2008 to be conceding that both concepts, error analysis and uncertainty assessment, aim at providing a statistical characteristic of the imperfectness of a measurement or an estimate. We understand GUM-2008 in the sense that the problem of the error concept is that it conceives the estimated error as a statistical measure of the difference between the measured or estimated value and the true value. Since the true value is unknowable, according to GUM-2008 the term “error” can neither be defined nor can its value be known.
It has been shown that the problem of the unknown true value of the measurand is a problem for the definition of terms like “error” or “uncertainty” only if the concept of an operational definition is pursued. This concept, however, has its own problems and is by no means without alternatives. As soon as the concept of an operational definition is given up, problems associated with defining the estimated error as a statistical estimate of the difference between the measurement or estimate and the true value of the measurand disappear, and the problem remaining is only one of assigning a reasonable value to this now well-defined quantity.
Since GUM-2008 did not provide many reasons why, in the context of indirect measurements, the error allegedly cannot be estimated without knowledge of the true value or why an uncertainty distribution does not tell us anything about the true value, we list the most obvious arguments one could put forward to bolster this claim. These are the problem of the base-rate fallacy, the problem of nonlinearity, and the problem that one can never know that the error budget is complete. The problem of the base-rate fallacy can be solved either by performing a Bayesian inversion or by conceiving the resulting distribution as a likelihood distribution. Astonishingly enough, GUM-2008's “dispersion or range of values that could be reasonably attributed to the measurand” is determined without explicit consideration of prior probabilities and thus cannot be interpreted in terms of posterior probability. The problem of nonlinearity can be solved either by assuming that the estimate is close enough to the true value and linearizing around this point or by Monte Carlo studies. A GUM-oriented scientist, who has to avoid referring to the true value, is at a loss in the case of nonlinearity, because any estimate of the uncertainty of the estimate will be correct only when evaluated at the true value or an approximation of it. The problem of the unknown completeness of the error budget can be tackled by performing comparisons between measurement systems. While this will never provide positive proof of the completeness of the error budget, it still justifies rational belief in its completeness, and if error or uncertainty distributions are conceived as subjective probabilities in the sense of degrees of rational belief, this is good enough.
In summary, if (a) our reading of GUM-2008 is correct in the sense that traditional error analysis deals with a statistical quantity and that the key difference between the “error” and “uncertainty” concepts is their relation to the true value of the target quantity, (b) our list of arguments against the error concept is complete, and (c) our refutation of these arguments is conclusive, then the claim that the “error” concept and the “uncertainty” concept are fundamentally different is untenable.
Beyond this, reasons have been identified that call the applicability of the GUM-2008 concept to atmospheric measurements into question. At least we can state that GUM-2008, by presenting its terminological stipulation about the terms “error” and “uncertainty” in the guise of a factual statement, has triggered a linguistic discussion that distracted attention from the more important issue of how the principles of error or uncertainty estimation, whatever one prefers to call them, could be made more readily applicable to measurements beyond the idealized cases covered by the document.
No data sets were used in this article.
TvC identified the title problem and provided a draft version of the paper. SC contributed information on the history of GUM and on the literature on GUM (supportive and critical) and helped to understand some less clear parts of GUM-2008. FH contributed information to the history of science; TvC contributed information on the philosophy of science and statistics. All the authors co-wrote the final version of the paper.
Two of the authors are associate editors of
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the special issue “Towards Unified Error Reporting (TUNER)”. It is not associated with a conference.
We are grateful for the scientific guidance and sponsorship of the World Climate Research Programme in motivating this work, coordinated in the framework of SPARC and performed as part of the TUNER activity. The International Space Science Institute (ISSI) has hosted two team meetings and provided further support. Steven Compernolle is supported by EU H2020 project Copernicus Cal/Val Solution (CCVS), grant no. 101004242. We thank Antonio Possolo and two anonymous reviewers for thorough and insightful reviews of this paper.
Steven Compernolle is supported by EU H2020 project Copernicus Cal/Val Solution (CCVS) (grant no. 101004242). The article processing charges for this open-access publication were covered by the Karlsruhe Institute of Technology (KIT).
This paper was edited by Nathaniel Livesey and reviewed by Antonio Possolo and two anonymous referees.