In this paper we compare two different methods of estimating the error variances of two or more independent data sets. One method, called the “three-cornered hat” (3CH) method, requires three data sets. Another method, which we call the “two-cornered hat” (2CH) method, requires only two data sets. Both methods have been used in previous studies to estimate the error variances associated with a number of physical and geophysical data sets. A key assumption in both methods is that the errors of the data sets are not correlated, although some studies have considered the effect of the partial correlation of representativeness errors in two or more of the data sets.

We compare the 3CH and 2CH methods using a simple model to simulate three and two data sets with various error correlations and biases. With this model, we know the exact error variances and covariances, which we use to assess the accuracy of the 3CH and 2CH estimates. We examine the sensitivity of the estimated error variances to the degree of error correlation between two of the data sets as well as to the sample size. We find that the 3CH method is less sensitive to these factors than the 2CH method and hence is more accurate. We also find that biases in one of the data sets have a minimal effect on the 3CH method but can produce large errors in the 2CH method.

In atmospheric sciences, observations and models are
often combined with the goal of providing accurate and complete
representations of the current or future state of the
atmosphere. Knowing the error characteristics of observations and
models is important to understanding the degree to which atmospheric
phenomena of interest are accurately described and analyzed.
Estimating observational and modeling error characteristics is thus
of inherent scientific interest. In addition, knowing the error
characteristics is important for practical applications such as data
assimilation and numerical weather prediction. In many modern data
assimilation schemes, observations of a given type are weighted
proportionally to the inverse of their error variance

There are several somewhat similar methods for estimating the error
variances associated with two or more data sets. The “three-cornered
hat” (3CH) method and the closely related “triple collocation” (TC)
method have been used in physics, oceanography and other scientific
disciplines to estimate the errors associated with three independent
data sets.

The 3CH method was originally developed as the “N-cornered hat”
method

The major assumption in all of the above methods is that the errors of
the three systems are uncorrelated. Correlations between
any or all of the three measurement systems will reduce the accuracy
of the error estimates. Other factors that can reduce the accuracy
include widely different errors associated with the three systems or a
small sample size. These factors can lead to negative estimates of
error variances, especially when the estimates are close to zero

Variations of Stoffelen's TC method have been
widely used in the fields of oceanography and
hydrometeorology

In this paper we estimate the effect of neglecting the error covariances using two or three simulated data sets for which the true error variances and covariances are known. We develop a model to simulate the data sets with random and bias errors using a set of assumed true profiles. We then calculate the true error variance and covariance terms in the simulated data sets and show the impact of neglecting these terms on the estimated error variances.
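As a minimal sketch of this kind of experiment (not the model used in this paper), the following Python example simulates three data sets around a hypothetical set of true values, applies the standard 3CH combination of mean-square differences, and compares the estimate with the exact error variance. The values of `truth`, the unit error variances, the correlation `r = 0.5` and the way the correlated error `e_y` is constructed are all illustrative assumptions; the error variance is taken here as the mean square of the error.

```python
import numpy as np


def ms(a):
    """Mean square of a 1-D sample."""
    return np.mean(a ** 2)


def three_cornered_hat(x, y, z):
    """Standard 3CH estimates of the error variances of x, y and z,
    neglecting all error covariance terms."""
    ms_xy, ms_xz, ms_yz = ms(x - y), ms(x - z), ms(y - z)
    var_x = 0.5 * (ms_xy + ms_xz - ms_yz)
    var_y = 0.5 * (ms_xy + ms_yz - ms_xz)
    var_z = 0.5 * (ms_xz + ms_yz - ms_xy)
    return var_x, var_y, var_z


rng = np.random.default_rng(0)          # fixed seed, arbitrary choice
n = 1460                                # sample size used in this study
r = 0.5                                 # hypothetical X-Y error correlation
truth = 10.0 + rng.standard_normal(n)   # hypothetical "true" values

# Unit-variance errors; e_y is built to correlate with e_x (assumed construction).
e_x = rng.standard_normal(n)
e_z = rng.standard_normal(n)
e_y = r * e_x + np.sqrt(1.0 - r ** 2) * rng.standard_normal(n)

x, y, z = truth + e_x, truth + e_y, truth + e_z
est_x, est_y, est_z = three_cornered_hat(x, y, z)

# Exact error variance of X and the sample covariances that 3CH neglects.
exact_x = ms(e_x)
cov_xy = np.mean(e_x * e_y)
cov_xz = np.mean(e_x * e_z)
cov_yz = np.mean(e_y * e_z)

print(f"exact VAR_err(X)  = {exact_x:.3f}")
print(f"3CH estimate      = {est_x:.3f}")
print(f"neglected terms   = {cov_xy + cov_xz - cov_yz:.3f}")
```

In this toy setup the 3CH estimate for X differs from the exact value by exactly the neglected sample covariance terms, and the difference is dominated by the covariance between the correlated errors of X and Y.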

We assume we have three data sets,

The error variance of the data set

The error covariance of the data sets

In the 3CH method, the relationship between the error variances, the
mean square (

The last three covariance terms in
Eqs. (

The 2CH method uses only two data sets,

An alternative form of Eqs. (6a) and
(7a) that is used in some studies

In the 3CH method, the neglected error terms when computing VAR

In the 2CH method, the neglected error terms when computing VAR

We note that the neglected error terms in the 2CH method contain terms involving the product of true with errors, unlike in the 3CH method. Because true is typically an order of magnitude greater than the errors, these terms are likely much larger than the neglected terms involving only products of errors, as in the 3CH method. We also note that if the errors are random and uncorrelated, all of the error terms will be zero for an infinite sample size. However, for finite sample sizes, these terms will be non-zero even if these conditions are met.
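The two points above can be illustrated with a short numerical sketch. It is not the error model of this paper: the true values (about 10, roughly an order of magnitude larger than the unit-variance errors), the number of trials and the random seed are assumptions chosen only for illustration. The sketch computes the sample mean of a product of two uncorrelated errors and of a product of true with an error, for sample sizes of 146 and 1460, and reports their root-mean-square size over many trials.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, arbitrary choice
n_trials = 200                   # number of repeated experiments (assumed)


def rms_of_sample_mean_products(n):
    """RMS over n_trials of mean(e_x * e_y) and mean(truth * e_x)
    for independent, zero-mean, unit-variance errors and sample size n."""
    err_err = np.empty(n_trials)
    true_err = np.empty(n_trials)
    for k in range(n_trials):
        truth = 10.0 + rng.standard_normal(n)  # hypothetical true values
        e_x = rng.standard_normal(n)
        e_y = rng.standard_normal(n)           # independent of e_x
        err_err[k] = np.mean(e_x * e_y)
        true_err[k] = np.mean(truth * e_x)
    return np.sqrt(np.mean(err_err ** 2)), np.sqrt(np.mean(true_err ** 2))


# Map each sample size to (error-error term, true-error term).
results = {n: rms_of_sample_mean_products(n) for n in (146, 1460)}

for n, (ee, te) in results.items():
    print(f"n = {n:5d}: rms mean(e_x*e_y) = {ee:.3f}, "
          f"rms mean(truth*e_x) = {te:.3f}")
```

Under these assumptions both kinds of terms shrink as the sample size grows but remain non-zero, and the terms involving true are roughly an order of magnitude larger than those involving only products of errors.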

We note that Eq. (

The AE is equivalent to the observation minus background
(O

We first generate a set of

We then look at the magnitude of the error terms in the 2CH and 3CH
methods with various assumed correlation coefficients between the
errors in

The assumed normalized random error model for

The error model is created based on error estimates of specific
humidity from several studies

In the calculation of the correlated errors, we first generate the random error profiles

Figure

The correlation coefficient

The relationship between the correlation coefficient and covariance
between

It can be shown that for this particular error model

The correlation between

Relationship between normalized error variances and standard
deviations of data sets

We use 2007 ERA specific humidity

Assume each vertical profile of

Generate three different and independent random error profiles

Generate

Add

Compute the normalized estimated error variance profiles of

We now derive expressions for the estimated values of the error
variances for

In the 3CH method, three covariance terms are neglected. These terms
are zero for an infinite sample size for uncorrelated errors between
the data sets. Correlation between the errors of two data sets will
lead to a non-zero covariance term, which becomes larger for larger
correlations. The error covariances COV

Vertical profiles of normalized error covariances (

For our model (error correlation of

Using Eq. (

Hence for

We next consider the effect of the

Substituting for the COV term from Eq. (

Thus the estimated error variance for

Lastly, we consider the effect of the

Substituting for the COV term from Eq. (

For the 3CH method, correlation between the data sets

VAR

VAR

VAR

Figure

Ratio of estimated

We next examine how error correlations in our error model affect the
2CH method. To estimate the error variance of

Terms involving products of true and (

Mean of products of true and

Figure

Two 2CH results each for the normalized error SD for

Figure

Estimated and exact normalized error standard deviations for 3CH method

We next consider the effect of the sample size on the error estimates from the 3CH and 2CH methods. We repeat the calculations from both methods by using a subset of the 1460 samples used in the above calculations. We created the subset by selecting every tenth sample from the complete set, giving a sample size of 146.
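The subsampling step can be sketched as follows, assuming simple synthetic data rather than the profiles used in this study. Taking every tenth sample of 1460 with the slice `[::10]` gives 146 samples. The true values, the unit error variances, the number of trials and the seed are illustrative assumptions, and only the standard 3CH estimate for one data set is computed.

```python
import numpy as np


def three_cornered_hat_x(x, y, z):
    """Standard 3CH estimate of the error variance of x (covariances neglected)."""
    return 0.5 * (np.mean((x - y) ** 2)
                  + np.mean((x - z) ** 2)
                  - np.mean((y - z) ** 2))


rng = np.random.default_rng(2)   # fixed seed, arbitrary choice
n_full = 1460                    # full sample size
n_trials = 500                   # number of repeated experiments (assumed)

est_full = np.empty(n_trials)
est_sub = np.empty(n_trials)
for k in range(n_trials):
    truth = 10.0 + rng.standard_normal(n_full)   # hypothetical true values
    x = truth + rng.standard_normal(n_full)      # uncorrelated unit-variance errors
    y = truth + rng.standard_normal(n_full)
    z = truth + rng.standard_normal(n_full)
    est_full[k] = three_cornered_hat_x(x, y, z)
    # Every tenth sample of the complete set.
    est_sub[k] = three_cornered_hat_x(x[::10], y[::10], z[::10])

subset_size = len(x[::10])
spread_full = est_full.std()
spread_sub = est_sub.std()

print(f"subset size                  = {subset_size}")
print(f"spread of estimates, n=1460  = {spread_full:.3f}")
print(f"spread of estimates, n=146   = {spread_sub:.3f}")
```

In this sketch the estimates from both sample sizes scatter around the exact error variance of 1, with a wider scatter for the smaller sample.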

Same as Fig.

Figure

To investigate the effect of both random and systematic errors, we add
a known bias

For the 3CH method, Eq. (3a) becomes

This is not the case in the 2CH method, as shown
below. Equation (6a) becomes

Effect of adding a constant bias of 10 % to

As noted above, the bias term in the 3CH method can be close to zero
and therefore not affect the computed VAR

For the simulated data set, Eq. (

Figure

More generally, we can derive expressions for the effect of biases of
all three data sets with respect to true as follows
(Sergey Sokolovskiy, personal communication,
2018): let

The first term of Eq. (

We use co-located data of RO, RS and ERA at Mina, which is
one of the four RS stations
studied by

To compute error variances using the 2CH method,
Eq. (6a) is applied for various combinations of the
four data sets. Data pairs at each level are only used when data are
available for all four data sets. All data sets are interpolated to a
common 25 hPa grid. Figure

Estimated error variances for specific humidity:

The two model and the RO data sets are representative of similar
horizontal scales (

All four data sets have some degree of unknown bias for certain
locations, altitudes or atmospheric conditions; none of them
represent the ultimate “truth” and there is no standard atmospheric
data set for calibration. However, they have all been compared to
other models or observations to one degree or another. We
investigated the effect of biases in a related paper by

We use our results from the 3CH method for real data

The other panels in Fig.

2CH method error variances for ERA, RO and RS

Thus we find that the 2CH method produces estimates of the error variances for specific humidity that are quite different from those of the 3CH method. We suspected that the cause of the different behavior using real data might lie in the different treatment of bias errors in the 2CH and 3CH methods. To investigate this hypothesis, we considered the effect of an empirically based bias in the simulated data.

As shown in Sect.

2 to

5 to

The bias used for

3 to 1 % from 1000 to 700 hPa

1 to

The respective bias is added to both

We use these specified biased data sets to compute error variances via
the 2CH method. Results are shown in Fig.

The error variances of the simulated biased data sets (bottom row) are quite similar to the error variances estimated with the real data (top row), indicating that biases are largely responsible for the 2CH results obtained with real data and for their differences from the 3CH results.
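The different sensitivity to bias can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the bias model described above: a constant bias is added to one data set (Y) only, the true values and unit-variance errors are synthetic, and the two-data-set estimator `est_2ch_x` uses an assumed form built from the mean squares of X − Y, X and Y, chosen to be consistent with the description of the neglected 2CH terms; Eq. (6a) itself may differ.

```python
import numpy as np


def ms(a):
    """Mean square of a 1-D sample."""
    return np.mean(a ** 2)


def est_3ch_x(x, y, z):
    """Standard 3CH estimate of the error variance of x."""
    return 0.5 * (ms(x - y) + ms(x - z) - ms(y - z))


def est_2ch_x(x, y):
    """ASSUMED two-data-set estimate of the error variance of x.
    Illustrative form only; the paper's Eq. (6a) may differ."""
    return 0.5 * (ms(x - y) + ms(x) - ms(y))


rng = np.random.default_rng(3)          # fixed seed, arbitrary choice
n = 1460
truth = 10.0 + rng.standard_normal(n)   # hypothetical true values
e_x, e_y, e_z = rng.standard_normal((3, n))  # uncorrelated unit-variance errors

x, z = truth + e_x, truth + e_z
y_unbiased = truth + e_y
bias = 1.0                              # hypothetical constant bias in Y
y_biased = y_unbiased + bias

# Change in the estimated error variance of X caused by the bias in Y.
shift_3ch = est_3ch_x(x, y_biased, z) - est_3ch_x(x, y_unbiased, z)
shift_2ch = est_2ch_x(x, y_biased) - est_2ch_x(x, y_unbiased)

print(f"change in 3CH estimate: {shift_3ch:+.3f}")
print(f"change in 2CH estimate: {shift_2ch:+.3f}")
```

With these assumptions the bias changes the 3CH estimate by −bias × mean(X − Z), which is close to zero for zero-mean errors, whereas it changes the assumed 2CH estimate by −bias × mean(X), which scales with the magnitude of the true values.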

In this study we compared two methods for estimating the error variances of multiple data sets, the three-cornered hat (3CH) and two-cornered hat (2CH) methods. Using a simulated data set in which we could vary the degree of correlation between two data sets and specify bias errors, we examined the sensitivity of the 3CH and 2CH methods to random and bias errors. For the error model, we added known random or bias errors to 1460 specific humidity profiles (considered truth) obtained from the ERA-Interim reanalysis (ERA) over a subtropical radiosonde (RS) station in Japan. We compared the effect of neglecting various error covariance and other error terms in the 3CH and 2CH methods on the estimated error variances and standard deviations. We also considered the effect of a finite sample size on the estimates by repeating the calculations using a subset of 146 of the total 1460 profiles. We found that the 3CH method was less sensitive than the 2CH method to the neglected error terms for various random error correlations. We also showed that bias errors in one of the data sets had a relatively small effect on the 3CH method but produced much larger errors in the 2CH method.

We also compared the 3CH and 2CH methods using real RS, radio occultation (RO) and ERA data. We found that the 3CH method produced more consistent and accurate results than the 2CH method when using real data. The 2CH method produced very different estimates of the error variance of ERA depending on which observational data set (e.g., RS or RO) was used in the comparison. Using an empirical bias model based on observed RS and RO differences from ERA during 2007, we showed that these differences in error variance estimates were likely caused by different biases in the RS and RO data. Bias errors are thus shown to give unrealistic results when the 2CH method is used.

Code will be made available by the author upon request.

Data can be made available by the authors upon request.

In the 2CH method, sums
and differences of two data sets

Equation (

Defining the following expressions:

We then subtract

By squaring the expression

Similarly, we obtain

We note that the top lines in Eqs. (

Both authors contributed equally to the ideas and conceptual development. The first author computed the results.

The authors declare that they have no conflict of interest.

The authors were supported by NSF-NASA grant AGS-1522830. We thank
Shay Gilpin for Fig.