In this paper we show how multiple data sets, including observations and
models, can be combined using the “three-cornered hat” (3CH) method to
estimate vertical profiles of the errors of each system. Using data from
2007, we estimate the error variances of radio occultation (RO), radiosondes,
ERA-Interim, and Global Forecast System (GFS) model data sets at four radiosonde locations in the
tropics and subtropics. A key assumption is the neglect of error covariances
among the different data sets, and we examine the consequences of this
assumption on the resulting error estimates. Our results show that different
combinations of the four data sets yield similar relative and specific
humidity, temperature, and refractivity error variance profiles at the four
stations, and these estimates are consistent with previous estimates where
available. These results thus indicate that the correlations of the errors
among all data sets are small and the 3CH method yields realistic error
variance profiles. The estimated error variances of the ERA-Interim data set
are smallest, a reasonable result considering the excellent model and data
assimilation system and assimilation of high-quality observations. For the
four locations studied, RO has smaller error variances than radiosondes, in
agreement with previous studies. Part of the larger error variance of the
radiosondes is associated with representativeness differences because
radiosondes are point measurements, while the other data sets represent
horizontal averages over scales of

Estimating the error characteristics of any observational
system or model is important for many reasons. Not only are these errors of
scientific interest; they are also important for data assimilation systems and
numerical weather prediction. In many modern data assimilation schemes,
observations of a given type are weighted proportionally to the inverse of
their error variance

The error variance

The relationship between the apparent error variance

As discussed by

In this paper, we estimate the error variances of multiple data sets using
the “three-cornered hat” (3CH) method

We use five data sets from an entire year (2007) in this study.

We chose 2007 for the year of our study because the number of COSMIC
(Constellation Observing System for Meteorology, Ionosphere, and Climate) RO
observations was near a maximum at this time. Because the primary interest in

The ERA-Interim (hereafter ERA) reanalysis is a global model reanalysis
produced by the European Centre for Medium-Range Weather Forecasts (ECMWF)

We use the ERA analysis product, which assimilates both RS and RO data for the entire year of 2007; hence some correlation of model, RS, and RO errors is likely. However, there are many other observations going into the ERA reanalysis, and model correlations with any one observational data set are likely to be small.

The Global Forecast System (GFS) is a forecast model produced by the National
Centers for Environmental Prediction (NCEP). Data are available for download
through the NOAA National Operational Model Archive and Distribution System
(NOMADS). Forecast products and more information on GFS are available at

The GFS assimilated RS observations for the entire year 2007 but began
assimilating RO data on 1 May 2007, along with many other changes to the
model and analysis system

Number of co-located measurements for

The RO observations used in this study are re-processed data obtained from the UCAR
COSMIC Data Analysis and Archive Center (CDAAC).
Two methods for estimating the temperature and water vapor from the RO
refractivity are used. In the direct method, the GFS temperature is
used in the

A one-dimensional variational (1D-VAR) method is also used to estimate

RS data from Guam and three Japanese stations are used in this comparison. The RS data are given on nine main pressure levels between 1000 and 200 hPa, plus additional levels if atmospheric conditions are variable. The four stations use the following sensors: Guam: VIZ/Sippican B2; Ishigakijima: Meisei; Minamidaitōjima: Vaisala RS92; and Naze: Meisei. The radiosondes are launched twice daily, in the hour before noon and midnight UTC.

Guam is located in the deep tropics at 13.7

Naze (Kagoshima Prefecture): 28.4

Mina (Okinawa Prefecture): 25.6

Ishi (Okinawa Prefecture): 24.2

The comparisons are carried out at the locations of the four radiosonde stations.
We use RO observations that are located within 600 km and 3 h of the
radiosonde launches. CDAAC provides GFS and ERA profiles that are already
linearly interpolated in space and time to the RO location and time. These
interpolated profiles, along with the RO observations, were corrected for
their time and spatial differences from the radiosonde data using a model
correction algorithm
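As an illustration of the co-location criteria only, the following Python sketch selects RO profiles that lie within 600 km and 3 h of a radiosonde launch. The haversine distance, the spherical Earth radius, the dictionary layout, the inclusive thresholds, and the example coordinates are all assumptions made for this sketch; they are not taken from the CDAAC processing and do not include the model correction step.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs are in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_colocated(ro, rs, max_km=600.0, max_hours=3.0):
    """True if the RO profile is within max_km and max_hours of the RS launch.

    Thresholds are treated as inclusive, which is an assumption of this sketch.
    """
    dist = great_circle_km(ro["lat"], ro["lon"], rs["lat"], rs["lon"])
    dt = abs(ro["time"] - rs["time"])
    return dist <= max_km and dt <= timedelta(hours=max_hours)


# Hypothetical radiosonde launch and RO profiles (not real observations).
rs = {"lat": 25.0, "lon": 130.0, "time": datetime(2007, 1, 1, 12, 0)}
ro_near = {"lat": 27.0, "lon": 131.0, "time": datetime(2007, 1, 1, 10, 30)}
ro_far = {"lat": 35.0, "lon": 140.0, "time": datetime(2007, 1, 1, 11, 0)}
ro_late = {"lat": 25.5, "lon": 130.5, "time": datetime(2007, 1, 1, 16, 0)}

matches = [ro for ro in (ro_near, ro_far, ro_late) if is_colocated(ro, rs)]
print(len(matches))
```

Only the first hypothetical profile satisfies both criteria; the second is too distant and the third is too late.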

The refractivity for the radiosonde and model data is computed from
Eq. (
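The refractivity equation referred to here is not reproduced in this text. As a sketch only, the following Python code assumes the widely used two-term Smith-Weintraub form, N = 77.6 p/T + 3.73 × 10^5 e/T^2, with pressure p and water vapor pressure e in hPa and temperature T in K. The coefficients, the function names, and the conversion from specific humidity are assumptions and may differ from the authors' implementation. The last function inverts the same relation for e when the temperature is prescribed from another source, which is the general idea behind a direct retrieval of water vapor from refractivity.

```python
EPS = 0.622  # assumed ratio of the gas constants of dry air and water vapor


def vapor_pressure_hpa(q, p_hpa):
    """Water vapor pressure (hPa) from specific humidity q (kg/kg) and pressure (hPa)."""
    return q * p_hpa / (EPS + (1.0 - EPS) * q)


def refractivity(p_hpa, t_k, e_hpa):
    """Refractivity N (N units) from the assumed two-term Smith-Weintraub form."""
    return 77.6 * p_hpa / t_k + 3.73e5 * e_hpa / t_k ** 2


def vapor_pressure_from_refractivity(n, p_hpa, t_k):
    """Invert the same relation for e (hPa), given N, pressure and a prescribed temperature."""
    return (n - 77.6 * p_hpa / t_k) * t_k ** 2 / 3.73e5


# Hypothetical lower-tropospheric values, for illustration only.
p, t, q = 1000.0, 300.0, 0.018
e = vapor_pressure_hpa(q, p)
n = refractivity(p, t, e)
e_back = vapor_pressure_from_refractivity(n, p, t)
print(n, e, e_back)
```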

The mean ERA profiles over 2007 at Guam and Mina of specific
humidity

The number of samples is limited by the number of RO observations that are
within the co-location criteria of 3 h and 600 km. Figure

Before showing the statistical comparisons of the normalized differences
between the data sets and their estimated errors, we present the mean ERA
profiles of

We next present a single example of soundings from the five data sets, to
illustrate how the profiles of the normalized differences of the variables
(which we use in all the following calculations) compare to the actual
profiles. Figure

Profiles of specific humidity

Same as Fig.

A comparison of Figs.

As in the apparent-error method, the 3CH error estimates include
representativeness errors. Since four of the five data sets considered here are
representative of horizontal averages with a length scale of

In this section we summarize the derivation of the equations relating the
error variances and covariances among the data sets. The complete derivation
and a discussion of the limitations are given in Appendix

The error variance of a variable

As shown in Appendix

We use Eqs. (

The same procedure can be used to derive three equations for estimating the error variances for the other three data sets, RS, ERA, and GFS (equations not shown here).

Thus, for each of the five data sets – RO-direct and RO 1D-VAR, RS, ERA, and
GFS – there are three independent ways to estimate their respective error
variances. This is the three-cornered hat method described in
Appendix
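The idea of obtaining three estimates per data set can be sketched in Python as follows. The sketch uses the standard 3CH relation with all error covariances neglected, VAR_err(X) ≈ ½ [VAR(X−Y) + VAR(X−Z) − VAR(Y−Z)], applied to each of the three pairs (Y, Z) that can be drawn from the other three data sets. The synthetic "truth", the prescribed error standard deviations, and the function names are assumptions for illustration; they are not the data or code used in this study.

```python
import numpy as np


def three_cornered_hat(x, y, z):
    """3CH estimate of the error variance of x, neglecting all error covariances."""
    return 0.5 * (np.var(x - y) + np.var(x - z) - np.var(y - z))


def all_estimates(target, others):
    """Three estimates for `target`, one per pair drawn from three other data sets."""
    a, b, c = others
    pairs = ((a, b), (a, c), (b, c))
    return np.array([three_cornered_hat(target, p, q) for p, q in pairs])


# Synthetic example: a common truth plus independent errors (assumed values).
rng = np.random.default_rng(0)
n = 20000
truth = rng.normal(0.0, 5.0, n)
true_sd = {"RO": 2.0, "RS": 3.0, "ERA": 1.0, "GFS": 1.5}
data = {name: truth + rng.normal(0.0, sd, n) for name, sd in true_sd.items()}

ro_est = all_estimates(data["RO"], [data["RS"], data["ERA"], data["GFS"]])
print("three RO estimates:", ro_est)
print("mean and SD of the estimates:", ro_est.mean(), ro_est.std())
```

Because the synthetic errors are independent, the three estimates cluster near the prescribed RO error variance (2.0² = 4.0); the spread among them reflects only the finite sample size.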

We first compute the estimated error variance for RO refractivity using GFS
and ERA data for comparison with the

The square root of VAR

Standard deviations of the apparent-error SD(RO–ERA) (black line),
estimated RO error SD(RO–true) computed from Eq. (

The results shown in Fig.

This section shows the estimated error variances for

The following plots show the estimated error variances computed from
Eqs. (

Estimated error variances (percent squared) of specific humidity at Mina:

Estimated error variances (percent squared) of relative humidity at Mina:

Figure

Estimated error variances (percent squared) of temperature at Mina:

The RS specific humidity error variance profiles at Mina
(Fig.

The error variance profiles from the two model sets (Fig.

Figure

Figure

The RO 1D-VAR results for temperature from all three equations give somewhat
larger results (Fig.

The RS temperature error variances (Fig.

Estimated error variances (percent squared) of refractivity at Mina:

Mean of the three estimates of error variance plots for

Figure

Figure

It is difficult to find previous results for RS temperature and specific
humidity error variances. However, previous studies comparing RO with RS and
models indicate that our estimates are reasonable and consistent with these
studies.

Normalized differences of zonal mean RO and ERA specific humidity in
the tropics for cloudy conditions

The VAR values in Table

The mean and SD error profiles for Naze, Ishi, and Guam corresponding to the
above results for Mina are presented in Appendix

The estimated error profiles are especially similar for the three Japanese stations. This close similarity may arise primarily because the three locations are relatively close together and two of the three use the same type of radiosonde (Meisei).

The results from Guam are also similar in general magnitudes and shapes of
the profiles to those from the three Japanese stations, but there are
somewhat greater differences in some of the profiles (e.g., GFS

We used the
three-cornered hat (3CH) method to estimate vertical profiles of error variances
of different observation and model data sets by computing the differences
among the data sets. We computed estimated error
variances of four variables (specific humidity

Although the neglect of the covariance terms affects the results to a
noticeable degree in some of the estimated profiles, there is strong
evidence that there is valid information in the estimated error
profiles that rises above the noise caused by the neglect of the
covariance terms and the limited data sample. This evidence is
summarized as follows:

There is generally good agreement in the three estimated error profiles of the four variables for each of the five data sets at all four locations. It is unlikely that this agreement would occur by chance if the neglected error covariance terms were large enough to invalidate the results, because they would have to somehow combine or cancel in each of the three equations to give the observed similar results.

There are large differences in the overall structure (shape
and magnitude) of the average vertical profiles of estimated error
variances for the five data sets (Fig.

The variability, or spread among the error estimates, is similar at most height levels for specific humidity, relative humidity, and temperature. If the error covariance terms were significant, they would almost certainly vary with height, giving different agreement in estimated error profiles with height. For example, we know that RO temperature and refractivity are most accurate in the upper troposphere and least accurate in the lower troposphere and that the weight given to RO in the models' data assimilation varies significantly with height, being largest in the upper troposphere and smallest in the lower troposphere. Thus one would expect the RO–ERA and RO–GFS error covariance terms to vary significantly with height. Also, the RS errors as well as the ERA and GFS model errors vary with height. It is therefore unlikely that all of the neglected error covariance terms are the same at all heights.

The general structure and magnitudes of the estimated error variance profiles are similar at the four locations. However, there are some small differences among the profiles at the four locations. In general, the differences among the three estimates (indicated by the SD about the mean), which are a measure of the effect of the neglected covariance terms as well as limited sample size, are smallest for Ishi, Naze, and Mina and largest for Guam. Since the three Japanese stations are close together, this suggests that there is a difference in the error variance of the Japanese RS observations compared to the Guam RS observations. There may also be small differences in the model errors over the Japanese stations, which are located in a data-rich area compared to Guam, which is located in a data-sparse region. The largest variability and largest error estimates occur at Guam, which uses a radiosonde that is thought to have large water vapor biases due to sensor malfunctions (Holger Vömel, personal communication, 2017).

The magnitudes of the estimated RO refractivity error variances are
supported by previously published studies, including

The estimated errors are smallest for the ERA-Interim
model data set, which is a reasonable result since ERA uses an
excellent model and data assimilation system that assimilates many
independent, quality-checked observations. In fact,

Our results show, in general, that the RO observations have
smaller errors than the radiosonde errors, in agreement with
previous studies. This difference is in part due to
representativeness errors associated with the RS, which are point
measurements while the other data sets are representative of
horizontal averages with a length scale of

Code will be made available by the authors upon request.

Data can be made available by the authors upon request.

In this appendix we summarize the 3CH method

Variations and enhancements of the 3CH method have been applied to
many diverse geophysical data sets. The 3CH method has been used to
estimate the stability of GNSS clocks using the measured frequencies
from multiple clocks

Closely related to the 3CH method is the triple-collocation (TC)
method, which was introduced by

The major assumption in the 3CH and TC methods is that the unknown errors of
the three systems are uncorrelated. Correlations between any or all of
the three measurement systems will reduce the accuracy of the error
estimates. Other factors that can reduce the accuracy of the error
estimates include widely different errors associated with the three
systems or a small sample size. These factors can lead to negative
estimates of error variances, especially when the estimates are close
to zero
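The effect of correlated errors can be illustrated with a short synthetic experiment. Expanding the 3CH expression with the covariance terms retained gives an expected estimate of VAR_err(X) − COV(X,Y) − COV(X,Z) + COV(Y,Z), so a positive correlation between the errors of X and another data set lowers the estimate for X, while a positive correlation between the errors of the other two raises it. The following Python sketch uses assumed, purely illustrative error magnitudes and correlations to show both effects and a negative estimate.

```python
import numpy as np


def three_cornered_hat(x, y, z):
    """3CH estimate of the error variance of x, neglecting all error covariances."""
    return 0.5 * (np.var(x - y) + np.var(x - z) - np.var(y - z))


rng = np.random.default_rng(1)
n = 50000
truth = rng.normal(0.0, 5.0, n)

# Independent error components (assumed standard deviations).
e_x = rng.normal(0.0, 1.0, n)   # true error variance of X is 1.0
e_y = rng.normal(0.0, 1.5, n)
e_z = rng.normal(0.0, 1.2, n)


def estimate(err_x, err_y, err_z):
    """Apply 3CH to three data sets built from the truth and the given errors."""
    return three_cornered_hat(truth + err_x, truth + err_y, truth + err_z)


# Expected values follow from VAR(X) - COV(X,Y) - COV(X,Z) + COV(Y,Z).
uncorrelated = estimate(e_x, e_y, e_z)                # expected 1.0
xy_correlated = estimate(e_x, 0.5 * e_x + e_y, e_z)   # expected 1.0 - 0.5 = 0.5
yz_correlated = estimate(e_x, e_y, 0.5 * e_y + e_z)   # expected 1.0 + 1.125 = 2.125
negative_case = estimate(e_x, 1.5 * e_x + e_y, e_z)   # expected 1.0 - 1.5 = -0.5

print(uncorrelated, xy_correlated, yz_correlated, negative_case)
```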

In this section we summarize the derivation of the 3CH method as applied to
four meteorological data sets, RO, RS, GFS, and ERA. The error variance of a
variable

In the estimation of the error variances for the four data sets, we assume
that the RO errors and ERA errors are uncorrelated, so the error covariance
term in Eq. (

As noted by an anonymous reviewer, it is possible to derive infinitely many
linearly dependent equations by combining Eqs. (A8)–(A10) in different ways,
that is, by forming combinations of the form

If all the neglected COV

We also note that the error estimates contain any representativeness errors
caused by the different data sets representing different scales of
atmospheric structure

While it is not the intent of this paper to do a thorough comparison of the
3CH and TC methods, which are introduced above, in
response to a reviewer's comment we compared the two methods on a subset of
our data sets. A difference between the 3CH and TC methods is that the TC
method corrects for additive and multiplicative biases among the three data
sets, as discussed by
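To illustrate why this bias correction matters, the following Python sketch applies one commonly used covariance-based formulation of TC, in which the error variance of data set X is estimated as Q_XX − Q_XY Q_XZ / Q_YZ, where Q is the sample covariance matrix of the three data sets. This formulation is an assumption of the sketch and is not necessarily the calibration procedure applied in this study; the synthetic data, the additive and multiplicative biases, and the function names are likewise hypothetical.

```python
import numpy as np


def triple_collocation(x, y, z):
    """Covariance-based TC error variances, each in the units of its own data set."""
    q = np.cov(np.vstack([x, y, z]))  # 3 x 3 sample covariance matrix
    var_x = q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2]
    var_y = q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2]
    var_z = q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1]
    return var_x, var_y, var_z


def three_cornered_hat(x, y, z):
    """3CH estimate of the error variance of x, with no bias correction."""
    return 0.5 * (np.var(x - y) + np.var(x - z) - np.var(y - z))


# Synthetic data with additive and multiplicative biases (assumed values).
rng = np.random.default_rng(2)
n = 50000
truth = rng.normal(0.0, 5.0, n)
x = truth + rng.normal(0.0, 1.0, n)                # error variance 1.0
y = 2.0 + 1.2 * truth + rng.normal(0.0, 1.5, n)    # error variance 2.25
z = -1.0 + 0.8 * truth + rng.normal(0.0, 0.5, n)   # error variance 0.25

tc_x, tc_y, tc_z = triple_collocation(x, y, z)
raw_3ch_x = three_cornered_hat(x, y, z)
print(tc_x, tc_y, tc_z, raw_3ch_x)
```

In this synthetic case the multiplicative biases distort the uncorrected 3CH estimate for X, whereas the TC estimates remain close to the prescribed error variances. When the data sets share the same scaling, the two approaches are expected to give similar results.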

In our application of the TC method we use the following combinations of data
sets: (ERA, RO, RS), (ERA, RO, GFS), and (ERA, GFS, RS). For the RO we
use two RO retrievals, the direct and 1D-VAR (see Sect.

The results of the specific humidity error variance estimates for
RO

Estimated RO and RS error variances for specific humidity at
Minamidaitōjima (Japan) using calibrated data as in the TC
method

Estimated ERA and GFS error variances for specific humidity at
Minamidaitōjima (Japan) using calibrated data as in the TC
method

Mean and standard deviations (shading) of the three estimates of
normalized specific humidity using RO-direct and RO 1D-VAR at

Mean and standard deviations (shading) of the three estimates of
normalized relative humidity using RO-direct and RO 1D-VAR at

Mean and standard deviations (shading) of the three estimates of
normalized temperature using RO-direct and RO 1D-VAR at

Mean and standard deviations (shading) of the three estimates of
normalized refractivity using RO-direct and RO 1D-VAR at

RA formulated the overall idea of this work, and TR performed all the calculations and contributed significantly to the discussion of the results.

The authors declare that they have no conflict of interest.

We acknowledge with thanks the insightful comments and advice on this study from Ian Culverwell and John Eyre (Met Office), Shay Gilpin (UCAR COSMIC), Sean Healy (ECMWF), Adrian Simmons (ECMWF), and Sergey Sokolovskiy (UCAR COSMIC). We thank the three anonymous reviewers for their constructive comments. Anthes and Rieckh were supported by NSF-NASA grant AGS-1522830. We thank Eric DeWeaver (NSF) and Jack Kaye (NASA) for their long-term support of COSMIC.

Edited by: Ad Stoffelen

Reviewed by: three anonymous referees