Remote sensing of atmospheric state variables typically relies on the inverse solution of the radiative transfer equation. An adequately characterized retrieval provides information on the uncertainties of the estimated state variables as well as on how any constraint or a priori assumption affects the estimate. Reported characterization data should be intercomparable between different instruments, empirically validatable, grid-independent, usable without detailed knowledge of the instrument or retrieval technique, traceable, and still of reasonable data volume. The latter may force one to work with representative rather than individual characterization data. Many errors derive from approximations and simplifications used in real-world retrieval schemes, which are reviewed in this paper, along with related error estimation schemes. The main sources of uncertainty are measurement noise, calibration errors, simplifications and idealizations in the radiative transfer model and retrieval scheme, auxiliary data errors, and uncertainties in atmospheric or instrumental parameters. Some of these errors affect the result in a random way, while others chiefly cause a bias or are of mixed character. Beyond this, it is of utmost importance to know the influence of any constraint and prior information on the solution. While different instruments or retrieval schemes may require different error estimation schemes, we provide a list of recommendations which should help to unify retrieval error reporting.

Observations from remote sensing instruments are central to many studies in
atmospheric science. The robustness of the conclusions drawn in these studies
is critically dependent on the characteristics of the reported data, including
their uncertainty, resolution, and dependence on any a priori information.
Adequate communication of these data characteristics is therefore essential.
Further, when observations from multiple sensors are considered, as is
increasingly the case, it is important that these characteristics be described
in a manner that allows appropriate intercomparison of both the characteristics
and the observations they describe. In the satellite community, however, the
definition of what constitutes “adequate communication” is far from uniform.
Currently, multiple retrieval methods are used by different remote sounding
instrument groups, and various approaches to error or uncertainty estimation
are applied. Furthermore, reported uncertainties are not always readily
intercomparable. For example, the metrics used as uncertainty values for a data
set might not be properly defined (as, say,

This paper discusses these issues and proposes a common framework for the appropriate communication of uncertainty and other measurement characteristics.

This review has been undertaken under the aegis of the Towards Unified Error Reporting (TUNER) project and was carried out by retrieval experts from the atmospheric remote sensing community, including active participation from eight different instrument science teams. These experts have come together to tackle the (arguably daunting) goal of establishing a consensus approach for reporting errors, with the aim of enabling more robust scientific studies using the retrieved geophysical data products. This review paper, the first “foundational” paper from the TUNER team, is addressed mainly to the providers of remotely sensed data. Major parts of this work have been carried out from the perspective of passive satellite-borne limb-sounding and occultation observations, which accounts for a bias of the presented examples towards these techniques. The underlying theoretical considerations, however, should be applicable in a wider context. A companion paper addressed to the data users, guiding them through the correct use of the uncertainty information, is currently being written (Livesey et al., 2020).

Most concepts presented in this paper rely on the assumption that providing the user with the result of the retrieval, a measure of estimated error or uncertainty along with correlation information, and the sensitivity to any a priori information used is sufficient for most scientific uses. In other words, there is no need to provide a more detailed description of the expected distribution of the retrieved values around the true value (or around the expectation value of the retrievals). That said, we recognize that such distributions might be useful for some specialized quantitative applications.

The well-informed reader will already be acquainted with most of the material
in this paper, although those less familiar with retrieval algorithms may find
it a useful introduction. First, we list conditions of adequacy (desiderata) for the
reporting of error and uncertainty, which summarize the information that should be provided to the data user
(Sect.

With the ultimate goal of presenting a list of recommendations to the community of data providers, we first discuss a list of desired properties of diagnostic metadata from the point of view of a data user. By diagnostic metadata we mean error or uncertainty estimates and all information on the content of a priori data, spatial resolution, and the like. The list of possible metadata to characterize retrievals of atmospheric state variables is long, and some items are more useful than others. Here we define conditions of adequacy (CoA) for error and uncertainty reporting. These conditions will be used as criteria for deciding which metadata are indeed essential and should thus find their way into the recommendations.

The error estimates should be intercomparable among different instruments, retrieval schemes, and/or error estimation schemes.

The estimated errors should be independent of the vertical grid in the sense that correct application of the established error propagation laws to the transformation of the data from one grid to another yields the same error estimates as the direct evaluation for a retrieval on the new grid would do. For characterization data not fulfilling this criterion, means should be provided for transformation from one grid to another.
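This criterion can be illustrated with a minimal numerical sketch (all grids and numbers below are assumed purely for illustration): if a profile is mapped from one vertical grid to another by a linear-interpolation operator W, standard error propagation requires transforming the covariance matrix with the same operator, S_new = W S_old Wᵀ.

```python
import numpy as np

# Hypothetical coarse retrieval grid (km) and a finer target grid.
z_old = np.array([10.0, 20.0, 30.0])
z_new = np.array([10.0, 15.0, 20.0, 25.0, 30.0])

# Linear-interpolation operator W mapping profiles from z_old to z_new.
W = np.zeros((z_new.size, z_old.size))
for i, z in enumerate(z_new):
    j = np.searchsorted(z_old, z)
    if z_old[j] == z:
        W[i, j] = 1.0
    else:
        w = (z - z_old[j - 1]) / (z_old[j] - z_old[j - 1])
        W[i, j - 1], W[i, j] = 1.0 - w, w

# Error propagation: the covariance transforms with the same operator
# that transforms the profile itself.
S_old = np.diag([1.0, 4.0, 9.0])   # assumed error covariance on the old grid
S_new = W @ S_old @ W.T            # error covariance on the new grid
```

Characterization data that do not transform this way would have to be accompanied by an explicit transformation rule, as stated above.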

The error budget and characterization data shall contain all information the data user needs to use the data in a proper way. The error budget shall be usable without detailed technical knowledge of the instrument or retrieval technique. This enables the data user to correctly apply error propagation laws and to calculate uncertainties in higher-level data products.

The error analysis shall be traceable in the sense that all relevant underlying assumptions are documented.

In principle the error estimates should be empirically validatable. Empirical validation is achieved via comparison between independent measurements because the true values of the atmospheric state are unknowable. We consider error estimates to be empirically adequate if differences between independent measurements can be fully explained by the proper combination of their error bars, natural variability in the case of less-than-perfect collocations, different resolutions in time and space, and different amounts of possibly different prior information.

The data volumes associated with this reporting should be reasonable. This is particularly important because the matrices involved (e.g., covariance matrices and averaging kernels) exceed the volume of the data themselves by orders of magnitude.
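A back-of-envelope calculation (with assumed numbers) illustrates the problem: a profile on 50 levels needs 50 values, while each full 50 x 50 matrix needs 2500.

```python
# Illustration with assumed numbers: a 50-level profile versus the
# matrices that characterize it.
n_levels = 50
profile_values = n_levels
matrix_values = n_levels * n_levels   # per matrix (covariance, kernel, ...)

# Two full matrices per profile already inflate the data volume a hundredfold.
inflation = 2 * matrix_values / profile_values
```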

These conditions of adequacy comply in part with the principles issued by the
Quality Assurance Framework for Earth Observation (QA4EO) task team (2010)

Unification of error reporting is only achievable if at least a minimum agreement on terminology and the underlying concepts is achieved. Most of the terms used are largely self-explanatory and are introduced in the following sections. There are, however, two troublesome terminological issues. One consists of the dispute as to whether “estimated error” and “uncertainty” relate to the same concept and, if not, which concept is appropriate. The other is related to the exact connotation of these terms with respect to the underlying methodology. In the following, both issues will be briefly discussed.

A particularly troublesome terminological issue is the use
of the term “error” and the concept behind it. Given that the Joint
Committee for Guides in Metrology (JCGM) and the Bureau International
des Poids et Mesures (BIPM) aim to replace the concept of error
analysis with the concept of uncertainty analysis (Guide to the expression of uncertainty in measurement (GUM),

The GUM-stipulated framework, however, does present a dilemma when
seeking to unify terminology in the TUNER arena. On the one hand, we
are not in favor of brushing away the common interpretation whereby the term “estimated error” is used for a statistical quantity
that reflects the difference between the true value and the value
inferred from the measurement. It remains to be seen whether the new
terminology stipulated by the

For the purposes of the following discussion we define “error” as the difference between an unknown truth and a value inferred from measurements. “Uncertainty” describes the distribution of an error. This can be summarized with metrics such as the total squared error, which can be decomposed into systematic and random components that are reflected by bias and variance. We will often use the word “error” as part of composite terms (e.g., “parameter error”, “noise error”, “retrieval error”, “estimated error”). When we use a composite term containing “error”, this does not imply that the uncertainty interpretation is excluded, and conversely, when we use a composite term containing “uncertainty”, this does not imply that the error interpretation is excluded. The use of the term “error” as a generic term in the sense of “measurement noise causes an error in the inferred quantity” is probably uncontroversial and can be accepted both by adherents of the error concept and by adherents of the uncertainty concept.
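The decomposition into systematic and random components can be checked numerically. In the sketch below (toy numbers, not instrument data), an ensemble of simulated retrievals with an assumed bias and assumed random noise reproduces the identity that the mean squared error equals the squared bias plus the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 5.0
# Simulated retrievals: biased by 0.3, with random noise of sigma 0.5 (assumed).
retrievals = truth + 0.3 + rng.normal(0.0, 0.5, size=100_000)

errors = retrievals - truth
mse = np.mean(errors**2)       # total squared error
bias = np.mean(errors)         # systematic component
variance = np.var(errors)      # random component

# The decomposition holds exactly for sample moments:
assert np.isclose(mse, bias**2 + variance)
```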

We think that no particular terminology is per se better than another, as long as it is clearly defined. Instead of further fueling the terminological conflict, we try to concentrate on the content and to lay down an error-reporting framework tailored to remote measurements of atmospheric temperature and constituents that is more detailed and specific than most of the previous literature.

Regardless of whether one prefers to call the estimated retrieval
error “uncertainty” or the uncertainty of the measurement “estimated
error”, there are still two different ways to evaluate this
quantity. One relies on generalized Gaussian error propagation or,
particularly in grossly nonlinear problems, on sensitivity studies,
either as case studies or in a Monte Carlo sense. Uncertainties of
input quantities are propagated
through the data analysis system to yield the uncertainties of the
target quantities. The other way relies on a statistical analysis of
the results, e.g., by comparison to other observations. Many different
terms are commonly used to distinguish between these different
approaches. In
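The two ways of evaluating this quantity can be contrasted on a toy forward function (assumed here purely for illustration): linearized Gaussian propagation uses the Jacobian, while a Monte Carlo sensitivity study propagates an ensemble of perturbed inputs through the function. For a moderately nonlinear function the two estimates agree closely.

```python
import numpy as np

# Toy forward function (an assumption for this sketch): y = x**2.
def f(x):
    return x**2

x0, sigma_x = 2.0, 0.1

# Generalized Gaussian error propagation: sigma_y = |df/dx| * sigma_x.
jacobian = 2 * x0
sigma_y_linear = abs(jacobian) * sigma_x

# Monte Carlo alternative: propagate an ensemble of perturbed inputs.
rng = np.random.default_rng(1)
ensemble = f(rng.normal(x0, sigma_x, size=200_000))
sigma_y_mc = ensemble.std()
```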

Measurements – including most so-called direct measurements – invoke
inverse methods. The only exception is a direct comparison where the
measurand is directly accessible via human sensation, like length
measurement by comparison with a yardstick or determination of color by comparison with a color table. The inverse nature of most measurements is due to the fact that the measurand

In the macroscopic world, exempt from quantum processes, the measured effect
is thus, for given conditions, a deterministic, unambiguous function of
the measurand. While microscopic processes can admittedly be indeterministic,
their statistical treatment for ensembles of sufficient size leads to
deterministic laws. Irreducibly non-deterministic components contribute to
the measurement noise. In contrast, the conclusion from the measured signal

In some cases, the inverse process can be quite trivial, e.g., in the
case of a temperature measurement with a mercury thermometer. The
causal process is the thermal expansion of mercury, and the inverse
conclusion goes from the volume of the mercury to the ambient
temperature. The scale of the mercury thermometer is simply a
pre-tabulated solution of the inverse process for various temperatures. In other applications, such as remote sensing of the
atmosphere from space, the inverse process is slightly more
complicated because an explicit

Remote sensing of the atmospheric state from space relies in one form
or another on the radiative transfer equation

Roughly following the notation of

Typically

The first publication of a least squares method was actually by

See below for a deeper discussion of this term.

One major difference between our notation and Rodgers' notation refers to the
error covariance matrices

By explicitly assuming equally distributed, i.e., uniform prior, state values,

The normal distribution and the Gaussian distribution are the same. The term “normal distribution” was probably coined by Karl Pearson in 1893. While this term evades the question of priority in its discovery, it “has the disadvantage of leading people to believe that all other distributions of frequency are in one sense or another

If the inverse problem is underdetermined (

If we represent the best known a priori statistics about the targeted
atmospheric state as

This equation, however, has a Bayesian interpretation only if the variability
of the atmospheric state is fairly well covered by a Gaussian probability
density function. To characterize the variability of highly variable trace gases, a log-normal probability density function can be more adequate. It avoids, for example, assigning non-zero a priori probability densities to negative mixing ratios. Technically, this is achieved by using
Eq. (
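The advantage of the log-normal choice can be seen in a small sketch (the statistics below are assumed for a hypothetical, highly variable gas): a Gaussian prior with a large relative standard deviation assigns appreciable probability to negative mixing ratios, while a log-normal prior – a Gaussian in the logarithm of the mixing ratio – cannot.

```python
import numpy as np

rng = np.random.default_rng(2)
mean_vmr, rel_sigma = 1.0, 0.8   # assumed stats for a highly variable gas

# A Gaussian prior assigns non-zero probability to negative mixing ratios:
gauss = rng.normal(mean_vmr, rel_sigma * mean_vmr, size=100_000)
frac_negative_gauss = np.mean(gauss < 0)

# A log-normal prior (Gaussian in log(vmr)) can never go negative:
lognorm = np.exp(rng.normal(np.log(mean_vmr), rel_sigma, size=100_000))
frac_negative_lognorm = np.mean(lognorm < 0)
```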

For brevity, we define the gain matrix

Tables

Application of Eqs. (

At least on macroscopic scales, atmospheric state variables are construed as continuously varying in space and time. In the retrieval equations they are, however, represented by vectors with a finite number of elements. A frequent discretization is the representation of the atmospheric state at a limited number of grid points. The profile shape between these grid points depends on the interpolation scheme chosen. Often profiles are conceived as piecewise linear. The finite grid can be regarded as a surrogate regularization because it places a hard constraint on the shape of the profile between two grid points. If the discretization is too fine, a stronger regularization is needed to fight the ill-posedness of the inversion, while too coarse a discretization can cause errors in the radiative transfer modeling and limits the spatial resolution of the solution. Also, the abrupt gradient changes tend to become more unphysical the coarser the grid is. In a maximum likelihood retrieval, the grid width is identical to the theoretical spatial resolution of the retrieval. However, if the grid width is chosen too fine, the useful resolution of the maximum likelihood retrieval will be much worse because the fine structures of the profile will be masked by the noise.

Alternatively, vertical profiles can be conceived as a set of layers, each represented by layer averages of atmospheric state values and/or partial column amounts of species. In this case no assumptions about the profile shape within each layer are made explicitly, but such assumptions are implicit because the details of the averaging may depend on them.

In this context we note that the atmospheric state does not necessarily
need to be represented as vertical profiles where each
element of

Typically in real-world applications, the measurement error

This issue seems to be of particular importance when observation error covariance matrices are built in contexts where a data assimilation scheme uses radiance
measurements instead of retrieved state variables, as suggested by

While the measurement typically depends on a large number of geophysical state variables, only a few of them are actually dealt with as unknowns. The other variables are assumed to be known and are dealt with as constant parameters. For example, in an ozone profile retrieval the atmospheric temperature profile may be assumed to be known and thus not be included in the retrieval vector

Practical reasons typically force one to decompose the inverse problem, e.g., to reduce the size of the problem in order to achieve numerical efficiency. Often a part of the measurement is virtually insensitive to some of the atmospheric state variables. The general idea of decomposition is to isolate subsets of the entire set of measurements that are mainly sensitive to only a subset of the unknown variables. This decomposition can be made according to spectral or geometrical criteria (see below).

Decomposition of the inverse problem can be done either in an “optimal” or in a “non-optimal” way. The optimal decomposition solves the inverse problem sequentially, where at each step the retrieval is made for the full

More frequently used is non-optimal decomposition. Here the relevance
of some components of the state vector for the measurements is
temporarily ignored, and the retrieval solves the inverse problem only
for a part of the state values, using only a subset of the
measurements. This approach lends itself to problems where it is
adequate to assume that the Jacobian matrix

Not all spectral grid points or channels of a spectrometer or a multi-channel radiometer are equally sensitive to all unknown
variables. For example, the subset of the measurements used to
retrieve the ozone concentration may be insensitive to the
concentration of water vapor

Spectral decomposition is also often used for the retrieval of a single
species. For example,

In the case of nadir sounding, lines of sight referring to different ground pixels cross different parts of the atmosphere and can thus be analyzed independently without sizeable loss of information. In optical limb
sounding of the Earth's atmosphere, first suggested around the same time by

If the same air parcel is seen under multiple geometries, the measurements have a tomographic nature. Since the simultaneous inversion of all these intertwined measurements easily exceeds available computational resources, often only a subset of the measurement geometries is analyzed in one step.

More specifically, the algorithm can be constructed such that only
a subset of the measurements is needed to retrieve the atmospheric state corresponding to a given subset of the state vector elements that
affect signals along the ray path of the considered measurement. The two most prominent examples are single-profile retrievals and onion peeling. Typical approaches to decomposing the set of measurements geometrically
are listed in Table

Satellite data processors: limb geometry (emission, occultation, and scattering).

Satellite data processors: nadir sounders.

In some cases, the geometric profile reconstruction is decoupled from the
spectral inversion. In order to gain numerical efficiency, the inversion can
be performed in sequential steps. Such an approach is realized for GOMOS
two-step inversion, which decomposes the retrievals into the spectral inversion followed by the vertical inversion using the concept of effective cross sections

Optimal decomposition techniques formally retrieve all relevant
variables

The vast majority of limb-sounding retrievals assume local spherical
homogeneity of the atmosphere, i.e., considering only vertical variations in the atmospheric state around the line of sight and neglecting horizontal variability

For limb measurements,

Most other limb-sounding retrieval schemes use the spherical homogeneity approximation, although this approach can be challenged for limb
sounders. For example,

In the case of nadir sounding at mid-infrared and longer wavelengths, single-profile or column density retrievals seem to be the natural thing to do, since a ray path associated with one geolocation intersects each altitude level only once. However, in the UV-VIS spectral range, where backscattered solar light is the source of the radiation, multiple scattering along with strong inhomogeneities in the surface reflection or cloud coverage might cause some interplay between the neighboring pixels.

One specific geometrical decomposition applied to nadir observations is the
retrieval of tropospheric column densities. Since the ray path also travels through the stratosphere, knowledge of the stratospheric column is needed to
model the measured signal correctly.

In the “onion peeling” approach

We are saying “quasi-triangular” here because, due to over-determination at each tangent altitude,

In the early era of limb sounding and solar occultation measurements,
onion peeling was the workhorse data analysis algorithm and was used, among others, in the following missions: LIMS

Approaches related to onion peeling are the Mill–Drayson method

The interleave method decomposes the limb scan into multiple disjoint subsets of measurements, e.g., such that one set contains the tangent altitudes with even numbers and the other those with odd numbers. For each subset of measurements an independent onion-peeling retrieval is performed. Finally, both resulting profiles are merged into a single profile. The goal of this method is to get rid of the onion-peeling oscillations, which is achieved by having thicker layers and thus better sensitivity – at the cost of degraded vertical resolution – in each retrieval step. The interleave method has been used, e.g., for HALOE and SABER.

As will be seen below, rigorous error propagation for onion-peeling retrievals and their variants is tedious and thus rarely performed. Instead, Monte Carlo type sensitivity studies can be performed on the basis of simulated measurements with artificial noise superimposed, which are then analyzed using the onion-peeling scheme. The error estimate is given by the variance of the ensemble results around the reference value at each altitude.
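A minimal version of such a Monte Carlo study is sketched below for a linear toy problem (geometry, layer values, and noise level are all assumed): for a lower-triangular path-length matrix, sequential onion peeling from the top tangent altitude downward is equivalent to solving the triangular system, and the ensemble spread of retrievals from noisy simulated measurements provides the per-layer noise-error estimate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy limb geometry (assumed): lower-triangular path-length matrix, so each
# tangent altitude (row) only sees the layers at and above it.
A = np.tril(np.full((4, 4), 1.0)) + np.diag([0.5] * 4)
x_true = np.array([4.0, 3.0, 2.0, 1.0])   # true layer values (assumed)
sigma_noise = 0.05                         # measurement noise (assumed)

def onion_peel(y):
    # For a linear problem, peeling top-down is equivalent to solving
    # the triangular system by back-substitution.
    return np.linalg.solve(A, y)

# Monte Carlo: retrieve an ensemble of noisy simulated measurements and take
# the ensemble spread as the per-layer noise-error estimate.
ensemble = np.array([onion_peel(A @ x_true + rng.normal(0, sigma_noise, 4))
                     for _ in range(2000)])
noise_error = ensemble.std(axis=0)
bias = ensemble.mean(axis=0) - x_true      # should be near zero here
```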

The Chahine relaxation method

Essentially, the measurement and state vectors have to be constructed in such a
way that for each of their components the following linear relationship can be
considered an acceptable approximation:

In the original approach of Chahine, the measurement vector

The Chahine relaxation method is a nested iteration of the type

As with onion peeling, rigorous error propagation for the Chahine relaxation method is challenging, and the same approach as suggested for the onion-peeling method can be used instead.
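For reference, a toy sketch of the multiplicative Chahine-type update on an assumed linear, diagonally dominant forward model (not any specific instrument's implementation): each state element is scaled by the ratio of the measured to the modeled signal until the nested iteration converges.

```python
import numpy as np

# Assumed toy forward model: each measurement is dominated by one state
# element (positive, diagonally dominant matrix).
A = np.array([[2.0, 0.2],
              [0.2, 2.0]])
x_true = np.array([1.0, 2.0])
y = A @ x_true                 # noise-free simulated measurement

# Chahine relaxation: multiplicative update of each state element by the
# ratio of measured to modeled signal.
x = np.ones(2)                 # first guess
for _ in range(100):
    x = x * y / (A @ x)
```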

The characteristic feature of differential optical absorption spectroscopy (DOAS) is that the information on the target quantity

When the DOAS principle is applied to limb measurements, data analysis
can be performed using a formalism such as that presented in
Eqs. (

Total column retrievals from nadir measurements can also be carried out in one step. In these approaches the total column is directly retrieved by fitting a forward-modeled differential spectrum to an observed differential spectrum. An example of these approaches is the weighting function DOAS (WFDOAS, e.g.,

More often, however, the retrieval is decomposed into a two-step retrieval

The radiative transfer equation is nonlinear. This problem can be remedied
by putting the retrieval equation used in an iterative context, e.g.,

To avoid seeking an

Many inverse radiative transfer problems are only “moderately
non-linear”

The issues discussed above still assume a nonlinear forward model, and only in the iterative inversion scheme is the forward model approximated by its tangent. If, however, the atmosphere is fairly transparent in the frequency range chosen, linear radiative transfer is justified, and the contributions of different atmospheric constituents become additive

There are multiple categories of errors and uncertainties in atmospheric state variables retrieved from satellite measurements. These are

errors caused by less-than-perfect measurements, which include measurement noise and calibration errors, and a less-than-perfect characterization of the instrument by the instrument model,

errors caused by inaccuracies of the radiative transfer model used in the data analysis, which include numerical approximations, missing physical processes, or uncertainties in the values used as constants by the model, particularly spectroscopic parameters,

errors caused by decomposing the inverse problem, giving rise to parameter errors, and

errors caused by the constraint applied to the retrieval, which does not allow the retrieval to produce the solution that is best compatible with the measurements.

In remote sensing a number of processing steps are necessary to obtain a calibrated signal in physical units from the raw data. The raw data are usually referred to as the Level-0 data. Their units depend on the instrument type, and the related quantities can be detector voltages, photon counts, or similar. Level-1 processing transforms the Level-0 data into calibrated measurement data – such as radiances or transmissions – which no longer depend on the particular measurement device used. These are conventionally referred to as Level-1 data. If multiple processing steps are required, distinctions can be made between Level-1a, Level-1b, etc. data, but this distinction is of no relevance here. The Level-1 data come with auxiliary data describing the geolocation and time of the measurement, the measurement geometry, and so forth. The Level-1 data are the input to the retrieval of the atmospheric state. Estimates of the atmospheric state variables are referred to as the Level-2 data product. We use the convention that all uncertainties in the Level-1 data – including metadata – fall into the category “measurement uncertainties”. The main sources of measurement uncertainties include but are not limited to

measurement noise, including discretization noise;

zero calibration error (i.e., a non-zero measurement signal even though the true radiance signal is zero, which can be understood as an additive calibration error);

gain calibration error (a multiplicative calibration error);

higher-order errors (e.g., nonlinear detector response);

uncertainties in auxiliary data, such as the measurement geometry in terms of tangent altitude or the exact time of the measurement; and

stray light.

Further, all these errors can be subject to a drift; i.e., there can be some time dependence.

Unless explicitly mentioned otherwise, we apply linear theory to error
estimation. This leads to generalized Gaussian error propagation of the type

Although “measurement noise” is often conceived as all errors that are
uncorrelated in successive measurements, we use a narrower definition. In
our terminology, noise encompasses only the statistical uncertainty of
the measured signal caused by the indeterministic or unpredictable nature of
radiative processes in the atmosphere or the instrument. Measurement noise
is described by the error variance of each single spectral data point. The
uncertainties are considered uncorrelated between the single components of the measurement vector, which implies a diagonal
noise covariance matrix. In some cases, however, the measurement noise
covariance matrix

According to generalized Gaussian error analysis, the mapping of
measurement noise
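In a linear sketch with assumed toy quantities, this mapping takes the familiar form S_x = G S_y Gᵀ, with G the gain matrix of a regularized least-squares retrieval.

```python
import numpy as np

# Assumed toy quantities: Jacobian K, noise covariance S_y, and a
# Tikhonov-type regularization term R.
rng = np.random.default_rng(4)
K = rng.normal(size=(8, 3))          # 8 measurements, 3 state elements
S_y = np.diag(np.full(8, 0.01))      # uncorrelated noise, sigma = 0.1
R = 0.1 * np.eye(3)                  # constraint / regularization

# Gain matrix of the regularized least-squares retrieval.
Sy_inv = np.linalg.inv(S_y)
G = np.linalg.inv(K.T @ Sy_inv @ K + R) @ K.T @ Sy_inv

# Generalized Gaussian propagation of measurement noise into state space.
S_x_noise = G @ S_y @ G.T
```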

Error correlations between the elements of

For some instruments the error estimate is based on the analysis of
the residuals between the measurements and the best-fitting modeled spectrum.

Non-optimal decomposition of the inverse problem, such as single-profile retrieval or single-species retrieval (Sect.

The mapping of measurement noise into the retrieval domain depends on the
retrieval approach chosen. Naturally, noise has a larger effect when
regularization is kept small in order to get the best possible spatial
resolution, because noise and resolution are competing quantities. However, there are also other choices in the retrieval scheme that have a bearing on the measurement noise as evaluated above. In the ideal case, when the retrieval vector represents the entire atmospheric state with all its relevant variables,

In the context of error propagation in the Levenberg–Marquardt algorithm (Sect.

If the Levenberg–Marquardt algorithm is used only to dampen each iteration step and the iteration is only truncated after full
convergence has been reached, then the

Sometimes the Levenberg–Marquardt iteration is intentionally stopped before full convergence is reached. The rationale is to use
the regularizing characteristics of the

Besides measurement noise, calibration uncertainties also contribute
to the measurement error

Among the satellite missions considered here, the following schemes to
assess the zero-level calibration error are in use, or at least
possible.

Propagation of the assumed zero-level calibration error in the retrieved target quantity

A zero-level correction is jointly fitted along with the target variables. In this case, this error component does not need to be assessed separately but is automatically included in the noise-induced error, at least if no constraint is applied to the zero-offset correction. Since this additional fit variable tends to destabilize the retrieval, noise-induced errors will become larger. This approach has been chosen for MIPAS-IMK, Odin/SMR, and some of the MLS data products.

The zero-level uncertainty is added as a fully correlated component to the measurement error covariance matrix

The zero-level uncertainty is deemed negligibly small and thus is not evaluated. This approach has been chosen by SAGE I, SAGE II, SAGE III, SCIAMACHY, ACE-FTS, and OMPS LP.
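The joint-fit scheme above can be mimicked in a toy linear setting: fitting an unconstrained zero-level offset amounts to appending a column of ones to the Jacobian, and the noise-induced variances of the target variables can only grow, consistent with the destabilization noted above (all quantities below are assumed for illustration).

```python
import numpy as np

rng = np.random.default_rng(5)
K = rng.normal(size=(20, 2))     # assumed Jacobian for two target variables
# Unit noise covariance assumed, so the noise covariance of an
# unconstrained least-squares retrieval is (K^T K)^-1.

def noise_cov(Kmat):
    return np.linalg.inv(Kmat.T @ Kmat)

# Jointly fitting an unconstrained zero-level offset adds a column of ones
# to the Jacobian; the offset error is then absorbed into the noise-induced
# error of the target variables, which grows accordingly.
K_aug = np.column_stack([K, np.ones(20)])
var_plain = np.diag(noise_cov(K))        # without offset fit
var_joint = np.diag(noise_cov(K_aug))[:2]  # with joint offset fit
```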

Similar arguments hold for the gain calibration uncertainty, and in theory the same methods can be applied. In emission spectroscopy, however, gain calibration uncertainty is much harder than offset calibration uncertainty to distinguish from concentration
changes in the target species or from temperature changes. For MIPAS-IMK the linear mapping method is used. By contrast, for many limb-scatter retrievals a normalization with respect to a higher tangent height is performed. As a result, the gain correction,

Occasionally, application of Eq. (

Another issue is frequency calibration. A spectral shift translates into a radiometric error that is highly correlated across the spectral line. The impact of such an error on the retrieval result is highly dependent on the retrieval setup and the selection of microwindows. A spectral shift correction can be jointly fitted with the target variables, as can be done in the framework of the zero-level correction. Residual frequency calibration errors after correction remain an issue for the Level-2 error budget. Since the radiometric error induced by a spectral calibration error is antisymmetric about the line center, its effect on the retrieval results will be different when the microwindow contains only part of the line.
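The antisymmetry noted above is easy to verify on an assumed Gaussian line shape: to first order in the shift, the induced radiometric error is proportional to the spectral derivative of the line and hence antisymmetric about the line center.

```python
import numpy as np

# Assumed toy line shape: a Gaussian centered at the line center (nu = 0).
nu = np.linspace(-2.0, 2.0, 401)   # frequency offset from line center
line = np.exp(-nu**2)
shift = 0.01                       # small spectral calibration shift (assumed)

shifted = np.exp(-(nu - shift) ** 2)
radiometric_error = shifted - line

# To first order the error is an odd function of nu; the symmetric residue
# (error plus its mirror image) is second order in the shift and tiny.
antisym_residue = radiometric_error + radiometric_error[::-1]
```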

For Odin/SMR and MIPAS a frequency offset is fitted as a scalar value
characterizing a complete limb scan. Where necessary, for SCIAMACHY and OMPS
(IUP Bremen), in addition to the Level-1 correction from ESA or NASA,
respectively, a spectral
shift/squeeze correction is determined during the pre-processing step by
performing spectral fits for each line of sight and spectral window
individually. IUP-DOAS and BIRA retrievals also use a shift/squeeze correction.
For TES, the frequency calibration is performed as part of the
Level-1B processing and is not included in the error covariances supplied with
the Level-2 product. OMPS LP depends on the well-characterized Fraunhofer structure in the solar spectrum to establish and maintain its spectral
registration

Under instrument characterization errors we subsume incorrect estimates of measurement noise, instrument line-shape errors (uncertainties in the spectral response function of the instrument), uncertainties in the field-of-view characterization, and so forth. Which of the error sources in this category are relevant depends on the particular instrument under assessment.

Wrongly estimated instrument noise will not only lead to incorrect error
estimates, but will also directly affect the results. The reason is twofold. First, each element of the measurement vector

The preflight characterization of the spectral response function of the instrument typically relies on a monochromatic signal. Once in space, narrow spectral lines can be used to determine possible drifts in the instrument spectral line shape.

Depending on the field-of-view width and the shape of the response function, the field-of-view characterization can be of crucial importance for limb-scatter sensors, because the limb-scatter radiance varies by more than 5 orders of magnitude between tangent altitudes of 0 and 100 km. In this case, small errors in the field-of-view characterization may lead to large errors in the measured limb radiances at higher tangent altitudes. Limb-scanning emission and solar occultation measurements also show a sizeable sensitivity to field-of-view uncertainties.

A number of instrument-specific Level-1 issues for nadir-viewing UV/Vis instruments are discussed in

Less-than-perfect correction of such instrumental issues leads to instrument characterization errors. These are typically evaluated, if at all, using linear mapping.

We understand auxiliary data errors to refer to quantities that come along
with the measurement data but are not usually thought of as part of the

In limb sounding, pointing errors propagate to the result for various reasons. Depending on the design of the retrieval scheme, different mechanisms may play a role. For example, the amount of air seen along the line of sight and the atmospheric state variables depend crucially on the tangent altitude. In the presence of vertical gradients of atmospheric state variables, assigning a per se correct value to an erroneous altitude causes an error. Occultation measurements using the Sun as a background radiation source can depend on which part of the solar disk is seen by the instrument. The residual pointing error to be considered in the error estimation depends on the pointing correction schemes applied. For MIPAS-IMK limb emission measurements, the first step of the retrieval chain is the simultaneous retrieval of temperature and tangent altitudes

Other auxiliary data whose uncertainties need consideration are air density
profiles from external sources, used for the conversion from pressure vertical
coordinates to geometrical height coordinates or vice versa, as well as for the conversion of mixing ratios to partial densities or vice versa

The radiative transfer model used in the retrieval solves the radiative transfer equation and usually involves an instrument model which makes the signal comparable to what the instrument would see. Depending on the instrument type, the instrument model will include the integration of the radiance field over the finite field of view, the convolution with the spectral instrument response function, etc.

A lot can go wrong in radiative transfer modeling, as our knowledge of related processes can be erroneous or inaccurate. Some known physics, such as non-local thermodynamic equilibrium, line coupling, or more sophisticated than usual line-shape functions, may be disregarded for reasons of computational efficiency. Time constraints can also lead to numerical integration being performed with limited precision or weak spectral transitions being ignored. The goal in formulating the radiative transfer model is to keep model errors from known sources much smaller than the measurement error while maintaining computational efficiency. Naturally, any unknown sources of model error are the hardest to quantify. In the following, the most relevant types of known model errors are discussed.

Some relevant physical processes included in

Critical issues in ultraviolet or visible remote sensing are scattering and polarization. Different levels of sophistication of models refer to the treatment of sphericity of the atmosphere and orders of scattering accounted for.

If a complete model is available but not used for the operational retrieval for reasons of computational efficiency, the effect of the missing processes can be assessed via sensitivity analyses based on the complete model and considered in the error budget. If the error is of a systematic nature, the related bias can even be corrected for, and only the residual scatter begs consideration in the error analysis.

In stellar occultation, the forward model for retrievals of trace gases from UV/Vis measurements does not include a deterministic description of stellar spectra perturbations due to scintillations. This omission is due not only to the complicated description of wave propagation in random media, but also to the stochastic nature of small-scale air density irregularities generated by small-vertical-scale gravity waves and turbulence. These perturbations can, however, be characterized and added to the measurement noise as an additional component correlated in wavelength, as shown in

If no complete model is available, then it can only be hoped that the related error is sufficiently small compared to the other error sources so that it has no bearing on the total error budget.

Not all effects of radiative transfer are always modeled according to their physical causes. Often it is more efficient to parameterize some effects and to add related parameters to the list of fit variables, i.e., to include them in the

The numerical solution of the radiative transfer equation requires a lot of integration, e.g., to integrate the spectral radiances over the field of view based on a finite number of so-called pencil beams; the spectral grid on which the radiative transfer is calculated has a finite width; radiative transfer through the atmosphere is in most models based on a finite number of layers or levels, just to name a few. Any improvement of computational accuracy goes along with increased computational effort. For most satellite data processors, the setting is chosen in a way that these issues produce a retrieval error which is so small compared to the leading error sources that it can be ignored in the error budget.
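The pencil-beam integration mentioned above can be sketched in a few lines of Python. This is a toy model only: the Gaussian field-of-view response, the function name, and the exponential radiance fall-off are illustrative assumptions, not any particular instrument model.

```python
import numpy as np

def fov_integrate(pencil_radiance, alt0, fwhm, n_beams=11):
    """Approximate the field-of-view integral of the limb radiance at the
    nominal tangent altitude alt0 by a weighted sum over a finite number of
    pencil beams, assuming a Gaussian field-of-view response with the given
    FWHM (an illustrative assumption)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    offsets = np.linspace(-1.5 * fwhm, 1.5 * fwhm, n_beams)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    weights /= weights.sum()              # normalized discrete FOV response
    beams = np.array([pencil_radiance(alt0 + d) for d in offsets])
    return np.sum(weights * beams)

# Toy pencil-beam radiance falling off exponentially with tangent altitude,
# mimicking the steep altitude dependence of limb-scatter radiances
radiance = lambda z: np.exp(-z / 7.0)
print(fov_integrate(radiance, alt0=30.0, fwhm=3.0))
```

Increasing `n_beams` improves the accuracy of the quadrature at the price of additional forward-model evaluations, which is exactly the accuracy-versus-effort trade-off described above.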

The main constants of relevance here include spectroscopic data, quenching rates, and refractive indices. The values of other constants (radius of the Earth, gas constant, molecular weights, etc.) are known at an accuracy which renders analysis of related retrieval errors unnecessary. Estimation of the impact of spectroscopic errors poses some serious problems.

A major problem in the propagation of spectroscopic data errors is that, in some cases, no uncertainties of cross sections are available. Also, when they are available, information on error correlations is not provided. If a retrieval uses, say, a large number of ozone lines, it would be of utmost importance to know whether errors in the intensity of these lines are correlated (e.g., because the uncertainties are attributed to uncertainties in the gas amount in the cell used in the lab where the spectroscopic parameters were measured) or uncorrelated (because errors are dominated by noise in the lab measurement or because the spectroscopic information stems from different lab measurements). In the uncorrelated case the errors would randomize, while in the correlated case they would fully survive the error propagation for a retrieval using multiple spectral lines.
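The consequence of unknown correlations can be illustrated with a minimal Monte Carlo sketch (all numbers hypothetical): a 1 % line-intensity error shared by all lines survives averaging over them, while independent errors of the same size largely cancel.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_trials, sigma = 50, 20000, 0.01   # 1 % line-intensity uncertainty

# Uncorrelated case: each line carries an independent intensity error, so a
# retrieval averaging over all lines sees the errors partially cancel.
uncorr = rng.normal(0.0, sigma, size=(n_trials, n_lines)).mean(axis=1)

# Fully correlated case: one common error (e.g. from the gas amount in the
# lab cell) shifts all line intensities together and survives the averaging.
corr = rng.normal(0.0, sigma, size=n_trials)  # same error for every line

print(uncorr.std())   # ~ sigma / sqrt(n_lines)
print(corr.std())     # ~ sigma
```

The two standard deviations differ by a factor of about sqrt(n_lines), which is why knowledge of the error correlations, and not only of the error magnitudes, is essential for a meaningful error budget.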

To exemplify another issue, consider a gas-wise sequential retrieval
where

The usual way to estimate the propagation of spectroscopic data errors is to conduct sensitivity studies with perturbed spectroscopic data. Since, as stated above, the correlations between spectroscopic data errors are unknown and not reported in commonly used spectroscopic databases, these sensitivity studies render only a crude estimate of the related retrieval error.

In the case of retrievals of trace gas abundances, one might argue that
uncertainties of the line intensity can be mapped directly onto the
target concentration retrieval. Because both the line intensity and
abundance appear reciprocally in the exponent of Beer's law, the
nonlinearity of the radiative transfer equation has no bearing on the line intensity error propagation. It has, however, been shown that
it is not sufficient to restrict related error analysis to the line
intensities. For example, pressure broadening has a sizeable effect
in the infrared and microwave regions

The propagation of uncertainties of model constants follows the
same formalisms as proposed for uncertainties in atmospheric parameters
(Sect.

In the NASA ACOS/OCO-2 and OCO-3

We define parameter errors as those errors originating from the decomposition
of the full retrieval problem such that a part of the atmospheric state is
assumed to be already known and thus not included in the retrieval vector

The impact

If parameter

Depending on the source of the information on the parameter vector – climatology, preceding retrieval step, independent measurements, or whatsoever – the parameter errors can be correlated or uncorrelated in space and time.
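Within linear theory, such parameter errors map onto the retrieved state via the gain matrix and the parameter Jacobian. The sketch below uses illustrative random matrices; correlated parameter errors would simply enter through off-diagonal elements of the parameter covariance matrix.

```python
import numpy as np

def parameter_error_covariance(G, K_b, S_b):
    """Map the covariance S_b of an assumed-known parameter vector b onto the
    retrieved state via the gain matrix G (dx/dy) and the parameter Jacobian
    K_b (dy/db): S_x,b = (G K_b) S_b (G K_b)^T.  A sketch of standard linear
    error propagation; all matrices here are illustrative."""
    M = G @ K_b
    return M @ S_b @ M.T

# Toy dimensions: 3 state elements, 5 measurement channels, 2 parameters
rng = np.random.default_rng(1)
G = rng.normal(size=(3, 5))
K_b = rng.normal(size=(5, 2))
S_b = np.diag([0.1 ** 2, 0.2 ** 2])   # uncorrelated parameter uncertainties
S_xb = parameter_error_covariance(G, K_b, S_b)
print(np.sqrt(np.diag(S_xb)))         # 1-sigma parameter error per state element
```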

Occasionally errors are of a mixed nature, e.g., if a quantity is jointly
retrieved along with the target quantity but strongly constrained. In this
case, the parameter actually is part of the retrieval vector

In onion peeling (Sect.

Alternatively, the onion-peeling retrieval error can be estimated using a Monte Carlo method. For the solution profile
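Such a Monte Carlo error estimate can be sketched as follows; here a toy linear retrieval stands in for an actual onion-peeling scheme, so that the sample covariance can be checked against the analytic result.

```python
import numpy as np

def monte_carlo_error(retrieve, y, S_y, n_samples=5000, seed=0):
    """Estimate the noise-induced retrieval error covariance by repeatedly
    perturbing the measurement vector y with noise drawn from its covariance
    S_y, re-running the retrieval, and taking the sample covariance of the
    ensemble of solutions.  `retrieve` is any retrieval function y -> x
    (here a hypothetical stand-in for an onion-peeling scheme)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(S_y)
    xs = np.array([retrieve(y + L @ rng.normal(size=y.size))
                   for _ in range(n_samples)])
    return np.cov(xs, rowvar=False)

# Toy linear "retrieval" x = G y, for which the analytic answer is G S_y G^T
G = np.array([[1.0, 0.2], [0.0, 0.8]])
retrieve = lambda y: G @ y
S_y = 0.01 * np.eye(2)
S_x_mc = monte_carlo_error(retrieve, np.array([1.0, 2.0]), S_y)
print(S_x_mc)  # approaches G S_y G^T for large n_samples
```

For a nonlinear retrieval the same recipe applies unchanged; the price is one full retrieval per noise realization.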

In order to avoid wording that is too abstract, we assume that the retrieval vector represents vertical profiles of atmospheric state variables. However, with some adjustments the mathematical concept is applicable to 2D or 3D fields of atmospheric state variables as well. The framework is also applicable to column retrievals. In this case, the retrieval vector has only one element.

By performing regularized retrievals invoking Eq. (

The discrete averaging kernel presented above is only an approximation because it describes only the response to perturbations of the true atmosphere which can be represented in the discretization chosen. In the true atmosphere perturbations can occur on much finer scales, and, strictly speaking, the averaging kernel is a continuous function. An averaging kernel in a coarse discretization will not allow one to restore the averaging kernel on any finer grid.

If a joint retrieval of profiles of multiple different quantities is made, the above refers to the diagonal blocks of the averaging kernel matrix which refer to the quantity under consideration. The presence of non-negligible off-diagonal blocks indicates a significant interference between the species introduced by the regularization scheme.

The averaging kernel of a fully converged Levenberg–Marquardt retrieval equals that of the respective retrieval without the

When

Conversely, the derivative of the retrieved state with respect to the a
priori information is

Here a notation-related caveat is in order.

Usually regularization will entail that the retrieved state

As stated above, a retrieval can be understood as a smoothed estimate of the
truth or an estimate of the smoothed truth. In the first case, any deviation
between the estimate and the truth which is caused by the regularization of
the retrieval has to be included in the error budget.

While in principle this formulation is a direct consequence of generalized
Gaussian error propagation, the inclusion of the smoothing error in the
reported error budget has been critically discussed by

Since interpolation of profiles to other grids is a standard operation, it is not advisable to include the smoothing error in the error budget without a caveat. Instead, the averaging kernels should be communicated to the user, allowing them to evaluate the smoothing error on the final working grid.
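Given the averaging kernel on the working grid and a realistic climatological covariance of the true states, a data user can evaluate the smoothing error covariance (A − I) S_a (A − I)^T along these lines. This is a sketch of the standard linear diagnostic with illustrative numbers; it is only meaningful if S_a realistically describes the variability of the true states.

```python
import numpy as np

def smoothing_error_covariance(A, S_a):
    """Smoothing error covariance (A - I) S_a (A - I)^T for averaging kernel A
    and an ensemble covariance S_a of true atmospheric variability."""
    I = np.eye(A.shape[0])
    return (A - I) @ S_a @ (A - I).T

# Toy example on a 3-level grid with decaying inter-level correlations
A = np.array([[0.7, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.7]])
corr = np.array([[1.0, 0.5, 0.25],
                 [0.5, 1.0, 0.5],
                 [0.25, 0.5, 1.0]])
S_a = 0.04 * corr              # 0.2 (1-sigma) assumed natural variability
S_s = smoothing_error_covariance(A, S_a)
print(np.sqrt(np.diag(S_s)))   # 1-sigma smoothing error per level
```

Note that an identity averaging kernel yields a zero smoothing error, consistent with an unconstrained maximum likelihood retrieval on the native grid.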

In this context it should be mentioned that error estimates
according to Eq. (

Further,

Not all applications of a retrieval scheme of the type in Eq. (

A particular problem is the evaluation of the smoothing error difference
(occasionally, perhaps more adequately, called “smoothing difference error”) of a pair of measurements. For this purpose,

The criticism of the smoothing error as formulated above
(following Eq.

With the interpretation of the retrieved state as an estimate of the
smoothed truth, we accept that measurements can only provide a
finite-resolution representation of the truth and do not consider
this an error component of the measurement (not to mention the philosophical problems associated with what an infinitely resolved
atmospheric state shall be; see

If the contrast in resolution is large enough to consider the better-resolved measurement to be both practically ideal compared to the other one
and practically free of a priori information, then it is common practice
to apply the averaging kernel of the coarser-resolved measurement to the better-resolved measurement (see Sect.

Here index 1 refers to the better-resolved measurement and index 2 to the coarser-resolved one. The other indices are self-explanatory. To the best of our knowledge, this approach was first suggested by
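In practice this amounts to x_smoothed = x_a,2 + A_2 (x_1 − x_a,2), with the better-resolved profile first interpolated to the grid of the coarser measurement. A minimal sketch with an illustrative kernel and profile values:

```python
import numpy as np

def apply_averaging_kernel(x_hr, x_a2, A2):
    """Degrade the better-resolved profile x_hr (index 1) to the resolution
    of the coarser measurement (index 2) using its averaging kernel A2 and
    a priori x_a2:  x_smoothed = x_a2 + A2 (x_hr - x_a2).
    Assumes x_hr has already been interpolated to the grid of measurement 2."""
    return x_a2 + A2 @ (x_hr - x_a2)

# Toy boxcar-like averaging kernel on a 5-level grid
A2 = np.array([[0.6, 0.2, 0.0, 0.0, 0.0],
               [0.2, 0.6, 0.2, 0.0, 0.0],
               [0.0, 0.2, 0.6, 0.2, 0.0],
               [0.0, 0.0, 0.2, 0.6, 0.2],
               [0.0, 0.0, 0.0, 0.2, 0.6]])
x_a2 = np.full(5, 2.0)
x_hr = np.array([1.0, 3.0, 5.0, 3.0, 1.0])
print(apply_averaging_kernel(x_hr, x_a2, A2))  # smoothed towards x_a2
```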

Problems occur when linear theory is no longer adequate to
describe the problem. For example, Bernd Funke has, during the preparation of

In this context another caveat is in order: averaging kernels usually
depend on the units in which the atmospheric state is expressed. For
example, averaging kernels evaluated for volume-mixing ratios must not be applied to number density profiles. Some authors prefer to use so-called
“fractional averaging kernels” instead, which refer to the relative
instead of absolute change in the state variable and are thus unit-independent

Often the full information contained in the averaging kernel is summarized in simpler terms. The most important simple diagnostics that partially describe the content of the averaging kernel are vertical resolution, information displacement, and measurement response. We first discuss the concept of vertical resolution.

Vertical resolution of the retrieval, not to be confused with the vertical
sampling implied by the tangent altitude increments or the instantaneous field
of view of a limb sounder, describes the ability to distinguish separate
features in a vertical profile. It cannot be better than the width of the
vertical retrieval grid on which the results are presented. The latter should
thus be chosen not to limit the resolution of the measurement. All
information on the vertical resolution is included in the averaging kernel
matrix. Contrary to common belief, a
wide field of view or an observation geometry other than limb or with
coarse vertical sampling does not per se degrade the vertical resolution of the measurements. The altitude resolution of the retrieval
is determined only by the vertical grid and the regularization.
It goes without saying, however, that a wide field of view or any
sub-optimal observation geometry often forces the retrieval scientist to use
a stronger regularization to get useful results, which, in turn, will
degrade the altitude resolution. Thus, the field-of-view geometry or sampling has an indirect influence on the vertical resolution of the
retrieval, which is fully accounted for by the averaging kernel matrix
and does not need extra treatment. Vertical oversampling in limb sounding, i.e., the use of a tangent altitude spacing finer than the width of the instantaneous field of view of the instrument, still allows a useful vertical
resolution finer than the field of view

The

A drawback of the Backus–Gilbert spread is that it depends largely on the grid on which the retrieval is performed. The averaging kernel of a retrieval performed on a finer vertical grid will have more pronounced side lobes which are simply not resolved by an averaging kernel evaluated on a coarser grid. If we conceive the coarse-grid averaging kernel as a superposition of fine-grid averaging kernels, the side lobes cancel out. The Backus–Gilbert spread is very sensitive to such side lobes and will thus inadequately “punish” the fine-grid retrieval by giving large weight to these side lobes and thus assigning a large “spread” to them. It thus does not seem suitable for a largely grid-independent measure of the vertical resolution.
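For reference, the discrete Backus–Gilbert spread of the averaging-kernel rows can be computed as below. This is a sketch of the standard definition on a uniform grid; the result inherits the grid dependence criticized above, and the factor 12 normalizes the spread to the width for a boxcar kernel.

```python
import numpy as np

def backus_gilbert_spread(A, z):
    """Discrete Backus-Gilbert spread of each averaging-kernel row on a
    uniform grid z:
      s_i = 12 * sum_j (z_j - z_i)^2 A_ij^2 dz / (sum_j A_ij dz)^2"""
    dz = z[1] - z[0]
    num = 12.0 * np.sum((z[None, :] - z[:, None]) ** 2 * A ** 2, axis=1) * dz
    den = (np.sum(A, axis=1) * dz) ** 2
    return num / den

# Toy averaging kernel with Gaussian rows of 4 km FWHM on a 1 km grid
z = np.arange(0.0, 50.0, 1.0)
sigma = 4.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
A = np.exp(-0.5 * ((z[None, :] - z[:, None]) / sigma) ** 2)
A /= A.sum(axis=1, keepdims=True)
print(backus_gilbert_spread(A, z)[25])  # ~ 3*sigma/sqrt(pi) for Gaussian rows
```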

Obviously, the altitude resolution can be altitude dependent. Usually, the averaging kernel matrix is evaluated on the grid on which the retrieval is performed, because the Jacobians needed are often a by-product of the retrieval. The disadvantage of this approach, however, is that the averaging kernel does not represent any subgrid smoothing effects. Averaging kernels evaluated on a finer grid, which, by the way, are no longer square, can in principle be provided if the related Jacobians are made available, but this is hardly ever done. The ideal averaging kernel is the identity matrix. This averaging kernel matrix corresponds to a maximum likelihood retrieval where the weight of prior information is zero. Here the altitude resolution is equal to the grid width of the retrieval. In agreement with our intuition, the altitude resolution cannot be better than the width of the grid on which the retrieval is performed.
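A widely used, though not unique, practical diagnostic is the full width at half maximum of each averaging-kernel row. The sketch below linearly interpolates the half-maximum crossings; this is one common convention among several, and the result remains somewhat grid-dependent for the reasons discussed above.

```python
import numpy as np

def fwhm_resolution(A, z):
    """Vertical resolution per level as the full width at half maximum of the
    corresponding averaging-kernel row, with linear interpolation of the
    half-maximum crossings."""
    res = np.empty(A.shape[0])
    for i, row in enumerate(A):
        half = 0.5 * row.max()
        above = np.where(row >= half)[0]
        lo, hi = above[0], above[-1]
        # Interpolate the crossing on each flank where a neighbour exists
        z_lo = z[lo] if lo == 0 else np.interp(
            half, [row[lo - 1], row[lo]], [z[lo - 1], z[lo]])
        z_hi = z[hi] if hi == len(z) - 1 else np.interp(
            half, [row[hi + 1], row[hi]], [z[hi + 1], z[hi]])
        res[i] = z_hi - z_lo
    return res

# Gaussian test kernel with a known 4 km FWHM on a 1 km grid
z = np.arange(0.0, 50.0, 1.0)
sigma = 4.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))
A = np.exp(-0.5 * ((z[None, :] - z[:, None]) / sigma) ** 2)
A /= A.sum(axis=1, keepdims=True)
print(fwhm_resolution(A, z)[25])  # recovers ~4.0 km at mid-profile
```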

It is a common misconception that the averaging kernel characterizes
the vertical resolution of the estimated profiles

As is often the case, precision and resolution share a trade space in
remote sounding retrievals. We see from Eq. (

In the context of altitude resolution, a cautionary note is in
order. The altitude resolution is identical neither to the grid width nor to the information smearing. In a regularized retrieval the vertical
resolution is coarser than the retrieval grid. Only in an
unconstrained maximum likelihood retrieval is the vertical resolution
equal to the grid width. Conversely, the vertical resolution of measurements that are sensitive to a very small air parcel is only limited by the vertical grid, and
the sampling theorem

Another concept closely related to the concept of altitude resolution
is that of the degrees of freedom of the signal. This number is calculated
as the trace of the averaging kernel matrix
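In code, this diagnostic is a one-liner (toy kernel values):

```python
import numpy as np

# Degrees of freedom for signal: the trace of the averaging kernel matrix.
# Each level contributes its diagonal element, so any regularized retrieval
# yields fewer independent pieces of information than grid levels.
A = np.array([[0.6, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.6]])
dfs = np.trace(A)
print(round(dfs, 3))  # 1.8, i.e. fewer than the 3 grid levels
```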

Ideally the maximum, the mean, and the median of the averaging kernel
coincide with the nominal altitude, but “it ain't necessarily so”

An example of the importance of this issue is found in

If the a priori profile

Secondly, regularization can cause a bias by pushing the results systematically
towards higher or lower values. Any such effect besides mere smoothing is
characterized by the measurement response function

In the case of a multi-species profile retrieval, the sum is calculated over the sub-block of the averaging kernel matrix referring to the profile under assessment. If the regularization of a retrieval provides a smoothed version of the truth, without systematically pushing results towards greater or smaller values, the sum of the elements over each row of the averaging kernel should be unity. Any deviation of the row sums from unity thus hints at an influence of the constraint that is beyond pure smoothing. The measurement response function is retrieval-unit-dependent.
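The measurement response is simply the vector of row sums of the averaging kernel matrix (toy values):

```python
import numpy as np

# Measurement response: row sums of the averaging kernel matrix.  Rows summing
# to ~1 indicate smoothing without a systematic push towards the a priori;
# clearly smaller values flag levels dominated by the prior information.
A = np.array([[0.55, 0.25, 0.05],
              [0.20, 0.60, 0.20],
              [0.05, 0.25, 0.40]])   # toy kernel; bottom level prior-dominated
response = A.sum(axis=1)
print(response)  # row sums of 0.85, 1.0 and 0.7
```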

Even if the averaging kernel matrix is far from unity, a measurement
response function close to unity indicates that the retrieval is,
putting measurement errors aside for a moment, a smoothed but unbiased
representation of the true profile. Conversely, values of the measurement
response function deviating by an appreciable amount from unity indicate
a large influence of the prior information not only on the profile shape, but also on the integrated values. Interpretation of the measurement response,
however, requires some caution. Any non-zero

The row sum of the averaging kernel, which makes up the measurement response,
consists of summands which refer to a perturbation by the same amount in each
layer, where, again, the “sameness” is unit-dependent. Such a perturbation can be fully realistic in one layer and fully unrealistic in other layers,
depending on the retrieval units. The evaluation of the measurement response is particularly problematic in cases where the profile values cover a wide
dynamic range. This is the case, e.g., for the

The discussion of the averaging kernel matrix and smoothing error was focused
on the retrieval of single quantities so far, e.g., vertical profiles of a
single state variable. Often, however, multiple different state variables are
jointly retrieved in one leap. In this case the regularization constraining
one state variable can affect the result of the other and vice versa. More
specifically, the smoothing error of one variable can propagate onto the
result of the other variable and thus give rise to regularization crosstalk.
If the full (multi-variable) averaging kernel matrices are stored, the
resulting parameter errors can be evaluated using Eq. (

In order to avoid problems due to formal regularization, often regularization
by means of a coarse discretization is used

The retrieval of vertical column densities in cases when no sufficient
information on the vertical distribution of the state variable is available
pushes this rationale to extremes (see Sect.

In the context of averaging kernels and vertical resolution, a few further remarks are in order.

Time series of state values at a given altitude are particularly problematic when the averaging kernel is time-dependent in itself. Here it may help to remove the prior information from the data along with resampling on a coarser grid in order to achieve

While averaging kernels of maximum likelihood retrievals are unity on the native grid on which the retrieval has been performed, any interpolation to finer grids will entail non-unity averaging kernels.

Averaging optimal estimates will not usually create optimal averages. This is particularly true when the prior information is the same for each retrieval, e.g., a climatological data set. This is because the weight of the prior information will be too large in the average

Even if the prior information can be conceived as the frequency distribution of true states, any deviation of the assumed frequency distribution from the true one is an additional error source which is not typically considered in estimated error budgets.

Error estimation will never be perfect, not only because the input
variables of error estimation are uncertain in themselves, but also
because there always are error sources that those responsible for the
error estimation may not be aware of.

The only way known to us to gain confidence that all relevant error
sources have been considered is to compare multiple independent
measurements based on different measurement systems where we can
fairly safely assume that they are not all affected by the same type of
systematic effect. If the discrepancies between the results of
different instruments can be explained by the combined error budgets,
we have reason to believe that the error budgets of the instruments
under comparison are fairly complete

It goes without saying that natural variability in a sense that
the atmospheric state at place

By instrument drift we understand a false trend in the derived state
variables which is caused by an unstable instrument. To first order, a drift
can be avoided if regular and frequent calibrations are performed or if self-calibrating measurement procedures are employed. However, higher-order
effects, e.g., related to the nonlinearity of the calibration curve, can lead to noticeable drifts.

Whenever ex ante drift estimates are available, they should of course be communicated to the data user. Since, however, drifts usually can be determined only reliably towards the end of a mission, it does not make sense to require drift estimates in data characterization papers, which are typically written in the early phase of a mission. A deeper discussion of drifts is found, e.g., in

Space-borne UV measurements are typically affected by particularly severe instrumental degradation, i.e., loss of throughput. This is usually caused by optical coatings degrading when exposed to UV radiation. If a tangent altitude normalization approach or another self-calibration approach is used, this degradation is not necessarily a big problem, but the signal-to-noise ratio will decrease over time.

The SBUV/2 instruments use an onboard calibration system to track relative spectral and temporal changes in diffuser reflectivity using a mercury lamp

Intrinsically self-calibrating measurement geometries such as solar and
stellar occultation or regular calibration measurements using internal sources at first order remove this error. This does, however, not apply to drifts of the shape of the nonlinear detector response function as discussed above. To date, these drifts have not been evaluated as part of the routine error analysis of the Level-2 product, but they are assessed by careful comparison with other instruments. While it is not easily possible to get absolute drift estimates
from this, at least the relative drifts between instruments can be estimated

Within linear theory, errors of different sources combine additively and follow Gaussian error propagation. We have
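For independent error components, this additive combination of covariance matrices can be sketched as follows (illustrative numbers):

```python
import numpy as np

# Within linear theory, independent error components combine additively in
# their covariance matrices; the 1-sigma totals therefore add in quadrature.
S_noise = np.diag([0.10, 0.12]) ** 2   # measurement noise contribution
S_cal   = np.diag([0.05, 0.05]) ** 2   # calibration contribution
S_param = np.diag([0.08, 0.02]) ** 2   # parameter error contribution

S_total = S_noise + S_cal + S_param
sigma_total = np.sqrt(np.diag(S_total))
print(sigma_total)  # element-wise sqrt(0.10^2 + 0.05^2 + 0.08^2), etc.
```

Off-diagonal elements, which the diagonal toy matrices above omit, would carry the inter-level correlations of each component and combine in exactly the same additive way.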

Some data providers publish total error estimates. This practice is
also endorsed by

The goal of the TUNER effort has always been to bring the atmospheric remote sensing community together to enable better science. While a great deal of work has been performed over many decades, certain questions about the intercomparability of different data sets continue to linger and can only be answered if the data provided satisfy the conditions of adequacy described in this paper. While TUNER is not the first attempt at achieving this lofty goal (and may not be the last), we believe that the TUNER group is well suited to this task. With the aim of establishing a consensus on error reporting, the TUNER group is comprised of remote sensing retrieval experts representing instruments with well over a century of combined operational time and experience. Comprising both data providers and data users, the TUNER consortium aims to “practice what they preach” in the hopes that data from past, present, and future instruments may finally be used in a consistent and intercomparable fashion.

Based on the framework and consensus terminology outlined above,
and in response to the conditions of adequacy formulated in Sect.

The language and notation used to describe the error budget must be clearly defined.

This can be accomplished either by explicit definitions of all terms and
symbols used or by reference to any available document that lays down a
self-consistent terminology. We hope that this paper serves that purpose and that the terminology and notation introduced here will be found
useful

In the scientific community, it is often desirable to have a citeable source for notation and terminology so that usage is consistent. The authors do not wish to dictate which language to use and therefore make no binding recommendation on notation and terminology; the choice is left to the reader.

(CoA 1, CoA 3, CoA 4).

Every effort should be made to make the error budget as complete as possible in the sense that all sizeable sources of uncertainty are included, either via linear mapping, sensitivity studies, or whatever is appropriate for the particular case under assessment.

The choice of which error estimation scheme is adequate depends on the
instrument and the specific retrieval scheme. Thus, no one-size-fits-all error estimation scheme is recommended here. The responsibility for judging
which treatment of uncertainties is adequate lies with the retrieval scientist, because only they can judge which error sources and error propagation mechanisms are relevant for a particular instrument or data product. An overview of the most commonly used retrieval schemes is given in Sects.

Substantive contributions from each relevant error component should be reported separately.

The reason for this recommendation is that an estimated error component due to one particular error source can be of a random character in one application and of a systematic character in another. For example, errors due to uncertain strengths of spectral lines are random if, say, the chlorine budget is calculated from multiple chlorine-containing constituents, each having its own uncertainty due to spectroscopic data. Conversely, in the analysis of a time series of one species, the estimated errors due to erroneous line intensities act as a systematic error. The data user is able to consider the relevant error components only if the error contributions are reported separately. If, in addition, the total error is reported, it should include both the systematic and random components. Some error sources can contribute to both the random and systematic error components (CoA 5, CoA 4).

For each error source, it is often necessary to know whether the resulting error components are independent between two subsets of data within a certain domain (time, space, species, etc.).

For example, the error component due to tangent altitude uncertainties can be correlated between different species retrieved from the same measurement. The error component due to spectroscopic data may be correlated in the altitude domain but uncorrelated between different species, etc. We recommend that data providers describe the correlation within each relevant domain either qualitatively or quantitatively, wherever possible. The need for this is illustrated by the example already described under Recommendation R 3. Other examples are quasi-systematic errors which are random in the long run only but can be highly correlated on shorter timescales (CoA 5; CoA 3).

When instrument groups make the error components available, they should also indicate which of them contribute primarily to the random error and which contribute primarily to the systematic error.

Classification and combination of errors is most helpful to the data user if it is made by their systematic versus random nature rather than by origin (CoA 5; CoA 3). This is important, e.g., in the context of validation. If estimated errors are reported as aggregated parameter errors and some of them are of a systematic nature while the others are of a random nature, the data user will not be able to judge which fraction of the bias or the standard deviation of the differences between two measurement systems is explained by the systematic or random error, respectively. On the face of it, this recommendation looks redundant with Recommendation R 4 applied to the time domain, but it is not. Components of the error budget may be strongly autocorrelated in the time domain but still lead to zero bias and thus contribute to the random error only. Again it should be kept in mind that some error sources can contribute both to the random and systematic error components.

The meaning of the reported uncertainties shall be clarified.

Do they refer to

For all error components, the assumed ingoing uncertainties shall be reported in the relevant documentation. It should also be reported which correlation characteristics were assumed (e.g., scalar perturbation of a profile, individual perturbation of its elements, or consideration of its full covariance matrix).

Without reporting assumptions about ingoing errors, error propagation would not be traceable. With this information at hand, a data user can re-scale error estimates if there is some doubt about the assumption about ingoing uncertainties (e.g., the

If the retrieval uses prior information in the sense of
Eq. (

This allows the data user to apply Eq. (

In addition to the error budget, averaging kernels
(Eq.

If a certain retrieval scheme does not give direct access to averaging kernels (e.g., onion peeling), then averaging kernels shall be determined by sensitivity studies based on perturbations of the profile. For retrieval approaches using truncated singular value decomposition or related approaches, the final altitude resolution shall be expressed as averaging kernels. For global fit maximum likelihood retrievals (no regularization) the averaging kernels are by definition unity, but only in the native retrieval grid. In such cases, regridding of data will give rise to non-unity averaging kernels. At the very least, the original grid and the interpolation scheme shall be reported. The data provider should calculate the averaging kernels on the final grid on which the data are provided to the user. To avoid any misinterpretation of the averaging kernel and taking the averaging kernel matrix for its transpose, it should be indicated which index refers to the columns and which to the rows of the averaging kernel matrix (CoA 1, CoA 5, CoA 2, CoA 3).

The space to which the averaging kernel applies (e.g., linear/logarithmic, mixing ratio/density, absolute/relative) shall be reported.

This is particularly important when data are reported in a form that differs from that of the retrieval state vector. For example, the averaging kernels resulting from a retrieval of the logarithms of mixing ratios must not be applied to the mixing ratios themselves. It is thus of utmost importance to communicate to the data user to which quantities the averaging kernels refer. If the averaging kernel made available to the data user underwent some transformation, the user should also be informed in which space the averaging kernel was initially calculated (CoA 1, CoA 5, CoA 3, CoA 4).
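
A minimal sketch of this pitfall, with hypothetical numbers: an averaging kernel computed for a retrieval of log(vmr) acts on perturbations of ln(vmr) and must not be applied to the mixing ratios themselves.

```python
import numpy as np

# Hypothetical 3-level example: averaging kernel A was computed for a
# retrieval of log(vmr), so it acts on perturbations of ln(vmr).
A = np.array([[0.7, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.7]])
x_a    = np.array([1.0, 2.0, 4.0])   # a priori vmr (ppmv), assumed
x_true = np.array([1.5, 1.8, 5.0])   # "true" vmr (ppmv), assumed

# Correct: apply the kernel in log space, then transform back
x_log = np.log(x_a) + A @ (np.log(x_true) - np.log(x_a))
x_smoothed = np.exp(x_log)

# Wrong: applying the log-space kernel to the mixing ratios directly
x_wrong = x_a + A @ (x_true - x_a)

# The two results differ, which is why the kernel's space must be reported
```

The discrepancy grows with the size of the deviation from the a priori, so for strongly variable species the error made by applying the kernel in the wrong space can be substantial.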

The smoothing error should not be included in the error budget. Instead, data users should be provided with averaging kernel matrices calculated on a sufficiently fine grid, allowing them to evaluate the smoothing error on the working grid to which they will transform the data.

Error propagation of the smoothing error in the context of interpolation to
finer grids will usually fail to produce the full smoothing error on the fine
grid
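
Given a fine-grid averaging kernel A and an assumed climatological covariance S_clim describing true atmospheric variability, the data user can evaluate the smoothing error covariance with the standard linear formula (A − I) S_clim (A − I)ᵀ. A toy NumPy sketch, in which the grid, the kernel shape, and the variability are all assumed for illustration:

```python
import numpy as np

n = 20
z = np.arange(n, dtype=float)                 # fine vertical grid (km), assumed
dist = np.abs(np.subtract.outer(z, z))

# assumed climatological variability: 10 % (1 sigma), 5 km correlation length
S_clim = 0.10**2 * np.exp(-dist / 5.0)

# toy averaging kernel: Gaussian rows of ~3 km width, normalized to unit area
rows = np.exp(-0.5 * (dist / 1.5)**2)
A = rows / rows.sum(axis=1, keepdims=True)

# smoothing error covariance on the fine grid
B = A - np.eye(n)
S_smooth = B @ S_clim @ B.T
smoothing_error = np.sqrt(np.clip(np.diag(S_smooth), 0.0, None))  # 1 sigma
```

Evaluating this directly on the fine grid, rather than propagating a coarse-grid smoothing error through an interpolation, avoids the problem described above.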

The discretization must be specified. If the retrieval is reported as state values on a vertical grid, the applicable interpolation rule must be reported. If an altitude-resolved retrieval is performed in any space other
than state values over altitude, pressure, or the like (e.g., if eigenvectors
or similar are used, see Sect.

While these alternative representations certainly have their advantages, the data producer is in a better position than the data user to provide the diagnostic data for a profile representation (CoA 5, CoA 2, CoA 3).

Retrieval scientists should judge whether evaluation of error budgets and averaging kernels for a limited number of representative cases is adequate. If averaging kernels are provided only for a few representative cases, one might still consider showing at least the vertical resolution profiles for each profile.

Communication of a complete error budget for each profile, broken down to all components with all correlation information, along with averaging kernels and a priori information used, is not always technically feasible and often creates unnecessary data traffic (CoA 6).

The following recommendations R 14–17 are applicable to the case when only representative diagnostic data are available.

In this context we would like to mention that there exist methods to convey
the information content of a measurement at drastically reduced data volume

If representative error estimates are reported instead of error estimates
for each single profile or data point, it is of utmost importance to tell the
data user whether each error component is chiefly additive (i.e., independent of the actual state value reported) or chiefly relative
(i.e., a scaling factor). For the first type, the estimated errors shall be
reported in the same units as the state variable (e.g., Kelvin,
ppmv, molec cm

With this information, the data user can adjust the error estimates to the particular scientific study. For example, measurement noise often leads to an additive error component; i.e., the estimated error is approximately the same size regardless of how large the mixing ratio of the target gas is. Conversely, errors representing spectroscopic uncertainties are often multiplicative; that is to say, larger profile values have larger errors (CoA 3, CoA 6).
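
A minimal sketch of how a data user might apply representative errors of the two types to a particular profile (all magnitudes are hypothetical):

```python
import numpy as np

vmr = np.array([0.5, 1.0, 2.0, 4.0])   # retrieved mixing ratios (ppmv), assumed

noise_err = 0.1                         # additive: 0.1 ppmv at every level
spectro_err_rel = 0.05                  # multiplicative: 5 % of the state value

err_noise = np.full_like(vmr, noise_err)   # independent of the state value
err_spectro = spectro_err_rel * vmr        # scales with the state value

# combine independent components in quadrature
err_total = np.hypot(err_noise, err_spectro)
```

Note that the additive component dominates at small mixing ratios, while the multiplicative component dominates at large ones, which is exactly the distinction the data user needs the representative estimates to preserve.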

If certain estimated errors or other characterization data are known or suspected to depend systematically on time, latitude, or other parameters, this dependence should be reported, particularly if only representative errors are reported.

For example, in infrared emission spectroscopy the precision of concentration retrievals is usually worse for a colder atmosphere. With this information, a data user working with a retrieval from a particularly cold day that is not well represented by the sample error estimates is warned that the actual precision may be worse than the reported one (CoA 3).

If, for application to mean profiles, mean averaging kernels are provided in conjunction with mean profiles instead of individual ones, then the correlation profiles between the averaging kernels and the retrieved profiles shall be provided.

The reason is that the mean averaging kernel applied to the mean profile does
not equal the mean of individual averaging kernels applied to individual
profiles
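
A small synthetic example (random toy kernels and profiles) illustrates the inequality: the mean of the individually smoothed profiles differs from the mean kernel applied to the mean profile whenever kernels and profiles co-vary across the ensemble.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 5                                  # levels, number of profiles (toy)
x_a = np.ones(n)                             # a priori profile (hypothetical)

# toy ensemble: individual kernels (here diagonal) and true profiles
A_i = [rng.uniform(0.4, 0.9) * np.eye(n) for _ in range(N)]
x_i = [x_a + rng.normal(scale=0.5, size=n) for _ in range(N)]

# mean of the individually smoothed profiles
smoothed_mean = np.mean([x_a + A @ (x - x_a) for A, x in zip(A_i, x_i)], axis=0)

# mean kernel applied to the mean profile
A_bar = np.mean(A_i, axis=0)
x_bar = np.mean(x_i, axis=0)
naive = x_a + A_bar @ (x_bar - x_a)

# the two differ by the sample covariance between kernels and profiles
difference = smoothed_mean - naive
```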

If, in order to reduce the data volume of profile data characterization, only standard deviations are reported for the individual profiles instead of the full covariance matrices, then a representative random error correlation pattern in the altitude domain (correlation matrix) shall be made available.

With this, the user can approximate individual covariance matrices (CoA 3, CoA 5, CoA 6).
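
This approximation amounts to S ≈ D C D with D = diag(σ), where σ holds the reported per-profile standard deviations and C is the representative correlation matrix. A minimal sketch with hypothetical numbers:

```python
import numpy as np

sigma = np.array([0.10, 0.15, 0.25, 0.20])   # reported 1-sigma errors (assumed)

# representative altitude-domain correlation matrix (assumed exponential shape)
levels = np.arange(sigma.size)
C = np.exp(-np.abs(np.subtract.outer(levels, levels)) / 2.0)

# approximate error covariance for this particular profile
D = np.diag(sigma)
S_approx = D @ C @ D

# the diagonal reproduces the reported variances by construction
```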

The error estimates should explain observed differences between measurements of the same air mass.

The final criterion of adequacy of error reporting is whether
discrepancies between measurements of the same atmospheric state
variable by independent measurement systems can be explained by
the error estimates. This practical and empirical criterion of
completeness of the error budget does not require knowledge of
the unknowable true value of the measurand (CoA 1, CoA 5). In this context we distinguish between random and systematic errors.

We consider random error estimation schemes adequate if the deduced errors, together with the effects of less-than-perfect spatial
or temporal coincidence between the two data sets and of natural
variability, explain the observed standard deviation of the
differences between the two data sets. If predicted random errors fail
to explain observed differences, they should be reassessed. Methods
to find out which of the compared data sets has an inadequate random
error estimate have been described in, e.g.,
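
A synthetic sketch of such a consistency check (all error magnitudes are hypothetical): the observed standard deviation of the differences between coincident pairs should be explained by the combined random errors plus a coincidence/variability term.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000                                  # number of coincident pairs (toy)
truth = rng.normal(size=N)                  # common true state

sigma1, sigma2 = 0.3, 0.4                   # reported random errors (assumed)
sigma_coinc = 0.2                           # coincidence/variability term (assumed)

x1 = truth + rng.normal(scale=sigma1, size=N)
x2 = truth + rng.normal(scale=np.hypot(sigma2, sigma_coinc), size=N)

observed = np.std(x1 - x2)
predicted = np.sqrt(sigma1**2 + sigma2**2 + sigma_coinc**2)

# the error budget is considered consistent if prediction matches observation
consistent = abs(observed - predicted) < 0.05 * predicted   # rough 5 % criterion
```

If `consistent` were False for real data, at least one of the reported random error estimates (or the coincidence term) would need to be reassessed.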

We consider estimates of the systematic errors to be adequate if they, along with sampling biases and after accounting for different vertical/horizontal/temporal resolutions and content of a priori information, explain the observed biases between independent instruments (CoA 5).

On the face of it, the list of recommendations appears quite weak, leaving a lot of freedom to the data provider. This is, however, not the case. Recommendation R 2, that the error budget should be as complete as possible, along with Recommendation R 18, which gives a criterion for the completeness of the error budget, quickly make the apparent freedom disappear.

Admittedly, these recommendations will not guarantee perfect compliance
with the conditions of adequacy, but due to the competing needs of rigor
versus practicability the problem seems overconstrained. In other words,
you “can't always get what you want”

In this paper we have discussed conventional (as opposed to machine-learning- and artificial-intelligence-based) error estimation methods for Bayesian and non-Bayesian retrieval methods. The choice of the retrieval method is a dilemma. If likelihood-based methods are chosen, the constraint lacks a probabilistic interpretation, and ad hoc constraints will imply a bias, at least if the retrieval is conceived as a smoothed estimate of the true state. This horn of the dilemma is avoided by Bayesian methods, which use probabilistic constraints. Adherents of likelihood-based methods, however, will point out the second horn of the dilemma: it is never guaranteed that the chosen a priori statistics indeed represent the true background statistics. Further, they will raise the concern that Bayesian methods, even if based on the true background statistics, may render bias-free estimates in the long run but may be off the true atmospheric state in any single case. The decision to accept one or the other horn of the dilemma is a philosophical one, and in most cases it cannot be made on scientific grounds. The only recommendation we can offer in this respect is a plea for mutual tolerance. Regardless of which approach is chosen, the data characterization has to be consistent with the retrieval method chosen. This paper tries to provide the scientific basis for this.

This paper is mainly addressed to providers of Level-2 data, i.e., data on
atmospheric state variables. Some data users, however, prefer to work
directly with Level-1 data, i.e., with measured radiances or transmissions.
For example, the direct data assimilation of measured signals is sometimes
preferred over the assimilation of retrieved state variables

In some fields of remote sensing of the atmosphere, retrieval methods
based on artificial intelligence, neural networks, and machine learning are being explored

But even with the conventional retrieval and error estimation schemes there is a lot of homework to do. We hope that this review paper has identified the most relevant problems in this field and provides a conceptual framework to adequately characterize remotely sensed atmospheric temperature and composition data.

No data sets were used in this article.

TvC, DAD, and NJL organized the project. TvC, NJL, and RD wrote major parts of the text. All the authors contributed to the discussion of the paper and particularly the recommendations.

TvC, CvS, and GPS are associate editors of AMT but have not been involved in the evaluation of this paper.

This article is part of the special issue “Towards Unified Error Reporting (TUNER)”. It is not associated with a conference.

The World Meteorological Organization (WMO) has provided travel support through the Stratosphere-troposphere Processes And their Role in Climate (SPARC) project, which has selected the TUNER project as a SPARC activity. The International Space Science Institute (ISSI) has funded two International Team meetings in Berne at their venue. Part of this work was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with NASA. The authors would like to thank Andreas Richter for providing information on UV-VIS nadir sounding products. One of the reviewers stood out as particularly knowledgeable, thorough, thoughtful, and constructive. The authors highly appreciate their suggestions. We acknowledge support by the KIT-Publication Fund of the Karlsruhe Institute of Technology.

This research has been supported by the Karlsruher Institut für Technologie (KIT-Publikationsfonds). The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association.

This paper was edited by Helen Worden and reviewed by Clive Rodgers and two anonymous referees.