Characterization of errors and sensitivity in remotely
sensed observations of greenhouse gases is necessary for their use in
estimating regional-scale fluxes. We analyze 15 orbits of simulated Orbiting Carbon Observatory-2 (OCO-2)
observations with the Atmospheric Carbon Observations from Space (ACOS) retrieval, which
uses an optimal estimation approach, to compare predicted versus actual
errors in the retrieved

The Orbiting Carbon Observatory-2 (OCO-2) was launched in July 2014 and began providing science data in
September 2014, with the goal of estimating

Cressie et al. (2016) estimate the size of the second-order terms in the error analysis; these terms contain derivatives of the averaging kernel, gain matrix, and Jacobians with respect to the state parameters. They estimate that the errors resulting from the second-order analysis are on the order of 0.2 ppm, although this estimate depends on the states and on the sizes of the deviations used to calculate the second-order derivatives. They also found that second-order terms can cause both larger errors and biased results.

This paper explores the errors in the full physics retrieval system using a simulated system with no mismatches between the retrieval and true state vectors and no spectroscopy or instrument errors. The actual error covariance of (retrieved minus true) for this retrieval system is about twice the predicted error. The linear analysis of Connor et al. (2016) does not explain the higher errors in this work, because these simulations contain no unaccounted-for error sources. Cressie et al. (2016) also do not explain the higher actual errors: they estimate the second-order error at about 0.2 ppm, whereas the unaccounted-for error is about 0.8 ppm in this paper. To identify the source of the unaccounted-for error, actual errors are compared to the predicted linear errors for a series of setups.

The ACOS Level 2 (L2) full physics retrieval algorithm used to estimate

The retrieved

Simulation studies can be used to understand and probe retrieval results.
There are many different ways to assess errors, listed here in order of
increasing complexity and nonlinearity.

Linear estimates of errors, which assume moderate linearity of the retrieval system (Connor et al., 2008, 2016); these are useful for surveying the impacts of different error sources under linear assumptions.

Error estimates from nonlinear retrievals of simulated radiances using a fast, simplified radiative transfer model, called the “surrogate model” (Hobbs et al., 2017). That system does not exhibit the discrepancy of larger actual versus predicted errors.

Error estimates from nonlinear retrievals of simulated radiances generated using the operational L2 forward model, called the “simplified true state”, which has the advantage that the true state is within the span of the retrieval vector and the linear estimate should be valid.

Error estimates from nonlinear retrievals of simulated radiances generated using a more complex and accurate radiative transfer model (e.g., including Raman scattering, polarization handling, and effects of surface albedo changes) together with discrepancies between the true and retrieved state vectors (e.g., aerosol type mismatches between the true and retrieval state vectors, albedo shape variations) (e.g., O'Dell et al., 2012).

The ACOS optimal estimation approach is described in O'Dell et al. (2012, 2018) and Crisp et al. (2012). In this section we review the parameters in the retrieval vector and the equations for error estimates. The retrieved parameters for this simulation study are shown in Table 1.

Retrieved parameters in this simulation study.

All non-

The a priori covariance matrix for

The predicted errors, found in the OCO-2 L2 product as
“

The full gain matrix,

The averaging kernel,

The linear estimate describes the response of the retrieval system to instrument errors and incorrect a priori inputs, based on the strengths of the Jacobians (representing the sensitivity of the radiances to the retrieval state) and the constraints (how strongly parameters are pulled toward the a priori inputs). The linear estimate in Eq. (1) is used to estimate the errors, and for simulations, where all the inputs are known, each component of Eq. (1) can be tested.
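For reference, the core quantities of this linear analysis (the posterior error covariance, gain matrix, and averaging kernel of an optimal estimation retrieval) can be sketched numerically. The dimensions and matrices below are purely illustrative placeholders, not the actual ACOS configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_rad = 8, 50                 # hypothetical state / radiance sizes

K = rng.normal(size=(n_rad, n_state))          # Jacobian dF/dx (illustrative)
S_a = np.diag(rng.uniform(0.5, 2.0, n_state))  # a priori covariance
S_e = 0.01 * np.eye(n_rad)                     # measurement noise covariance

# Posterior (retrieval) error covariance: S_hat = (K^T S_e^-1 K + S_a^-1)^-1
S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_e) @ K + np.linalg.inv(S_a))

# Gain matrix: maps radiance perturbations into the retrieved state
G = S_hat @ K.T @ np.linalg.inv(S_e)

# Averaging kernel: sensitivity of the retrieved state to the true state
A = G @ K
```

With these in hand, the linearly predicted error covariance is S_hat, and A describes how the retrieval smooths the true state toward the prior.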

After an inversion is complete, the pressure weighting function

One useful diagnostic is an estimate of how well the modeled radiances match
the observed radiances for each of the three OCO-2 spectral bands:
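A goodness-of-fit diagnostic of this kind is commonly computed as a reduced chi-squared of the noise-normalized spectral residuals for each band; a minimal sketch, with illustrative names, assuming this standard form:

```python
import numpy as np

def reduced_chi2(obs, model, noise_sd, n_fitted):
    """Reduced chi-squared of the spectral fit for one band:
    noise-normalized residuals squared per degree of freedom."""
    resid = (obs - model) / noise_sd   # residuals in units of noise std. dev.
    dof = obs.size - n_fitted          # degrees of freedom
    return float(np.sum(resid ** 2) / dof)
```

Values near 1 indicate the modeled radiances fit the observations at the noise level; significantly larger values indicate a poor fit.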

In reality, Eq. (1) would contain many additional error terms that are not considered in these simulations, e.g., spectroscopy, instrument characteristics, and aerosol mismatch errors (i.e., retrieving the wrong aerosol type). These are discussed in detail as linear error estimates in Connor et al. (2016). The results reported here address only errors in the full nonlinear retrieval system for the actual retrieved variables; they do not include errors from physics missing from the forward model or from other sources (such as spectroscopy error). In the analysis presented in Sect. 3, each of the diagnostics given in Eqs. (1) through (5) is used to examine the error estimates on the simulations and is compared to previously published results on real OCO-2 data.

The simulated dataset analyzed in this study consists of a set of realistic retrievals using the ACOS b3.4 version of the retrieval algorithm, a slightly modified version of the algorithm described in detail in O'Dell et al. (2012) (which discussed b2.9) and more fully in O'Dell et al. (2018). Table 2 shows the most important changes to the L2 retrieval algorithm between b2.9 and b3.4.

Updates in the simulated retrieval system since O'Dell et al. (2012).

Although newer versions of the OCO-2 L2 algorithm exist (currently b8 at the time of writing), the work presented here began prior to the launch of OCO-2 in July 2014. In addition, certain tests, in which the L2 true state is directly related to the retrieval vector, were simplified by using the older version of the retrieval algorithm, which contains a less complicated aerosol scheme. In the older L2 algorithm versions (pre-b3.5), also used in this work, the state vector for all soundings always included the same four aerosol types: cloud water, cloud ice, Kahn 1 (a mixture of coarse- and fine-mode dust aerosols), and Kahn 2 (carbonaceous-mode aerosols) (described further in Nelson et al., 2016). Both Kahn 1 and 2 types contain some sulfate and sea salt aerosols as well. Newer versions of the OCO-2 L2 retrieval include a more complicated scheme in which each sounding includes water and ice, and the algorithm picks the two most likely aerosol types based on a MERRA monthly climatology for the particular sounding location. The aerosol fits use a Gaussian-shaped vertical profile for each of the four types, as described in O'Dell et al. (2018).

Inputs to the b3.4 L2 retrieval algorithm include simulated L1b radiances and meteorology (taken from ECMWF) that were generated using the CSU/CIRA simulator (O'Brien et al., 2009). The simulator is driven by satellite two-line elements, which provide the satellite time and position. The code calculates the relevant solar and viewing geometry and polarization and takes surface properties from MODIS. Only a single day's worth of orbits (15 orbits on 17 June 2012) at reduced temporal sampling (1 Hz instead of the operational 3 Hz) and with only one footprint per frame (instead of the operational eight) is presented in this work. This yields approximately 2700 soundings per orbit, totaling about 40 000 soundings. Unlike real OCO-2 viewing modes (see Crisp et al., 2017), the simulations were generated with nadir viewing over land and glint viewing over water. Our simulations do not include nadir-water, glint-land, or target mode observations, which are additional observation modes used in real OCO-2 data (Crisp et al., 2017). The spectral error for these simulations assumes Gaussian random noise, following the OCO-2 noise parameterization described in Rosenberg et al. (2017).

Although the simulations do include realistic clouds and aerosols from a CALIPSO/CALIOP (Winker et al., 2010) monthly climatology, the radiative transfer portion of the simulator code allows clouds and aerosols to be switched off, making it easy to generate the clear-sky radiances used in this research. The OCO-2 instrument model, described in detail in O'Brien et al. (2009), was used to add realistic instrument noise to the noiseless simulated radiances prior to running the L2 retrieval. The operational OCO-2 dispersion, instrument line shape (ILS), and polarization sensitivity were used to sample the top-of-atmosphere radiances. The same solar model as used in the operational retrieval was used in the L1b simulations. In addition, the A-band preprocessor code described in Taylor et al. (2016) was run on the cloudy-sky L1b simulations to provide realistic cloud screening prior to running the L2 retrieval. It is important to test the system from end to end with radiances containing a variety of cloud conditions, both because the cloud screening is never 100 % accurate, sometimes letting through cloudy cases, and because quality flags can sometimes mark cloudy cases as good quality.

This error analysis would ideally use the exact same forward model in both the L1b simulations and the L2 retrieval algorithm, as our analysis assumes that Eq. (1) should be valid without errors from forward model differences. In reality, however, these two code bases are very similar but not identical. For example, the number of vertical levels in the two code bases differs. Reasonable attempts were made to put the L1b simulations on the same footing as the L2 forward model, but minor model mismatches may remain. We do not believe these minor differences affect our primary results.

Our goal in this work is to compare linearly predicted vs. actual errors in

Configurations used in this work.

Results from different configurations are intercompared to validate the individual measurement, smoothing, and retrieval errors. These predicted errors are compared to the true errors resulting from nonlinear retrievals, which are the retrieved minus true values.

Similar to retrievals from real observations, the simulated retrieval
results need screening to remove cloudy scenes (e.g., see O'Brien et al., 2016;
Polonsky et al., 2014). Because prescreening is not perfect, the

Number of cases for each configuration. The clouds in
true

Table 4 shows the effects of applying postprocessing quality screening for the different configurations from Table 3. The results are separated into land and ocean scenes; approximately one-third of cloudy cases pass postprocessing quality screening, while about 80 % of cloud-free cases pass. For configuration (c) in Table 3 (simulations that include clouds), 11 % and 28 % of cases pass prescreening for ocean and land, respectively. Postprocessing screening identifies 25 % and 43 % of cases as good quality for ocean and land, respectively. These throughputs are low compared to OCO-3 simulation studies (Eldering et al., 2019), in which 25 %–30 % of cases passed prescreening and 50 %–70 % of cases passed postscreening. Some of the quality flags used for the OCO-3 studies (particularly the preprocessing flags) are not available in our study, so it is hard to compare throughput directly. The lower throughput suggests that the cloud cases or other aspects of this study were more challenging than in the OCO-3 simulation studies.

Table 5 shows

Mean bias and standard deviation between retrieved and true, sorted by standard deviation. The bold entries are the nominal cases most closely simulating actual OCO-2 retrievals.

In the screened data, the main concern is the

Figure 1 shows a scatter plot of the retrieved versus true

Scatter plots of

For real OCO-2 v8 data, comparisons to TCCON for single-observation land
nadir and ocean glint have errors of 1.0 and 0.8 ppm, respectively (Kulawik
et al., 2019a), meaning that the real errors are comparable to those of these
simulated data. Real OCO-2 data have a systematic error on the order of 0.5–0.6 ppm (Wunch et al., 2017; Kulawik et al., 2019a). Correlated biased errors are
seen in real OCO-2 data, with correlations in time, e.g.,

In this section the different error components introduced in Sect. 2.1 are isolated as much as possible in order to evaluate each one separately. The averaging kernel and Jacobians, introduced in Sect. 2.1, are used as diagnostics. In addition, the linearity, or lack thereof, of the system is explored.

To test the system linearity, the linear estimate, using Eq. (1) as discussed
in Sect. 2.1, is compared to the nonlinear retrieval result. The inputs to
Eq. (1) include the instrument noise (if on), the a priori covariance, and the
Jacobians at the final retrieved state. Table 6 shows the results for cloudy,
noise-free cases passing postprocessing quality screening
(Table 3, case d) using (1) the first two terms on the right side of Eq. (1)
(i.e., only the

Difference of linear estimate versus nonlinear retrieval, noise-free, cloud, quality-screened cases. SD denotes standard deviation.

Another test of the system linearity is the consistency of the sensitivity
of the system to changes in

To calculate an error resulting from varying Jacobians requires calculating
second-order terms, like

Figure 3 shows the Jacobian magnitude (the

The right panel of Fig. 3 compares the Jacobian magnitude between matched
results from configuration (c) and (d) in Table 3 for land cases with
postprocessing screening. The

To validate the measurement error, results from runs with and without noise
(cases (c) and (b) from Table 3) are analyzed. The standard deviation of the

Histogram of difference between

Figure 4 shows the baseline and predicted measurement error. For land nadir, the average actual error is 0.35 ppm and the average predicted error is 0.29 ppm. For ocean glint, the average actual error is 0.14 ppm and the average predicted error is 0.21 ppm. The bias difference between the runs with and without noise was 0.01 ppm for ocean glint and 0.03 ppm for land nadir.

The predicted error ranged from 0.14 to 0.70 ppm for land and 0.12 to 0.35 ppm for ocean. The correlation between the predicted error and the absolute value of the error is 0.27 for land and 0.08 for ocean, so the scene-to-scene variations in the predicted error are not very useful.
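The diagnostic used here, the correlation between the predicted error and the absolute value of the actual error, can be sketched as follows (the function name and values are illustrative):

```python
import numpy as np

def pred_error_skill(pred_err, actual_err):
    """Pearson correlation between predicted error and |actual error|.
    Values near zero mean the scene-to-scene predicted error carries
    little information about the actual error magnitude."""
    return float(np.corrcoef(pred_err, np.abs(actual_err))[0, 1])
```

Applied to the land and ocean subsets, this statistic gives the 0.27 and 0.08 values quoted above.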

Adjacent observations are averaged, and then the error of this averaged quantity is calculated. If the error reduces with the square root of the number of observations averaged, then the error is a random, not correlated, error. A random error is highly desirable for assimilation and other uses. For land nadir the error is shown in Table 7.
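The averaging test described above can be sketched as follows, with synthetic random errors standing in for the actual retrieval errors (the 0.35 ppm standard deviation is only illustrative):

```python
import numpy as np

def error_vs_averaging(errors, n_avg):
    """Standard deviation of means over n_avg adjacent soundings."""
    usable = (errors.size // n_avg) * n_avg
    means = errors[:usable].reshape(-1, n_avg).mean(axis=1)
    return float(means.std())

rng = np.random.default_rng(1)
e = rng.normal(0.0, 0.35, 20000)   # purely random 0.35 ppm errors (synthetic)

# For a random error, the std. dev. should fall roughly as 1/sqrt(n_avg);
# a slower fall-off indicates a correlated error component.
sd1, sd16 = error_vs_averaging(e, 1), error_vs_averaging(e, 16)
```

Comparing the observed fall-off against the 1/sqrt(N) curve is exactly the comparison tabulated in Table 7.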

Error versus averaging for measurement error.

If the error is random, then the

In summary, for these simulated cases, the measurement error is underpredicted for land by 0.06 ppm and overpredicted for ocean by 0.07 ppm, but the measurement error appears to average randomly and does not introduce a bias.

Smoothing error occurs when the averaging kernel deviates from the identity
matrix, and it is calculated using the averaging kernel, the true state, and the prior state. The smoothing error terms from Eq. (1) are

The predicted smoothing error is 0.12 ppm for ocean glint and 0.16 ppm for
land nadir. Comparison between retrieved
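Under the standard optimal estimation formalism, the profile smoothing error is (A - I)(x_true - x_a), column-averaged with the pressure weighting vector; a minimal sketch with purely hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20                                            # hypothetical profile levels
A = np.eye(n) - 0.1 * rng.uniform(size=(n, n))    # averaging kernel (illustrative)
x_true = 400.0 + rng.normal(0.0, 2.0, n)          # "true" CO2 profile (ppm)
x_a = np.full(n, 400.0)                           # prior profile (ppm)
h = np.full(n, 1.0 / n)                           # pressure weighting vector

# Profile smoothing error: (A - I)(x_true - x_a), column-averaged with h
smoothing_profile = (A - np.eye(n)) @ (x_true - x_a)
smoothing_xco2 = float(h @ smoothing_profile)
```

An averaging kernel equal to the identity matrix gives zero smoothing error, consistent with the definition above.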

Comparison of the retrieved

Previous studies by Merrelli et al. (2015) and O'Brien et al. (2016) have
found that clouds and aerosols can contribute errors larger than predicted.
We look at the relationship between errors in retrieved interferents versus
errors in

The error in

Predicted (red line) and true error (red dots) for two
interferents: band 2 albedo slope

We look at (retrieved minus true

Predicted and true errors.

Figure 6 shows the predicted versus true errors, including correlations. The
true error is calculated from

Predicted and actual errors for interferents and correlations
between interferents and

Another useful diagnostic of interferent error is the predicted error
correlation between each interferent and

The true effect of interferent error on

Postprocessing analysis of real ACOS OCO-2 retrieval results has uncovered
linear relationships between

Do the bias correction for dP and

What is the effect of bias correction on

Figure 7 shows errors in

Error in retrieved

These results naturally lead to the following question: what is the effect of placing

The quantity dP is the difference between retrieved and prior surface pressure and is used as a postprocessing bias correction for OCO-2. In this section, we explore results from dP in the simulated dataset to try to understand why bias correction based on this parameter is useful.
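A linear dP bias correction of the general kind used in OCO-2 postprocessing can be sketched as follows; the slope and the numbers are purely illustrative, not the operational coefficients:

```python
import numpy as np

def bias_correct_xco2(xco2_raw, dp, slope_ppm_per_hpa):
    """Subtract a linear function of dP (retrieved minus prior surface
    pressure) from the retrieved XCO2; the slope here is illustrative."""
    return xco2_raw - slope_ppm_per_hpa * dp

xco2 = np.array([400.5, 399.8, 401.2])   # retrieved XCO2 (ppm), illustrative
dp = np.array([-1.0, 0.0, 2.0])          # dP in hPa, illustrative
corrected = bias_correct_xco2(xco2, dp, 0.3)
```

The slope is fit empirically against truth proxies; the question explored in this section is why such a fit improves the retrievals.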

Although it is typically assumed that the surface pressure is determined
solely from the O2 A band, the strong and weak

Figure 8 shows the actual error covariances and biases for three different
subsets of run (d): dP <

Normalized actual error covariances and biases of retrieved
parameters for dP <

Next we looked at the possibility of screening incorrect surface pressure
results using

The albedo errors and correlations (purple box) particularly stand out, showing
correlations with many retrieved parameters. The albedo terms are, in order,
O2 A-band mean, O2 A-band slope, weak mean, weak slope, strong mean, and strong slope. Based
on the O2 A-band mean albedo and the surface pressure Jacobians, a change in
retrieved surface pressure of

Errors in the retrieved

Error in the lower partial column (LMT), upper partial column (

The retrieval system must match the mean photon path length for the O2 A-band
channel using retrieved parameters like surface pressure, albedo, water,
temperature, aerosol pressure heights, and aerosol optical depths. Also note
that the

As seen in Fig. 8b, for dP <

To trace the interferent errors to an error for

Figure 8d–f shows the bias patterns for these different groups. Comparing Fig. 8d, e, and f reveals patterns that could be used for screening: e.g., a low bias in Kahn 1 aerosol optical depth, low biases in all albedo means, and high biases in all albedo slopes indicate a negative surface pressure error, whereas a high bias in Kahn 1 aerosol pressure and width and a high bias in the strong-band albedo slope indicate a positive surface pressure error. In real retrievals, a high albedo bias could not be distinguished from a high true albedo; however, the pattern of biases observed in Fig. 8 (e.g., albedo higher than expected, aerosol OD larger than expected, and surface pressure lower than expected) could be used to identify low-quality retrievals.

Predicted and actual errors and biases in raw and bias-corrected
simulated data run with configuration (b) from Table 3. Similar to
operational retrievals, bias-corrected

It is interesting to note that the system appears able to use albedo and
aerosols to compensate for low surface pressure biases, and still pass
postprocessing quality screening, for biases down to

Another important question is the following: how does bias correction within the

The errors for

Similar to findings with real data, the

As seen in Sect. 3.1, nonlinearities from interferents affect both
partial columns similarly. This would result in a positive error correlation
(since the correlation is strongly negative and results in a less negative
correlation than predicted) and explain the larger actual versus predicted

The 15 orbits of simulated retrievals result in

Comparing runs with and without measurement noise added to the radiances
showed that the predicted measurement error is accurate. Comparing the
retrieved results to the linear estimate using only the

The linear estimate does not predict the nonlinear retrievals to better
than 0.9 ppm (much worse when quality flags are not used), indicating this
level of nonlinearity in the retrieval system. The interferent error is
underpredicted by a factor of 4, based on the relationship of

Two bias correction terms are explored:

The theoretical basis for dP bias correction is complicated because so many other retrieval parameter errors are correlated with errors in dP. This makes sense from a fundamental radiative transfer perspective, which ties the surface and scattering properties to the amount of atmospheric column for any particular sounding. The retrieval system appears to use albedo and aerosols to compensate for errors in dP. In these simulated results the dP bias correction has a slope similar to that seen in real OCO-2 data for land but not for ocean. The results with dP errors had marginally higher radiance residuals, but not high enough to screen easily.

Similar to the findings in Kulawik et al. (2017), the

These results suggest a few possible strategies: (a) isolating the primary
interferent parameters via preretrievals of aerosols with surface pressure,

In summary, the simulated retrievals share many of the same attributes as real data, with the advantage of knowledge of the true state and the ability to see what is going on under the hood. These simulation studies suggest that attention should be given to nonlinearity, because the ability to estimate errors and make incremental improvements depends on the accuracy of the linear estimate, which is only about 0.9 ppm in these simulation studies.

Data are available at

SSK set the direction of the research, did the analysis, made figures, and was the primary manuscript writer. RRN and TET generated simulated OCO-2 true states, radiances, and retrievals. CO'D guided the work of the simulation system and advised on the analysis. All authors participated in the manuscript writing and editing.

The authors declare that they have no conflict of interest.

Plots were made using Matplotlib (Hunter, 2007).

This research has been supported by NASA ROSES (grant no. NMO710771/NNN13D771T, “Assessing OCO-2 Predicted Sensitivity and Errors”).

This paper was edited by Justus Notholt and reviewed by Sourish Basu and one anonymous referee.