A new Orbiting Carbon Observatory 2 cloud flagging method and rapid retrieval of marine boundary layer cloud properties

Abstract. The Orbiting Carbon Observatory 2 (OCO-2) carries a hyperspectral A-band sensor that can obtain information about cloud geometric thickness (H). The OCO2CLD-LIDAR-AUX product retrieved H with the aid of collocated CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation) lidar data to identify suitable clouds and provide a priori cloud top pressure (P_top). This collocation is no longer possible, since CALIPSO's coordination flying with OCO-2 has ended, so here we introduce a new cloud flagging and a priori assignment using only OCO-2 data, restricted to ocean footprints where solar zenith angle < 45°. Firstly, a multi-layer perceptron network was trained to identify liquid clouds over the ocean with sufficient optical depth (τ > 1) for a valid retrieval, and agreement with MODIS-CALIPSO (Moderate Resolution Imaging Spectroradiometer) is 90.0 %. Secondly, we developed a lookup table to simultaneously retrieve cloud τ, effective radius (r_e) and P_top from A-band and CO2 band radiances, with the intention that these will act as the a priori state estimate in a future retrieval. Median P_top difference vs. CALIPSO is 12 hPa with an inter-decile range of [−11, 87] hPa, substantially better than the MODIS-CALIPSO range of [−83, 81] hPa. The MODIS-OCO-2 τ difference is 0.8 [−3.8, 6.9], and that for r_e is −0.3 [−2.8, 2.1] µm. The τ difference is due to optically thick and horizontally heterogeneous cloud scenes. As well as an improved passive P_top retrieval, this a priori information will allow for a purely OCO-2-based Bayesian retrieval of cloud droplet number concentration (N_d). Finally, our cloud flagging procedure may also be useful for future partial-column above-cloud CO2 abundance retrievals.

1 Introduction
Hyperspectral O2 A-band measurements near λ = 0.78 µm, such as those taken by the Orbiting Carbon Observatory-2 (OCO-2), may provide unique new information about boundary layer clouds by retrieving their geometric thickness (H) or droplet number concentration (N_d), provided that coincident information about effective radius (r_e) is available from other channels. They are able to do this because the spectrum responds to the photon path length between the Sun, Earth and the sensor. Increased H or decreased N_d with all other cloud properties held constant leads to increased distance between within-cloud scattering events and therefore a longer photon path length and decreased transmittance at wavelengths where O2 absorbs. This leads to spectrally varying changes in observed A-band spectra that can allow for joint retrievals of cloud optical depth (τ), cloud top pressure (P_top) and H, provided there is sufficient spectral resolution and low enough noise (O'Brien and Mitchell, 1992; Richardson and Stephens, 2018).
An OCO-2-based retrieval of τ, P_top and H has been developed (OCO2CLD-LIDAR-AUX, available at http://www.cloudsat.cira.colostate.edu/data-products/level-aux/oco2cld-lidar-aux, last access: 1 September 2020), which uses lidar-based retrievals from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite to help identify cloudy scenes and constrain prior P_top (Richardson et al., 2019). This retrieval is targeted at single-layer liquid clouds over the ocean whose response, both to warming and aerosols, is a major source of uncertainty in climate simulations (e.g. Bony et al., 2005; Bodas-Salcedo et al., 2019; Zelinka et al., 2020). Independent information about cloud structure may help to address timely questions, where other sensors which rely on different retrieval approaches and assumptions can lead to apparently contradictory conclusions (Rosenfeld et al., 2019; Toll et al., 2019).
With CALIPSO leaving the A-Train constellation in 2018, collocation between OCO-2 and CALIPSO footprints is no longer possible. Our future retrievals require a new cloud flagging method plus a priori cloud top information for our iterative Bayesian optimal-estimation (OE) retrieval (Rodgers, 2000). This paper describes a new pre-processor for OCO-2-based liquid cloud property retrievals that provides the requisite cloud flagging and a priori information. Details of OCO2CLD-LIDAR-AUX are summarised in Table 1, which also lists the main changes introduced in this study.
We do not use the published OCO-2 cloud flag, as it was not developed for ocean nadir scenes (Taylor et al., 2016), since they were considered too dark for OCO-2's main mission of column CO2 (XCO2) retrievals (Crisp, 2008; Crisp et al., 2004; Eldering et al., 2016). Therefore we train a multi-layer perceptron network to rapidly identify liquid cloud scenes using collocated CALIPSO and Moderate Resolution Imaging Spectroradiometer (MODIS) retrievals. For the prior cloud property retrieval we develop lookup tables (LUTs) that jointly retrieve τ, r_e and P_top using OCO-2 O2 A-band and strong-CO2-band (λ ∼ 2.06 µm) radiances. These are similar to the Nakajima-King tables used in MODIS cloud retrievals (Nakajima and King, 1990) but add an A-band absorption ratio that is sensitive to P_top.
Our OCO-2 OE retrievals are computationally expensive due to the complex radiative transfer (RT), so we aim to avoid footprints which are unlikely to yield good retrievals. The cloud flagging and prior LUT retrieval developed here are a necessary step in excluding these footprints, and we further exclude those where solar zenith angle (SZA) > 45° based on OCO2CLD-LIDAR-AUX's retrieval statistics. It is possible that a future partial-column (i.e. above-cloud) XCO2 retrieval could be developed, which would likely be targeted at columns above optically thick clouds, so the pre-processor developed here could find wider use (Schepers et al., 2016; Vidot et al., 2009). A further development is that our past retrievals used a fixed r_e, and the addition of varying r_e is eased by a new Python RT interface using the ReFRACtor (Reusable Framework for Retrieval of Atmospheric Composition) software described in Sect. 2.3. Our new LUT retrieval of a prior r_e will allow for a more appropriate r_e to be assumed in the iterative OE.
The paper is organised as follows: Sect. 2 describes the relevant OCO-2 details, data selection and radiative-transfer calculations before detailing the methodology. Section 3 reports the performance statistics of the classifier, compares LUT retrieved cloud properties vs. MODIS and CALIPSO where the instrument footprints overlap, and compares the final pre-processor throughput against that of OCO2CLD-LIDAR-AUX. Section 4 discusses and contextualises the results and proposes actionable future work that could address identified biases, and Sect. 5 concludes.
2 Methods and data

2.1 Instruments and data selection
The OCO-2 measurement approach and instrumentation are detailed in Bösch et al. (2017); the Level 2 Full Physics (L2FP) RT's application to clouds is detailed in Richardson et al. (2017); and the MODIS-CALIPSO-OCO-2 matchup data are as used in Taylor et al. (2016). The datasets used here are listed in Table 2; in particular, from the OCO-2 Level 1b Science (L1bSc) data we obtain calibrated radiances and RT inputs such as solar zenith angle (SZA) and instrument characteristics.
The OCO-2 satellite flies in the Sun-synchronous A-Train constellation (L'Ecuyer and Jiang, 2010) and measures during the daytime ascending node with an Equator crossing time near 13:30. Its orbits are committed primarily to either glint or nadir view, and we use nadir-only orbits to provide complementary vertical information on clouds that are too low or thin to be adequately profiled by CloudSat's nadir-view radar. Glint-view footprints would preclude our use of the nadir-only CALIPSO lidar data, and atmospheric photon path lengths would be longer, thereby reducing the retrieval sensitivity. Given our retrieval's computational expense we limit to nadir orbits to optimise the likelihood of good retrievals.
OCO-2 carries three co-boresighted grating spectrometers centred over the O2 A-band (λ ∼ 0.78 µm), weak CO2 band (λ ∼ 1.68 µm) and strong CO2 band (λ ∼ 2.06 µm). The satellite operates in a push-broom fashion with a swath of eight footprints whose orientation relative to the track rotates through the orbit as the satellite angles to optimise solar power generation. The resulting parallelogram-like footprints are nominally near 1.4 km × 2.2 km at nadir. The channels' wavelengths vary across the track due to the manner in which the optics focus light onto the focal plane array (FPA), and wavelength also drifts throughout an orbit due to Doppler shift. This causes issues for a LUT developed from a fixed set of channels, since the wavelengths sampled by those channels will differ between measurements. Furthermore, some sensor pixels are damaged, and we only include channel indices where all eight swath soundings are classed as good, which reduces the A-band sample from 1016 to 853 channels. Section 2.2 describes how we use a channel-averaging approach to reduce the consequences of this wavelength shift in the cloud classifier, and Sect. 2.4 details our related channel selection for the LUT.
For classifier training and validation, we require spatial overlap between OCO-2, MODIS and CALIPSO data. The ascending OCO-2 ground track is approximately 200 km to the east of Aqua's and therefore within the MODIS swath, so we select the 1 km MODIS retrieval footprint whose centre is closest at the surface to the centre of the OCO-2 footprint. However, CALIPSO only measures once at nadir, so only one OCO-2 footprint in each swath can be collocated. Furthermore, even during formation flying the satellites drifted within their control boxes, and some CALIPSO measurements occurred outside the OCO-2 swath. We only include footprints with a CALIPSO-OCO-2 matchup distance of < 1.5 km at the surface. Finally, the dataset was further restricted to footprints with the surface type of water, SZA < 45° and valid radiances.

Table 1. Summary of methods for determining properties in OCO2CLD-LIDAR-AUX and changes introduced in this study. OCO2CLD-LIDAR-AUX is a full optimal-estimation (OE) retrieval that combined CALIPSO and OCO-2 information to obtain its prior state. This study is intended to provide OCO-2-only prior information for a future OE retrieval.
Property | OCO2CLD-LIDAR-AUX | This study
Cloud flagging | 1. CALIPSO single-layer cloud; 2. CALIPSO P_top > 680 hPa; 3. OCO-2 radiances exceed static thresholds; 4. OCO-2 weak and A-band radiance ratio above fixed threshold for given A-band radiance | Multi-layer perceptron network classification based on OCO-2 radiances and radiance ratios
Cloud effective radius (r_e) | Fixed r_e = 12 µm in retrieval | Estimated from OCO-2 radiances via 3-D lookup

Table 2 (only one row is recoverable).
OCO2CLD-LIDAR-AUX | OCO-2 CALIPSO combined cloud retrieval | Pre-processor throughput, retrieval χ^2 and retrieval cloud properties
Between 6 September 2014 and 30 April 2018 the MODIS-CALIPSO-OCO-2 matchup dataset has 5909 nadir orbits, of which 4743 contain valid matchups. This is reduced to N = 3907 orbits through 31 December 2016 when we also require an OCO2CLD-LIDAR-AUX retrieval.

2.2 Cloud classifier data selection and training
For the first step of rapidly identifying footprints that contain liquid clouds over the ocean we select a machine learning classifier which is trained on a set of collocated MODIS-CALIPSO footprints before being validated against an independent set of MODIS-CALIPSO data. The footprints which pass this classifier will be forwarded to the LUT estimator to generate the a priori cloud property estimate.
We generated independent sets of training (N = 100 000) and validation (N = 250 000) footprints by randomly selecting orbits and taking all their valid footprints until we had those sample sizes. We assign a cloud flag value of 1 to a footprint when the following conditions are all met, else it is 0:
i. CALIPSO Feature_Classification_Flag = 2 (cloud present and liquid)
ii. CALIPSO (retrieves a single layer)
iii. MODIS Cloud_Optical_Thickness > 1 (cloud present and sufficiently optically thick).
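The labelling rule above can be sketched as a small function. This is an illustrative sketch only: the function and argument names are hypothetical, and the single-layer test is assumed to be a layer count equal to 1, since the text does not name the CALIPSO field used.

```python
def liquid_cloud_flag(feature_flag: int, n_layers: int, modis_tau: float) -> int:
    """Return 1 when all three training-label conditions hold, else 0."""
    is_liquid = feature_flag == 2    # condition i, flag value as quoted in the text
    single_layer = n_layers == 1     # condition ii (assumed form: one CALIPSO layer)
    thick_enough = modis_tau > 1.0   # condition iii; a NaN optical depth fails this test
    return int(is_liquid and single_layer and thick_enough)


print(liquid_cloud_flag(2, 1, 5.3))  # single-layer liquid cloud, tau > 1
print(liquid_cloud_flag(2, 2, 5.3))  # two layers, so the flag is 0
```

A missing MODIS retrieval represented as NaN yields a flag of 0 here because any comparison with NaN is false.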
As input we take the continuum radiances (I_c) from all three OCO-2 bands, corrected for illumination geometry via µ_0^-1 I_c, where µ_0 = cos(SZA), plus a number of A-band ratios described below. From Python's "sklearn" package (Pedregosa et al., 2011) we selected a multi-layer perceptron network (sklearn.neural_network.MLPClassifier) with hidden layer sizes of (100, 50, 25). These selections are justified in Sect. S1 in the Supplement.
In these bands the ocean is dark and reflectance increases monotonically with τ, so µ_0^-1 I_c helps to identify optically thick clouds. Ice also absorbs more strongly than water in the longer-wavelength bands, which aids phase discrimination.
We calculate A-band absorption ratios by dividing a non-continuum (i.e. absorbing) channel radiance I_abs,O2 by I_c,O2. Clouds tend to increase I_abs,O2/I_c,O2 values, since photons scattered from the clouds encounter fewer O2 molecules than those that travel all the way to the surface. This principle has been exploited to improve detection of clouds over bright snow and ice surfaces with the A- and B-band channels of the Earth Polychromatic Imaging Camera (EPIC) on board the Deep Space Climate Observatory (DSCOVR) (Zhou et al., 2020). Consider

$$\frac{I_{\mathrm{abs,O_2}}}{I_{c,\mathrm{O_2}}} = \exp\left(-\int k_{\mathrm{O_2}}(z)\,\mathrm{d}z\right), \tag{1}$$

where k_O2(z) is the O2 absorption coefficient, which is integrated over the photon path dz. Begin by considering a δ-function distribution of photon path lengths along the beam that is scattered from a single layer with constant k_O2(z) = k_O2. Then at nadir the path can be decomposed into the path from the top of the atmosphere (TOA) to the layer top, µ_0^-1 Δz, and from the layer top to TOA, Δz, so that

$$\frac{I_{\mathrm{abs,O_2}}}{I_{c,\mathrm{O_2}}} = \exp\left[-k_{\mathrm{O_2}}\left(1 + \mu_0^{-1}\right)\Delta z\right]. \tag{2}$$

Making k_O2 Δz the subject,

$$k_{\mathrm{O_2}}\,\Delta z = -\frac{\ln\left(I_{\mathrm{abs,O_2}}/I_{c,\mathrm{O_2}}\right)}{1 + \mu_0^{-1}}. \tag{3}$$

We take the right-hand side of Eq. (3) as our observable. If we select channel combinations with near-constant k_O2, then the observable is proportional to Δz. Lower values should be associated with high (i.e. more likely ice) clouds, and high values are associated with clear scenes. This assumes similar scattering properties for I_abs,O2 and I_c,O2, which is justified by the A band's small wavelength range. The k_O2 sampled by individual channels varies for three main reasons:
1. The central wavelength of each channel depends on the cross-track position due to the way in which the optics focus light on the FPA.
2. The wavelengths sampled change due to Doppler shift induced by relative Earth-satellite and Earth-Sun motion.
3. The strength of O2 absorption varies due to line broadening induced by atmospheric conditions.
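As a hedged illustration of the A-band observable, the sketch below assumes it takes the Beer-Lambert form −ln(I_abs,O2/I_c,O2)/(1 + µ_0^-1) for a two-way nadir path scattered from a single layer. The function name and the example numbers are hypothetical.

```python
import math


def path_observable(i_abs: float, i_c: float, sza_deg: float) -> float:
    """Return -ln(I_abs/I_c) / (1 + 1/mu0), assumed equal to k_O2 * dz."""
    mu0 = math.cos(math.radians(sza_deg))
    return -math.log(i_abs / i_c) / (1.0 + 1.0 / mu0)


# Forward-simulate a ratio for an assumed k_O2 * dz = 0.3 at SZA = 30 degrees,
# then check that the observable recovers it.
mu0 = math.cos(math.radians(30.0))
ratio = math.exp(-0.3 * (1.0 + 1.0 / mu0))
print(path_observable(ratio, 1.0, 30.0))
```

Under this assumption a darker absorbing channel (smaller ratio) gives a larger observable, which matches the statement that high values are associated with clear scenes.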
We use a method from Richardson et al. (2017) to address these factors. The 853 undamaged channels are ranked from brightest to darkest, and a non-overlapping 10-channel mean is taken, resulting in 85 full "super channels". These are combined with I_c,O2 and µ_0 using Eq. (3), and we selected every 10th super channel from the 35th onwards (Sect. S1 shows little improvement from additional super channels). This is illustrated in Fig. 1, with Fig. 1a showing an example cloudy spectrum and the damaged channels, Fig. 1b showing the ranked super channels and those used in the classifier, and Fig. 1c comparing I_abs,O2/I_c,O2 for the original spectrum (CALIPSO P_top = 827 hPa) and for a spectrum with similar µ_0 and I_c,O2 but with CALIPSO P_top = 403 hPa. The brightest super channels show little response to scattering layer altitude, so they contain little information, and they are excluded from the classifier. The higher-altitude cloud has a brighter I_abs,O2 due to the shorter mean path length. As stated previously, this aids in the phase classification and also in discriminating between cloudy and clear scenes, since very low I_abs,O2/I_c,O2 is more likely associated with photons scattered from the surface.
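The super-channel construction can be sketched in a few lines. This is a minimal sketch with hypothetical function names and synthetic radiances; it assumes the incomplete final group of channels is discarded and that "every 10th from the 35th onwards" means 1-based positions 35, 45, ..., 85.

```python
def super_channels(radiances, width=10):
    """Rank channels from brightest to darkest and average non-overlapping groups."""
    ranked = sorted(radiances, reverse=True)
    n_full = len(ranked) // width  # incomplete trailing group is dropped
    return [sum(ranked[i * width:(i + 1) * width]) / width for i in range(n_full)]


def classifier_subset(supers, first=35, step=10):
    """Pick every `step`-th super channel starting at 1-based position `first`."""
    return [supers[k - 1] for k in range(first, len(supers) + 1, step)]


# Synthetic stand-in for the 853 undamaged A-band channels (a permutation of 0..852).
radiances = [(i * 37) % 853 for i in range(853)]
supers = super_channels(radiances)
print(len(supers), len(classifier_subset(supers)))
```

Because the ranking is redone for each footprint, a given super channel samples a similar absorption strength even when the wavelengths of individual channel indices shift.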

2.3 Radiative-transfer simulations and ReFRACtor interface
The forward RT simulations used to generate the LUTs are performed with the ReFRACtor RT code, which implements [...] (Natraj and Spurr, 2007; Spurr, 2006). This assumes a plane-parallel atmosphere with a correction to the direct beam to account for Earth's sphericity. Angular output is calculated with eight streams for predefined bins in gas optical depth, while single-stream calculations are done for pre-selected wavenumbers at a mean separation of Δν ∼ 0.04 cm^-1, with smaller separation within absorption bands. The high- and low-stream outputs are combined using O'Dell's (2010) low-stream interpolation to rapidly and accurately reproduce high-stream output at all wavenumbers. These are then interpolated onto a uniform Δν = 0.01 cm^-1 grid and convolved with the instrument line shapes (ILSs) to obtain channel radiances. The selected numbers of streams were found to reproduce cloudy-scene radiances given MODIS and CALIPSO cloud properties (Richardson et al., 2017) and also match the selection in Vidot et al. (2009).

Figure 1. [...] (c) Comparison of the ranked ratio I/I_c between this cloud, and one at a higher altitude (P_top = 740 hPa). The SZA is within 0.01°, and the continuum radiance is within 1 % between each spectrum; the differences are largely due to the shorter path length resulting in less absorption for the higher-altitude cloud.

OCO2CLD-LIDAR-AUX used OCO-2 L2FP RT and required the input of L1bSc and meteorology files plus a file containing pressure level and cloud information. Each footprint's output was saved to a file for every OE iteration, adding to a read-write bottleneck. Further inefficiency arose if any footprint in an orbit included a type of scatterer (e.g. water clouds with r_e = 10 µm, which we term wc_010), as its scattering properties had to be assigned for every profile in the orbit.
For example, if one footprint contained a wc_010 cloud, every other footprint in the orbit that did not contain a wc_010 cloud would need an assigned wc_010 profile with extinction = 0.
Here we use the new ReFRACtor, which handles footprints as individual objects. Inputs are assigned uniquely to that object, and it stores the RT output and updated properties internally, so no external reading or writing is required for intermediate OE iterations.
For LUT input we take an ocean footprint near 25° S from the L1bSc file for orbit 16094a on 11 July 2017 for instrument and satellite properties, although we manually vary SZA. We used the mean OCO-2 cloudy profiles for tropical (20° S-20° N) footprints from Richardson and Stephens (2018). The high-latitude case is excluded, as its surface temperature is near 0 °C, so it will mostly represent ice and mixed-phase clouds, and using the midlatitude (20-50° S or N) case had little effect on the retrieval statistics.
The RT code takes input on levels and then linearly interpolates to generate vertically homogeneous layers. We use 16 pressure levels: 3 assigned linearly in P from TOA to 500 hPa, 10 from 500 hPa to P_top and 2 from P_top to cloud bottom (P_bottom), and the final level is the surface. This was found to reliably reproduce OCO-2 L2FP RT standard outputs, which use 20 levels, but with faster processing.
The cloud extinction is assigned to the level at the cloud centre, whose neighbouring levels are at P_top and P_bottom, and the layer interpolation results in a vertically homogeneous cloud with constant τ(z) and r_e(z). Rozanov and Kokhanovsky (2004) showed that a vertically uniform assumption may introduce radiation biases relative to our target marine boundary layer clouds, which tend to be vertically stratified with increasing extinction towards the cloud top (Grosvenor et al., 2018; Painemal and Zuidema, 2011), but quantifying such a bias requires extensive testing that we intend to perform separately. For now, the cloud H is calculated as in Szczodrak et al. (2001), where H ∝ √(τ r_e), and is converted to ΔP by assuming Δz/ΔP ≈ 10 m hPa^-1. Where this would result in P_bottom > P_surf, the cloud is compressed while maintaining the same P_top.
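A minimal sketch of the thickness-to-pressure conversion and the compression rule follows. The function names are hypothetical, and the proportionality constant `coeff` is an assumption left as a required argument because the text gives only the scaling H ∝ √(τ r_e), not its constant.

```python
import math


def thickness_from_tau_re(tau: float, r_e_um: float, coeff: float) -> float:
    """Geometric thickness in metres from H = coeff * sqrt(tau * r_e); coeff is assumed."""
    return coeff * math.sqrt(tau * r_e_um)


def cloud_bottom_pressure(p_top_hpa: float, thickness_m: float,
                          p_surf_hpa: float, m_per_hpa: float = 10.0) -> float:
    """Convert thickness to pressure at 10 m per hPa and cap the base at the surface."""
    p_bottom = p_top_hpa + thickness_m / m_per_hpa
    return min(p_bottom, p_surf_hpa)  # compress the cloud, keeping P_top fixed


print(cloud_bottom_pressure(900.0, 300.0, 1013.0))  # 300 m below a 900 hPa top
print(cloud_bottom_pressure(970.0, 600.0, 1013.0))  # base would pass the surface
```

The second call shows the compressed case: the uncapped base would be 1030 hPa, so the surface pressure is returned instead.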
For surface reflection ReFRACtor does not currently allow for a Cox-Munk surface, so we assume it is Lambertian with albedo that varies by band and SZA. The band- and SZA-dependent values are derived from a set of OCO-2 radiances as described in Sect. S2 and range from 0.010 to 0.054.
Gaseous absorption is from the absorption coefficient (ABSCO) version 5.0 tables used in OCO-2's latest XCO2 retrieval, version 9. These tables account for line changes due to temperature, pressure and water vapour. Cloud properties are pre-calculated using Mie theory at integer micrometre values of r_e following an assumed gamma droplet size distribution with width parameter of γ = 1/9. This follows the standard OCO-2 XCO2 retrieval aerosol input file but with an update to correct an error in water absorption in the CO2 bands (Aronne Merrelli, personal communication, 2019).

2.4 Lookup table development and retrieval
The LUT is designed to produce prior cloud property estimates for our future OE retrieval, which specifically targets marine boundary layer clouds and aims to provide additional information about their H or N_d. We therefore limit the range of the LUT properties to cover the majority of these clouds, with τ from 1 to 50, r_e from 4 to 32 µm, P_top from 650 to 970 hPa and SZA spanning 20-45° inclusive (see Table 3 in the Supplement for selected values). The simulated outputs are I_c,O2 in the O2 A band, I_c,st in the strong CO2 band and an A-band ratio of I_abs,O2/I_c,O2.
We take the mean of 5 channels for each of I_c,O2 and I_c,st and 10 channels for I_abs,O2, and fixed channel indices are required to consistently convert the RT-simulated spectra into LUT radiances. The selected channels minimise the root mean square error (RMSE) across a large sample of footprints against the L1bSc continua (for I_c,O2 and I_c,st) and the 60th super channel for I_abs,O2 (as defined in Sect. 2.2). The 60th super channel is picked, as it showed the greatest sensitivity to CALIPSO P_top in Richardson et al. (2017). The selection algorithm is described in Sect. S4; the error statistics are in Table 4 in the Supplement; and the channel indices are in Table S5. The error statistics show that our selection is valid for a range of meteorological conditions, illumination geometries, Doppler shifts and for all eight cross-track sounding positions.
The LUT channels are highlighted in Fig. 2, which shows mean spectra from a large sample of cloudy footprints. The channel means with 2σ ranges are shown as shaded bands and are compared with the truth as solid lines. The truth for I_c is the mean of the sample L1bSc radiance continua, which represent the brightest channels in each footprint and whose channel indices may change with the footprint, while the I_abs,O2 truth is the spectrum's 60th super channel. The estimators are consistent with the truth in each case, with the best agreement for I_c,O2 and a negative bias in I_c,st. We found that scaling the L1bSc I_c,st value by 0.9804 resulted in similar error statistics to using our selected channels, so we use scaled L1bSc I_c,st in our LUT retrieval, since those radiances are already loaded for the classifier. The individual I_abs,O2 channels show a large spread, but the channel selection algorithm accounts for anti-correlation in their radiances such that the 10-channel mean is consistent with the 60th super channel across all test footprints.
For each SZA, ReFRACtor is run for all combinations of input cloud properties to generate A-band and strong-CO2-band radiances for these selected LUT channels. A LUT is generated at each 5° in SZA from 20 to 45° inclusive, and the retrieval works as follows:
i. From an L1bSc file, load the SZA plus radiances to get I_abs,O2, I_c,O2 and I_c,st.
ii. Apply the classifier to identify appropriate cloudy footprints; pass only these to the next step.
iii. Convert these into the LUT observables I_c,O2, I_c,st and I_abs,O2/I_c,O2.
iv. Scale observables onto the nearest LUT SZA using the appropriate µ-related scaling.
v. Interpolate within the LUT to simultaneously estimate τ, r_e and P_top.
If the observed radiances are outside the LUT values, then a NaN (not a number) is returned, and the footprint is flagged as not retrievable. The footprint is also flagged as likely to contain ice if the L2Met temperature at the retrieved cloud top, T(P_top,retrieved), is < 0 °C. We refer to NaN or T_top < 0 °C outputs as not being passed by the LUT, since these footprints will not be attempted in our future OE retrieval.
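Two parts of this procedure, picking the nearest LUT SZA (step iv) and the final pass/fail gate, can be sketched as below. The function names are hypothetical, and the interpolation itself and the µ-related scaling are not reproduced because the text does not specify their exact form.

```python
import math

# LUT solar zenith angles: every 5 degrees from 20 to 45 inclusive, as stated in the text.
LUT_SZA_DEG = [20, 25, 30, 35, 40, 45]


def nearest_lut_sza(sza_deg: float) -> int:
    """Return the LUT SZA closest to the observed SZA."""
    return min(LUT_SZA_DEG, key=lambda s: abs(s - sza_deg))


def passes_lut(tau: float, r_e_um: float, p_top_hpa: float, t_top_c: float) -> bool:
    """A footprint passes when no retrieved value is NaN and the cloud top is not below 0 C."""
    if any(math.isnan(v) for v in (tau, r_e_um, p_top_hpa)):
        return False  # observed radiances fell outside the LUT
    return not t_top_c < 0.0  # likely-ice footprints are not passed


print(nearest_lut_sza(27.4))
print(passes_lut(8.6, 11.0, 900.0, 12.0))
print(passes_lut(float("nan"), 11.0, 900.0, 12.0))
```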

2.5 Pre-processor prior validation
The pre-processor is run on the 3907 orbits used in OCO2CLD-LIDAR-AUX (N = 1 264 449). The primary analysis is in the pairwise differences between the LUT retrieved properties and MODIS τ or r_e and CALIPSO P_top. The MODIS P_top is also evaluated against that of CALIPSO.

2.6 Comparison with the OCO2CLD-LIDAR-AUX pre-processor
The OCO2CLD-LIDAR-AUX matchups are separated into three sets: those that are not flagged by the classifier, those that are flagged but do not pass the LUT retrieval (due to out-of-range cloud properties or implied T_top < 0 °C), and those that fully pass the pre-processor. Throughput and agreement are compared with the OCO2CLD-LIDAR-AUX cloud flag, retrieval χ^2, retrieved τ and P_top discrepancy vs. CALIPSO. The pre-processor performs well if it successfully passes those footprints with small posterior χ^2 and P_top discrepancy while avoiding those with larger values. The OCO2CLD-LIDAR-AUX cloud flag was based on simple thresholds in µ_0^-1 I_c,O2 and µ_0^-1 I_c,wk combined with a phase discrimination based on their combination, and finally there is a requirement for valid CALIPSO single-layer clouds with P_top > 680 hPa occurring within 10 km. This flag did not have an SZA cutoff at 45°, so we will also specifically consider comparisons between the outputs of the two pre-processors where SZA < 45°.
These are normalised such that TP + FP + FN + TN = 100 %. These can be summarised in a confusion matrix, as is done in Fig. 3a for the N = 250 000 non-training sample. Its trace is the accuracy score of 90.0 %, and the off-diagonal elements represent potential misclassifications. Figure 3b shows that the FNs are largely clouds of lower MODIS τ than those identified by the classifier, with 29.4 % of FNs having MODIS τ < 3, compared with 7.3 % of TPs. Some of these "missed" clouds may be due to collocation error; for example a cloud may average τ > 1 over the 1 km MODIS footprint but not over the larger OCO-2 footprint. The classifier will also have errors: it maximises the accuracy score, and detecting lower τ clouds may require passing darker scenes, which could increase the prevalence of FPs. Figure 3c shows the distribution of CALIPSO T_top where retrieved and shows far more cold-topped clouds in the FP case compared with the TPs, although there is also a T_top < 0 °C peak in the FN case. This suggests that the classifier misidentifies some ice clouds as liquid and also that some of the FNs may in reality be mixed-phase clouds that CALIPSO has nevertheless identified as liquid. For example, 24.6 % of FNs have T_top < −10 °C, compared with 7.6 % of the TP sample.
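The normalisation and accuracy score can be illustrated with a small sketch. The function names and the ten-footprint example are hypothetical; TP, FP, FN and TN are read in the usual sense of true/false positives and negatives, with the MODIS-CALIPSO flag taken as truth.

```python
def confusion_percent(truth, predicted):
    """Return TP, FP, FN and TN as percentages of the sample, summing to 100."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(truth, predicted):
        if p and t:
            counts["TP"] += 1
        elif p and not t:
            counts["FP"] += 1
        elif t:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    n = len(truth)
    return {name: 100.0 * c / n for name, c in counts.items()}


def accuracy_percent(cm):
    """Accuracy is the trace of the 2 x 2 confusion matrix: TP + TN."""
    return cm["TP"] + cm["TN"]


truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
cm = confusion_percent(truth, predicted)
print(cm, accuracy_percent(cm))
```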
Among the false positives, we expected that there would be a larger occurrence of broken or multi-layered clouds, where thick broken clouds were sufficiently bright to trigger detection or where overlying thin ice clouds have too little effect on the radiances to be flagged as ice. We describe a scene as broken when the MODIS partially cloudy retrieval exists (Cloud_Optical_Thickness_PCL > 0) and a scene as multi-layered when CALIPSO retrieves more than one cloud layer, although strictly this can only detect multiple layers when the upper layer does not fully attenuate the lidar. While 11.3 % of the full sample is multi-layered, 40.1 % of the FP cases are, and while 12.2 % of scenes are partially cloudy, 30.4 % of FP footprints are. Overall, 69.4 % of FPs are associated with multi-layer or broken clouds or both.

[...] variance is explained by P_top, with spread largely due to changes in within-cloud scattering. For example, for P_top = 970 hPa, the optically thickest clouds were artificially compressed to prevent them from extending below the surface, thereby reducing the in-cloud path and increasing the maximum I_abs,O2/I_c,O2.

Lookup table matchup performance
The OCO-2 LUT retrievals are compared with those of MODIS and CALIPSO in Fig. 5a-c, and the MODIS and OCO-2 P_top differences relative to CALIPSO are in Fig. 5d. We consider the OCO-2 value minus the other product's value and report median [10th, 90th] percentiles instead of standard deviation, as these distributions are commonly non-Gaussian. There is good correlation between OCO-2 and other products, with a τ difference of 0.77 [−3.77, 6.93], an r_e difference of −0.25 [−2.78, 2.13] µm and a P_top difference of 12 [−11, 87] hPa. As can be seen in Fig. 5d, the LUT P_top retrieval outperforms that of MODIS, whose difference relative to CALIPSO is −17 [−83, 81] hPa; i.e. the OCO-2 inter-decile range is approximately 40 % smaller than that of MODIS.
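The "approximately 40 % smaller" figure follows directly from the quoted percentiles, as the sketch below checks. The helper names are hypothetical, and the use of the "inclusive" quantile method is an assumption, since the text does not say how the percentiles were computed.

```python
import statistics


def median_and_deciles(differences):
    """Return (median, 10th percentile, 90th percentile) of pairwise differences."""
    cuts = statistics.quantiles(differences, n=10, method="inclusive")
    return statistics.median(differences), cuts[0], cuts[-1]


def interdecile_width(p10, p90):
    """Width of the inter-decile range."""
    return p90 - p10


oco2_width = interdecile_width(-11, 87)    # OCO-2 minus CALIPSO P_top, hPa
modis_width = interdecile_width(-83, 81)   # MODIS minus CALIPSO P_top, hPa
reduction_percent = 100.0 * (1.0 - oco2_width / modis_width)
print(oco2_width, modis_width, round(reduction_percent, 1))
```

The widths are 98 hPa and 164 hPa, so the OCO-2 range is about 40 % narrower, consistent with the text.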
We also divide the τ and r_e differences by the MODIS reported uncertainty (σ_τ,MODIS, σ_re,MODIS). If the OCO-2 and MODIS retrievals were independent Gaussian with equal variance, then the standard deviation of OCO-2-MODIS differences would be √2 σ_MODIS ≈ 1.41 σ_MODIS. We find values of 1.26 σ_τ,MODIS and 0.37 σ_re,MODIS, indicating that the r_e retrievals are not independent and that our differences are within the MODIS-reported uncertainties.
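The √2 factor follows from the variance of a difference of two variables. A short derivation is given here, where X and Y are illustrative symbols (not used in the original text) for the OCO-2 and MODIS retrievals of the same quantity, each assumed to have variance σ_MODIS^2:

```latex
\begin{align}
\mathrm{Var}(X - Y) &= \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y) \\
                    &= \sigma_{\mathrm{MODIS}}^2 + \sigma_{\mathrm{MODIS}}^2 - 0
                     = 2\,\sigma_{\mathrm{MODIS}}^2
                     \quad \text{(independence: } \mathrm{Cov}(X, Y) = 0\text{)}, \\
\sigma_{X - Y} &= \sqrt{2}\,\sigma_{\mathrm{MODIS}} \approx 1.41\,\sigma_{\mathrm{MODIS}}.
\end{align}
```

A positive covariance between the two retrievals reduces the spread of their differences below this value, which is why a result well under 1.41 σ_MODIS points to retrievals that are not independent.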
We acknowledge discrepancies in the median retrieved τ and r e and refer to these as biases. The τ bias grows both with OCO-2 retrieved τ and with the horizontal variability of the scene as displayed in Fig. 6. For this figure, the samples were split into deciles according to the LUT retrieved τ or the MODIS sub-pixel index (SPI) at λ = 0.66 µm, which is the standard deviation of the 250 m footprint radiances with a 1 km cloud retrieval divided by the mean of those radiances. Spatial variability and greater optical depths appear to drive much of the τ bias, but we could not identify a dominant factor consistently correlated with the small r e bias. These issues are further discussed in Sect. 4.2.

Pre-processor throughput
The multi-layer perceptron classifier passes 5.5 % of all OCO-2 footprints as τ > 1 liquid clouds. Of all footprints, 0.9 % then return invalid cloud properties from the LUT and a further 0.8 % have implied T top < 0 °C, resulting in a final throughput of 3.8 %. This is smaller than OCO2CLD-LIDAR-AUX, which attempted to retrieve 14.1 % of all soundings. However, most of the difference is due to SZA, and when we restrict the denominator to all footprints with SZA < 45°, the throughputs are 13.1 % for OCO2CLD-LIDAR-AUX and 11.7 % for the new classifier or 8.1 % after the LUT thresholds. Figure 7 displays histograms of selected OCO2CLD-LIDAR-AUX outputs for SZA < 45° retrievals split into footprints where the new pre-processor passes the footprint (blue), where the LUT returns invalid properties or T top < 0 °C (orange), or where the classifier does not pass the footprint (green). The new classifier identifies "better" retrievals that ended with smaller fit errors: median χ 2 = 7.2 × 10 −7 vs. 1.3 × 10 −4 for those not passed (among those with SZA < 45°). The LUT filtering further improves the statistics, with median χ 2 = 9.8 × 10 −7 for those not passed by the LUT retrieval vs. χ 2 = 6.6 × 10 −7 for those successfully retrieved with T top > 0 °C. The perceptron network also tends to pass clouds that are more optically thick (median τ = 8.6 vs. 2.4) and to show a smaller spread in the difference between OCO-2 and CALIPSO P top (standard deviation of differences of σ = 33 hPa vs. 55 hPa).
The OCO2CLD-LIDAR-AUX footprints that are excluded by the new pre-processor are consistent with optically thinner clouds and with poorer quality retrievals. Among the footprints that are passed by the new pre-processor, 17.1 % were not attempted in OCO2CLD-LIDAR-AUX.
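The throughput figures in this section can be checked with simple arithmetic. The sketch below treats the 0.9 % and 0.8 % losses as percentage points of all footprints, an interpretation that the quoted 3.8 % total supports. Variable names are hypothetical and all numbers are taken from the text.

```python
# Percentages of all OCO-2 footprints, as quoted in the text.
classifier_pass = 5.5   # passed by the perceptron classifier
lut_invalid = 0.9       # LUT returns invalid cloud properties
cold_top = 0.8          # implied T_top < 0 degC
final = round(classifier_pass - lut_invalid - cold_top, 1)

# Percentages of footprints with SZA < 45 degrees, as quoted in the text.
classifier_pass_sza = 11.7
final_sza = 8.1

# Fraction of classifier-passed footprints surviving the LUT thresholds,
# computed with each denominator as a consistency check.
kept_all = final / classifier_pass
kept_sza = final_sza / classifier_pass_sza

print(f"final throughput = {final} % of all footprints")
print(f"fraction kept after LUT: {kept_all:.2f} (all), {kept_sza:.2f} (SZA < 45 deg)")
```

Both denominators give a surviving fraction of about 0.69, so the two sets of quoted percentages are mutually consistent.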

Cloud classifier and pre-processor throughput
The cloud classifier's agreement of 90.0 % with MODIS-CALIPSO is similar in performance to the original OCO-2 operational cloud flagging for ocean glint used in the L2FP XCO 2 retrieval (Taylor et al., 2016). Furthermore, the multi-layer perceptron network is lightweight (size < 250 kB) and fast. It passes 11 %-13 % of ocean soundings where SZA < 45°, of which under a quarter are poor retrieval candidates according to MODIS-CALIPSO. These cases are predominantly (∼ 69 %) broken or multi-layered cloud scenes, while the missed MODIS-CALIPSO cloud scenes are commonly optically thinner (∼ 4 times likelier to be τ < 3) and colder (∼ 3 times likelier to have T top < −10 °C) than the hit cloud scenes. These thinner and colder samples are also likely to be poor candidates for our target future retrieval of droplet number concentration in warm-topped clouds.
Applying the LUT retrieval further reduces the number of footprints that are taken to be liquid clouds with τ > 1. The OCO2CLD-LIDAR-AUX retrieval attempted 13.1 % of SZA < 45° footprints; the new classifier-plus-LUT pre-processor passes 8.1 %. Figure 7 shows that the excluded footprints tended to be more optically thin, have a larger discrepancy in retrieved P top relative to CALIPSO and have a higher χ 2 . This suggests that the new pre-processor will pass better retrieval candidates to the OE code, thereby improving efficiency. Of those that are now passed, 17 % were not attempted by OCO2CLD-LIDAR-AUX. These likely include cases of misidentification that will result in poor-quality retrievals but may also include true cloud cases that were not identified in OCO2CLD-LIDAR-AUX. For example, retrievals were previously classified using the nearest CALIPSO footprint up to 10 km away, and if a cloud was in the OCO-2 field of view but not the CALIPSO field of view, it would not previously have been passed. Overall, the new pre-processor shows good performance in terms of identifying scenes which likely contain liquid clouds with sufficient τ .

Lookup table cloud property retrieval
The LUT retrieval shows good correlation with MODIS τ and r e plus CALIPSO P top in Fig. 5. Compared to CALIPSO, the LUT-based P top retrievals have a smaller-magnitude bias and 40 % smaller inter-decile range than MODIS. The 12 hPa P top bias represents OCO-2 retrieved clouds that are lower in the atmosphere than retrieved by CALIPSO. These statistics may include cases of broken clouds, either above a lower cloud or above the surface; 3-D cloud effects or combined scattering from multiple cloud layers could lead to longer mean photon path lengths and thereby a larger OCO-2 P top , assuming that CALIPSO tends to identify the highest layer. We consider full 3-D radiative-transfer treatments to be beyond the scope of this study but point readers to a wide literature on this topic (Davis and Knyazikhin, 2005; Heidinger and Stephens, 2002; Kokhanovsky et al., 2007; Várnai and Marshak, 2002).
Figure 7. Normalised histograms of OCO2CLD-LIDAR-AUX outputs where SZA < 45°, separated into whether the soundings pass the new pre-processor flag and retrieval or not. The "passed both" set are those that returned valid cloud properties from the LUT along with T top > 0 °C; the "passed-classifier" case gave invalid cloud values or had T top < 0 °C; and the "failed-classifier" set are those that were attempted in OCO2CLD-LIDAR-AUX but are not passed by the new classifier. (a) Logarithm of χ 2 ; (b) retrieved τ ; (c) P top minus the closest CALIPSO retrieved P top .
Aerosol is ignored in these simulations, as previous analysis using CALIPSO aerosol products showed no change in OCO2CLD-LIDAR-AUX P top bias in response to CALIPSO-identified aerosol (Richardson et al., 2019). Furthermore, above-cloud scattering aerosol would tend to reduce photon path length and therefore have an opposite effect on P top to our observed bias.
Retrieved P top could also change due to the assumed cloud vertical structure and meteorological profile used in the LUT development. If the cloud vertical structure used in the RT differs from reality, then this could lead to incorrectly simulated within-cloud photon paths. Firstly, if the simulated cloud is too geometrically thin (low H ) for a given τ or r e , then the within-cloud path length will be too small, and the above-cloud path must be lengthened to compensate, resulting in a positive P top bias, and vice versa for too-high simulated H . This study improves on the OCO2CLD-LIDAR-AUX prior by realistically varying H with r e in addition to τ , but a bias may remain. In particular, shallow marine clouds tend to have extinction weighted towards the top, which affects the exiting radiance and may introduce P top biases which vary with geometry and cloud properties. We intend to perform a separate and more detailed analysis of how realistic vertical cloud profiles affect simulated OCO-2 radiances and determine how to account for such a vertical-structure bias.
With regard to meteorology, a warmer and moister profile broadens the O 2 absorption lines, and we expect stronger resultant absorption in the selected I abs, O 2 channels. Our tropical meteorology may lead to too-strong absorption in non-tropical scenes such that the retrieved cloud is lifted (i.e. lower P top ) to compensate, but the observed bias is opposite to this. We also retrieved using a LUT developed with the Richardson and Stephens (2018) midlatitude meteorology, where surface temperature is approximately 10 °C cooler. The retrieved P top distribution shifts as expected, with median P top bias increasing from 12 to 15 hPa.
Overall the OCO-2 LUT gives better P top retrieval statistics than MODIS for these shallow marine clouds, for which MODIS retrievals rely on brightness temperature at λ ∼ 11 µm and so may misassign P top when a temperature inversion is present (Baum et al., 2012). However, OCO-2 has a larger footprint and smaller swath and only retrieves during nadir-view orbits. The P top bias relative to CALIPSO is concerning for a future optimal-estimation retrieval, since biased prior properties may subsequently bias the posterior retrieved state in unpredictable ways. We confidently exclude aerosol and meteorology as the main factors in the observed bias and propose that the main candidate processes are a combination of horizontal variability, OCO-2-CALIPSO collocation error and potentially vertical cloud structure. In the future OE retrieval we would expect horizontally non-uniform clouds to produce spectra that are more difficult to match under our RT assumptions, so such cases may be identified by the posterior χ 2 statistics. For vertical structure biases, we plan a detailed future investigation.
Retrieved τ and r e show good correlation with those from MODIS, and the variance of the differences is smaller than implied by the MODIS-reported uncertainties if the LUT and MODIS uncertainties are independent Gaussian with MODIS' reported variance. However, OCO-2's I c, O 2 instrumental noise is lower than MODIS' (single-channel signal-to-noise ratio, SNR, of 300-1200 vs. the MODIS band 4 specified SNR of 228), so the instrumental uncertainty contribution to the error budget should be smaller for OCO-2. There are also common characteristics between the retrievals, such as the use of fixed droplet size distribution variances, so individual footprint error will covary between the two. Such covariance should further reduce the inter-satellite difference in retrieved τ and r e . A quantitative analysis would require a thorough calculation controlling for individual terms in the error budget; we simply conclude that there is no evidence of substantial unexpected variance in our retrieved τ and r e .
Of greater concern are the residual OCO-2 minus MODIS differences of 0.77 in τ and, to a lesser extent, −0.25 µm in r e . For τ the bias increases both with horizontal inhomogeneity and with τ , and we expect to be able to identify these cloud scenes using retrieved τ and either OCO-2 developed metrics of spatial variability or future retrieval χ 2 .
For the r e bias we briefly assessed several factors. Horizontal variability tends to increase retrieved r e (Werner et al., 2018; Zhang et al., 2012), but we found no evidence of a strong dependence on spatial variability according to MODIS SPI. We also ran the LUT retrieval with a −1 % scaling of I c, st , which changes median r e by +0.2 µm. Such a radiance shift could be necessary due to errors in calibration or in our derived scaling factor of 0.9804, which we used to relate the L1bSc file I c, st to our lookup table channels. We could therefore reduce our r e bias by further scaling the I c, st radiances, but the scaling was derived from directly comparing the channel radiances rather than as a post hoc correction to improve retrieval results. If the r e bias is due to other factors, then this post hoc correction could result in compensating errors which hide other flaws in the retrieval. Instrumentally, the MODIS band 7 used in these r e retrievals begins at λ = 2.105 µm, outside the strong CO 2 band. Changes in CO 2 or, more likely, temperature- and vapour-driven broadening or vapour absorption could affect retrieved r e . When retrieving with the midlatitude profile LUT described above, the median retrieved r e increases by 0.17 µm. Given that the r e discrepancy is small, we make no further efforts to explain or reduce it.

Summary and conclusions
Here we developed a new pre-processor for a future optimal-estimation retrieval using the OCO-2 A-band to provide new estimates of droplet number concentration in marine water clouds. This future retrieval aims to address limitations in the previously published OCO2CLD-LIDAR-AUX product by (1) removing the requirement for collocated CALIPSO data now that the satellites are no longer formation-flying and (2) adding OCO-2 information about r e to extend the analysis to droplet number concentration. The pre-processor must identify footprints that likely contain liquid clouds of sufficient τ and provide prior properties for the future cloud retrieval. It may also be useful for identifying appropriate footprints on which other researchers could conduct partial-column XCO 2 retrievals.
The pre-processor first flags potentially cloudy scenes using a multi-layer perceptron network fed with continuum radiances across all three OCO-2 bands plus a set of absorption band radiances from the O 2 A band. The next stage of the retrieval is to use a 3-D lookup table that jointly retrieves τ , r e and P top using radiances from two bands plus an A-band absorption ratio. Footprints whose radiances are inconsistent with the lookup table or whose implied P top occurs where T < 0 °C can also be excluded from future retrievals. These footprints were associated with worse fit statistics in OCO2CLD-LIDAR-AUX, implying that the new pre-processor will minimise the waste of computational resources on poor-quality retrievals.
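The general idea of a three-parameter lookup table inversion with rejection of inconsistent radiances can be sketched as follows. This is a toy illustration assuming NumPy: `toy_forward` is an invented stand-in for the radiative-transfer simulations, and the grid ranges, the nearest-neighbour match and the tolerance `tol` are all assumptions rather than the study's implementation. The temperature screening step is not included.

```python
import numpy as np

def toy_forward(tau, r_e, p_top):
    """Hypothetical forward model, NOT the study's RT code. Returns three observables:
    A-band continuum, CO2-band continuum and an A-band absorption ratio."""
    refl = tau / (tau + 8.0)                 # continuum brightness rises with tau
    i_c_o2 = refl
    i_c_st = refl * np.exp(-0.03 * r_e)      # larger droplets absorb more in the CO2 band
    ratio = np.exp(-1.5 * p_top / 1000.0)    # lower cloud top gives more O2 absorption
    return np.stack([i_c_o2, i_c_st, ratio], axis=-1)

# Illustrative grid axes (assumed ranges and spacing).
TAU = np.linspace(1.0, 50.0, 50)
R_E = np.linspace(4.0, 30.0, 27)
P_TOP = np.linspace(600.0, 1000.0, 41)
T, R, P = np.meshgrid(TAU, R_E, P_TOP, indexing="ij")
STATES = np.stack([T.ravel(), R.ravel(), P.ravel()], axis=-1)
SIMS = toy_forward(STATES[:, 0], STATES[:, 1], STATES[:, 2])

def lut_retrieve(obs, tol=0.05):
    """Return the grid state whose simulated observables best match obs,
    or None when the best relative misfit exceeds tol (radiances inconsistent with the table)."""
    obs = np.asarray(obs, dtype=float)
    misfit = np.sqrt((((SIMS - obs) / obs) ** 2).sum(axis=1))
    best = int(np.argmin(misfit))
    if misfit[best] > tol:
        return None
    return tuple(STATES[best])

# Round trip on a grid point, then an observation far outside the table.
print(lut_retrieve(toy_forward(10.0, 12.0, 900.0)))
print(lut_retrieve([2.0, 2.0, 2.0]))
```

An operational table would more likely interpolate between grid points than snap to the nearest one; the nearest-neighbour match is used here only to keep the sketch short.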
This pre-processor flag shows excellent agreement with MODIS-CALIPSO, and the lookup table τ and r e compare well with MODIS, while its P top shows better retrieval statistics than MODIS, when taking CALIPSO as the truth. Many of the inter-satellite differences are associated with known factors: false positives from the classifier occur when scenes contain broken or multi-layered clouds, and the τ retrieval bias grows with the horizontal heterogeneity of the scene.
A main concern is that the median OCO-2 retrieved P top is closer to the surface than CALIPSO's by approximately 12 hPa (∼ 120 m). The assumed mean cloud extinction or its profile will affect photon path lengths and so could introduce a bias in retrieved P top , and we propose that a detailed analysis of cloud vertical structure is the next and final step before the development of a new OCO-2 cloud retrieval. If successful, this new retrieval would add independent information on cloud droplet number concentration, allowing for attempts to resolve apparent disagreements about low-cloud processes.
Author contributions. MR contributed to study design, ran the analyses and wrote the paper. JM set up ReFRACtor for cloudy-scene simulations and provided technical support. MDL and GLS contributed to the study design, analysis and drafting of the paper.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. This work was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. Government sponsorship is acknowledged. Mark Richardson gratefully acknowledges Aronne Merrelli for his updated cloud scattering property tables that corrected an error in CO 2 band absorption and removed resultant biases in retrieved r e . Mark Richardson also thanks Annmarie Eldering and Michael Gunson for helpful input related to the OCO-2 mission.
Financial support. This research has been supported by the National Aeronautics and Space Administration, Jet Propulsion Laboratory (grant no. 80NM0018D0004-80NM0018F0631).
Review statement. This paper was edited by Alexander Kokhanovsky and reviewed by two anonymous referees.