A new OCO-2 cloud flagging and rapid retrieval of marine boundary layer cloud properties

The Orbiting Carbon Observatory-2 (OCO-2) carries a hyperspectral A-band sensor that can obtain information about cloud geometric thickness (H). The OCO2CLD-LIDAR-AUX product retrieved H with the aid of collocated CALIPSO lidar data to identify suitable clouds and provide a priori cloud-top pressure (Ptop). This collocation is no longer possible since CALIPSO's coordinated flight with OCO-2 has ended, so here we introduce a new cloud flagging and a priori assignment using only OCO-2 data, restricted to ocean footprints where solar zenith angle < 45°. Firstly, a multi-layer perceptron network was trained to identify liquid clouds over ocean with sufficient optical depth (τ > 1) for a valid retrieval, and its agreement with MODIS-CALIPSO is 90.0 %. Secondly, we developed a lookup table to simultaneously retrieve cloud τ, effective radius (re) and Ptop from A-band and CO2 band radiances, with the intention that these will act as the a priori in a future retrieval. The median Ptop difference versus CALIPSO is 12 hPa with interdecile range [-11, 87] hPa, substantially narrower than the MODIS-CALIPSO range of [-83, 81] hPa. The median MODIS-OCO-2 τ difference is 0.8 [-3.8, 6.9] and the re difference is -0.3 [-2.8, 2.1] µm. The τ difference is due to optically thick and horizontally heterogeneous cloud scenes. As well as an improved passive Ptop retrieval, this a priori information will allow a purely OCO-2 based Bayesian retrieval

wavelengths where O2 absorbs. This leads to spectrally varying changes in observed A-band spectra that can allow joint retrievals of cloud optical depth (τ), cloud top pressure (Ptop) and H, provided there is sufficient spectral resolution and low enough noise (O'Brien and Mitchell, 1992; Richardson and Stephens, 2018). Equivalently, with knowledge of a relevant effective radius (re), Nd could in principle be retrieved along with τ and Ptop.
The basic principle of A-band absorption for cloud height is well established (Fischer and Grassl, 1991; Rozanov and Kokhanovsky, 2004; Yamamoto and Wark, 1961) and numerous spaceborne A-band instruments retrieve cloud properties (Koelemeijer et al., 2001; Kokhanovsky et al., 2005; Lindstrot et al., 2006; Loyola et al., 2018; Preusker et al., 2007; Vanbauce et al., 1998), but most lack the spectral resolution or noise characteristics to obtain H (e.g. Schuessler et al. (2014)). Others rely on multi-angle (Ferlay et al., 2010) or combined A- and B-band information (Yang et al., 2013), although these tend to contain little information on low-altitude and relatively thin clouds like marine stratocumulus (Davis et al., 2018; Merlin et al., 2016).

An OCO-2 based retrieval of τ, Ptop and H has been developed (OCO2CLD-LIDAR-AUX, available at www.cloudsat.cira.colostate.edu/data-products/level-aux/oco2cld-lidar-aux), which uses lidar-based retrievals from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite to help identify cloudy scenes and constrain the prior Ptop (Richardson et al., 2019). This retrieval is targeted at single-layer liquid clouds over the ocean, whose response to both warming and aerosols is a major source of uncertainty in climate simulations (e.g. Bony et al. (2005); Bodas-Salcedo et al. (2019); Zelinka et al. (2020)). Independent information about cloud structure may help to address timely questions where retrievals from other sensors, which rely on different approaches and assumptions, can lead to apparently contradictory conclusions (Rosenfeld et al., 2019; Toll et al., 2019).
With CALIPSO leaving the A-Train constellation in 2018, collocation between OCO-2 and CALIPSO footprints is no longer possible. Our future retrievals require a new cloud flagging method plus a priori cloud-top information for our iterative Bayesian optimal estimation (OE) retrieval (Rodgers, 2000). This paper describes a new pre-processor for OCO-2 based liquid cloud property retrievals that provides the requisite cloud flagging and a priori information. Details of OCO2CLD-LIDAR-AUX are summarised in Table 1, which also lists the main changes introduced in this study.
We do not use the published OCO-2 cloud flag as it was not developed for ocean nadir scenes (Taylor et al., 2016), since these were considered too dark for OCO-2's main mission of column CO2 (XCO2) retrievals (Crisp, 2008; Crisp et al., 2004; Eldering et al., 2016). Instead, we train a multi-layer perceptron network to rapidly identify liquid cloud scenes using collocated CALIPSO and Moderate Resolution Imaging Spectroradiometer (MODIS) retrievals. For the prior cloud property retrieval we develop lookup tables (LUTs) that jointly retrieve τ, re and Ptop using OCO-2 O2 A-band and strong CO2 band (λ ~ 2.06 µm) radiances. These are similar to the Nakajima-King tables used in MODIS cloud retrievals (Nakajima and King, 1990) but add an A-band absorption ratio that is sensitive to Ptop.

Our OCO-2 OE retrievals are computationally expensive due to the complex radiative transfer (RT), so we aim to avoid footprints that are unlikely to yield good retrievals. The cloud flagging and prior LUT retrieval developed here are a necessary step in excluding these footprints, and, based on OCO2CLD-LIDAR-AUX's retrieval statistics, we further exclude footprints with solar zenith angle (SZA) > 45°. A future partial-column (i.e. above-cloud) XCO2 retrieval could also be developed, which would likely be targeted at columns above optically thick clouds, so the pre-processor developed here could find wider use (Schepers et al., 2016; Vidot et al., 2009). A further development is that our past retrievals used a fixed re, and the addition of varying re is eased by a new Python RT interface using the ReFRACtor (Reusable Framework for the Retrieval of Atmospheric Composition) software described in Section 2.3. Our new LUT retrieval of a prior re will allow a more appropriate re to be assumed in the iterative OE.
The paper is organised as follows: Section 2 describes the relevant OCO-2 details, data selection and radiative transfer calculations before detailing the methodology. Section 3 reports the performance statistics of the classifier, compares LUT-retrieved cloud properties with MODIS and CALIPSO where the instrument footprints overlap, and compares the final pre-processor throughput with that of OCO2CLD-LIDAR-AUX. Section 4 discusses and contextualises the results and proposes actionable future work that could address the identified biases, and Section 5 concludes.

Instruments and data selection
The OCO-2 measurement approach and instrumentation are detailed in Bösch et al. (2017), the application of the L2FP RT to clouds in Richardson et al. (2017), and the MODIS-CALIPSO-OCO-2 matchup data are as used in Taylor et al. (2016). The datasets used here are listed in Table 2; in particular, from the OCO-2 Level 1b Science (L1bSc) data we obtain calibrated radiances and RT inputs such as solar zenith angle (SZA) and instrument characteristics.

The OCO-2 satellite flies in the Sun-synchronous A-Train constellation (L'Ecuyer and Jiang, 2010) and measures during the daytime ascending node with an equator crossing time near 1:30 pm. Its orbits are committed primarily to either glint or nadir view, and we use nadir-only orbits. It carries three co-boresighted grating spectrometers centred on the O2 A-band (λ ~ 0.78 µm), weak CO2 band (λ ~ 1.68 µm) and strong CO2 band (λ ~ 2.06 µm).
The satellite operates in a pushbroom fashion with a swath of 8 footprints whose orientation relative to the track rotates through the orbit as the satellite angles itself to optimise solar power generation. The resulting parallelogram-like footprints are nominally near 1.4 km × 2.2 km at nadir. The channels' wavelengths vary across the track due to the manner in which the optics focus light onto the focal plane array (FPA), and wavelength also drifts throughout an orbit due to Doppler shift. This causes issues for a LUT developed from a fixed set of channels, since the wavelengths sampled by those channels will differ between measurements. Furthermore, some sensor pixels are damaged, and we only include channel indices where all 8 swath soundings are classed as good, which reduces the A-band sample from 1,016 to 853 channels. Section 2.2 describes how we use a channel averaging approach to reduce the consequences of this wavelength shift in the cloud classifier, and Section 2.4 describes our related channel selection for the LUT.
For classifier training and validation, we require spatial overlap between OCO-2, MODIS and CALIPSO data. The ascending OCO-2 ground track is approximately 200 km to the east of Aqua's and therefore within the MODIS swath, so we select the 1 km MODIS retrieval footprint whose centre is closest at the surface to the centre of the OCO-2 footprint.
However, CALIPSO only samples a single nadir track, so only one OCO-2 footprint in each swath can be collocated.
Furthermore, even during formation flying the satellites drifted within their control boxes, and some CALIPSO measurements occurred outside the OCO-2 swath. We only include footprints with a CALIPSO-OCO-2 matchup distance of < 1.5 km at the surface. Finally, the dataset was further restricted to footprints with surface type = water, SZA < 45° and valid radiances. Between 2014-09-06 and 2018-04-30 the MODIS-CALIPSO-OCO-2 matchup dataset has 5,909 nadir orbits, of which 4,743 contain valid matchups. This is reduced to N = 3,907 orbits through 2016-12-31 when we also require an OCO2CLD-LIDAR-AUX retrieval.

Cloud Classifier Data Selection and Training
For the first step of rapidly identifying footprints that contain liquid clouds over the ocean we select a machine learning classifier, which is trained on a set of collocated MODIS-CALIPSO footprints and then validated against an independent set of MODIS-CALIPSO data. The footprints that pass this classifier are forwarded to the LUT estimator to generate the a priori cloud property estimate.
We generated independent sets of training (N = 100,000) and validation (N = 250,000) footprints by randomly selecting orbits and taking all their valid footprints until we had those sample sizes. We assign a cloud flag value of 1 to a footprint when all of the following conditions are met, else it is 0:
(i) CALIPSO Feature_Classification_Flag = 2 (cloud present and is liquid)
(ii) CALIPSO retrieves a single layer
(iii) MODIS Cloud_Optical_Thickness > 1 (cloud present and sufficiently optically thick)
As input we take the continuum radiances (Ic) from all 3 OCO-2 bands and correct for illumination geometry via µ0⁻¹Ic, where µ0 = cos(SZA), plus a number of A-band ratios described below. From Python's scikit-learn (sklearn) package we selected a multi-layer perceptron network (sklearn.neural_network.MLPClassifier) with hidden layer sizes of (100, 50, 25). These selections are justified in Supplementary §1.
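The labelling rule and network configuration above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `truth_flag` and `build_classifier` are hypothetical, the CALIPSO flag is assumed to be already reduced to the value quoted in the text, and every MLPClassifier setting other than `hidden_layer_sizes` is left at its scikit-learn default. The usage lines train on random stand-in data only to exercise the code path.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def truth_flag(calipso_flag, calipso_n_layers, modis_cot):
    """MODIS-CALIPSO 'truth' label: 1 if all three conditions hold, else 0.

    calipso_flag     -- CALIPSO feature classification, assumed already decoded
                        so that 2 means "liquid cloud" as in the text
    calipso_n_layers -- number of cloud layers CALIPSO retrieves
    modis_cot        -- MODIS Cloud_Optical_Thickness (NaN where missing)
    """
    flag = np.asarray(calipso_flag)
    n_layers = np.asarray(calipso_n_layers)
    cot = np.asarray(modis_cot, dtype=float)
    # Comparisons with NaN are False, so missing MODIS retrievals give label 0
    label = (flag == 2) & (n_layers == 1) & (cot > 1.0)
    return label.astype(int)


def build_classifier(seed=0):
    """Perceptron network with the hidden-layer sizes quoted in the text.

    All other hyperparameters are scikit-learn defaults (an assumption).
    """
    return MLPClassifier(hidden_layer_sizes=(100, 50, 25), random_state=seed)


# Usage with synthetic stand-in data (real inputs would be OCO-2 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = truth_flag(np.full(200, 2), np.ones(200), np.where(X[:, 0] > 0, 5.0, 0.5))
clf = build_classifier().fit(X, y)
```

Encoding the label as a pure function of the collocated products keeps the training and validation sets consistent, since both are generated by the same rule.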
In these bands the ocean is dark and reflectance increases monotonically with τ, so µ0⁻¹Ic helps to identify optically thick clouds. Ice also absorbs more strongly than liquid water in the longer-wavelength bands, which aids phase discrimination.

We calculate A-band absorption ratios by dividing a non-continuum (i.e. absorbing) channel radiance Iabs,O2 by Ic,O2. Clouds tend to increase Iabs,O2/Ic,O2 since photons scattered from the clouds encounter fewer O2 molecules than those that travel all the way to the surface. This principle has been exploited to improve detection of clouds over bright snow and ice surfaces with the A- and B-band channels of the Earth Polychromatic Imaging Camera (EPIC) on board the Deep Space Climate Observatory (DSCOVR) (Zhou et al., 2020). Consider:

$$\frac{I_{abs,O_2}}{I_{c,O_2}} = \exp\left(-\int k_{O_2}(z)\,dz\right) \tag{1}$$

where kO2(z) is the O2 absorption coefficient, which is integrated over the photon path ∫dz. Considering a δ-function distribution of photon path lengths along the beam that is scattered from a single layer with constant kO2(z) = kO2, then at nadir the path can be decomposed into the path from TOA to the layer top, Δz/µ0, and from the layer top to TOA, Δz:

$$\frac{I_{abs,O_2}}{I_{c,O_2}} = \exp\left[-k_{O_2}\,\Delta z\left(\frac{1}{\mu_0} + 1\right)\right] \tag{2}$$

$$k_{O_2}\,\Delta z = -\frac{\mu_0}{1+\mu_0}\ln\left(\frac{I_{abs,O_2}}{I_{c,O_2}}\right) \tag{3}$$

We take the right-hand side of Eq. (3)

The kO2 sampled by individual channels varies for three main reasons:
(i) the central wavelength of each channel depends on the cross-track position due to the way in which the optics focus light on the FPA,
(ii) the wavelengths sampled change due to Doppler shift induced by relative Earth-satellite and Earth-Sun motion,
(iii) the strength of O2 absorption varies due to line broadening induced by atmospheric conditions.
We use a method from Richardson et al. (2017) to address these factors. The 853 undamaged channels are ranked from brightest to darkest and non-overlapping 10-channel means are taken, resulting in 85 full "super channels". These are combined with Ic,O2 and µ0 using Equation (3). … super channels show little response to scattering layer altitude, so they contain little information and are excluded from the classifier. The higher-altitude cloud has brighter Iabs,O2 due to the shorter mean path length. As stated previously, this aids in phase classification, and also in discriminating between cloudy and clear scenes, since very low Iabs,O2/Ic,O2 is more likely associated with photons scattered from the surface.
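A minimal numpy sketch of the super-channel averaging, and of one plausible way to turn each super channel into a path-length feature by evaluating the observable expression -µ0/(1+µ0)·ln(Iabs,O2/Ic,O2) as an estimate of kO2Δz. The function names are hypothetical, channels left over after forming complete groups of 10 are assumed to be discarded (853 channels give 85 full groups), and which super channels the classifier excludes is not reproduced here.

```python
import numpy as np


def super_channels(a_band_radiances, group_size=10):
    """Rank channels from brightest to darkest and average non-overlapping groups.

    Channels left over after the last complete group are dropped, so 853
    undamaged channels give 853 // 10 = 85 super channels.
    """
    ranked = np.sort(np.asarray(a_band_radiances, dtype=float))[::-1]
    n_full = ranked.size // group_size
    return ranked[: n_full * group_size].reshape(n_full, group_size).mean(axis=1)


def path_feature(i_abs, i_c, mu0):
    """Observable estimate of k_O2 * dz for a single scattering layer at nadir:
    -mu0 / (1 + mu0) * ln(I_abs / I_c)."""
    return -mu0 / (1.0 + mu0) * np.log(np.asarray(i_abs, dtype=float) / i_c)


def classifier_features(ic_o2, ic_wk, ic_st, mu0, i_super):
    """Geometry-corrected continua (Ic / mu0) for the three bands plus path features."""
    continua = np.array([ic_o2, ic_wk, ic_st]) / mu0
    return np.concatenate([continua, path_feature(i_super, ic_o2, mu0)])
```

Ranking before averaging means each super channel groups channels of similar absorption strength regardless of where the Doppler shift or cross-track position has moved their wavelengths, which is the point of the method.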

Radiative Transfer Simulations and ReFRACtor Interface
The forward RT simulations used to generate the LUT are performed with the ReFRACtor RT code, which is based on LIDORT with a polarisation correction for low orders of scattering (Natraj and Spurr, 2007; Spurr, 2006). This code makes a semi-infinite plane-parallel assumption, with a correction to the direct beam to account for Earth's sphericity. Angular output is calculated for a handful of wavelengths with 8 streams, with the rest of the spectrum interpolated using a single stream following the method of O'Dell (2010), which reliably reproduces the higher-stream output. This number of streams was found to reproduce cloudy scene radiances given MODIS and CALIPSO cloud properties (Richardson et al., 2017) and also matches the selection in Vidot et al. (2009).
https://doi.org/10.5194/amt-2020-140 Preprint. Discussion started: 18 May 2020 c Author(s) 2020. CC BY 4.0 License.

OCO2CLD-LIDAR-AUX used the OCO-2 L2FP RT and required input L1bSc and meteorology files plus a file containing pressure level and cloud information. Each footprint's output was saved to a file for every OE iteration, adding to a read-write bottleneck. Further inefficiency arose because, if any footprint in an orbit included a type of scatterer (e.g. water clouds with re = 10 µm, which we term wc_010), its scattering properties had to be assigned for every profile in the orbit. For example, if one footprint contained a wc_010 cloud, every other footprint in the orbit that did not contain a wc_010 cloud would need an assigned wc_010 profile with extinction = 0.
Here we use a new ReFRACtor-based interface, which handles footprints as individual objects. Inputs are assigned uniquely to each object, which stores the RT output and updated properties internally, so no external reading or writing is required for intermediate OE iterations.
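The design difference can be sketched as follows: each footprint object carries only the scatterer types it actually contains and keeps every iteration's output in memory. This is a hypothetical illustration of the idea, not the ReFRACtor API: the `Footprint` class, its fields and the `forward` callable are invented for the example, and `toy_forward` merely stands in for a real RT call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Footprint:
    """Hypothetical per-footprint container for an iterative retrieval."""

    sounding_id: int
    # Only the scatterers present in this footprint, e.g. {"wc_010": 8.0};
    # no zero-extinction placeholders for types that occur elsewhere in the orbit
    scatterers: Dict[str, float]
    state: Dict[str, float]
    outputs: List[float] = field(default_factory=list)

    def iterate(self, forward: Callable[["Footprint"], float]) -> float:
        """Run one forward-model call and keep the result in memory, not on disk."""
        result = forward(self)
        self.outputs.append(result)
        return result

    def update_state(self, **new_values: float) -> None:
        """Store updated retrieval properties on the object itself."""
        self.state.update(new_values)


def toy_forward(fp: Footprint) -> float:
    # Stand-in for an RT call: total optical depth of the scatterers present
    return sum(fp.scatterers.values())
```

Keeping state on the object removes both bottlenecks described above: there is no per-iteration file I/O, and no footprint needs profiles for scatterers it does not contain.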
For LUT input we take an ocean footprint near 25 °S from the L1bSc file for orbit 16094a on 2017-07-11 for instrument and satellite properties, although we manually vary SZA. We used the mean OCO-2 cloudy meteorological profiles for tropical (20 °S-20 °N) footprints from Richardson and Stephens (2018). The high-latitude case is excluded as its surface temperature is near 0 °C, so its clouds will mostly be ice or mixed-phase, and using the midlatitude (20-50 °S/N) case had little effect on the retrieval statistics.
The RT code takes input on levels and then linearly interpolates to generate vertically homogeneous layers. We use 16 pressure levels: 3 assigned linearly in P from TOA to 500 hPa, then 10 from 500 hPa to Ptop, 2 from Ptop to cloud bottom (Pbottom), and the final level at the surface. This was found to reliably reproduce OCO-2 L2FP RT standard outputs, which use 20 levels, but with faster processing.
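One way to build the 16-level grid is sketched below. The text does not say which end points each segment includes, so this sketch assumes that the TOA segment stops short of 500 hPa, that the 500 hPa-to-Ptop segment includes both ends, and that the two cloud levels are the cloud centre and Pbottom (consistent with the centre-level extinction described for the cloud layer). TOA is taken as 0 hPa and the function name is hypothetical.

```python
import numpy as np


def pressure_levels(p_top, p_bottom, p_surf, p_toa=0.0):
    """Return 16 pressure levels (hPa), ordered from TOA to the surface.

    Assumes 500 < p_top < p_bottom < p_surf.
    """
    above = np.linspace(p_toa, 500.0, 3, endpoint=False)  # 3 levels, TOA to just above 500 hPa
    mid = np.linspace(500.0, p_top, 10)                    # 10 levels, 500 hPa to Ptop inclusive
    p_centre = 0.5 * (p_top + p_bottom)                    # cloud-centre level carries extinction
    cloud = np.array([p_centre, p_bottom])                 # 2 levels below Ptop
    return np.concatenate([above, mid, cloud, [p_surf]])   # final level is the surface
```

Concentrating 10 of the 16 levels between 500 hPa and the cloud top resolves the above-cloud O2 path, which is what the A-band ratio is sensitive to.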
The cloud extinction is assigned to the level at the cloud centre, whose neighbouring levels are at Ptop and Pbottom, and the layer interpolation results in a vertically homogeneous cloud with constant τ(z) and re(z). Rozanov and Kokhanovsky (2004) showed that a vertically uniform assumption may introduce radiation biases, and our target marine boundary layer clouds tend to be vertically stratified with increasing extinction towards cloud top (Bennartz, 2007; Grosvenor et al., 2018; Painemal and Zuidema, 2011), but quantifying such a bias requires extensive testing that we intend to perform separately. For now, the cloud H is calculated as in Szczodrak et al. (2001), where H ∝ √(τ re), and is converted to ΔP by assuming Δz/ΔP ≈ 10 m hPa⁻¹. Where this would result in Pbottom > Psurf, the cloud is compressed while maintaining the same Ptop.
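The thickness assignment and compression rule can be sketched as below, assuming the Szczodrak et al. (2001) relation takes the adiabatic form H = c·√(τ re). The proportionality constant `c_h` is a hypothetical, caller-supplied parameter since its value is not given here; the 10 m hPa⁻¹ conversion and the rule of keeping Ptop fixed while clipping Pbottom at the surface follow the text.

```python
import numpy as np

DZ_PER_DP = 10.0  # m per hPa, the approximate conversion quoted in the text


def cloud_bottom_pressure(p_top, tau, re_um, p_surf, c_h):
    """Cloud-bottom pressure (hPa) for a cloud with top at p_top.

    c_h is a hypothetical constant (m per sqrt(micron)) in H = c_h * sqrt(tau * re).
    If the cloud would extend below the surface it is compressed: Ptop is kept
    and Pbottom is set to the surface pressure.
    """
    h_m = c_h * np.sqrt(tau * re_um)  # geometric thickness in metres
    dp = h_m / DZ_PER_DP              # thickness in hPa
    return min(p_top + dp, p_surf)
```

For example, with an illustrative c_h = 30, a cloud with τ = 10 and re = 10 µm is 300 m (30 hPa) thick, so a 900 hPa top gives a 930 hPa base.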
For surface reflection, ReFRACtor does not currently allow for a Cox-Munk surface, so we assume a Lambertian surface with albedo that varies by band and SZA. The band- and SZA-dependent values are derived from a set of OCO-2 radiances as described in Supplementary §2 and range from 0.010 to 0.054.
Gaseous absorption is from the absorption coefficient (ABSCO) version 5.0 tables used in OCO-2's latest XCO2 retrieval, version 9. These tables account for line changes due to temperature, pressure and water vapour. Cloud optical properties are pre-calculated using Mie theory at integer micron values of re, following an assumed Gamma droplet size distribution with width parameter g = 1/9. This follows the standard OCO-2 XCO2 retrieval aerosol input file, but with an update to correct an error in water absorption in the CO2 bands [Aronne Merrelli, pers. comm.].
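As a check on what the size-distribution assumption implies, the sketch below evaluates a two-parameter gamma distribution in the common form n(r) ∝ r^((1-3b)/b) exp(-r/(a b)), assuming that the width parameter g = 1/9 plays the role of the effective variance b and that a = re. With b = 1/9 the exponent is 6, and the numerical moments recover the input effective radius and variance. This illustrates the functional form only; it is not the OCO-2 aerosol input file.

```python
import numpy as np


def gamma_dsd(r, r_eff, v_eff):
    """Unnormalised gamma droplet size distribution n(r) with effective radius
    r_eff and effective variance v_eff (assumed parameterisation)."""
    return r ** ((1.0 - 3.0 * v_eff) / v_eff) * np.exp(-r / (r_eff * v_eff))


def effective_moments(r, n):
    """Effective radius and effective variance from a size distribution on a uniform grid."""
    w = r**2 * n                      # geometric cross-section weighting
    r_e = np.sum(r * w) / np.sum(w)
    v_e = np.sum((r - r_e) ** 2 * w) / (r_e**2 * np.sum(w))
    return r_e, v_e


r = np.linspace(0.0, 200.0, 400001)  # radius grid in microns (uniform spacing cancels)
n = gamma_dsd(r, r_eff=10.0, v_eff=1.0 / 9.0)
r_e, v_e = effective_moments(r, n)
```

Because the width is fixed, each integer-micron re maps to exactly one distribution, so the Mie properties can be tabulated once per re.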

Lookup table development and retrieval
The LUT is designed to produce prior cloud property estimates for our future OE retrieval, which specifically targets marine boundary layer clouds and aims to provide additional information about their H or Nd. We therefore limit the range of the LUT properties to cover the majority of these clouds, with τ from 1-50, re from 4-32 µm and Ptop from 650-970 hPa, while SZA spans 20-45° inclusive (see Supplementary Table 3 for selected values). The simulated outputs are Ic,O2 in the O2 A-band, Ic,st in the strong CO2 band, and an A-band ratio Iabs,O2/Ic,O2.

We take the mean of 5 channels for each of Ic,O2 and Ic,st and of 10 channels for Iabs,O2, and fixed channel indices are required to consistently convert the RT-simulated spectra into LUT radiances. The selected channels minimise the root mean squared error (RMSE) across a large sample of footprints against the L1bSc continua (for Ic,O2 and Ic,st) and against the 60th super channel for Iabs,O2 (as defined in Section 2.2). The 60th super channel is picked as it showed the greatest sensitivity to CALIPSO Ptop in Richardson et al. (2017). The selection algorithm is described in Supplementary §4, the error statistics are in Supplementary Table 4 and the channel indices in Supplementary Table 5. The error statistics show that our selection is valid for a range of meteorological conditions, illumination geometries, Doppler shifts and for all 8 cross-track sounding positions.
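The supplementary selection algorithm is not reproduced here. As a swapped-in stand-in, the sketch below uses a plain greedy forward selection that adds, one at a time, the channel whose inclusion gives the lowest RMSE between the running channel mean and a target such as the 60th super channel across many footprints. Because candidates are scored through their mean, anti-correlated channel errors can cancel, which is the property the text relies on. The function name and the greedy strategy are assumptions.

```python
import numpy as np


def greedy_channel_selection(radiances, target, k):
    """Pick k channel indices whose mean best matches `target` in RMSE.

    radiances -- array (n_footprints, n_channels) of radiances
    target    -- array (n_footprints,), e.g. a super channel or continuum value
    Returns the selected indices (in selection order) and the final RMSE.
    """
    rad = np.asarray(radiances, dtype=float)
    tgt = np.asarray(target, dtype=float)
    selected, running_sum = [], np.zeros_like(tgt)
    best_rmse = np.inf
    for step in range(1, k + 1):
        best_j, best_rmse = None, np.inf
        for j in range(rad.shape[1]):
            if j in selected:
                continue
            candidate_mean = (running_sum + rad[:, j]) / step
            rmse = np.sqrt(np.mean((candidate_mean - tgt) ** 2))
            if rmse < best_rmse:  # ties keep the lower channel index
                best_j, best_rmse = j, rmse
        selected.append(best_j)
        running_sum += rad[:, best_j]
    return selected, best_rmse
```

In practice k would be 5 for each continuum and 10 for Iabs,O2, with the columns drawn from footprints spanning the range of meteorology, geometry, Doppler shift and sounding position.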
The LUT channels are highlighted in Figure 2, which shows mean spectra from a large sample of cloudy footprints. The channel means with 2σ ranges are shown as shaded bands and are compared with the truth as solid lines. The truth for Ic is the mean of the sample L1bSc radiance continua, which represent the brightest channels in each footprint and whose channel indices may change with footprint, while the Iabs,O2 truth is the spectrum's 60th super channel. The estimators are consistent with the truth in each case, with the best agreement for Ic,O2 and a negative bias in Ic,st. We found that scaling the L1bSc Ic,st value by 0.9804 resulted in similar error statistics to using our selected channels, so we use the scaled L1bSc Ic,st in our LUT retrieval, since those radiances are already loaded for the classifier. The individual Iabs,O2 channels show a large spread, but the channel selection algorithm accounts for anti-correlation in their radiances such that the 10-channel mean is consistent with the 60th super channel across all test footprints.
For each SZA, ReFRACtor is run for all combinations of input cloud properties to generate A-band and strong CO2 band radiances for these selected LUT channels. A LUT is generated at each 5° in SZA from 20-45° inclusive, and the retrieval works as follows: If the observed radiances are outside the LUT values then a NaN is returned and the footprint is flagged as not retrievable.
The footprint is also flagged as likely to contain ice if L2Met T(Ptop,retrieved) < 0 °C. We refer to NaN or Ttop < 0 °C outputs as not being passed by the LUT, since these footprints will not be attempted in our future OE retrieval.
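The intermediate steps of the LUT inversion are not given above, so the sketch below shows one simple possibility under stated assumptions. A LUT is built at each 5° SZA node by looping a hypothetical `forward` callable over all property combinations (the τ, re and Ptop node values are placeholders within the quoted ranges, since the selected values are listed only in the supplement), and inversion uses nearest-neighbour matching in range-normalised radiance space. "Outside the LUT values" is interpreted as falling outside the per-observable minimum and maximum of the table, and `temp_at_p` is a hypothetical stand-in for interpolating the L2Met temperature profile. The actual retrieval may interpolate between nodes rather than snapping to the nearest one.

```python
import itertools
import numpy as np

SZA_NODES = np.arange(20, 46, 5)          # 20, 25, ..., 45 degrees inclusive
TAU_NODES = [1, 2, 4, 8, 16, 32, 50]      # placeholder nodes within 1-50
RE_NODES = [4, 8, 12, 16, 20, 24, 32]     # placeholder nodes within 4-32 microns
PTOP_NODES = np.arange(650, 971, 40)      # placeholder nodes within 650-970 hPa


def build_luts(forward):
    """Map each SZA node to (states, radiances) over all property combinations.

    forward(sza, tau, re, ptop) must return the three LUT observables
    (Ic_O2, Ic_st, Iabs_O2 / Ic_O2)."""
    states = np.array(list(itertools.product(TAU_NODES, RE_NODES, PTOP_NODES)), dtype=float)
    return {int(sza): (states, np.array([forward(sza, *s) for s in states]))
            for sza in SZA_NODES}


def lut_retrieve(obs, states, radiances, temp_at_p):
    """Return ((tau, re, ptop), status) for one footprint's three observables."""
    obs = np.asarray(obs, dtype=float)
    lo, hi = radiances.min(axis=0), radiances.max(axis=0)
    if np.any(obs < lo) or np.any(obs > hi):
        return np.full(3, np.nan), "not_retrievable"   # outside the LUT: NaN
    dist2 = np.sum(((radiances - obs) / (hi - lo)) ** 2, axis=1)
    state = states[np.argmin(dist2)]                   # nearest LUT node
    if temp_at_p(state[2]) < 0.0:
        return state, "likely_ice"                     # Ttop < 0 degC: not passed
    return state, "pass"
```

Normalising each observable by its LUT range stops the brighter continuum radiances from dominating the distance relative to the A-band ratio.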

Pre-processor prior validation
The pre-processor is run on the 3,907 orbits used in OCO2CLD

Comparison with OCO2CLD-LIDAR-AUX pre-processor
The OCO2CLD-LIDAR-AUX matchups are separated into three sets: those that are not flagged by the classifier, those that are flagged but do not pass the LUT retrieval (due to out-of-range cloud properties or implied Ttop < 0 °C), and those that fully pass the pre-processor. Throughput and agreement are compared using the OCO2CLD-LIDAR-AUX cloud flag, retrieval χ², retrieved τ and the Ptop discrepancy versus CALIPSO. The pre-processor performs well if it successfully passes footprints with small posterior χ² and Ptop discrepancy while avoiding those with larger values.

The OCO2CLD-LIDAR-AUX cloud flag was based on simple thresholds in µ0⁻¹Ic,O2 and µ0⁻¹Ic,wk, combined with a phase discrimination based on their combination, and finally a requirement for valid CALIPSO single-layer clouds with Ptop > 680 hPa occurring within 10 km. This flag did not have a SZA cutoff at 45°, so we also specifically consider comparisons between the outputs of the two pre-processors where SZA < 45°.

Cloud classifier test statistics
As in Section 2.2, the classifier output is 1 when we expect a single-layer liquid cloud with τ > 1 and 0 otherwise, and the validation data, which we also term "truth", is the MODIS-CALIPSO classification. We use the following terms:
TP (true positive): classifier output 1 and truth 1,
FP (false positive): classifier output 1 and truth 0,
FN (false negative): classifier output 0 and truth 1,
TN (true negative): classifier output 0 and truth 0,
which are normalised such that TP + FP + FN + TN = 100 %. These can be summarised in a confusion matrix, as is done in Figure 3(a) for the N = 250,000 non-training sample. Its trace is the accuracy score of 90.0 % and the off-diagonal elements represent potential misclassifications. Figure 3(b) shows that the FNs are largely clouds of lower MODIS τ than those identified by the classifier, with 29.4 % of FNs having MODIS τ < 3, compared with 7.3 % of TPs. Some of these "missed" clouds may be due to collocation error; for example, a cloud may average τ > 1 over the 1 km MODIS footprint, but not over the larger OCO-2 footprint. The classifier will also have errors: it maximises the accuracy score, and detecting lower-τ clouds may require passing darker scenes, which could increase the prevalence of FPs. compared with the TPs, although there is also a Ttop < 0 °C peak in the FN case. This suggests that the classifier misidentifies some ice clouds as liquid, and also that some of the FNs may in reality be mixed-phase clouds that CALIPSO has nevertheless identified as liquid. For example, 24.6 % of FNs have Ttop < -10 °C, compared with 7.6 % of the TP sample.
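The normalised confusion matrix and accuracy score used here can be computed directly from the two binary arrays. The helper below is a hypothetical sketch; scikit-learn's `confusion_matrix` with `normalize="all"` gives the same entries as fractions rather than percentages.

```python
import numpy as np


def confusion_percent(predicted, truth):
    """Confusion-matrix entries as percentages of all footprints, plus accuracy."""
    p = np.asarray(predicted).astype(bool)
    t = np.asarray(truth).astype(bool)
    n = p.size
    tp = 100.0 * np.sum(p & t) / n
    fp = 100.0 * np.sum(p & ~t) / n
    fn = 100.0 * np.sum(~p & t) / n
    tn = 100.0 * np.sum(~p & ~t) / n
    # Accuracy is the trace of the normalised matrix: TP + TN
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "accuracy": tp + tn}
```

For example, predictions [1, 1, 0, 0, 1] against truth [1, 0, 1, 0, 1] give TP = 40 %, FP = FN = TN = 20 % and an accuracy of 60 %.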
Among the false positives, we expected a larger occurrence of broken or multi-layered clouds, where thick broken clouds were sufficiently bright to trigger detection or where overlying thin ice clouds have too little effect on the radiances to be flagged as ice. We describe a scene as broken when the MODIS partially cloudy retrieval exists (Cloud_Optical_Thickness_PCL > 0) and as multi-layered when CALIPSO retrieves more than 1 cloud layer, although strictly this can only detect multiple layers when the upper layer does not fully attenuate the lidar. While 11.3 % of the full sample is multi-layered, 40.1 % of the FP cases are, and while 12.2 % of scenes are partially cloudy, 30.4 % of FP footprints are. Overall, 69.4 % of FPs are associated with multi-layer or broken clouds, or both.

The OCO-2 LUT retrievals are compared with those of MODIS and CALIPSO in Figure 5(a-c), and the MODIS and OCO-2 Ptop differences relative to CALIPSO are in Figure 5(d). We consider the OCO-2 value minus the other product's value, and report the median [10th, 90th percentiles] instead of the standard deviation, as these distributions are commonly non-Gaussian.
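The summary statistic quoted throughout (median with 10th and 90th percentiles of the OCO-2-minus-other difference) is straightforward to compute. In this sketch the function name is hypothetical, and dropping pairs with a missing retrieval is an assumption about how missing values are handled.

```python
import numpy as np


def difference_summary(oco2, other):
    """Median and [10th, 90th] percentiles of OCO-2 minus another product.

    Non-finite pairs (e.g. NaN retrievals) are dropped first."""
    d = np.asarray(oco2, dtype=float) - np.asarray(other, dtype=float)
    d = d[np.isfinite(d)]
    p10, p50, p90 = np.percentile(d, [10, 50, 90])
    return p50, (p10, p90)
```

Unlike a standard deviation, the interdecile range is not inflated by a small number of extreme mismatches, which suits the skewed Ptop differences.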
We also divide the τ and re differences by the MODIS-reported uncertainty (στ,MODIS, σre,MODIS). If the OCO-2 and MODIS retrievals were independent and Gaussian with equal variance, then the standard deviation of OCO-2-MODIS differences would be √2 ≈ 1.41σMODIS. We find values of 1.26στ,MODIS and 0.37σre,MODIS, indicating that the re retrievals are not independent and that our differences are within the MODIS-reported uncertainties.
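The √2 benchmark can be checked with a short Monte Carlo experiment: if two retrievals of the same true value carry independent Gaussian errors with the same per-footprint σ, the standard deviation of their difference divided by σ approaches √2. The true-value range, the 10 % error model and the sample size below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
truth = rng.uniform(2.0, 30.0, n)          # arbitrary "true" optical depths
sigma = 0.1 * truth                         # illustrative per-footprint uncertainty
retrieval_a = truth + rng.normal(0.0, sigma)
retrieval_b = truth + rng.normal(0.0, sigma)

# Differences normalised by the reported uncertainty, as in the text
normalised_spread = np.std((retrieval_a - retrieval_b) / sigma)
print(f"{normalised_spread:.3f} (expected about {np.sqrt(2):.3f})")
```

A normalised spread well below √2, such as the 0.37 found for re, therefore cannot arise from two independent retrievals with the reported error, consistent with the reading that the errors are not independent.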
We acknowledge discrepancies in the median retrieved τ and re, and refer to these as biases. The τ bias grows both with OCO-2 retrieved τ and with the horizontal variability of the scene, as displayed in Figure 6. For this figure, the samples were split into deciles according to the LUT-retrieved τ or the MODIS sub-pixel heterogeneity index at λ = 0.66 µm, which is the standard deviation of the 250 m footprint radiances within a 1 km cloud retrieval, divided by the mean of those radiances. Spatial variability and greater optical depths appear to drive much of the τ bias, but we could not identify a dominant factor consistently correlated with the small re bias. These issues are further discussed in Section 4.2.
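The heterogeneity index and decile binning behind Figure 6 can be sketched as follows. The function names are hypothetical, the population standard deviation (numpy's default) is an assumption about how the index is defined, and bin edges are taken from the percentiles of the binning variable itself.

```python
import numpy as np


def heterogeneity_index(radiances_250m):
    """Std / mean of the 250 m radiances inside one 1 km retrieval (population std assumed)."""
    r = np.asarray(radiances_250m, dtype=float)
    return np.std(r) / np.mean(r)


def median_by_decile(x, diff):
    """Median of `diff` within each decile bin of `x` (10 bins)."""
    x = np.asarray(x, dtype=float)
    diff = np.asarray(diff, dtype=float)
    edges = np.percentile(x, np.linspace(0, 100, 11))
    # Bin index 0..9; the maximum value falls into the last bin
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, 9)
    return np.array([np.median(diff[idx == k]) for k in range(10)])
```

Binning by deciles gives equal sample sizes per bin, so a trend in the median τ difference across bins is not an artefact of a few sparsely populated bins.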

Pre-processor throughput
The multi-layer perceptron classifier passes 5.5 % of all OCO-2 footprints as τ > 1 liquid clouds. Of all footprints, a further 0.9 % are passed by the classifier but return invalid cloud properties from the LUT, and 0.8 % have implied Ttop < 0 °C, resulting in a final throughput of 3.8 %. This is smaller than OCO2CLD-LIDAR-AUX, which attempted to retrieve 14.1 % of all soundings. However, most of the difference is due to SZA, and when we restrict the denominator to all footprints with SZA < 45° the throughputs are 13.1 % for OCO2CLD-LIDAR-AUX and 11.7 % for the new classifier, or 8.1 % after the LUT thresholds.

Figure 7 displays histograms of selected OCO2CLD-LIDAR-AUX outputs for SZA < 45° retrievals, split into footprints where the new pre-processor passes the footprint (blue), where the LUT returns invalid properties or Ttop < 0 °C (orange), or where the classifier does not pass the footprint (green). The new classifier identifies "better" retrievals that ended with smaller fit errors: median χ² = 7.2×10⁻⁷, versus 1.3×10⁻⁴ for those not passed (among those with SZA < 45°). The LUT filtering further improves the statistics, with median χ² = 9.8×10⁻⁷ for those not passed by the LUT retrieval versus χ² = 6.6×10⁻⁷ for those successfully retrieved with Ttop > 0 °C. The perceptron network also tends to pass clouds that are more optically thick (median τ = 8.6 vs. 2.4) and whose OCO-2 and CALIPSO Ptop differences show a smaller spread (standard deviation of differences, σ = 33 hPa vs. 55 hPa).
The OCO2CLD-LIDAR-AUX footprints that are excluded by the new pre-processor are consistent with optically thinner clouds and with poorer-quality retrievals. Among the footprints that are passed by the new pre-processor, 17.1 % were not attempted in OCO2CLD-LIDAR-AUX.

Cloud classifier and pre-processor throughput
The cloud classifier's agreement of 90.0 % with MODIS-CALIPSO is similar to that of the original OCO-2 operational cloud flagging for ocean glint used in the L2FP XCO2 retrieval (Taylor et al., 2016). Furthermore, the multi-layer perceptron network is lightweight (size < 250 kB) and fast. It passes 11-13 % of ocean soundings where SZA < 45°, of which under a quarter are poor retrieval candidates according to MODIS-CALIPSO. Most of these false positives (~69 %) are broken or multi-layered cloud scenes, while the missed MODIS-CALIPSO cloud scenes are commonly optically thinner (~4 times likelier to have τ < 3) and colder (~3 times likelier to have Ttop < -10 °C) than the hit cloud scenes. These thinner and colder samples are also likely to be poor candidates for our target future retrieval of droplet number concentration in warm-topped clouds.
Applying the LUT retrieval further reduces the number of footprints that are taken to be liquid clouds with τ > 1. The OCO2CLD-LIDAR-AUX retrieval attempted 13.1 % of SZA < 45° footprints, whereas the new classifier-LUT pre-processor passes 8.1 %. Figure 7 showed that the excluded footprints tended to be optically thinner, to have larger discrepancies in retrieved Ptop relative to CALIPSO, and to have higher χ². This suggests that the new pre-processor will pass better retrieval candidates to the OE code, thereby improving efficiency. Of those that are now passed, 17 % were not passed by OCO2CLD-LIDAR-AUX. These likely include cases of misidentification that will result in poor-quality retrievals, but may also include true cloud cases that were not identified in OCO2CLD-LIDAR-AUX. For example, retrievals were previously classified using the nearest CALIPSO footprint up to 10 km away, so if a cloud was in the OCO-2 field of view but not the CALIPSO field of view, it would not previously have been passed. Overall, the new pre-processor shows good performance in identifying scenes that likely contain liquid clouds with sufficient τ.

Lookup table cloud property retrieval
The LUT retrieval shows good correlation with MODIS τ and re plus CALIPSO Ptop in Figure 5. Versus CALIPSO, the LUT-based Ptop retrievals have a smaller-magnitude bias and a 40 % smaller inter-decile range than MODIS. The 12 hPa Ptop bias represents OCO-2 retrieved clouds that are lower in the atmosphere than retrieved by CALIPSO. These statistics may include cases of broken cloud, either above a lower cloud or above the surface. Three-dimensional (3d) cloud effects, or combined scattering from multiple cloud layers, could lead to longer mean photon path lengths and thereby a larger OCO-2 Ptop, assuming that CALIPSO tends to identify the highest layer. We consider full 3d radiative transfer treatments to be beyond the scope of this study but point readers to a wide literature on this topic (Davis and Knyazikhin, 2005; Heidinger and Stephens, 2002; Kokhanovsky et al., 2007; Várnai and Marshak, 2002).

Aerosol is ignored in these simulations, as previous analysis using CALIPSO aerosol products showed no change in OCO2CLD-LIDAR-AUX Ptop bias in response to CALIPSO-identified aerosol (Richardson et al., 2019). Furthermore, above-cloud scattering aerosol would tend to reduce photon path length and therefore have an opposite effect on Ptop to our observed bias.
Retrieved Ptop could also change due to the assumed cloud vertical structure and meteorological profile used in the LUT development. If the cloud vertical structure used in the RT differs from reality, then this could lead to incorrectly simulated within-cloud photon paths. Firstly, if the simulated cloud is too geometrically thin (low H) for a given τ and re, then the within-cloud path length will be too small and the above-cloud path must be lengthened to compensate, resulting in a positive Ptop bias, and vice versa for too-high simulated H. This study improves on the OCO2CLD-LIDAR-AUX prior by realistically varying H with re in addition to τ, but a bias may remain. In particular, shallow marine clouds tend to have extinction weighted towards the top, which affects the exiting radiance and may introduce Ptop biases that vary with geometry and cloud properties. We intend to perform a separate and more detailed analysis of how realistic vertical cloud profiles affect simulated OCO-2 radiances, and to determine how to account for such a vertical-structure bias.
With regard to meteorology, a warmer and moister profile broadens the O2 absorption lines, and we expect stronger resultant absorption in the selected Iabs,O2 channels. Our tropical meteorology may therefore lead to too-strong simulated absorption in non-tropical scenes, such that the retrieved cloud is lifted (i.e. lower Ptop) to compensate, but the observed bias is opposite to this. We also retrieved using a LUT developed with the Richardson and Stephens (2018) midlatitude meteorology, where surface temperature is approximately 10 °C cooler. The retrieved Ptop distribution shifts as expected, with the median Ptop bias increasing from 12 hPa to 15 hPa.
Overall, the OCO-2 LUT gives better Ptop retrieval statistics than MODIS for these shallow marine clouds, for which MODIS retrievals rely on brightness temperature at λ ≈ 11 µm and so may mis-assign Ptop when a temperature inversion is present (Baum et al., 2012). However, OCO-2 has a larger footprint and a narrower swath, and only retrieves during nadir-view orbits. The Ptop bias relative to CALIPSO is concerning for a future optimal estimation (OE) retrieval, since biased prior properties may subsequently bias the posterior retrieved state in unpredictable ways. We confidently exclude aerosol and meteorology as the main factors in the observed bias, and propose that the main candidate processes are a combination of horizontal variability, OCO-2-CALIPSO collocation error and potentially vertical cloud structure. In the future OE retrieval we would expect horizontally non-uniform clouds to produce spectra that are harder to match under our RT assumptions, so such cases may be identified by the posterior χ² statistics. For vertical structure biases, we plan a detailed future investigation.
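The posterior χ² screening mentioned above can be sketched as a reduced chi-square of the spectral fit. In standard OE the statistic is (y − F(x̂))ᵀ S_ε⁻¹ (y − F(x̂)); this minimal sketch assumes a diagonal measurement-error covariance (independent Gaussian noise per channel), and the cut-off of 2.0 in `flag_poor_fit` is a hypothetical placeholder, not a threshold from this study.

```python
def reduced_chi_square(observed, simulated, sigma, n_params):
    """Reduced chi-square of a spectral fit, assuming independent Gaussian
    noise in each channel (i.e. a diagonal measurement-error covariance)."""
    if not (len(observed) == len(simulated) == len(sigma)):
        raise ValueError("observed, simulated and sigma must have equal length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more channels than retrieved parameters")
    chi2 = sum(((o - s) / e) ** 2 for o, s, e in zip(observed, simulated, sigma))
    return chi2 / dof

def flag_poor_fit(observed, simulated, sigma, n_params, threshold=2.0):
    """Flag a footprint whose fit is poor; threshold=2.0 is a hypothetical cut-off."""
    return reduced_chi_square(observed, simulated, sigma, n_params) > threshold

# Hypothetical four-channel example in which the last channel misfits by 2 sigma.
obs = [1.0, 2.0, 3.0, 6.0]
sim = [1.0, 2.0, 3.0, 4.0]
noise = [1.0, 1.0, 1.0, 1.0]
print(reduced_chi_square(obs, sim, noise, n_params=2))  # 2.0
print(flag_poor_fit(obs, sim, noise, n_params=2))       # False
```

In practice, horizontally heterogeneous scenes would be expected to show systematically larger values of this statistic than plane-parallel-like scenes.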
Retrieved τ and re correlate well with those from MODIS, and the variance of the differences is smaller than would be implied by the MODIS-reported uncertainties if the LUT and MODIS errors were independent and Gaussian, each with MODIS' reported variance. However, OCO-2's Ic,O2 instrumental noise is lower than MODIS' (single-channel signal-to-noise ratio (SNR) of 300-1200, versus the MODIS band 4 specified SNR of 228), so the instrumental contribution to the error budget should be smaller for OCO-2. The two retrievals also share common characteristics, such as the use of fixed droplet size distribution variances, so individual footprint errors will covary between them. Such covariance should further reduce the inter-satellite difference in retrieved τ and re. A quantitative analysis would require a thorough calculation controlling for individual terms in the error budget; here we simply conclude that there is no evidence of substantial unexpected variance in our retrieved τ and re.
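The argument above rests on the variance of a difference of two Gaussian errors, Var(a − b) = σ_a² + σ_b² − 2ρσ_aσ_b: lower noise in one retrieval, or positive correlation ρ from shared assumptions, both shrink the spread of inter-satellite differences. The sketch below evaluates this identity and the fractional noise 1/SNR for the SNR values quoted in the text; the σ and ρ values in the example are illustrative, not the study's error estimates.

```python
import math

def diff_std(sigma_a, sigma_b, rho=0.0):
    """Standard deviation of (a - b) when a and b carry Gaussian errors with
    standard deviations sigma_a, sigma_b and correlation coefficient rho."""
    var = sigma_a ** 2 + sigma_b ** 2 - 2.0 * rho * sigma_a * sigma_b
    return math.sqrt(max(var, 0.0))  # guard against tiny negative round-off

def fractional_noise(snr):
    """Single-channel noise as a fraction of the signal, 1/SNR."""
    return 1.0 / snr

# Equal, independent errors: the spread of differences is sqrt(2) times either one.
print(round(diff_std(1.0, 1.0), 3))           # 1.414
# Positively correlated errors (shared assumptions) shrink the spread.
print(round(diff_std(1.0, 1.0, rho=0.5), 3))  # 1.0
# Fractional noise (%) for the SNR values quoted in the text.
for snr in (228, 300, 1200):
    print(snr, round(100.0 * fractional_noise(snr), 2))
```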
Of greater concern are the residual OCO-2 minus MODIS differences of 0.77 in τ and, to a lesser extent, -0.25 µm in re. For τ, the bias increases both with horizontal heterogeneity and with τ, and we expect to be able to identify these cloud scenes using retrieved τ and either OCO-2-based metrics of spatial variability or the future retrieval χ².
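A dependence of the τ bias on scene heterogeneity (or on τ itself) can be checked by binning the paired differences and taking the median in each bin. This is a minimal sketch: the heterogeneity index (for example, something like the MODIS SPI), the bin edges and the input values are all hypothetical and serve only to show the procedure, not to reproduce the study's results.

```python
from statistics import median

def binned_median(x, y, edges):
    """Median of y within bins of x, where bin i covers [edges[i], edges[i+1]).
    Empty bins return None."""
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        result.append(median(in_bin) if in_bin else None)
    return result

# Illustrative inputs only (not data from this study): a heterogeneity index per
# footprint and the matching OCO-2 minus MODIS tau difference.
heterogeneity = [0.1, 0.2, 0.6, 0.7, 1.5]
tau_diff = [0.0, 1.0, 2.0, 4.0, 10.0]
print(binned_median(heterogeneity, tau_diff, [0.0, 0.5, 1.0, 2.0]))  # [0.5, 3.0, 10.0]
```

Medians rather than means are used so that a few strongly biased, very heterogeneous footprints do not dominate each bin.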
For the re bias we briefly assessed several factors. Horizontal variability tends to increase retrieved re (Werner et al., 2018; Zhang et al., 2012), but we found no evidence of a strong dependence on spatial variability according to the MODIS SPI. To test for a hypothetical calibration bias, we ran the LUT retrieval with a -1 % shift in Ic,st, which shifts the median re by +0.2 µm.
Instrumentally, the MODIS band 7 used in these re retrievals begins at λ = 2.105 µm, outside the strong CO2 band. Changes in CO2 or, more likely, temperature- and vapour-driven line broadening or vapour absorption could therefore affect the OCO-2 retrieved re. When retrieving with the midlatitude-profile LUT described above, the median retrieved re increases by 0.17 µm. Given that the re discrepancy is small, we make no further effort to explain or reduce it.

Summary and Conclusions
Here we developed a new pre-processor for a future optimal estimation retrieval that uses the OCO-2 A-band to provide new estimates of droplet number concentration in marine water clouds. This future retrieval aims to address limitations in the previously published OCO2CLD-LIDAR-AUX product by (1) removing the requirement for collocated CALIPSO data now that the satellites are no longer formation flying and (2) adding OCO-2 information about re to extend the analysis to droplet number concentration. The pre-processor must identify footprints that likely contain liquid clouds of sufficient τ, and provide prior properties for the future cloud retrieval. It may also be useful for identifying appropriate footprints on which other researchers could conduct partial-column XCO2 retrievals.
The pre-processor first flags potentially cloudy scenes using a multi-layer perceptron network fed with continuum radiances across all three OCO-2 bands plus a set of absorption-band radiances from the O2 A-band. The next stage uses a three-dimensional lookup table that jointly retrieves τ, re and Ptop from radiances in two bands plus an A-band absorption ratio. Footprints whose radiances are inconsistent with the lookup table, or whose implied Ptop occurs where T < 0 °C, can also be excluded from future retrievals. Such footprints were associated with worse fit statistics in OCO2CLD-LIDAR-AUX, implying that the new pre-processor will minimise the computational resources wasted on poor-quality retrievals.
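The lookup-table stage can be illustrated with a toy joint inversion over a (τ, re, Ptop) grid. This is a minimal sketch under several assumptions: `toy_forward` is a hypothetical stand-in for the radiative-transfer simulations (its functional forms are invented, only their monotonic directions follow the text), the three measurements are taken to be Ic,O2, Ic,st and an A-band absorbing-to-continuum ratio, the inversion uses nearest-neighbour search in range-normalised radiance space (a simplification; the study's interpolation scheme is not described here), and both `max_dist` and the temperature profile passed to `liquid_plausible` are hypothetical.

```python
import itertools

def toy_forward(tau, re, ptop):
    """Hypothetical toy forward model, NOT the study's radiative transfer."""
    i_cont_o2 = tau / (tau + 6.0)                 # continuum brightens with tau
    i_cont_st = i_cont_o2 * (1.0 - 0.015 * re)    # absorbing band darkens with re
    abs_ratio = 0.8 - 0.5 * (ptop / 1000.0) ** 2  # deeper O2 absorption for lower cloud
    return (i_cont_o2, i_cont_st, abs_ratio)

def build_lut(taus, res, ptops):
    """Tabulate simulated radiances on the full (tau, re, Ptop) grid."""
    return [((t, r, p), toy_forward(t, r, p))
            for t, r, p in itertools.product(taus, res, ptops)]

def invert(observed, lut, max_dist=0.05):
    """Nearest-neighbour LUT inversion in range-normalised radiance space.
    Returns None when no entry lies within max_dist (LUT-inconsistent footprint)."""
    n = len(observed)
    lows = [min(rad[k] for _, rad in lut) for k in range(n)]
    highs = [max(rad[k] for _, rad in lut) for k in range(n)]
    scales = [hi - lo for lo, hi in zip(lows, highs)]
    best_state, best_dist = None, float("inf")
    for state, rad in lut:
        d = sum(((o - s) / sc) ** 2 for o, s, sc in zip(observed, rad, scales)) ** 0.5
        if d < best_dist:
            best_state, best_dist = state, d
    return best_state if best_dist <= max_dist else None

def liquid_plausible(ptop, temperature_at):
    """Keep only cloud tops at or above 0 degC; temperature_at maps hPa to K."""
    return temperature_at(ptop) >= 273.15

TAUS = [2.0, 4.0, 8.0, 16.0, 32.0]
RES = [6.0, 9.0, 12.0, 15.0, 18.0]
PTOPS = [700.0, 750.0, 800.0, 850.0, 900.0, 950.0]
lut = build_lut(TAUS, RES, PTOPS)
print(invert(toy_forward(8.0, 12.0, 850.0), lut))  # (8.0, 12.0, 850.0)
print(invert((5.0, 5.0, 5.0), lut))                # None
```

Returning `None` for radiances far from every table entry mirrors the idea of excluding LUT-inconsistent footprints before the more expensive OE retrieval is attempted.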
The pre-processor flag shows excellent agreement with MODIS-CALIPSO, the lookup-table τ and re compare well with MODIS, and its Ptop shows better retrieval statistics than MODIS when CALIPSO is taken as the truth. Many of the inter-satellite differences are associated with known factors: false positives from the classifier occur when scenes contain broken or multi-layered clouds, and the τ retrieval bias grows with the horizontal heterogeneity of the scene.
A main concern is that the median OCO-2 retrieved Ptop is closer to the surface than CALIPSO's, by approximately 12 hPa (~120 m). The assumed mean cloud extinction or its vertical profile will affect photon path lengths and so could introduce a bias in retrieved Ptop, and we propose that a detailed analysis of cloud vertical structure is the next and final step before the development of a new OCO-2 cloud retrieval. If successful, this new retrieval would add independent information on cloud droplet number concentration, allowing attempts to resolve apparent disagreements about low-cloud processes.
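The conversion of a 12 hPa pressure offset to roughly 120 m follows from the hydrostatic equation combined with the ideal gas law, dz = -(R_d T / (g p)) dp. The sketch below assumes dry air (virtual-temperature effects ignored) and illustrative cloud-top values of p = 850 hPa and T = 285 K, which are not taken from the study; the exact height depends on these assumed values, for example giving closer to 105 m at 950 hPa.

```python
R_D = 287.05  # J kg^-1 K^-1, gas constant for dry air
G = 9.81      # m s^-2, gravitational acceleration

def pressure_offset_to_height(dp_hpa, p_hpa, t_kelvin):
    """Approximate thickness (m) of a thin layer of depth dp_hpa at pressure
    p_hpa and temperature t_kelvin, from dz = -(R_d T / (g p)) dp."""
    return R_D * t_kelvin / G * dp_hpa / p_hpa

# Assumed, illustrative marine boundary-layer cloud-top conditions.
print(round(pressure_offset_to_height(12.0, 850.0, 285.0)))  # 118
```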
Code availability
scikit-learn is available on GitHub at: https://github.com/scikit-learn/scikit-learn and ReFRACtor is at: https://github.com/ReFRACtor/framework

Data availability
OCO2CLD-LIDAR-AUX can be downloaded from the CloudSat Data Processing Center at: http://www.cloudsat.cira.colostate.edu/data-products/level-aux/oco2cld-lidar-aux

Author contributions
MR contributed to study design, ran the analyses and wrote the paper. JM set up ReFRACtor for cloudy scene simulations and provided technical support. MDL and GLS contributed to study design, analysis and paper drafting.

Competing interests
The authors declare no competing interests.

Table 1. Summary of methods for determining properties in OCO2CLD-LIDAR-AUX, and changes introduced in this study.

OCO2CLD-LIDAR-AUX is a full optimal estimation (OE) retrieval that combines CALIPSO and OCO-2 information to obtain its prior state. This study is intended to provide OCO-2-only prior information for a future OE retrieval.