Probabilistic retrieval of volcanic SO2 layer height and partial column density using the Cross-track Infrared Sounder (CrIS)

During most volcanic eruptions and many periods of volcanic unrest, detectable quantities of sulfur dioxide (SO2) are injected into the atmosphere at a wide range of altitudes, from ground level to the lower stratosphere. Because the fine ash fraction of a volcanic plume is, at times, colocated with SO2 emissions, global tracking of volcanic SO2 is useful in tracking the hazard long after ash detection becomes dominated by noise. Typically, retrievals of SO2 vertical column density (VCD) have relied heavily on hyperspectral ultraviolet measurements. More recently, infrared sounders have provided additional VCD measurements and estimates of the SO2 layer altitude, adding significant value to real-time monitoring of volcanic emissions and climatological analyses. These methods can provide fast and accurate physics-based retrievals of VCD and altitude without regard to solar irradiance, meaning that they are effective day and night and can observe high-latitude SO2 even in the winter. In this study, we detail a probabilistic enhancement of an infrared SO2 retrieval method, based on a modified trace gas retrieval, to estimate SO2 VCD and altitude probabilistically using the Cross-track Infrared Sounder (CrIS) on the Joint Polar Satellite System (JPSS) series of satellites. The methodology requires the characterization of real SO2-free spectra aggregated seasonally and spatially. The probabilistic approach replaces altitude and VCD estimates with probability density functions for the layer height and the partial VCD at multiple heights, fully quantifying the retrieval uncertainty and allowing the estimation of SO2 partitioning by layer. This framework adds significant value over basic VCD and altitude retrieval because it can be used to assign probabilities of SO2 occurrence to different atmospheric intervals. We highlight analyses of several recent significant eruptions, including the 22 June 2019 eruption of Raikoke volcano, in the Kuril Islands; the mid-December 2016 eruption of Bogoslof volcano, in the Aleutian Islands; and the 26 June 2018 eruption of Sierra Negra volcano, in the Galapagos Islands. This retrieval method is currently being implemented in the VOLcanic Cloud Analysis Toolkit (VOLCAT), where it will be used to generate additional cloud object properties for real-time detection, probabilistic characterization, and tracking of volcanic clouds in support of aviation safety.

Abstract. During most volcanic eruptions and many periods of volcanic unrest, detectable quantities of sulfur dioxide (SO 2 ) are injected into the atmosphere at a wide range of altitudes, from ground level to the lower stratosphere. Because the fine ash fraction of a volcanic plume is, at times, colocated with SO 2 emissions, global tracking of volcanic SO 2 is useful in tracking the hazard long after ash detection becomes dominated by noise. Typically, retrievals of SO 2 vertical column density (VCD) have relied heavily on hyperspectral ultraviolet measurements. More recently, infrared sounders have provided additional VCD measurements and estimates of the SO 2 layer altitude, adding significant value to real-time monitoring of volcanic emissions and climatological analyses. These methods can provide fast and accurate physics-based retrievals of VCD and altitude without regard to solar irradiance, meaning that they are effective day and night and can observe high-latitude SO 2 even in the winter.
In this study, we detail a probabilistic enhancement of an infrared SO 2 retrieval method, based on a modified trace gas retrieval, to estimate SO 2 VCD and altitude probabilistically using the Cross-track Infrared Sounder (CrIS) on the Joint Polar Satellite System (JPSS) series of satellites. The methodology requires the characterization of real SO 2 -free spectra aggregated seasonally and spatially. The probabilistic approach replaces altitude and VCD estimates with probability density functions for the layer height and the partial VCD at multiple heights, fully quantifying the retrieval uncertainty and allowing the estimation of SO 2 partitioning by layer. This framework adds significant value over basic VCD and altitude retrieval because it can be used to assign probabilities of SO 2 occurrence to different atmospheric intervals.
We highlight analyses of several recent significant eruptions, including the 22 June 2019 eruption of Raikoke volcano, in the Kuril Islands; the mid-December 2016 eruption of Bogoslof volcano, in the Aleutian Islands; and the 26 June 2018 eruption of Sierra Negra volcano, in the Galapagos Islands. This retrieval method is currently being implemented in the VOLcanic Cloud Analysis Toolkit (VOL-CAT), where it will be used to generate additional cloud object properties for real-time detection, probabilistic characterization, and tracking of volcanic clouds in support of aviation safety.

Introduction
During most volcanic eruptions and many periods of volcanic unrest, detectable quantities of sulfur dioxide (SO 2 ) are injected into the atmosphere at a wide range of altitudes, from ground level to the lower stratosphere. Often early in eruptions the fine ash fraction of a volcanic plume is colocated with SO 2 emissions, and thus ash tracking can be performed by proxy; however, later ash and SO 2 tend to evolve along different trajectories due to subtly differences in altitude and removal processes (Karagulian et al., 2010;Corradini et al., 2010;Sears et al., 2013;Moxnes et al., 2014). Early colocation of SO 2 and ash is highly significant for informing forward trajectory models (e.g., HYSPLIT) of volcanic clouds as is performed in response to Volcanic Ash Advisories (VAAs)  Ash Advisory Centers (VAACs). Because fine ash and SO 2 eventually diverge along different trajectories, due in large part to wind shear, layer height estimates are critical for ash and SO 2 cloud estimates. Although volcanic ash presents a demonstrated threat to aviation (e.g., ICAO, 2012;Casadevall, 1994;Prata and Rose, 2015;Guffanti et al., 2010), SO 2 also presents an aviation safety concern, mainly as a human health hazard and through damage by sulfuric acid, as well as via impacts on global climate and air quality (Chin and Jacob, 1996;Prata, 2009;Carn et al., 2009;Robock, 2000).
Globally, measurements of SO 2 vertical column density (VCD, here in Dobson Units, DU, 1 DU = 2.69 × 10 16 molec. cm −2 ) have previously relied heavily on lowearth-orbiting hyperspectral ultraviolet (UV) instruments including the Ozone Monitoring Instrument (OMI), Ozone Mapping and Profiler Suite (OMPS), and Global Ozone Monitoring Experiment-2 (GOME-2) (e.g., Krotkov et al., 2010;Carn et al., 2017;Li et al., 2017;Theys et al., 2013). More recently, efforts to improve UV methods have focused on high-cadence UV measurements made from the Deep Space Climate Observatory -Earth Polychromatic Imaging Camera (DSCOVR-EPIC) at the Earth-Sun (L1) Lagrange point (Carn et al., 2018), as well as from high spatial resolution SO 2 VCD and limited layer height measurements from the Tropospheric Monitoring Instrument (TROPOMI) (Theys et al., 2019;Hedelt et al., 2019). In the last decade, infrared sounders such as the Infrared Atmospheric Sounding Interferometer (IASI) have provided additional SO 2 VCD measurements and estimates of the layer altitude, providing significant added value to real-time monitoring of volcanic emissions and climatological analyses (Walker et al., 2011Carboni et al., 2012Carboni et al., , 2016Clarisse et al., 2014;Bauduin et al., 2016). These methods can provide fast and accurate physics-based retrievals of VCD and altitude. Furthermore, because these techniques principally rely on thermal contrast in the atmosphere and not solar irradiance (as in UV measurements), they are effective day and night and can observe high-latitude SO 2 in the winter months when UV techniques are unavailable. This twice-daily global coverage makes IR-based SO 2 retrieval a highly useful tool for operational support of aviation safety and for creating a truly continuous global analysis of SO 2 from volcanic eruptions.
In this study, we detail a probabilistic enhancement of the infrared SO 2 retrieval method of Clarisse et al. (2014), based on a modified trace gas retrieval (Walker et al., 2011) to estimate SO 2 VCD and altitude probabilistically utilizing the Cross-track Infrared Sounder (CrIS) currently aboard the Suomi-NPP (SNPP) and NOAA-20 satellites as part of the Joint Polar Satellite System (JPSS), having a local time ascending node (LTAN) of 13:30 with NOAA-20 operating approximately 50 min ahead of SNPP. Similar to IASI, CrIS is a Fourier transform Michelson interferometer covering three regions of the infrared spectrum: long-wave infrared (LWIR) (650-1095 cm −1 ), mid-wave IR (MWIR) (1210-1750 cm −1 ), and short-wave IR (SWIR) (2155-2550 cm −1 ) . Of the two principal SO 2 absorption features in the infrared (ν 1 : 1000-1200 cm −1 ; ν 3 : 1300-1410 cm −1 ), only the ν 3 band is covered by CrIS (MWIR). As highlighted by Carboni et al. (2012) and Clarisse et al. (2014), ν 3 is the stronger of the two absorption bands; however, it does contain significant interference from water vapor, limiting this retrieval's ability to characterize SO 2 features at very low altitudes. However, it is exactly the variable amounts of interference with water vapor at different heights that gives this technique its ability to retrieve SO 2 altitude information . As these studies pointed out, although clouds pose similar absorption features to water vapor, it is only in a broadband sense, allowing the finer absorption lines of SO 2 to be distinguished even in scenes with overlying meteorological clouds, as long as the clouds are not nearly opaque. Although the interference from water vapor is a significant theoretical limitation of this approach, in practice we have been able to extract some information on low-altitude SO 2 clouds even in the tropics which is detailed later. Despite these limitations, the ν 3 band absorption lines are only minimally influenced by ash and dust, making this height retrieval method especially useful early in the evolution of volcanic eruption clouds where there is typically colocation between SO 2 and ash clouds.
Both CrIS instruments are currently operating in full spectral resolution mode (FSR), providing MWIR spectra at 0.625 cm −1 spectral resolution since December 2015 for SNPP CrIS (excepting a major outage 26 March-1 August 2019) and February 2018 for NOAA-20 CrIS. CrIS scans consist of 30 fields of regard (FOR) in 3.3 • steps between ±48.3 • scan angle, each of which contains 9 circular fields of view (FOV) arranged in a square (3 × 3) array that rotates and stretches as the mirror moves away from nadir towards edge of scan . CrIS granules are collected into 6 min granules of 45 scans, resulting in 12 150 MWIR FSR spectra collected every 6 min. The CrIS swath width is 2200 km; however, because of the rotating FORs, some ground points are measured by multiple FOVs even within the same scan, and some gaps exist due to the square FOR layout of circular FOVs and the presence of short gaps between scans. The FOV at the center of each FOR (number 5) is a 14 km diameter circle at nadir, extending out to an 43.6 km × 23.2 km (major and minor axes) ellipse for the first and last FORs on the edge of the swath Wang et al., 2013). Although CrIS FOVs are slightly larger than IASI FOVs (14 km versus 12 km at nadir), there are many more of them per scan since CrIS FOR are 3 × 3 arrays, whereas IASI FOR are 2 × 2 with larger gaps between FOV, FORs, and scan lines but the same swath width (e.g., Sun et al., 2018). Consequently, CrIS makes many more measurements per area than IASI does, resulting in greater overall resolution than IASI. Lastly, all of the CrIS MWIR channels used in the present study are, in general, very low noise (noise equivalent differential radiance, NEdN < 0.05 mW m −2 sr −1 cm) with the exception of SNPP CrIS FOV 7, which is above specification. Later in this study we will show some retrievals from this FOV; however, these are considered to be of very low quality and are not considered reliable. They are shown here only to elucidate how strong instrument noise is propagated to the retrieval.
The NOAA Unique CrIS/ATMS Processing System (NU-CAPS) already includes a retrieval of SO 2 from CrIS data (Gambacorta, 2013); however, it is based on a heritage algorithm designed to estimate many trace gases from cloudcleared radiances in one retrieval, whereas we focus more specifically on the problem of retrieving SO 2 in any background atmosphere from all available CrIS measurements. The methodology requires the characterization of the background mid-wave infrared spectrum of the SO 2 -free atmosphere, which is done by collecting the statistics of more than 360 million SO 2 -free CrIS spectra aggregated seasonally and spatially. The probabilistic approach replaces altitude and VCD estimates with a nonparametric probability density function (PDF) for the layer height and estimates (with uncertainty) of the partial VCD at multiple heights, fully quantifying the retrieval uncertainty and allowing the estimation of SO 2 partitioning by layer (Fig. 1). This framework adds significant value because it can be used to assign probabilities of SO 2 occurrence in different intervals of the atmosphere, which could prove very useful for aviation safety in the future when changing aviation hazard priorities will require such information (ICAO, 2019).
In this study we analyze several recent significant eruptions, including the 22 June 2019 eruption of Raikoke volcano, in the Kuril Islands; the mid-December 2016 eruption of Bogoslof volcano, in the Aleutian Islands; and the 26 June 2018 eruption of Sierra Negra volcano, in the Galapagos Islands. This retrieval method is currently being implemented in the VOLcanic Cloud Analysis Toolkit (VOLCAT, https://volcano.ssec.wisc.edu/, last access: 26 October 2020; Pavolonis et al., 2013Pavolonis et al., , 2015aPavolonis et al., ,b, 2018, where it will be used to generate additional cloud object properties for real-time detection, probabilistic characterization, and tracking of volcanic clouds in support of aviation safety. 2 Probabilistic SO 2 layer retrieval theory 2.1 Classical methods for height retrieval As a preliminary we discuss several methods which we describe here as "classical". In fact these methods are relatively recent; however, they do not make full use of the probability spaces that we will exploit here. Previous analyses of the height and distribution of volcanic SO 2 plumes using data from IASI by Carboni et al. (2012Carboni et al. ( , 2016 and Clarisse et al. (2014) utilized trace gas methods modified from the method originally outlined by Walker et al. (2011). The analysis of Carboni et al. (2012) imposed an a priori Gaussian vertical distribution over pressure coordinates for the SO 2 concentra-tion, retrieving the total SO 2 VCD, the mean pressure, and the standard deviation pressure (spread, only if VCD sufficiently strong). By contrast, Clarisse et al. (2014) developed a system in which the SO 2 is assumed to exist in a narrow box profile layer and an iterative retrieval is performed for the VCD concentration conditional on the retrieved SO 2 layer altitude. The principal differences between these methods and the method detailed here are summarized in Table 1. Employing the notation of Rodgers (2000), this retrieval relates a set of parameters governing the concentration of a trace gas (the true state, x ∈ R M , SO 2 in this case) to a set of measurements y ∈ R N (typically brightness temperature spectra) by a forward radiative transfer model F : R M → R N . In the following exposition, we focus on and enhance the method of Clarisse et al. (2014).
The infrared trace gas methods of Walker et al. (2011) andClarisse et al. (2014) rely on the ability to write the data and true state each as a sum of their respective climatological background averages (y bg , x bg ) and their anomalies ( y, x). Linearizing around the climatological average gives where u is a collection of all auxiliary parameters necessary to the forward model including atmospheric pressure, temperature, and water vapor profiles, as well as the state of the surface and instrument. K ∈ R N×M is the Jacobian of F linearized around x bg , and ε tot is the total error associated with the measurement, linearization, and other surface and atmospheric properties (including clouds) that influence the measured radiances. Because it is inferred that y bg = F (x bg ; u), the equation for the error is reduced simply to a relationship between the state and measurement anomalies: From this formula, the retrieval of x can proceed either by maximum likelihood estimation, iterative methods such as Levenberg-Marquardt and gradient descent algorithms, or other methods. For a 1 km thick box profile layer, the concentration of anomalous SO 2 can be represented by two parameters: the total VCD of the gas (x) and the height of the layer center (h). Using such a profile, the model spectrum is then a function of the VCD and the layer height. In Clarisse et al. (2014) and this paper, the Jacobian is represented a function of height but only contains a VCD perturbation. Because this Jacobian only measures the sensitivity of the forward model to the presence of SO 2 at each height independently, it is best viewed as a set of vectors in the same space as y rather than as a matrix and thus we write it hereafter as K(h) ∈ R N . Additionally, the model Jacobian for the trace gas retrieval is calculated at the background state (zero for SO 2 ), and the Jacobian is approximated as a finite difference at each altitude:  where the perturbation ( ) is taken as 5 DU.
In the present study, we pre-compute a limited database of Jacobians for 1 km thick SO 2 layers centered at 28 altitudes between 0 and 32 km ( Fig. 2b; Fig. 1 of Clarisse et al., 2014). We use standard profiles of pressure, temperature, and water vapor for a tropical atmosphere, summer and winter midlatitude atmospheres, and summer and winter subarctic atmospheres, resulting in a total of 140 Jaco-bians to be used in the retrieval (Clough et al., 2005). As in Pavolonis (2010), all radiative transfer model simulations used here were performed using the LBLDIS tool (Turner, 2005), which utilizes the Line-by-Line Radiative Transfer Model (LBLRTM; Clough and Iacono, 1995;Clough et al., 2005) to compute gaseous absorption and the Discrete Ordinate Radiative Transfer (DISORT) model to complete the radiative transfer calculation (including multiple scattering). Following Clarisse et al. (2014), a height-dependent Jacobian can be used to calculate a statistical z score, measuring the relative confidence in the presence of SO 2 at each height: where the mean climatological background spectrum y bg and error covariance matrix S are built from spectral residuals from a database of measurements with low or no detectable SO 2 present. The construction of this database of SO 2 -free spectra is detailed in Sect. 2.5. Note that because K(h) is a vector, the factor K T (h)S −1 K(h) is a scalar at each height h. Here, z(h; y, y bg ) is the statistical z score (number of standard deviations from the mean) of finding the SO 2 anomaly at altitude h given the data y and the SO 2 -free background spectrum y bg . Using the z score, Clarisse et al. (2014) estimated the layer height (which we refer to as h C ) as that which maximizes the z-score function: This method was used in that study to produce a consistent and reasonably accurate set of cloud top height estimates for the 2011 eruption of Nabro Volcano, Eritrea. This method is currently the principal operational SO 2 layer height method for IASI used by the Support to Aviation Control Service (SACS, https://sacs.aeronomie.be/, last access: 26 October 2020) of the Royal Belgian Institute for Space Aeronomy (BIRA-IASB), supporting several Volcanic Ash Advisory Centers (VAACs) in near-real time (Brenot et al., 2014).
For simplicity, throughout the remainder of this work we refer to this type of height retrieval with a function notation:

Probabilistic enhancement
Throughout, we make a distinction between our method being probabilistic and other methods being deterministic; however, we note here that the classical methods are all based on the optimal estimation (a Bayesian method) of Rodgers (2000) and therefore are probabilistic in the sense that the retrieved quantities are the (Gaussian) mean and covariance of the state estimate. We make this distinction to highlight the fact that in the present study we focus on a much more detailed uncertainty propagation, specifically, propagation uncertainty about the SO 2 -free background atmosphere to the height retrieval, allowing some departure from the Gaussian assumption underlying previous methods. Although the Gaussian assumption is workable for many types of retrieval, it is unsuitable for the Clarisse et al. (2014) height retrieval in particular due to the role played by the arg max operation. To see this directly, we must "probabilize" the Clarisse et al. (2014) height retrieval as follows. In this process, we use a notation common in probability theory in which random variables are represented as capitalized versions of their deterministic realizations. In what follows, the only exception to this notation will be that the forward model, Jacobian, and covariance matrix are not random variables. Instead of calculating the mean and covariance of the climatological background as in a traditional trace gas retrieval, we treat the background (SO 2 -free) spectrum as a random vector Y bg (rather than the realization y bg , which is its mean), where the vector elements (each of the sampled wavenumbers or channels) are random variables. Although the total uncertainty in the measured SO 2 -free background spectrum contains a mix of aleatoric (stochastic in fact) and epistemic (knowledge-deficit) uncertainties, the epistemic uncertainty due to a lack of knowledge about the true SO 2free background for a given spectrum is considerably larger than aleatoric sources such as instrument noise and scattering. Consequently, we generate a probability space for the SO 2 -free spectrum Y bg , the uncertainty for which is derived principally from our ignorance of the true atmospheric state if SO 2 were not present. The brightness temperatures for each channel (element of Y bg ) is characterized by its own marginal probability distribution. In reality, these distributions may belong to a family of parameterized distributions; however, in this study they are nonparametric and only characterized by a marginal distribution (histogram) on each channel. Additionally, the elements of Y bg are correlated, which is given by the covariance structure: where the expectation is an average over all SO 2 -free spectra in the database (detailed in Sect. 2.5).
In this framework, the z-score function is a conditional random variable given the layer height h: and the height is therefore a random variable H = g(y, Y bg ). Implicitly, Clarisse et al. (2014) assumes that Y bg is a multivariate normal random vector with mean y bg = E[Y bg ] and covariance S, meaning that Z is a standard normal random variable. This fact about the z score is expected to hold in the present case with the full probabilistic characterization of the generally non-Gaussian Y bg because the z score is a weighted sum over all of the channels in Y bg , which is expected to converge to a Gaussian for a large collection of channels. Because the function g uses the arg max operation, which in not exactly a proper function (and also not linear), we can write that That is, the random variable resulting from a nonlinear transformation of a Gaussian random variable is not itself Gaussian and the mean value of that new variable is not equal to the value obtained by transforming the mean of the Gaussian (thus, E[H ] = h C ; Fig. 1 schematically), a standard result in elementary probability theory texts (e.g., DeGroot and Schervish, 2012). Similarly, h C is not generally the maximum likelihood height either (h C = mode[H ]). Consequently, without a clear understanding of what h C is mea-suring in terms of the statistics of H , it is difficult to contextualize the value h C . The principal enhancement over the classical method comes from setting the height retrieval in a probabilistic framework, enabling precise propagation of uncertainty in the background state to uncertainty in the retrieved height. This study aims to estimate the probability distribution of H and show the importance of its PDF in making predictions about the cloud. We enhance the method of Clarisse et al. (2014) by retrieving a probabilistic SO 2 layer; i.e., we retrieve the SO 2 layer height as a PDF for the height (Fig. 1). As described, the probabilistic nature of the retrieval product is derived from propagating our uncertainty about the SO 2 -free background spectrum through the Clarisse et al. (2014) height retrieval method (Fig. 1). Here, we use a set of 10 000 possible SO 2 -free background spectra computed by Monte Carlo (MC) sampling according to the collection of marginal distributions of Y bg and its covariance matrix S, which are first computed from a database of SO 2 -free spectra (detailed in Sect. 2.5). The process for sampling this generally non-Gaussian correlated random vector is detailed in Appendix A. In this study each sample is denoted as Although we could directly estimate the height PDF from sampling the many different backgrounds, we treat our retrieval as an update on the Clarisse et al. (2014) height estimate and cast this process in a Bayesian framework. In this framework, we treat the estimate and uncertainty from the  method as a prior distribution for the height and construct an approximate likelihood function from the data not accounted for directly in that method (the distribution of the many possible spectral residuals). The height PDF which is sought is the posterior distribution.
We impose a Gaussian prior with mean and variance given by the Clarisse et al. (2014) method. To estimate the mean and variance of the Gaussian, we first retrieve the Clarisse et al. (2014) height h C and then generate many height estimates around h C using a model spectral anomaly with SO 2 assumed at h C and MC sampling of the zero-mean noise contained in the collection of possible backgrounds. Specifically, we estimate the height due to noisy model spectral anomaly samples This anomaly represents a modeled spectral anomaly with zero-mean spectral noise added and allows for the possibility of a bias induced by the difference between the mean climatological background and the model background which does not include cloud layers. The Gaussian prior mean and variance are then taken to be the mean and variance of these noisy modeled samples. Of note, this Gaussian prior mean is very close to the value h C , but is preferred since its use does not restrict the Gaussian to be centered only at height values for which Jacobians were computed. Using these values for mean and variance is very similar to the estimate with uncertainty shown in Fig. 2 of Clarisse et al. (2014). This mean and variance parameterizes the Gaussian prior distribution f prior H (h). The likelihood function is constructed directly by retrieving the height due to the real spectrum y and the set of sampled background spectra. Each height sample is generated as follows: i.e., we construct the random variable from elements of its sample space h s ∈ H . The likelihood function measures the distribution of the possible real spectral residuals (y − Y bg ) given an SO 2 layer at height h. As there is no analytic probability model to describe this, we estimate that the likelihood is proportional to the distribution of possible heights as computed by kernel density estimation (KDE, e.g., Silverman, 1986) on this set of height samples: An alternative approach would be to replace the KDE distribution with a simple histogram of the samples, although we use KDE because we seek a distribution which is at least piecewise-continuous and not piecewise-constant. This gives an estimate of the posterior height PDF: where the proportionality is eliminated by normalizing the posterior PDF such that the total probability is unity. Although slower than retrieving the layer height (h C ) due to a mean spectrum alone, this distribution provides significantly more information, including the full PDF of H . This PDF may be used to calculate the modal, mean, and median values of the retrieved height or probabilities of finding the plume in a given altitude interval. Additionally, this PDF is essential for calculating the VCD correctly according to probability theory as detailed in the following section.

Probabilistic vertical column density
Although this method is used primarily for detection (using z scores) and height estimation, we estimate VCD as a side-product, which we treat here as a random variableX. Because we will use a linearized retrieval, the VCD values presented here will not be as accurate as those produced by an iterative technique. However, the details of our process below produce VCD values that are reasonably accurate for all but very strong emissions, as is demonstrated later in this work. Specifically, the way in which the height PDF is incorporated (detailed below) mitigates some underestimation error that would otherwise occur in a linearized approach for even dilute SO 2 clouds. As with the height estimation, the uncertainty propagated to the VCD primarily represents uncertainty about the SO 2 -free background, which is of great importance in these linearized trace gas methods.
Because the estimated VCD depends strongly on the layer height, we refer to an estimate of total VCD where the layer is given as a specified height as a "conditional VCD". In this framework, the VCD estimates of Walker et al. (2011Walker et al. ( , 2012 are conditional VCDs and are represented as a conditional random variable: where an air mass factor equal to the cosine of the satellite zenith angle (cos θ ) has been applied. This formula is in some sense a probabilistic enhancement of an optimal unconstrained least-squares estimate. Similar to the z scores, this function is normally distributed at every height with mean E[X|H = h] and variance Var[X|H = h]. These are calculated as the sample mean and variance of the conditional VCD samples due to the many possible background spectra used to estimate the height PDF. The conditional VCD is a random function of height which generally reflects the principles of the Beer-Lambert law for any given realization; i.e., a smaller VCD is retrieved for a given spectral anomaly if the layer is assumed to be higher in the atmosphere. If the height of the SO 2 layer were known exactly, the VCD could be estimated by evaluating the conditional VCD function at that exact height. This is exactly what is done in the VCD retrieval of Clarisse et al. (2014), except by using an iterative conditional VCD calculation. However, in this study, since the height of the layer is known only probabilistically, i.e., as measured by the PDF f H (h), additional computation is required to determine the VCD.
Although we do not know the true vertical profile of SO 2 concentration, the retrieval assumes a thin layer representation of the SO 2 . The total VCD (X, a random variable) is obtained by integrating the box profile between the ground and the top of the atmosphere. Similarly, a partial VCD, denoted here asX(h), can be calculated by integrating the box profile between the ground and the height h and is zero for h < H and rises linearly within the layer toX for h ≥ H . Because the assumed concentration profile scales linearly with the total VCD (X) and the conditional VCD is normally distributed at each height, the partial VCD is normally distributed as well, thus requiring only two parameters: the mean and variance, which can be found using the conditional VCD function calculated above.
Here, we give approximation formulae for the mean and variance partial VCD. The derivation of these formulae is detailed in Appendix B. The mean partial VCD below height h is found by the law of total expectation: D. M. Hyman and M. J. Pavolonis: Probabilistic retrieval of volcanic SO 2 where the expectation of the conditional VCD is taken as the mean of the samples. This formula represents a weighted average of mean conditional VCD values where the height probability density assigns the weights. Because the conditional VCD is a function of height, this formula mixes the VCD estimates from different assumed heights. The variance can also be calculated from the statistics of the conditional VCD expectation: where the conditional mean and conditional variance were previously estimated from the MC samples. The covariance between the partial VCDs for two altitudes (a and b) is given in Appendix B. Of particular interest is that these formulae may be used to calculate the expectation and variance values of the partial VCD between two altitudes: In this system, we retrieve probabilistic SO 2 information in two stages. In the first stage we perform an initial detection using the classical method (Eq. 4) to pre-screen each CrIS FOV that likely contains SO 2 , taken as an initial maximum z score greater than 5, i.e, z(h C ; y, y bg ) > 5 (e.g., Walker et al., 2011Walker et al., , 2012Clarisse et al., 2014). Preliminary investigation of this threshold indicates that it somewhat conservative, striking a balance between including regions of diffuse SO 2 and excluding almost all false detections. In the second stage, we retrieve the height PDF and the mean and variance partial VCD as a function of height for each CrIS FOV that satisfies this initial z-score threshold.

Specialized retrieval for strong SO 2 loading
For strong SO 2 columns, an alternate retrieval is needed to increase sensitivity of the retrieved VCD owing to error induced by the linearized retrieval. We define strong loading here heuristically as z > 200, which in preliminary testing corresponded to VCD values between approximately 10 and 20 DU. Since the conditional VCD retrieval uses a linearized forward model with only a 5 DU perturbation, it is expected that such an approximation would only hold for values near 5 DU and that many physically realistic VCD values would fall outside the linearization's radius of convergence. For large VCD values, the sensitivity of linear Jacobians to additional SO 2 is greatly reduced for most CrIS channels, especially the strongest CrIS channels, so linearized Jacobians with only a 5 DU anomaly will drastically underpredict the VCD for a given brightness temperature difference (Fig. 2a). In order to construct a linearized Jacobian that is more sensitive at higher VCD values, two approaches are possible: (i) we could use a larger VCD perturbation (a coarser finite difference), or (ii) we could seek a special combination of channels for which the linearization is a good approximation. The first approach is limited in that the strongest responses in the forward model are highly nonlinear, so a coarse finite difference will induce large errors there. The second approach is promising as long as such a suitable channel selection can be made that still preserves some of the main features of the ν 3 absorption band, retains enough channels to be robust, and is only applied when it is certain that SO 2 is dominating the signal.
In addressing this issue, we adopt the second approach. The specialized Jacobian must be dominated by channels with approximately linear forward model responses (Fig. 2b). This was accomplished in practice by constructing a sequence of Jacobians with various finite difference coarseness and choosing those channels for which the sequence of Jacobians is approximately constant. This channel subset (Appendix D) is used with the original 5 DU Jacobian, which can then be extrapolated successfully to high VCD values because the forward model truly is approximately linear for those channels.
One complication here is the fact that the channels with the most linear response are also those which are least sensitive to SO 2 VCD (Fig. 2). However, by applying this new retrieval only when strong SO 2 loading (z > 200) is detected by the pre-screening retrieval, the signal is guaranteed to be dominated by SO 2 absorption even in the weak, approximately linear channels, and the linearization is expected to have moderately good accuracy. In theory, a sequence of increasingly restricted retrievals could be implemented to increase the sensitivity to even stronger SO 2 loads; however, even the most sensitive channels in the specialized subset show the worst linear approximations (Fig. 2a), and so some underestimation in the inversion will always be present with such an approach. As we will demonstrate below, this approach does not increase the sensitivity so much that extremely high VCD values (several hundred to thousands of Dobson units) can be retrieved but instead increases the ability of the linearized approach to resolve moderate to high VCD values (perhaps tens to low hundreds of Dobson units).

2.5 Background state construction
At every stage of the retrieval, the background state of the volcanic SO 2 -free atmosphere must be accurate in order for this linearized method to succeed. Consequently, calculating accurate statistics of the background state is paramount. Each CrIS instrument collects almost 3 million spectra per day, allowing for robust characterization of the background spectrum including variation in conditions across seasons and locations. Because we use real CrIS spectra, not only are meteorological variations (including water vapor, clouds, temperature, etc.) accounted for, but the instrument noise profile is also included.
In constructing the background spectrum channel-wise marginal PDFs (histograms) and covariance matrix, periods with little or no SO 2 must be determined. We utilize the detailed record of global volcanic SO 2 emissions from the operational IASI SO 2 retrieval algorithm (Lieven Clarisse, personal communication, 2018) between 1 November 2017 and 1 November 2018, collecting all SNPP CrIS spectra measured on days with maximum VCD less than 1 DU SO 2 present anywhere in the atmosphere (Fig. 3a).
This leaves a database of more than 3.6 × 10 8 SO 2 -free CrIS spectra over the 1-year period. We classify the spectra regionally and seasonally, partitioning this database into four seasons and 5 • × 5 • latitude and longitude grid cells yielding 10 368 bins (Fig. 4). Each bin has a full set of marginal PDFs (brightness temperature histograms) for each channel and a covariance matrix characterizing the correlation structure among the channels. This partitioning reduces the overall variability represented in the mean spectra and thus also reduces the magnitude of the error covariance matrix entries while still capturing the fundamental variability due to spectral trends between regions and throughout a year.
For each season-latitude-longitude bin, we construct a representative sample of 10 000 possible background spectra that conform to the set of channel-wise marginal distributions and the covariance matrix relevant to that bin. Although our database is large enough to construct this sample for each bin, we generate these possible spectra via another method because it is preferable (from a mathematical standpoint) that the samples represent only what is known statistically about the spectra. That is a subtle point, but because the channel marginal distributions are represented as histograms (with finite range), generating synthetic background spectra very slightly damps the possible variance contained in a set of real measured spectra and limits the possibility of two key issues: (i) that real but anomalous or erroneous background spectra will be used and (ii) that real spectra with SO 2 just below 1 DU (the database threshold) will be used as a supposedly SO 2 -free background. For example, if extreme record-breaking conditions (e.g., hurricanes, droughts, etc.) appear in the database collection interval, their spectra will get caught in the database. These events will affect the covariance matrix and marginal distributions; however, they will cause far more variance in the retrieval if used as backgrounds than if they can only affect the used backgrounds as a forcing on the bin statistics. Additionally, construction of similar databases for other sensors could still proceed with fewer database entries since the statistics of the seasonlatitude-longitude bins would be expected to converge after fewer entries than were used here.
Since the channel-wise marginal distributions are generally non-Gaussian (e.g., Fig. 4), sampling the random vector Y bg with covariance matrix S is non-trivial. The general problem of sampling a correlated random vector with known non-normal marginal distributions and covariance matrix is accomplished by a transform sampling technique known as NORTA (NORmal To Anything) (Cario and Nelson, 1997). The NORTA process by which we generate samples of Y bg is detailed in Appendix A. In the above retrieval (in the pre-screening and fully probabilistic phases), the background spectrum and covariance are interpolated spatially from the collection of binned backgrounds to each CrIS FOV center by a bilinear interpolation scheme using the four nearest season-latitude-longitude cells (Appendix C).
NOAA-20 CrIS has very similar radiometric characteristics as SNPP, except that NOAA-20 FOV 7 noise is within specification and is therefore considered in this study (JPSS CrIS SDR Team, https://www.star.nesdis.noaa.gov/jpss/ documents/AMM/N20/CrIS_SDR_Validated.pdf, last access: 24 August 2020). Consequently, we use the SNPPgenerated backgrounds in SO 2 retrievals with both SNPP and NOAA-20 CrIS spectra.  (Fig. 5); however, because there is significantly greater uncertainty within the support of the height PDF, the largest value of mean plus uncertainty occurs just below the upper end of the height PDF support (Fig. 5c). This underestimate early in the Raikoke cloud history was likely due two factors, channel saturation despite the specialized strong column retrieval and the fact that the footprint and layout of CrIS FOV leaves many gaps (≈ 30 % by area) where extremal values could have been present. Because this analysis focuses on the SO 2 ν 3 band (1300-1410 cm −1 ), it is very unlikely that ash was responsible for the strong underestimation (e.g., Carboni et al., 2012). As described above, more specialized retrievals could be devised to increase the sensitivity to very strong SO 2 loading as in the Raikoke case; however, such schemes are beyond the scope of this work, which is principally concerned with advances in height information and estimating VCD values in more typical (low to moderate concentration) emissions. Within about 1 d, the ash and SO 2 were entrained into a large extratropical cyclone, which heavily distorted the dispersion of the cloud, with the SO 2 cloud being pushed to the north and dispersing in both easterly and westerly directions (Fig. 6). Early in this complex dispersion, SO 2 VCD values remained strong despite a rapid decline in eruptive output. This is most likely a result of the convergence caused by entrainment into the cyclone. Based on the probabilistic retrieval and tropopause data from the National Centers for Environmental Prediction (NCEP), it is clear that the vast majority of the SO 2 cloud mass was in the lower stratosphere, with only a small lower layer in the mid-upper troposphere which had mostly dispersed after the first week ( Fig. 6b-l). After 1 month, the SO 2 cloud had spread out over most of the Northern Hemisphere above 30 • N with most VCD values < 2 DU; however, some columns remained as strong as 20 DU. After 2 months, traces of the SO 2 cloud remained over northern Canada and Hudson Bay with all measured VCD less than 1 DU.

Test case II: early detection of SO 2 emission from
Bogoslof Volcano, Aleutian Islands, 2016 In the 2016-2017 eruptive period at Bogoslof volcano, 70 explosive events were identified (Coombs et al., 2018(Coombs et al., , 2019. The first five explosions were not detected in real time and could only be identified and characterized after reanalysis of satellite and other data sources (Coombs et al., 2019 (Coombs et al., 2018(Coombs et al., , 2019. As noted in Coombs et al. (2018), the USGS Alaska Volcano Observatory (AVO) was not able to issue a Volcanic Activity Notice (VAN) for this small event and consequently no height information was generated until the reanalyses of Schneider et al. (2020), in which a cloud height of 6.1 km was determined. The SACS near-real-time retrieval (https: //sacs.aeronomie.be/) only detected SO 2 from this cloud in two IASI FOVs, which was not sufficient to trigger an alert notification. This small pulse was observable by neither the multispectral infrared remote sensing methods nor automated analysis of multispectral signatures and cloud growth rates (Pavolonis et al., 2013(Pavolonis et al., , 2015a 2020). CrIS median heights are mainly clustered between 5-8 km with some scatter due to localized cloud edge effects (Fig. 7c, d). This is broadly consistent with the reanalysis of Schneider et al. (2020).
Of particular importance in this small cloud made up of only a few (17) FOVs, SNPP CrIS FOV 7 is significantly nosier (above specification) than that of other FOVs Han et al., 2013), and consequently the FOV 7 retrievals are highly suspect and have not been used to estimate the SO 2 height. They are included in Fig. 7d and e mainly to illustrate how an increase in instrument noise affects height information. As might be expected, increased instrument noise propagates larger uncertainty to the height PDFs and tends to distribute their centers to the lower and upper end of the range of considered altitudes. This is the signature of the central role the arg max operation plays in the retrieval. For example, if higher noise is propagated to the z-score height functions, the arg max can produce wildly different heights even for small differences in the z-score height profile.
Because the probabilistic framework allows the calculation of a mean partial VCD, we may derive a formula for the mean or expected concentration profile by similar means as those for Eq. (15): which is shown for FOVs in the detected Bogoslof cloud (Fig. 7e). This example demonstrates that the CrIS SO 2 detection and characterization scheme is sufficiently sensitive to capture some small emissions that are generally difficult to observe by other means.
3.3 Test case III: resolving strong stratification in an SO 2 plume, Sierra Negra, Galapagos Islands, 2018 On 26 June 2018, after a period of elevated seismicity, the onset of a major eruption at Sierra Negra was signaled by volcanic tremor at 19:40 UTC, producing an ash and SO 2 plume at 20:09 UTC (Carn et al., 2018;Vasconez et al., 2018;Hedelt et al., 2019). The first CrIS observation also occurred at 20:09 UTC from SNPP, detecting SO 2 above the Sierra Negra on three adjacent FOVs on the edge of scan with maximum initial z scores of approximately 9, 16, and 31. Subsequent overpasses show the plume rising to approximately 14-19 km and spreading in a complex manner due to vertical wind shear, as evidenced by the lower plume altitudes spreading towards the west and the upper plume altitudes spreading towards the southeast (Fig. 8). The significant shearing of the eruption plume enables direct observations of the cloud at many levels. Consequently, this eruption forms a good opportunity to highlight the broad sensitivity of this method in detecting and characterizing SO 2 at every elevation from the vent (1.124 m a.s.l.) up to ∼ 14-19 km and potentially higher. Additionally, this example highlights the strength of the probabilistic height retrieval, enabling the retrieval of any desired confidence interval on the height. Here we compute the 90 % confidence interval (Fig. 8), highlighting the fact that the 95th and 5th percentiles are in general not symmetric around the median or the same size at different locations within the plume. Because our method retrieves consistent statistics across all measurements, we can ensure the stability of the method and derived probabilities in particular, giving good smoothness even without post-processing. As described above, this is not necessarily the case for other height retrievals, which compute a single estimate with constant uncertainty. Figure 6. Time evolution (top to bottom) of the Raikoke SO 2 plume from NOAA-20 CrIS showing the expected-value total VCD (a, e, i), the expected-value stratospheric partial VCD (b, f, j), the probability that the SO 2 layer is in the stratosphere (c, g, k), and the median layer height (d, h, l). The height of the tropopause was calculated from daily NCEP reanalysis data.

Comparison with other data
Although a deep analysis of the differences between the present method and others is beyond the scope of the present work, here we highlight a brief representative comparison of our SO 2 retrievals with data from TROPOMI, IASI, and the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) during the evolution of the Raikoke eruption cloud. These sources of data are described briefly here. As mentioned above, CrIS is a very similar instrument to IASI (both Fourier transform Michelson interferometers). The relevant instrument differences are that IASI (aboard EUMETSAT satellites METOP-A/B, 21:30 LTAN) covers both the ν 1 and ν 3 SO 2 absorption bands (though only ν 3 is used to generate IASI heights), IASI's spectral resolution is 0.5 cm −1 (apodized spectra) compared with 0.625 cm −1 for apodized FSR CrIS, and, as mentioned above, despite IASI's slightly smaller FOV size (12 km-diameter at nadir) compared with CrIS (14 km-diameter at nadir), the greater spatial number density of CrIS FOVs gives CrIS a higher resolution than IASI by virtue of a smaller average sampling distance. For comparison with the set of NOAA-20 CrIS overpasses of the Raikoke cloud in Fig. 9 (24 June 2019, 22:42 UTC, to 25 June 2019, 02:36 UTC), the nearest IASI height data were collected from IASI instrument aboard METOP-B between 00:00 and 11:59 UTC on 25 June 2019. Although some of the data in this interval are highly asynchronous, the cloud did not experience major changes in this time, and the heights (and to a lesser extent VCD) can be compared. As described in Table 1, IASI heights are generated by the Clarisse et al. (2014) retrieval and the VCDs are computed by a related nonlinear technique using the retrieved heights (Clarisse et al., 2012). In the framework presented here, this VCD is the conditional VCD sampled at the IASI-retrieved height.
TROPOMI is a UV spectrometer operating aboard the TROPOMI the Copernicus Sentinel-5 Precursor (S5P) satellite, which orbits only 3.5 min behind SNPP CrIS (Veefkind et al., 2012). TROPOMI represents a significant advance in monitoring of SO 2 and other atmospheric constituents due to its very high spatial resolution (7×3.5 km 2 pixels at nadir) increasing the sensitivity and signal to noise ratio in a given region by a factor of 3 over OMI and OPMS (Theys et al., 2017(Theys et al., , 2019Fioletov et al., 2020). The TROPOMI SO 2 data used here are generated from back-scattered UV radiances with the S5P operational processing algorithm (a differential optical absorption spectroscopy, DOAS, technique) (Theys et al., 2017). Because most UV-based techniques are not able to retrieve height information, UV-based VCD data are typically presented as conditional VCD for several heights. The TROPOMI SO 2 conditional VCDs are given for assumed SO 2 plume altitudes of 1, 7, and 15 km. As the vast major-ity of the Raikoke cloud was in the stratosphere, we use the 15 km data for direct comparison with CrIS (Fig. 10). Obviously such high-resolution data are difficult to compare directly with CrIS, so it was first resampled to the CrIS FOV footprints by constructing weighted averages of TROPOMI pixel data with weights determined by the fraction of intersecting area between the pixels and each elliptical CrIS FOV, as in Sun et al. (2018). Unfortunately, since SNPP CrIS was experiencing a major outage during the Raikoke cloud's evolution due to an electrical fault, only NOAA-20 CrIS was able to measure the cloud, and consequently the NOAA-20 CrIS-TROPOMI comparison is asynchronous by about 50-55 min instead of the 3.5-5 min that would have been possible if SNPP CrIS was functioning.
The last source of comparison data is 532 nm backscattered lidar measurements from the CALIOP overpass of the Raikoke cloud between 14:32 and 14:36 UTC on 25 June 2019 (Fig. 11). CALIOP profiles aerosols, clouds, and other features between the ground and lower stratosphere (Winker et al., 2009). Although it cannot detect molecular SO 2 , it can detect volcanic ash and sulfate aerosols, which are a pho- tochemical product of SO 2 in the atmosphere. As is typical of volcanic SO 2 studies, highly attenuating CALIOP aerosol layers (especially in the stratosphere) are considered here as a proxy for the presence of SO 2 Carboni et al., 2016, e.g.,).
As mentioned above, our strongest total VCD measurement from the Raikoke cloud was 432 DU. This is significantly lower than the maximum detected by TROPOMI (> 900 DU, Hedelt et al., 2019) and several other UV-based methods (∼ 1000 DU, Simon Carn, personal communication, 2019). This suggests that our method, despite the integrated height estimate and the specialized retrieval for strong SO 2 loading, currently cannot fully capture these extremely high VCD values; however, away from these extreme values, our retrieval performs well in comparison to . Other than the relatively few columns with unusually large VCD values, the largest discrepancy between the TROPOMI-retrieved Raikoke cloud and that from CrIS is that the CrIS retrieval does not fully resolve the long, narrow, and diffuse east-west-running cloud to the south of the main cloud, although the CrIS retrieval does resolve some similar features elsewhere in the cloud. This may be due to the very high spatial resolution and sensitivity of the TROPOMI data compared with the coarse CrIS resolution that also contains ≈ 30 % gaps. An additional contributing factor is that the CrIS retrieval starts with an initial detection of the z score for pre-screening (Fig. 9j) and only retrieves the height and VCD for FOVs with z > 5. Traces of this narrow, diffuse cloud are present in the initial z-score field, although it mostly presents with a z score below the detection threshold but slightly above the background noise. Because the western part of this cloud is low altitude, the weak signal in CrIS and IASI but good detection by TROPOMI is evidence that the IR sensors were limited somewhat by interference with larger quantities of water vapor. Overall, it is likely that all of these factors played a role. A more detailed view of the comparison between CrIS VCDs and TROPOMI conditional VCDs at 15 km altitude is shown for two regions of the Raikoke cloud in Fig. 10. As described above, these measurements are asynchronous by about 50-55 min, which is evident in Fig. 10a, b, f, g, where the largest errors between CrIS and the CrIS-resolution TROPOMI occur at the cloud edges (as well as internally in areas with more complex motion) where it had clearly moved between these observations. These two regions were selected because they both contain VCDs spanning several orders of magnitude but different ranges. The northwestern region (dashed red, Figs. 9, 10a-e) contains VCD values generally up to about 50 DU whereas the southeastern region (dashed blue, Figs. 9, 10f-j) contains VCD values as high as 230 DU. As can be seen from the histograms of native resolution and CrIS-resolution TROPOMI, the lower spatial resolution (and gaps) of CrIS is at least partially responsible for its inability to resolve some of the highest VCD values measured by TROPOMI in native resolution. In regions of the cloud with generally lower concentrations, CrIS and TROPOMI compare well, scaling approximately one-to-one, with much of the noise being generated by the asynchrony. Although not a perfect comparison, the rank-order correlation is also shown for the CrIS-resolution comparison and a coarse, 100 km pixel aggregate. Taken together, the FOVby-FOV and 100 km pixel comparisons (both exact matching and rank-order) demonstrate that CrIS matches TROPOMI VCDs moderately well for these low to moderate concentration clouds. In higher concentration clouds (Fig. 10f-j), the comparison has a similar noise profile; however, CrIS consistently underestimates TROPOMI by about 75 % (−0.25 rel-ative error, Fig. 10f, g). Based on the histograms and correlation of CrIS and CrIS-resolution TROPOMI, there is a similar distribution of VCDs below about 50 DU that becomes divergent by about 100 DU. In all regions of the Raikoke cloud, the sensitivity of the fully probabilistic retrieval is limited to VCDs greater than about 0.3-0.5 DU.
The main focus and strength of our approach is the ability to generate physics-based PDFs for the height. Because our retrieval is based on the operational algorithm in use for IASI, our retrieved heights are very similar to those from IASI, although there are key differences readily apparent in Fig. 9f-i. As mentioned above, the IASI heights represent the height retrieved due to the mean background spectrum; however, because the retrieval of height is not linear (due to the arg max operation), the retrieved height is not the expected value height. Inspection of the PDFs generated by this approach show that they are typically non-symmetric, although exact comparison is not possible due the orbital separation between the satellites carrying IASI (METOP-A,B) and those carrying CrIS (SNPP,. At least for the Raikoke cloud, the IASI heights are almost entirely bound within the CrIS 90 % confidence interval (Fig. 9f, g, i) and are similar to the CrIS median heights (Fig. 9h) in most areas. For the snapshot of the Raikoke cloud shown in Fig. 9, the largest differences in height appear at the southernmost part of the cloud, where the IASI heights are > 19 km in altitude over a significant region. Furthermore, the IASI height estimate (Fig. 9f) varies significantly over nearby continuous parts of the cloud, whereas the CrIS median height is more consistent across space with some minor variation due to noise. Although not shown here, Hedelt et al. (2019) have recently developed a new SO 2 height retrieval for TROPOMI using inverse learning machines. Although computationally expensive to train, such an approach has the advantage of computation speed of the inversion once deployed, though it has not yet been incorporated into the TROPOMI SO 2 data product as of this writing, and thus a direct comparison is not possible. However, it is clear from the data presented in Hedelt et al. (2019) that their height estimates for the Raikoke cloud span the CrIS 90 % confidence interval, though they are significantly nosier than CrIS SO 2 height estimates and display a very prominent negative trend in height versus VCD, leading to systematically higher layer heights on the cloud edges than in the cloud centers (Fig. 14 of Hedelt et al., 2019). CrIS PDFs very rarely show a similar trend.
Because we retrieve a PDF on each CrIS FOV rather than a single estimate, we can compare the PDFs directly to data from a CALIOP overpass of the cloud. Here we show an example comparison from Raikoke; however, a full comparison for every overpass of the Raikoke cloud is the subject of future work. For the first several days after the eruption, there was still significant ash suspended in the dispersing cloud, leading to the appearance of several highly attenuating layers in CALIOP data between 10-15 km (Fig. 11a, b). The comparison we focus on is between CrIS retrievals from 14:18:00 to 14:24:00 UTC and CALIOP data from a subsequent overpass between 14:32:06 and 14:36:14 UTC on 25 June 2019. To prevent the mixing of nearby height PDFs, the CrIS data are interpolated to the CALIOP track by nearest-neighbor interpolation, creating a profile of the nearest CrIS SO 2 height PDFs. The nearest-neighbor interpolation over all of space is shown in Fig. 11e with 16 km pixels (the average CrIS sampling distance).
Overall, there is good agreement between the CrIS SO 2 height PDFs and the altitudes of the strongly attenuating CALIOP layers; however, the CrIS PDFs have several important characteristics that complicate comparison. There is some minor noise derived mainly from two anomalous CrIS FOVs at the cloud edge (≈ 62 • N, 175 • W) exactly at the CALIOP track, leading to anomalously high altitudes there (Fig. 11e). Most interestingly, some regions of the cloud have bimodal PDFs (Fig. 11b, c, south of 60 • N). Preliminary investigations of this retrieval suggest that such occurrences are not exceedingly rare throughout this and other clouds; however, such PDFs occur widely and tend to persist over moderate distances (as in Fig. 11b, c).
In the strictest sense, such PDFs can only be attributed to the presence of similar statistical features in the background. Specifically, if the background spectrum probability space is dominated by two sets of meteorological conditions (for example, one mode representing deep convective cloud radiances and another for cloud-free radiances), then multiple populations of the Monte Carlo height samples may accumulate, leading to a multimodal height PDF.
Relaxing this strict interpretation, there is some evidence that these bimodal PDFs may represent the presence of SO 2 at multiple altitudes. In this case, the strongest CrIS probabilities occur in a lower layer around 12 km altitude (Fig. 11b,  c) suggesting the presence of molecular SO 2 layer; however, there is no attenuating CALIOP layer there. Additionally, the CrIS retrieval assigns significant (but less) probability mass to a higher level colocated with a strongly attenuating CALIOP layer at about 15 km. Since these data are very early in the cloud's evolution (< 5 d), this layer is likely dominated by volcanic ash rather than being dominated by sulfate aerosols, having not had enough time to convert large portions of the erupted SO 2 . Of note, because the upper CALIOP layer is very strongly attenuating, it may completely shadow any evidence of lower diffuse ash clouds if they existed. Considering that ash minimally affects the SO 2 ν 3 band, this colocation at 15 km altitude suggests that this strongly attenuating layer (likely mainly ash) contains SO 2 . The fact that the stronger of the two probabilities is in the lower layer combined with the CALIOP-CrIS colocation could be interpreted as evidence that there is SO 2 at both layers. Although this is a possibility, confirmation of such a configuration would require a deeper analysis including forward and inverse trajectory modeling with advanced data assimilation techniques and therefore is well beyond the scope of the present work.

Long-term analysis of the Raikoke SO 2 cloud
By retrieving PDFs for height and partial VCD it is possible to enhance time series analysis of SO 2 clouds accordingly, enabling the generation of time series with quantified uncertainty. As an example, we calculate the total and stratospheric SO 2 mass time series probabilistically as sums of many independent retrievals (Fig. 12).
To estimate the mass of an SO 2 cloud, the values of the retrieval on the CrIS FOVs must be interpolated onto a continuous grid spanning the cloud. In the present study we use an equal-area grid and perform nearest-neighbor interpolation of the CrIS FOV retrievals. We calculate the mass (with uncertainty) as in Appendix E for both the total SO 2 mass in the atmosphere and in the stratosphere only. The stratospheric partial VCD valuesX(h tropopause ) were calculated from Eqs. (17a) and (17b) using daily NCEP/NCAR tropopause pressure level and geopotential height reanalysis data (Kalnay et al., 1996). By exploiting the fact that the mass is a normal random variable at each time, we can from these results estimate the daily "instantaneous" apparent e-folding time of the SO 2 as τ = −M(t)/Ṁ(t) withṀ(t) calculated by finite difference and the PDFs computed by standard results in probability theory (Fieller, 1932;Hinkley, 1969;DeGroot and Schervish, 2012, Appendix E). The e-folding times shown here are really apparent measurements since mass is lost from the cloud not only due to photochemical conversion of SO 2 to sulfate aerosols but also due to the dispersion and dilution of the cloud below levels that can be detected. We include this time series because it illustrates some interesting aspects of the Raikoke cloud's evolution and because it illustrates a logical extension of how uncertainty is propagated through this work.
From Fig. 12 it is clear that the SO 2 cloud, as characterized by CrIS, did not immediately show the exponential mass decay that has been used in similar studies of large eruptions (e.g., Read et al., 1993;Carn et al., 2017;Krotkov et al., 2010). The observed trend is likely a combination of retrieval limitations and genuine atmospheric chemical properties. As described above, the CrIS VCD underestimate early in the Raikoke cloud history was likely due to channel saturation despite the specialized strong column retrieval. The strongest CrIS VCDs were only approximately 50 % of the strongest columns reported at the time; 432 DU from CrIS versus > 900 DU  and > 1000 DU (Simon Carn, personal communication, 2019). In aggregate, CrIS total VCD were approximately 75 % of those of TROPOMI in the most concentrated regions (Fig. 10). This underestimate was propagated to the CrIS maximum total mass estimate of approximately 1.1 Tg SO 2 (Fig. 12) compared with preliminary estimates from other sources (1.4-1.5 Tg; Sennert, 2019;Global Volcanism Program;Simon Carn, personal communication, 2019). Very large VCD values persisted for many days, though after approximately 10 d most of the cloud had diluted sufficiently that CrIS and preliminary TROPOMI data were in good agreement between 10 and 20 d from the eruption, both decreasing from about 1.0 to 0.6 Tg ( Fig. 12; https://so2.gsfc.nasa.gov/pix/special/2019/raikoke/raikoke_ tropomi_so2stl_0621-07062019.html, NASA/GSFC, 2019).
Despite this early underestimation and the fact that the SO 2 injection is more or less instantaneous compared with the time span of cloud detectability, CrIS total mass (and, to a lesser extent, preliminary TROPOMI total mass as well) did not begin the anticipated exponential decay until approximately 15-20 d after SO 2 injection. We posit that this was mainly due to CrIS channel saturation and highly irregular dispersal by extratropical cyclonic winds affecting the total CrIS-measurable mass and cloud dilution, respectively. However, the presence of this effect in preliminary TROPOMI suggests that at least some contribution may have derived from time-dependent SO 2 to sulfate conversion kinetics. As has been invoked in previous studies, this could have been the result of limited hydroxyl radical (OH) early in the cloud history (e.g., Theys et al., 2013;Sekiya et al., 2016), although detailed chemical modeling to support this is beyond the scope of this work. As the e-folding times shown in Fig. 12c and d are calculated by finite difference, they are quite noisy; however, they exhibit the same trend of slow decay early in the cloud history even after CrIS and TROPOMI masses become similar. After about 35 d, the apparent e-folding time for the total VCD settles to a more or less constant value (∼ 10 d e-folding time). The apparent stratospheric e-folding time exhibits a very different pattern, remaining approximately constant (∼ 10 d e-folding time) until about day 30, when it begins to shorten. The large uncertainty in the stratospheric e-folding time after approximately 30 d is the result of the increasing relative stratospheric mass uncertainty and the particular details of the PDF calculation (Hinkley, 1969). Because SO 2 is typically assumed to have a long lifetime in the stratosphere, this is likely the result of dilution below the detection threshold over large low-concentration regions of the cloud, although some portion of the SO 2 re-entering the troposphere as it extends to lower latitudes (where the tropopause is higher) may also play a role considering that the total column does not show this effect as significantly.

Conclusions
i. New probabilistic enhancement of existing hyperspectral IR SO 2 retrieval techniques enables the retrieval of PDFs for SO 2 height and partial VCD, providing significant statistical power, precision, and consistency to global SO 2 detection, tracking, and characterization efforts. Retrieving these PDFs enables the calculation of many new quantities, including exceedance probabilities for concentration, layer height constraint probabilities, mean concentration profiles, and mass at different layers with uncertainty. Although these capabilities are primarily beneficial for operational SO 2 monitoring, these methods are relevant to climatological studies because of the ability to the stratospheric fraction of total mass for a given SO 2 cloud if tropopause heights are available from ancillary sources.
ii. This technique is capable of resolving larger VCD values than would be anticipated for a linearized approach due to two factors: (i) the use of the height PDF increases the retrieved total VCD compared with a single height estimate and (ii) the use of a specialized channel subset retrieval that improves the linear approximation when the signal is certain to be dominated by SO 2 . However, these techniques are still limited in their ability to resolve extreme VCD values (many hundreds of Dobson units) like some of those observed in the recent (2019) eruption of Raikoke Volcano, Kuril Islands. Because of the improved spatial resolution over IASI and the technique's sensitivity, we can resolve heights for small clouds that cannot be resolved well by IASI. Additionally, the technique can adequately resolve height information across a broad range of plume altitudes, including those in the lower stratosphere and close to the surface (though with more limited detection).
iii. Preliminary comparisons suggest that this method generally compares well with other measurements of SO 2 VCD and altitude; however, the probabilistic framework adds significant value, especially in the retrieval of height information. Cross sections through these probability clouds compare very well with cloud heights from CALIOP lidar backscatter.
iv. As a logical extension of the probabilistic framework, this technique enables the characterization of SO 2 clouds for long-term probabilistic analyses of cloud evolution and key time-varying parameters such as total mass, stratospheric mass, and apparent e-folding time.
v. The algorithms presented here are currently being integrated into VOLCAT, where they will be used for operational SO 2 cloud detection, characterization, and tracking in support of aviation safety. We anticipate future work to include more comprehensive comparison of height PDFs with CALIOP lidar backscatter data, application of these techniques to similar instruments including IASI and the Atmospheric Infrared Sounder (AIRS), and the development of volcanic degassing and aviation-focused products. The general problem of sampling a correlated random vector (Y i ∈ Y ∈ R N ) with non-normal marginal distributions F Y i and covariance matrix S is accomplished by a transform sampling technique known as NORTA (NORmal To Anything) (Cario and Nelson, 1997), in which the desired random vector is written as a component-wise inverse marginal transform of a standard normal random vector: where is the univariate standard normal cumulative density function (CDF). In this formulation, the individual marginals can be used to transform the components of the standard normal vector. The crux of this method lies in generating a standard normal random vector Z with appropriate correlation structure ρ Z , which upon component-wise transformation, as above, will produce the desired random vector Y with the desired covariance structure S. That is, a correlated Z must be generated with the correlation matrix which, after transformation, results in the correlation matrix ρ Y = Corr(Y ) associated with the known covariance matrix S. This is accomplished by solving for each of the unique N (N − 1)/2 correlations (ρ Z (i, j )) in the lower triangle of ρ Z . The correlation transformation function is and where ϕ(z i , z j ; ρ Z (i, j )) is the bivariate standard normal density function with correlation ρ Z (i, j ) between the variables Z i and Z j . These problems may be inverted individually by various methods outlined in the original formulation of Cario and Nelson (1997).

A1 Application of NORTA to background spectrum
For the purpose of generating the correlated random background spectrum in the present study, we make several modifications to the classical method of Cario and Nelson (1997) to make our problem tractable but retain the high fidelity of the CrIS measurements. In the present study, 177 CrIS channels are used, representing the FSR mid-wave band between 1300-1410 cm −1 . This yields 15 576 independent correlations -matching inverse problems that are solved in the present study by gradient descent iteration. As in the main text, we first collect a database of SO 2 -free CrIS spectra for each seasonal 5 • × 5 • bin and compute the 1300-1410 cm −1 background covariance matrix S, as well as the channelwise marginal distributions, as a sequence of 177 histograms. Then we perform the NORTA process to generate 10 000 samples of Y bg for each season-latitude-longitude bin.

A2 Numerical integration of the joint expectation
Because each of the correlations matching inverse problems requires multiple rounds of numerical integration of Eq. (A5), we make several modifications to increase computational efficiency. Because many of the channel correlations are strong, the bivariate standard normal distribution ϕ(z i , z j ; ρ Z (i, j )) for each such pair of channels is very narrow. Consequently, for a typical rectangular sampling domain, the integrand of Eq. (A5) is approximately zero almost everywhere except for a concentrated region in which accurate numerical integration is challenging. To reduce wasted integrand samples, we make a standard transformation of the bivariate normal distribution and then cast the domain in polar coordinates and approximate the integral over a finite radial domain [0, R]: is the product of the transformed ith and j th components of the desired random vector. This approximates the full improper integral within tolerance , with the fixed value = 10 −6 in the present study. The terminal radius is estimated conservatively as where the maximum and minimum values of the components were recorded during the generation of S and the marginal distributions from sample data. In the absence of knowledge of these minima and maxima, they could be estimated from the marginal distributions as fixed percentiles of these distributions.
The estimated radial limit of integration R is derived by requiring that This inequality can be solved for R under the conservative estimation that maxG ij = maxY i maxY j and minG ij = minY i minY j which are derived from the maximum and minimum brightness temperatures on each channel from among all background spectra in the database. This leads to and, upon integrating, R may be chosen conservatively to satisfy This transformation to a scaled polar coordinates ensures that the curvature and gradients in the integrand are as small as possible, ensuring that numerical integration is as accurate as possible for a given domain sampling up to the order of the method employed. Additionally, the sensible radial limitation of this integral ensures that it can be computed efficiently to within tolerance without the inclusion of samples which minimally affect the total (approximately Gaussian tails).

A3 Numerical solution of inverse problems
Although each channel pair inverse problem can be solved separately by Newton's method or other algorithms, we solve all of the problems jointly, restating the problem as a gradient and descent minimization of the total square correlation error where the vectors r Z , r Y ∈ R 15,576 are comprised of the unique lower triangular elements of ρ Z and ρ Y , respectively, and C : R 15,576 → R 15,576 is the correlation transformation function cast as a vector function. Although this seems intractable due to the extreme dimensionality, there is no correlation between these dimensions (no correlation between pairwise inverse problems), and consequently the Jacobian is zero everywhere except for its main diagonal. The gradient descent method of Barzilai and Borwein (1988) produces fast convergence to a global minimum due to the monotonicity and bounding properties of the correlation functions C ij as described by Cario and Nelson (1997). At each iteration, the NORTA process is completed and the error in the synthesized channel marginal distributions and spectral covariance matrix are used as convergence criteria. The vectorization of these independent problems allows for standardization in convergence criteria using global (total) error minimization. Because the error is measured on the final product (the desired random spectra), a minimum number of iterations is required by comparison with performing each problem separately and then generating the desired random spectra.
Appendix B: Derivation of VCD mean, variance, and covariance formulae As in the text, SO 2 is assumed to exist in a narrow, 1 km thick layer, represented by a box profile: where L is 1 km. For the purposes of simplicity in computation, we consider a limiting case of a very thin layer (L → 0), where the finite box profile converges in distribution to the Dirac delta: The calculation of the mean and variance below makes extensive use of Fubini's theorem, allowing reordering of iterated integration. Because these integrals contain the Dirac delta, it is not simple to show that the conditions of Fubini's theorem are satisfied due to the Lebesgue non-integrability of the Dirac delta, which in turn stems from the fact that the Dirac delta is not a proper function but is a distribution instead. However, we proceed assuming that the iterated integrals can be interchanged. We remark that a proof using limits of functions ("nascent delta functions") approaching the Dirac delta could be substituted here at great cost to simplicity and readability. For a finite thickness layer (as assumed in the main text), all convolutions with the Dirac delta below would be replaced with convolutions with a boxcar function (a finite nascent delta). For the 1 km thick box profile above, the results of convolution will be smoothed, 1 km running-average versions of the desired functions. The final integral formulas would be the same except that the integrands would first be smoothed by a 1 km running average. In a discrete setting with 1 km sampling of the retrieved height PDF, the assumption of a 1 km thick box profile yields the same results as using the Dirac delta. The mean partial VCD is calculated as the expectation where H and X are the domains of height and VCD (both σ finite measure spaces) and the last equality follows from the definition of the conditional PDF. Rearranging this iterated integral gives Because of the symmetry of the delta function (δ(η − h ) = δ(h − η)), the integral properties of the delta function yields Because the algebraic form of the variance is Var(X(h)) = E[X 2 (h)] − [E(X(h))] 2 , only the second moment of the partial VCD E[X 2 (h)] must be calculated to complete the formula. This quantity is Since the dummy variable η always runs between 0 < η < h, the delta function δ(η 0 −η) is always centered in the interval 0 < η 0 < h and the integral properties of the delta function can be applied again: Substitution of E(X 2 |H = η) = Var(X|H = η) + [E(X|H = η)] 2 into the formula for E[X 2 (h)] and subsequently into the formula above for Var(X(h)) completes the variance.
Similarly, for the covariance between partial VCD at two altitudes h = a and h = b with b ≥ a, only the mixed expectation E[X(b)X(a)] must be calculated: it follows that η a ∈ H, η b ∈ H and 0 < η a < η b , so that f H (η a ) E X 2 |H = η a dη a = E X 2 (a) .
Using the algebraic formula for the variance and covariance, we attain by substitution Cov X (a),X(b) = Var X (a) which is always less than Var[X(a)] unless a = b since the partial VCD is cumulative, yielding larger values at higher altitudes.

Remark
Of particular importance is that these formulae may be used to calculate the expectation and variance values of the partial VCD between two altitudes. The expected value is and the variance is

Appendix C: Bilinear interpolation of background spectrum
To smooth the changes between retrievals in adjacent background cells, we use a bilinear interpolation of the background spectra. For a general quantity (Q), the bilinear interpolation is represented as Q(x, y) = c x c y Q (x 0 , y 0 ) + (1 − c x ) c y Q (x 1 , y 0 ) + c x 1 − c y Q (x 0 , y 1 ) between the corner points (x 0 , y 0 ), (x 1 , y 0 ), (x 0 , y 1 ), and (x 1 , y 1 ), using the scalings c x = (x 1 − x)/(x 1 − x 0 ) and c y = (y 1 − y)/(y 1 − y 0 ). We interpolate the inverse error covariance matrix S −1 by this formula. However, because our background spectrum is characterized probabilistically as a set of N = 10 000 samples (y s bg ∈ Y bg ) of a correlated random vector Y bg in each seasonal 5 • × 5 • background grid cell, the samples cannot be interpolated directly by the above formula. Instead we treat Y bg = Y bg (X, Y ) as a function of a random position where (X, Y ) is a discrete random position, taking only the cell corner points as possible values. In this case, X and Y represent longitude and latitude. In particular, we characterize (X, Y ) by the probability mass function p(x i , y j ) = (−1) i+j (i −c x )(j −c y ) for i, j ∈ {0, 1}, which is simply the corner point weights in the bilinear interpolation formula. Consequently, we generate Y bg (x, y) = E (X,Y ) [Y bg (X, Y )] = i,j Y bg (x i , y j )p(x i , y j ) by sampling the discrete distribution p(x i , y j ) to generate the number of samples taken from each of the corner points n(x i , y j ) = [Np(x i , y j )], where the bracket represents rounding to the nearest integer. Using this sampling, we generate the samples of Y bg (x, y) as the collection of each of the n(x i , y j ) samples from the corners Y bg (x i , y j ). This generates a total of N samples for the interpolated background spectrum.
where the mean is and variance is where each covariance is zero because each measurement is independent. Because eachṀ t is normally distributed, this sequence of means and variances fully parameterizes its random process. To calculate k t , we also must calculate Cov(M t ,Ṁ t ): where the last two equalities follow from the independence of each M t and the fact that the central coefficient (δ t ) in any central finite difference for the first derivative is zero. For non-central differences, this is not zero, the last equality does not hold, and Cov(M t ,Ṁ t ) = 1 t δ t Var(M t ).
With random processes for the mass and mass rate of change calculated we can calculate the decay rate coefficient as a function of these two random processes: which is a ratio of two Gaussian random processes which may be correlated depending on the finite differencing scheme. The calculation of such a ratio of random variables (Fieller, 1932;Hinkley, 1969) describes the uncertainty of the decay coefficient at each time as a PDF f k t (k).
Calculating the PDF for the instantaneous e-folding time τ t = k −1 t is performed by applying standard rules for functions of random variables (from DeGroot and Schervish, 2012): Notably, the distributions are not Gaussian either the decay rate coefficient or the e-folding time .