Retrieval algorithm for the column CO2 mixing ratio from pulsed multi-wavelength lidar measurements

Abstract. The retrieval algorithm for CO2 column mixing ratio from
measurements of a pulsed multi-wavelength integrated path differential
absorption (IPDA) lidar is described. The lidar samples the shape of the
1572.33 nm CO2 absorption line at multiple wavelengths. The algorithm
uses a least-squares fit between the CO2 line shape computed from a
layered atmosphere model and that sampled by the lidar. In addition to the
column-average CO2 dry-air mole fraction (XCO2), several other
parameters are also solved simultaneously from the fit. These include the
Doppler shift at the received laser signal wavelength, the product of the
surface reflectivity and atmospheric transmission, and a linear trend in the
lidar receiver's spectral response. The algorithm can also be used to solve
for the average water vapor mixing ratio, which produces a secondary
absorption in the wings of the CO2 absorption line under humid
conditions. The least-squares fit is linearized about the expected XCO2
value, which allows the use of a standard linear least-squares fitting method
and software tools. The standard deviation of the retrieved XCO2 is
obtained from the covariance matrix of the fit. The averaging kernel is also
provided similarly to that used for passive trace-gas column measurements.
Examples are presented of using the algorithm to retrieve XCO2 from
measurements of the NASA Goddard airborne CO2 Sounder lidar that were
made at constant altitude and during spiral-down profile maneuvers.


offset frequency is step-tuned between laser pulses to scan across the CO2 absorption line. The number of laser wavelengths in the scan and the exact wavelength of each laser pulse are preprogrammed and can be adjusted via software commands (Numata et al., 2012). An electro-optical modulator is used to gate the output of the slave laser into 1-µsec wide pulses. The 65 laser pulses are then amplified by a multi-stage fiber laser amplifier. The airborne lidar's laser pulse rate is 10 KHz and there are 30 wavelengths per scan, which gives a wavelength scan rate of about 300 Hz. The transmitted laser pulse energies at all wavelengths are also sampled and the results are used to normalize the received signal to correct for the effects of laser power fluctuations.

Lidar Signal Processing and Atmosphere Modeling
The initial processing of the lidar data consists of (a) processing the stored lidar data to form a series of atmosphere transmission measurements across the CO2 absorption line; (b) generating a CO2 absorption line shape from the radiative transfer model and mereological data at the time and location of lidar measurements; and (c) performing a least-squares fit of 80 the modeled line shape function to the measurements to solve for XCO2, as shown in Fig. 3.

Lidar Signal Processing
The signal waveforms are first corrected for the detector baseline offset and other instrument characteristics and then scaled to the received optical signal power. The pulse energies from the scattering surfaces are calculated by integrating the received pulse waveforms over the pulse width interval. The relative atmosphere transmittances for all laser wavelengths are calculated by dividing the received pulse energies by the transmitted ones and then multiplying by the square of the range from 90 the lidar to the reflecting surface. The signal to noise ratio (SNR) of the atmospheric transmittances at each wavelength is estimated based on the received signal energy, the estimated background noise, and the inherent instrument noise. Finally, a least-squares fit of the modeled line shape to the lidar measurements is used to estimate XCO2 along with the other parameters.
The lidar returns from clouds are identified by comparing the target elevation from the lidar measurements, namely, aircraft altitude minus the lidar range, to the surface elevation from either the on-board radar measurements or from a Digital 95 Elevation model (DEM). For dense clouds, the signal pulse energies from the clouds are sufficient for XCO2 retrieval to the cloud top . For thin clouds and aerosols, the laser pulses can often reach the ground surface and with sufficient SNR for XCO2 retrievals to the surface. The signal waveform prior to the ground return can be averaged to obtain the atmospheric backscatter profiles at the laser wavelength which gives information about the height and density of the clouds and aerosols . 100

Model for the Lidar Signals
The average signal pulse energy reflected from the surface can be calculated from the lidar equation (McManamon, 2019), as where ! ( ) and " ( ) are the received and transmitted laser pulse energies; 105 is the laser wavelength; # ( ) is the one-way atmosphere transmission as a function of the laser wavelength; ' is the diffuse surface reflectance in nadir direction; ! is the collecting area of the receiver telescope; is the range from the lidar to the scattering surface (the column height); 110 ! is the receiver optical transmission efficiency.
The product of the surface reflectance and the two-way atmospheric transmission can be calculated from the received laser pulse energy after correcting for the range, as (2) 115 Here ( is a constant which is a function of the lidar instrument parameters and can be calibrated independently.

Model for the CO2 Absorption Line Shape
The total atmospheric transmission to the surface can be written as The first two terms, ./$ $ ( ) and 0 $ ( ) are the two-way transmissions due to CO2 and water vapor which are functions of the 120 laser wavelength . The last term, 1 $ , accounts for the transmission of aerosols and other particles, which are independent of the laser wavelength. 1 $ is often called the off-line atmospheric transmission.
To model the total atmosphere transmissions due to CO2 and water vapor, the atmosphere is divided into a number of layers. The total transmission is modeled as the product of the transmissions of all the layers traveled by the laser pulse. The layered transmission is calculated from the layered radiative transfer atmospheric model, which takes into account of the effects 125 of the temperature, pressure, and humidity at the altitude of the layer. The vertical profiles of temperature, pressure and humidity are obtained from a meteorological analysis model or from other atmospheric data along with an a priori vertical profile of CO2 mixing ratios.
The atmospheric transmissions of each layer for each of the lidar wavelength across the CO2 absorption line are calculated using the Beer-Lambert law. The total two-way transmission due to CO2 can be written as 130 where = 1,2, . . . , ( is the index for the laser wavelengths with ( the total number of laser wavelengths used in the lidar measurements; = 1,2, … , $ is the index for the atmosphere layer with $ the total number atmosphere layers; 135 ./$ 7 3 9 is the CO2 molecular density for the j th layer; 3 is the average altitude of the j th layer; ./$ 7 3 , 2 9 is the absorption cross-section of a CO2 molecule in the j th layer at the i th wavelength.
For each laser wavelength the optical transmissions of CO2 for each layer are calculated from their optical depth (OD), which is defined as the absolute value of the logarithm of the one-way atmospheric transmission. The total column OD is then calculated by summing the layered ODs over the column height of the lidar measurement. The two-way optical 145 ( 2 )]. In our XCO2 retrieval the layered OD is calculated by using the spectroscopy database HITRAN 2008 (Rothman et al., 2009)  V12.1, (Clough et al., 1992;Clough and Iacono, 1995), for the a priori CO2 mixing ratio and meteorological vertical profiles at the time and location of the lidar measurement.
The atmospheric pressure, temperature and water vapor profiles influence line broadening which affects the cross 150 section of the CO2 molecule. The LBLRTM software incorporates these effects and computes a numerical line shape function in OD at the given altitude of each layer. For the airborne data retrievals, meteorological data are obtained from the near realtime forward processing of the Goddard Modeling and Assimilation Office, (GMAO) FP system, the Goddard Earth Observing System Model, Version 5 (GEOS-5) (Rienecker et al., 2008). The data are drawn from the 8-per-day analyzed fields on the full model grid (0.25 x 0.3125 deg x 72 layers, inst3_3d_asm_Nv files). The GEOS-5 data are used for the meteorological 155 conditions for the retrievals for the times and places where the mission-directed atmospheric measurements are not available.
In measurement campaigns this occurred for most of the flights except for those in spiral maneuvers. We extract the nearestin-time, latitude-longitude interpolated meteorological soundings every minute at regular positions along the flight's ground tracks. The 42 lowest analysis levels are used for each profile location. Analyzed pressure is used for the vertical grid coordinate for any of the profiles. The surface pressure and surface height are horizontally interpolated from the model. 160 Ideally, we would want to retrieve the CO2 mixing ratio at each layer, as in conventional passive trace gas sounding.
In practice, the average laser power is limited and there can only be a limited number of laser wavelength samples across the CO2 absorption line while still maintaining adequate SNR at each wavelength. Although a few more parameters about the CO2 absorption line shape can be retrieved , they are not sufficient to provide the full vertical profile of CO2 mixing ratio. Therefore, we choose to solve for a scale factor for the modeled CO2 line shape function that minimizes the 165 error between the modeled and the lidar sampled CO2 absorption line shape at all laser wavelengths. The retrieved average XCO2 is equal to the product of this scale factor and the a priori CO2 mixing ratio. This method has been shown to work well in simulations and for retrievals from our airborne CO2 Sounder lidar measurements (Kawa et al., 2016;Abshire et al., 2014Abshire et al., , 2018.
Using the scale factor, the OD which is attributable to the CO2 absorption can be written as 170 where ./$ is the scale factor, 2 MMMMMM 3 is the a priori CO2 mixing ratio, and MMMM ./$ ( 2 ) is the a priori OD attributed to CO2 absorption. The atmospheric transmission due to CO2 absorption can now be approximated as

Solving for XCO2 from the Lidar Measurements via a Least-Squares Fit 175
The column XCO2 and several other variables are solved simultaneously from a least-squares fit of the modeled line shape to the lidar measurements. One of them is the Doppler shift in the wavelengths of the received signal, which occurs when measuring at non-nadir angles or to a sloped surface. Another parameter being solved is the product of the surface reflectance and the two-way off-line atmospheric transmission. For the CO2 line at 1572.33 nm and under high humidity, there is a weak https://doi.org/10.5194/amt-2020-466 Preprint. Discussion started: 22 December 2020 c Author(s) 2020. CC BY 4.0 License.
water vapor absorption feature on the left wing of the CO2 absorption line. The retrieval algorithm can solve this absorption 180 feature to avoid compensating it and causing biases in the retrieved XCO2. For our airborne lidar, there is a small linear trend (slope) in the received laser pulse energy as a function of the wavelength. A likely cause of this trend is the residual errors from correction of the uneven receiver spectral response of the optics, especially the optical bandpass filter. Since this is known to drift over temperature and time, the retrieval solves for this residual slop for every point of the XCO2 retrieval.
The least-squares fit may be formulated by expressing the lidar measurement data in matrix form, , a single column 185 matrix with elements 2 given by Eq.
(2). The parameters to be solved, = { 7 }, is expressed as a 8 × 1 matrix. In our case 8 = 5, and each element is defined as $ is the product of the surface reflectance and the two-way atmosphere transmission at off-line wavelength; $ = ./$ is the scale factor for the XCO2 line shape function; 8 = 06"9! is the scale factor for the water vapor line shape function; 190 : is the linear slope of the receiver spectral response.
; is the Doppler shift of the received signal wavelength. The modeled atmospheric transmission given in Eq. (7) can be expressed as a single column matrix, ( ), with each element equal to where # $ ( , ) is the atmosphere transmission defined in Eq.
(1) but expressed as a function of both the laser wavelength and the parameters to be solved, and = ( , : ) is the normalized receiver optical transmission as a function of the wavelength and the slope of the linear trend of the receiver spectral response. Here we also included the term for the water vapor.
A scalar-valued loss function can be defined as the sum of squared differences between the lidar measurement data and the model, as, 200 where [ − ( )] is an ( × 1 matrix and is a ( × ( diagonal matrix for the weighting factors. The weighting factors are chosen to balance out the contributions from the measurements at different laser wavelengths with different SNRs. The least-squares fit finds the parameter set that minimizes the loss function. For small changes in XCO2 and high SNR lidar measurements, Eq. (9) can be "linearized" by the first two terms of 205 its power series expansion about initial estimates of the parameter values, . The function ( ), which sometimes is called forward model, can then be approximated by where = ( ) = _ 0( ( ) ⋮ 07 4 % 9 b with 0( 2 ) equal to Eq. (8) evaluated at the initial value of the parameter set.
Substituting Eq. (10) into Eq. (9) and defining = ( − ) and = − , the loss function can now be approximated as For mathematical convenience, we normalize the lidar measurements with respective to their initial estimate and 215 define a new variable A diagonal matrix can be defined with each element equal to 1/ 0( 2 ). The loss function can be rewritten using the identity matrix ≡ H ≡ H , as, where = , ( ) = ( ), and = H .
The loss function given in Eq. (13) is of the same form as that of a linear least-squares fit with measurement data and weighting factor . The derivative of the function ( ), which is often referred to as the Jacobian, is given by For the CO2 Sounder lidar, each term of the Jacobian can be derived as # 〉 , same for all = 1,2, … ( ; 2,$ = −2 MMMM ./$ ( 2 ), one for each laser wavelength, = 1,2, … ( ; 2,8 , same as above but for water vapor; , which can be obtained from the modeled line shape function; 230 2,; = ( 2 − < ), with < the center wavelength of the CO2 line shape function.
For measurement noise that is zero-mean and follows a Gaussian distribution, the optimal weighting factors are given by the reciprocal of the variance of the measurement data (Bevington 1969). In our case, the optimal weighting factors can be approximated as 235 where 〈 ( 2 )〉 is the average value of the lidar measurement which is assumed to be close to the initial estimate 0( 2 ).
Therefore, for each wavelength the weighting factors can be approximated by the SNR of the lidar measurement at that wavelength. As mentioned earlier, the SNRs are estimated based on the received signal pulse waveforms. The XCO2 and other parameters can now be solved using a standard linear least-squares fitting method with the loss function Eq. (13), Jacobian Eq. (14), and weighting factors Eq. (15). One method to carry out the least-squares fit is to use the 240 pseudo inverse function, which gives a numerical solution to the parameter set, as r = with a 8 × ( matrix, which is often called gain matrix and can be computed from the pseudo inverse function (•) (Peters and Wilkinson, 1970), as The pseudo inverse matrix function can be found in Matlab and other software tools.
The covariance of the parameters can be obtained from Eq. (16), as 7 r 9 = ( ) .
with ( ) a diagonal matrix with the elements given by Eq. (15). The covariance matrix 7 r 9 is in general not a diagonal matrix even though ( ) is a diagonal matrix. 250 The variances of the estimated parameters are given by the diagonal elements of 7 r 9 as a measure of the goodness of the fit. However, variance is only one of the criteria of the XCO2 retrieval. There can still be a bias in the estimated parameters if there is a mismatch between the measurements and the model.

Averaging Kernel
The averaging kernel is a function widely used in passive atmospheric sounding measurements. For these the vertical profile 255 of CO2 mixing ratio is retrieved and the averaging kernel is defined as (Rodgers, 2000), where • and are the estimated and the actual CO2 mixing ratio of the layers, respectively, is the gain matrix and is the Jacobian. The averaging kernel gives a measure of the sensitivity of the retrieved CO2 mixing ratio to the actual one at the given altitude. However, the averaging kernel defined above cannot be used directly for our multi-wavelength IPDA 260 lidar where only the column average XCO2 is retrieved.
Here we define an averaging kernel specifically for the retrieval algorithm for our multi-wavelength IPDA lidar, as, where 2 " is a scalar and equal to the estimated average column XCO2. This new averaging kernel has $ elements, one per atmosphere layer. It gives a measure of the sensitivity of the retrieved average column XCO2 to that of each layer. The retrieved 265 XCO2 can be expressed as the product of the retrieved scale factor and the a priori column average value, i.e., 2 " = • $ 2 MMMMMMMM . The averaging kernel can be written as where = • $ ∆ ⁄ is the second row of the G matrix for calculating • $ and is an ( × $ matrix and each term can be written according to Eqs. (4), (5) and (12), as 270 Therefore, each element of the averaging kernel becomes b 7 3 9 ≈ 2 ∑ $,3 2 2 MMMMMMMM 62! 7 3 9 ./$ 7 3 , 2 9 ∆ 3 =.
The term in the bracket is the layered OD from the atmosphere model assuming uniform CO2 mixing ratio 2 MMMMMMMM . Therefore, the averaging kernel is the sum of these layered ODs weighted by the corresponding terms of the G matrix. 275

Evaluation of the Retrieval Algorithm Using Airborne Lidar Data
The algorithm described here was used to retrieve XCO2 from measurements of our 2017 airborne lidar campaign . The lidar and the airborne measurements have been described in detail in Abshire et al. (2018). Here we show an example of using the retrieval algorithm on a data set collected during one of the flights. We also show the retrieved XCO2 at different altitudes in comparison to the in situ measurements made during two spiral-down maneuvers. 280 Figure 4 shows a segment of a Level-1data set from our 2017 airborne campaign. It shows 30 transmitted pulse waveforms and the corresponding received pulse waveforms averaged over 32 laser wavelength scans. The decrease (tilt) of laser pulse amplitudes over the pulse width interval is caused by the depletion of energy stored in the laser amplifier media, which does not affect the IPDA lidar measurements. The energies of the transmitted laser pulses fluctuate by a few percent, which is normalized out in the signal processing. The tails in the transmitted pulse waveforms shown in Fig. 4 (b) are caused 285 by an artifact of the laser monitor detector, which is different from the one used in the receiver. The amplitudes and energies of the received laser pulse waveform plotted in Fig. 4 (c) clearly show the CO2 absorption near the center of the wavelength scan. The XCO2 retrieval is carried out at 1 Hz, during which the host aircraft typically travels about 200 m.
For the least-squares fit the weighting factor for each wavelength is the square of the SNR of the lidar detected signals at that wavelength. The SNRs are estimated from the received lidar signal as (Gagliardi and Karp, 1995) 290 Here 〈 ' ( 2 )〉 is the average number of received signal photons per pulse at wavelength 2 , 〈 n 〉 is the average gain of the avalanche photodiode (APD), n is the APD quantum efficiency, n is the APD gain excess noise factor, ' is the integration For the XCO2 retrieval, the average number of received signal photons are estimated from the received pulse waveform. This is obtained by first integrating the received pulse waveform from the detector in volts, dividing the result by the detector responsivity in volts/W, and the photon energy in Joules. The average number of background noise photons is estimated from the average surface reflectance, off-line atmosphere transmission, the nominal sun light irradiance on the 310 surface, and the receiver optics model. All other parameter values in Eq. (24) are instrument related and can be found in Abshire et al. (2018). Figure 5 shows the CO2 absorption line shape sampled by the lidar along with that from the model assuming a constant a priori XCO2 vertical profile of 400 parts per million (ppm). It also shows the placement of laser wavelengths across the CO2 absorption line. One laser wavelength (the second from the left) was placed at a secondary peak due to water vapor. Three 315 wavelengths were placed on the wings of the CO2 absorption line. The rest were roughly equally spaced in OD along the absorption line. The residual differences between the measurements and the model after the least-squares fit is also plotted in  averaging time and the models before and after the fit (red and black lines). Bottom, differences (the residuals) 320 between the lidar measurement and the models at the lidar wavelengths before (in red circles) and after (in black circles) the least-squares fit for the data set shown in Fig. 4.
The averaging kernel is calculated based on Eq. (22) for each fit of 1-second lidar measurement data. The averaging kernel and a 4 th order polynomial fit for the data shown above are plotted in Fig. 6. 325 Figure 7 shows the results of the retrieval using the algorithm described above from the airborne CO2 Sounder lidar measurements made on July 21, 2017 starting at 00:30:00 for 820 seconds. The data consist of about 500 second segment measured at a nearly constant aircraft altitude followed by about 300 seconds in the spiral descent. The last part of the flight was near Edwards Air Force Base CA and the surface elevation was nearly constant for the last 500 seconds. The retrieved XCO2 during this period was steady with a slow downward trend. The root-mean-squared (rms) variation of the retrieved 330 XCO2 from 2100 to 2300 seconds was 0.67 ppm, which includes both the fitting error and the actual XCO2 variation along the flight path. By comparison, the estimated standard deviation from the covariance matrix was about 0.35 ppm (Fig. 7).  2017. The in situ profile was measured by an updated version of the AVOCET gas analyzer on board the airplane (Vay et al., 2011). The in situ XCO2 at the airplane altitude is obtained by integrating the CO2 profile from the in situ measurements weighted by the same averaging kernel. The XCO2 retrieved from the lidar measurements agrees with that from the in situ measurements at all airplane altitudes above 4 km. Below 4 km, the laser beam no longer completely overlaps the receiver field of view and the total CO2 absorption (line depth) becomes too low to be reliably measured. 345  The blue squares are the column integrated readings of the AVOCET gas analyzer from the altitude plotted to the surface. For the spiral-down comparisons, the median differences (biases) between the retrieved lidar and column in situ measurements are 0.72 ppm for July 21 and 0.16 ppm for August 8, 2017.

Biases in the Retrieved XCO2 365
The least-squares-fit method minimizes the sum of squared errors but does not guarantee minimum biases in the estimated parameters. The variances of the solutions approach zero as the SNR increases, as shown in Eq. (18). However, a mismatch between the CO2 absorption line shape derived from the model and the actual line shape can cause biases in the retrieved results. Therefore, it is important to model the atmosphere accurately and avoid systematic errors.

Choice of Laser Wavelengths 370
The choice of the laser wavelengths is a tradeoff among several factors. The total number of laser wavelengths has to be greater than the number of parameters to be solved from the lidar measurements. However, the total average laser output power is fixed. Using fewer wavelength samples improves the SNR for each sample, but provides fewer constraint to the fit. More wavelength samples lower the SNR at each wavelength, but allows solving for more parameters and help to reduce the bias in the XCO2 retrieval. There is also an advantage to select the laser wavelengths to be symmetrically distributed about the line 375 center since the effect of the nonuniformity in the receiver's spectral response may cancel out .
The airborne CO2 Sounder lidar mostly used 30 wavelengths with four off-line, four about the center of the peak absorption up to OD=1.2, one on the water vapor peak absorption, and the rest approximately uniformly distributed in OD . This choice of the laser wavelengths produced measurement precisions < 1 ppm and biases < 1 ppm. Abshire et al., (2018) also report airborne measurement made using 15 laser wavelengths that showed no apparent difference 380 in the XCO2 measurements to those using 30 wavelengths.

Number of Parameters to Retrieve
It is possible to use the least-squares fit to solve for more parameters of the CO2 line and lidar instrument, as long as the number of parameters to retrieve is less than the degrees of freedom of the lidar measurements. However, solving for more parameters, especially when they are correlated, increases the variance in the retrieved values, which limits the benefit. 385 One example is to divide the atmosphere into a few layers, each with its own line shape function and scale factor, to obtain some information about the vertical distribution of XCO2. However, the results from the least-squares fit for the XCO2 for the layers are correlated and the errors from the fits are usually too large to be useful (Chen et al., 2014). A singular value decomposition (SVD) method also has been used to extract a few more parameters about the line shape without the need for an a priori vertical XCO2 profile . The SVD method can retrieve the certain characteristics of the 390 line shape and provide certain information about the vertical distribution of XCO2

Conclusion
An algorithm to retrieve XCO2 has been developed for measurements from a pulsed multi-wavelength integrated path differential absorption (IPDA) lidar. The retrieval algorithm uses a least-squares fit of the line shape function derived from a multi-layer atmosphere radiative transfer model based on meteorological data to the line shape sampled by the lidar 395 measurements. In addition to XCO2, the algorithm simultaneously solves for the product of the surface reflectance and the offline atmosphere transmission, Doppler shift of the received laser signals, a secondary water vapor mixing ratio (if present), and a linear trend of the lidar receiver non-uniformity in its spectral response. Since it can accurately retrieve XCO2 in the presence of these conditions, this approach provides a more robust measurement of XCO2 compared to IPDA lidar that use only on-line and off-line wavelengths. The retrieval algorithm has been used successfully in the data processing of the NASA 400 Goddard's multi-wavelength pulsed IPDA lidar from its 2016 and 2017 airborne campaigns. The algorithm may also be used for retrievals for multi-wavelength lidar that target other atmospheric gases, such as CH4.