Quantification and mitigation of the airborne limb imaging FTIR GLORIA instrument effects and uncertainties

The Gimballed Limb Observer for Radiance Imaging of the Atmosphere (GLORIA) is an infrared imaging Fourier transform spectrometer with a 2-D infrared detector operated on two high-flying research aircraft. It has flown on eight campaigns and measured along more than 300 000 km of flight track. This paper details our instrument calibration and characterization efforts, which rely almost exclusively on in-flight data. First, we present the framework of our new calibration scheme, which uses information from all three available calibration measurements (two blackbodies and upward pointing “deep space” measurements). Part of this scheme is a new algorithm correcting the erratically changing non-linearity of a subset of detector pixels and the identification of remaining bad pixels. Using this new calibration, we derive a 1-σ bound of 1% on the instrumental gain error and a bound of 30 nW cm−2 sr−1 cm on the instrumental offset error. We show how we can examine the noise and spectral accuracy for all measured atmospheric spectra and derive a spectral accuracy of 5 ppm, on average. All these errors are compliant with the initial instrument requirements. We also discuss, for the first time, the pointing system of the GLORIA instrument. Combining laboratory calibration efforts with the measurement of astronomical bodies during the flight, we can derive a pointing accuracy of 0.032°, which corresponds to one detector pixel. The paper concludes with a brief study on how these newly characterised instrumental parameters affect temperature and ozone retrievals. We find that, first, the pointing uncertainty and, second, the instrumental gain uncertainty introduce the largest error in the result.

These terms are all functions of both the detector pixel (u, v) and wavenumber ν of the spectral sample.
Assuming the response of the detector to be linear with respect to the incoming photon flux (see also Sect. 4.2), the gain g and offset L_o may be determined from two uncalibrated measured spectra S_1 ∈ ℂ and S_2 ∈ ℂ of blackbody radiation sources with known emission characteristics (Eqs. 2 and 3), with B being the Planck function and T the temperature of the corresponding blackbody. If a deep space spectrum S_d is used as one blackbody spectrum, the radiation from this source is (effectively) zero, and Eqs. 2 and 3 simplify accordingly. As shown by Kleinert et al. (2018), the calibration parameters determined by two blackbodies with a rather small temperature difference are more susceptible to errors in, e.g., blackbody temperature or homogeneity than the combination of one blackbody and a deep space measurement. Since the radiance from the cold blackbody is closer to (but still above) the radiation coming from the atmosphere, it is better suited for the determination of the gain function than the hot blackbody. We therefore determine the gain once every hour from the cold blackbody and the deep space measurements.
The instrument offset is dominated by its thermal self-emission and changes during the flight, along with the changing temperature of the instrument. When the gain is known, the offset can be calculated from this gain and the spectrum of one calibration source. Since the blackbody spectra are recorded much faster and are thus taken more frequently, we use the gain and the cold blackbody measurements to determine the instrument offset every 15 minutes. Between the calibration measurements, the offset is linearly interpolated in time except for the contribution of the outer entrance window (see Sect. 4.3).
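The gain and offset determination described above can be sketched as follows. This is a minimal illustration, not the processing code of the instrument: it assumes the convention that the calibrated radiance is S / g − L_o (consistent with the offset being subtracted from the measured spectra), and all function names are hypothetical.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant in J s
C = 2.99792458e10    # speed of light in cm/s (wavenumbers are in cm^-1)
K_B = 1.380649e-23   # Boltzmann constant in J/K


def planck(nu, temp):
    """Planck radiance in nW cm^-2 sr^-1 cm at wavenumber nu (cm^-1)."""
    return 1e9 * 2.0 * H * C**2 * nu**3 / np.expm1(H * C * nu / (K_B * temp))


def gain_from_deep_space(s_bb, s_ds, nu, t_bb):
    """Gain from one blackbody and one (zero-radiance) deep space spectrum."""
    return (s_bb - s_ds) / planck(nu, t_bb)


def offset_from_blackbody(s_bb, gain, nu, t_bb):
    """Offset from a known gain and a single blackbody spectrum."""
    return s_bb / gain - planck(nu, t_bb)


def calibrate(s, gain, offset):
    """Assumed convention: calibrated radiance = S / g - L_o."""
    return s / gain - offset
```

With this convention, the hourly gain would come from `gain_from_deep_space` and the 15-minute offsets from `offset_from_blackbody` applied to the cold blackbody spectra.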
The main differences to the calibration method described by Kleinert et al. (2014) are the use of a technique for removing atmospheric emissions from deep space spectra (see Sect. 4.1), an improved non-linearity correction (see Sect. 4.2), and the averaging of gain magnitude over the flight and real part of the offset over forward and backward sweep directions.

Methods
This section presents methods developed to account for effects of a real instrument at (comparatively) low flight altitude. For each effect we will give first a short explanation of the physical cause and then detail the method developed to compensate and characterize this effect in the calibration process.

Removal of atmospheric contribution to calibration spectra
For satellite measurements, the instrumental self-emission can be measured by directly looking into true deep space with effectively zero radiance. Due to GLORIA's position below the aircraft, the maximum elevation angle of its optical axis is ≈10° above the horizon. Taking deep space measurements at a 10° angle, combined with a flight altitude below 15 km, there is still a considerable amount of atmosphere in the line of sight of these deep space measurements. In order to remove the atmospheric radiance emitted by the air within this line of sight, we use atmospheric retrieval techniques to best model the atmospheric contribution to the measured signal and then subtract the forward calculated atmospheric spectrum from the measured one. As a starting point, we calibrate the deep space measurements using gain and offset functions derived from measurements of the two onboard blackbodies.
The median spectrum over the whole detector array is used for the representation of the radiance of the central pixel.
This is justified by the observation that the radiance variation over the detector field at this upward pointing elevation angle is rather linear with the elevation angle or pixel row. The median is insensitive to outliers, hence the bad pixels have a negligible impact on the result.
The forward calculation of the spectra requires a priori assumptions on the atmospheric state, i.e., pressure, temperature, and volume mixing ratios (VMRs) of the relevant trace gases emitting in the spectral range covered by the measurements. Pressure, temperature, H2O, and O3 are taken from ECMWF analysis data (e.g., Dee et al., 2011). The calculations use the radiative transfer model KOPRA (Karlsruhe Optimized and Precise Radiative transfer Algorithm; Stiller, 2000) and the retrieval software KOPRAFIT (Höpfner et al., 1998).
The goal is to model the atmospheric contributions present in the measured deep space spectra as well as possible in order to unveil the instrument self-emission. The retrieved temperature and trace gas profiles are not employed further, as this model is tuned towards fitting the measurements optimally and not towards deriving realistic atmospheric parameters. The atmospheric spectrum is modeled in the spectral range from 750 to 1400 cm−1, allowing for a good calibration quality also slightly outside the specified spectral range starting at 780 cm−1.
The fit is performed iteratively. First, a residual radiance offset is fitted and subtracted from the measured spectrum. In the next step, a broadband fit is performed with temperature and eight gases (H2O, CO2, O3, N2O, CH4, HNO3, CFC-11, and CFC-12) as fit parameters. Altogether, 29 gases are used in the forward calculation. The fit is then refined by fitting single gas profiles in dedicated microwindows in several iterations. The gases and iterations are shown in Table 2 and described in more detail in Appendix A.
The fit results are then used for a forward calculation over the whole spectral range. The processed spectra, from which atmospheric emissions were removed, are called shaved spectra. The shaved and measured spectra are shown in Fig. 2. Besides some remaining atmospheric features, the residual reveals a Planck-like offset and some broadband structures below 850 cm−1, which can be attributed to the germanium entrance window. The remaining atmospheric features are reduced to the order of 10 nW cm−2 sr−1 cm.
The forward calculated spectrum is valid for the central row of the detector, because the median over the array has been fitted. Although the variation with elevation angle is rather small for the upward-looking measurements, it is not fully negligible. Therefore, forward radiative transfer calculations are performed for an elevation of +8° and +12°, corresponding to the lowermost and uppermost detector row, respectively. For the rows in between, these spectra are linearly interpolated (see Fig. 2). The residual is slightly different from the one calculated for the corresponding elevation angle of +10°, but the differences are much smaller than the remaining atmospheric features and therefore negligible.
The subtraction of atmospheric features from the uncalibrated spectra is done for each pixel individually by interpolating the forward calculated spectra to the corresponding row and multiplying this spectrum by the gain function of the corresponding pixel, using the gain that was determined from the two blackbody measurements.
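The per-pixel subtraction can be illustrated with a short sketch. It assumes that row index 0 is the lowermost detector row, that the raw signal of the atmosphere equals the gain times its radiance, and that arrays are laid out as (rows, columns, spectral samples); the function names are hypothetical.

```python
import numpy as np


def atmospheric_contribution(gain, spec_bottom, spec_top):
    """Raw-signal contribution of the modeled atmosphere for every pixel.

    gain: complex array of shape (rows, cols, samples).
    spec_bottom, spec_top: forward calculated radiance spectra of shape
    (samples,) for the lowermost and uppermost detector row.
    """
    n_rows = gain.shape[0]
    # Interpolation weight per row: 0 at the lowermost, 1 at the uppermost row.
    w = np.linspace(0.0, 1.0, n_rows)[:, None, None]
    atm = (1.0 - w) * spec_bottom + w * spec_top  # shape (rows, 1, samples)
    return gain * atm


def shave(raw, gain, spec_bottom, spec_top):
    """Remove the modeled atmospheric emission from uncalibrated spectra."""
    return raw - atmospheric_contribution(gain, spec_bottom, spec_top)
```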
Several small sections of the spectrum are then linearly interpolated from the neighboring spectral samples, either because of still remaining atmospheric features (namely Q-branches of CO2, HNO3, and CH4) or because of spikes in the spectrum at distinct known frequencies due to electrical noise.
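A minimal sketch of bridging such sections by linear interpolation is given below. The section limits passed in are placeholders, not the actual Q-branch or spike positions, and the function name is hypothetical. The sketch handles a real-valued spectrum on an increasing wavenumber grid; a complex spectrum could be treated by applying it to the real and imaginary parts separately.

```python
import numpy as np


def interpolate_sections(nu, spectrum, sections):
    """Replace samples inside the given wavenumber sections by a straight
    line between the nearest samples outside of them.

    nu: increasing wavenumber grid; sections: iterable of (low, high) limits.
    """
    nu = np.asarray(nu, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    bad = np.zeros(nu.shape, dtype=bool)
    for low, high in sections:
        bad |= (nu >= low) & (nu <= high)
    out = spectrum.copy()
    out[bad] = np.interp(nu[bad], nu[~bad], spectrum[~bad])
    return out
```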
The uncertainty of the instrument offset determination is estimated from the difference between the measurement after subtraction of the broadband offset and the forward calculation. It is estimated to be 20 nW cm−2 sr−1 cm (2 sigma).
Details of this uncertainty estimation are given in Appendix A.

Detector non-linearity correction
The detector is subject to non-linearity, which causes a change in sensitivity depending on the overall photon load.
In a first order approximation, this effect causes a scaling of derived spectra, which depends on the photon load of the scene (see App. B for more details).
As described by Kleinert et al. (2014), the non-linearity of the detector has been characterized by carefully performing dedicated measurements of a constant source on ground while varying the integration time. From these measurements, one correction curve for all pixels was determined. This curve works well for most of the pixels, but a considerable number of pixels show spontaneous changes in their non-linear behavior, which makes the derived correction unsuitable for these pixels. We attribute this to the different thermal expansion coefficients of the detector material and the silicon substrate, causing make-and-break contacts (Perez et al., 2005). In earlier data versions, these pixels had simply been filtered out, but rigorous filtering leads to a considerable decrease in the number of usable pixels. Therefore, we developed a method to determine a correct gain for these pixels as well. Extending the work of Guggenmoser et al. (2015), we constructed an algorithm that exploits the smoothness of the instrumental offset L_o to correct the faulty pixels.
As the instrument is focused on infinity, the instrumental offset must be a spatially smooth function, as any features in the image from objects residing within the instrument are effectively convolved with a Gaussian with a very large support. Thus, we assume that any spatial discontinuities in L_o are caused by the uncorrected non-linearity of the involved pixels. The assumption of spatial smoothness was exploited by Guggenmoser et al. (2015) to improve the offset without also correcting the gain.
The non-linearity causes, in a first order approximation, a scaling of the affected spectra. The atmospheric and deep space measurements are reasonably close in photon load and thus behave in a very similar fashion. In contrast, blackbody measurements have a much stronger signal and scale differently for the problematic pixels. Thus, we use a non-linear fit to estimate pixel-wise scalar non-linearity scaling factors for the blackbody measurements (see Appendix B for details). We derive these non-linearity scaling factors for adjusting the blackbody spectra such that the offset L_o is free from discontinuities over the whole spectral range (see Eq. 4). A set of such derived non-linearity scaling factors is depicted in Fig. 3. The irregular, clustered distribution of bad pixels is clearly visible.
The uncertainty in determining these factors is slightly larger in the circular region where the instrumental offset is close to zero for most of the spectral range. The uncertainty in determining the factors is on the order of a third to half a percent. Comparing the values derived from the forward and backward sweep direction suggests an uncertainty of ≈0.5% on average. Figure 4 shows the histogram of derived correction factors. The factors cluster around a value of one, with a Laplacian-like distribution in the center. However, there is a significant number of pixels with a strong non-linearity in the 5% to 10% region.
The non-linearity correction factors are derived for each calibration sequence containing a deep space measurement, but are applied also to blackbody measurements in between by linear interpolation in time. As the non-linear behavior is known to change between flights, it may also change within a flight. We thus analyse a subset of atmospheric spectra for calibration artifacts. Excluding clouds, atmospheric spectra are typically spatially smooth as well.
Using a similar method as described above, we now determine a scaling factor for the gain function. With a perfect correction or instrument, a value of 1 is expected for all pixels. Differences from one can be interpreted as remaining error in gain after non-linearity correction. Analysing a set of 158 atmospheric measurements uniformly distributed over flight 16 of the WISE campaign gives Fig. 5. The determined scaling errors are much lower than the correction factors applied to the raw blackbody spectra. Only the lowermost rows show large errors; these interferograms taken at lower altitudes already exhibit a mean value sufficiently different from those of deep space measurements to be subject to a different point on the non-linearity curve compared to deep space spectra.
The resulting row-averaged scaling errors are within ±1%, whereby some of the largest differences are due to individual pixels of high variability and the rows where (coloured) noise is generally quite high. The standard deviation of the remaining error in gain averaged over rows can thus be computed to be ≈0.2%.

Outer window emission correction
The entire instrument, including all windows, is calibrated in flight using deep space and blackbody measurements. per minute. The instrumental gain is computed from calibration sequences with a stable instrument and window temperature and averaged over the flight such that only the instrumental offset is affected by this problem.
The spectral emission signature of the germanium window is deduced from in-flight measurements at the beginning of the flight, where the window temperature changes rapidly, while the temperature of the other instrument components stays rather constant. We attribute the difference of the measured instrument offset between the first two calibration sequences to the change in window temperature and calculate the emissivity ε of the outer germanium window from L_o1 and L_o2, the calibrated instrument offsets calculated from the first and second calibration sequence, and T_1 and T_2, the temperatures of the outer window at the measurement times of the first and second calibration sequence. We did this for several flights, discarding outliers and averaging over the rest. The resulting spectral emissivity of the germanium window is shown in Fig. 6. The impact of the germanium window emission is most readily noticeable in the spectral range around 830 cm−1, where the atmospheric signal is very low, but it also affects lower wavenumbers. The window temperature changes too fast to be captured by the regular calibration measurements after take-off, when the window rapidly cools with dropping environmental temperatures, during and after dive maneuvers, and in situations where the window is exposed to direct sunlight. We also found the window temperature to fluctuate by ≈0.5 K during tomographic measurement patterns, where GLORIA quickly points towards different azimuth angles and the window is thus subject to different air flow patterns.
In order to account for these rapid temperature changes of the entrance window, we developed the following approach. Given two calibration measurements at times t_0 and t_1, we compute the instrumental offset L_o at time t with t_0 < t < t_1 by linear interpolation. We then add a correction term for the window emission that retains the measured instrumental offset at times t_0 and t_1 while compensating for the changes in window emission due to the measured window temperature T_win(t) in between, which yields the improved instrumental offset L*_o. The window emission needs to be enhanced by the factor (1 − ε)^−1, with ε the window emissivity, as the gain function already takes into account the absorption characteristics of the outermost window. Here, the emission takes place within the instrument and thus, the emission of the outermost window is not attenuated. Note that the instrumental offset is subtracted from the measured spectra.
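One way to write such a correction is sketched below. The exact form of the correction term is an assumption made for illustration: the window contribution ε / (1 − ε) · B(T_win) at time t is compared with its linear interpolation between the two calibration times, so that the correction vanishes at t_0 and t_1. Function and argument names are hypothetical.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant in J s
C = 2.99792458e10    # speed of light in cm/s (wavenumbers are in cm^-1)
K_B = 1.380649e-23   # Boltzmann constant in J/K


def planck(nu, temp):
    """Planck radiance in nW cm^-2 sr^-1 cm at wavenumber nu (cm^-1)."""
    return 1e9 * 2.0 * H * C**2 * nu**3 / np.expm1(H * C * nu / (K_B * temp))


def window_term(nu, temp, eps):
    """Assumed window emission in calibrated radiance: eps / (1 - eps) * B."""
    return eps / (1.0 - eps) * planck(nu, temp)


def corrected_offset(t, t0, t1, off0, off1, nu, eps, t_win, t_win0, t_win1):
    """Offset at time t, linearly interpolated between two calibrations and
    corrected for the deviation of the window emission from its own linear
    interpolation (assumed form; zero correction at t0 and t1)."""
    w = (t - t0) / (t1 - t0)
    interpolated = (1.0 - w) * off0 + w * off1
    expected = ((1.0 - w) * window_term(nu, t_win0, eps)
                + w * window_term(nu, t_win1, eps))
    return interpolated + window_term(nu, t_win, eps) - expected
```

Because the correction is defined relative to the interpolated window emission, a window temperature that is warmer than expected between two calibrations increases the offset that is subtracted from the measured spectra.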
As an example, the effect of the correction is shown for two wavenumbers in Fig. 7. Due to the germanium emission characteristics, radiances at 830 cm−1 are strongly affected by the window emission feature, while radiances at 950 cm−1 are mostly unaffected. Figure 7a shows the radiances at 830 and 950 cm−1 for the uncorrected calibration.

In situations of strong temperature fluctuations (i.e., after 13:20) the radiances at 830 cm−1 behave very differently from the signal at 950 cm−1. Figures 7c and 7d show the same data when applying the discussed window correction.
The radiances at 830 and 950 cm−1 are now much more consistent.
The situation depicted in Fig. 7 shows an untypically large variation in outer window temperature. In this worst case, the amount of correction applied has a standard deviation of 7 nW cm−2 sr−1 cm at 830 cm−1 and 0.5 nW cm−2 sr−1 cm at 950 cm−1. We assume that only ≈90% of the effect can be corrected in this fashion (due to uncertainties in both the window emissivity estimate and window temperature measurements) and that thus a systematic error of at most 1 nW cm−2 sr−1 cm may remain after correction at wavenumbers below 900 cm−1 and one tenth of that above.

Parasitic image correction
As analysed in detail for GLORIA by Sha (2013), reflections of incoming light at the surfaces of the beam splitter cause positive and negative parasitic images because the surfaces of beam splitter and compensator plate are wedged.
Typically, the beam splitter is mounted such that these images lie in the horizontal plane and are thus invisible in the horizontally averaged radiance data. During the WISE campaign (and the immediately preceding StratoClim campaign), the beam splitter was turned by 90° and the parasitic images were located on the vertical axis and thus caused noticeable distortions in averaged data. The magnitude of these parasitic images is on the order of a few percent of the original signal, which introduces significant errors in the vicinity of strong gradients in radiation, i.e., over cloud tops. Therefore we have developed a correction method, which is also applicable if the parasitic images lie in the horizontal plane and the scene is not homogeneous.
The effect is most readily visible in the moon measurements taken for pointing analysis. Figure 8 shows an exemplary image used to characterize the effect. For simplicity's sake, we assume that the parasitic images can be simulated by a simple convolution of the unperturbed image with a vector with just three non-zero entries, which sum up to one, whereby the center value represents the "correct" image and the outer ones the negative and positive parasitic images, respectively. Under these assumptions, the effect can be corrected nearly perfectly using a convolution of the incorrect image with an inverted vector. At the borders, we simply extrapolate the uppermost and lowermost row, respectively, for the convolution.

The correction vector contains the position (in whole pixels) and the magnitude of the upper and lower parasitic image, respectively. These four free parameters were determined from six independent moon measurements taken during three separate flights and located at various locations on the detector. The parameters were varied manually until we got a satisfactory correction for all six measurements. We found a common shift of the parasitic images of 16 pixels and a factor of 1.8±0.2% for the negative, upper image and -2.5±0.2% for the positive, lower image. The errors are estimated from the range of visually acceptable values. Figure 8b shows the corrected image. One sees small remaining artifacts, which could not be fully removed by tuning the four parameters; we believe this to be caused by our overly simple and discrete model of the effect.
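The three-entry vector model and its approximate inversion can be sketched as follows. The sketch assumes that the correction vector is the distortion vector with the signs of the outer entries flipped (a first-order inverse), that row 0 is the top of the image, and that the quoted factors are the outer entries of the correction vector; the function names are hypothetical.

```python
import numpy as np


def three_tap_kernel(shift, upper, lower):
    """Vector with three non-zero entries that sum to one."""
    kernel = np.zeros(2 * shift + 1)
    kernel[0] = upper
    kernel[-1] = lower
    kernel[shift] = 1.0 - upper - lower
    return kernel


def apply_vertical_kernel(image, kernel):
    """Apply the vector along the vertical axis of a (rows, cols) image.

    The uppermost and lowermost rows are replicated at the borders.
    """
    shift = len(kernel) // 2
    n_rows = image.shape[0]
    padded = np.pad(image, ((shift, shift), (0, 0)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in np.flatnonzero(kernel):
        out += kernel[i] * padded[i:i + n_rows, :]
    return out


SHIFT = 16  # common shift of the parasitic images in pixels
distort = three_tap_kernel(SHIFT, -0.018, 0.025)  # simulated parasitic images
correct = three_tap_kernel(SHIFT, 0.018, -0.025)  # first-order inverse
```

Applying `correct` to an image distorted with `distort` leaves only terms that are of second order in the parasitic image magnitudes, which is in line with the small remaining artifacts described above.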
To properly quantify the remaining effect after correction, we inspected spectra averaged over the three regions indicated in Fig. 8. The three spectra represent (A) the background of the atmosphere outside of the region affected by the parasitic image of the moon, (B) inside this region and (C) the corrected version. Figure 9a shows the three spectra. Spectrum B is generally decreased outside the 1 000 cm−1 region, where the radiance field is vertically homogeneous due to strong ozone emissions. The difference plot of the affected spectrum (B-A) in Fig. 9b shows a discrepancy of up to ≈100 nW cm−2 sr−1 cm. The corrected spectrum (C-A) does not exhibit obvious defects above the NESR level.

The effect is worst at cloud tops, where the radiance can quickly drop from >5 000 nW cm−2 sr−1 cm down to <500 nW cm−2 sr−1 cm over a couple of pixels. An underestimation of up to 100 nW cm−2 sr−1 cm above cloud top was observed in a couple of profiles. With the assumed uncertainty, the effect is reduced by an order of magnitude to ≈10 nW cm−2 sr−1 cm in the worst case (close to cloud tops), but typically it remains below the noise level.

The behavior of individual pixels of the detector of GLORIA changes significantly from flight to flight to the extent that we need to determine a list of bad pixels for each flight individually. These pixels are then excluded from horizontal averaging when preparing the final level 1 products. The classification between "good" and "bad" is always somewhat arbitrary, and there is a considerable number of pixels which do not belong clearly to either category. The goal of our bad pixel identification is to identify and discard only the worst pixels that would affect the level 1 product when included. We define good pixels as pixels that agree with the median value of their row.
To exclude effects of an inhomogeneous scenery on the one hand and use measurements closely resembling regular atmospheric measurements on the other hand, we decided to analyse the deep space measurements, which are cloud free and available in sufficient quantity for all flights.
For each pixel and deep space measurement, we compute the root mean square error (RMS) between its value and the median of the row over all spectral samples. Analysing each flight individually, we can examine the histograms of the RMS computed in this way, which always show a very similar structure (Fig. 10): a Gaussian-like peak trailing slowly off to higher values. This is composed of the behavior of "standard" pixels with a Gaussian noise distribution on the one hand, and the influence of other pixels with increased noise or other erratic behavior on the other hand. General noise level and spread vary from flight to flight and from campaign to campaign due to the configuration, employed electronics, and detectors. We assume that the left side of the distribution in Fig. 10 closely resembles the Gaussian distribution of "good" pixels and fit a Gaussian function with unknown mean, standard deviation, and scaling to it.
About 4 percent of pixels are beyond the limits of the plot. The Gaussian curve fits reasonably to the left hand side of the distribution and the peak. We then define those pixels as bad, for which the median of the difference computed over all deep space measurements is larger than the mean plus 9 times the standard deviation. We use the median here, as defects in the calibration offset of a single calibration sequence could otherwise cause a large number of pixels to be discarded. Figure 11 shows this median (panel a) as well as the masks derived by different thresholds.
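The classification can be sketched as follows. Note that this sketch swaps in a simpler estimator for the Gaussian fit to the histogram: the center is taken as the median of all RMS values and the width is estimated from the values on the left-hand side only. The array layout (measurements, rows, columns, spectral samples) and the function names are assumptions made for illustration.

```python
import numpy as np


def rms_to_row_median(deep_space):
    """RMS difference between each pixel and the median of its row.

    deep_space: real array of shape (measurements, rows, cols, samples).
    Returns an array of shape (measurements, rows, cols).
    """
    row_median = np.median(deep_space, axis=2, keepdims=True)
    return np.sqrt(np.mean((deep_space - row_median) ** 2, axis=-1))


def bad_pixel_mask(deep_space, n_sigma=9.0):
    """Flag pixels whose median RMS exceeds center + n_sigma * width."""
    rms = rms_to_row_median(deep_space)
    # Simplified stand-in for the Gaussian fit to the left side of the histogram.
    center = np.median(rms)
    left = rms[rms <= center]
    sigma = np.sqrt(np.mean((left - center) ** 2))
    # The median over all measurements guards against a single faulty
    # calibration sequence flagging many pixels.
    per_pixel = np.median(rms, axis=0)
    return per_pixel > center + n_sigma * sigma
```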
The 9-sigma threshold is by design very inclusive and the probability to exclude "good" pixels is negligible, as we are not interested in discarding a pixel solely for displaying a slightly increased amount of noise. Still, we find that ≈10 percent of the detector pixels are excluded by this criterion on average. About half of these excluded pixels are obviously defective and lie mostly on the outermost columns, where the read-out electronics has known issues. The other half shows a variety of behaviors; for example, some pixels exhibit a telegraph noise pattern, switching their mean value rapidly between different levels, while others change their non-linearity during the flight and thus have an offset during longer periods of time. Figure 11c shows the finally chosen mask for an example flight. In Fig. 11b, a mask with a stricter 3-sigma threshold is shown for comparison. This mask shows large clusters of masked pixels in the top left corner, which are likely flagged due to border effects in the smoothing of the calibration offset. For reference, a more relaxed 12-sigma threshold is also depicted in Fig. 11d. To further motivate the chosen threshold, we also examined both the noise of the level 1 data and the quality of level 2 results (trace gases and temperature). All depicted masks perform very well with respect to level 2 results, whereas applying no mask causes artificial horizontal structures in level 2 data. Estimating the average noise of spectral samples of cloud-free pixels gives us Table 3. Employing no filtering increases the average noise value significantly. All thresholds effectively filter out really bad pixels to the extent that the estimated noise is very similar. Due to the irregular distribution of more noisy pixels on the detector, using a strict threshold can decrease the number of remaining pixels in some rows to only a handful, causing the average noise value to increase again. To make the most of the available measurements, we thus decided on the 9-sigma threshold.

Pointing analysis
GLORIA makes use of a highly sophisticated and precise pointing system based on high-precision sensors and a gimbal mount providing agility on all three axes. The pointing system enables two different limb view acquisition modes for high spatial and high spectral resolution, nadir pointing, as well as calibration. Different control modes are connected to these observation scenarios, because the requirements are different. The major features of the different control modes and the pointing system are described in Appendix C.
An additional camera operating in the visible spectral domain is mounted on the interferometer. This camera covers a wide field-of-view (FOV), thereby completely enclosing the FOV of the spectrometer. For most of the interferograms, correlated images of the scene or video sequences are taken. The information provided by this camera can be used for cloud identification and for pointing quality analysis.
Limb sounding and its retrieval depend strongly on the acquisition and absolute knowledge of the line of sight. Therefore, an on-ground calibration is indispensable in order to determine pointing offsets and to get a good pointing acquisition in flight. For this purpose, we have built a calibration optics system delivering parallel beams with a broadband infrared light source and an off-axis parabolic mirror. Since this optical system is made for several purposes like determining the FOV or adjusting the focal length of the spectrometer, there are five sources arranged like the "five" on a die, but for the pointing calibration only the source in the middle is used. The sources can be seen both in the visible and the infrared spectral range. The calibration optics is mounted on a tripod and the beam of the middle light source is adjusted with the help of a theodolite to be horizontal with an accuracy of ≈0.05 mrad. Placed in front of the instrument and pointing towards the spectrometer, this optical system allows for determination of the offset angle between the nominal horizon of the gimbal control and the real horizon (see Fig. 12). This offset value is commanded to the control system, so that the output of measured elevation is referenced to the real horizon. The image of the source on the GLORIA detector has a size of 3 to 5 pixels (see Fig. 13). We assume an uncertainty of 1 pixel for determining its center, which corresponds to ≈0.03°.
For the azimuth, this is more difficult. The best azimuth calibration can be performed in the hangar of HALO at DLR. In this home base hangar of HALO, TU Dresden measured the precise coordinates of reference points marked on the hangar walls and also on the floor close to the position of the integrated GLORIA (Scheinert and Barthelmes, 2014). With these marks, it is possible to validate absolute azimuth and elevation with respect to the pointing system of GLORIA.
In the laboratory at KIT there are known positions of some landmarks too, so the azimuth can be determined there as well but not as accurately as in the hangar at DLR. We thus assume a higher uncertainty in azimuth of ≈0.1°.
Due to the high demands of limb sounding, the absolute pointing also needs to be determined in flight. The on-ground calibration has to be verified and corrected, because the absolute pointing of the instrument typically changes due to thermal warping between ground and flight conditions. This absolute attitude might also change from flight to flight during a campaign, for instance due to the forces acting on GLORIA during landing of the aircraft. Sometimes, mis-configurations of the instrument or the exchange of the navigation system with the spare have a similar effect.
In order to perform an in-flight LOS calibration, a suitable astronomic object has to be observable by GLORIA, i.e., close to the horizon and on the right side of HALO. During WISE several dedicated observations of moon-rise and moon-set were taken, providing an absolute calibration source under flight conditions. Since flight time and path of the aircraft were determined by scientific goals, such measurements were only feasible during three flights of the WISE campaign as secondary objectives. One such measurement is shown in Fig. 14. Refraction of the visible light close to the horizon impacts the apparent position of the moon. Our approach to correct for this is described in Appendix D. Analysing measurements of past campaigns revealed some previous accidental moon measurements, whereas campaigns after WISE continue to take moon calibrations whenever feasible. The visible camera can easily locate Venus and other planets visible to the naked eye, but the derived LOS from such measurements comes along with the added uncertainty of alignment between visible and IR camera. In the infrared, the planets are much less bright, but we could successfully identify Venus after averaging 48 images, providing an alternate target for line-of-sight calibration.
In addition to these direct but sparse pointing measurements, we perform level 2 retrievals to determine the attitude. Here, we use data of two such retrievals: one based on data acquired with high spectral sampling (0.0625 cm−1; Johansson et al., 2018) and one based on data acquired with intermediate spectral sampling (0.2 cm−1; see Appendix F for details), which is available for all profiles. For campaigns prior to 2017, for some flights only data in coarse spectral resolution is available (0.625 cm−1). For these flights a different approach based on the retrieval described by  was used, where instead of temperature an elevation correction value was derived (the CO2 lines used starting from WISE are not sufficiently resolved in the early campaigns). We typically use only a single correction factor for a flight, which is determined from the mean of the values that remain after filtering out short profiles (less than 3 km between instrument and cloud top). An error estimate is computed from the standard deviation.
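The per-flight aggregation described above amounts to a filtered mean and standard deviation. A minimal sketch, assuming one retrieved elevation correction and one instrument-to-cloud-top distance per profile (hypothetical argument names) and using the sample standard deviation as the error estimate:

```python
import numpy as np


def flight_elevation_correction(corrections_deg, path_to_cloud_km, min_path_km=3.0):
    """Single elevation correction for a flight and its error estimate.

    Profiles with less than min_path_km between instrument and cloud top
    are discarded before averaging.
    """
    corrections_deg = np.asarray(corrections_deg, dtype=float)
    keep = np.asarray(path_to_cloud_km, dtype=float) >= min_path_km
    kept = corrections_deg[keep]
    return kept.mean(), kept.std(ddof=1)
```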
using the on-ground calibration before and after the campaign. Between WISE flights 2 and 3, the inertial navigation system needed to be exchanged with the spare, causing a change in elevation offset. Between flights 13 and 14, the pointing was readjusted using information from preliminary level 2 and moon data that suggested a systematic offset of 0.17° at the time. Thus, the pre-campaign calibration is applied for flights 1 and 2 and the results of the post-campaign calibration are applied for the remaining flights. The values of the different level 2 retrievals agree within the respective error bars. In this particular campaign, the moon calibration seems to indicate a systematically smaller elevation correction, but other campaigns also show higher values. The differences between the various methods and calibrations are consistent within the estimated uncertainties. For the further processing we typically select the most reliable value (depending on number of available profiles and spectral resolution) from the level 2 result. As remaining error we simply assume an uncertainty of one pixel, i.e., ≈0.032°.

Performance and Characterization
This section gives an analysis of the quality of our level 1 data from in-flight data and aggregates the (simplified) result in a table to serve as a basis for error estimates of level 2 products such as temperature or trace gas VMRs.

Noise equivalent spectral radiance (NESR)
Friedl-Vallon et al. (2014) estimated the NESR of GLORIA from selected measurements to ensure it is within specification. Here, we extend this work to estimate the noise of individual atmospheric measurements for different altitudes and flights. For the level 2 processing we need to determine the NESR associated with spectra averaged over an entire row. We thus focus on the NESR of spectra averaged over the detector rows.
There are several methods to estimate the NESR from measured spectra. In an ideal instrument, the imaginary part of calibrated spectra contains only measurement noise. In practice, however, it also contains some residual signal due to, e.g., small phase errors (the effect of which is negligible in the real part of the spectrum), asymmetries in the interferogram due to a variable scene (especially in presence of clouds), and artifacts introduced by calibration inaccuracies in the imaginary part of the instrument offset. Instead, we use two different methods to determine the NESR from the real part of the spectrum.
For the first method, we look for each detector pixel at the radiance variation over seven consecutive measurements of a deep space sequence. We can safely assume that the observed scene stayed constant during this brief time frame.
These pixel-based estimates are used to determine the NESR of horizontally averaged values using the bad pixel mask determined in Sect. 4.5. Then, the resulting NESR spectra are averaged vertically to present a single spectrum.
For the second method, we only use a single deep space measurement and look at the horizontal variation from detector pixel to detector pixel. In particular, the NESR is estimated by computing the horizontal standard deviation (again excluding bad pixels as determined according to Sect. 4.5) for the first measurement of the sequence only.
These values are then divided by the square root of the number of corresponding valid horizontal pixels and averaged as above over the vertical dimension.
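The two estimators can be sketched as follows for a single spectral sample. This is an illustration under stated assumptions, not the actual level 1 code: pixel noise is assumed to be uncorrelated when the temporal estimate is propagated to the row average, and all function names and the synthetic test data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev, variance

def nesr_temporal(frames, valid):
    """Method 1: temporal scatter per pixel, propagated to row averages.

    frames: list of 2-D lists [row][column] holding the radiance of one
            spectral sample for consecutive deep space measurements.
    valid:  2-D list of booleans, False for pixels in the bad pixel mask.
    """
    row_nesr = []
    for r, mask_row in enumerate(valid):
        columns = [c for c, ok in enumerate(mask_row) if ok]
        if not columns:
            continue
        # Assumption: pixel noise is uncorrelated, so the variances add.
        variance_sum = sum(variance([f[r][c] for f in frames]) for c in columns)
        row_nesr.append(sqrt(variance_sum) / len(columns))
    return mean(row_nesr)  # average over the vertical dimension

def nesr_horizontal(frame, valid):
    """Method 2: horizontal scatter in one measurement, divided by sqrt(n)."""
    row_nesr = []
    for row, mask_row in zip(frame, valid):
        values = [x for x, ok in zip(row, mask_row) if ok]
        if len(values) < 2:
            continue
        row_nesr.append(stdev(values) / sqrt(len(values)))
    return mean(row_nesr)  # average over the vertical dimension

# Synthetic check: pure Gaussian noise with sigma = 10 on a 16 x 48 array.
rng = random.Random(0)
frames = [[[rng.gauss(0.0, 10.0) for _ in range(48)] for _ in range(16)]
          for _ in range(7)]
valid = [[True] * 48 for _ in range(16)]
expected = 10.0 / sqrt(48)
temporal = nesr_temporal(frames, valid)
horizontal = nesr_horizontal(frames[0], valid)
```

For pure, uncorrelated noise both estimates converge to the same value. In real data the horizontal estimate additionally picks up calibration noise and scene inhomogeneity, which the synthetic example does not model.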
Results from both methods plotted against wavenumber are compared in Fig. 16. These noise spectra were computed for deep space measurements processed in the full 0.0625 cm −1 spectral resolution of the high spectral resolution mode as well as the reduced 0.625 cm −1 of the high spatial resolution mode, which we also use for blackbody measurements. The NESR spectra derived from horizontal variation are 10% and 5% higher than those derived from temporal variation. This is partially expected as calibration noise and calibration inaccuracies only contribute in the horizontal variation analysis. We also observe additional structures in the ozone 1000 cm −1 band, which we associate with imperfections in the shaving. Locally enhanced values in the NESR are caused by electrical disturbances.

The highly resolved NESR spectra are 3.14 and 3.03 times higher than those derived from low resolution data for the time and horizontal method, respectively. This is reasonably close to the expected factor of √ 10 such that the NESR can be estimated from measurements with either resolution and scaled to any different resolution employed in atmospheric observations.
Employing the second method, we can now produce an NESR estimate for all of our atmospheric measurements, allowing us to closely track instrument performance over the whole flight. Figure 17 shows an example of such an analysis for a set of roughly 200 evenly distributed measurements from one flight. Figure 17a depicts an averaged NESR spectrum; only spectra determined as fully cloud free (having a cloud index of more than 6 according to Spang et al., 2012) are included in this average. Figure 17b shows the evolution of noise over time as a pseudocolor plot. The NESR is spectrally averaged in the range from 750 cm −1 to 1450 cm −1 . The beginning of the flight before 14:00 shows increased NESR values, which can be attributed to the higher blackbody and outer window temperatures. The higher blackbody temperatures at the beginning of the flight require a shorter integration time for the calibration measurements, which are thus subject to a higher NESR, affecting the calibration quality. The high values in the lower part of the detector array are due to clouds, which lead to spatial inhomogeneities and thus to an overestimation of the NESR. The last panel (Fig. 17c) shows the noise of individual rows averaged over cloud free measurements. The lowermost rows are missing since no cloud-free spectra were available for analysis. Some high values due to unidentified small-scale clouds remain in the lowermost rows with data. At higher altitudes one can see that the NESR is not uniform over all rows due to the read-out electronics and the uneven distribution of filtered pixels. Typically, all flights of a campaign exhibit very similar NESR characteristics, but from campaign to campaign, we can observe small variations such as a generally increased NESR value due to changes of the instrument or its operation parameters.
From this and analysis of other flights, we derive a typical representation of the NESR with 5 nW cm −2 sr −1 cm for the wavenumber range between 880 cm −1 and 1300 cm −1 and 8 nW cm −2 sr −1 cm outside this range for a spectral resolution of 0.625 cm −1 . The figures scale with √ 3.125 for the spectral resolution of 0.2 cm −1 and √ 10 for the spectral resolution of 0.0625 cm −1 (see Table 4).
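The scaling rule quoted above can be written as a short helper. This is a sketch only: the function name is hypothetical, and treating the band limits 880 cm −1 and 1300 cm −1 as inclusive is an assumption, since the text only says "between".

```python
from math import sqrt

REFERENCE_RESOLUTION = 0.625  # cm^-1, resolution at which the typical NESR is quoted

def typical_nesr(wavenumber, resolution):
    """Typical NESR in nW cm^-2 sr^-1 cm.

    wavenumber: spectral position in cm^-1
    resolution: spectral resolution in cm^-1 (0.625, 0.2 or 0.0625 in the text)
    """
    # Assumption: the band limits are treated as inclusive.
    base = 5.0 if 880.0 <= wavenumber <= 1300.0 else 8.0
    # Noise scales with the square root of the resolution ratio,
    # i.e. sqrt(3.125) for 0.2 cm^-1 and sqrt(10) for 0.0625 cm^-1.
    return base * sqrt(REFERENCE_RESOLUTION / resolution)

nesr_high_res = typical_nesr(1000.0, 0.0625)  # 5 * sqrt(10)
nesr_low_res_edge = typical_nesr(800.0, 0.625)  # 8, outside the central band
```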

Gain accuracy
This section estimates the stability of the gain magnitude from in-flight data. During the calibration, we compute a gain magnitude for each calibration sequence containing a pair of blackbody and deep space measurements, each giving an independent gain estimate (see Sect. 3). These estimates are averaged in a succeeding step to reduce the impact of measurement noise. The variability of these gain magnitudes gives us an uncertainty estimate by computing the standard deviation of the gain magnitude for all flights and all pixels. The median standard deviation over all pixels is shown in Fig. 18 for several flights of the WISE campaign. The resulting accuracy is better than 0.1% and thus well within our target range of 1%. One can see an increased uncertainty towards the edges of the usable wavenumber range, which is caused by the decreased sensitivity of the detector and in consequence a lower SNR. Towards lower wavenumber regions, one can also see a spectral structure which resembles the window emission feature shown in Fig. 6; this is likely caused by small temperature variations of the window during the rather long deep space measurements. Around 1050 cm −1 and 1300 cm −1 an increased uncertainty at the location of strong ozone and methane emissions is observed, which is most probably due to imperfections in the shaving of deep space spectra. While these estimates show the uncertainty of an average single detector pixel, we also use them to describe the accuracy of horizontally averaged data, because some of the underlying errors are strongly correlated (such as the impact of atmospheric window emissions). We pick the highest uncertainty of the flights to have a single value for analysis as a worst-case assumption (max).
In addition to these variations, our gain might be subject to a systematic error common to all measurements. There is a range of potential sources for such a systematic error, for example, inaccurate blackbody temperatures, uncorrected detector non-linearity, or shaving defects. To gain an independent estimate of the absolute accuracy of the gain magnitude, we turn to atmospheric measurements and compare the calibrated spectra with a Planck curve of ambient temperature in a wavenumber range which is located within the optically thick ozone Q-branch from 1050 cm −1 to 1056 cm −1 . We selected only the atmospheric profiles for which ECMWF indicated an ozone volume mixing ratio above 300 nmol mol −1 to ensure sufficient optical thickness and thus prac- there is no gain bias in GLORIA data. As both offset and gain errors as well as the inherent uncertainty in ECMWF temperatures would be reflected in this analysis, it is difficult to quantify exactly how accurate the gain is. Instead, we can give an upper bound of 2%, which fits well to our threshold requirement . For level 2 error estimates, we assume a general gain magnitude error of 1% , which obviously might imply higher or lower errors in individual flights.
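The comparison against a Planck curve can be sketched as below. The Planck function in wavenumber form is standard physics; the function names, the use of a simple mean ratio as the bias measure, and the synthetic spectrum are assumptions made for illustration and do not reproduce the actual analysis.

```python
from math import exp

H = 6.62607015e-34    # Planck constant, J s
C_CM = 2.99792458e10  # speed of light, cm s^-1
K_B = 1.380649e-23    # Boltzmann constant, J K^-1

def planck(wavenumber, temperature):
    """Planck radiance in nW cm^-2 sr^-1 cm for wavenumber (cm^-1) and temperature (K)."""
    c1 = 2.0 * H * C_CM ** 2   # W cm^2 sr^-1
    c2 = H * C_CM / K_B        # cm K
    radiance_w = c1 * wavenumber ** 3 / (exp(c2 * wavenumber / temperature) - 1.0)
    return radiance_w * 1e9    # W -> nW

def relative_gain_deviation(measured, wavenumbers, ambient_temperature):
    """Mean of (measured / Planck) minus one over the given spectral samples."""
    ratios = [m / planck(nu, ambient_temperature)
              for m, nu in zip(measured, wavenumbers)]
    return sum(ratios) / len(ratios) - 1.0

# Synthetic example: an optically thick spectrum with an artificial 1 % gain error.
wavenumbers = [1050.0 + 0.625 * i for i in range(10)]
measured = [1.01 * planck(nu, 220.0) for nu in wavenumbers]
deviation = relative_gain_deviation(measured, wavenumbers, 220.0)
```

In real data the deviation mixes gain errors, offset errors, and the uncertainty of the assumed ambient temperature, which is why the text derives only an upper bound from this comparison.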

Offset accuracy
The instrumental gain and offset are determined using a direct measurement of the deep space background. Thus, we expect only small, correlated errors due to measurement noise and imperfect shaving of atmospheric contributions.
In order to quantify these errors in offset, we analysed special in-flight measurements, where the elevation angle of the optical axis of GLORIA alternated between -0.38° and -1.00° for 12 consecutive images. We interpolated each profile to an evenly spaced elevation axis, computed differences between succeeding measurements and finally averaged the differences. The result is depicted in Fig. 19. Within the overlapping range, we find a difference of (-1.7 ± 5.3) nW cm −2 sr −1 cm between the different pitch angles. Although statistically not significant, the higher-pointing measurements are colder, which would be consistent with warm stray light from below affecting the measurements. The available measurements do not allow for a full quantification of the effect, though. Analysing the discrepancies in more detail reveals spectrally and spatially correlated structures of up to 6 nW cm −2 sr −1 cm magnitude above 900 cm −1 increasing towards 16 nW cm −2 sr −1 cm below (Fig. 19b). The differences around 1000 cm −1 and other strong emission features may partially be caused by errors in gain, not offset. The magnitude of the difference is similar for both the upwards and downwards pointing pixels. This lends weight to the hypothesis that it is largely caused by an offset error, as the measured radiances are much higher at lower pixels, which would lead to higher absolute differences in case of gain errors. Due to the various smoothing methods employed in generating the offset calibration data, we also expect both spatial and spectral correlation. We assume here a visually estimated correlation length of 10 pixels vertically and 50 cm −1 spectrally and average the magnitude to 10 nW cm −2 sr −1 cm. This error is separate from the systematic uncertainty from shaving.
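The interpolate, difference, and average procedure can be sketched for one spectral sample as follows. Function names, the ordering convention (even indices are the higher-pointing images), and the synthetic profiles are assumptions for illustration only.

```python
from bisect import bisect_right
from statistics import mean

def interpolate(x, xp, fp):
    """Linear interpolation of fp(xp) at x; xp ascending, x within its range."""
    i = min(max(bisect_right(xp, x) - 1, 0), len(xp) - 2)
    weight = (x - xp[i]) / (xp[i + 1] - xp[i])
    return fp[i] + weight * (fp[i + 1] - fp[i])

def mean_pitch_difference(images, grid):
    """Average difference between succeeding images on a common elevation grid.

    images: list of (elevations, radiances); even indices are assumed to be
            the higher-pointing images, odd indices the lower-pointing ones.
    Returns higher-pointing minus lower-pointing radiance per grid point.
    """
    on_grid = [[interpolate(e, elev, rad) for e in grid] for elev, rad in images]
    differences = []
    for k in range(len(on_grid) - 1):
        sign = -1.0 if k % 2 == 0 else 1.0  # keep the sign convention fixed
        differences.append([sign * (b - a)
                            for a, b in zip(on_grid[k], on_grid[k + 1])])
    return [mean(column) for column in zip(*differences)]

# Synthetic example: linear radiance profile, higher-pointing images 2 units colder.
def make_image(axis_elevation, bias):
    elevations = [axis_elevation - 2.0 + 0.5 * i for i in range(9)]
    radiances = [100.0 - 50.0 * e + bias for e in elevations]
    return elevations, radiances

images = [make_image(-0.38, -2.0) if k % 2 == 0 else make_image(-1.00, 0.0)
          for k in range(12)]
grid = [-2.0 + 0.5 * i for i in range(7)]  # -2.0 ... 1.0, inside the overlap
result = mean_pitch_difference(images, grid)
```

Because the same atmospheric elevation is seen by different detector rows in the two pointing configurations, a non-zero result points at an instrumental rather than an atmospheric origin.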

The spectral axis of our level 1 data depends on the proper association between taken images and optical path difference. We perform an off-axis correction and characterize the laser wavelength as described by Kleinert et al. (2014), using the deep space measurements. These are cloud free and taken periodically in high spectral resolution mode.
While the geometric parameters, which determine the off-axis angle, stay constant during one flight, we found that the laser wavelength sometimes varies significantly, e.g., because of temperature drifts and resulting laser mode hops. In order to obtain a better temporal resolution of the evolution of the laser wavelength, we modified our spectral calibration algorithm to determine only the laser wavelength from otherwise off-axis corrected calibrated atmospheric spectra. This allows us to do quality checks on calibrated and horizontally averaged level 1 data and to better quantify our uncertainty. For each atmospheric measurement, all pixels unaffected by clouds are averaged, Norton-Beer weak apodization instead of the usually employed Norton-Beer strong apodization (Norton and Beer, 1976, 1977). whereas the 0.625 cm −1 spectra have large errors of about 5-10 ppm (i.e. a relative error in wavenumber knowledge of less than 0.001%) and are thus only useful for a qualitative analysis (to detect large errors). For this flight, the spectral accuracy is in the order of 1 ppm on average. Using the 0.2 cm −1 data, the accuracy is still diagnosed to be better than 2 ppm. This is much better than the original target accuracy of 10 ppm. Applying this technique to all flights of the WISE campaign, we estimate the spectral accuracy to be in the order of 2 ppm. The same accuracy is also valid for other campaigns from PGS onward with the exception of a few flights subject to known technical issues. To account for additional variations over the detector and to include outliers, we use an accuracy estimate of 5 ppm (see Table 4). This corresponds to about one tenth of the smallest employed spectral sampling distance of 0.0625 cm −1 at the largest wavenumber of 1 400 cm −1 .
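The closing statement can be checked with a short calculation. The helper name is hypothetical; the numbers are those quoted in the text.

```python
def shift_and_fraction(accuracy_ppm, wavenumber, sampling):
    """Convert a relative spectral accuracy into an absolute shift.

    accuracy_ppm: relative wavenumber accuracy in parts per million
    wavenumber:   spectral position in cm^-1
    sampling:     spectral sampling distance in cm^-1
    Returns (shift in cm^-1, shift as a fraction of the sampling distance).
    """
    shift = accuracy_ppm * 1e-6 * wavenumber
    return shift, shift / sampling

shift, fraction = shift_and_fraction(5.0, 1400.0, 0.0625)
```

With these inputs the shift is 0.007 cm −1 , i.e. roughly 11 % of the sampling distance, consistent with "about one tenth".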

Point spread function
The point spread function (PSF) determines the amount and direction of incoming light being measured by the detector pixels. The theoretical shape for a diffraction limited instrument is the Airy disk:

$$a(r) = \begin{cases} \left( \dfrac{2\, J_1(\pi \nu D r)}{\pi \nu D r} \right)^{2} & \text{for } r > 0, \\ 1 & \text{for } r = 0, \end{cases}$$

with J 1 being the Bessel function of first kind, wave number ν, aperture D and distance to the optical axis r.
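The Airy pattern of an infinitesimal pixel can be evaluated with the sketch below, which uses the standard textbook form of the diffraction pattern of a circular aperture. Interpreting r as an angular distance in radians is an assumption, as is the evaluation of the Bessel function via Bessel's integral instead of a library routine. The integration over the finite pixel extent is not included.

```python
from math import cos, sin, pi

def bessel_j1(x, steps=400):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt,
    evaluated with the trapezoidal rule."""
    h = pi / steps
    total = 0.5 * (cos(0.0) + cos(pi - x * sin(pi)))
    total += sum(cos(k * h - x * sin(k * h)) for k in range(1, steps))
    return total * h / pi

def airy(r, wavenumber, aperture):
    """Airy pattern for angular distance r (rad, assumed), wavenumber (cm^-1)
    and aperture diameter (cm); normalised to 1 on the optical axis."""
    x = pi * wavenumber * aperture * r
    if x == 0.0:
        return 1.0
    return (2.0 * bessel_j1(x) / x) ** 2

# First dark ring: first zero of J1 (3.8317...) divided by pi * nu * D,
# here for 1000 cm^-1 and an aperture of 3.6 cm.
first_null = 3.8317059702 / (pi * 1000.0 * 3.6)
on_axis = airy(0.0, 1000.0, 3.6)
at_null = airy(first_null, 1000.0, 3.6)
halfway = airy(0.5 * first_null, 1000.0, 3.6)
```

A smaller aperture moves the first dark ring outwards, i.e. widens the PSF.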

This function defines the PSF for a pixel of infinitesimal extent and needs to be integrated over the pixel size to determine the actual PSF for the pixel. In the horizontal direction, because of our averaging over full rows, we can assume effectively infinite extent with only a small error, but vertically the detector pixel width needs to be considered.
The optical aperture diameter is 3.6 cm. Figure 21 shows several PSF functions for an aperture of 3.2, 3.6, and We determine the PSF from extinction values from level 2 retrieval results. In the presence of optically thick cloud tops, artifacts will appear in retrieved extinction values if the simulated PSF is different from the real one.
With a PSF that is too wide, the modeled radiance above cloud top is overestimated and, accordingly, unnaturally small extinction values are generated in order to properly simulate the measured radiances. With a PSF that is too narrow, the expected sharp step in the extinction profile at the cloud top is replaced by a more gradual increase in the transition region. This is exemplified in the extinction profile in Fig. 22 retrieved with different aperture sizes based on the retrieval approach discussed by Ungermann et al. (2020). For a PSF with an assumed aperture smaller than 3.6 cm, extinction drops even below the background values found at higher altitudes above the cloud top. For this specific profile, a sensible extinction profile can only be derived from an aperture of 3.6 cm upwards.

While the range of admissible apertures varies from profile to profile, all derived extinction profiles were plausible for an aperture of 3.6 cm. The same experiment was performed at a different atmospheric window at 1214 cm −1 .
This shows similar results for the corresponding PSF, so the retrieval results are consistent with the expected value from instrument design. From the different characterization methods and the variability of the results, we assume an uncertainty of 10% in the width of the PSF.

6 Analysis of impact on level 2 data
The comprehensive characterization of leading level-1 errors from in-flight data is a solid basis to revisit our level 2 results and estimate the impact of these errors on our derived quantities.
The retrieval of geolocated physical quantities from limb measurements is an ill-posed non-linear problem. Using a forward model simulating the radiative transfer, one adjusts geophysical quantities such as temperature or trace gas volume mixing ratios until the simulated radiances agree with the measured ones within expectation (e.g., Rodgers, 2000). For efficiency, we typically apply a linearized Gaussian error analysis after deriving an atmospheric profile (e.g., Johansson et al., 2020). Here we use a simpler approach that also allows the quantification of all errors estimated in this paper, including those for which the forward model does not offer the necessary derivatives.

For brevity, we examine only two quantities. We selected temperature as it is the fundamental quantity necessary for all further retrievals and ozone as one of the most commonly retrieved trace gases. The corresponding retrieval setups are described in Appendix F. We assume here that the given errors, with exception of the NESR, are systematic and affect all measurements of one flight in the same way. We picked flight 10 (7th October 2017) of the WISE campaign for this exercise as it offers many profiles reaching deep into the troposphere.

- To examine the effect of NESR, we applied independent Gaussian noise of 14.4 nW cm −2 sr −1 cm to all samples (the employed spectral resolution of the level 1 data is 0.2 cm −1 ).
- The effect of the deep space shaving error is estimated by adding 20 nW cm −2 sr −1 cm to all measurements.
- The effect of the detector non-linearity is estimated by generating a Gaussian distributed gain error with mean 0 % and standard deviation 0.2 % for each pixel and modifying the radiances accordingly.
- The offset error has a spatial structure; to best capture this, we simply apply the difference from Fig. 19a to each measurement of the flight.
- For examining the total gain error, we modified the gain in a dedicated level 1 run with a consistent 1% error (-1% gives similar results, but of opposite sign).
- We modified the LOS by 0.032° for the flight (-0.032° gives similar results, but of opposite sign).
- For estimating the effect of PSF width uncertainty, we examined a retrieval with a PSF width reduced by 10% (an increased width behaves quantitatively similarly).
- Last, for examining the effect of a spectral shift, we applied a shift of 5 ppm in a dedicated level 1 run by modifying the employed laser wavelength by 5 ppm (-5 ppm behaves quantitatively similarly).
We then performed temperature retrievals based on the CO 2 laser lines around 950 cm −1 of the whole flight for 191 uniformly spaced profiles (our standard selection for performing test retrievals for this flight) and computed the mean and standard deviation of the difference to an unperturbed reference run. The results are collected in Table 5.
Several of these errors have a noticeable structure in altitude, which we neglect (average away) here to simplify the discussion.
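The structure of this perturbation study can be sketched as follows. The retrieval is replaced by a purely hypothetical stand-in function, only three of the listed perturbations are included, and all names and test spectra are illustrative assumptions; the real analysis uses the retrieval setups of Appendix F.

```python
import random
from statistics import mean, stdev

def perturb(radiances, kind, rng):
    """Apply one level 1 perturbation to radiances (nW cm^-2 sr^-1 cm)."""
    if kind == "nesr":      # independent Gaussian noise per sample
        return [r + rng.gauss(0.0, 14.4) for r in radiances]
    if kind == "shaving":   # constant additive offset
        return [r + 20.0 for r in radiances]
    if kind == "gain":      # consistent 1 % gain error
        return [r * 1.01 for r in radiances]
    raise ValueError(f"unknown perturbation: {kind}")

def impact(retrieve, spectra, kind, seed=0):
    """Mean and standard deviation of perturbed minus reference results."""
    rng = random.Random(seed)
    differences = [retrieve(perturb(s, kind, rng)) - retrieve(s) for s in spectra]
    return mean(differences), stdev(differences)

def toy_retrieval(radiances):
    """Hypothetical stand-in for a retrieval: 0.01 K per unit of mean radiance."""
    return 0.01 * mean(radiances)

# Twenty synthetic "profiles" with fifty spectral samples each.
spectra = [[1000.0 + 10.0 * p + s for s in range(50)] for p in range(20)]
shaving_mean, shaving_std = impact(toy_retrieval, spectra, "shaving")
gain_mean, gain_std = impact(toy_retrieval, spectra, "gain")
noise_mean, noise_std = impact(toy_retrieval, spectra, "nesr")
```

Systematic perturbations show up mainly in the mean of the differences, whereas the random NESR contribution shows up in their standard deviation.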
The instrument pointing has the largest impact with a mean error of ≈1 K. The error is smaller close to flight level and increases towards lower altitudes. Due to the typically employed LOS retrieval based on ECMWF temperatures, differences to ECMWF are typically smaller than this (Johansson et al., 2018), but 1 K seems a reasonable estimate for the absolute accuracy of the temperature product. The next leading error sources are related to the shaving-induced offset error and gain uncertainty. Both are in the order of 0.25 K. The other errors are an order of magnitude smaller.

The exemplary trace gas retrieval for O 3 derives volume mixing ratios from emissions of the 1 000 cm −1 ozone band. The results are here similar to those of the temperature retrieval. First, differences in temperature induce an error in themselves and, second, each error source also affects the ozone retrieval directly. Due to the inclusion of the strong emissions at 980 cm −1 , the gain error affects this retrieval especially strongly. All errors are smaller than 5 %, which is often a reasonable accuracy estimate simply due to given uncertainties in spectral line strength and other modelling
The GLORIA instrument has been operated for more than ten years now. In this time, we have learned about the new challenges posed by 2-D FTIR imaging observations in general. This enabled us to make full use of all calibration data, to develop methods to statistically use atmospheric observations for in-flight validation and to tackle a number of known instrument artifacts, several of which are specific to the 2-D concept. This mitigates some of the leading error sources in previous data versions such as detector non-linearity and pointing.
In this study we have collected the matured state of our calibration and level 1 processing, which forms the basis for further level 2 processing. We have exploited all three available calibration sources (the two blackbodies and the deep space measurements) to correct for the non-linearity of our detector, which can change erratically between different cold-runs of the detector. This fully solves a problem only partially addressed by previous efforts (Guggenmoser et al., 2015). The new correction allows us to use more of the available pixels, thus reducing the NESR of row-averaged spectra.
The described algorithm is able to track the noise level of measurements during the flight to measure the impact of, e.g., warm blackbodies with short integration times on the calibrated data and identify technical issues. The computed noise level of 5 nW cm −2 sr −1 cm for 0.625 cm −1 is within our instrument specification.
We determine the line of sight calibration by both immediate measurements of celestial bodies and a level 2 product employing ECMWF temperatures. This allows us to place a tight bound of 0.032° on its accuracy, which corresponds to one vertical pixel.
Further, we leveraged different in-flight atmospheric measurements to get direct bounds on the accuracy of instrument gain and offset. The gain is accurate within 1%, whereas the offset is subject to a potential bias of up to 30 nW cm −2 sr −1 cm.
These aggregated errors allow us to propagate the error assumptions to level 2 products and identify the leading errors. The analysis showcases that the largest contributor of uncertainty is the line of sight calibration, which is already reduced to the order of ±1 pixel and thus at the limit of what we can achieve with the given instrument; it also matches the initial design specification (Friedl-Vallon et al., 2006). This uncertainty is closely followed by the uncertainty in total gain and offset. Still, both errors are within our initial instrument target thresholds  and their impact on level 2 data is comparable to other level 2 error sources such as spectroscopic uncertainties. This will allow us to scientifically exploit the data as intended to gain new insights into the chemistry and dynamics of the UTLS region of the atmosphere. It also demonstrates the capabilities of this instrument concept in general and the options an imaging detector offers to increase the data quality.

Data availability. GLORIA level 1 data are available on request. GLORIA level 2 data are accessible via the (HALO database).
The NOAA data are available from the NOAA websites (Dlugokencky and Tans, 2020; Dlugokencky, 2020; NOAA).
spectra and the reconstruction of the forward calculated and gives an estimate for the uncertainty of this method.
The spectra which were calibrated based on the blackbody-blackbody measurements still show a significant broadband offset which cannot be attributed to the atmosphere. Although the origin of this offset is not completely explained, it has turned out that it can be well described by a Planck function. This Planck-shaped offset is determined and subtracted in a first step. For this purpose, the offset is determined at 5 spectral microwindows as listed in Table 2, and a Planck function is fitted to these 5 points with temperature and emissivity as fit parameters. In order to obtain more stable fit results, a scale is also fitted to the data in order to compensate for non-perfect VMRs in the a priori profiles. Furthermore, a spectral shift is fitted in each microwindow, from which a mean shift over the whole spectral range from 750 to 1 400 cm −1 is deduced. This shift is used for all further retrievals. Figure A1 shows an exemplary median deep space spectrum, calibrated with the two blackbody measurements, the offset values determined in the 5 microwindows and the fitted Planck function. All further plots in this section also refer to the same measurement.
In the next step, a broadband fit is performed in order to describe the measured spectrum. In total, 29 gases are considered in the forward calculation. The result of this fit is shown in Fig. A2. In general, the spectrum is well represented by the fit, and the residual is mostly below 20 nW cm −2 sr −1 cm. There are, however, some remaining atmospheric features of, e.g., HNO 3 around 850 to 900 cm −1 , CO 2 below 800 and around 950 cm −1 , O 3 around 1 050 cm −1 , and N 2 O and CH 4 at higher wavenumbers. In order to reduce these residuals, several species are fitted again in dedicated microwindows, while the fit results of all species from the broadband fit are used as new a priori profiles. For some species, it has turned out that dependencies of interfering species have to be considered. For example, a simultaneous fit of CFC-11, HNO 3 and CFC-12 in the spectral window from 830 to 920 cm −1 did not give satisfactory results, while a fit of CFC-11 and CFC-12 in a first step, followed by a fit of HNO 3 with the fit results for CFC-11 and CFC-12 as new a priori profiles, considerably reduced the residual. For some species (namely O 3 , N 2 O and CH 4 ) it was not possible to represent the whole spectral range with one profile. This can be explained with inconsistencies in spectroscopic data (e.g., Glatthor et al., 2018) or different temperature dependencies. Therefore, different profiles were fitted for different spectral ranges. The different species and microwindows for these iterations are also given in Table 2. In these iterations, an offset is also fitted for each MW to improve the fit result. This offset is then discarded.
are attributed to the emission of the germanium entrance window, which is not fully compensated by the calibration using the two blackbodies, because the window correction (see Sect. 4.3) could not be applied to these data, due to a malfunction of the temperature sensor.
Therefore, this difference is rather seen as an error in calibration using the two blackbodies and is not attributed to the uncertainty of the instrument offset determination. From Fig. A4, a 1-sigma uncertainty of 10 nW cm −2 sr −1 cm is derived for the whole spectral range. For the estimation of the systematic uncertainty due to the instrument offset determination, we take the 2-sigma uncertainty, i.e., 20 nW cm −2 sr −1 cm.

Appendix B: Pixel-wise non-linearity factor determination
This section describes the computation of pixel-specific non-linearity correction factors in detail.
The voltage U ∈ R measured by a detector pixel is a monotonic function f : R → R of incoming radiation P .
Most of the non-linearity has been characterized and is corrected for in an early processing step in the level 0 processing (Kleinert et al., 2014), such that we have here mostly linearly behaving pixels with small variations on top. With x ∈ R being the sled position, the incoming radiation in an interferogram varies around the mean P 0 ∈ R such that one can linearize f at P 0 :

$$U(x) = f(P(x)) \approx f(P_0) + f'(P_0)\,\bigl(P(x) - P_0\bigr).$$

For the Fourier-transformation S of the interferogram U follows:

$$S = \mathcal{F}(U) \approx f'(P_0)\,\mathcal{F}(P) + \mathcal{F}(c),$$

with c being a constant function influencing only the zeroth frequency. Neglecting higher-order effects, this means that the uncalibrated raw spectrum is scaled with the slope of the tangent of f at the mean level P 0 .
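The scaling statement can be demonstrated numerically with a toy interferogram. The quadratic response function, its coefficient, and all names below are hypothetical choices for illustration; they do not describe the real GLORIA detector.

```python
from cmath import exp
from math import cos, pi

def dft_bin(samples, k):
    """Single bin k of the discrete Fourier transform of a real sequence."""
    n = len(samples)
    return sum(s * exp(-2j * pi * k * i / n) for i, s in enumerate(samples))

def response(p, eps=1e-3):
    """Hypothetical, slightly non-linear detector response f(P) = P - eps * P^2."""
    return p - eps * p * p

# Toy interferogram: a single cosine of small amplitude around the mean level P0.
N, K, P0, AMPLITUDE = 64, 5, 100.0, 1.0
flux = [P0 + AMPLITUDE * cos(2.0 * pi * K * i / N) for i in range(N)]
voltage = [response(p) for p in flux]

slope = 1.0 - 2.0 * 1e-3 * P0  # f'(P0) for the response above
ratio = abs(dft_bin(voltage, K)) / abs(dft_bin(flux, K))
```

For this response the quadratic term only feeds the zeroth and the doubled frequency, so the spectral line at bin K is scaled by the tangent slope, here 0.8.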
As the uncorrected non-linearity only changes slowly with P 0 , this implies that measurements with similar P 0 at the same exposure time will be subject to an effectively identical scaling factor. This applies to optically thin atmospheric and deep space measurements, which are taken at the same exposure time. Problematic are the blackbody measurements, which require a much shorter exposure time and are subject to a very different photon load, or atmospheric measurements affected by strong cloud emissions.
In the final calibration step, we deduce the instrumental gain g from one deep space and one blackbody measurement (see Sect. 3). To derive a correct gain, the different slope induced by the different photon load and exposure time of the blackbody measurements needs to be corrected (our data analysis shows that the deep space measurement is effectively similar in behavior to the atmospheric cloud-free measurements), such that one only needs to determine one linear correction factor α for each detector pixel for a given blackbody measurement S bb , as there is no spectral variation (under the assumption of no effects of higher order). With this linear scaling factor, a corrected gain g * can be computed as: Here, S bb and S ds refer to the raw spectra of the cold blackbody and the shaved deep space measurement, respectively. u and v are the integer coordinates of the detector pixel and ν relates to a spectral sample.
To estimate the effect of the correction on the calibrated spectrum, one may express the corrected spectrum in terms of the uncorrected one: As the signal S bb is much stronger than S ds , the non-linearity is effectively causing a scaling of the calibrated spectrum. For our typical blackbody and deep space spectra, an error of 5% in non-linearity causes an error in gain between 4% and 5.2%, varying within this range both spatially and spectrally. These errors scale nearly linearly in the value range observed for our detector. The error assumes its largest values in the spectral regions where the atmospheric window (see Sect. 4.3) has the strongest emission features and in the centre of the detector, where instrument emission is smallest.
To identify the individual non-linearity, we exploit the fact that the instrumental offset L o must be spatially smooth. As such, we define for each detector pixel (u, v) a cost function J u,v with S uds being the "unshaved" deep space measurement to determine the correction factors α pixel by pixel.
The L o generated by the shaved deep space is close to zero for many pixels in the centre of the detector array, such that measurement noise prevents us from reliably determining the correction factors from the shaved measurements alone. The ozone band at 1000 cm −1 present in the unshaved deep space spectra is a sufficiently strong signal to reliably determine α for all pixels of the detector. The second term is a simple ad-hoc regularisation term to make the problem numerically well-behaved. We found that λ = 0.1 works well for our purposes. In addition, we excluded all ν, for which the signal generated by the calibrated unshaved deep space is smaller than 200 nW cm −2 sr −1 cm.
Otherwise, the noise error is amplified too much by the division in the cost function. We thus restricted the spectral range to 900 cm −1 to 1200 cm −1 . Further, we average the spectral samples over 10 cm −1 wide bins to reduce the number of involved measurements and thus the computational effort needed for the fit.

For the function smooth, in principle any function capable of smoothing a 2-D grayscale image and especially of removing shot noise would be suitable (such as a median filter). We found it useful and more efficient to fit a 2-D polynomial of 20th degree to the 2-D field, remove outliers from the fit, and refit the polynomial in a second step to generate a smooth field. Thus, we get a continuous and smooth (in the mathematical sense) 2-D function with known properties. An example of the smoothing is shown in Fig. A5: panel a shows the real part of the "unshaved" deep space spectra divided by the uncorrected gain for one wavenumber. This is effectively the sum of the calibrated spectrum and the instrumental offset. With the continuous colour scale, one can already see clusters of pixels with elevated radiance values compared to their surroundings. The noisiness becomes much more apparent in Fig. A5b, where a wrapping colour scale is employed. The effect of smoothing the image with the two-step 2-D polynomial fit is shown in Fig. A5c. The polynomial tends to overshoot at the left edge of the display, where almost all pixels are unstable due to the read-out electronics. As such, the leftmost two columns are typically discarded by default in the last processing steps.
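A two-step polynomial fit of this kind could look roughly like the following Python sketch. It is not the implementation used for GLORIA. The function name `smooth_poly2d`, the Legendre tensor-product basis, the interpretation of "20th degree" as the degree per axis, and the outlier criterion (a multiple of a robust scatter estimate) are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def smooth_poly2d(field, degree=20, n_sigma=3.0):
    """Fit a 2-D polynomial, reject outliers, refit (hypothetical sketch).

    Returns the smooth field and a boolean mask of the pixels that were
    used in the second fit.
    """
    nv, nu = field.shape
    # Map pixel indices to [-1, 1], where Legendre polynomials are well conditioned.
    u = np.linspace(-1.0, 1.0, nu)
    v = np.linspace(-1.0, 1.0, nv)
    uu, vv = np.meshgrid(u, v)
    # Design matrix of all products P_i(u) * P_j(v) with i, j <= degree.
    design = legendre.legvander2d(uu.ravel(), vv.ravel(), [degree, degree])
    values = field.ravel()

    # First pass: least-squares fit to all pixels.
    coeff, *_ = np.linalg.lstsq(design, values, rcond=None)
    residual = values - design @ coeff

    # Assumed outlier criterion: robust scatter from the median absolute deviation.
    sigma = 1.4826 * np.median(np.abs(residual - np.median(residual)))
    good = np.abs(residual) <= n_sigma * sigma

    # Second pass: refit using only the pixels that passed the outlier test.
    coeff, *_ = np.linalg.lstsq(design[good], values[good], rcond=None)
    return (design @ coeff).reshape(field.shape), good.reshape(field.shape)
```

The orthogonal basis is chosen here only because a monomial basis becomes poorly conditioned at high degree. The fitted surface is the same polynomial either way, up to numerical error.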

Appendix C: Description of the pointing system
The attitude is determined by an inertial measurement unit with laser gyroscopes from Honeywell (HG9900) that is mounted on the gimbal yaw frame. This attitude data is combined with information from a Novatel GPS using Control. This inertial measurement and stabilization control unit (iIMCU) was developed by iMAR Navigation GmbH, Germany.
This pointing system allows several control modes that are connected to the observation scenarios needed for GLORIA. For atmospheric measurements with high spatial resolution, the control system switches to Target mode, which works like a point-and-stare mode. During the acquisition of one interferogram, the LOS of one defined pixel is stabilized to a specific point defined in WGS84 coordinates. Then, step-by-step, the azimuth is turned by a certain angle, and the next point is stabilized. The coordinates are calculated using predefined patterns with a gimbal yaw angle step size of typically 4° to 8° and a dwell time of several seconds, depending on the configured spectral resolution of the interferometer.

For atmospheric measurements with high spectral resolution, the Altitude-Azimuth mode is used. FOV stabilization is focused on a given tangent altitude and azimuth in the WGS84 system for a specified pixel. Contrary to the Target mode, the horizontal movement of the aircraft is not compensated, but the azimuth is kept constant during the acquisition of one interferogram, leading to a slight horizontal smearing of the tangent point. In both modes, elevation is permanently adapted to keep the tangent altitude of a specified pixel constant, even during variations of the flight altitude.
The Nadir mode also acts as a point and stare mode like the Target mode, but is designed to stabilize on tracing points on the ground through an aperture in the bottom of the belly pod.
Deep space calibration measurements are performed with an elevation of 10° in the Angle mode of the pointing system with stabilization of constant elevation and azimuth angles defined in WGS84.

Blackbody calibration measurements are performed in Gimbal mode with fixed positions of the gimbal mount in order to point to one of the two internal blackbodies.
All pointing changes are synchronized to the short breaks of interferogram acquisition when the slide in the interferometer changes direction in order to avoid scene jumps in the interferogram data.
Appendix D: Determining the moon position
To determine the position of the moon in relation to the instrument with high accuracy, we make use of the JPL ephemeris data in version DE421. We use the Python software package "skyfield" to access the JPL data and determine the apparent altitude and azimuth at the instrument position including relativistic effects and aberration (Rhodes, 2019). Even though the moon measurements were taken at 12 km altitude and above, the refraction of the atmosphere cannot be neglected, especially as some moon measurements were taken below the horizon. The skyfield package computes refraction, but only for elevations down to −1°, a reasonable value at ground level, but insufficient for our purposes. Lacking a readily available formula for computing the refraction under the given conditions, we determined the effect of refraction by a straightforward ray-tracing algorithm described below.
Earth was assumed to be spherical, and the ray was traced in a plane with polar coordinates $(r, \theta)$ with the origin at the centre of the Earth. The refractive index $n(r)$ was computed using atmospheric profiles for temperature and pressure taken from ECMWF analysis data at the instrument position. Let the ray be described by $r = r_r(\theta)$, and let $\varphi(r_r(\theta), \theta)$ be the angle between the ray and the vertical. We back-trace the ray from the instrument at position $(r_0, 0)$ with $\varphi = \varphi_0$. In the absence of refraction, the ray would be a straight line with $\varphi = \varphi_0 - \theta$, so for the refracted ray we can write
$$\varphi = \varphi_0 - \theta + \delta\left(\varphi_0, r_0, n(r)\right), \qquad \text{(D1)}$$
where the angle $\delta$ represents the deviation from the straight line.
Snell's law gives $\mathrm{d}(n \sin\varphi) = 0$ for any point on the ray path, therefore $\mathrm{d}\delta/\mathrm{d}n = \mathrm{d}\varphi/\mathrm{d}n = -\tan\varphi/n$. From basic geometry in polar coordinates, $\partial r_r/\partial\theta = r_r/\tan\varphi$. Using these two relations and equation (D1), we formulate the back-tracing of the ray as an initial value problem where the variable $\tilde{n} = n - 1$ was introduced for numerical stability, as $n - 1 \ll 1$. We solve this numerically to obtain $\varphi|_{r\to\infty} \approx (\varphi_0 - \theta + \delta)|_{r=2R_0}$ with $R_0$ the mean Earth radius. We then iterate by interval halving to get the moon elevation $\pi/2 - \varphi_0$ as a function of the (known) astronomical elevation $\pi/2 - \varphi|_{r\to\infty}$, $r_0$ and $n(r)$.
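The ray tracing and interval halving can be illustrated with a short, self-contained Python sketch. Several points are assumptions and not taken from the paper. Snell's law is used in the form $n \sin\varphi = \mathrm{const}$ with $\varphi$ measured from the local vertical. Combined with $\partial r/\partial\theta = r/\tan\varphi$, this gives $\mathrm{d}\delta/\mathrm{d}\theta = -(r/n)\,\mathrm{d}n/\mathrm{d}r$, which is integrated together with $r$ using $\theta$ as the independent variable. An exponential refractivity profile stands in for the ECMWF-derived $n(r)$. The astronomical zenith angle relative to the instrument's vertical is taken to be $\varphi_0 + \delta$. All function names and numerical parameters are hypothetical.

```python
import math

R0 = 6371.0e3  # mean Earth radius in m


def refractivity(r, n0=2.7e-4, scale_height=8.0e3):
    """Return (n - 1, d(n - 1)/dr) for a hypothetical exponential atmosphere."""
    n_tilde = n0 * math.exp(-(r - R0) / scale_height)
    return n_tilde, -n_tilde / scale_height


def trace_deviation(phi0, r0, r_stop=2.0 * R0, dtheta=2.0e-4, max_steps=100000):
    """Integrate the ray from (r0, 0) to r_stop with RK4; return delta in rad.

    Intended for near-horizontal rays; steep rays would need a smaller step.
    """

    def rhs(theta, r, delta):
        phi = phi0 - theta + delta                  # Eq. (D1)
        n_tilde, dn_dr = refractivity(r)
        dr = r * math.cos(phi) / math.sin(phi)      # dr/dtheta = r / tan(phi)
        ddelta = -r * dn_dr / (1.0 + n_tilde)       # ddelta/dtheta = -(r/n) dn/dr
        return dr, ddelta

    theta, r, delta, h = 0.0, r0, 0.0, dtheta
    for _ in range(max_steps):
        if r >= r_stop:
            return delta
        k1 = rhs(theta, r, delta)
        k2 = rhs(theta + h / 2, r + h / 2 * k1[0], delta + h / 2 * k1[1])
        k3 = rhs(theta + h / 2, r + h / 2 * k2[0], delta + h / 2 * k2[1])
        k4 = rhs(theta + h, r + h * k3[0], delta + h * k3[1])
        r += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        delta += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        theta += h
    raise RuntimeError("ray did not reach r_stop")


def apparent_zenith_angle(phi_astro, r0, bracket=0.02, iterations=30):
    """Find phi0 with phi0 + delta(phi0) = phi_astro by interval halving."""
    lo, hi = phi_astro - bracket, phi_astro
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid + trace_deviation(mid, r0) < phi_astro:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With this stand-in profile, a horizontal ray starting at 12 km altitude is bent by roughly a tenth of a degree. The apparent elevation then follows as $\pi/2$ minus the returned angle.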
In a final step, we identified the position of the moon in the infrared images by manually matching a circle of the expected moon size with the recorded images (see Fig. 14). We expect an uncertainty in this process of about one detector pixel for the visual identification and 0.02° for refraction and positioning, corresponding to a total of 0.05° for both azimuth and elevation.

Appendix E: Standard atmosphere profiles
Several of our methods require standard atmospheric profiles for a range of trace gases. We typically use the profiles collected by Remedios et al. (2007) for this purpose. Comparing forward calculations using these profiles to our recent measurements revealed noticeable discrepancies on the order of several nW cm⁻² sr⁻¹ cm associated with CCl₄, CH₄, CO₂, CFC-11, SF₆, and N₂O. While these discrepancies can be partly attributed to the fact that a climatological standard profile will almost never fully agree with the actual one, we found that, especially for the upward pointing stratospheric deep space measurements, a large part of the discrepancy was caused by changes of atmospheric composition between the creation of the data set and the time of measurement.

To improve the climatological profiles, especially for the shaving procedure, we retrieved global monthly data for these species from the Halocarbons & other Atmospheric Trace Species (HATS) research program (e.g., Hu et al., 2016; Montzka et al., 2021, 1996; Elkins and Dutton, 2009; Montzka et al., 2009; Conway et al., 1994; Dlugokencky et al., 1994). We further smoothed the data by computing a running mean over twelve consecutive months and fully disregarded differences between the hemispheres for simplicity. For HCFC-22 and CFC-14, no such globally averaged product was available at the time; instead, measurement data from several individual stations were given. We computed a crude global average over the available stations and also applied a running yearly mean to derive a product similar to that given for the other species.
To update the profiles of Remedios et al. (2007), we simply scale the full profile (and the associated standard deviation) with a multiplicative factor derived from the quotient between the value of the HATS data and the ground level value of the profile. This improved the fit of radiances and reduced visible residuals caused by these interfering species and thus reduced the systematic error induced by emissions from these trace gases.
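The two processing steps, the twelve-month running mean and the multiplicative scaling of a climatological profile, amount to very little code. The following Python sketch uses hypothetical function names and assumes that the first element of the profile array is its ground level value. Whether the running mean is centred or trailing is not stated in the text, so the sketch simply returns one mean per complete twelve-month window.

```python
import numpy as np


def running_mean(monthly_values, window=12):
    """Mean over each run of `window` consecutive months (complete windows only)."""
    kernel = np.ones(window) / window
    return np.convolve(monthly_values, kernel, mode="valid")


def scale_profile(profile, stddev, surface_reference):
    """Scale a profile and its standard deviation to a reference surface value.

    Assumes profile[0] is the ground level value of the climatological profile.
    """
    factor = surface_reference / profile[0]
    return profile * factor, stddev * factor
```

For example, a profile with a ground level value of 250 (in arbitrary mixing ratio units) and a reference value of 225 would be multiplied by 0.9 at all altitudes.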
Appendix F: Level 2 processing
F1 Line of sight/temperature retrieval
To determine the line of sight from measured data with a spectral resolution of 0.2 cm⁻¹, we assume that temperature values supplied by ECMWF analysis data are close to real values. In effect, we perform a standard retrieval keeping temperature at ECMWF values and only vary the elevation correction value until the simulated measurements fit the measured ones best. A very rough error estimate of the approach shows that, with temperature gradients of ≈10 K km⁻¹, a systematic 1 K error in ECMWF would only cause an error in final pointing data of about 100 m.
The situation becomes even better when profiles contain both tropospheric and stratospheric sections with temperature gradients of opposite sign. Still, wrongly represented tropopauses, gravity waves, or other non-uniform biases in temperature impact the method. We thus (generally) do not trust individual values, but rather the average over a whole measurement flight after filtering short profiles with high uncertainty.
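The rough estimate quoted above can be written out as a one-line calculation. The notation ($\Delta z$ for the tangent altitude error, $\Delta T$ for the temperature bias) is introduced here only for illustration, and the estimate assumes that a temperature bias is compensated entirely by a vertical shift along a constant gradient:

```latex
\Delta z \approx \frac{\Delta T}{\left|\partial T/\partial z\right|}
         = \frac{1\,\mathrm{K}}{10\,\mathrm{K\,km^{-1}}}
         = 0.1\,\mathrm{km} = 100\,\mathrm{m}
```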
The retrieval itself leverages CO₂ emission lines around 950 cm⁻¹, where only few other gases have interfering emissions. Table 6 shows the spectral regions used. These were selected to avoid strong emission features by, e.g., CFC-12, O₃, H₂O and SF₆ and to cover as many CO₂ lines as feasible. The retrieval uses the trace gases CO₂, CFC-12, and SF₆ with climatological values (see Appendix E). For H₂O, we use water vapour as given in the ECMWF analysis data, even though the impact of this is small. In addition to the line of sight, an extinction profile is also retrieved to capture both atmospheric aerosol and small calibration errors.
To derive temperature instead, we simply reverse the roles of line of sight and temperature. We use smoothed ECMWF temperatures as a priori and initial guess (but the choice of a priori typically does not affect the results, only the convergence speed). In addition to temperature, this setup also derives PAN as a secondary target to remove its (small) influence in polluted air.

F2 O₃ retrieval
The ozone retrieval discussed briefly in Sect. 6 is a simplified version of the retrieval described by Ungermann et al. (2015). It uses the four spectral regions listed in Table 7, each averaged to a single radiance. The temperature and PAN values derived by the temperature retrieval of Sect. F1 are used as well as ECMWF analysis water vapour. The retrieval derives only O₃ volume mixing ratios and a single extinction profile. Climatological values are employed for the trace gases CO₂, CFC-12, CFC-113, NH₃, and SF₆ (see Appendix E).
and described the pointing system and its calibration. IB contributed the analysis of parasitic images. FFV contributed to the instrument description and NESR and PSF analysis. SJ contributed to the line of sight analysis and level 2 error diagnosis. LK contributed the refraction correction for moon measurements. TN helped with all effects related to GLORIA electronics. All authors reviewed the whole paper and provided many corrections and suggestions. All authors supported the development and/or operation of GLORIA in the field.
The European Centre for Medium-Range Weather Forecasts (ECMWF) is acknowledged for meteorological data support. We especially thank the full GLORIA team, including the institutes ZEA-1, ZEA-2 at Forschungszentrum Jülich and the Institute for Data Processing and Electronics at the Karlsruhe Institute of Technology, for their great work during the campaigns on which all the data in this paper are based. We would also like to thank the pilots and the ground support team at the Flight Experiments facility of the Deutsches Zentrum für Luft- und Raumfahrt (DLR-FX). We thank Albert Adibekyan, Christian Monte and Max Reiniger from PTB, Germany for support on the characterization of the blackbodies and the Germanium entrance window in the frame of the EMPIR Project 16ENV03, "Metrology for Earth Observation and Climate 3" (MetEOC-3). We really appreciate the competent collaboration with iMAR Navigation and their very good support whenever it was needed. Data was provided by the
high spectral resolution mode (chemistry mode) 8.0 0.0625 13.5
Table 3. Average noise value of a 0.625 cm⁻¹ spectral sample and maximum noise of a superpixel averaged over all spectral 0.625 cm⁻¹ samples for different bad pixel masks. The smaller the n in n-sigma, the more pixels will be filtered.