Calculating the vertical column density of O4 from surface values of pressure, temperature and relative humidity

We present a formalism that relates the vertical column density (VCD) of the oxygen collision complex O2-O2 (denoted as O4 below) to surface (2 m) values of temperature and pressure, based on physical laws. In addition, we propose an empirical modification which also accounts for surface relative humidity (RH). This allows for simple and quick calculation of the O4 VCD without the need for constructing full vertical profiles. The parameterization reproduces the real O4 VCD, as derived from vertically integrated profiles, within −0.9%± 1.0% for WRF simulations around Germany, 0.1%± 1.2% for 5 global reanalysis data (ERA5), and −0.4%± 1.4% for GRUAN radiosonde measurements around the world. When applied to measured surface values, uncertainties of 1 K, 1 hPa, and 16% for temperature, pressure, and RH correspond to relative uncertainties of the O4 VCD of 0.3%, 0.2%, and 1%, respectively. The proposed parameterization thus provides a simple and accurate formula for the calculation of the O4 VCD which is expected to be useful in particular for MAX-DOAS applications.


Introduction
In the atmosphere, two oxygen molecules can build collision pairs and dimers, which are often denoted as O 4 (Greenblatt et al., 1990;Thalman and Volkamer, 2013, and references therein). O 4 has absorption bands in the UV/visible spectral range, thus O 4 can be retrieved from atmospheric absorption spectra, e.g. by applying Differential Optical Absorption Spectroscopy (DOAS) (Platt and Stutz, 2008). Measurements of the O 4 absorption pattern in scattered light provide information about light 15 path distributions in the atmosphere, for instance allowing to investigate light path length increase within clouds (Wagner et al., 1998) or the retrieval of cloud heights from satellite measurements (Acarreta et al., 2004;Veefkind et al., 2016).
For Multi-Axis (MAX) DOAS, i.e. ground-based instruments measuring scattered light at different elevation angles, O 4 measurements provide information on vertical profiles of aerosol extinction (Heckel et al., 2005). Prerequisite for MAX-DOAS profile inversions is knowledge about the O 4 vertical column density (VCD) which provides the link between the 20 measured slant column densities (SCDs) at different viewing angles and the forward modelled SCDs based on radiative transfer calculations. Thus, a wrong input of the O 4 VCD directly affects the resulting aerosol profiles. For the profile inversion algorithm MAPA  applied to measurements taken during the CINDI-2 campaign (Kreher et al., 2020), for instance, a change of the input O 4 VCD of 2 %, 3 %, 5 %, or 10 % causes changes of the resulting median aerosol optical depth of 6 %, 8 %, 13 %, or 20 %, respectively. Thus, for MAX-DOAS profile inversions, the O 4 VCD should be determined 25 with accuracy and precision better than about 3 %, which limits the impact on resulting AODs to below 10 % and leaves other sources of uncertainty, i.e. the spectral analysis (≈ 5 %) as well as radiative transfer modeling (≈ 4 %) (see Wagner et al., 2021,  The O 4 VCD can be calculated by vertical integration of the O 2 number density profile squared. This requires knowledge of vertical profiles of temperature, pressure, and humidity, e.g. as derived from radiosonde measurements or meteorological 30 models. However, radiosonde measurements are only available for few stations and do not provide continuous temporal coverage, while modelled profiles might not be available in some cases (e.g. during measurement campaigns in remote regions and poor internet connection; for these cases, profiles from a climatology might be used as fallback option), or might not reflect the conditions at the measurement site appropriately, in particular in mountainous terrain not resolved by the model.
Measurements of surface air (at 2 m) temperature, pressure, and humidity, on the other hand, are routinely performed by 35 meteorological stations, and could be added to any MAX-DOAS measurement site with relatively low costs and efforts. Wagner et al. (2019) proposed to construct full temperature and pressure profiles from the respective surface values by assuming (a) a constant lapse rate of −6.5 K km −1 from ground up to 12 km, and constant temperature above, and (b) applying the barometric formula. 1 Wagner et al. (2019) estimate the uncertainty of the calculated O 4 VCD to 3 % and list the diurnal variation of the surface temperature and the limited representativeness of the surface temperature for the temperature profile above the boundary 40 layer as the main source of uncertainty.
The method proposed by Wagner et al. (2019) reproduces the true O 4 VCD within about 2 % (bias) ±2 % standard deviation (SD) globally when compared to ECMWF profiles. Locally, however, large deviations up to 7 % could be found, as shown in this study, which is mainly caused by the assumption of a fixed lapse rate of −6.5 K km −1 : While this value reflects typical continental conditions quite well, it is not appropriate in particular over deserts, where lapse rates are stronger (closer to the 45 dry adiabatic lapse rate), and parts of the oceans with weaker (i.e. closer to zero) lapse rates due to condensation.
In this paper we present a simpler approach for the calculation of O 4 VCD just from surface values of temperature and pressure and an a-priori lapse rate based on physical laws, without the need of constructing full profiles. In addition, we provide an empirical parameterization involving surface relative humidity that also accounts for variations of the atmospheric lapse rate. The final equation allows for simple and quick calculation of the O 4 VCD with high accuracy and precision just 50 from surface measurements of temperature, pressure, and relative humidity.
The manuscript derives the formalism of the parameterizations of the O 4 VCD in Sect. 2. In Sect. 3, the datasets used for illustration and quantification of uncertainties are introduced, followed by applications of the O 4 parameterizations in Sect. 4. Important aspects like comparison to standard methods used for the calculation of the O 4 VCD, the impacts of temperature inversions, surface altitude, or diurnal cycles, and the accuracy and precision of the proposed parameterizations are discussed 55 in Sect. 5, followed by conclusions. 1 Note that, for this approach, as well as for the parameterizations presented in this study, temperature inversions are problematic. As MAX-DOAS applications require daylight, however, night-time inversion layers are irrelevant for this study. The remaining temperature inversions at daytime, mostly occurring in early morning hours and over cold water and ice surfaces, will be discussed in Sect. 5.2. In this section, we provide the formalism for the calculation of O 4 VCDs from surface values of pressure, temperature, and relative humidity.

60
Basic quantities of the derivation below are (a) the number density n, and (b) the vertical column density (VCD) V , i.e. the vertically integrated number density.
The O 4 number density is just defined as the O 2 number density squared. Consequently, the O 4 number density has the unit molecules 2 cm −6 , and the O 4 VCD has the unit molecules 2 cm −5 . This matches the common procedure in the DOAS community; the O 4 cross section is given in cm 5 molecules −2 accordingly (Greenblatt et al., 1990;Thalman and Volkamer, 2013).
Pressure is denoted by p, temperature by T , and the altitude above sea level by z, while altitude above ground level is denoted by z . For relative humidity, RH is used in the text as well as in formulas. Surface values are indicated by the subscript "0".

General approach
The VCD V is the vertically integrated number density n: This integral can be re-written as This effective height h can be understood as the height of the gas column if the gas would be in a homogenous box under surface conditions p 0 and T 0 . Note that the effective height equals the scale height H only in case of exponential profiles, i.e. an isothermal atmosphere (see Appendix A).

80
Thus, the VCDs for O 2 and O 4 can be written as and Re-arranging Eq. 4 for n O2,0 and replacing one n O2,0 term in Eq. 5 yields Hence the O 4 VCD can be expressed as the product of the O 2 VCD, the O 2 surface number density, and the ratio of effective 1. Assuming a hydrostatic atmosphere, the surface pressure is just the gravitational force per area of the total air column.
Thus, the O 2 VCD is directly related to the surface pressure: 95 with ν O2 being the volume mixing ratio of O 2 in dry air, g being the gravitational acceleration on Earth, and M being the molar mass of dry air.
2. According to the ideal gas law for dry air, the surface number density of O 2 can be expressed as with the universal gas constant R. -For an isothermal atmosphere, i.e. an exponential profile of n O2 , the ratio hO 2 hO 4 is just 2 (Eq. A2).
-For real atmospheric conditions, where the lapse rate varies with altitude, the ratio of effective heights can be still 105 described by Eq. A10 if an effective lapse rate is considered: For a given atmospheric profile, the effective lapse rate is thus defined as the lapse rate of a polytropic atmosphere of the same O 4 VCD.
Replacing the terms in Eq. 6 by Eq. 7, Eq. 8 and Eq. A10 yields combining the constant factors. Thus with the assumptions specified above, the O 4 VCD is proportional to p 2 0 /T 0 , with the lapse rate Γ determining the slope. In order to evaluate the performance of the parameterizations for the O 4 VCD, we also calculate the "true" O 4 VCD, which is derived by (a) calculating the profile of n O2 from profiles of T , p and RH. In this step, the effect of humidity is explicitly accounted for by subtracting the water vapor pressure before calculating n O2 based on the ideal gas law. 120 (b) performing the numerical integration (using Simpson's rule) of n 2 O2 from surface to top of atmosphere (TOA). The integration has to be performed up to sufficiently high altitudes  recommend z TOA ≥ 30 km) as otherwise the integrated VCD would be biased low due to the missing column above. As not all datasets considered below cover this altitude range, we estimate and correct for the missing O 4 column above the highest profile level by applying Eq. 10 for the highest available layer, assuming a lapse rate of zero above. Note that the temperature increase in the upper stratosphere 125 is not relevant here as the contribution to the O 4 VCD above 30 km is negligible. Thus, the "true" O 4 VCD is calculated as For z TOA of 20 km, the correction term is of the order of 0.3 % of the total O 4 column.
2.5 O 4 VCD as a function of surface pressure, surface temperature, and surface relative humidity Equation 10 might be applied for a common lapse rate, like −6.5 K km −1 as proposed in Wagner et al. (2019). This works 130 generally well over most continental regions. However, large deviations have to be expected for regions with different lapse rates, in particular over deserts, where lapse rates are typically stronger (more negative, i.e. close to the dry adiabatic lapse rate). Over parts of the ocean, on the other hand, lapse rates are weaker (closer to zero).
In order to modify Eq. 10 such that it can be applied globally, but keep it still a simple function of surface measurements, we make use of the relation between the effective lapse rate and the RH at ground:

135
-For ascending air masses, RH 0 determines the altitude at which condensation takes place. This relation is directly reflected in the calculation of the lifted condensation level (LCL) as function of RH 0 (Lawrence, 2005;Romps, 2017).
Thus, the lower RH 0 , the higher the altitude range above ground where dry adiabatic lapse rates apply.
-For descending air masses (in particular the large-scale subsidence over tropical deserts), no condensation takes place and dry adiabatic lapse rates apply.
In both cases, low relative humidity at ground is associated with lapse rates closer to the dry adiabatic lapse rate.
Real atmospheric profiles are of course more complex as these simplified scenarios, in particular due to advection, but still, a correlation between RH 0 and effective lapse rates is expected. We thus parameterize the effective lapse rate by the relative humidity at ground via a linear function: Replacing this in Eq. 10 results in V O4 becoming a function of surface values for pressure, temperature, and relative humidity: where the parameters a and b are linked to α and β from Eq. 13 via and The parameters a and b can then be determined by a linear least squares fit by comparing V O4,RH to V O4,true : thus

155
a and b will be derived in Sect. 4.2 based on true O 4 VCDs calculated from ECMWF profiles. This empirical approach also, at least partly, corrects for effects neglected in the derivation of Eq. 10, i.e. ignoring the tropopause in the calculation of the ratio of effective heights (App. A3), and applying the ideal gas law for dry air in Sect. 2.3.
From a and b, also the corresponding effective lapse rate for a given RH 0 can then be calculated with Equations 13, 15 and 16. This lapse rate allows to construct full atmospheric profiles of T and p (applying the barometric formula for polytropic 160 atmosphere) from surface measurements when needed, in particular for MAX-DOAS inversions based on optimal estimation.
As humidity effects are already accounted for in the determination of a and b, no further correction for humidity should be applied in this case.

Comparison of parameterized to "true" O 4 VCD
In order to assess accuracy and precision of the proposed calculation of the O 4 VCD from surface measurements of T 0 , p 0 and 165 Γ (Eq. 10) or RH 0 (Eq. 14), we define the relative deviation δ of a derived O 4 VCDs to the true value: datasets: 1. Global model data, in order to check for the performance of the parameterizations globally, covering the full range of the relevant parameter space for surface values of pressure, temperature, humidity, and altitude. 2. Regional model data with high spatial resolution, which are also compared to surface stations and allow to investigate the impact of diurnal cycles.

175
3. Balloon-borne radiosonde measurements, in order to apply the formalism to high-resolved profile measurements.
Nighttime profiles of T can be considerably different from daytime profiles, in particular in case of temperature inversions (i.e. positive lapse rates) often occurring within the nocturnal boundary layer. For MAX-DOAS measurements, however, nighttime profiles are irrelevant. Thus, we consider all atmospheric datasets for daytime conditions only. This is done by selecting data with an solar zenith angle (SZA) below 85°.

Global model (ECMWF)
We use global model data as provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) for two purposes: -In order to investigate global patterns, we use ERA5 reanalysis data (Hersbach et al., 2020) truncated at wavenumber 639 on the Gaussian grid N320, corresponding to ≈0.3 • resolution. Model output is provided hourly. Here, we focus 185 on ERA5 data for four selected days, i.e. the 18 March, 18 June, 18 September and 18 December 2018, covering the full globe for all seasons. As the regular latitude-longitude grid over-represents high latitudes, we only consider the fraction of cos(lat) pixels for each latitude for the calculation of histograms, correlation coefficients, means and standard deviations.
-For comparison with the standard approach for the calculation of the O 4 VCD that was used in MAPA so far, we use 190 ERA-Interim reanalysis data truncated at wavenumber 255, corresponding to ≈0.7 • resolution, which was preprocessed to a dataset with 6 hourly model output (0:00, 6:00, 12:00, 18:00 UTC) interpolated to a regular horizontal grid with a resolution of 1°. This dataset is denoted as ERA_I daily below.
In addition, we make use of a monthly climatology of atmospheric profiles (ERA_I clim ) based on the same ERA-Interim data, which was constructed as back-up solution recommended within the FRM4DOAS project in case of no other profile 195 information being available.

Radiosonde measurements (GRUAN)
The Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN) is an international reference observing network of sites measuring essential climate variables above Earth's surface (Sommer et al., 2012;Bodeker et al., 205 2016). Atmospheric profiles of temperature, pressure, and humidity are measured by regular balloon soundings equipped with radiosondes and water vapor measurements (Dirksen et al., 2014 4.1 O 4 VCD as a function of p 0 , T 0 , and lapse rate Γ According to Eq. 10, the O 4 VCD is proportional to p 2 0 /T 0 , with the lapse rate Γ determining the slope. We illustrate this correlation for the investigated datasets in Fig. 1. V O4, true varies considerably for all datasets, where the low values are caused by mountains due to reduced pressure, while the very high values for ERA5 and GRUAN are caused by cold temperatures in polar regions. The variability of V O4, true is 220 well reflected in p 2 0 /T 0 , and very good correlation between both quantities are found, with most data points in accordance to plausible lapse rates in the range of −4 to −6.5 K km −1 . For the WRF simulations for Germany, most data points are matching to a lapse rate close to −6.5 K km −1 . ERA5 and GRUAN data show higher variability in slopes, as they also cover a wider range of atmospheric conditions. Wagner et al. (2019) proposed to determine the O 4 VCD based on vertical profiles of T and p constructed from the respective surface values by assuming a constant tropospheric lapse rate of −6.5 K km −1 . We can use Eq. 10 for the same purpose, but  Generally, good agreement between V O4, Γ and V O4, true is found (Fig. 2): on average, δ Γ is 1.7 %, i.e. V O4, Γ are higher than 230 V O4, true by 1.7 %. Over land around noon, δ Γ is close to 0. Over ocean, however, δ Γ is higher (about 3 % up to 7 %). For ERA5 data on 18 June 2018, δ Γ over Germany is close to 0 as well (Fig. 3). On global scale, however, only moderate agreement is found between V O4, Γ and V O4, true , with a mean deviation of 2.6 %. High values for δ Γ are found generally over ocean. In particular over cold water surfaces, like the West coast of North and South America, the Hudson Bay, or the Great lakes, δ Γ is very high (up to 7 %). This is related to temperature inversions close to ground: due to the too low surface 235 temperatures, the O 4 VCD calculated from Eq. 10 is biased high. This will be discussed in detail in Sect. 5.2. Over continents, δ Γ is lower, and generally close to 0, except over deserts, where negative values are observed.  In Fig. 5 (a), the effective lapse rate is compared to the actual lapse rate between ground and 5 km altitude above ground, revealing a correlation of 0.83. Figure 5 (b) displays the relation between relative humidity at ground and the effective lapse rate (R=0.59). Assuming a linear relation between Γ eff and RH 0 in Eq. 13 results in a linear relation between RH 0 and the ratio Q (Eq. 18). This is displayed in Fig. 5 (c). Interestingly, the correlation (R=0.79) is far better than in (b) (R=0.59). This indicates 245 that the parameterization based on RH 0 allows to at least partly correct for other simplifications made in the formalism in Sect. 2.3, in particular the neglect of humidity in the ideal gas law.

Effective lapse rate and relative humidity at ground
We use ERA5 data from 18 June 2018 to determine the parameters a and b in Eq. 14 by applying a linear least squares fit to the data presented in Fig. 5 (c), as shown by the black line. Fitted parameters are a = 1.77434±0.00003 and b = 0.11821±0.00004 (for RH 0 in absolute numbers, i.e. 0.5 for 50% RH). The corresponding parameterisation of the effective lapse rate (Eq. 13) 250 is Γ eff = (−7.709 + 4.038·RH 0 ) K km −1 , which yields -7.709, -5.690 and -3.671 K km −1 for RH 0 of 0%, 50%, and 100%, respectively.
4.3 O 4 VCD as function of p 0 , T 0 , and RH 0 With Eq. 14, an empirical parameterization of the O 4 VCD was derived based on surface values of temperature, pressure, and relative humidity. We applied this parameterization to all investigated datasets. Figures 6 and 7 display δ RH for WRF and 255 ERA5, respectively. GRUAN results are shown in Fig. 8.
For ERA5, the parameterization involving RH is a substantial improvement compared to the results for δ Γ . The large differ-260 ence between land (in particular deserts) and oceans seen in δ Γ (Fig. 3) is strongly reduced for δ RH (Fig. 7). For 18 June 2018, the mean of δ RH ≡ 0.0 % is of course a consequence of the fit optimizing a and b which is based on the same ERA5 dataset.  But there is also a considerable reduction of SD from 1.6 % for δ Γ to 1.0 % for δ RH . Applying Eq. 14, with a and b derived from ERA5 data for 18 June 2018, to ECWMF data from other months yields similar results, as shown in Fig. C2, with largest deviations of 0.2 ± 1.8 % observed for 18 March 2018.

265
Remaining systematic deviations in the maps of δ RH are due to weather, for instance associated with low pressure or frontal systems. This reflects the simplifying assumptions made, in particular assuming hydrostatic conditions in Sect. 2. Note, however, that MAX-DOAS retrievals are usually not considered for weather conditions associated with rain and clouds.
cold surfaces causing temperature inversions, as discussed in more detail in Sect. 5.2.

270
mountains, which tend to show systematic deviations δ RH that are mostly negative (e.g. over the Andes or the Himalayas).
So far, the formalism derived in Sect. 2 was applied to data from meteorological models. Now we test it for measured profiles 275 from radiosondes as well. Application of Eq. 14 to GRUAN data generally yields deviations close to 0 between parameterized and true O 4 VCDs for all stations, as shown in Fig. 8. Parameterized and true VCD show high correlations, indicating that the temporal variability of the atmospheric state is well captured by the simple parameterization based on surface values alone.
The mean deviation δ RH of all considered GRUAN profiles is −0.3 %, with a SD of 1.4 %. For 11 out of the 17 stations, the mean agreement is within 1 %. Largest deviations are found for La Reunion (REU), where V O4, RH is biased low by −2.5 %.

280
This is probably related to the altitude of this station of more than 2 km on a remote island in the Indian ocean. Highest positive deviation of 1.1 % is found for Barrow, with also highest SD of 1.8 %. This is caused by some very high values during spring where surface temperatures are very low (< 240 K) and temperature inversions occur. in O 4 VCDs when ignored. Thus we apply the following correction to the ERA-Interim profiles:

Comparison to existing methods for the calculation of the O 4 VCD
-In case of GRUAN station altitude being higher than ERA-Interim surface altitude, the ERA-Interim profiles of T and 295 RH are just linearly interpolated. As pressure profiles are almost exponential, ln(p) is linearly interpolated.
-In case of GRUAN station altitude being lower than ERA-Interim surface altitude, ERA-Interim profiles are extended by surface values of T 0 and ln(p 0 ) as derived from linear extrapolation of T and ln(p), respectively. RH at ground, however, is not extrapolated, as this might result in unphysical values of RH below 0 or above 1. Instead, the value of the lowest ERA-Interim model layer is taken as RH 0 .

300
We calculate the deviation to the true VCD (defined by the GRUAN profiles) according to Eq. 19 for daily and climatological ERA-Interim data. The correlation coefficients as well as mean and SD of the resulting deviations are also included in Fig. 8.
For O 4 VCDs based on daily ERA-Interim profiles, the agreement to VCDs integrated from GRUAN profiles is generally very good. Correlation coefficients are almost 1 and deviations are close to 0 for most stations. Only for mountainous sites as Boulder, where surface altitude differ between GRUAN and ERA-Interim, clear deviations from 0 are found.

305
Results based on the ERA-Interim climatology, however, show far weaker correlation than for daily ERA-Interim data, as they do not resolve day-to-day changes in meteorology. Mean deviations are within ±1 % for most stations, with a SD of about 2 %.
In comparison to these existing methods, the O 4 VCDs based on Eq. 14 are worse than those based on daily ERA-Interim profiles, but significantly better than those based on a profile climatology, in particular in terms of correlation and SD.

Impact of temperature inversions
The presented parameterizations derive the O 4 VCD just from surface values of T , p, and RH. This requires some basic assumptions about the atmospheric profile shape. In case of temperature inversions, these assumptions do not hold. Thus, we focused on daytime conditions by selecting only data with SZA<85°. But still, temperature inversions can also occur during daytime, in particular over cold water and ice surfaces, as well as shortly after sunrise. 315 Figure 9 displays temperature inversions, here defined as the difference between tropospheric maximum and surface temperature, for ERA5 data on 18 June 2018. Strong temperature inversions are found e.g. over Hudson Bay or the Great lakes where sea surface temperature is low. Also at the Western edge of the illuminated Earth (i.e. shortly after sunrise), temperature inversions occur, e.g. in North and South Africa at 6:00 UTC, indicating remainings of nocturnal profiles.
Large parts of the regions with high positive deviation δ Γ (Fig. 3) or δ RH (Fig. 7) actually correspond to temperature in-320 versions. Thus, for MAX-DOAS measurements close to cold surface waters or other regions with temperature inversions, the formalism of Eq. 10 and Eq. 14 should only cautiously be applied, and corrections of surface temperature might be needed for better results. For the discussion below, temperature inversions of more than 2 K have been skipped from ERA5 data in order to avoid interference of different effects, in particular over Antarctica. Figure 9. Temperature inversions, expressed as difference between tropospheric maximum and surface temperature, for ERA5 simulations at 0:00, 6:00, 12:00, and 18:00 UTC on 18 June 2018.

325
The formalism in Sect. 2.3 is assuming dry air. Addition of humidity results in lower O 2 and O 4 number densities, which significantly affects the O 4 VCD, especially in the tropics . Humidity affects all terms in Eq. 6 (i.e. the O 2 VCD, the O 2 surface number density, and the ratio of effective heights of O 2 and O 4 ), but cannot be accounted for in the formalism without completely losing the simplicity of Eq. 10.
However, these effects are partly accounted for in Eq. 14, with empirically determined parameters a and b, since the ratio Q 330 was determined based on the true O 4 VCD where humidity effects were appropriately accounted for. In order to check for possible remaining impacts of humidity on the performance of Eq. 14, we check how far δ RH is related to specific humidity at ground (Fig. 10). In addition, we compare δ RH also to the total column water vapor, as this provides information on humidity in the full column, not only at surface. In both cases, correlations are low, and no significant impact of humidity on δ RH could be found.

Impact of surface altitude
Figures 6 and 7 reveal systematic spatial patterns in δ RH corresponding to mountains. We thus investigate a possible relation between surface altitude and δ RH for all investigated datasets (Fig. 11). For the WRF simulations, the Alps can be clearly recognized in Fig. 6, with mountains showing lower values of δ RH . This can also be seen in the density plot in Fig. 11 (a), where surface altitude and δ RH are anticorrelated with R= −0.46, and a decrease of δ RH of roughly 1 % per km. For GRUAN stations (c), results are similar, but statistics are poor, and the correlation coefficient is low, as only two stations (Boulder and La Reunion) are available with a surface altitude above 1 km.
For ERA5, however, results are not as clear as those for WRF. The correlation coefficient is low, and for altitudes between 2 and 3 km, it looks like δ RH is increasing rather than decreasing with altitude. And for very high surface altitudes as found over the Himalaya, δ RH is still close to 0 and would not match the slope of 1 % per km estimated for WRF.

345
The reason for the poor correlation between z 0 and δ RH for ERA5 compared to the WRF results is not clear to us. Obviously, other factors would probably also have to be considered (season, SZA). But since there is no clear correlation, and a quantitative correction would rather worsen δ RH instead of improving it for several mountain areas around the globe, we decided not to apply an explicit correction for surface altitude.
Consequently, the parameterization of Eq. 14 has higher uncertainties when applied for mountainous sites: for z 0 > 2 km, 350 δ RH is −0.5 % on average with a SD of 1.8 %. But still, the parameterized O 4 VCD matches the requirement of accuracy/precision better than 3 % even for elevated sites.

Diurnal cycles
Surface conditions can change rapidly, e.g. in case of passing frontal systems or storm tracks. For such rapid changes, the change of the true O 4 VCD might not be adequately represented by the change of V O4, RH . These effects are reflected in the SD this, we extract the WRF simulations at the locations of the DWD ground station network. In order to focus on strong diurnal patterns, we select at each station those days where the change of surface temperature, as recorded by DWD, exceeds 10 K. Figure 12 displays the diurnal cycles of surface properties and O 4 VCDs for WRF and DWD station data. While surface pressure shows no relevant changes during day, surface temperature increases by 9.8 K from morning to afternoon due to the selection of days with strong diurnal cycle in surface temperature 2 . For WRF simulations, a similar pattern is found, but 365 the mean temperature increase over the day is smaller (7.4 K). As V O4, RH is reciprocal to T 0 , a change of 10 K in surface temperature alone would correspond to a change of V O4, RH of 3.5 %. However, at the same time, RH decreases by about 30 %, which has an opposite effect on V O4, RH . Consequently, the diurnal cycle of V O4, RH is only moderate (about 1.9 % and 1.1 % decrease from morning to evening for DWD and WRF, respectively, where the cycle for WRF is less strong due to the less strong cycle in T 0 ).

370
The true O 4 VCD, as derived from the integrated WRF profiles, also decreases over the day, and agrees well to V O4, RH (WRF) in the afternoon. In the morning, however, V O4, RH (WRF) is higher compared to noon by 0.8 %, while V O4, true is only 0.5 % higher. This deviation between parameterized and true O 4 VCD indicates that in the early morning, surface measurements are not as useful for determining the full column, which is probably related to remainders of the nocturnal boundary layer which often has atypical lapse rates due to temperature inversions.

375
But even during morning hours, the systematic error made by V O4, RH is relatively small, at least for the investigated time period for Germany. But also for the global ERA5 analysis, the impact of diurnal cycles on the calculation of the O 4 VCD is only moderate; otherwise, Figures 7 and C2 would show systematic East-West gradients.
Thus, the parameterization of Eq. 14 also reflects most of the diurnal cycle of the O 4 VCD, with remaining systematic errors below 0.3 %.

Accuracy and precision
In Eq. 14, we provide a formula for the calculation of the O 4 VCD. Accuracy and precision of the resulting V O4 thereby depend on accuracy and precision of (1) the chosen parameterization and (2) surface values p 0 , T 0 and RH 0 .
1. We estimate overall accuracy and precision of Eq. 14 to < 1 % and < 2 % based on mean and SD of deviations between parameterized and true O 4 VCD for WRF, ERA5 and GRUAN data as presented above. Higher deviations can occur in 385 particular in case of temperature inversions (see Sect. 5.2).
2. Application of Eq. 14 requires surface measurements of p 0 , T 0 , and RH 0 . Uncertainties of temperature and pressure are rather uncritical, as an error of 1 K and 1 hPa for T 0 and p 0 would correspond to an error of 0.3 % and 0.2 % in V O4, RH , respectively. In order to reach an accuracy/precision of 1 %, the corresponding errors of RH 0 have to be lower than 16 %. These limits should be achievable for adequate meteorological instrumentation and a measurement procedure 390 following WMO guidelines. In particular, surface temperature should be measured at about 1.25 to 2 m above ground using a radiation shield (WMO, 2018).
The proposed parameterization thus allows to calculate O 4 VCD with overall uncertainties below 3 %, which is sufficient for applications in MAX-DOAS profile inversions. Compared to existing methods, the parameterization yields even better results than a profile climatology. 395 We thus consider the proposed parameterization as useful approach for determining the O 4 VCD for cases where no daily model profiles are available, and recommend to also apply it for mountain sites for comparison and possible correction of daily model profiles.

Conclusions
The O 4 VCD can be expressed in terms of surface pressure and temperature based on physical laws, if a constant lapse rate is 400 assumed, without the need for constructing full vertical profiles. With an empirical correction which parameterizes the effective lapse rate as linear function of surface RH, we could present a formula for simple and quick calculation of the O 4 VCD based on p 0 , T 0 , and RH 0 : This accuracy and precision of < 3 % is typically lower than other uncertainties of spectral analysis or radiative transfer modeling . Thus, the proposed parameterization is well suited for application in MAX-DOAS profile inversions. Moreover, the parameterization reflects the true O 4 VCD, as derived from radiosonde measurements, even better (in particular in terms of temporal correlation and SD) than O 4 VCD calculated from a climatology of atmospheric profiles of T , p and RH. We thus recommend to equip MAX-DOAS measurement stations with state-of-the-art thermometer (with 415 radiation shield), barometer, and hygrometer.
Code availability. A Python implementation of the derived functions for the calculation of the O4 VCD is provided in the Supplementary material. , which allows for simpler notation avoiding compound fractions.
For application in Eq. 6, the inverse ratio has to be taken.

A1 Isothermal atmosphere
For the simple assumption of a barometric pressure profile with constant T , the O 2 number density decreases exponentially with altitude: Thus, for O 2 profiles declining exponentially with z, the ratio of effective heights is just

A2 Polytropic atmosphere
If the temperature is changing linearly with altitude, i.e. the dependence of T (z) = T 0 + Γ · (z − z 0 ) is described by a constant lapse rate Γ, the resulting profile of O 2 follows a power function: being altitude above surface, and being the constant exponent.
Note that the for a constant lapse rate, temperature reaches 0 K at an altitude of For T 0 = 300 K and Γ = −6.5 K km −1 , z TOA is about 46 km.
Thus, Eq. A3 is defined from z = 0 to z = z TOA , and n O2 is set to 0 above.

Integration of Eq. 3 yields
For O 4 , the number density profile is and thus The ratio of effective heights can then be calculated as For a lapse rate of 0 this equals the result for exponential profile (=2). For a typical lapse rate of −6.5 K km −1 , the ratio of effective heights is 1.81.

A3 Impact of the tropopause
In the previous section, the ratio of effective heights was calculated assuming a constant lapse rate throughout the atmosphere.

455
A more realistic approach would be to assume a constant temperature above the tropopause (TP), as was done in . However, with the separation of the atmosphere in troposphere and stratosphere, it would not be possible to express the ratio of effective heights as simple function of the lapse rate as in Eq. A10. Thus, we decided to neglect the impact of the tropopause on the ratio of effective heights in the derivation of Eq. 10.
This causes a bias of V O4, Γ that can be easily quantified from Eq. 10 itself (applied at the tropopause instead of ground):

460
The stratospheric O 4 column for constant T is TP TTP for constant lapse rate. The difference is 6 · 10 40 molecules 2 cm −5 (for T TP = 200 K, p TP = 193 hPa), which is about 0.45% of the total O 4 VCD.
Thus, the O 4 VCD derived from Eq. 10 is higher than the respective VCD resulting from the profile construction proposed in Wagner et al. (2019). For V O4, RH (Eq. 14), this bias is eliminated by the empirical fit to the true O 4 VCD.
The O 4 VCD depends on the ratio of effective heights for O 2 and O 4 (Eq. 6), which can be expressed by the atmospheric lapse rate (Eq. A10). This formalism might also be used in the other direction: from total column measurements of O 2 and O 4 , an effective atmospheric lapse rate can be derived.

+
and thus

B1.2 DWD weather stations
Germany's National Meteorological Service (Deutscher Wetterdienst, DWD) provides hourly measurements of surface temperature, pressure and relative humidity for a network of ground stations in Germany (Kaspar et al., 2013). Data are provided via 490 the climate data center web interface (CDC-v2.1; https://cdc.dwd.de/portal/). The meteorological measurements are performed in accordance to the guidelines of the world meteorological organization (WMO) to minimize local effects. Additionally, we have applied quality control filters such that the parameters QUALITAETS_BYTE (QB) is below 4 (thereby excluding untested, objected, and calculated values), and QUALITAETS_NIVEAU (QN) is either 3 (automatic control and correction) or 7 (second control done, before correction) to only retain measurements of high quality. By applying these criteria, we retained 495 98.2 %, 100%, and 99.5% of T 0 , p 0 , and RH 0 data, respectively. Note that using only data with QN=10 (the best possible quality check level) would result in no data left for the period considered in this study. If only QN=7 had been applied, we would have retained the same number of T 0 and RH 0 but no p 0 data.
For this study, we extract DWD measurements for May to June 2018, 6:00 to 17:00 UTC, and only consider stations providing T 0 , p 0 , and RH 0 simultaneously, resulting in 206 stations which are displayed in Fig. B2.

B1.3 Validation of WRF surface values
We use the DWD network of surface stations for investigating the accuracy and precision of the WRF simulations. Figure   B3 displays correlations between surface values from the DWD station network and the respective WRF simulations. For this purpose, each station is associated with the nearest neighbor from the WRF simulation. We do not interpolate the WRF data as we still want to compare the parameterized O 4 VCD with the true VCD derived from vertical integration of the WRF profiles.

505
Surface altitude (a) is lower in the gridded elevation map used as input in the WRF simulations by 20 m on average, and by almost 1 km for the station on Germany's highest mountain, Zugspitze. This is a consequence of the spatial resolution of the WRF simulations of 1 km, which is not sufficient for resolving single mountains. The systematic negative bias of WRF surface altitude indicates that the DWD stations tend to be located on hill and mountain tops.
This difference in altitude would directly affect the comparisons of T and particularly p. Thus, we apply a simple correction 510 of station values and extrapolate them to the respective WRF surface altitude assuming a lapse rate of −6.5 K km −1 . For RH, no correction is applied.
The comparison reveals a good agreement between surface values from WRF and DWD, with remaining systematic biases of WRF simulations of −1 K for T 0 and 1 % for RH 0 . Figure B3. Comparison of WRF surface values (y axis) to DWD ground stations (x axis). For T and p, station values are adjusted to the mean altitude of the respective gridded elevation map used as input for WRF siumulations (see text for details).

515
The GRUAN stations used in this study are listed in Table B1, including station shortcut and full name, latitude, longitude, altitude of the station, and the number of available profiles with SZA< 85°. Figure B4 displays a map showing the GRUAN station locations.
The temporal cover of radio sonde measurements at the different stations is displayed in Fig. B5. Note that some stations only contribute a low number of measurements. Still, we decided to keep all stations, as the application of a threshold for a 520 minimum number of profiles of e.g. 50 would remove all tropical sites (Darwin, Manus and Nauru). Figure B4. Location of GRUAN stations considered in this study. For station names and further details see Table B1. Figure B5. Time of the available sonde flights (with SZA<85°) for the GRUAN stations considered in this study. For station names and further details see Table B1. 18 December (bottom) 2018. The projection focuses on daytime for each timestep. On the right, the frequency distribution and mean and SD are given for all hourly outputs of the respective day. Figure C2. Deviation δRH according to Eq. 19 for ERA5 at 0:00, 6:00, 12:00 and 18:00 UTC on 18 March (top), 18 September (middle) and 18 December (bottom) 2018. The projection focuses on daytime for each timestep. On the right, the frequency distribution and mean and SD are given for all hourly outputs of the respective day.
SB developed the full formalism, performed the intercomparisons to external datasets, and wrote the manuscript, with input and feedback from all co-authors.