Continuous temperature soundings at the stratosphere and lower mesosphere with a ground-based radiometer considering the Zeeman effect

. Continuous temperature observations at the stratosphere and lower mesosphere are rare. Radiometry opens the possibility of observing microwave emissions from two oxygen lines to retrieve temperature proﬁles at all altitudes. In this study, we present observations performed with a temperature radiometer (TEMPERA) at the MeteoSwiss station at Payerne for the period from 2014 to 2017. We reana-lyzed these observations with a recently developed and improved retrieval algorithm accounting for the Zeeman line splitting in the line center of both oxygen emission lines at 52.5424 and 53.0669 GHz. The new temperature retrievals were validated against MERRA2 reanalysis and the meteorological analysis NAVGEM-HA. The comparison conﬁrmed that the new algorithm yields an increased measurement response up to an altitude of 53–55 km, which extends the altitude coverage by 8–10 km compared to previous retrievals without the Zeeman effect. Furthermore, we found correlation coefﬁcients comparing the TEMPERA temperatures with MERRA2 and NAVGEM-HA for monthly mean proﬁles to be in the range of 0.8–0.96. In addition, mean temperature biases of 1 and − 2 K were found between TEMPERA and both models (MERRA2 and NAVGEM-HA), re-spectively. We also identiﬁed systematic altitude-dependent cold and warm biases compared to both model data sets.


Introduction
Continuous and weather independent temperature soundings with high temporal and vertical resolution at the stratosphere and lower mesosphere are experimentally challenging but desirable to measure continuously the temperature at the stratosphere and lower mesosphere and to assess the intermittent behavior of atmospheric waves, which is important for understanding the day-to-day variability of the forcing from below in the ionosphere and thermosphere for space weather applications (Liu, 2016). Continuous observations of atmospheric temperature in the middle atmosphere are crucial to understand the chemistry (e.g., ozone) (Stolarski et al., 2012;Anderson et al., 2017) and to infer dynamics due to thermal wind balance (Matthias and Ern, 2018).
Satellite observations provide global coverage. SABER (Sounding of the Atmosphere using Broadband Emission Radiometry) on board the TIMED (Thermosphere-Ionosphere-Mesosphere-Energy and Dynamics) satellite measures temperatures from the troposphere up to mesosphere and lower thermosphere. The satellite has an orbit around Earth that permits to cover all local times within 60 d and, thus, provides only limited information on the short-term variability of tides and planetary waves. Furthermore, the latitudinal coverage changes in time due to the yaw cycle of the spacecraft (Russell et al., 1999;Remsberg et al., 2008;Rezac et al., 2015). The Microwave Limb Sounder (MLS)  on the AURA satellite (Schoeberl et al., 2006) is on a sun-synchronous orbit and, thus, passes at fixed local times the same geographic locations making a data analysis of tides and their intermittency unfeasible, although MLS obtains temperatures from the stratosphere up to mesosphere covering all latitudes between 82 • N and 82 • S (Panka et al., 2021).
However, for low-and mid-latitudes SABER observations have been utilized to gain some insight into the climatological seasonal behavior of the migrating and nonmigrating diurnal and semidiurnal tides (Oberheide et al., 2011;Dhadly et al., 2018). Furthermore, these satellite observations have been proven to be valuable for data assimilation purposes into general circulation models (GCMs) such as the Navy Global Environment Model -High Altitude (NAVGEM-HA) (Eckermann et al., 2018). NAVGEM-HA temperature and wind fields show reasonable agreement to ground-based observations and the underlying day-to-day variability due to atmospheric tides and planetary waves (McCormack et al., 2017;Stober et al., 2020). Continuous ground-based temperature observations of the stratosphere and mesosphere are challenging and ambitious. There are only a few Rayleigh lidar measurements that are long enough to infer the tidal variability (Baumgarten et al., 2018;Baumgarten and Stober, 2019). This is mainly due to the fact that lidar observations are weather dependent, which essentially limits the measurement time and data availability. Furthermore, some of these lidars have only nighttime capabilities (Wing et al., 2018;Sica and Haefele, 2015), introducing additional ambiguities to infer mean temperatures and to assess the tidal variability.
Microwave radiometry offers a robust remote sensing technique that is almost weather independent to retrieve atmospheric temperature profiles at the stratosphere and lower mesosphere. A few years ago the University of Bern developed a temperature radiometer TEMPERA (temperature radiometer) to perform continuous soundings including the troposphere (Stähli et al., 2013;Navas-Guzmán et al., 2016). Recently, we developed a new retrieval algorithm due to updates in the radiative transfer model ARTS (Buehler et al., 2018;Eriksson et al., 2005) and revised Quantum numbers of HITRAN. The new retrieval algorithm accounts for the Zeeman effect at the line center in both emission lines at 52.5424 and 53.0669 GHz for routine temperature soundings. The advantage of the new retrieval algorithm is an increased altitude coverage. In this study, we present a validation of the new temperature profiles against MERRA2 and NAVGEM-HA for the location Payerne in Switzerland.
The article is structured as follows. Section 2 contains a brief description of the temperature radiometer TEMPERA, and Sect. 3 summarizes the Zeeman effect on the oxygen emission lines. MERRA2 and NAVGEM-HA data sets are presented in Sect. 4. The retrieval algorithm is outlined in Sect. 5. The TEMPERA temperature soundings and valida-tion are shown in Sects. 6 and 7. The results are discussed in Sect. 8. Our conclusions are summarized in Sect. 9.
2 The TEMPERA radiometer TEMPERA is a ground-based radiometer that was developed at the University of Bern. It measures atmospheric microwave radiation in the range of the oxygen emission complex at 50-60 GHz. For stratospheric temperature retrievals, two emission lines of the O 2 molecule are observed with a high-resolution digital FFT spectrometer at 52.5424 and 53.0669 GHz with a resolution of 30.5 kHz and a bandwidth of 960 MHz. The instrument was located at the aerological station in Payerne (46.82 • N, 6.95 • E, 491 m a.s.l.) and was directed westwards with an elevation angle of 60 • . The antenna's half-beam-width (HPBW) is 4 • . A more detailed technical description of the instrument can be found in Stähli et al. (2013). The measured spectra can be inverted into vertically resolved temperature profiles considering the pressure broadening of the spectral emission lines and their radiative transfer. Retrievals presented in this study make use of the Atmospheric Radiative Transfer Simulator (ARTS) (Buehler et al., 2018) and Qpack, the Matlab interface, for ARTS (Eriksson et al., 2005).
Already in 2015 first observations of the Zeeman effect in the line center for atmospheric Oxygen were reported using TEMPERA (Navas-Guzmán et al., 2015). In 2017 Navas-Guzmán et al. (2017) presented a comparison of almost 3 years of continuous TEMPERA observations with radiosondes, the Microwave Limb Sounder (MLS) on board the AURA spacecraft, and a Rayleigh lidar. These former studies inferred stratospheric temperature profiles up to an altitude of 40-45 km altitude blanking the line center to avoid a contamination of the temperature measurements due to Zeeman line broadening, which was not included at this time in the retrievals due to limitations in the available databases for the radiative transfer and quantum numbers in HITRAN that are required to account for the Zeeman effect in both oxygen lines (Larsson et al., 2019).
The observations presented in this study were performed with the laboratory prototype between 2014-2017 (Stähli et al., 2013). The receiver was upgraded in July 2015, which improved the overall performance of the instrument. The upgrade showed much better suppression of the standing waves. However, the new receiver introduced a small temperature offset in the calibrated and tropospheric corrected spectra of about 0.6 K.

The Zeeman effect
The Zeeman effect is a splitting of energy levels in emission and absorption processes due to an interaction of the molecules involved with a magnetic field. Atmospheric oxygen has a permanent magnetic moment that interacts with Earth's magnetic field. Therefore an emission line, coming from rotational transitions, splits up into several lines. The degree of the line splitting depends on the strength of the magnetic field. Earth's magnetic field is rather weak, compared to stellar magnetic fields often analyzed in astronomy, which leads more to a broadening of the line center rather than a visible separation of individual Zeeman lines for each energy level. At mesospheric altitudes where the atmospheric pressure is already low, Zeeman broadening dominates over pressure broadening. Thus, temperature retrievals above 45 km are no longer feasible without taking into account the Zeeman effect. The change of the line shape due to the Earth's magnetic field for both frequencies is demonstrated in Fig. 1 and underlines the importance to include the magnetic field strength in the inversion. The new retrieval algorithm (Larsson et al., 2019) computes the Zeeman effect for both oxygen emission lines.

MERRA2 and NAVGEM-HA
Stratospheric and mesospheric temperatures obtained from the new retrieval algorithm are compared to MERRA2 reanalysis (Gelaro et al., 2017) and to the meteorological analysis NAVGEM-HA (Eckermann et al., 2018). The vertical temperature profiles are extracted for the location of Payerne considering the spatial averaging of the radiometer of about 250 km in diameter keeping the temporal resolution of the model fields of 3 h. Only the vertical resolution of the model data was interpolated to a fixed altitude grid with 2 km vertical resolution to simplify the comparisons. MERRA2 reanalysis utilizes a 3DVAR data assimilation (e.g., Gelaro et al., 2017, and references therein), which updates the state vector every 6 h. A detailed description of the hybrid 4-DVAR data assimilation in NAVGEM-HA is provided in Kuhl et al. (2013) and Eckermann et al. (2018). Similarly to MERRA2 the model state vector is updated every 6 h at the mesosphere.
For the comparison with the temperature observations from TEMPERA, the model data was analyzed at the geographic location of Payerne, and all grid points in a 250 km radius were averaged after they had been interpolated to a geometric vertical altitude grid. Daily mean temperatures and tidal amplitudes were derived by an adaptive spectral filter similarly to Pokhotelov et al. (2018), Baumgarten and Stober (2019), and Stober et al. (2020). The geopotential altitudes from NAVGEM-HA were converted into geometric heights (Stober et al., 2021). The temporal resolution of 3 h for both model data was kept.
5 Temperature retrieval with optimal estimation 5.1 Temperature retrievals The inversion of the forward model is solved with ARTS 2.4 (Atmospheric Radiative Transfer Simulator; Buehler et al., 2018). The mathematical method follows the formalism from Rodgers (2000) and is briefly explained in this section.
Lets be y the measurement vector and x the state vector. In our case, y is the spectrum with n channels, and x is the temperature profile with m grid points. The forward model F (x, b) maps the atmospheric state x to an idealized spectrum, this is usually written as The vector b contains some other parameters that are not included in the state vector, and is the measurement error. The challenge is to find an inversion of the forward model F (x, b) that presents an optimal estimate to the observations. The problem is that there is often no unique state x for a given measurement y, which is classified as ill-posed. The inversion (also called retrieval) can be understood as a mapping R of the measurement vector y onto an optimal state vectorx whereb is the best estimate of the forward model parameters, x a denotes the a priori knowledge on the state vector, and c are some additional parameters. The optimal estimation method (OEM) provides the most probable solutionx in the context of the forward model. To apply this method information about the atmospheric state must be added. This information is included in the a priori state x a , which is a preknowledge background state of the atmosphere. The choice of a certain a priori state is crucial and explained in Sect. 5.2. The error covariance of the a priori state is described in the a priori-covariance matrix S a , and the measurements errors are described in the measurement-error covariance matrix S . The optimal solution can be found by maximizing the probability P (x|y) of x under the condition that y is known, or in this case equivalent and most common, minimizing the cost function J (x) = −2 ln P (x|y), which can be written in the form The derivation of this cost function is based on Bayes' probability theorem, and the assumption that the probability distributions for the a priori covariance S a and for S , as well as the posterior distribution x, are Gaussian. The minimum of J (x) is found by the following condition This equation is solved using several iterations making use of the Levenberg-Marquardt solver. Thus, successive iterations are computed from where K = ∂F /∂x is called the weighting function. The a priori profile was used for the 0th step x 0 = x a . For γ = 0, this method is equivalent to the Gauss-Newton method. The damping term γ D ensures the iteration to converge, even under poor conditions, making this method more robust but also slower compared to the Gauss-Newton scheme.

A priori atmospheric information
The retrieval algorithm is initialized using an ECMWF climatology. The climatology was obtained averaging daily mean ECMWF data between 2014-2017 smoothed by a 30 d running window. The resulting seasonal a priori temperature behavior is shown in Fig. 2. Based on this climatological mean atmospheric state, the radiative transfer equations are solved for several molecular species, e.g., O 2 , H 2 O, O 3 , and N 2 . However, not all of them contribute significantly to the radiation intensity between 50-60 GHz. Spectroscopic data for O 2 was taken from the HITRAN database (Gordon et al., 2017). These quantum numbers are necessary to account for the Zeeman effect in the radiative transfer model. The magnetic field strength for the location of Bern at the altitude of the mesosphere is taken from ARTS.

Tropospheric correction
The new retrieval still incorporates a tropospheric correction. The received signal is the integral along the line of sight of all emitted microwave radiation, also including tropospheric altitudes. However, the main goal of the new retrieval is the improvement of the stratospheric and mesospheric temperature soundings, which requires a higher frequency resolution in the line center at the cost of the much broader tropospheric signal, which still dominates the overall brightness temperature in the line wings and the center. Therefore, the tropospheric signal is separated and removed from the stratospheric and mesospheric intensities by implementing a tropospheric correction. The method is based on the assumption that the troposphere can be approximated by a homogeneous layer with a weighted mean brightness temperature Where the integral is taken from the ground z 1 to the top of the troposphere z t , ν is the frequency, α denotes the absorbing coefficient, τ the opacity, and T (z) is the physical thermal equilibrium temperature. The weighted mean temperature is used to estimate a mean tropospheric opacity τ trop (ν). After estimating all these parameters the brightness temperature on the top of the troposphere is determined by solving the radiative transfer equation. The integrals above are dominated by the lowest altitudes because α is pressure dependent and decreases quickly with increasing altitude. Assuming a linear relationship between the surface temperature T s and T m leads to To determine the coefficients a and b radiosonde measurements at Payerne launched from MeteoSwiss were used. The coefficients for the TEMPERA frequency range are found in Navas-Guzmán et al. (2015) and take values for a = 0.8159 and b = 47.211. Further details about the method are described in Ingold and Kämpfer (1998). All previous studies based on TEMPERA have applied such a tropospheric correction (Stähli et al., 2013;Navas-Guzmán et al., 2015. Although, hitherto observations with TEM-PERA indicate that the tropospheric correction seems to work well, it represents a coarse approximation that is worth further investigation for various weather conditions. In particular, tropospheric inversion layers might have a more critical impact on the mean tropospheric opacity τ trop (ν).

Measurement errors
Statistical measurement errors arise from two sources. The first error source is the receiver noise and the second one is atmospheric noise, which originates from fluctuations and turbulent processes in the field of view. Typically, receiver and atmospheric noise are considered as zero-mean Gaussian random processes. Both together, measurement-noisevariance and atmospheric-noise-variance contribute to the measurement-error covariance (Rodgers, 2000). Other systematic errors, such as a systematic frequency shift in the channels, are often hard to identify and, thus, are not taken into account.
In the following, we briefly discuss how the measurement errors are obtained. Considering y ij = y(ν i , t j ) as the measurement matrix, ν i is the frequency of channel number i, and t j is the time of spectrum number j in a time series with N spectra. The channels are assumed to be uncorrelated with the variance σ 2 i . The final measurement spectrum is the mean of y ij over time y i = 1 N j y ij , so that the variance σ 2 i of y i is related to σ 2 i as From this, one can calculate σ 2 i by taking the sample variance of y i . A more stable method is to consider the variance σ 2 i of the differences y ij = y ij − y i+1j , which is related to σ i as σ 2 i = 2σ 2 i and, hence, Assuming that all channels are uncorrelated, the measurement error covariance matrix S takes a diagonal form with entries

A priori covariance
The a priori covariance determines the uncertainty of the a priori state. For temperature profiles, one usually chooses a constant value for each grid point and a with distance exponentially decreasing correlation. However, varying the a priori covariance with altitude can improve the retrieval significantly with respect to the measurement response obtained.
Since the platform altitude was set at 12 km (see tropospheric correction), lower altitudes have to be excluded from the inversion. For this purpose, the a priori covariance σ a (z i ) was set to 0.1 K up to 12 km (z i is the altitude at the ith grid point). Above the virtual platform altitude, the a priori covariance increases linearly with altitude to a value of 6 K at 50 km, and higher up in the atmosphere, the value increases to 8 K at 60 km and beyond that height, the covariance reaches 12 K at 70 km altitude. A linear increase with altitude avoids numerical oscillations due to sharp "jumps" in the profile, which would occur when a step function is implemented instead. The larger values at the upper altitudes of the retrieval domain are beneficial to optimize the information content of the measurement vector. However, we have to note that this method tends to be more prone to generate some unwanted numerical effects such as spurious oscillations. On the other hand, a smaller choice of the a priori covariance for these altitudes forces the retrieval to stay close to the a priori state, and, thus, information content would be lost. The values described above were optimized through empirical tests prioritizing an optimal balance between numerical stability and high sensitivity of the solution at the stratosphere and lower mesosphere. Considering these aspects, the covariance matrix S a takes the form where h is the correlation length, which was set to be h = 1 km.

Other sources of uncertainty
The advantage of the optimal estimation implementation of the retrieval is the possibility to derive the information gain from the observations (Shannon, 1948;Shannon and Weaver, 1949). An important quantity, which is widely used for error analysis in information theory, is the gain matrix given by The gain matrix can be interpreted as the sensitivity of the retrieval R to the measurement y. Furthermore, the gain matrix can be used to define the averaging kernel matrix by According to Rodgers (2000) the averaging kernel is the sensitivity of the retrieval to the (unknown) true state. The rows of A provide correlations and a distinct maximum, which defines the altitude of maximum measurement response for a vertical grid point. Their half-width can be regarded as a measure of the effective vertical resolution. The measurement response vector mr is defined as (Rodgers, 2000;Eriksson et al., 2005) where A i is row i of the averaging kernel matrix, and x ai is the ith entry of the a priori state. For an ideal retrieval, the value of mr equals 1. The lower the measurement response, the more a priori information is included in the solution. Measurement responses below 0.6 indicate that the retrieved state depends mostly on our a priori information.
Weighting the measurement-error covariance matrix S with G y one obtains the retrieval noise (or observational error) covariance matrix Another indicator is the modeling error s M , obtained by weighting the retrieval residuals with G y In theory, this vector should be evaluated at the true state x and b, which is, of course, not known. Evaluating this quantity at the retrieved statex instead will lead to slightly increased values.
As an example, a set of quality control parameters are illustrated in Fig. 3. The modeling error, which is directly related to the forward model residuum, leads to the conclusion that the retrieved profile is underestimated around 30-40 km and overestimated around 45-55 km by about 2 K.
The AVK up to 40 km shows the expected and already documented behavior (Stähli et al., 2013;Navas-Guzmán et al., 2017), where the best performance is reached at an altitude around 30 km. From 40 km upwards the Zeeman calculation leads to a second but lower peak between 40-50 km. The last peak between 60-65 km is due to the increased a priori error in this region. This behavior is also reflected in the measurement response. As a rule of thumb, the altitude range of a retrieved profile is usually defined as the region where the measurement response is above 0.8, which can be found at altitudes between 22-53 km.

Temperature retrievals including the Zeeman effect
The revised temperature retrieval was applied to data collected with TEMPERA in Payerne between 2014-2017. The main differences compared to previous work by Navas-Guzmán et al. (2015 is the inclusion of the Zeeman effect in the center of the oxygen emission lines and the use of updated a priori and measurement covariances to improve numerical stability and the retrieval sensitivity. Furthermore, there is the new retrieval emphasis on stratospheric and mesospheric altitudes to observe tidal waves and their temporal intermittency. The temporal resolution was slightly decreased from 2 h (Navas-Guzmán et al., 2017) to about 2.5 h. The increased integration time resulted in more robust temperature estimates. On average, we obtained 8-9 integrated spectra per day. Each integrated spectra consists of about 150-160 individual atmospheric soundings and spectra obtained from atmospheric observations lasting 0.5 s (mirror pointing towards sky) using the stratospheric and mesospheric measurement mode with the high-resolution FFTspectrometer.
We also implemented a quality control before the averaged spectra is computed. Some spectra are removed from the averaging due to increased atmospheric noise, mainly caused by tropospheric weather, e.g., strong precipitation or temporary technical issues with the instrument. On average, about 3.6 % of the integrated spectra are removed from the analysis within the 4 years of observations. Figure 4 shows temperature soundings for TEMPERA, MERRA2, and NAVGEM-HA for the whole period (2014-2017). The seasonal pattern indicates higher temperatures at all retrieved altitudes during the summer season and lower temperatures at the stratosphere during the winter months. The winter months are characterized by an increased planetary wave activity and the frequent occurrence of suddenstratospheric-warmings (Scherhag, 1952;Matsuno, 1971;Limpasuvan et al., 2016;Matthias et al., 2013). Also, the spring transition is clearly distinguishable from the temperature data (Matthias et al., 2021).
First of all, we investigated how critically the final retrieved temperatures depend on our a priori data. Figure 5 shows a difference between TEMPERA and the ECMWF climatology. The larger the differences at some altitudes, the less critical is the choice of the a priori data, which indicates that at these levels, the solution is only given by the measurements. Furthermore, from 60 km and higher up, the colors become brighter, which points out that these heights depend more on the a priori information. This is also reflected by the lower measurement response and is consistent with the averaging kernels presented in Fig. 3 for these altitudes.

Comparison of temperature retrievals to MERRA2 and NAVGEM-HA
The performance of the new temperature retrievals is assessed by comparing our observations to state-of-art reanalysis data from MERRA2 and the meteorological analysis of NAVGEM-HA. Therefore, we compute correlation coefficients based on monthly medians and corresponding variances for all data sets. These monthly medians essentially remove all atmospheric waves on short timescales, such as tides and gravity waves, from the model fields as well as the temperature soundings. However, we have to note that atmospheric time series cannot necessarily be considered as Gaussian random variables. Often, the atmospheric natural variability exceeds the statistical uncertainty of the observations (e.g., Stober et al., 2017, see Fig. 3), and, thus, an overestimation or inflation of the correlation coefficients is the result. Assuming a linear regression model between the TEMPERA profiles T TMP (z) and the profile for cross comparison T CCP (z) the coefficients m, q were determined through linear regression. For two statistically identical data sets, we would obtain m = 1, and q = 0. The coefficient q gives an absolute offset of the two profiles, while a slope m above 1 indicates a higher sensibility of the profile T TMP (z) (see Sect. 8) relative to the compared profile T CCP (z). This method gives a quantitative estimation of the absolute offset but provides no information as to at which altitude this occurs. In Figs. 6 and 7 we show linear correlation coefficients of the median monthly temperature profiles for the year 2016 for TEMPERA vs MERRA2 and TEMPERA vs. NAVGEM-HA, respectively. The error bars correspond to the temperature variance for each data set. The correlations are estimated after subtracting the median temperature from each profile, which was estimated to be approximately 250 K The shift of the temperature profile to lower values is necessary because the linear regression would otherwise falsely give good values for m. The other years can be found in Appendix A. The monthly median temperature correlation coefficients exhibit a range between 0.93-0.98 for the comparison with MERRA2 and about 0.94-0.98 for NAVGEM-HA. The highest correlation coefficients are achieved during the summer months from April to September and in December. The lowest correlations are found during January and February and are the result of the increased planetary wave activity and the more variable polar vortex dynamics in 2016 Stober et al., 2017;Matthias and Ern, 2018). NAVGEM-HA indicates a similar seasonal behavior for the year 2016 and occasionally has minimal larger correlations. The mean temperature bias |q| between the new TEMPERA retrievals and MERRA2 is smaller than 1.5 K. The temperature bias relative to NAVGEM-HA takes values between −0.1 up to −2.2 K (excluding the exceptional January 2016). The slopes m of the linear regression with MERRA2 and NAVGEM-HA are in a range between m = 0.8 and m = 0.96, indicating a lower sensitivity of TEMPERA to the atmospheric variability relative to the model fields. We also estimated yearly median altitude-resolved Pearson correlation coefficients. These are shown in Fig. 8. It is remarkable that the correlation coefficients for retrievals with activated Zeeman effect (upper panels) are most of the time larger than 0.8 and often exceed 0.9 for MERRA2 and NAVGEM-HA, respectively. Furthermore, the seasonal correlation coefficients reveal a sharp drop off at about 53-55 km, which appears to be the limiting altitude for TEMPERA temperatures and the new retrieval. Above this altitude, the solutions of the retrieval are dominated by a priori information. The comparison also indicates that NAVGEM-HA exhibit a slightly higher correlation concerning TEMPERA temperatures relative to MERRA2.
In addition, Fig. 8 (lower panels) show yearly correlation coefficients for TEMPERA retrievals with deactivated Zeeman effect. For calculations without the Zeeman effect, usually the line center is blanked and only the line wings are used. This approach results in retrievals with a limited upper altitude of about 45 km. Above this altitude, the retrieved profile quickly converges towards the a priori profile, because due to the missing line center the measurements do not provide any information beyond these heights. Besides some small effects, the profiles with activated and deactivated Zeeman effect would match up to this altitude. The plots with Zeeman on/off are shown in Figs. 8 and 9. However, we retrieved temperatures keeping the line center but turned off the Zeeman effect by setting the magnetic field essentially to zero to investigate the impact of the Zeeman broadening. Correlation coefficients for retrievals with deactivated Zeeman effect are lower than 0.9 everywhere and even lower than 0.8 for most of the altitudes. This underlines that accounting for the Zeeman effect has not only an impact on the upper altitudes. Due to the energy conservation in the radiative transfer, almost all altitudes are affected with decreasing impact for the lower altitudes.
Another important aspect to compare are altitude-timedependent systematic differences between MERRA2 and NAVGEM-HA. Therefore, we compute altitude time residuals by subtracting MERRA2 and NAVGEM-HA from the temperatures observed by TEMPERA. Figure 9 shows the resulting temperature residuals for both model data sets and the complete time series. Similarly to the Pearson correlation coefficients MERRA2 and NAVGEM-HA reflect the same characteristic systematic differences. Furthermore, the installation and upgrade of the receiver around 5 June 2015 is clearly visible in the residual comparison. The new receiver reduced the standing wave contamination in the line wings and, thus, mostly affected the data quality below 40 km altitude.
In addition, Fig. 9 shows difference plots for TEMPERA retrievals with deactivated Zeeman effect and the full line center (the same as Fig. 8).
The vertical residuals show some systematic and altitudedependent differences. Below 35 km there is a tendency  for TEMPERA to show warmer temperatures compared to MERRA2 and NVAGEM-HA. Between 35-50, the models seem to have a warm bias compared to the radiometric temperature sounding. Above 53 km MERRA2 indicates a clear tendency to underestimate the temperatures relative to TEM-PERA, whereas NAVGEM-HA shows a more variable vertical structure of the residual temperature exhibiting times and altitudes with warmer, but also periods and heights with colder temperatures. It is also evident from the residual comparison that during the winter season the increased planetary wave activity leads to larger differences between our temperature observations and the model data.
The plots with Zeeman effect turned off shows cold biases of 20 K above 45 km and hot biases around 5-20 K below. The cold bias is due to an overestimation of the pressure broadening since the Zeeman broadening is treated as pressure broadening when Zeeman calculations are deactivated. Hot biases occur mainly as an effect of compensation since the total radiation intensity along the beam path has to be preserved.
Finally, Fig. 10 presents a comparison of the 3-hourly resolved temperature time series at 50 km altitude. The comparison underlines that the TEMPERA observations still exhibit such a high measurement response at this height that the temperature amplitude and phase of planetary waves is well captured in comparison to MERRA2 and NAVGEM-HA. There is also a characteristic diurnal tidal oscillation in MERRA2, NAVGEM-HA, and the radiometer data visible. Overall the measurements from TEMPERA and the MERRA2 and NAVGEM-HA temperature agree within a few kelvin (5-10 K). Furthermore, the comparison supports that it is feasible to obtain tidal information from the TEM-PERA temperature soundings on a daily basis.

Discussion
The main goal of the new retrieval algorithm was the implementation of the Zeeman effect in the temperature retrievals, which was not available in previous versions of the radiative transfer model for both oxygen emission lines. Thus, the new temperature retrieval yields an increased measurement response and altitude coverage up to 55 km compared to for-mer TEMPERA observations where 45-48 km seemed to be the limiting altitude (Stähli et al., 2013;Navas-Guzmán et al., 2015). Furthermore, the new retrieval was optimized concerning the a priori state vector and covariances, which also led to some improvement at the upper stratosphere and lower mesospheric heights. TEMPERA observations offer the possibility to perform continuous temperature measurements at altitudes between 16-55 km.
While implementing a new retrieval method it is always necessary to achieve a balance between numerical stability and sensitivity to the atmospheric state. A small a priori covariance or a too large measurement error results in low sensitivity of the retrieval, although such a retrieval is stable concerning numerical oscillations. On the other hand, such a retrieval likely underestimates the natural or true variability of the estimated parameters, and the solution would stay tied to the a priori state. A large a priori covariance improves the sensitivity of the retrieval, but at the cost of numerical oscillations, which can dominate the whole retrieved profile. The new retrieval is well balanced to achieve the highest possible sensitivity at 50 km while avoiding numerical instabilities and oscillations.
Statistical measurement errors are known with high precision; the final error on the temperature profile is rather a measure of the information content rather than an error in the classical sense. The state-of-the-art method is a cross comparison of different and independent data sets. Calculations of correlation coefficients or goodness-of-fit values (R 2 ) in a linear regression always requires some information about the uncertainty of the data set. Since this information is missed, usually the sample variation is taken instead. This approach should, however, be used with appropriate caution because atmospheric profiles or time series are not random variables, and natural variations could be bigger than the actual errors. This circumstance leads directly to an overestimation of the correlation coefficients of two compared data sets.
Continuous temperature observations at the stratosphere and mesosphere are rare. Lidars are often limited by the tropospheric weather conditions, and only a few long observations are available (e.g., Stober et al., 2017;Baumgarten and Stober, 2019;Eixmann et al., 2020). However, these lidar studies underline that continuous temperature observations are essential to investigate atmospheric wave and their intermittency covering periods from gravity waves, tides, and planetary waves at the source region, to research wave-wave interactions.
Satellite observations from MLS or SABER provide neither the temporal nor the spatial resolution to resolve all atmospheric waves and their intermittent behavior. Due to the spacecraft orbit and viewing geometry very often only one measurement per day is available for a specific geographic location. However, satellite observations are a key information source for data assimilation into MERRA2 and NAVGEM-HA at the stratosphere and mesosphere for the temperature and dynamical fields (Gelaro et al., 2017;Kuhl et al., 2013;Eckermann et al., 2018). Other meteorological observations such as radiosondes reach only altitudes of about 28-38 km and, thus, provide only temperature, wind, or chemical information at the lower and middle stratosphere. Furthermore, radiosondes are launched every 12 h, or at some stations occasionally every 6 h, which limits their impact to capture atmospheric tides at the stratosphere. Navas-Guzmán et al. (2017) has already performed an intercomparison of the TEMPERA observations with MLS satellite data, lidar, and radiosondes, as well as WACCM simulations. The Pearson correlation coefficients obtained were between 0.9 to 0.94 for a 3-year-long time series and were interpolated to match the different temporal resolutions and emphasized altitudes between 22-43 km where the measurement response was larger than 0.8. In this study, we have already achieved this degree of correlation using median monthly profiles and for yearly observations for the altitude range from 20-55 km. However, the comparison to MERRA2 and NAVGEM-HA still exhibits a warm bias of TEMPERA for the altitude range between 20-30 and 35 km and a cold bias between 30(35)-48 km, which was already found in Navas- . Some of the systematic biases at the lower altitudes as well as at the upper altitudes occur at heights with a low measurement response, and thus we investigated a potential a priori dependence by computing similar climatologies for MERRA2 and NAVGEM-HA as shown in Fig. 2 for ECMWF. A comparison of these a priori climatologies between all three reanalysis data sets revealed similar altitude-dependent offsets to the TEMPERA comparison and explains most of the upper stratospheric bias and at least partly the lower stratospheric offset.

Conclusions
In this study, we reprocessed observations of the TEMPERA radiometer conducted between 2014 and 2017 with a recently developed and updated temperature retrieval. The new algorithm accounts for the Zeeman effect in the line center for both oxygen emission lines and uses revised a priori information for the state vector and covariances. We demonstrate with the new retrievals that TEMPERA temperature soundings can be carried out nearly continuously and with an increased altitude coverage by leveraging the updated radiative transfer model (ARTS) and HITRAN quantum numbers, which were not available previously.
We validated the retrieved temperature against the MERRA2 reanalysis and the meteorological analysis NAVGEM-HA for the years 2014-2017. Seasonal Person correlations coefficients remained between 0.85-0.95 between 20-55 km altitude. Therefore, we conclude that considering the Zeeman effect in the line center together with the revised a priori information resulted in an extended altitude coverage of about 8-10 km compared to the previous algorithm applied to the same TEMPERA measurements while sustaining the temporal resolution (Stähli et al., 2013;Navas-Guzmán et al., 2017).
Furthermore, we assessed the correlation coefficients and mean biases for monthly median temperature profiles of TEMPERA and the validation data MERRA2 and NAVGEM-HA. We obtained correlation values between 0.8-0.96 throughout the course of the year. The smallest correlations are found in January and February during strong planetary wave activity or for stratospheric warming evens, which supports that high-quality local observations could still provide a benefit to studying dynamical processes in more detail. The months from April to September reached correlations between 0.94-0.96. The mean temperature bias between MERRA2 and the radiometric temperatures was smaller than ±1 K and basically vanished for some months. However, the comparison to NAVGEM-HA resulted in a cold bias between 1-2 K for the TEMPERA temperatures.
Altitude-dependent differences were examined by computing temperature residuals of TEMPERA and both model data sets. We identified that the lower altitudes between 20-35 km tend to exhibit a warm bias of approximately 5 K for the radiometer, and from 35-50 km we found a systematic cold bias of approximately 5 K for TEMPERA compared to MERRA2 and NAVGEM-HA. Above 50 km altitude, MERRA2 and NAVGEM-HA also start to show some discrepancies in the vertical temperature structure. During strong planetary wave activity in the winter months the differences between MERRA2, NAVGEM-HA, and TEM-PERA exceed ±10 K. However, it remains unclear whether these biases or differences are due to the instrument or due to the sparsity of the assimilated data in the models, which might be not sufficient to capture all dynamical details.

Appendix B: Comparison between MERRA2 and NAVGEM-HA
In Fig. B1 we show a differences between MERRA2 and NAVGEM-HA. Both models exhibit a very good agreement up to an altitude of 53 km. Above 53 km a systematic difference is obvious. NAVGEM-HA tends to show larger temperatures compared to MERRA2. However, we did not investigate the nature for the increasing discrepancy between both models above this altitude, which is beyond the scope of the paper. A detailed overview of the assimilated data sets and altitude range can be found in Eckermann et al. (2018) for NAVGEM-HA and for MERRA2 in Gelaro et al. (2017). Figure B1. Absolute temperature difference between the MERRA2 and NAVGEM-HA datasets shows seasonal patterns above 45 km. The color scale is the same as in Fig. 9. Figure B2. Correlation coefficients between the MERRA2 and NAVGEM-HA datasets. Correlation coefficients over all altitudes are above 0.9.
Author contributions. WK and GS conceptualized the content of the manuscript. WK implemented the retrieval and performed the data analysis of TEMPERA observations. GS reduced the MERRA2 and NAVGEM-HA data for the validation. AM, FNG, and DK guided and supported the preparation of the paper. All authors contributed to the editing of the paper.
Competing interests. The contact author has declared that neither they nor their coauthors have any competing interests Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements. The authors acknowledge the European Centre for Medium-Range Weather Forecasts (ECMWF) for the data supplied. This research has been supported by the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (grant no. 200021-200517/1), and the Swiss Polar Institute (SPI) supports the development of the TEMPERA-C radiometer. Francisco Navas-Guzma´n received funding from the Ramon y Cajal program (ref. RYC2019-027519-I) of the Spanish Ministry of Science and Innovation. We thank the ARTS developer team for their support in implementing the Zeeman effect into the retrievals. Scientific color maps (Crameri et al., 2020) are used in this study to prevent visual distortion of the data and exclusion of readers with color-vision deficiencies.
Financial support. This research has been supported by the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (grant no. 200021-200517/1), Swiss Polar Institute (SPI), and Ramon y Cajal program of the Spanish Ministry of Science and Innovation (grant no. RYC2019-027519-I).
Review statement. This paper was edited by Christian von Savigny and reviewed by two anonymous referees.