A neural-network-based method, quantile regression neural networks (QRNNs), is proposed as a novel approach to estimating the a posteriori distribution of Bayesian remote sensing retrievals. The advantage of QRNNs over conventional neural network retrievals is that they learn to predict not only a single retrieval value but also the associated, case-specific uncertainties. In this study, the retrieval performance of QRNNs is characterized and compared to that of other state-of-the-art retrieval methods. A synthetic retrieval scenario is presented and used as a validation case for the application of QRNNs to Bayesian retrieval problems. The QRNN retrieval performance is evaluated against Markov chain Monte Carlo simulation and another Bayesian method based on Monte Carlo integration over a retrieval database. The scenario is also used to investigate how different hyperparameter configurations and training set sizes affect the retrieval performance. In the second part of the study, QRNNs are applied to the retrieval of cloud top pressure from observations by the Moderate Resolution Imaging Spectroradiometer (MODIS). It is shown that QRNNs not only achieve accuracy similar to that of standard neural network retrievals but also provide statistically consistent uncertainty estimates for non-Gaussian retrieval errors. The results presented in this work show that QRNNs combine the flexibility and computational efficiency of the machine learning approach with the theoretically sound handling of uncertainties of the Bayesian framework. Together with this article, a Python implementation of QRNNs is released through a public repository to make the method available to the scientific community.

The retrieval of atmospheric quantities from remote sensing measurements constitutes an inverse problem that generally does not admit a unique, exact solution. Measurement and modeling errors, as well as limited sensitivity of the observation system, preclude the assignment of a single, discrete solution to a given observation. A meaningful retrieval should thus consist of a retrieved value and an estimate of uncertainty describing a range of values that are likely to produce a measurement similar to the one observed. However, even if a retrieval method allows for explicit modeling of retrieval uncertainties, their computation and representation are often possible only in an approximate manner.

The Bayesian framework provides a formal way of handling the ill-posedness of
the retrieval problem and its associated uncertainties. In the Bayesian
formulation

For a given retrieval, the a posteriori
distribution can generally not be expressed in closed form, and different
methods have been developed to compute approximations to it. In cases that
allow a sufficiently precise and efficient simulation of the measurement, a
forward model can be used to guide the solution of the inverse problem. If
such a forward model is available, the most general technique to compute the
a posteriori distribution is Markov chain Monte Carlo (MCMC) simulation. MCMC
denotes a set of methods that iteratively generate a sequence of samples
whose sampling distribution approximates the true a posteriori distribution.
MCMC simulations have the advantage of allowing the estimation of the a
posteriori distribution without requiring any simplifying assumptions on a
priori knowledge, measurement error or the forward model. The
disadvantage of MCMC simulation is that each retrieval requires a
large number of forward-model evaluations, which in many cases makes the
method too computationally demanding to be practical. For
remote sensing retrievals, the
method is therefore mainly of interest for testing and validation

A method that avoids costly forward-model evaluations during the retrieval
has been proposed by

The optimal estimation method

Compared to the Bayesian retrieval methods discussed above, machine learning
provides a more flexible approach to learning computationally efficient retrieval
mappings directly from data. Large amounts of data available from simulations,
collocated observations or in situ measurements, as well as increasing computational
power to speed up the training, have made machine learning techniques
an attractive alternative to approaches based on (Bayesian) inverse modeling.
Numerous applications of machine learning regression methods to retrieval
problems can be found in the recent literature

In this article, quantile regression neural networks (QRNNs) are proposed as a
method to use neural networks to estimate the a posteriori distribution of
remote sensing retrievals. Originally proposed by

A formal description of QRNNs and the retrieval methods against which they
will be evaluated is provided in Sect.

This section introduces the Bayesian retrieval formulation and the retrieval methods used in the subsequent experiments. Two Bayesian methods, Markov chain Monte Carlo simulation and Bayesian Monte Carlo integration, are presented. Quantile regression neural networks are introduced as a machine learning approach to estimating the a posteriori distribution of Bayesian retrieval problems. The section closes with a discussion of the statistical metrics that are used to compare the methods.

The general problem considered here is the retrieval of a scalar quantity

Bayesian retrieval methods are methods that use the expression for the a
posteriori distribution in Eq. (

MCMC simulation denotes a set of methods for the
generation of samples from arbitrary posterior distributions

The BMCI method is based on the use of importance sampling to approximate
integrals over the a posteriori distribution of a given retrieval case. Consider an
integral of the form

Neglecting uncertainties, the retrieval of a quantity

Machine learning regression models are trained using supervised training, in
which the model

Given the cumulative distribution function

By training a machine learning regressor

A neural network computes a vector of output activations

Adversarial training is a data augmentation technique that has been proposed
to increase the robustness of neural networks to perturbations in the input
data
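In its simplest form, such a perturbation is generated by the fast gradient sign method: each input is shifted by a small step in the direction of the sign of the loss gradient. The following NumPy sketch is purely illustrative (the helper names and the toy linear model are hypothetical and not part of the released implementation):

```python
import numpy as np

def adversarial_batch(x, y, grad_loss_x, eps=0.05):
    """Fast-gradient-sign augmentation (hypothetical helper): perturb each
    input in the direction that increases the loss, with step size eps."""
    return x + eps * np.sign(grad_loss_x(x, y))

# Toy example: linear model f(x) = w . x with squared-error loss.
w = np.array([1.0, -2.0])

def grad_loss_x(x, y):
    # d/dx (w . x - y)^2 = 2 (w . x - y) w
    return 2.0 * (x @ w - y)[:, None] * w[None, :]

x = np.array([[0.5, 1.0]])
y = np.array([0.0])
x_adv = adversarial_batch(x, y, grad_loss_x, eps=0.1)
```

Each component of the input moves by exactly `eps`, so the perturbation is bounded in the maximum norm.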

A problem that remains is how to compare two estimates

The quantile loss function given in Eq. (

The scoring rules presented above evaluate probabilistic predictions against a single observed value. However, since MCMC simulations can be used to approximate the true a posteriori distribution to an arbitrary degree of accuracy, the probabilistic predictions obtained from BMCI and QRNN can be compared directly to the a posteriori distributions obtained using MCMC. In the idealized case where the modeling assumptions underlying the MCMC simulations are true, the sampling distribution obtained from MCMC will converge to the true posterior and can be used as a ground truth to assess the predictions obtained from the other methods.
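One way to quantify such a direct comparison is the Kolmogorov–Smirnov statistic, the maximum distance between the predicted CDF and the empirical CDF of the MCMC samples. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def ks_statistic(cdf, mcmc_samples):
    """Kolmogorov-Smirnov distance between a predicted CDF (a callable)
    and the empirical CDF of MCMC samples used as the reference."""
    s = np.sort(mcmc_samples)
    n = s.size
    ecdf_hi = np.arange(1, n + 1) / n  # ECDF just after each sample
    ecdf_lo = np.arange(0, n) / n      # ECDF just before each sample
    f = cdf(s)
    # The supremum of |F - ECDF| is attained at one of the jump points.
    return max(np.abs(f - ecdf_hi).max(), np.abs(f - ecdf_lo).max())
```

Evaluating the predicted CDF only at the sample points suffices because the empirical CDF is a step function, so the supremum of the difference occurs at one of its jumps.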

Calibration plots are a graphical method for assessing the calibration of
prediction intervals derived from probabilistic predictions. For a set of
prediction intervals with probabilities

Example of a calibration plot displaying calibration curves for overly confident predictions (dark gray), well-calibrated predictions (red) and overly cautious predictions (blue).
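Computing such a calibration curve amounts to comparing the nominal probability of each prediction interval with the observed frequency of the true value falling inside it. The sketch below assumes quantile fractions arranged symmetrically around 0.5, so that pairs of quantiles form central prediction intervals (names are illustrative):

```python
import numpy as np

def calibration_curve(quantile_preds, taus, y_true):
    """Empirical coverage of central prediction intervals derived from
    predicted quantiles. quantile_preds: (n_samples, n_quantiles),
    taus: quantile fractions (sorted, symmetric around 0.5),
    y_true: (n_samples,). Returns nominal and observed probabilities."""
    taus = np.asarray(taus)
    quantile_preds = np.asarray(quantile_preds)
    nominal, observed = [], []
    for i_lo in range(len(taus) // 2):
        i_hi = len(taus) - 1 - i_lo
        lo = quantile_preds[:, i_lo]
        hi = quantile_preds[:, i_hi]
        nominal.append(taus[i_hi] - taus[i_lo])
        observed.append(np.mean((y_true >= lo) & (y_true <= hi)))
    return np.array(nominal), np.array(observed)
```

Plotting observed against nominal probabilities yields the calibration curves shown in the figure: points below the diagonal indicate overly confident predictions, points above it overly cautious ones.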

In this section, a simulated retrieval of column water vapor (CWV) from passive microwave observations is used to benchmark the performance of BMCI and QRNN against MCMC simulation. The retrieval case has been set up to provide an idealized but realistic scenario in which the true a posteriori distribution can be approximated using MCMC simulation. The MCMC results can therefore be used as the reference to investigate the retrieval performance of QRNNs and BMCI. Furthermore, the influence of different hyperparameters on the performance of the QRNN is investigated, as well as how the size of the training set and retrieval database affects the performance of QRNNs and BMCI.

For this experiment, the retrieval of CWV from passive
microwave observations over the ocean is considered. The state of the
atmosphere is represented by profiles of temperature and water vapor
concentrations on 15 pressure levels between

The Atmospheric Radiative Transfer Simulator (ARTS;

Observation channels used for the synthetic retrieval of column water vapor.

The simulations take into account only absorption and emission from water
vapor. Ocean surface emissivities are computed using the FASTEM-6 model

The MCMC retrieval is based on a Python implementation of the Metropolis
algorithm

The MCMC retrieval is performed in the space of atmospheric states described
by the profiles of temperature and the logarithm of water vapor
concentrations. The multivariate Gaussian distribution obtained
by a fit to the ERA-Interim data is taken as the a priori distribution. A random
walk is used as the proposal distribution, with its covariance matrix taken as
the a priori covariance matrix. A single MCMC retrieval consists of eight
independent runs, initialized with different random states sampled from the a
priori distribution. Each run starts with a warm-up phase followed by an
adaptive phase during which the covariance matrix of the proposal distribution
is scaled adaptively to keep the acceptance rate of proposed states close to
the optimal 21 %
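For reference, the core of the Metropolis algorithm with a Gaussian random-walk proposal can be sketched in a few lines. This is an illustrative stand-alone sampler, not the ARTS-coupled implementation used in the study, and it omits the warm-up and adaptive phases described above:

```python
import numpy as np

def metropolis(log_post, x0, cov, n_steps, rng=None):
    """Random-walk Metropolis sampler (illustrative sketch).
    log_post: log a posteriori density, cov: proposal covariance."""
    rng = np.random.default_rng(rng)
    chol = np.linalg.cholesky(cov)  # factor of the proposal covariance
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    samples, accepted = [], 0
    for _ in range(n_steps):
        prop = x + chol @ rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        # Accept with probability min(1, p(prop) / p(x)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_steps
```

Monitoring the returned acceptance rate is what allows the adaptive scaling of the proposal covariance described above.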

The implementation of quantile regression neural networks is based on the
Keras Python package for deep learning

For the training of quantile regression neural networks, the quantile loss
function

Custom data generators have been added to the implementation to incorporate information on measurement uncertainty into the training process. If the training data are noise free, the data generator can be used to add noise to each training batch according to the assumptions on measurement uncertainty. The noise is added immediately before the data are passed to the neural network, keeping the original training data noise free. This ensures that the network does not see the same noisy training sample twice during training, thus counteracting overfitting.
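The idea behind these data generators can be sketched as follows. This is an illustrative NumPy generator, not the actual implementation; it assumes Gaussian measurement noise with a given covariance matrix:

```python
import numpy as np

def noisy_batches(x, y, batch_size, noise_cov, rng=None):
    """Yield training batches with fresh Gaussian noise added to the inputs,
    so the network never sees the same noisy sample twice (illustrative
    sketch of the data generators described above)."""
    rng = np.random.default_rng(rng)
    chol = np.linalg.cholesky(noise_cov)
    n = x.shape[0]
    while True:
        idx = rng.permutation(n)
        for i in range(0, n, batch_size):
            batch = idx[i:i + batch_size]
            noise = rng.standard_normal((batch.size, x.shape[1])) @ chol.T
            # Noise is added to a copy; the original data stay noise free.
            yield x[batch] + noise, y[batch]
```

Because the noise is drawn anew for every batch, each epoch effectively presents a different realization of the measurement error, which counteracts overfitting as described above.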

An adaptive form of stochastic batch gradient descent is used for the neural network training. During the training, the loss is monitored on a validation set. When the loss on the validation set has not decreased for a certain number of epochs, the learning rate is reduced by a given reduction factor. The training stops when a predefined minimum learning rate is reached.
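The logic of this schedule can be sketched as follows. This is a simplified, dependency-free stand-in for the training loop (in the actual implementation this would be handled by the Keras training tools); all parameter names are illustrative:

```python
def train(step, val_loss, lr=1e-2, lr_min=1e-5, decay=0.1, patience=5):
    """Reduce-on-plateau schedule sketch: lower the learning rate by
    `decay` when the validation loss has not improved for `patience`
    epochs; stop once the learning rate falls below `lr_min`.
    `step(lr)` runs one training epoch, `val_loss()` evaluates it."""
    best, stale = float("inf"), 0
    while lr >= lr_min:
        step(lr)
        loss = val_loss()
        if loss < best:
            best, stale = loss, 0  # progress: reset the patience counter
        else:
            stale += 1
            if stale >= patience:
                lr *= decay        # plateau: reduce the learning rate
                stale = 0
    return best
```

The schedule thus anneals the learning rate automatically, and the stopping criterion is expressed in terms of the minimum learning rate rather than a fixed number of epochs.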

The reconstruction of the CDF from the estimated quantiles is obtained by using the quantiles as nodes of a piecewise linear approximation and extending the first and last segments out to 0 and 1, respectively. This approximation is also used to compute the CRPS on the test data.
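This reconstruction can be sketched as follows (an illustrative implementation; the outer segments are extrapolated linearly until they reach probabilities 0 and 1, as described above):

```python
import numpy as np

def cdf_from_quantiles(quantiles, taus):
    """Piecewise-linear CDF through the predicted quantiles, with the
    first and last segments extended to probabilities 0 and 1."""
    q = np.asarray(quantiles, dtype=float)
    t = np.asarray(taus, dtype=float)
    # Extend the outer linear segments to F = 0 and F = 1.
    x0 = q[0] - t[0] * (q[1] - q[0]) / (t[1] - t[0])
    x1 = q[-1] + (1.0 - t[-1]) * (q[-1] - q[-2]) / (t[-1] - t[-2])
    xs = np.concatenate([[x0], q, [x1]])
    fs = np.concatenate([[0.0], t, [1.0]])
    return lambda x: np.interp(x, xs, fs)
```

The returned callable evaluates to 0 below the extrapolated lower endpoint and to 1 above the upper one, so it is a valid CDF approximation that can be integrated numerically to compute the CRPS.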

The BMCI method has likewise been implemented in Python and added to the
typhon package. In addition to retrieving the first two moments of the
posterior distribution, the implementation provides functionality to
retrieve the posterior CDF using Eq. (

As with conventional neural networks, QRNNs have several hyperparameters that cannot be learned directly from the data but need to be tuned independently. For this study, the dependence of the QRNN performance on its hyperparameters has been investigated. The results are included here as they may serve as a helpful reference for future applications of QRNNs.

For this analysis, hyperparameters describing the structure of the QRNN model
are investigated separately from training parameters. The hyperparameters
describing the structure of the QRNN are

1. the number of hidden layers,

2. the number of neurons per layer,

3. the type of activation function.

The training parameters considered are

4. the batch size used for stochastic batch gradient descent,

5. the minimum learning rate at which the training is stopped,

6. the learning rate decay factor,

7. the number of training epochs without progress on the validation set before the learning rate is reduced.

To investigate the influence of hyperparameters 1–3 on the performance of
the QRNN, 10-fold cross validation on the training set consisting of

Mean validation set loss (solid lines) and standard deviation (shading)
of different hyperparameter configurations with respect to layer width (number of neurons).
Different lines display the results for different numbers of hidden layers

For the optimization of training parameters 4–7, a very coarse grid
search was performed, using only three different values for each parameter.
In general, the training parameters had only a small effect (

In this section, the performance of a single QRNN and an ensemble of 10 QRNNs
is analyzed. The predictions from the ensemble are obtained by averaging the
predictions from each network in the ensemble. All tests in this subsection are
performed for a single QRNN, the ensemble of QRNNs and BMCI. The retrieval database used
for BMCI and the training of the QRNNs in this experiment consists of

Figure

In the displayed cases, both methods are generally successful in predicting the
a posteriori distribution. Only for the

Retrieved a posteriori CDFs obtained using MCMC (gray), BMCI (blue), a single QRNN (red line) and an ensemble of QRNNs (red marker). Cases displayed in the first row correspond to the 1st, 50th, 90th and 99th percentiles of the distribution of the Kolmogorov–Smirnov statistic of BMCI compared to the MCMC reference. The second row displays the same percentiles of the distribution of the Kolmogorov–Smirnov statistic of the single-QRNN predictions compared to MCMC.

Another way of displaying the estimated a posteriori distribution is by means
of its probability density function (PDF), which is defined as the derivative
of its CDF. For the QRNN, the PDF is approximated by differentiating
the piecewise linear approximation to the CDF and setting the boundary values
to zero. For BMCI, the a posteriori PDF can be approximated using a histogram of the
CWV values in the database weighted by the corresponding weights

Retrieved a posteriori PDFs corresponding to the CDFs displayed
in Fig.

To obtain a more comprehensive view of the performance of QRNNs and BMCI,
the predictions obtained from both methods are compared to those obtained
from MCMC for 6500 test cases. For the comparison, let the

Distribution of effective quantile fractions

For an ideal estimator of the quantile

Finally, we investigate how the size of the training data set used in the training
of the QRNN (or as a retrieval database for BMCI) affects the performance of the
retrieval method. This has been done by randomly generating training subsets
from the original training data with sizes logarithmically spaced between

Figure

MAPE

In addition, the mean of the quantile loss

Mean quantile loss for different training set sizes

The results presented in this section indicate that QRNNs can, at least under idealized conditions, be used to estimate the a posteriori distribution of Bayesian retrieval problems. Moreover, they were shown to perform as well as BMCI for large data sets. Interestingly, for smaller data sets, QRNNs even provide better estimates of the a posteriori distribution than BMCI. This indicates that QRNNs provide a better representation of the functional dependency of the a posteriori distribution on the observation data and thus achieve better interpolation when training data are scarce. Nonetheless, it remains to be investigated whether this advantage can also be observed for real-world data.

A possible approach to handling scarce retrieval databases with BMCI is to artificially increase the assumed measurement uncertainty. This has not been done for the BMCI results presented here and might improve the performance of the method. The difficulty with this approach is that the formulation of the method assumes a sufficiently large database and can thus, at least formally, not handle scarce training data. Finding a suitable way to increase the measurement uncertainty would therefore require either additional methodological development or a heuristic approach, both of which are outside the scope of this study.

In this section, QRNNs are applied to retrieve cloud top pressure (CTP) using
observations from the Moderate Resolution Imaging Spectroradiometer (MODIS;

The QRNN uses the same data for training as the reference NN-CTTH algorithm.
The data set consists of MODIS Level 1B data

The same training scheme as described in Sect.

The main difference in the training process compared to the experiment from
the previous section is how measurement uncertainties are incorporated. For
the simulated retrieval, the training data were noise free, so measurement
uncertainties could be realistically represented by adding noise according to
the sensor characteristics. This is not the case for MODIS observations;
instead, adversarial training is used here to ensure well-calibrated
predictions. For the tuning of the perturbation parameter

Calibration of the QRNN prediction intervals on the validation set
used during training. The curves display the results for no adversarial training
(

Except for the use of adversarial training, the structure of the underlying network and the training process of the QRNN are similar to those used for the NN-CTTH retrieval. The QRNN uses four hidden layers with 64 neurons each, instead of two hidden layers with 30 neurons in the first and 15 in the second. While this makes the neural network used in the QRNN slightly more complex, this should not be a major drawback, since computational performance is generally not critical for neural network retrievals.

Most data analysis will likely require a single predicted value for the cloud top pressure. To derive a point value from the QRNN prediction, the median of the estimated a posteriori distribution is used.
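Deriving this point value from the predicted quantiles amounts to a simple interpolation (an illustrative sketch; it assumes the estimated quantile fractions bracket 0.5, and is exact when the median is itself among the estimated quantiles):

```python
import numpy as np

def posterior_median(quantiles, taus):
    """Point estimate from a QRNN prediction: the median of the estimated
    a posteriori distribution, interpolated from the predicted quantiles."""
    # Linear interpolation of the quantile function at tau = 0.5.
    return float(np.interp(0.5, taus, quantiles))
```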

The distributions of the resulting median pressure values on the

Figure

Even though the QRNN and the NN-CTTH retrieval use the same input and training data, their predictions differ considerably. Within the Bayesian framework, this can likely be explained by the fact that the two retrievals estimate different statistics of the a posteriori distribution. The NN-CTTH algorithm has been trained with a squared-error loss function, which leads the algorithm to predict the mean of the a posteriori distribution. The QRNN retrieval, on the other hand, predicts the median of the a posteriori distribution. Since the median minimizes the expected absolute error, the CTP values predicted by the QRNN are expected to yield smaller errors overall.
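This distinction is easy to illustrate numerically: for a skewed distribution, the minimizer of the expected squared error (the mean) and the minimizer of the expected absolute error (the median) differ. The following sketch is purely illustrative and unrelated to the MODIS data:

```python
import numpy as np

# Draw samples from a skewed "posterior" (log-normal distribution).
rng = np.random.default_rng(42)
samples = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

posterior_mean = samples.mean()        # minimizes expected squared error
posterior_median = np.median(samples)  # minimizes expected absolute error

# For a log-normal(0, 1): mean = exp(1/2) ~ 1.65, median = 1, so the two
# point estimates of the same distribution differ markedly.
```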

Distributions of predicted CTP values

Error distributions of predicted CTP values

The NN-CTTH algorithm retrieves CTP but does not provide case-specific uncertainty estimates. Instead, an estimate of uncertainty is provided in the form of the mean absolute error (MAE) observed on the test set. To compare these uncertainty estimates with those obtained using QRNNs, Gaussian error distributions are fitted to the observed errors based on the MAE and the mean squared error (MSE). A Gaussian error model is chosen here as it is arguably the most common distribution used to represent random errors.
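For a zero-mean Gaussian, the observed errors determine the standard deviation via E|e| = sigma * sqrt(2/pi) and E[e^2] = sigma^2, so the two fits described above can be sketched as follows (the function name is hypothetical):

```python
import numpy as np

def gaussian_sigma_from_errors(errors):
    """Fit zero-mean Gaussian error models to observed errors: for a
    Gaussian, MAE = sigma * sqrt(2 / pi) and MSE = sigma**2, giving two
    alternative estimates of sigma."""
    errors = np.asarray(errors, dtype=float)
    sigma_mae = np.mean(np.abs(errors)) * np.sqrt(np.pi / 2.0)
    sigma_mse = np.sqrt(np.mean(errors ** 2))
    return sigma_mae, sigma_mse
```

If the errors really are Gaussian, the two estimates agree; a large discrepancy between them is itself an indication of a non-Gaussian error distribution.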

A plot of the errors observed on the testing-during-development data
set and the fitted Gaussian error distributions is displayed in panel a of
Fig.

The Gaussian error model based on the MAE fit has also been used to produce
prediction intervals for the CTP values obtained from the NN-CTTH algorithm.
Figure

Predicted and observed error distributions. Panel

Calibration plot for prediction intervals derived from the Gaussian error model for the NN-CTTH algorithm (blue), the single QRNN (dark gray) and the ensemble of QRNNs (red).

As shown above, the predictions obtained from the QRNN are statistically
consistent in the sense that they predict probabilities that match observed
frequencies when applied to test data. This, however, requires that the test data
are statistically consistent with the training data. Statistically consistent
here means that both data sets come from the same generating distribution or, in
more Bayesian terms, the same a priori distribution. What happens when this is
not the case can be seen when the calibration with respect to different cloud
types is computed. Figure

For the NN-CTTH algorithm, the results look different: while the calibration deteriorates for low clouds, it even improves slightly for high clouds. This is not surprising, as the Gaussian fit may be more appropriate for some subsets of the test data than for others.

In this article, quantile regression neural networks have been proposed as a method to estimate a posteriori distributions of Bayesian remote sensing retrievals. They have been applied to two retrievals of scalar atmospheric variables. It has been demonstrated that QRNNs are capable of providing accurate and well-calibrated probabilistic predictions in agreement with the Bayesian formulation of the retrieval problem.

The synthetic retrieval case presented in Sect.

Although the computational performance of the BMCI method has not been optimized in this work, QRNNs allow for retrievals that are at least 1 order of magnitude faster than a naive implementation of BMCI. QRNN retrievals can be easily parallelized, and hardware-optimized implementations are available for all modern computing architectures, providing very good performance out of the box.

Based on these very promising results, the next step in this line of research should be to compare QRNNs and BMCI on a real retrieval case to investigate if the findings from the simulations carry over to the real world. If this is the case, significant reductions in the computational cost of operational retrievals and maybe even better retrieval performance could be achieved using QRNNs.

Calibration of the prediction intervals obtained from NN-CTTH (blue) and a single QRNN (red) with respect to specific cloud types.

In the second retrieval application presented in this article, QRNNs have been used to retrieve cloud top pressure from MODIS observations. The results show that not only are QRNNs able to improve upon state-of-the-art retrieval accuracy, but they can also learn to predict retrieval uncertainty. The ability of QRNNs to provide statistically consistent, case-specific uncertainty estimates should make them a very interesting alternative to non-probabilistic neural network retrievals. Nonetheless, the sensitivity of the QRNN approach to a priori assumptions has also been demonstrated. The posterior distribution learned by the QRNN depends on the validity of the a priori assumptions encoded in the training data. In particular, accurate uncertainty estimates can only be expected if the retrieved observations follow the same distribution as the training data. This, however, is a limitation inherent to all empirical methods.

The second application case presented here demonstrated the ability of QRNNs
to represent non-Gaussian retrieval errors. While, as shown in this study,
this is also the case for BMCI (Eq.

The application of the Bayesian framework to neural network retrievals opens the
door to a number of interesting applications that could be pursued in future
research. It would for example be interesting to investigate if the a priori
information can be separated from the information contained in the retrieved
measurement. This would make it possible to remove the dependency of the
probabilistic predictions on the a priori assumptions, which can currently be
considered a limitation of the approach. Furthermore, estimated a posteriori
distributions obtained from QRNNs could be used to estimate the information
content in a retrieval following the methods outlined by

In this study, only the retrieval of scalar quantities was considered. Another aspect of the application of QRNNs to remote sensing retrievals that remains to be investigated is how they can be used to retrieve vector-valued quantities, such as concentration profiles of atmospheric gases or particles. While the generalization to marginal, multivariate quantiles should be straightforward, it is unclear whether a good approximation of the quantile contours of the joint a posteriori distribution can be obtained using QRNNs.

The implementation of the retrieval methods that were used
in this article has been published as parts of the

All authors contributed to the study through discussion and feedback. PE and BR proposed the application of QRNNs to remote sensing retrievals. The study was designed and implemented by SP, who also prepared the manuscript including figures, text and tables. AT and NH provided the training data for the cloud top pressure retrieval.

The authors declare that they have no conflict of interest.

The scientists at Chalmers University of Technology were funded by the Swedish National Space Board.

The authors would like to acknowledge the work of Ronald Scheirer and Sara Hörnquist, who were involved in the creation of the collocation data set that was used as training and test data for the cloud top pressure retrieval.

Numerous free software packages were used to perform the numerical
experiments presented in this article and visualize their results. The
authors would like to acknowledge the work of all the developers who
contributed to making these tools freely available to the scientific
community, in particular the work by