Global tropospheric ozone column retrievals from OMI data by means of neural networks

In this paper, a new neural network (NN) algorithm to retrieve the tropospheric ozone column from Ozone Monitoring Instrument (OMI) Level 1b data is presented. Such an algorithm further develops previous studies in order to improve the following: (i) the geographical coverage of the NN, by extending its training set to ozonesonde data from midlatitudes, tropics and poles; (ii) the definition of the output product, by using tropopause pressure information from reanalysis data; and (iii) the retrieval accuracy, by using ancillary data (NCEP tropopause pressure and temperature profile, monthly mean tropospheric ozone column from a satellite climatology) to better constrain the tropospheric ozone retrievals from OMI radiances. The results indicate that the algorithm is able to retrieve the tropospheric ozone column with a root mean square error (RMSE) of about 5–6 DU in all the latitude bands. The design of the new NN algorithm is extensively discussed, validation results against independent ozone soundings and chemistry/transport model (CTM) simulations are shown, and other characteristics of the algorithm – i.e., its capability to detect non-climatological tropospheric ozone situations and its sensitivity to the tropopause pressure – are discussed.


Introduction
Ozone is one of the most important trace gases in the Earth's atmosphere. Ozone is most abundant in the stratosphere, where it shields the troposphere from harmful ultraviolet radiation. In the troposphere, ozone acts as a precursor of the hydroxyl (OH) radical, which is able to remove pollutants from the troposphere via oxidation reactions (Jacob, 1999). Furthermore, tropospheric ozone is a pollutant itself, since it is harmful to the biosphere when it reaches high concentrations near the Earth's surface (Heck et al., 1982; Lippmann, 1989). Finally, tropospheric ozone acts as a greenhouse gas (Shindell et al., 2006).
Tropospheric ozone variations may occur over relatively small spatial scales. Concentrations of tropospheric ozone are affected by several factors. First, they depend on the concentrations of its precursors, namely nitrogen oxides (NOx), carbon monoxide (CO) and volatile organic compounds (VOCs), which are emitted either as a consequence of human activities or due to natural causes (e.g., lightning, which produces NOx). Since tropospheric ozone is produced from its precursors via photochemical reactions (Chameides and Walker, 1973), the intensity of the solar radiation reaching the troposphere is another important factor. A further source of tropospheric ozone is the downward transport of ozone-rich air from the stratosphere during the so-called stratosphere-troposphere exchange (STE) (Holton et al., 1995). This process is particularly significant at midlatitudes (see, e.g., Shapiro, 1980). Long-range transport of tropospheric ozone and its precursors also affects its spatial distribution (Carmichael et al., 1998; Creilson et al., 2003).
Monitoring tropospheric ozone using satellite instruments is important in order to obtain a global picture of its distribution. However, several difficulties are encountered in inferring tropospheric ozone concentrations from satellite observations. First, the contribution of tropospheric ozone to the measured radiances is much weaker than the contribution coming from stratospheric ozone. Second, current ultraviolet or thermal infrared measurements usually have reduced sensitivity to lower tropospheric ozone (Natraj et al., 2011, and references therein).
The first attempts to derive information on tropospheric ozone from satellite observations date back to the 1980s. Fishman et al. (1986, 1987) first suggested that total ozone measurements made from the Total Ozone Mapping Spectrometer (TOMS) could contain information on cases of enhanced tropospheric ozone. In the first algorithms for quantitative tropospheric ozone retrievals, the information on tropospheric ozone was obtained by subtracting a stratospheric ozone column measurement from a colocated total ozone measurement. The stratospheric ozone column was estimated from limb observations (Fishman and Larsen, 1987; Fishman, 2000, and references therein) or from ozone column measurements above high convective clouds (Ziemke et al., 1998, 2001; Ahn et al., 2003; Newchurch et al., 2003) or high mountains (Jiang and Yung, 1996; Kim and Newchurch, 1996; Newchurch et al., 2001). An alternative approach, specifically designed for TOMS observations, was to directly infer tropospheric ozone information based on the dependence of TOMS total ozone columns on the scan angle of the instrument (Kim et al., 1996, 2001, 2004).
More recently, after the development of new satellite instruments with hyperspectral measurement capabilities, the direct determination of tropospheric ozone from the UV/VIS part of the spectrum has become feasible (Munro et al., 1998; Liu et al., 2005, 2006, 2010). Nevertheless, residual techniques similar to those described above for TOMS have also been applied to hyperspectral data. For instance, Valks et al. (2003) developed a cloud-slicing algorithm for the Global Ozone Monitoring Experiment (GOME), whereas nadir-limb residual techniques have been used by Ziemke et al. (2006), Schoeberl et al. (2007), and Yang et al. (2007) to estimate the tropospheric ozone column by subtracting Microwave Limb Sounder (MLS) limb stratospheric ozone columns from Ozone Monitoring Instrument (OMI) total ozone columns.
Another possibility to directly retrieve tropospheric ozone from satellite hyperspectral observations is the application of neural networks (NNs). NN algorithms for tropospheric ozone retrievals from OMI and the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY) have recently been developed (Sellitto et al., 2011, 2012, respectively). In particular, Sellitto et al. (2011) developed an algorithm to retrieve tropospheric ozone from OMI data at northern midlatitudes, named the OMI-TOC NN. The algorithm yields daily estimates of the tropospheric ozone column from the surface to 200 hPa at the northern midlatitudes by using OMI reflectance spectra at 19 wavelengths, the solar zenith angle (SZA), and the total ozone column from the OMI-TOMS total ozone (OMTO3) Level 2 product (Bhartia and Wellemeyer, 2002). The performances of the OMI-TOC NN algorithm were shown to be comparable with those of the physics-based algorithms of Schoeberl et al. (2007) and Liu et al. (2010) by means of a validation exercise with ozonesonde measurements at northern midlatitudes, with root mean square (RMS) errors around 8 DU and correlation coefficients around 0.60 between the actual and the retrieved tropospheric ozone columns (Sellitto et al., 2011). Di Noia et al. (2013) further validated the OMI-TOC NN over a number of European ozonesonde stations, finding similar results and pointing out the possible presence of a negative bias in the OMI-TOC NN in cases of low tropopauses (tropopause pressures larger than approximately 250 hPa).
The main limitations of the OMI-TOC NN algorithm are its coverage, which is limited to the northern midlatitudes, and the choice of the 200 hPa level as the upper integration limit for the retrieved ozone columns, regardless of the actual tropopause conditions. In particular, this latter feature raises the question of whether it is legitimate to call the retrieved ozone columns "tropospheric", since even at midlatitudes the actual tropopause pressure can be very different from 200 hPa (Hoinka, 1998). In order to overcome this problem, a preprocessed tropopause height can be used as the upper integration limit for the retrieved ozone columns. By doing so, it is possible to produce estimates that represent the actual tropospheric ozone column more realistically. For this study the thermal tropopause given by the National Center for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) Reanalysis (Kalnay et al., 1996) has been used.
In this paper the results of an improved NN algorithm for tropospheric ozone retrieval are presented. The improvements can be summarized as follows: (i) the geographical coverage of the algorithm is extended to the entire globe, whereas the OMI-TOC NN was limited to the northern midlatitudes; (ii) an estimate of the ozone column from the surface to the NCEP/NCAR tropopause is produced; (iii) a number of ancillary data are used as additional inputs for the algorithm in order to better constrain the retrieval problem; (iv) the observation geometry is better parameterized in the input vector by including the viewing zenith angle (VZA) and the terrain height; (v) the TOMS total ozone column is no longer used in the input vector, so as to make the new algorithm independent of other ozone products. The main differences between the two algorithms are summarized in Table 1. Besides these key points, a number of additional technical issues are addressed in the preprocessing of OMI radiance and irradiance spectra (namely, several refinements were introduced in the data quality control and filtering routines). Furthermore, a different input dimensionality reduction strategy is adopted, with a simple linear principal component analysis (PCA) used instead of the extended pruning (EP) technique. The new algorithm will henceforth be referred to as OMITROPO3-NN.
The paper is organized as follows. In Sect. 2 a general description of NNs is given, with a particular focus on their use in the context of inverse problems; in Sect. 3 the generation of the OMITROPO3-NN dataset is described and all the preprocessing steps are discussed; in Sect.

Basic concepts and terminology
NNs can be considered as algorithms for nonlinear regression and function approximation. Although several types of NNs can be devised, they share a number of common characteristics: (i) the computation is distributed among elementary units (called neurons), and (ii) the relationship to be approximated is learned by the NN from a training dataset.
Mathematically, it can be said that an NN can be used to approximate an unknown relationship between two quantities x ∈ R^n and y ∈ R^m through a nonlinear model

y = f(x; W),     (1)

where W is a set of free parameters to be adjusted from a training dataset. In the case of supervised training, which is the only relevant case for the purposes of this work, the training dataset is made of pairs (x_i, y_i) of instances of the relationship to be approximated. The adjustment of the free parameters is made according to a learning algorithm, which basically consists of an iterative minimization, with respect to W, of an error cost function of the kind

E(W) = Σ_i ||y_i − f(x_i; W)||^2.     (2)

Depending on the exact definition of the cost function and on the iterative method chosen for its minimization, several learning algorithms can be defined. The reader can refer to Bishop (1995) or Haykin (1999) for more detailed information.

Multilayer perceptrons
The multilayer perceptron (MLP) network (Werbos, 1974) is one of the most widespread NN architectures. Each neuron of an MLP realizes the input-output relationship

o = ϕ(w^T x + b),

where w and b are the weight vector and the bias of the neuron, respectively (its free parameters to be adjusted), and the function ϕ, chosen in advance, is the activation function of the neuron. The neurons of an MLP are organized in layers: (i) an input layer, which simply contains the input vector of the MLP; (ii) at least one hidden layer, containing neurons with nonlinear activation functions; and (iii) an output layer, whose neurons can have either linear or nonlinear activation functions and yield the output of the MLP. The output of each layer is the input for the next layer.
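As a concrete illustration, the layer-wise computation described above can be sketched in Python with NumPy; all names, sizes and the choice of tanh as the hidden activation are illustrative, not taken from the paper:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: each hidden neuron computes phi(w . x + b)
    with a tanh activation; the output layer is linear."""
    h = np.tanh(W1 @ x + b1)  # hidden layer activations
    return W2 @ h + b2        # linear output layer

# Toy dimensions: 3 inputs, 4 hidden neurons, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
y = mlp_forward(np.ones(3), W1, b1, W2, b2)
```

Each weight matrix maps one layer's output to the next layer's input, mirroring the statement that "the output of each layer is the input for the next layer".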
One reason for the popularity of MLPs among supervised NN techniques is their universal approximation capability: several studies have independently shown that, under rather general conditions, every continuous function on a compact set can be approximated to arbitrary accuracy by an MLP having only one hidden layer (Cybenko, 1989; Funahashi, 1989; Hornik et al., 1989). However, it must be pointed out that the universal approximation theorems only prescribe the existence of an approximating NN; they do not indicate how such an NN can be found in practice.
Since the MLP is the only relevant architecture in the context of this work, the terms MLP and NN will be used without distinction from here onwards.

Neural networks in retrieval problems
In the context of atmospheric remote sensing, a retrieval problem essentially consists of recovering the value of an atmospheric quantity (state) x from a set of radiometric measurements y. Such problems are usually ill posed, i.e., they cannot be solved by simply inverting a physical model of the measurements, because the relationship between x and y is not bijective (Twomey, 1977; Tikhonov and Arsenin, 1977). In other words, simply solving with respect to x an equation of the kind

y = F(x, b),

where the function F represents the physics of the measurement process and b is a fixed vector of model parameters (i.e., quantities different from x which affect y), would not lead to a unique solution for x, even in the case of noise-free measurements. Instead, a space of possible solutions for x would be compatible with a single measurement vector y. This happens for two concurrent reasons: (i) the elements of the measurement vector y are not mutually independent; (ii) the existence of measurement errors usually leads to unstable solutions of the retrieval problem. Therefore, the aim of a retrieval algorithm is to select, among the set of possible solutions, an "optimal" solution that is used as an estimator for the true state x. Two widespread approaches to address this issue are regularization and optimal estimation (OE) methods.
Regularization consists of a least mean square estimate, where the difference between the actual measurements y and the predicted measurements F(x, b) is minimized with respect to x under an additional constraint q(x) measuring the degree of "smoothness" of the solution. Several choices can be made for q(x) (see, e.g., Doicu et al., 2010), and the cost function to be minimized has the form

J(x) = ||y − F(x, b)||^2 + γ q(x),

where γ is a multiplicative term that weights the importance of the constraint with respect to the difference between actual and predicted observations. Of course, setting γ = 0 would mean not using any constraint, whereas letting γ → ∞ would be equivalent to ignoring the measurements. One popular form of the regularization constraint is x^T H x, where H is a smoothing matrix.
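For the special case of a linear forward model F(x, b) = Kx and the quadratic constraint q(x) = x^T H x, the minimizer of this cost function has a closed form (the regularized normal equations). A minimal sketch, with illustrative matrices:

```python
import numpy as np

def tikhonov_solve(K, y, gamma, H):
    """Minimize ||y - K x||^2 + gamma * x^T H x for a linear forward
    model: x_hat = (K^T K + gamma H)^-1 K^T y."""
    return np.linalg.solve(K.T @ K + gamma * H, K.T @ y)

rng = np.random.default_rng(1)
K = rng.normal(size=(10, 5))                  # linear forward model
x_true = np.ones(5)
y = K @ x_true + 0.01 * rng.normal(size=10)   # noisy measurements
x_hat = tikhonov_solve(K, y, gamma=0.1, H=np.eye(5))  # H = I: plain damping
```

A small γ keeps the solution close to the unconstrained least squares fit; increasing γ progressively shrinks the estimate towards the null vector, illustrating the two limits mentioned in the text.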
In the OE approach (Rodgers, 2000), assumptions are made about the statistical properties of the state x to be retrieved and the measurement error ε. It is often assumed that both quantities follow Gaussian statistics, with mean values x_a and 0, and covariance matrices S_a and S_ε, respectively. A model of the measurement process F is used to transform the probability density function (PDF) of x into the conditional PDF P_y|x(y|x). Then, an a posteriori PDF P_x|y(x|y) is obtained according to classical Bayesian theory, and it is maximized with respect to x to yield a parametric estimator for x, the term "parametric" being used to indicate that a specific form for the PDFs and their parameters is assumed for the optimality condition to hold. The subscripts y|x and x|y are used here to distinguish between the functional forms of the two PDFs. The general form of the OE cost function to be minimized, under the assumption of Gaussian statistics, is (Rodgers, 2000)

J(x) = [y − F(x, b)]^T S_ε^−1 [y − F(x, b)] + (x − x_a)^T S_a^−1 (x − x_a).

Given that F is a nonlinear function in most practical cases, the minimization is usually performed through iterative methods, such as Gauss-Newton or Levenberg-Marquardt.

NN retrievals can be regarded as a non-parametric alternative to OE. The training set for an NN to be used in a retrieval algorithm consists of pairs (y'_i, x_i), where the vector y'_i includes the measurements y_i and any other parameter that is used as an input for the algorithm (e.g., geometrical parameters, ancillary data), and x_i includes the quantities to be retrieved. The training set can be seen as a set of samples drawn from the PDF P(x|y'). These samples are used to adjust the parameters of a model of the same kind as Eq. (1), minimizing a cost function similar to Eq. (2). Once the training is complete, a global retrieval model

x̂ = f(y'; W*)

is constructed, where W* is the value of W determined at the end of the training process. This retrieval model yields a nonparametric estimator for x, here denoted by x̂, meaning that no assumptions about the statistical distribution of x are made to specify the model. The "global" adjective refers to the fact that, once the training phase is complete, the resulting function f(·; W*) can be applied to every observation in order to obtain the retrieval. This is a difference between NNs and the aforementioned retrieval techniques, where a cost function has to be minimized for each observation.
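For a linear forward model the minimizer of the OE cost function can be written in closed form (the standard Gaussian posterior mean and covariance; Rodgers, 2000). A minimal sketch with illustrative matrices, showing how the noise and prior covariances weight the solution:

```python
import numpy as np

def oe_linear(y, K, x_a, S_a, S_e):
    """Optimal-estimation solution for y = K x + eps with a Gaussian
    prior N(x_a, S_a) and Gaussian noise N(0, S_e)."""
    S_e_inv = np.linalg.inv(S_e)
    S_hat = np.linalg.inv(K.T @ S_e_inv @ K + np.linalg.inv(S_a))  # posterior covariance
    x_hat = x_a + S_hat @ K.T @ S_e_inv @ (y - K @ x_a)            # posterior mean
    return x_hat, S_hat

# With very small measurement noise the retrieval follows the data;
# with very large noise it falls back to the a priori x_a.
K = np.eye(2)
x_a = np.zeros(2)
y = np.array([1.0, 1.0])
x_data, _ = oe_linear(y, K, x_a, S_a=np.eye(2), S_e=1e-6 * np.eye(2))
x_prior, _ = oe_linear(y, K, x_a, S_a=np.eye(2), S_e=1e6 * np.eye(2))
```

In contrast to the NN case, this minimization (or its iterative nonlinear counterpart) must be repeated for every observation, which is the "local versus global" distinction drawn in the text.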
NN retrieval algorithms have a number of advantages over other methods: (i) when the training set consists of real data, the absence of explicit modeling makes the retrieval insensitive to an incomplete knowledge of the measurement physics; (ii) the absence of assumptions about the statistical distribution of the quantity to be retrieved makes NNs robust to non-Gaussianity of the modeled processes (Blackwell and Chen, 2009); and (iii) NN retrieval schemes are fast and relatively easy to implement. However, NNs also have some disadvantages: (i) when they are trained on real data, the quality of such data is critical for the learning process; (ii) they are good interpolators, but may yield unpredictable results when forced to extrapolate (Krasnopolsky and Schiller, 2003); and (iii) NNs are not optimal estimators: the training process depends on a random initialization of the NN parameters and may be trapped in local minima of the cost function to be minimized. Nevertheless, these shortcomings can often be handled with proper design and data quality control procedures. A more subtle shortcoming is that, since NN training requires the minimization of a global cost function, the cost function associated with a single observation could be better minimized with the conventional retrieval techniques.
A debated issue regarding the application of NNs to retrieval problems is error propagation. A general review of predictive uncertainty estimation methods for NNs is given by Dybowski and Roberts (2009). Aires et al. (2004a, b) suggest a method to define an error budget for NN retrievals that resembles the one developed by Rodgers (1990) for the classical retrieval techniques. Specifically, Aires et al. (2004a) express the error covariance matrix of the retrieval as the sum of a "neural inversion" term, accounting for the effect of suboptimalities in the NN architecture, the learning process and the training dataset, and an "intrinsic noise" term, accounting for all the other possible sources of error. However, this formulation makes it difficult to isolate sources like the null space error, which is important in order to assess the vertical resolution of a retrieval algorithm.

Neural network design principles
NN models have a relatively large number of free parameters. Some of these parameters (the weights and biases) are determined during the training process; others (the activation functions, the number of hidden layers and neurons, and the learning algorithm with its internal parameters) must be chosen by the designer. While it would be impossible to discuss every aspect of the design of an NN within this paper (the interested reader is again referred to Bishop, 1995, or Haykin, 1999, for a comprehensive discussion of the heuristics that can be followed), it is worthwhile to discuss some of the most important design aspects, as this should clarify the reasons for some of the choices that were made during the development of the tropospheric ozone retrieval algorithm that is the main subject of this work.
The most critical design issues to be addressed during the development of an NN are the choice of the number of hidden layers and neurons, and the decision of when to stop the training process.
As for the number of hidden layers and units, there are no universally valid rules, and heuristic methods must be used. Such methods basically consist of comparing different NN architectures on a common reference dataset, and selecting the architecture that achieves the best score in terms of some performance metric. The most elementary metric that may be used is simply the mean squared error (MSE) over the reference set. Other metrics, like the Akaike information criterion (AIC) (Akaike, 1973), combine the MSE with penalty terms for an excessive number of hidden neurons.
One or two hidden layers are often enough for a good NN model (Kecman, 2001). A rule of thumb that can be kept in mind in the selection of the number of hidden units is the "bias-variance dilemma" (Geman et al., 1992). According to this rule, NNs with too few hidden nodes tend to have poor approximation capabilities (large bias, or underfitting), whereas NNs with too many hidden nodes are prone to bad generalization, i.e., poor performances on data which were not seen during the training process (large variance, or overfitting). Therefore, the right choice for the number of hidden units must result from a trade-off between these two extremes.
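The architecture comparison described above can be sketched with an AIC-style score (one common form for Gaussian errors is n ln(MSE) + 2k). The validation MSE values below are made-up numbers for illustration, not results from the paper:

```python
import numpy as np

def aic_score(mse, n_samples, n_params):
    """AIC-style score: trades fit quality (MSE) against model
    complexity (number of free parameters); lower is better."""
    return n_samples * np.log(mse) + 2 * n_params

def n_params(h, n_in=25, n_out=1):
    """Weights + biases of a one-hidden-layer MLP with h hidden units."""
    return h * (n_in + 1) + n_out * (h + 1)

# Hypothetical validation MSEs: more hidden units lower the MSE,
# but with diminishing returns, while the parameter count keeps growing.
candidates = {5: 0.020, 10: 0.012, 20: 0.011, 40: 0.0109}
n = 1000  # number of validation samples
best = min(candidates, key=lambda h: aic_score(candidates[h], n, n_params(h)))
```

With these numbers the score selects the 10-unit network: the small MSE gains of the larger architectures do not justify their extra parameters, which is exactly the bias-variance trade-off described above.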
Another crucial point is to decide when to stop the training of an NN. Common sense criteria can easily be formulated to decide whether a learning algorithm has converged on a given training set; a typical approach is to fix a threshold on the decrease in MSE between two successive iterations of the algorithm, and to declare convergence if the decrease remains below the threshold for a certain number of iterations. However, it is often not advisable to continue the training process until such a convergence criterion is met. As the training proceeds, there is the danger that the NN ends up memorizing the training data, reaching extremely low values of the MSE on the training data but producing very poor results on data that are not included in the training set. This condition is named "overtraining", or "overfitting". In order to prevent this, the performance of the NN over an independent set should always be monitored during the training process, and the training should be stopped when a significant degradation in the NN performance over this set is observed. This method is called "early stopping cross-validation" (Haykin, 1999).
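The early-stopping rule can be sketched as a simple monitor of the validation error; the MSE sequence and the patience parameter below are illustrative:

```python
import numpy as np

def early_stopping(val_errors, patience=3):
    """Return the epoch with the lowest validation error, stopping the
    scan once the error has failed to improve for `patience` epochs
    (a simple sketch of early-stopping cross-validation)."""
    best_epoch, best_err, wait = 0, np.inf, 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, wait = epoch, err, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch

# Validation MSE first decreases, then rises as the NN starts overfitting
val_mse = [1.0, 0.6, 0.4, 0.35, 0.37, 0.41, 0.48, 0.60]
stop_at = early_stopping(val_mse)
```

Here the minimum of the validation curve is at epoch 3; continuing to train past it would only improve the fit to the training data while degrading generalization.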

Definition of the input vector
The input vector of the OMITROPO3-NN combines OMI spectral measurements (Levelt et al., 2006) with a set of ancillary quantities. Wavelengths longer than about 335 nm are outside the ozone absorption bands, but have been included because they contain information about aerosols and surface albedo (Kleipool et al., 2008). Furthermore, the observation geometry was taken into account by including the SZA, the VZA and the terrain height in the input vector. The relative azimuth angle (RAA) was not used in the final specification of the algorithm because preliminary experimental work showed that its use does not seem to improve the retrieval performances.
Since the ozone absorption cross sections in the considered spectral range (which covers the ozone Huggins bands) are temperature dependent, the temperature profile from the NCEP/NCAR Reanalysis was used as an additional input. A further advantage associated with the use of temperature as an input is the possibility of exploiting the correlations between ozone and temperature (Müller et al., 2003).
The tropopause pressure from the NCEP/NCAR Reanalysis was also included in the input vector in order to indicate the upper integration limit for the ozone column to be retrieved. Furthermore, the significant positive correlation between the tropopause height and the tropospheric ozone column outside the tropics (de Laat et al., 2005) can be exploited in order to regularize the retrieval. The radiative cloud fraction from the OMI rotational Raman scattering (OMCLDRR) product (Joiner and Vasilkov, 2006) was used to account for the enhanced UV radiances that are measured at the longer wavelengths of the considered spectral interval because of the presence of clouds around the field of view (FOV) of the instrument. Using the cloud pressure in the input vector did not improve the retrieval performances; therefore, it was excluded from the input vector in the final version of the algorithm.
The choice of using a tropospheric ozone climatological value as an input for the algorithm is worth discussing. The retrieval of tropospheric ozone from UV satellite measurements is strongly ill posed because it is difficult to separate variations in the measured UV spectra caused by ozone variations in the troposphere from variations that are related to changes in stratospheric ozone. Therefore, the information content of the radiometric measurements and of the parameters of the forward problem (i.e., observation geometry, temperature profile, etc.) may not be enough to perform the retrieval. Ill-posed problems are usually addressed by complementing the satellite measurements with ancillary data, a priori information about the retrieved state and/or regularization constraints (Twomey, 1977; Rodgers, 2000; Doicu et al., 2010). These quantities are used in retrieval algorithms in order to discard solutions of the inverse problem that are extremely unlikely and/or unphysical. Like any other retrieval technique, a neural algorithm can benefit from this kind of information, when available. In the context of neural algorithms, this role is partly played by the target outputs given in the training set, as they allow an implicit regularization of the inverse problem by "teaching" the NN to map the radiometric observations into physically meaningful solutions.
However, this constraint alone may not be enough to account for the local and seasonal variability of the retrieved quantity. This issue can be addressed either by training different NNs, one for each season and/or wide geographical area (e.g., latitude band), or by introducing an input quantity that gives the NN relevant climatological information. The latter approach was preferred in this work because it leads to a global NN model flexible enough to perform reasonably well in a broad set of situations. The former approach would instead have led to specialized NNs, each trained with a reduced number of examples. This would have been especially true for the tropics and southern midlatitudes, where the spatial coverage provided by the ozonesonde networks is much sparser than for northern midlatitudes and poles.
In the literature on NN-based algorithms for satellite retrievals, several ways to include a priori or first-guess information in the input vector have been proposed. For instance, Aires et al. (2001) proposed the use of a first guess in NNs for atmospheric retrievals from microwave observations (see also Müller et al., 2003). When a priori information is used in a retrieval algorithm, the risk of biasing the retrievals towards the a priori should be monitored. This issue is discussed in Sect. 5.3.

Geographical coverage and colocation procedure
A comprehensive dataset of colocations between OMI data and ozone soundings was created in order to train the NN and to assess its performances.
The dataset covers the time period from 2004 to 2011 and consists of ozone soundings taken from several sources: the archives of the World Ozone and Ultraviolet Radiation Data Centre (WOUDC) and of the Southern Hemisphere Additional Ozonesondes (SHADOZ) network (Thompson et al., 2003), additional soundings described by Tarasick et al. (2010), and data from ozone soundings performed over Italy, provided by the Center for Integration of remote sensing techniques and numerical modeling for the prediction of severe weather (CETEMPS) of L'Aquila University, the Institute of Atmospheric Sciences and Climate (ISAC) of the Italian National Research Council (CNR), and the Italian Air Force Center of Aeronautical Meteorological Experimentation (ReSMA).
The geographical distribution of the ozonesonde stations whose data were used to create the dataset is shown in Fig. 1.
The ozone soundings were colocated with OML1BRUG data according to the overpass information provided by the Aura Validation Data Center (AVDC) for the OMTO3 Level 2 product. The following procedure was followed in performing the colocations:

1. For each ozone sounding, the OML1BRUG files corresponding to the overpass orbits indicated in the AVDC information were selected.
2. For each OML1BRUG file, the OMI pixel having its center closest to the ozonesonde station was selected as a candidate for the colocation.
3. The candidate pixel was discarded if its center and the station coordinates were more than 1° apart in latitude or longitude, or separated by more than 6 h in time.
Such colocation criteria were adopted in order to be reasonably sure that the tropospheric air volumes sampled by OMI were representative of the volumes actually covered by the corresponding ozone soundings.
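The criteria in steps 2-3 amount to a simple acceptance test per candidate pixel. A sketch (the function name and signature are hypothetical, not from the paper):

```python
def accept_colocation(pix_lat, pix_lon, sta_lat, sta_lon, dt_hours):
    """Colocation criteria from the text: pixel centre within 1 degree
    of the station in both latitude and longitude, and within 6 h in
    time. Longitude differences are wrapped across the date line."""
    dlon = abs(pix_lon - sta_lon)
    dlon = min(dlon, 360.0 - dlon)  # wrap-around at +/-180 degrees
    return (abs(pix_lat - sta_lat) <= 1.0
            and dlon <= 1.0
            and abs(dt_hours) <= 6.0)
```

For example, a pixel 0.2° away from the station and 2 h apart passes, whereas a pixel 2° away in latitude, or one observed 7 h from the launch, is rejected.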

Preprocessing of OMI spectral measurements
The OML1BRUG radiance spectra were converted into top-of-atmosphere (TOA) reflectance spectra by normalization to the OML1BIRR irradiance spectra and the cosine of the SZA. A natural logarithm was then applied to the computed reflectances. The following preprocessing steps were applied in order to compute the TOA reflectance spectra:

1. The quality of each radiance and irradiance spectral pixel was checked with respect to the OMI L1B quality flags, according to the guidelines given in van den Oord and Veefkind (2002). The spectral pixels that failed the quality test were discarded from the subsequent computations.
2. The spectra in which the number of discarded wavelengths exceeded 5 % of the total were discarded entirely, and were not used in the colocation procedure.
3. The radiance and irradiance spectra that survived this screening procedure were linearly interpolated onto a common spectral grid with a 0.1 nm spacing. Linear interpolation was chosen over more sophisticated techniques (e.g., cubic spline interpolation) because it is computationally less demanding.

4. The TOA reflectance spectra were computed using the interpolated radiance and irradiance spectra, and the natural logarithm of the resulting values was taken.
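Steps 3 and 4 can be sketched as follows. Following the text, the reflectance is the radiance normalized by the irradiance and cos(SZA); the π factor that appears in some reflectance definitions is omitted here because the text does not mention it:

```python
import numpy as np

def log_toa_reflectance(radiance, irradiance, sza_deg):
    """Step 4: TOA log-reflectance, i.e. radiance normalized by
    irradiance and cos(SZA), followed by the natural logarithm."""
    refl = radiance / (irradiance * np.cos(np.radians(sza_deg)))
    return np.log(refl)

def to_common_grid(wavelengths, spectrum, grid):
    """Step 3: linear interpolation onto a common 0.1 nm grid."""
    return np.interp(grid, wavelengths, spectrum)

# Sanity check: radiance equal to irradiance * cos(SZA) gives
# reflectance 1 and log-reflectance 0 (cos(60 deg) = 0.5)
logr = log_toa_reflectance(1.0, 2.0, 60.0)
```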
As for the quality-flag-based filtering, particular care was taken in order to exclude pixels affected by the row anomaly from the dataset. According to the information available from the Royal Netherlands Meteorological Institute (KNMI), the row anomaly started to appear on 25 June 2007, affecting rows 53-54 (0-based) in the OMI across-track direction. After about one year, it expanded to rows 37-44, and it began to assume an erratic behavior after 24 January 2009, randomly affecting subsets of rows 24-59. Additional information about the row anomaly effect in OMI can be found at the webpage www.knmi.nl/omi/research/product/rowanomaly-background.php. According to this information, the flagging of row anomaly events in the OMI Level 1B products was not complete until 1 February 2010. Therefore, it was decided to exclude from the dataset all the OMI measurements over rows 24-59 starting from 24 January 2009, in order to be reasonably sure that the dataset did not contain contaminated pixels.
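This conservative screening rule reduces to a single predicate per pixel; a sketch (the function name is hypothetical, the dates and row indices are those given in the text):

```python
from datetime import date

def keep_pixel(row, obs_date):
    """Return False for pixels in (0-based) across-track rows 24-59
    observed on or after 24 January 2009, which are discarded to
    avoid possibly unflagged row-anomaly contamination."""
    return not (obs_date >= date(2009, 1, 24) and 24 <= row <= 59)
```

For instance, row 30 is kept for a 2008 observation but rejected for a 2009 one, while rows outside 24-59 are always kept.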
Apart from the filtering based on the quality flags, other screening actions were performed in order to strengthen the quality of the dataset. Specifically, pixels having cloud fractions larger than 0.3 were discarded. The choice of 0.3 as a threshold for the cloud fraction was made to establish a trade-off between the need to exclude pixels that are excessively affected by clouds and the need to ensure an adequate number of samples to train the NN and assess its performance in a wide range of situations.
The spectral interpolation procedure yielded log-reflectances computed at 351 wavelengths. As pointed out by several studies, the spectral features of UV radiances and reflectances usually exhibit considerable correlation, and a spectral resolution of 0.1 nm is finer than necessary for ozone retrievals (Chance et al., 1997; Richter and Wagner, 2011). Therefore, the information content of the computed log-reflectance spectra can be considerably compressed through a data dimensionality reduction technique. In this work a simple linear PCA was used. To choose an appropriate number of principal components (PCs) to retain after the PCA procedure, the error in reconstructing the log-reflectance spectra from the compressed spectra was monitored as a function of the number of retained PCs. This procedure led to the retention of 20 PCs, since adding further PCs did not significantly improve the reconstruction (reflectance reconstruction RMS error of about 0.01 %).
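The retention criterion can be illustrated by monitoring the reconstruction RMSE as a function of the number of retained PCs (a sketch using a plain SVD-based PCA; `reconstruction_rmse` is a hypothetical helper, not the operational implementation):

```python
import numpy as np

def reconstruction_rmse(spectra, n_components):
    """RMSE of reconstructing the log-reflectance spectra from their first
    n_components principal components (rows are spectra, columns wavelengths)."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # principal axes from the SVD of the centered data matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = vt[:n_components]
    reconstructed = centered @ pcs.T @ pcs + mean
    return float(np.sqrt(np.mean((spectra - reconstructed) ** 2)))
```

The number of PCs would then be chosen as the smallest value beyond which this error stops decreasing appreciably (20 PCs in the present case).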

Training, validation, and test subsets
The colocation procedure described in the previous section led to the generation of 10 017 input-output pairs. These pairs were used to train the NN algorithm and to assess its performance on data not seen during the training phase. The network was trained using only colocations from the period 2004-2008, so that enough data remained to test the NN behavior outside the training period. The dataset was split into four subsets: (i) 5489 pairs were used to train the NN; (ii) 1737 pairs were used to determine when to stop the training process via early stopping cross-validation (see Sect. 2.4); (iii) 2071 pairs were used to evaluate the generalization of the trained NN during the training period; (iv) 720 pairs were used to evaluate its generalization outside the training period.
From now on, these four datasets will be referred to as D train, D valid, D test1 and D test2, respectively. The union of D test1 and D test2 will be indicated as D test.
In order to ensure the independence between the datasets, without affecting their comprehensiveness, the data were assigned to each set based on the ozonesonde station they referred to.Stations used in the training dataset were not used for the test and validation datasets.A significant number of colocations pertaining to the different latitudinal bands were present in each subset.
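A station-based split of this kind can be sketched as follows (the helper and the station lists are illustrative assumptions; the actual assignment of stations is the one described above):

```python
def split_by_station(pairs, train_stations, valid_stations, test_stations):
    """Partition colocation pairs by ozonesonde station, so that no station
    contributes data to more than one subset."""
    subsets = {"train": [], "valid": [], "test": []}
    for pair in pairs:
        if pair["station"] in train_stations:
            subsets["train"].append(pair)
        elif pair["station"] in valid_stations:
            subsets["valid"].append(pair)
        elif pair["station"] in test_stations:
            subsets["test"].append(pair)
    return subsets
```

Because the membership tests are mutually exclusive, the resulting subsets are disjoint by construction, which is what guarantees the independence between training and test data.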

Input preprocessing
The input vector of the OMITROPO3-NN consists of 43 inputs: 20 PCs of the reflectance spectra, SZA, VZA, terrain height, NCEP/NCAR temperature profiles at 17 pressure levels, NCEP/NCAR tropopause pressure, radiative cloud fraction, and monthly mean TCO. A logistic activation function was chosen for the hidden and output layers of the NN. Before the NN training, a further preprocessing step was applied to the input and target data in order to make them compatible with the mathematical properties of the logistic function. Specifically, since the output of the logistic function lies between 0 and 1, the TCO data were linearly scaled to this range. Similarly, all the input data were linearly scaled between −1 and 1 in order to avoid saturating the hidden neurons after the initialization of the NN weights.
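The two scalings can be written compactly (a minimal sketch; the min/max scaling bounds are assumed to be taken from the training data):

```python
import numpy as np

def scale_inputs(x, x_min, x_max):
    """Linearly map each input feature to [-1, 1], to avoid saturating the
    logistic hidden units at initialization."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def scale_target(tco, tco_min, tco_max):
    """Linearly map the TCO target to [0, 1], the range of the logistic
    output unit."""
    return (tco - tco_min) / (tco_max - tco_min)
```

At retrieval time, the inverse of the target scaling would be applied to the NN output to recover the TCO in Dobson units.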

Training and architecture selection
The NN was trained using the scaled conjugate gradient (SCG) learning algorithm (Møller, 1993). A heuristic procedure, described in Sect. 2.4, was adopted to select the number of hidden layers and neurons. The selected NN architecture has one hidden layer with 5 neurons. For this architecture, training was stopped after about 1000 cycles using early stopping cross-validation.
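The early stopping logic can be sketched generically (the SCG weight update itself is abstracted into `train_step`; the function names and the `patience` criterion are illustrative assumptions, since the exact stopping rule is not specified here):

```python
def train_with_early_stopping(train_step, valid_error, max_cycles=5000, patience=100):
    """Generic early-stopping loop: run training cycles and stop when the
    validation error has not improved for `patience` consecutive cycles.
    Returns the cycle with the best validation error and that error."""
    best_err, best_cycle = float("inf"), 0
    for cycle in range(1, max_cycles + 1):
        train_step()            # one SCG update over the training set
        err = valid_error()     # error on the validation subset (D valid)
        if err < best_err:
            best_err, best_cycle = err, cycle
        elif cycle - best_cycle >= patience:
            break               # no improvement for `patience` cycles
    return best_cycle, best_err
```

In this scheme the validation subset (D valid above) is used only to decide when to stop, never to update the weights.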

Results
The results obtained over the whole D test set are shown in Fig. 2, where the performance of the algorithm is summarized through the mean bias, the root mean square error (RMSE) and the Pearson correlation coefficient between the reference TCO values and those retrieved by the NN. More detailed insight into the error distribution is given in Fig. 3, which shows the histograms of the absolute and relative differences between the retrieved and the "true" TCOs, together with the relevant statistical parameters. The retrievals have a small bias (0.31 DU); the histogram of the absolute differences is fairly symmetrical (skewness of −0.41), while that of the relative differences shows a moderate positive skew (1.38).
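The three summary statistics can be computed as follows (a straightforward sketch with NumPy; the function name is illustrative):

```python
import numpy as np

def validation_stats(retrieved, reference):
    """Mean bias, RMSE, and Pearson correlation coefficient between
    retrieved and reference TCO values."""
    retrieved = np.asarray(retrieved, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = retrieved - reference
    bias = float(np.mean(diff))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    r = float(np.corrcoef(retrieved, reference)[0, 1])
    return bias, rmse, r
```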

Generalization during and after the training period
It is important to understand whether the performance of the algorithm differs between the years covered by the training set and those outside it, as this may indicate the degree of robustness of the NN with respect to changes in the instrumental response. Separate error statistics were computed for D test1, containing examples from the period 2004-2008, and D test2, consisting of examples acquired after 2008. The results are summarized in Table 2.
The statistics of the comparison between the NN results and the sonde observations are similar to those for the training period (bias smaller than 1 DU, RMSE smaller than 6 DU, correlation coefficient larger than 0.8). These results indicate that applying the NN to OMI data acquired after the period covered by the training set should not result in a significant performance degradation of the algorithm. This is consistent with the very good radiometric stability displayed by OMI throughout its operational lifetime.

Geographical features in the retrieval algorithm
The performance of the algorithm was evaluated after stratifying the D test set by latitude zone. Five zones were defined: Antarctica (latitude between 90° S and 60° S), southern midlatitudes (60° S to 30° S), tropics (30° S to 30° N), northern midlatitudes (30° N to 60° N) and the Arctic (60° N to 90° N). Maps of the mean biases, Pearson correlation coefficients, and RMSEs found over the ozonesonde stations having at least 35 measurements included in the test dataset are shown in Fig. 4.
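The zone boundaries above translate into a simple binning function (a sketch; the treatment of points falling exactly on a boundary is an assumption):

```python
def latitude_zone(lat):
    """Assign a latitude (degrees, negative south) to one of the five
    zones used in the stratified validation."""
    if lat < -60.0:
        return "Antarctica"
    if lat < -30.0:
        return "southern midlatitudes"
    if lat <= 30.0:
        return "tropics"
    if lat <= 60.0:
        return "northern midlatitudes"
    return "Arctic"
```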
The performance of the algorithm, in terms of mean bias, RMSE and Pearson coefficient, is comparable for four of the five zones. Only for the Arctic region was the bias larger. The causes of this bias are currently under study; a possible reason might lie in artifacts related to the occasionally difficult definition of the tropopause in this region. The results are summarized in Table 3.
Table 4 presents a summary of the comparison statistics for each station with at least 20 measurements included in the D test set. The stations are sorted in order of increasing latitude. For most stations the NN results agree quite well with the sonde observations (correlations between 0.72 and 0.88, biases between −3 and 2 DU). The last 5 entries in Table 4 are the Arctic stations; the OMITROPO3-NN has a negative bias over all of them, particularly significant at Sodankyla (−3.68 DU). Scatter plots and time series of the "true" and retrieved TCO as a function of the day of year (DOY) for the stations Broadmeadows (Australia) and Goose Bay (Canada) are shown in Figs. 5 and 6 as examples. Similar plots for other stations can be found in the supplementary information.

Non-climatological features
An important question is to what extent the algorithm is capable of recognizing anomalous events, i.e., cases of large departures of the actual TCO from the climatological value used as an a priori for the retrieval. To investigate this aspect, a TCO relative anomaly was defined as the percent difference between the actual TCO and its climatological value taken from the Ziemke climatology, and the differences between the retrieved and the actual TCO anomalies were analyzed. The results on the D test set are plotted in Fig. 7. The correlation coefficient between the actual and the retrieved TCO anomalies is smaller than that found between the TCO absolute values. Nevertheless, reasonable agreement between the actual and the retrieved anomalies remains, as the correlation decreased only from 0.83 to 0.72, indicating that the algorithm uses information other than the a priori to perform its retrievals. Such information comes from the satellite measurements as well as from the reanalysis data provided as inputs to the NN. An attempt to investigate the relative contributions of the satellite measurements and the ancillary data to the retrieved TCOs is made in the next subsection.
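The relative anomaly defined above is simply (a one-line sketch; `tco_climatology` stands for the monthly mean value from the Ziemke climatology):

```python
def tco_relative_anomaly(tco, tco_climatology):
    """Percent departure of a TCO value from its monthly climatological
    value."""
    return 100.0 * (tco - tco_climatology) / tco_climatology
```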
The geographical dependence of the algorithm performance on TCO anomalies is shown in Fig. 8, which maps the Pearson correlation coefficient between actual and retrieved TCO anomalies over the ozonesonde stations having at least 35 measurements included in the test set. The map indicates that the anomaly detection capability of the NN in the tropics is worse than at mid- and polar latitudes. This could be related to the limited availability of training data in the tropics. However, it must be kept in mind that precise TCO anomaly estimation in the tropics is a challenging task, because the range of the anomalies over this area is usually small.

Contribution of OMI reflectances to the retrieved TCO
Given the large amount of ancillary data used by the OMITROPO3-NN, many of which are correlated with the TCO, it is important to evaluate to what extent using OMI reflectances improves the results with respect to using only the ancillary data in a regression. Some insight on this point can be obtained by training a second NN using only the ancillary data as inputs. This second network achieved an RMSE of 6.11 DU on the test set, with a correlation coefficient of 0.78. The results divided by station are shown in Table 5. The NN trained using only the ancillary data performs worse than the OMITROPO3-NN over most of the ozonesonde stations. The use of OMI reflectances seems to produce the most significant improvements at high latitudes, whereas the differences between the two NNs are less significant over the tropics. The difference between the OMITROPO3-NN and the NN trained using only ancillary data becomes more evident if the performances of the two NNs are evaluated with respect to the TCO anomaly. Figure 9 shows the TCO anomalies estimated by this second NN, compared with those measured by the ozonesondes. The correlation coefficient between the actual and the estimated TCO anomalies decreases from 0.72 to 0.58. Comparing Fig. 9 with Fig. 7, it can be seen that the NN trained without OMI data tends to drastically underestimate TCO anomalies larger than about 30 %, whereas this tendency does not seem to be present in the OMITROPO3-NN. Table 6 summarizes the differences in the anomaly correlation for each station.

In Fig. 10, global TCO fields retrieved by the OMITROPO3-NN algorithm on 17 (top) and 26 (bottom) August 2006, expressed in Dobson units, are shown. The grey areas, where no retrieval is provided, are either non-sunlit areas, areas where the cloud fraction exceeded the 30 % threshold, or areas over which the quality criteria imposed on the OMI spectra (Sect. 3.3) were not satisfied. Apart from a striping effect that can be noticed in the along-track direction, a visual inspection of the results indicates that reasonable synoptic patterns can be identified. The stripes are likely caused by several types of noise in the irradiance data, and the effect can be partly mitigated by replacing standard irradiance products with composite products such as multiyear means, as explained by Veihelmann and Kleipool (2006). It must be noted, however, that in the ozone profile retrieval algorithm by Liu et al. (2010) the use of multiyear mean irradiances did not solve the problem completely, because the radiometric calibration of the OMI radiances also contributes to the striping effect. Another feature that sometimes appears is abrupt meridional gradients in the retrieved TCOs (see, e.g., the northern edge of the "red" region in Central Asia on 17 August 2006, top panel of Fig. 10). This might be due to the coarse resolution of either the tropopause or the a priori fields used as inputs in the OMITROPO3-NN.
The day of 26 August has been chosen as a sample date also because it allows a visual comparison with a TCO map shown in the paper by Liu et al. (2010).For the reader's convenience, such a map is reported in Fig. 11.A similar color scale was used in Figs. 10 and 11 in order to facilitate visual comparisons.For instance, it can be noticed that the ozone peak between southern Brazil, northern Argentina, and Paraguay is reproduced quite well by the OMITROPO3-NN algorithm.The same holds for the high-ozone areas around the Azores, the Eastern coast of the United States, the Black Sea, off the coast of southwestern Africa and south of Madagascar.Also, the ozone patterns over Australia look similar.The main differences exist over North Africa, where the OMITROPO3-NN seems to yield larger ozone values, and over Central Asia, where the OMITROPO3-NN seems to yield a more extended area of large TCO than Liu et al. (2010).Unfortunately, no correlative measurements over these areas were found to assess which of the two algorithms performed better.

Comparisons with the TM5 chemistry/transport model
For a more quantitative assessment, the TCO fields retrieved on 17 and 26 August 2006 were compared to TCO fields simulated using the chemistry/transport model (CTM) TM5 (Krol et al., 2005; Williams et al., 2012). The model provided simulated ozone fields at 34 pressure levels, on a grid of 3° in longitude by 2° in latitude. To perform the comparison, both the NCEP tropopause pressure and the TCO fields retrieved by the OMITROPO3-NN were mapped onto the same grid. The remapping was done by selecting all the OMI pixels whose centers lie within each TM5 grid cell and associating the median non-missing TCO with the cell. The NCEP tropopause pressure was used as the upper integration limit for the TM5 simulated ozone profiles.
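The median-based remapping can be sketched as follows (an illustrative implementation; the function name and the grid-edge convention are assumptions):

```python
import numpy as np

def remap_to_grid(lons, lats, tco, lon_edges, lat_edges):
    """Remap pixel TCOs onto a regular grid: each cell receives the median
    of the pixels whose centers fall inside it; empty cells are NaN."""
    grid = np.full((len(lat_edges) - 1, len(lon_edges) - 1), np.nan)
    i = np.digitize(lats, lat_edges) - 1   # row index of each pixel
    j = np.digitize(lons, lon_edges) - 1   # column index of each pixel
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            in_cell = (i == r) & (j == c) & ~np.isnan(tco)
            if in_cell.any():
                grid[r, c] = np.median(tco[in_cell])
    return grid
```

Taking the median rather than the mean makes the remapped field robust against isolated outlier pixels within a cell.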
The TCO fields simulated using TM5 on the two dates are shown in Fig. 12, and scatter plots of modeled versus retrieved TCOs are shown in Fig. 13. These statistics show that the OMITROPO3-NN has a positive bias of about 4 DU with respect to TM5. The Pearson correlation coefficient between the TCO fields is slightly larger than 0.80 for both dates.
The structure of the differences between the OMITROPO3-NN and the TM5 estimates is shown with more detail in Fig. 14, where the histograms of the absolute differences for the two dates are depicted.
Figure 15 shows a map of the NN-TM5 absolute differences for the two dates under study.
It can be noticed that spatial patterns exist in the differences between the OMITROPO3-NN and TM5 estimates.

Retrieval sensitivity to tropopause pressure
Whenever a retrieval algorithm is developed, it is important to assess its sensitivity to its input quantities. In the case of NNs, a powerful way to do this is the analysis of the NN input Jacobians, i.e., the derivatives of the NN model W* with respect to its inputs x. An important property of single-hidden-layer NNs is that their input Jacobians can be written analytically (Blackwell and Chen, 2009).
Since NN mappings are nonlinear, a difficulty in using their Jacobians for sensitivity analyses lies in the fact that they are input dependent. One method to overcome this difficulty is to use the Jacobian to define a NN sensitivity factor (SF) of an output y_j with respect to an input x_i as the ratio between the fractional change of y_j with respect to its actual value and the corresponding fractional change of x_i:

SF_ij = (∂y_j/∂x_i) (x_i / y_j).

As an example of the application of the NN Jacobians to the OMITROPO3-NN, its derivative with respect to the tropopause pressure was derived. The tropopause information can be expected to play an important role in the tropospheric ozone retrieval, especially outside the tropics, given the appreciable degree of correlation between the tropopause height and the TCO (de Laat et al., 2005). Thus, it is interesting to assess whether this kind of knowledge is well incorporated in the OMITROPO3-NN. Two maps of the algorithm SF with respect to the tropopause pressure, for 17 and 26 August 2006, are shown in Fig. 16. It can be seen that the SF always assumes negative values. This result is reasonable, because it indicates that the negative correlation between tropopause pressure and TCO is captured by the NN model. Furthermore, the SF absolute values tend to increase going from the tropics toward the poles. An increase of |SF| indicates a larger sensitivity of the retrieved TCO to the tropopause pressure. The increase in |SF| is not symmetric with respect to the Equator because of the motion of the Intertropical Convergence Zone (ITCZ) with the season. This could be an indication that the retrievals at midlatitudes are more sensitive to the tropopause pressure during winter.
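For a single-hidden-layer network with logistic units, the analytic input Jacobian and the SF can be computed as follows (a sketch with illustrative weight matrices; the chain-rule expression uses the logistic derivative σ' = σ(1 − σ)):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_jacobian(x, W1, b1, W2, b2):
    """Output y and analytic input Jacobian dy/dx of a single-hidden-layer
    network with logistic hidden and output units."""
    h = logistic(W1 @ x + b1)   # hidden activations
    y = logistic(W2 @ h + b2)   # network outputs
    # chain rule: dy/dx = diag(y(1-y)) W2 diag(h(1-h)) W1
    J = (y * (1 - y))[:, None] * (W2 * (h * (1 - h))) @ W1
    return y, J

def sensitivity_factor(x, y, J):
    """SF_ij = (dy_j/dx_i) * (x_i / y_j): ratio of the fractional change of
    the output to the fractional change of the input."""
    return J * (x[None, :] / y[:, None])
```

The Jacobian can be checked against finite differences, which is a useful sanity test for any analytic-derivative implementation.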

Conclusions
A new neural network algorithm to retrieve tropospheric ozone from OMI data at the global scale, named OMITROPO3-NN, has been presented. The OMITROPO3-NN builds on previous work and adds novel characteristics such as global coverage, the use of tropopause information to better demarcate the actual troposphere, and the incorporation of ancillary data and a priori information into the NN input vector in order to improve the retrieval accuracy. As a result, the OMITROPO3-NN provides daily global estimates of the tropospheric ozone column.
The algorithm has been validated against ozonesondes and CTM simulations, and encouraging results have been obtained. Overall, the NN appears to be capable of determining the spatial and temporal TCO variability.
The OMITROPO3-NN retrievals were first compared to ozonesonde measurements collected at several geographical locations around the globe, both during and after the time period covered by the training set. As for the latter point, it was found that the OMITROPO3-NN performs reasonably well also after the training period, even though a slight increase in the global retrieval bias seems to be present.
Over all the latitude bands except the Arctic, a relatively low bias against the ozonesonde measurements was noticed. The correlation coefficients between retrieved and measured tropospheric ozone columns range approximately between 0.75 and 0.85, and the RMS errors are between 5 and 6 DU. Over the Arctic, on the other hand, a larger negative bias was detected, whose cause is a topic of ongoing research.
The ozonesonde data were also used to assess the capability of the OMITROPO3-NN to detect and estimate departures of the tropospheric ozone columns from their climatological values. A global correlation coefficient of about 0.70 was found between the actual and the retrieved relative anomalies. A geographical analysis of this correlation coefficient suggests that the anomaly estimation capability of the OMITROPO3-NN over the tropics is worse than at other latitudes. This may indicate insufficient training in this latitude band, due to the relatively low number of available ozonesonde data. Future versions of the algorithm will have to address this problem properly. A possible approach may consist of complementing ozonesonde data with radiative transfer simulations of tropical scenarios; another is the relaxation of the colocation criteria over the tropics.
In order to assess the contribution of OMI reflectances to the retrievals, a second NN was trained using only the ancillary data. The estimation capabilities of this second NN were shown to be worse than those of the OMITROPO3-NN, especially in the estimation of TCO anomalies.
After the comparison with ozonesonde data, examples of operational use of the OMITROPO3-NN were provided. The tropospheric ozone fields retrieved by the OMITROPO3-NN on two dates during August 2006 were compared with simulations made with the TM5 CTM. These comparisons suggest that the OMITROPO3-NN has a bias of about 4 DU with respect to TM5. However, the differences between retrieved and simulated tropospheric ozone fields exhibit a peculiar geographic pattern, with the OMITROPO3-NN overestimating the TM5 simulations over the southern midlatitudes and underestimating them between the tropics. Despite this, the simulated global spatial patterns are fairly well reproduced by the OMITROPO3-NN, as shown by the correlation coefficients, which are higher than 0.80.
In addition to providing daily fields of the tropospheric ozone column, the OMITROPO3-NN product also stores the input Jacobians of the neural model, which can be useful to evaluate its sensitivity to the input variables, as well as to assess how well the NN incorporates the knowledge of the relationships between the input and output variables. Examples of the retrieval derivative with respect to the tropopause pressure show that the OMITROPO3-NN seems to capture the tropospheric ozone sensitivity to the tropopause pressure in a physically meaningful way. A similar procedure can be applied to the other input quantities.
simply used the latitude as a climatological indicator in their Neural Network Ozone Profile Retrieval System (NNORSY) applied to GOME data. In the present work, the monthly mean tropospheric ozone column, taken from the Ziemke et al. (2011) OMI-MLS tropospheric ozone climatology, was used as an additional input for the retrieval algorithm. This climatology was preferred to others, such as Fortuin and Kelder (1998) or Logan (1998), because it represents the tropospheric ozone variations with longitude in finer detail. The horizontal resolution of the Ziemke et al. (2011) climatology is 5° in latitude and longitude.
the Network for the Detection of Atmospheric Composition Change (NDACC), data from the Intercontinental Chemical Transport Experiment-B (INTEX-B) Ozonesonde Network Study 2006 (IONS06) and the Arctic Intensive Ozonesonde Network Study (ARCIONS) campaigns, performed during 2006 and 2008, respectively

Fig. 1. Spatial distribution of the ozonesonde stations used to construct the dataset to train and test the NN.

Fig. 2. Overall validation results, obtained both during and after the time period covered by the training set.

Fig. 3. Histograms of the absolute (top) and relative (bottom) differences between the retrieved and the target tropospheric ozone columns.

Fig. 4. Mean bias (top), Pearson correlation coefficient (middle), and RMS difference (bottom) between ozonesonde measurements and retrievals for all the measurement stations having at least 35 measurements in the test dataset.

Fig. 8. Pearson correlation coefficient between actual and estimated TCO anomalies for all the measurement stations having at least 35 measurements included in the test set.

Fig. 9. Comparison between actual and estimated TCO anomaly for the NN trained only with ancillary data.

Fig. 10. Global tropospheric ozone fields retrieved by the OMITROPO3-NN algorithm on 17 (top) and 26 August (bottom) 2006. No retrieval is performed on pixels with cloud fraction larger than 30 %.

Fig. 14. Histograms of the absolute differences between the OMITROPO3-NN retrievals and the TM5 TCO simulations on 17 (top) and 26 August (bottom) 2006. The OMITROPO3-NN TCO fields are remapped on the TM5 grid.

Fig. 15. Maps of the absolute differences between OMITROPO3-NN and TM5 TCO fields on 17 (top) and 26 August (bottom) 2006. The OMITROPO3-NN TCO fields are remapped on the TM5 grid.

Table 2. Retrieval results during and after the period covered by the training set. The training set covers the period from 2004 to 2008.

Table 3. Retrieval results on the test set, stratified by latitude band.
Details about the OMI calibration status can be found at the webpage www.knmi.nl/omi/research/calibration/instrument_status_v3/perf_plots/index.html.

Table 4. Retrieval validation results divided by station, sorted by increasing latitude. Only stations with at least 20 measurements included in the D test set were considered.

Table 5. Results for the NN trained without OMI reflectances, divided by station, sorted by increasing latitude. Only stations with at least 20 measurements included in the D test set were considered.

Table 6. Pearson correlation coefficients between observed and estimated TCO anomalies for the OMITROPO3-NN and for the NN trained without OMI reflectances, divided by station, sorted by increasing latitude. Only stations with at least 20 measurements included in the D test set were considered.