Global retrievals of near-infrared sun-induced chlorophyll fluorescence (SIF)
have been achieved in the last few years by means of a number of space-borne
atmospheric spectrometers. Here, we present a new retrieval method for medium
spectral resolution instruments such as the Global Ozone Monitoring
Experiment-2 (GOME-2) and the SCanning Imaging Absorption SpectroMeter for
Atmospheric CHartographY (SCIAMACHY). Building upon the previous work by

During the process of photosynthesis, the chlorophyll-

The first global SIF observations have been achieved in the last 4 years
by studies from

One intrinsic limitation of the GOSAT-FTS data arises from the coarse
resolution of global maps (2

A further achievement was the data-driven study of

Here, we present a new SIF retrieval method using a methodology comparable to
that developed for ground-based instruments by

The Global Ozone Monitoring Experiment-2

Sample GOME-2 spectrum in band 4. The spectral window that we use for SIF
retrievals (720–758

SCIAMACHY

SCIAMACHY measured alternately in nadir and limb mode, which leads to
blockwise rather than continuous nadir measurements. With this default scan
option, global coverage is achieved within 6 days. The swath width of
960

As stated in

The main challenge when retrieving SIF from space-borne instruments is to
isolate the SIF signal from the reflected solar radiation, which is about
100 times more intense, in the measured top-of-atmosphere (TOA) radiance spectrum. This
section describes a strategy which is similar to the data-driven method
proposed by

Assuming a Lambertian reflecting surface in a plane-parallel atmosphere, the
TOA radiance measured by a satellite sensor (

As stated above, we use a statistical modeling approach similar to

Here,

The statistically based approach assumes that spectra without SIF emission
can reproduce the variance of the planetary reflectance also for spectra
which contain SIF. For this purpose, the data are divided into a training set
(measurements over non-fluorescent targets) and a test set (basically all
measurements over land). According to Eq. (

The training set is exclusively composed of measurements over non-fluorescent
targets and we assume that Eq. (

First of all,

Estimation of apparent reflectance (green) from a sample GOME-2 measurement
in the 720–758

In the next step, the apparent reflectance estimate is used to normalize

An important advantage of this method is that there is no need to perform explicit radiative transfer calculations to characterize the atmospheric parameters that affect the measurement (e.g., temperature and water vapor profiles).

According to the preliminary forward model in Eq. (

As shown by

Estimation of the effective atmospheric transmittance in down- and upward
direction (black) and in upward direction (red) using Eq. (

However, this approach has a few implications, which are evaluated in the following.

Firstly, it is necessary to assume zero SIF, although these measurements
potentially contain a SIF emission. Hence, an in-filling of atmospheric
absorption lines might occur, which potentially affects the estimation of

Another consequence of the

Our forward model is linearized as a result of the implementation from

Changes in the estimated effective ground to sensor transmittance
(

By using

A consequence of our approach is an increased number of state vector elements
compared to

The forward model in Eq. (

Other methods exist to compare and select models in order to avoid overfitting the measurements. Here, we decided to use the BIC because, among these criteria, it penalizes the number of model parameters most strongly. Note also that finding the “best” model would require testing all possible combinations of model parameters, which is computationally too expensive. For this reason, a stepwise model selection (backward elimination) is performed.
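The stepwise selection can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit with Gaussian residuals, so that the BIC reduces to n ln(RSS/n) + k ln(n); the function names, the synthetic design matrix and the stopping rule (stop as soon as no single removal lowers the BIC) are illustrative assumptions, not the implementation used for the retrieval.

```python
import numpy as np

def bic(y, X):
    """BIC of a least-squares fit of y on the columns of X (Gaussian residuals assumed)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def backward_elimination(y, X, keep=()):
    """Remove one column at a time for as long as a removal lowers the BIC.

    `keep` lists column indices that are never removed (hypothetical option).
    Returns the indices of the retained columns and the final BIC.
    """
    active = list(range(X.shape[1]))
    best = bic(y, X[:, active])
    while True:
        trials = []
        for j in active:
            if j in keep:
                continue
            subset = [c for c in active if c != j]
            if subset:
                trials.append((bic(y, X[:, subset]), j))
        if not trials:
            break
        trial_bic, drop = min(trials)
        if trial_bic >= best:
            break  # no single removal improves the criterion
        best = trial_bic
        active.remove(drop)
    return active, best

# Synthetic demonstration: only columns 0, 1 and 4 carry signal.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 8))
true_coef = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y = X @ true_coef + 0.1 * rng.normal(size=n)

full_bic = bic(y, X)
selected, selected_bic = backward_elimination(y, X)
print("retained columns:", selected)
```

Each pass costs one fit per remaining candidate, so the number of fits grows quadratically with the number of candidates rather than exponentially, as it would for an exhaustive search over all subsets.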

Using the backward elimination algorithm has the consequence that the number
of provided PCs is unimportant, as long as more PCs are provided than are
actually necessary for an appropriate fit. Conversely, this means that the
optimum number of PCs is determined automatically. The detailed behavior of
this supplementary step in comparison to a simple linear regression using all
potential coefficients (all candidate variables) will be shown below in
Sect.

In order to assess the uncertainty of the SIF measurements, the 1

The measurement noise scales with the square root of the signal level, which
is an appropriate assumption for grating spectrometers such as GOME-2 and
SCIAMACHY. Here, we determine

In order to assess the restriction for the precision of spatiotemporal SIF
composites due to instrumental noise only, the standard error of the weighted
average can be calculated for each grid cell by

Furthermore, the standard error of the mean from monthly mapped SIF has been
computed for SCIAMACHY and GOME-2 as follows

The retrieval approach has been tested for a wide range of conditions using simulated radiances in order to assess retrieval precision and accuracy, the effect of the backward elimination algorithm as well as the optimal retrieval window. This section describes the underlying simulations briefly and examines retrieval properties and advantages with respect to a simple linear model without a backward elimination.

As in

Figure

Input vs. retrieved SIF at 740

The good correlation between simulated and retrieved SIF in
Fig.

Retrieved minus simulated SIF in dependence on the simulated solar zenith
angle (SZA), the water vapor column (TCWV) and aerosol optical thickness at
550

The only visible bias is with respect to the solar zenith angle. Low illumination angles cause a slightly higher variance, which can be expected since the noise level increases with a higher TOA radiance.

Figure

Estimated SIF error in dependence on the simulated solar zenith
angle (SZA) for the same retrieval properties as in Fig.

In view of the good correspondence between input and retrieved SIF in the
end-to-end simulation, it can be stated that the retrieval method is
appropriate. One limitation of this sensitivity study consists of the absence
of clouds, which inevitably impact the retrieval of SIF. The result of other
simulation studies by

We performed the retrieval several times using 5–25 PCs in order to assess
the sensitivity to the number of provided PCs. In addition, the backward
elimination algorithm was disabled (all

The plot on the left shows the average number of selected PCs (blue) and coefficients (green) as a function of the number of PCs provided to the backward elimination algorithm. In the remaining plots, the retrieval results for the linear model fit using all potential coefficients (red) are compared to the backward elimination fit (blue). Depicted are the bias (mean difference of retrieved minus input SIF), the average standard deviation of retrieved SIF for the 60 TOC SIF spectra (SD) and the mean Bayesian information criterion (BIC).

The first plot in Fig.

The bias, calculated as the mean difference of retrieved minus input SIF, represents the accuracy of the retrieval. It can be seen that the bias drops to a value close to zero when more than seven PCs are provided, for both the linear model fit and the backward elimination fit. Using all coefficients leads to a slightly increasing positive bias with a larger number of PCs, which is not the case when the backward elimination is applied. Consequently, an unnecessarily complex model would potentially result in overestimated SIF values.

A difference between the disabled and enabled backward elimination algorithm can also be seen in the comparison of the mean standard deviation. Here, the standard deviations of the retrieved SIF under differing atmospheric conditions were averaged over the 60 TOC SIF spectra to assess the retrieval precision. It can be noted that the precision of the backward elimination fit remains constant, while the linear model fit loses precision with a larger number of PCs. This comparison also suggests providing at least seven PCs to the retrieval.
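The two summary statistics used in this comparison can be written down compactly. The sketch below assumes the retrieved values are stored in an array with one row per TOC SIF spectrum and one column per simulated atmospheric condition; the array layout, the function names and the synthetic numbers are illustrative assumptions.

```python
import numpy as np

def retrieval_bias(retrieved, simulated):
    """Mean of (retrieved - input) over all spectra and conditions."""
    return float(np.mean(retrieved - simulated[:, None]))

def retrieval_precision(retrieved):
    """Standard deviation across conditions, averaged over the spectra."""
    return float(np.mean(np.std(retrieved, axis=1, ddof=1)))

# Synthetic check: 60 input SIF values, 3 conditions per spectrum,
# a constant offset of 0.1 and a fixed scatter pattern across conditions.
simulated = np.linspace(0.0, 5.9, 60)
scatter = np.array([-0.2, 0.0, 0.2])
retrieved = simulated[:, None] + 0.1 + scatter[None, :]

bias = retrieval_bias(retrieved, simulated)
sd = retrieval_precision(retrieved)
print(bias, sd)
```

Separating the two statistics in this way keeps a systematic offset (accuracy) from being confused with scatter caused by varying atmospheric conditions (precision).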

More pronounced differences arise in the comparison between the mean Bayesian
information criterion (BIC) values. This fact is expected since the number of
coefficients serves as weight for this criterion (as described in
Sect.

Theoretically, there should be no more variability in data points for large
numbers of initial PCs if the model parameter selection is enabled. Although
Fig.

Even if the differences are small, it can be concluded that the accuracy (no significant bias) and precision (decreased average standard deviation) are enhanced when the backward elimination is enabled. Hence, the noise is reduced by selecting only appropriate coefficients, which is expected to be of particular importance for real satellite measurements. Unfortunately, there is no ground truth against which the retrieval could be adjusted or validated for real satellite measurements. As a consequence, it is not possible to determine the most appropriate number of PCs for real satellite data. Thus, it is advantageous that the backward elimination algorithm ensures stable results regardless of how many PCs are provided (with the restriction that there is a minimum number of required PCs). Furthermore, overfitting of the measurement is avoided by using the discussed algorithm.

Based on these findings, we decided to initially provide 10 PCs
for the retrieval applied to real GOME-2 and SCIAMACHY data. The backward
elimination algorithm selects the required model parameters (a subset of
candidate parameters from Eq.

Correlation error matrix of a sample retrieval. A total of 10 PCs were supplied;
atmospheric conditions were set to a mid-latitude summer temperature
profile, 955

As can be seen, correlations of selected model parameters with

In order to justify the 720–758

Comparison of different retrieval windows. Shown is the linear fit
(retrieved SIF

We found that both confined and extended retrieval windows lead to a slight
bias. Furthermore, the extended retrieval windows require a larger number of
PCs. The selected retrieval window from 720–758

It is known that vegetation has a unique, spectrally smooth reflectance
signature in the considered wavelength ranges, which means that there are no
distinct absorption lines. Nevertheless, the spectral reflectance of
vegetated areas changes rapidly in the red edge region (680–730

Advantages of the 720–758

it covers the second peak of SIF emission at 740

it contains spectral regions with a high atmospheric transmittance (between
721.5–722.5 and 743–758

The presented SIF retrieval method has been used to produce a global SIF data
set from GOME-2 data covering the 2007–2011 time period. In addition, the
SIF retrieval has been implemented for SCIAMACHY data for the August 2002–March 2012
time span. This section describes the application of the algorithm
to the satellite data, results from spatiotemporal composites as well as
a comparison to the results from

In contrast to the synthetic data set, where training and test sets are clearly separated, the real satellite data are first partitioned as described below.

The selection of the training set is relevant to obtain meaningful results
because these data are used to model the planetary reflectance of the desired
measurements over land (test set). It is therefore essential to capture as
many atmospheric states and non-fluorescent surfaces as possible within the
training set to ensure its representativeness. Random samples of measurements
over areas where no SIF signal is expected (e.g., deserts, ice and sea
regardless of the degree of cloudiness) are used for this purpose. The
selection of such measurements is based on the determination of the land
cover using the International Geosphere and Biosphere Programme (IGBP)
classification

In general, the test set is composed of all available land pixels, but cloud-contaminated measurements can make it problematic to retrieve SIF since it can
be expected that the SIF signal is partly shielded in the presence of clouds,
which potentially biases the retrieval. For this reason, cloud fractions in
the test set are limited to a maximum of 0.5, which also saves computation time, whereas this
restriction is explicitly not applied to the training set. We use the
effective cloud fraction from the Fast Retrieval Scheme for Clouds from the
O

The FRESCO cloud fraction is not available in the presence of snow and ice,
which is of particular relevance in winter at higher latitudes. In
order to obtain a complete time series of SIF in affected regions, we
also evaluate measurements with an unknown cloud fraction in the presence of
snow. Hence, measurements over snow are determined using the ERA-Interim
re-analysis data

Following the results of our sensitivity analysis, at least eight PCs should
be provided for the SIF retrieval in the 720–758

As a consequence of these findings, we provided 10 PCs for the retrieval for
GOME-2 and 20 PCs when SCIAMACHY data are used. The processed data can be
retrieved from

Monthly composites of SIF at 740

Since it cannot be excluded that the retrieval fails for single measurements,
the retrieval results have to be checked. This is done by using the residual
sum of squares (RSS) from each retrieval. The resulting coefficients are used
to generate a synthetic measurement which is compared with the original
measurement. The RSS value is then the discrepancy between the data and the
model. One major issue, which causes high residuals, is the South Atlantic
Anomaly (SAA), discussed below in Sect.

In principle, it is possible to achieve a global coverage of SIF measurements
within 1.5

GOME-2 versus SCIAMACHY data of TOA reflectance at 720

Overall, all three results compare very well in terms of spatial patterns,
although the SIF composite derived with SCIAMACHY is provided at a spatial
resolution of

One reason might be a higher cloud contamination of the larger footprints
from SCIAMACHY. As already stated in Sect.

As can be seen from the TOA reflectance comparison, there is most likely
no offset. When considering the cloud fractions, the overall linear fit is
comparable to that of the TOA reflectance but points scatter more widely
(

Relationship between SIF values derived from GOME-2 and SCIAMACHY
on a monthly basis to scale SCIAMACHY and V25 GOME-2 SIF values in Fig.

The January 2011 SIF composites in Fig.

Comparison of monthly SIF averages derived from SCIAMACHY (blue) and
GOME-2 (red) for croplands between 100–

Figure

In order to provide a typical estimate of the uncertainty of monthly SIF
composites due to instrumental noise, Fig.

Standard error of the weighted average (using Eq. (

The highest uncertainties occur over bright areas associated with higher photon noise, e.g., deserts and regions with snow/ice. The resulting pattern can therefore be interpreted as an error that increases with TOA radiance.
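The link between scene brightness and the uncertainty of a gridded composite can be illustrated with a short sketch. It assumes that the single-retrieval error is proportional to a noise level that scales with the square root of the radiance, and that grid-cell averages use inverse-variance weights; the noise factor `k`, the radiance values and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def photon_noise(radiance, k=0.05):
    """1-sigma noise assumed proportional to sqrt(signal); k is a hypothetical factor."""
    return k * np.sqrt(np.asarray(radiance, dtype=float))

def weighted_mean_and_error(values, sigmas):
    """Inverse-variance weighted mean and its standard error."""
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    mean = float(np.sum(weights * values) / np.sum(weights))
    return mean, float(1.0 / np.sqrt(np.sum(weights)))

# Two grid cells with 50 soundings each: a dark and a four times brighter scene.
n_soundings = 50
sigma_dark = photon_noise(np.full(n_soundings, 100.0))
sigma_bright = photon_noise(np.full(n_soundings, 400.0))

sif = np.zeros(n_soundings)  # placeholder SIF values, irrelevant for the error
_, se_dark = weighted_mean_and_error(sif, sigma_dark)
_, se_bright = weighted_mean_and_error(sif, sigma_bright)
print(se_dark, se_bright)
```

Under these assumptions, a fourfold increase in radiance doubles the standard error of the cell average for the same number of soundings.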

Since we present a similar SIF retrieval method to that proposed by

Similar to Fig.

SIF retrieval results from the presented algorithm versus V25 results
provided by

It is immediately noticeable that the absolute SIF values obtained from the presented retrieval are about twice as high.

This is inconsistent with our simulation-based retrieval test with a similar
fitting window to that of the V25 algorithm from

At this point, we are not able to judge which absolute values are closer to
reality, since there is a lack of ground truth and validation. The only
possibility to assess the validity of results from the data-driven
approaches, besides the sensitivity analyses, is at present a comparison to
physically based SIF retrieval results from GOSAT data. GOSAT and
GOME-2/SCIAMACHY SIF values are expected to be different, which is due to
different overpass times, evaluated wavelengths (GOSAT SIF is evaluated
between 755–759

In Fig.

SIF retrieval results from GOSAT

The averaging interval has been selected according to the available
overlapping periods for all four data sets. The underlying retrieval results
are rastered on a

In view of these results, it can be assumed that the presented retrieval
approach is no less valid than that of

Monthly composites of the standard error of the mean (SEM,
Eq.

Another point to be considered is that the error estimation might be too
optimistic for large parts of the South American continent. The reason is the
South Atlantic Anomaly (SAA), which is a region of reduced strength in the
Earth's magnetic field. Hence, orbiting satellites are exposed to an
increased flux of energetic particles, which leads to increased noise in the
measurements. An impact of the SAA on the SIF retrieval using GOME-2 data has
also been described in the study of

The center of the SAA is in close proximity to the coast of Brazil at about
40

In contrast to the SCIAMACHY data set, the impact of the SAA on GOME-2 data
also translates to the residual sum of squares (RSS) of the SIF retrieval
(not shown). High RSS values occur in particular in the region of large SEM
values on the South American continent. This suggests that the
noise, and thus the measurement error, is much higher than expected from our
error propagation depicted in Fig.

Mean SIF of two different training sets derived with data from GOME-2 and SCIAMACHY averaged over latitudes for July 2011. The first training set (Training 1, solid lines) corresponds to a daily sampling of the data while the second training set (Training 2, dotted lines) is sampled on a 3-day basis.

Time series of SIF retrievals from GOME-2 data over a box in the Amazon Basin
(70–65

Consistency checks and plausibility controls of the derived SIF values have
revealed another source of error. We retrieved SIF values from the training
set (sea, desert and ice) with the expectation that all SIF values are close
to zero. It turned out that there is a slight time- and latitude-dependent
offset in retrieved SIF values which amounts to up to

As can be seen, there is almost no difference between the different
sampling methods of the SCIAMACHY training data. Nevertheless, a slight
offset of about

In contrast to the usually applied pre-filtering of cloud-contaminated
measurements (discussed in Sect.

Tropical rainforest areas are frequently covered by clouds, and it is
therefore expected that clouds may affect the retrieval of SIF in such
regions in particular. We therefore produced a time series over a box in the Amazon Basin
(70–65

Overall, it can be seen that SIF values decrease with an increasing
cloud fraction threshold, while the temporal pattern remains almost
unaffected. Using only retrievals with a low cloud fraction (

A new statistically based approach to retrieve SIF from GOME-2
and SCIAMACHY data has been introduced. Building upon previous work by

The basic assumption for retrieving SIF from space is that the contribution of SIF can adequately be separated from the low and high frequency components due to surface and atmospheric properties. In our forward model, the low and high frequency components are represented by a combination of a third-order polynomial in wavelength with atmospheric PCs. A backward elimination algorithm automatically selects the required model parameters with respect to the goodness of fit balanced by model complexity. Our sensitivity analysis reveals that the precision is enhanced, retrieval noise is reduced and the risk of overfitting is minimized by applying the stepwise model selection to our forward model.
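The separation of smooth, atmospheric and fluorescence contributions can be illustrated with a deliberately simplified linear sketch. It is not the forward model of this study: it assumes a purely additive combination of a third-order polynomial, PCs obtained from an uncentered SVD of synthetic training spectra, and a Gaussian fluorescence shape peaking at 740 nm whose width is a hypothetical value; all names and numbers are illustrative.

```python
import numpy as np

wl = np.linspace(720.0, 758.0, 200)       # retrieval window in nm
x = (wl - wl.mean()) / (0.5 * np.ptp(wl))  # wavelength scaled to [-1, 1]

# Synthetic "atmospheric" features standing in for absorption structures.
feat1 = np.exp(-0.5 * ((wl - 725.0) / 1.0) ** 2)
feat2 = np.exp(-0.5 * ((wl - 730.0) / 2.0) ** 2)

# Training set without fluorescence: random mixtures of the two features.
rng = np.random.default_rng(1)
amplitudes = rng.uniform(0.1, 1.0, size=(500, 2))
training = amplitudes @ np.vstack([feat1, feat2])

# Principal components from an (uncentered) SVD of the training spectra.
n_pcs = 2
_, _, vt = np.linalg.svd(training, full_matrices=False)
pcs = vt[:n_pcs]

# Hypothetical fluorescence shape: Gaussian at 740 nm, width chosen for illustration.
sif_shape = np.exp(-0.5 * ((wl - 740.0) / 10.0) ** 2)

# Design matrix: polynomial terms, PCs and the fluorescence shape as columns.
poly = np.vstack([x ** p for p in range(4)])
design = np.vstack([poly, pcs, sif_shape]).T

# Noise-free test spectrum containing a known fluorescence amplitude.
true_sif = 1.5
spectrum = 0.2 + 0.05 * x + 0.3 * feat1 + 0.7 * feat2 + true_sif * sif_shape

coef, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
sif_hat = float(coef[-1])
print(sif_hat)
```

In this idealized, noise-free setting the fluorescence amplitude is recovered because its spectral shape cannot be reproduced by the polynomial and PC columns; with real measurements the separability depends on noise and on how well the training set represents the atmospheric variability.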

The retrieval approach has also been applied to spectra acquired by GOME-2
and SCIAMACHY. Thus, we were able to present a continuous SIF data set
covering the 08/2002–03/2012 time period using SCIAMACHY data and the
2007–2011 time span using GOME-2 data. The number of selected PCs and total
model parameters for the GOME-2 data is in line with the expectation from the
sensitivity analysis while twice as many PCs (on average 14) are employed for
SCIAMACHY data. Nevertheless, our approach suggests using a significantly
smaller number of PCs compared to

Although the achievable spatial resolution of SIF maps from SCIAMACHY data is
coarser than that from GOME-2, there is now a decade-long continuous SIF time
series available. However, a significant discrepancy in absolute SIF values
arises when comparing our retrieval results with the V25 results from

Furthermore, it must be considered that the SAA causes high uncertainties in large parts of the South American continent when using GOME-2 data. In contrast, a significant impact of the SAA on SCIAMACHY data has not been found. The analysis of different training sets revealed a slight zero-level offset in GOME-2 SIF retrievals, which is probably related to an instrumental issue. On the basis of the GOME-2 SIF data set in 2009, we have shown that the retrieval is only moderately affected by cloud contamination: the SIF signal decreases with an increasing cloud fraction, but the seasonality is maintained.

Finally, it has to be noted that the flexibility of the retrieval method
also makes it applicable to other instruments with a spectral and
radiometric performance similar to that of GOME-2 and SCIAMACHY, such as the upcoming TROPOMI

The research is funded by the Emmy Noether Programme of the German Research Foundation. We thank EUMETSAT for making the GOME-2 data available and ESA for providing the SCIAMACHY data. Jochem Verrelst and Luis Alonso from the University of Valencia are gratefully thanked for the reflectance and fluorescence simulations produced in the framework of the ESA FLUSS project.

Edited by: B. Kahn