A thermal infrared instrument onboard a geostationary platform for CO and O3 measurements in the lowermost troposphere: Observing System Simulation Experiments (OSSE)

This paper presents observing system simulation experiments (OSSEs) to compare the relative capabilities of two geostationary thermal infrared (TIR) instruments to measure ozone (O3) and carbon monoxide (CO) for monitoring air quality (AQ) over Europe. The primary motivation of this study is to use OSSEs to assess how these infrared instruments can constrain different errors affecting AQ hindcasts and forecasts (emissions, meteorology, initial conditions, and all three together). The first instrument (GEO-TIR) has a configuration optimized to monitor O3 and CO in the lowermost troposphere (LmT; defined to be the atmosphere between the surface and 3 km), and the second instrument (GEO-TIR2) is designed to monitor temperature and humidity. Both instruments measure radiances in the same TIR spectral band. Results show that GEO-TIR could have a significant impact on the analyses of the O3 and CO LmT columns (the GEO-TIR analyses are closer to the reference atmosphere than the GEO-TIR2 analyses). For both instruments, the information added by the measurements is mainly over the Mediterranean Basin, with some impact over the Atlantic Ocean and Northern Europe. The impact of GEO-TIR is mainly above 1 km for O3 and CO, but it can also improve the surface analyses for CO. The GEO-TIR2 analyses show a low impact on the O3 LmT column but a significant impact (although still lower than for GEO-TIR) on CO above 1 km. The results of this study indicate the beneficial impact of an infrared instrument (GEO-TIR) capable of monitoring O3 and CO concentrations in the LmT, and quantify the value of this information for constraining AQ models.
Correspondence to: M. Claeyman (marine.claeyman@aero.obs-mip.fr)


Introduction
The atmospheric composition of pollutants in the lowermost troposphere (LmT; defined to be the atmosphere between the surface and 3 km) is a societal issue because it is associated with air quality (AQ). Poor AQ can lead to negative health effects such as respiratory problems, heart disease and lung cancer. Monitoring and forecasting AQ is becoming routine (e.g. Prev'air in France, Honoré et al., 2008). This concerns both gaseous and particulate species, including ground-level ozone (O3), nitrogen oxides (NOx) and suspended particulate matter (PM), all of which are identified as potential health hazards (Brunekreef and Holgate, 2002).
O3 is a key trace gas in the troposphere that plays a significant role in atmospheric chemistry, air quality and radiative forcing (e.g. Jacob, 2000). It is a secondary pollutant produced by the photochemical oxidation of hydrocarbons and carbon monoxide (CO) in the presence of nitrogen oxides (NOx). It is a precursor of the hydroxyl radical, which affects the oxidizing capacity of the atmosphere. It is also an irritant gas that can severely affect the respiratory tract and damage vegetation (Seinfeld and Pandis, 1997). In Europe, tropospheric O3 levels increased rapidly between 1970 and 1990 as a result of increases in precursor emissions (e.g. Lamarque et al., 2005), but this increase has slowed down or declined since 1990 (e.g. Oltmans et al., 2006). CO is a reactive gas that also plays an important role in tropospheric chemistry (Jacob, 2000). It is an O3 precursor and a tracer of pollution (e.g. Turquety et al., 2009). In addition to its atmospheric chemical sources, CO is a primary pollutant emitted during incomplete combustion, which makes it a good tracer of urban/industrial fossil fuel burning (e.g. Branis, 2009), wildfires (e.g. Cristofanelli et al., 2009) and tropical biomass burning (e.g. Edwards et al., 2006; Pradier et al., 2006).
In Europe, despite the definition and implementation of regulations and laws on pollutants, AQ is still a concern for the public and the authorities. The health impact of poor AQ may be reduced through both long- and short-term actions (Menut and Bessagnet, 2010). Long-term actions aim at a global improvement of AQ by reducing anthropogenic emissions. Short-term actions consist of anticipating pollution events a few days before they happen, in order to warn the public in advance to reduce exposure, and to help authorities take effective emission reduction measures. AQ monitoring and forecasting are required to achieve these actions.
Current monitoring and forecasting systems mostly rely on three-dimensional models (e.g. Vaughan et al., 2004; McKeen et al., 2005; Honoré et al., 2008; Hollingsworth et al., 2008). Traditionally, AQ monitoring has been done using measurements from ground-based stations. Ground-based in situ observations have the disadvantage of an inhomogeneous spatial coverage, and show strong variability in their spatial representativeness, measurement methods and correction factors (Ignaccolo et al., 2008). The main advantage of satellite observations is their good spatial coverage. Ground-based and satellite observations of pollutants complement each other: the former sample the surface, the latter sample the vertical, typically as a column. For AQ purposes, satellite observations have to measure tropospheric composition at adequate spatial (∼10 × 10 km²) and temporal (∼1 h) resolution (Fishman et al., 2008; Martin, 2008). To complement in situ information (e.g. AQ networks, sondes, aircraft measurements), denser observations at continental scale in the LmT are needed for AQ-relevant species (e.g. O3 and CO). Such observations can only be provided from a Geostationary Earth Orbit (GEO) platform (Bovensmann and Edwards, 2006). Several GEO missions have been proposed to monitor AQ. In the USA, the GEO-CAPE mission (National Research Council, 2007), dedicated to the measurement of tropospheric trace gases, is planned toward the end of the decade. In Japan, a similar mission to monitor O3 and aerosols (including their precursors) from GEO has been proposed by the Japan Society of Atmospheric Chemistry (Akimoto et al., 2008) and has recently been endorsed by the Japanese Space Agency (JAXA).
In Korea, the National Institute of Environmental Research is planning the GEMS (Geostationary Environment Monitoring Spectrometer; Lee et al., 2010) program, to be launched in 2017-2018 onboard MP-GEOSAT (Multi-Purpose GEOstationary SATellite), the planned successor of COMS (Communication, Ocean and Meteorological Satellite). In Europe, several GEO missions have been proposed to monitor tropospheric constituents at high temporal and spatial resolution, such as GeoTrope and GeoFIS (Orphal et al., 2005). The Meteosat Third Generation Thermal Infrared Sounder (MTG-IRS) is a planned mission to be launched from 2017. MTG-IRS will provide horizontally, vertically and temporally resolved information on the water vapour and temperature structure of the atmosphere. It will also be able to provide O3 and CO measurements in the troposphere, using the long-wave infrared and the mid-wave infrared bands, respectively.
The Sentinel 4 UVN (ultraviolet-visible-near infrared) payload is also planned and will be deployed on the two MTG-Sounder (MTG-S) satellites in geostationary orbit over Europe; UVN is expected to provide measurements of O3 and nitrogen dioxide columns and of aerosol optical depth. To complement Sentinel 4 UVN, the mission Monitoring the Atmosphere from Geostationary orbit for European Air Quality (MAGEAQ) has been proposed as a candidate for the Earth Explorer Opportunity Mission EE-8 call of the European Space Agency (Peuch et al., 2009). MAGEAQ is a multispectral (thermal infrared and visible) instrument designed to provide height-resolved measurements of O3 and CO in the LmT.
A method to determine the beneficial impact of future instruments is the Observing System Simulation Experiment (OSSE) (Atlas, 1997). This method is widely used in the meteorological community to assess the usefulness of new meteorological satellite data (e.g. Lahoz et al., 2005; Stoffelen et al., 2006; Masutani et al., 2010b). Few OSSE studies have addressed chemical species, but two recent OSSEs have been conducted for a GEO platform for AQ purposes. The first one is an OSSE for CO in the LmT using a multispectral (near-infrared and thermal infrared) instrument. The second one concerns a satellite imager monitoring aerosol optical depth to improve ground-level particulate matter analyses and forecasts (Timmermans et al., 2009). The aim of this paper is to present a new OSSE for a GEO instrument in the thermal infrared band (called GEO-TIR) with instrument characteristics optimized to monitor O3 and CO in the LmT. GEO-TIR has instrument characteristics (signal-to-noise ratio, SNR, and spectral sampling interval, SSI) equivalent to those of the thermal infrared instrument proposed in the MAGEAQ mission and described in Claeyman et al. (2011). To accurately assess the impact of GEO-TIR O3 and CO observations in an AQ model, we perform several OSSEs to evaluate the sensitivity of the analyses to various key parameters (emissions, meteorology and initial conditions), separately and all together. We also perform OSSEs for another GEO thermal infrared instrument with characteristics optimized for temperature and humidity (GEO-TIR2), to evaluate the added value of GEO-TIR relative to GEO-TIR2. GEO-TIR2 has SNR and SSI similar to those of MTG-IRS (Clerbaux et al., 2008a; Stuhlmann et al., 2005).
We first evaluate the added value of GEO-TIR over Europe for the LmT column using several statistical measures (correlation, bias, standard deviation), and then the vertical impact of GEO-TIR using several AQ statistical measures (e.g. good detections, false alarms, missed events).
This paper is organized as follows. In Sect. 2, we describe the OSSE method, the chemistry transport model (CTM), the assimilation scheme, the synthetic observations, the different experiments and the statistical measures. In Sect. 3, we discuss the added value of GEO-TIR in an AQ model in the LmT, by comparison with GEO-TIR2. Summary and conclusions are presented in Sect. 4.

The Observing System Simulation Experiment
Observing System Simulation Experiments (e.g. Atlas, 1997) are used to assess the impact of future observing systems. To simulate a future observing system, existing observations are generally replaced by synthetic observations, generated by sampling a nature run according to the instrument characteristics (observational geometry, spatial and temporal resolution, errors). In some cases, a subset of the future observations can be represented by current observations, but the observing platform of interest is always simulated (see Masutani et al., 2010a for further discussion). In this study, the nature run simulates the true state of the atmosphere, the synthetic observations are sampled from the nature run, and no current observations are used. The synthetic observations are then assimilated in the same model configuration as the control run of the OSSE. The OSSE discussed here is composed of the following elements:
1. A nature run produced using a state-of-the-art model, which represents the true atmosphere.
2. Synthetic observations sampled from the nature run according to the characteristics of the instruments considered.
3. A control run, which yields an alternative representation of the atmosphere, different from the nature run. In this study, the control run is a free model run and includes no assimilated observations. The differences between the control run and the nature run should ideally be similar to the differences between a state-of-the-art model and the real atmosphere.
4. An assimilation run that assimilates the synthetic observations of the instruments of interest, generated from the nature run, using the same model configuration as the control run.
5. Assessment of the added value of the instruments of interest by statistical comparison between the nature run, the control run and the assimilation run. Specifically, the assessment is based on the differences between the nature run and the control run, and between the nature run and the assimilation run. If the difference between the assimilation run and the nature run is significantly smaller than the difference between the control run and the nature run, we conclude that the instrument of interest has added value.
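The comparison in step 5 can be sketched in a few lines. This is a minimal illustration only: the helper names (`mean_abs_diff`, `has_added_value`) and the four-point arrays are hypothetical, and the sketch compares mean absolute differences without the significance test used in the statistical analysis of this paper.

```python
from statistics import mean

def mean_abs_diff(run, nature):
    """Mean absolute difference between a run and the nature run, point by point."""
    return mean(abs(r - n) for r, n in zip(run, nature))

def has_added_value(nature, control, assim):
    """Step 5 without a significance test: the instrument adds value if the
    assimilation run is closer to the nature run than the control run is."""
    return mean_abs_diff(assim, nature) < mean_abs_diff(control, nature)

# Hypothetical LmT column values at four grid points (arbitrary units).
nature = [10.0, 12.0, 11.0, 13.0]
control = [12.0, 14.5, 9.0, 15.0]
assim = [10.5, 12.5, 10.5, 13.5]
print(has_added_value(nature, control, assim))  # True
```

Here the assimilation run is on average 0.5 units from the nature run and the control run 2.125 units, so the sketch reports added value; in the OSSE itself, "significantly smaller" is decided with a statistical test.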
Note that in the OSSE described in this paper, the future observing system comprises two GEO observing platforms and no other observations (e.g. ground stations). We think this is justified because at this stage we are only interested in providing a reasonably accurate first-order estimate of the added value of the proposed observing platform. Furthermore, because of model uncertainties, we focus on comparing the relative performance of the two instruments rather than on predicting their absolute performance. In future work, we will extend this study to include a more complete representation of the future observing system, including the ground-based network, and refine our estimate of the added value of the proposed observing platform. The different elements of the OSSE are described in more detail below.

The reference atmosphere
The MOCAGE (MOdèle de Chimie Atmosphérique à Grande Echelle) model is used to simulate the nature run. MOCAGE is a three-dimensional CTM for the troposphere and stratosphere (Peuch et al., 1999) that simulates the interactions between dynamical, physical and chemical processes. It uses a semi-Lagrangian advection scheme (Josse et al., 2004) to transport the chemical species. It has 47 hybrid vertical levels from the surface up to 5 hPa, with a resolution of about 150 m in the lower troposphere increasing to 800 m in the upper troposphere. Turbulent diffusion is calculated with the scheme of Louis (1979) and convective processes with the scheme of Bechtold et al. (2001). The chemical scheme used in this study is RACMOBUS, a combination of the stratospheric scheme REPROBUS (Lefèvre et al., 1994) and the tropospheric scheme RACM (Stockwell et al., 1997). It includes 119 individual species with 89 prognostic variables and 372 chemical reactions.
MOCAGE is flexible enough to be used for both stratospheric studies (El Amraoui et al., 2008a) and tropospheric studies (Dufour et al., 2004). It is used in the operational AQ monitoring system in France, Prev'air, and in the pre-operational GMES (Global Monitoring for Environment and Security) atmospheric core service (Hollingsworth et al., 2008).
The model uses 2 nested domains: a global domain at 2° resolution and a European domain at 0.5° resolution, from 32° N to 72° N and from 16° W to 36° E. The nature run covers the period from 1 July 2009 to 1 September 2009. The simulated field for 1 July 2009 was obtained from a free run with RACMOBUS started from a June climatological initial field. The Météo-France ARPEGE meteorological analyses (Courtier et al., 1991) were used to force the dynamics of the model every 6 h. The emission inventory used in the nature run is the one provided by TNO (Netherlands Organization for Applied Scientific Research) (Visschedijk and Denier van der Gon, 2005) for the Global and regional Earth-system Monitoring using Satellite and in situ data (GEMS) project (Hollingsworth et al., 2008), hereinafter referred to as GEMS-TNO. This inventory has a high spatial resolution of ∼8 × 8 km² and a temporal resolution of 1 h. It is representative of the year 2003.

The synthetic observations
In this study, we generate synthetic observations for two nadir-viewing infrared GEO instruments. The first one (GEO-TIR) has a spectral sampling interval (SSI: 0.05 cm⁻¹) and a Noise Equivalent Spectral Radiance (NESR: 1.00 nW/(cm² sr cm⁻¹) and 6.04 nW/(cm² sr cm⁻¹) for the CO and O3 spectral windows, respectively) dedicated to monitoring CO and O3 in the LmT (Claeyman et al., 2011). The second one (GEO-TIR2) has the same SSI (0.625 cm⁻¹) and NESR (6.12 nW/(cm² sr cm⁻¹) and 24.5 nW/(cm² sr cm⁻¹) for the CO and O3 spectral windows, respectively) as MTG-IRS (Stuhlmann et al., 2005). For both instruments, the O3 spectral window is taken between 1000 and 1070 cm⁻¹ and the CO window between 2085 and 2185 cm⁻¹. The instrument configurations are summarized in Table 1. Considering the high computing cost associated with OSSEs, we define for both instruments a pixel size of 0.5° × 0.5°, corresponding to the model spatial resolution, and a revisit time of 1 h. A resolution of 0.5° × 0.5° is commonly used for AQ monitoring over Europe in operational systems (e.g. Prev'air in France, Honoré et al., 2008). Moreover, we focus here on O3 and CO, which have less spatial variability than NO2 and PM.
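To make the difference in spectral sampling concrete, the sketch below counts the spectral samples in each window for the two SSIs quoted above. It assumes, for illustration only, a uniform spectral grid that includes both window edges; the actual channel layout of the instruments may differ, and the function name is hypothetical.

```python
def n_spectral_samples(start_cm1, end_cm1, ssi_cm1):
    """Samples in [start, end] on a uniform grid of step ssi (both edges included)."""
    return round((end_cm1 - start_cm1) / ssi_cm1) + 1

windows = {"O3": (1000.0, 1070.0), "CO": (2085.0, 2185.0)}  # cm-1
ssi = {"GEO-TIR": 0.05, "GEO-TIR2": 0.625}                   # cm-1

for instrument, step in ssi.items():
    for species, (lo, hi) in windows.items():
        print(instrument, species, n_spectral_samples(lo, hi, step))
```

Under this assumption GEO-TIR samples each window 12.5 times more finely than GEO-TIR2 (1401 vs 113 samples for O3, 2001 vs 161 for CO).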
To represent the synthetic observations in the OSSE, we need temperature and water vapour fields and their uncertainties. Following the MTG-IRS retrieval study of Clerbaux, we assign uncertainties at each vertical level of 0.5 K for temperature and 10 % for water vapour. The number of 0.5° × 0.5° pixels seen by an instrument onboard a geostationary platform is very large: we have to consider about 100 000 profiles per instrument per species per day over the defined domain (Europe). Thus, to study 2 months of synthetic observations for the 2 instruments, we set up a method much faster than using detailed radiative transfer and retrieval models. In the following, we describe the method and its validation.
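The order of magnitude of the data volume can be checked with simple arithmetic from the European domain (32° N to 72° N, 16° W to 36° E), the 0.5° × 0.5° pixel size and the 1 h revisit time. The sketch below assumes one pixel per model grid cell and ignores cloud screening, so it gives an upper bound; the ~100 000 profiles per day quoted above is of the same order, and the paper does not detail how that figure was derived.

```python
def n_cells(start_deg, end_deg, res_deg):
    """Number of grid cells of width res_deg spanning [start_deg, end_deg]."""
    return round((end_deg - start_deg) / res_deg)

LAT_CELLS = n_cells(32.0, 72.0, 0.5)   # latitude cells
LON_CELLS = n_cells(-16.0, 36.0, 0.5)  # longitude cells (16° W written as -16°)
PIXELS_PER_IMAGE = LAT_CELLS * LON_CELLS
IMAGES_PER_DAY = 24                    # 1 h revisit time
DAYS = 31 + 31                         # July and August 2009

profiles_per_day = PIXELS_PER_IMAGE * IMAGES_PER_DAY
print(PIXELS_PER_IMAGE, profiles_per_day, profiles_per_day * DAYS)
# 8320 199680 12380160
```

Even as an upper bound, roughly 10^7 profiles per instrument and species over the two months explains why full radiative transfer and retrieval calculations for every pixel were avoided.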
Retrievals of LmT O3 and CO in the infrared strongly depend on the thermal contrast between the surface and the air immediately above it (see e.g. Deeter et al., 2007; Eremenko et al., 2008; Clerbaux et al., 2009). Several error sources (e.g. measurement and temperature errors) have to be taken into account to assess the sensitivity of such retrievals. However, among these errors, the smoothing error is the main contributor to the shape of the averaging kernels, which represent the sensitivity of the retrieval to the true atmosphere at different altitudes. From these averaging kernels one can deduce, for example, the surface sensitivity of the retrieval. Because the averaging kernels depend strongly on the thermal contrast, we construct a look-up table containing specific values of the thermal contrast and their corresponding averaging kernels. To refine the method, we also include in the look-up table other errors such as the measurement error and the temperature error, assuming a linear regime between thermal contrast and retrieval. This look-up table is built using the forward model KOPRA (Karlsruhe Optimized and Precise Radiative transfer Algorithm) (Stiller et al., 2002). The retrieval system KOPRA-fit (Höpfner et al., 1998), based on Tikhonov-Phillips regularization (Tikhonov, 1963; Phillips, 1962), is also employed. We generate the averaging kernels and the corresponding error covariance matrices for thermal contrast values between −20 K and 20 K with a step of 0.2 K, i.e. 201 values for each instrument configuration. The range of thermal contrast values has been established using statistics of the thermal contrast found in the temperature analyses of the current version of the ARPEGE global model. This method allows us to obtain quickly (with a speed-up factor of more than 70 in CPU time) the required parameters (errors and averaging kernels) corresponding to any thermal contrast. From these parameters we reconstruct the trace gas profiles using (Rodgers, 2000):
$$x_{rsim} = x_a + \mathbf{A}\,(x_t - x_a) + \varepsilon$$
where x_rsim is the simulated retrieved profile, x_t the true profile (the nature run profile from the MOCAGE CTM), x_a the a priori profile (a climatology over Europe calculated from the MOCAGE model) and A the averaging kernel matrix. ε is a random Gaussian error with a standard deviation equal to the square root of the diagonal elements of the error covariance matrix. Note that these quantities are defined in terms of ln(VMR), where VMR stands for the volume mixing ratio. For further details on the averaging kernel shapes and error covariance matrices of GEO-TIR and GEO-TIR2, the reader should refer to Claeyman et al. (2011). A similar method was used in Edwards et al. (2009) to simulate CO infrared observations using 3 different averaging kernel sets. We validate the method by comparing the values from the look-up table with the results calculated with the comprehensive KOPRA-fit method. The statistics of this validation exercise for both GEO-TIR and GEO-TIR2, for observations at altitudes between the surface and 10 km, are shown in Table 2. They show very good agreement between the look-up table and the KOPRA-fit method: all correlation coefficients are greater than 0.9 for both O3 and CO and for both instrument configurations, and the standard deviations (between 1.7 % and 4.8 %) and biases (between −0.4 % and 1.3 %) are small. Moreover, the histograms of the relative difference between the look-up table and KOPRA-fit (not shown) have a Gaussian-like shape centred on 0, confirming the validity of the simplified approach.
Table 2. Correlation coefficient, standard deviation (%) and bias (%) between observations generated with the look-up tables and observations generated with KOPRA-fit, calculated with respect to the KOPRA-fit observations, for O3 (1st and 2nd columns) and CO (3rd and 4th columns), and for the GEO-TIR (2nd and 4th columns) and GEO-TIR2 (1st and 3rd columns) configurations. These statistics have been calculated for data at altitudes between the surface and 10 km.
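The look-up-table approach and the simulated-retrieval equation can be sketched as follows. This is a hedged illustration, not the KOPRA/KOPRA-fit chain: the table holds one averaging kernel matrix and one per-level error standard deviation for each of the 201 thermal-contrast nodes, and the nearest node is selected for a given thermal contrast (nearest-node selection is an assumption; the text does not state how intermediate values are handled). The toy kernels in `toy_kernel`, the error values and the example profiles are hypothetical placeholders, not the GEO-TIR or GEO-TIR2 values.

```python
import math
import random

# Thermal-contrast nodes of the look-up table: -20 K to 20 K every 0.2 K.
TC_MIN, TC_MAX, TC_STEP = -20.0, 20.0, 0.2
N_ENTRIES = round((TC_MAX - TC_MIN) / TC_STEP) + 1  # 201 nodes

def lut_index(thermal_contrast):
    """Index of the nearest table node, clamping values outside [-20, 20] K."""
    tc = min(max(thermal_contrast, TC_MIN), TC_MAX)
    return round((tc - TC_MIN) / TC_STEP)

def toy_kernel(tc):
    """Hypothetical 3-level diagonal averaging kernel (placeholder only).

    Only the lowest level's sensitivity grows with |thermal contrast|."""
    s = min(abs(tc) / 20.0, 1.0)
    return [[0.2 + 0.6 * s, 0.0, 0.0],
            [0.0, 0.7, 0.0],
            [0.0, 0.0, 0.8]]

# One (averaging kernel, per-level error std dev in ln(VMR)) pair per node.
LUT = [(toy_kernel(TC_MIN + k * TC_STEP), [0.05, 0.05, 0.05]) for k in range(N_ENTRIES)]

def simulate_retrieval(x_true, x_apriori, A, sigma, rng):
    """x_rsim = x_a + A (x_t - x_a) + eps, eps_i ~ N(0, sigma_i^2); all in ln(VMR)."""
    n = len(x_true)
    diff = [x_true[j] - x_apriori[j] for j in range(n)]
    return [x_apriori[i] + sum(A[i][j] * diff[j] for j in range(n)) + rng.gauss(0.0, sigma[i])
            for i in range(n)]

# Usage with hypothetical 3-level O3 profiles (VMR converted to ln(VMR)).
rng = random.Random(0)
x_true = [math.log(v) for v in (40e-9, 50e-9, 60e-9)]
x_apriori = [math.log(v) for v in (45e-9, 45e-9, 55e-9)]
A, sigma = LUT[lut_index(7.0)]
x_sim = simulate_retrieval(x_true, x_apriori, A, sigma, rng)
print([math.exp(v) for v in x_sim])  # simulated retrieved VMRs
```

Two limiting cases help check the equation: with A equal to the identity and no noise the retrieval reproduces the true profile, and with A equal to zero it returns the a priori.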
We then use the look-up table to generate observations for the two instrument configurations (GEO-TIR and GEO-TIR2) over the two months of the study. To account for cloudy scenes, cloud estimates from the ARPEGE meteorological analyses are used to assign a cloud fraction to the observation pixels, and pixels with a cloud fraction greater than 0.5 are filtered out. The vertical grid is provided by the retrieval, with a step of 1 km from the surface to 39 km. Since we are interested in the relative added value, we use the same approximations for both instruments to generate the observations. This makes the problem tractable and is not expected to change the results.
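The cloud screening and the retrieval vertical grid described above amount to a simple filter and a fixed list of levels. In this sketch the pixel identifiers and cloud fractions are hypothetical, a pixel with a cloud fraction of exactly 0.5 is kept (the text only states that pixels above 0.5 are removed), and the surface is placed at 0 km for simplicity.

```python
CLOUD_FRACTION_MAX = 0.5
LEVELS_KM = list(range(0, 40))  # 1 km step from the surface (0 km here) to 39 km

def clear_pixels(cloud_fraction):
    """Return the ids of pixels kept after cloud screening.

    cloud_fraction maps a pixel id to its cloud fraction in [0, 1];
    pixels with a cloud fraction greater than 0.5 are filtered out."""
    return [pid for pid, cf in cloud_fraction.items() if cf <= CLOUD_FRACTION_MAX]

print(clear_pixels({"p1": 0.1, "p2": 0.5, "p3": 0.8}))  # ['p1', 'p2']
print(len(LEVELS_KM))  # 40
```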

The assimilation scheme
The assimilation system used in this study is MOCAGE-PALM (Massart et al., 2005). The assimilation module is implemented within the PALM framework (Buis et al., 2006). The assimilation technique used is 3D-FGAT (First Guess at Appropriate Time; Fisher and Andersson, 2001), a compromise between the 3D-Var (three-dimensional variational) and 4D-Var (four-dimensional variational) methods. It was validated during the ASSET project (Assimilation of Envisat data; Lahoz et al., 2007) and has produced good-quality results compared with independent data and other assimilation systems (Geer et al., 2006). Further details on the assimilation system can be found in Massart et al. (2009), El Amraoui et al. (2008b) and Claeyman et al. (2010). In this study we use an assimilation window of 1 h.
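The difference between 3D-FGAT and a plain 3D-Var can be illustrated with a scalar toy problem. The sketch below is not MOCAGE-PALM: the linear model `model_step`, its coefficients, the observations and the scalar background and observation error variances B and R are all hypothetical. In both cases the increment is the 3D-Var (BLUE) solution for n direct observations of one scalar, B·Σd/(R + nB), held constant over the window; the only difference is that 3D-FGAT computes each innovation d against the background trajectory at the observation's own time, whereas plain 3D-Var compares every observation with the background at the analysis time (step 0 here).

```python
def model_step(x, decay=0.9, source=1.0):
    """Hypothetical linear model advancing a scalar concentration by one time step."""
    return decay * x + source

def background_trajectory(xb0, n_steps):
    """Background state at steps 0..n_steps within the assimilation window."""
    traj = [xb0]
    for _ in range(n_steps):
        traj.append(model_step(traj[-1]))
    return traj

def analysis_increment(xb0, obs, B, R, n_steps, fgat=True):
    """Scalar analysis increment for direct observations of the state.

    obs is a list of (time_step, value). With fgat=True, innovations use the
    background at each observation time (First Guess at Appropriate Time);
    with fgat=False they all use the background at step 0 (plain 3D-Var)."""
    traj = background_trajectory(xb0, n_steps)
    innovations = [y - (traj[k] if fgat else traj[0]) for k, y in obs]
    n = len(innovations)
    # BLUE for n observations of one scalar: K d = B * sum(d) / (R + n * B)
    return B * sum(innovations) / (R + n * B)

obs = [(1, 6.5), (3, 7.355)]  # hypothetical observations at steps 1 and 3
print(analysis_increment(5.0, obs, B=1.0, R=1.0, n_steps=3))
print(analysis_increment(5.0, obs, B=1.0, R=1.0, n_steps=3, fgat=False))
```

With these numbers the FGAT increment is about 0.67, whereas comparing the observations with the background at the analysis time gives about 1.29, because the latter attributes the model's own evolution over the window to background error.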

The experiments
To study the sensitivity of the OSSEs to various key parameters, we perform several experiments, summarized in Table 3. For these simulations we also use MOCAGE, but with different degraded configurations, in order to obtain an alternative representation of the atmospheric composition that is a priori less realistic than the nature run. For all experiments (except the nature run), we perform 3 simulations: the control run without data assimilation, the assimilation run with GEO-TIR observations and the assimilation run with GEO-TIR2 observations.
The first sensitivity test (EXP1) concerns the meteorological forcing. In the nature run we use the ARPEGE analyses every 6 h, whereas in the control run and the assimilation runs we use 48 h forecasts every 6 h instead. In a second sensitivity test (EXP2), we change the emission inventory. Instead of the detailed GEMS-TNO inventory used in the nature run, we use a global inventory representing the year 2000, in which emissions are given as a monthly mean for biomass burning and a yearly mean for other sources (Dentener et al., 2006). The two inventories use different daily and monthly emission factors. Figure 1 shows the emission maps of CO and NOx (NO + NO2, an O3 precursor) over Europe on 6 July 2009 according to both inventories. Emissions show a higher variability in the GEMS-TNO inventory than in the global inventory. For example, over Paris or Madrid the maximum values are higher in the GEMS-TNO inventory, whereas over rural areas in Northern Europe or Spain, CO and NOx emissions are lower in the GEMS-TNO inventory. However, both inventories show the same NOx emissions from ships. Figure 2 shows the diurnal cycle of CO and NOx emissions accumulated over Europe for each hour of 6 July 2009. Generally, more CO and NOx are emitted in the global inventory than in the GEMS-TNO inventory, but locally, over large European cities, the opposite is the case. Three peaks are observed in the global inventory, at 06:00, 12:00 and 18:00 UTC, for both CO and NOx emissions, whereas only two are observed in the GEMS-TNO inventory, at 08:00 and 17:00 UTC for CO and at 08:00 and 18:00 UTC for NOx. In the third sensitivity test (EXP3), the initial conditions are modified. In the nature run, the initial condition on 1 July 2009 is provided by a previous free run. For the control run and the assimilation runs, we change the initial condition every week by taking the field from the nature run one week before (e.g. the initial field from 1 July 2009 in the control run and assimilation runs is provided by the field from 25 May 2009 from the nature run). We repeat this change every week to keep a significant difference between the nature run and the control run (see Sect. 3); after one week, the influence of the initial condition on O3 and CO concentrations in the LmT is very low (not shown). This modification introduces discontinuities in the O3 and CO time series; this effect is considered in the next section.
The last experiment (EXP4) combines all three sensitivity tests (meteorology, emissions and initial conditions). This experiment contains the main errors encountered in an AQ model (e.g. Menut and Bessagnet, 2010), except those due to the chemical and transport schemes, which are kept the same for all experiments presented here. Although this may impact the results of the study, we consider that, for this OSSE, the differences between the nature run and the control run, and between the nature run and the assimilation runs, are sufficiently realistic to make the experiments meaningful (see Sect. 3). Table 4 presents the correlation, the bias and the RMS between the 4 control runs (EXP1a, EXP2a, EXP3a and EXP4a) and the nature run, averaged over 2 months over Europe (see domain in Fig. 1). The 4 sensitivity tests generate different errors: EXP1a is characterized by a high RMS (∼10 % for O3 and ∼7 % for CO) and a low bias (0.19 % for O3 and −1.02 % for CO); EXP2a by a high bias (∼8 % for O3 and CO), a high correlation (>0.9) and a low RMS (∼5 %); EXP3a by a low correlation (<0.7), a high RMS (∼13 % for
Table 4. Correlation, bias (%) and RMS (%) calculated for the O3 and CO LmT columns between the nature run and the control run (a), between the nature run and the GEO-TIR2 assimilation run (b), and between the nature run and the GEO-TIR assimilation run (c), for the 4 experiments, averaged over 2 months (July and August 2009).
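The correlation, bias and RMS reported in Table 4 can be computed as in the sketch below. The normalisation is an assumption: the text gives bias and RMS in % without stating the reference, so here both are expressed relative to the mean of the nature run. The function name and input series are hypothetical.

```python
from math import sqrt
from statistics import mean

def scores(run, nature):
    """Pearson correlation, bias (%) and RMS (%) of `run` with respect to `nature`.

    Assumption: bias and RMS are normalised by the mean of the nature run."""
    m_r, m_n = mean(run), mean(nature)
    cov = sum((r - m_r) * (n - m_n) for r, n in zip(run, nature))
    var_r = sum((r - m_r) ** 2 for r in run)
    var_n = sum((n - m_n) ** 2 for n in nature)
    corr = cov / sqrt(var_r * var_n)
    bias = 100.0 * (m_r - m_n) / m_n
    rms = 100.0 * sqrt(mean((r - n) ** 2 for r, n in zip(run, nature))) / m_n
    return corr, bias, rms

nature = [10.0, 20.0, 30.0, 40.0]  # hypothetical nature run LmT columns
run = [11.0, 22.0, 33.0, 44.0]     # hypothetical control run, 10 % too high
print(scores(run, nature))
```

A run that is uniformly 10 % too high is perfectly correlated with the nature run yet has a 10 % bias, which is why correlation, bias and RMS are reported together: they separate the error types produced by the different sensitivity tests.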

Statistical analysis
The impact of the observations (GEO-TIR and GEO-TIR2) is evaluated by comparing the results of the control run and the assimilation runs with the "truth" represented by the nature run. To provide a degree of robustness to our OSSEs, we perform significance tests to check, at the 0.95 and 0.99 confidence limits, whether the differences between the control run and the nature run differ significantly from the differences between the assimilation runs and the nature run, as was done in Lahoz et al. (2005). The null hypothesis is that the means of the two sets of differences are the same. The datasets are large enough to assume a normal distribution. We used the two-sample hypothesis z-test defined as: where NR is the nature run dataset, CR is the control run dataset, AR is the assimilation run dataset, σ is the root-mean square (RMS) and N is the number of grid points. Vertical lines indicate absolute values. Furthermore, to quantify the added value of GEO-TIR and GEO-TIR2, we compute indicators commonly used in AQ model evaluation: absolute difference, RMS difference and temporal correlation. For the protection of public health, the WHO (World Health Organization, 2005; Krzyzanowski and Cohen, 2008) has established a threshold of 100 µg m⁻³ for the daily maximum of the 8-h running average of O3 concentrations.
We use this threshold to calculate three scores from the contingency table: the percentage of good detections (GD), the percentage of correct analyses above the threshold (GD+) and the percentage of false alarms (FA), calculated as follows:
$$GD = \frac{\mathrm{NR1AR1} + \mathrm{NR0AR0}}{N} \times 100$$
$$GD^{+} = \frac{\mathrm{NR1AR1}}{\mathrm{NR1}} \times 100$$
$$FA = \frac{\mathrm{NR0AR1}}{\mathrm{AR1}} \times 100$$
where NR1AR1 represents the number of grid points where the nature run is greater than 100 µg m⁻³ and the assimilation run (or control run) is greater than 100 µg m⁻³; NR0AR0 represents the number of grid points where the nature run is less than 100 µg m⁻³ and the assimilation run (or control run) is less than 100 µg m⁻³; N is the number of all grid points; NR1 represents the number of grid points where the nature run is greater than 100 µg m⁻³; NR0AR1 represents the number of grid points where the nature run is less than 100 µg m⁻³ and the assimilation run (or control run) is greater than 100 µg m⁻³; and AR1 represents the number of grid points where the assimilation run (or control run) is greater than 100 µg m⁻³.
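The statistical tools of this section can be sketched together: a two-sample z-test on the absolute differences |NR − CR| and |NR − AR|, the daily maximum of the 8-h running mean, and the GD, GD+ and FA scores. The z-test uses the standard two-sample form, an assumption consistent with the symbols defined above rather than a transcription of the paper's equation; σ is taken as the population standard deviation of each set of absolute differences (the text calls it the RMS), and N is the sample size. Further assumptions: only the 17 8-h windows lying entirely within the day are used, values equal to 100 µg m⁻³ count as below the threshold, and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

def z_statistic(nr, cr, ar):
    """Two-sample z statistic comparing mean |NR - CR| with mean |NR - AR|."""
    d_cr = [abs(n - c) for n, c in zip(nr, cr)]
    d_ar = [abs(n - a) for n, a in zip(nr, ar)]
    N = len(nr)
    return (mean(d_cr) - mean(d_ar)) / sqrt(pstdev(d_cr) ** 2 / N + pstdev(d_ar) ** 2 / N)

def is_significant(z, confidence):
    """Two-sided test: reject equal means if |z| exceeds the normal quantile."""
    return abs(z) > NormalDist().inv_cdf(0.5 + confidence / 2.0)

def daily_max_8h_mean(hourly):
    """Daily maximum of the 8-h running mean over windows fully inside the day."""
    return max(sum(hourly[i:i + 8]) / 8.0 for i in range(len(hourly) - 7))

def contingency_scores(nature, run, threshold=100.0):
    """GD, GD+ and FA (%) from paired nature-run and run values (µg m-3)."""
    pairs = list(zip(nature, run))
    nr1_ar1 = sum(1 for n, a in pairs if n > threshold and a > threshold)
    nr0_ar0 = sum(1 for n, a in pairs if n <= threshold and a <= threshold)
    nr0_ar1 = sum(1 for n, a in pairs if n <= threshold and a > threshold)
    nr1 = sum(1 for n, _ in pairs if n > threshold)
    ar1 = sum(1 for _, a in pairs if a > threshold)
    gd = 100.0 * (nr1_ar1 + nr0_ar0) / len(pairs)
    gd_plus = 100.0 * nr1_ar1 / nr1 if nr1 else float("nan")
    fa = 100.0 * nr0_ar1 / ar1 if ar1 else float("nan")
    return gd, gd_plus, fa

# Hypothetical z-test inputs.
nr = [0.0, 0.0, 0.0, 0.0]
cr = [2.0, 3.0, 2.0, 3.0]
ar = [1.0, 0.0, 1.0, 0.0]
z = z_statistic(nr, cr, ar)
print(round(z, 3), is_significant(z, 0.95), is_significant(z, 0.99))  # 5.657 True True

# Hypothetical daily maxima of the 8-h mean O3 at five grid points.
print(contingency_scores([120, 90, 110, 80, 105], [115, 95, 90, 102, 108]))
```

In the contingency example, two exceedances are detected and one is missed (GD+ ≈ 66.7 %), and one of the three analysed exceedances is a false alarm (FA ≈ 33.3 %).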

Evaluation of the nature run
We compare the nature run provided by the MOCAGE model with O3 and CO ground-based station observations over France from 1 July 2009 to 31 August 2009, to verify that the nature run is representative of the "true atmosphere". Figure 3 shows the time series of CO (panels a and b) and O3 (panels e and f) simulated by MOCAGE (nature run) and observed by ground stations over France in July and August 2009. CO from the nature run is generally higher than CO from the ground stations. Some maxima are well represented (e.g. 28 and 29 July 2009), some are overestimated (e.g. 10 August 2009) and others are underestimated (e.g. 19 August 2009). Most importantly, however, the CO concentrations simulated in the nature run are in the same range of values (generally between 50 and 500 µg m⁻³) as those observed by the ground stations, and show similar temporal variability. O3 concentrations simulated in the nature run are also generally overestimated compared to ground measurements. However, the diurnal cycle of O3 production and destruction is well represented in the nature run. The O3 minima in the nature run are generally overestimated, except over particular periods where the nature run and the observations agree well (e.g. from 28 July to 1 August, from 5 to 6 August, or from 16 to 20 August). Table 5 shows the correlation, the bias and the RMS between the nature run and the ground stations over France on an hourly basis for O3 and CO. The correlation coefficients are 0.76 and 0.63 for O3 and CO, respectively. For both O3 and CO a positive bias is observed (12 µg m⁻³ (∼18 %) and 19.9 µg m⁻³ (∼17 %), respectively). The RMS is larger for CO (59.9 µg m⁻³, ∼52 %) than for O3 (18.2 µg m⁻³, ∼26 %), likely because CO concentrations are highly variable and can be locally very high at the surface (>1000 µg m⁻³).
Although the simulations are performed at a horizontal resolution of 0.5° × 0.5°, the scores obtained when comparing surface ozone observations with the nature run over France are comparable to those of current state-of-the-art AQ forecasting systems. For example, Pagowski et al. (2006) computed the bias, RMS and correlation of hourly concentration forecasts over the Eastern USA and Southern Canada for July and August 2004, comparing seven AQ models with hourly surface ozone measurements at 350 sites. The bias ranged between 10.6 and 62.2 µg m −3 , the RMS between 33.0 and 74.9 µg m −3 and the correlation between 0.55 and 0.72. In another study using the French AQ forecasting system Prev'air, Honoré et al. (2008) found a bias for hourly ozone forecasts of 12.3 µg m −3 , an RMS of 28.2 µg m −3 and a correlation of 0.67. The scores found for the nature run are thus in the same range as those of Pagowski et al. (2006) and Honoré et al. (2008), which indicates that the nature run can be assumed to be representative of the "true atmosphere" over the European domain.

Fig. 4. 1st and 3rd columns: z-test showing where the absolute difference between GEO-TIR and the NR and the absolute difference between the CR and the NR are different at the 0.95 confidence level (orange and red) and at the 0.99 confidence level (red), for the ozone and CO LmT columns, respectively. 2nd and 4th columns: same as the 1st and 3rd columns but for the absolute differences between GEO-TIR and the NR and between GEO-TIR2 and the NR. The 1st row is for EXP1 (change in the meteorology), the 2nd row for EXP2 (change in the emissions), the 3rd row for EXP3 (change in the initial condition) and the 4th row for EXP4 (change in the meteorology, the emissions and the initial condition). See text for further details.
Figure 4 presents the areas of Europe where differences between the experiments are significant at the 0.95 and 0.99 confidence limits for the O 3 and CO LmT columns, using the two-sample z-test (Sect. 2.6). This test assesses whether the control run and the GEO-TIR assimilation run, and the GEO-TIR2 and GEO-TIR assimilation runs, are significantly different (at the 95 and 99 % confidence limits). Figure 4 shows that EXP2, EXP3 and EXP4 have large areas that are significant at the 0.99 confidence limit (red areas). Areas that are significant at neither the 0.99 nor the 0.95 confidence limit are generally over the sea, which matters less for AQ purposes since we are interested in densely populated areas. EXP1, however, shows smaller significant areas at the 0.95 confidence limit than the other experiments. All the statistics presented hereafter are computed over a period of 2 months (July and August 2009). Significant differences almost everywhere indicate that the setups are very different. Our objective is a statistically robust evaluation of the added value of GEO-TIR synthetic observations for air quality hindcasts. However, it is difficult to substantiate the reasons for the spatial distribution of the OSSE increments averaged over two months: over such a period, several conflicting effects combine to explain variations in the strength of the constraint brought by the GEO-TIR and GEO-TIR2 synthetic observations. These can only be understood by studying cases on a day-by-day basis, which is outside the scope of this paper.

For the 2nd and 3rd rows: (i) the 2nd column shows the difference between the nature run and the assimilation of GEO-TIR (EXP1c), and between the nature run and EXP1a; (ii) the 3rd column shows the difference between the nature run and EXP1c, and between the nature run and the assimilation of GEO-TIR2 (EXP1b). Red colours indicate that the assimilation of GEO-TIR improves the correlation (1st column) and reduces the absolute difference (2nd row) or the RMS (3rd row), whereas blue colours indicate a deterioration by using GEO-TIR.
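A per-grid-point version of the two-sample z-test used for Fig. 4 might look like the sketch below. It assumes the textbook form of the test, z = (mean_a − mean_b) / sqrt(s_a²/n_a + s_b²/n_b), applied to time series of absolute differences against the nature run, with two-sided critical values for the 0.95 and 0.99 confidence limits; the exact formulation of Sect. 2.6 is not reproduced here, and the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def z_test_significance(err_a, err_b, axis=0):
    """Two-sample z-test on the means of two error samples.

    err_a, err_b: arrays with time along `axis`, e.g. |GEO-TIR - NR| and
    |CR - NR| at every grid point (hypothetical inputs).
    Returns (z, level): level is 2 where |z| exceeds the two-sided 0.99
    critical value, 1 where it exceeds the 0.95 value only, else 0.
    """
    a = np.asarray(err_a, dtype=float)
    b = np.asarray(err_b, dtype=float)
    n_a, n_b = a.shape[axis], b.shape[axis]
    # Standard error of the difference of the two sample means.
    se = np.sqrt(a.var(axis=axis, ddof=1) / n_a + b.var(axis=axis, ddof=1) / n_b)
    z = (a.mean(axis=axis) - b.mean(axis=axis)) / se
    z95 = NormalDist().inv_cdf(0.975)  # about 1.96
    z99 = NormalDist().inv_cdf(0.995)  # about 2.58
    level = np.where(np.abs(z) > z99, 2, np.where(np.abs(z) > z95, 1, 0))
    return z, level
```

This sketch treats every hourly sample as independent; for autocorrelated hourly series the effective sample size is smaller, so the significance it reports should be regarded as optimistic.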

Sensitivity study on meteorology: experiment 1
We performed a sensitivity study using, for the control run (EXP1a) and the assimilation runs (EXP1b and EXP1c), a different meteorology from the one used for the nature run, to determine the ability of GEO-TIR to reduce the differences caused by the meteorology used in our analyses. Figure 5 shows the correlation, the bias and the RMS for the O 3 LmT column between the nature run and the control run, and the improvement added by the assimilation of GEO-TIR compared with the control run and with the GEO-TIR2 assimilation run.
The correlation between the nature run and the control run for O 3 ranges between 0.5 and 0.9. The added value of GEO-TIR (red colours) is found mainly over Spain, North Africa and the Atlantic Ocean, where the results are significant at the 0.95 confidence limit. The assimilation of GEO-TIR increases the correlation from ∼0.7 in the control run to ∼0.8 in the GEO-TIR assimilation run, mainly over the Atlantic Ocean and Spain. Similar results are found for the added value of the GEO-TIR assimilation run relative to the GEO-TIR2 assimilation run: GEO-TIR is closer to the nature run. The bias between the nature run and the control run for O 3 is low (between −8 % and 8 %), mainly negative over the Mediterranean Basin and positive over Northern Europe. The GEO-TIR assimilation run reduces the bias over the Mediterranean Basin and over the Nordic countries, regions where the differences with the control run and with the GEO-TIR2 assimilation run are significant at the 0.99 confidence limit. The RMS between the nature run and the control run is between 4 and 25 % for O 3 . Overall, the GEO-TIR assimilation run reduces the RMS to 5 % over both sea and land areas. Figure 6 shows the same diagnostics for the CO LmT column. The correlation between the nature run and the control run for CO also ranges between 0.5 and 0.9. The positive impact of the GEO-TIR assimilation relative to the control run is larger than for O 3 , with a significant improvement of the correlation (e.g. from 0.7 between the nature run and the control run to 0.85 between the nature run and the GEO-TIR assimilation run over Spain and France, or from 0.85 to 0.95 over Turkey). The assimilation of GEO-TIR2 also improves the correlation with the nature run compared with the control run (e.g. over the Atlantic Ocean or Turkey), but the impact of GEO-TIR is larger. The bias between the control run and the nature run for CO is low and mainly negative (∼ −3 %), except over the Po Valley, where it is high and positive (15 %). This large difference between the control run and the nature run over the Po Valley can be explained by differences in the winds, since the meteorology in the nature run is significantly different from that in the control and assimilation runs. In the control run, pollutants are trapped in the Po Valley, which is surrounded by the Alps, whereas in the nature run they are transported by the winds. For this particular event, the GEO-TIR assimilation run considerably reduces the bias compared with the control run and the GEO-TIR2 assimilation run, and does so to a lesser extent over France and Eastern Europe. The RMS between the control run and the nature run for CO is ∼7 % but can reach 25 % over the Po Valley. The GEO-TIR assimilation run reduces the RMS of the control run and of the GEO-TIR2 assimilation run overall (∼2 %), especially over the Po Valley, where the RMS improvement is ∼11 % compared with the control run and ∼7 % compared with the GEO-TIR2 assimilation run. Note that the results over the Po Valley for CO are significant at the 0.99 confidence limit.
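The improvement maps in Figs. 5 and 6 compare two runs against the nature run at every grid point. A minimal sketch of that comparison follows, assuming arrays shaped (time, lat, lon), statistics computed over time at each grid point, and percentages relative to the time-mean nature run. The function names are hypothetical, and the sign convention (positive means the first run is closer to the nature run, as for the red colours) is our choice.

```python
import numpy as np


def _scores(run, nature):
    """Per-grid-point correlation, relative bias (%) and relative RMS (%).

    run, nature: arrays shaped (time, ...); statistics are taken over time.
    Percentages are relative to the time-mean nature run (assumption).
    """
    run = np.asarray(run, dtype=float)
    nature = np.asarray(nature, dtype=float)
    run_anom = run - run.mean(axis=0)
    nat_anom = nature - nature.mean(axis=0)
    corr = (run_anom * nat_anom).sum(axis=0) / np.sqrt(
        (run_anom ** 2).sum(axis=0) * (nat_anom ** 2).sum(axis=0)
    )
    ref = nature.mean(axis=0)
    diff = run - nature
    bias_pct = 100.0 * diff.mean(axis=0) / ref
    rms_pct = 100.0 * np.sqrt((diff ** 2).mean(axis=0)) / ref
    return corr, bias_pct, rms_pct


def improvement_maps(test_run, reference_run, nature):
    """Return (d_corr, d_abs_bias, d_rms); positive = test_run closer to nature."""
    c_t, b_t, r_t = _scores(test_run, nature)
    c_r, b_r, r_r = _scores(reference_run, nature)
    return c_t - c_r, np.abs(b_r) - np.abs(b_t), r_r - r_t
```

Calling `improvement_maps(geo_tir_run, control_run, nature_run)` would correspond to the "GEO-TIR vs control run" panels, and swapping in the GEO-TIR2 run as the reference would give the "GEO-TIR vs GEO-TIR2" panels (variable names hypothetical).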
In this experiment, we have analysed the ability of both instruments to correct errors in the meteorology. The resulting control run generally shows low biases for both CO and O 3 , but the meteorological errors degrade the correlation and the RMS. In this experiment, the GEO-TIR assimilation run considerably reduces the RMS and locally improves the bias and the correlation.

Sensitivity study on emissions: experiment 2
In this experiment (EXP2), we use in the control run (EXP2a) and the assimilation runs a different emission inventory, with a coarser spatio-temporal resolution than the one used in the nature run. Figure 7 shows the correlation, the bias and the RMS for the O 3 LmT column between the nature run and the control run, and the improvement added by the assimilation of GEO-TIR (EXP2c) compared with the control run and with the GEO-TIR2 assimilation run (EXP2b).
The correlation between the nature run and the control run is very high for O 3 (>0.95), especially over the sea, where both inventories use the same emissions. The impact of the GEO-TIR assimilation on the correlation coefficient is relatively small compared with the control run and the GEO-TIR2 assimilation run, and is located over the Eastern Mediterranean Basin, where the correlation between the nature run and the control run is lower (∼0.7). However, the bias between the nature run and the control run is positive and high (up to 20 %) because emissions of NO x and CO are higher in the inventory used in the control and assimilation runs (Fig. 2). The impact of the GEO-TIR assimilation is very large: it can halve the bias over the Mediterranean Basin relative to both the control run and the GEO-TIR2 assimilation run. The RMS between the nature run and the control run for O 3 is very low over the sea (less than 4 %), but over land it can reach 15 % (e.g. Spain, South-West France, Northern Africa). The GEO-TIR assimilation run reduces the RMS by ∼1 % compared with the control run and the GEO-TIR2 assimilation run over Southern Europe (except over the Atlantic Ocean), and over specific areas (e.g. Spain) the improvement can reach 5 %. Note that the differences are significant at the 0.99 confidence limit almost everywhere for O 3 in this experiment (except over a small region of the Atlantic Ocean, see Fig. 4). Figure 8 shows the same diagnostics as Fig. 7 but for CO. As for O 3 , the correlation coefficient between the control run and the nature run is very high, so GEO-TIR has very little impact compared with the control run and the GEO-TIR2 assimilation run. This impact can locally be slightly negative (e.g. over the Atlantic Ocean), possibly because of the observation errors, which are discussed in detail in Claeyman et al. (2011) for an instrument similar to GEO-TIR.
As for O 3 , the bias between the control run and the nature run is very high and reaches 20 % locally, because the inventory used in the control and assimilation runs emits more CO. Over large cities (e.g. Paris, Turin, Amsterdam, Saint Petersburg, consistent with the emission map in Fig. 1), the results for CO in the LmT reflect differences between the global and the GEMS-TNO emission inventories. The GEO-TIR assimilation run reduces the overall bias to 15 % and 10 % over the Mediterranean Basin compared with the control run and the GEO-TIR2 assimilation run, respectively, but brings little improvement over these large cities, where CO concentrations in the control run and the GEO-TIR2 assimilation run are low. The RMS between the nature run and the control run is ∼7 % over land and very low over the Atlantic Ocean, but can locally reach 20 % (e.g. Southern Italy, Greece). GEO-TIR also improves the RMS compared with the control run and, especially over land and over the Mediterranean Basin, compared with the GEO-TIR2 assimilation run. GEO-TIR degrades the RMS over the Atlantic Ocean (where the differences are not significant at the 0.95 confidence limit) and also over South-East Europe relative to the GEO-TIR2 assimilation run, where the RMS between the control run and the nature run is low. This can also be explained by the GEO-TIR observation errors.
In this experiment, we analyse the ability of the two observing systems to correct errors in the emissions. This experiment shows that GEO-TIR considerably reduces the overall bias of the control run in the LmT for both O 3 and CO, and can also bring significant skill compared with GEO-TIR2.

Sensitivity study on the initial condition: experiment 3
In this experiment (EXP3), we change the initial condition every week (see Sect. 2.5) in the control run (EXP3a) and in the assimilation runs, to quantify the ability of GEO-TIR (EXP3c) and GEO-TIR2 (EXP3b) to correct for these differences. Figure 9 shows that the correlation for the O 3 LmT column between the nature run and the control run ranges between 0.3 (e.g. over the Atlantic Ocean or Turkey) and 0.9 (e.g. over Italy). The correlation coefficient for O 3 is lower than in the previous experiments (EXP1 and EXP2) because the artificial weekly modification of the initial condition considerably lowers the correlation. The GEO-TIR assimilation run improves the correlation compared with the control run and the GEO-TIR2 assimilation run, over both land and sea. Over Turkey, for example, the correlation with the nature run rises from 0.3 (control run) and 0.5 (GEO-TIR2 assimilation run) to 0.8 (GEO-TIR assimilation run). The bias between the control run and the nature run for O 3 is low in Southern Europe and mainly positive over the Atlantic Ocean and Russia. The added value of GEO-TIR compared with the control run and the GEO-TIR2 assimilation run is low overall but positive (∼1 %), and is larger over Russia, where the differences are significant at the 0.99 confidence limit (reaching 6 % and 4 % compared with the control run and the GEO-TIR2 assimilation run, respectively). The RMS between the nature run and the control run is higher in Northern Europe (∼20 %) than in Southern Europe (∼7 %). The assimilation of GEO-TIR reduces the RMS by ∼2 %, and by up to ∼5 % where the RMS between the nature run and the control run is high (e.g. the Northern Atlantic Ocean).
The correlation between the nature run and the control run for the CO LmT column ranges between 0.3 (e.g. over the Aegean Sea) and 0.9 (e.g. over France and Germany). Over the Mediterranean Basin, where the differences are significant at the 0.99 confidence limit, the assimilation of GEO-TIR considerably improves the correlation compared with the control run (from ∼0.7 for the control run to ∼0.9 for the GEO-TIR assimilation run, both with respect to the nature run). The GEO-TIR assimilation run also improves the correlation compared with the GEO-TIR2 assimilation run, especially over the Aegean Sea, Spain and North Africa. The bias and the RMS between the nature run and the control run for CO are low: ∼2 % for the bias and between 4 and 12 % for the RMS. The impact of the GEO-TIR assimilation on the bias is therefore positive but very small compared with the control run and the GEO-TIR2 assimilation run, while its impact on the RMS is locally large (7 % and 6 % compared with the control run and the GEO-TIR2 assimilation run, respectively, over Turkey and over Spain) and positive but small elsewhere.
The modification of the initial condition mainly affects the correlation for both CO and O 3 . This experiment shows that the assimilation of GEO-TIR can considerably improve the correlation coefficient over land and sea for the CO and O 3 LmT columns.

Sensitivity study on the emissions, meteorology and initial condition: experiment 4
We perform a final sensitivity test by simultaneously changing the emissions, the meteorology and the initial condition (Fig. 11). The control run (EXP4a) for the O 3 LmT column is characterized by a low correlation (between 0 and 0.7), a high bias (∼15 % on average) and a high RMS (∼17 % on average) relative to the nature run. By construction, we expect this experiment to differ the most from the nature run. The impact of the assimilation of GEO-TIR (EXP4c) is large compared with the control run and the GEO-TIR2 assimilation run (EXP4b). The added value of GEO-TIR for the correlation coefficient is positive over Europe, with significant increases in correlation (e.g. over Turkey, Germany and the Atlantic Ocean). The GEO-TIR assimilation run reduces the bias by 3 % and 2 % on average, and locally by ∼5 % and ∼6 %, compared with the control run and the GEO-TIR2 assimilation run, respectively. The RMS is considerably reduced all over Europe, by up to 12 % and 10 % compared with the control run and the GEO-TIR2 assimilation run, respectively. The differences between the nature run and the control run for the CO LmT column (Fig. 12) are similar to those for O 3 : a low correlation coefficient (between 0 and 0.8), a high bias (∼11 % on average) and a high RMS (∼11 % on average). As for O 3 , this CO experiment differs the most from the nature run, as expected.
The impact of the assimilation of GEO-TIR is positive over the whole of Europe, where the differences are significant at the 0.99 confidence limit: it increases the correlation (over Turkey, from 0.4 for the control run and 0.6 for the GEO-TIR2 assimilation run up to 0.8 for the GEO-TIR assimilation run, all with respect to the nature run), reduces the bias (by up to 20 % and 15 % over the Po Valley compared with the control run and the GEO-TIR2 assimilation run, respectively), and reduces the RMS (by up to 14 % and 9 % over Turkey compared with the control run and the GEO-TIR2 assimilation run, respectively).
We have presented a statistical analysis over 2 months to characterize the added value of the two instrument configurations. The results of the 4 experiments show that the assimilation of GEO-TIR significantly improves the O 3 and CO LmT columns compared with the control run and with the assimilation of GEO-TIR2. The assimilation of GEO-TIR effectively constrains the O 3 and CO fields perturbed by different sources of error in air quality prognoses: meteorology, emissions and initial state (Table 4).
The added value of GEO-TIR is high over both land and sea. Over land, nadir infrared measurements are well known to be sensitive to the LmT when thermal contrast and surface temperature are high (namely over land during the day) (e.g. Deeter et al., 2007; Eremenko et al., 2008; Clerbaux et al., 2009). Over sea, the results suggest that, through direct assimilation and/or the transport of successive increments by the model (e.g. vertical and horizontal transport, Foret et al., 2009), the added value of GEO-TIR also extends to the sea.
The largest effects are mainly located over the Mediterranean Basin, where the cloud fraction is smaller and surface temperatures and thermal contrasts are high over the surrounding countries and coastal areas. In contrast, the added value of GEO-TIR is rather limited over the north-western part of the domain (Atlantic Ocean). Because of the prevailing westerly winds in this area, air masses are largely influenced by incoming fluxes from outside the field of view of our simulated geostationary platforms, which mitigates the effects of assimilation. Moreover, the spatial distribution of the efficiency of GEO-TIR simulated observations in bringing the assimilation run statistically closer to the nature run is governed to a large extent by the spatial distribution of the differences between the nature run and the different control runs: GEO-TIR can better constrain the fields where the nature and control runs differ most, whereas where they agree, little effect from the assimilation is expected, as seen in practice.
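The argument that assimilation has the largest effect where the nature and control runs differ most, and can slightly degrade the analysis where the control run is already accurate but the observations are noisy, can be illustrated with the scalar best linear unbiased estimate (optimal interpolation). This is a textbook illustration only, not the assimilation scheme used in this study; the function name and all numbers are hypothetical.

```python
def blue_update(x_b, y, sigma_b, sigma_o):
    """Scalar BLUE / optimal-interpolation analysis (illustration only).

    x_b: background value (plays the role of the control run),
    y: observation, sigma_b / sigma_o: assumed background and observation
    error standard deviations. Returns (analysis, gain).
    """
    gain = sigma_b ** 2 / (sigma_b ** 2 + sigma_o ** 2)
    x_a = x_b + gain * (y - x_b)  # analysis increment is proportional to the innovation
    return x_a, gain


if __name__ == "__main__":
    truth = 100.0  # plays the role of the nature run (hypothetical value)
    # Case 1: background far from the truth, observation error of +5.
    xa1, _ = blue_update(x_b=80.0, y=105.0, sigma_b=20.0, sigma_o=10.0)
    # Case 2: background almost perfect, same assumed error statistics,
    # observation error of +10.
    xa2, _ = blue_update(x_b=99.0, y=110.0, sigma_b=20.0, sigma_o=10.0)
    for name, xb, xa in (("case 1", 80.0, xa1), ("case 2", 99.0, xa2)):
        print(f"{name}: background error {abs(xb - truth):.1f}, "
              f"analysis error {abs(xa - truth):.1f}")
```

With a gain of 0.8 in both cases, the first analysis moves from 20 units off the truth onto it, while in the second case the nearly perfect background (1 unit off) is pulled to about 107.8, i.e. 7.8 units off, by the noisy observation.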

Vertical distribution of the impact of geostationary infrared measurements in the lowermost troposphere
In Sect. 3.2, we quantified the added value of the assimilation of GEO-TIR for four sensitivity studies on the CO and O 3 LmT columns over Europe. In this section, we concentrate on the vertically resolved added value of GEO-TIR in the lower troposphere (0-5 km) compared with the control run and the GEO-TIR2 assimilation run. Figure 13 shows the correlation, the absolute relative difference and the RMS between the control run and the nature run, between the GEO-TIR2 assimilation run and the nature run, and between the GEO-TIR assimilation run and the nature run, for the four sensitivity studies (EXP1, EXP2, EXP3 and EXP4), averaged over Europe for 2 months (July and August 2009), as a function of altitude (surface up to 5 km) for O 3 . The assimilation of GEO-TIR considerably improves the O 3 correlation for EXP3 and EXP4, slightly improves it for EXP1, and has no significant effect for EXP2. Vertically, the improvement of the correlation by GEO-TIR is very low at the surface, slight at 1 km and high from 2 to 5 km, whereas the impact of GEO-TIR2 on O 3 is very low at all levels between the surface and 5 km. Similar conclusions hold for the absolute relative difference and the RMS: the impact of GEO-TIR depends strongly on the experiment and the altitude, and reduces the absolute relative difference and the RMS mainly above ∼1 km, whereas the impact of GEO-TIR2 on O 3 is very low. Although the results depend strongly on the experiment, the assimilation of GEO-TIR considerably improves the O 3 analyses relative to the nature run above 1 km. Note that Honoré et al. (2008) showed that, compared with O 3 surface observations over Western Europe, the mean model absolute relative difference of daily ozone maxima was mostly under 5 µg m −3 (∼7 %), the RMS was generally less than 20 µg m −3 (∼30 %) and the temporal correlation exceeded 0.8 on average, which indicates that the correlation and the absolute relative difference between the nature run and the control runs are realistic. The RMS in the control run is lower than in that study, possibly because Honoré et al. (2008) averaged over land in Western Europe, whereas in this study the average is taken over Europe including the sea, where surface O 3 concentrations show less variability. Figure 14 shows the same diagnostics as Fig. 13 but for CO. The assimilation of GEO-TIR considerably improves the CO correlation for EXP1, EXP3 and EXP4 but has little impact on EXP2, which already has a high correlation coefficient. The positive impact of GEO-TIR is mainly above 1 km, except for EXP4, which has a lower correlation (∼0.7) and for which the assimilation of GEO-TIR also improves the correlation at the surface. The assimilation of GEO-TIR2 also improves the CO correlation (but not at the surface), although the GEO-TIR assimilation run remains closer to the nature run. The assimilation of GEO-TIR and GEO-TIR2 also reduces the absolute relative difference and the RMS, especially for EXP2 and EXP4, which show high biases, but the GEO-TIR assimilation run is closer to the nature run than the GEO-TIR2 assimilation run, particularly at the surface.
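The vertical profiles of Figs. 13 and 14 reduce each run to one score per altitude level. A minimal sketch follows, assuming arrays shaped (time, level, lat, lon), pooling all times and horizontal points at each level, and normalising the absolute relative difference and the RMS by the mean nature-run value (our reading of "percentages are with respect to the nature run"). The optional mask illustrates the land-only averaging discussed above; the function name is hypothetical.

```python
import numpy as np


def vertical_profiles(run, nature, mask=None):
    """Scores per model level, pooling time and horizontal grid points.

    run, nature: arrays shaped (time, level, lat, lon).
    mask: optional boolean (lat, lon) array, e.g. True over land only.
    Returns three arrays of length n_level: correlation, absolute relative
    difference (%) and RMS difference (%), both relative to the mean
    nature-run value at that level (assumed normalisation).
    """
    run = np.asarray(run, dtype=float)
    nature = np.asarray(nature, dtype=float)
    n_lev = run.shape[1]
    corr, ard, rms = (np.empty(n_lev) for _ in range(3))
    for k in range(n_lev):
        r, n = run[:, k], nature[:, k]        # shape (time, lat, lon)
        if mask is not None:
            r, n = r[:, mask], n[:, mask]     # keep selected columns only
        r, n = r.ravel(), n.ravel()
        ref = n.mean()
        corr[k] = np.corrcoef(r, n)[0, 1]
        ard[k] = 100.0 * np.mean(np.abs(r - n)) / ref
        rms[k] = 100.0 * np.sqrt(np.mean((r - n) ** 2)) / ref
    return corr, ard, rms
```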

Ozone evaluation at the surface
Since, for AQ purposes, we are mainly interested in surface concentrations of pollutants, we focus here on the added value of both geostationary instruments for surface ozone concentrations.
We compute the percentage of good detections (GD), the percentage of correct detections above threshold (GD+) and the percentage of false alarms (FA) (see Sect. 2.6) for the control run, the GEO-TIR2 assimilation run and the GEO-TIR assimilation run, for the four experiments, at the surface over land for the European domain (Table 6). The observations are simulated from the nature run. As an indicator of skill, we select the threshold of 100 µg m −3 for the daily maximum of the 8-h running average, established by the WHO (World Health Organization, 2005) for the protection of public health. We do not compute the same scores for CO, since the threshold for the protection of public health for the daily maximum of the 8-h running average is 10 000 µg m −3 , which is seldom observed outdoors. Furthermore, CO is of interest for AQ as a proxy for pollutant sources and transport processes rather than because of its direct impact on human health.

Fig. 13. Correlation (left panel), absolute relative difference in % (middle panel) and RMS difference in % (right panel) between the nature run (NR) and the control run (black); between the nature run and the assimilation run of GEO-TIR2 (red) and between the nature run and the assimilation run of GEO-TIR (green). Percentages are with respect to the nature run. The 1st row is for EXP1 (change in the meteorology), the 2nd row is for EXP2 (change in the emissions), the 3rd row is for EXP3 (change in the initial condition) and the 4th row is for EXP4 (change in the meteorology, in the emissions and in the initial condition).
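The daily maximum of the 8-h running average used for the 100 µg m −3 threshold can be derived from hourly model output. The sketch below uses a simplified convention that we assume for illustration: hourly data starting at 00 h, only complete 8-h windows, and each window assigned to the day in which it ends. Regulatory definitions differ in details such as data-capture rules, and the function name is hypothetical.

```python
import numpy as np


def daily_max_8h(hourly):
    """Daily maximum of the 8-h running average from an hourly series.

    hourly: 1-D array of hourly values starting at 00 h of the first day,
    with a length that is a multiple of 24. Each 8-h window is assigned to
    the day in which it ends, and only complete windows are used, so the
    first day only has windows ending at 07 h to 23 h (assumed convention).
    """
    x = np.asarray(hourly, dtype=float)
    if x.size % 24:
        raise ValueError("expected whole days of hourly data")
    kernel = np.ones(8) / 8.0
    means = np.convolve(x, kernel, mode="valid")  # means[i] averages hours i..i+7
    end_hour = np.arange(7, x.size)               # hour at which each window ends
    day = end_hour // 24
    n_days = x.size // 24
    return np.array([means[day == d].max() for d in range(n_days)])
```

Applied at every land grid point to both the nature run and an assimilation (or control) run, the resulting daily maxima are the values that would be compared with the 100 µg m −3 threshold when building the contingency scores.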
We have already shown that, in general, the added value of GEO-TIR and GEO-TIR2 for O 3 at the surface is low. However, for particular cases (high O 3 concentrations above the threshold), the results presented in Table 6 indicate that the assimilation of geostationary instruments can help detect high-concentration events. In all cases except EXP1 for GEO-TIR2, the good detection and false alarm scores are improved for both GEO-TIR and GEO-TIR2. For threshold-exceedance detections, the results are more mixed: the GEO-TIR assimilation is better than the control run for 2 experiments (EXP1 and EXP3), and GEO-TIR2 for 1 (EXP1).

Table 6. Scores for the O 3 8-h running average daily maximum (percentage of good detection (GD), percentage of correct forecast above threshold (GD+), percentage of false alarm (FA)) obtained over Europe during July and August 2009 by comparing the control run to the nature run (2nd column), the assimilation run with GEO-TIR2 to the nature run (3rd column) and the assimilation run with GEO-TIR to the nature run (4th column). Bold scores indicate that the assimilation run is better than the control run by more than 0.1 %; * indicates that one of the assimilation runs (GEO-TIR or GEO-TIR2) is better than the other one by more than 0.
Comparing the GEO-TIR and GEO-TIR2 assimilation runs, GEO-TIR2 is better than GEO-TIR in 1 out of 12 cases (EXP1) (Table 6). One explanation could be the larger positive bias of GEO-TIR2 compared with GEO-TIR at the surface in EXP1, which increases the chance of detecting threshold exceedances but also inflates the false alarms. Overall, GEO-TIR gives better scores than both the GEO-TIR2 assimilation run and the control run in 9 out of 12 cases.

Summary and conclusions
In this paper, we perform an OSSE for geostationary infrared instruments to determine their relative added value for O 3 and CO concentrations in the lowermost troposphere (LmT; defined to be the atmosphere between the surface and 3 km) in an AQ model over Europe. The originality of this study is the use of an AQ model in an OSSE to assess the impact of various key parameters (emissions, meteorology, initial condition and the 3 parameters together) on analyses derived using two infrared instruments. The first one (GEO-TIR) has an instrument configuration (SNR and SSI) dedicated to monitoring O 3 and CO in the LmT, equivalent to the MAGEAQ infrared instrument; the second one (GEO-TIR2) has an instrument configuration (SNR and SSI) mainly dedicated to measuring temperature and humidity, similar to the MTG-IRS instrument (Clerbaux et al., 2008b; Stuhlmann et al., 2005). For both instruments we use a pixel size of 0.5° × 0.5° and a revisit time of one hour.
We first concentrate on the ability of GEO-TIR and GEO-TIR2 to constrain the simulated distributions of the O 3 and CO LmT columns over Europe, using statistical diagnostics averaged over 2 months (July and August 2009). The GEO-TIR assimilation runs are closer to the nature run than the GEO-TIR2 assimilation runs for almost all experiments. The positive impact of GEO-TIR depends strongly on the experiment, and similar behaviour is observed for the O 3 and CO LmT columns. For experiments involving changes in emissions, GEO-TIR significantly reduces the systematic bias produced by excessive emissions. For experiments involving changes in the initial conditions or the meteorology, GEO-TIR is also able to considerably increase the correlation coefficient with respect to the nature run and to reduce the RMS relative to the control run. The added value of GEO-TIR is found over both land and sea, but mainly near the Mediterranean Basin. The experiments also show that when the bias and the RMS are very low or the correlation is very high, the GEO-TIR assimilation run has little impact and can even slightly degrade the analyses at particular locations, if the control run error is very small and the observation error is large. We show that the added value of the two instruments is experiment dependent and is mainly governed by the spatial distribution of the differences between the nature run and the different control runs. Although nadir infrared instruments are well known to be sensitive to the LmT under high thermal contrast (mainly over land during daytime), the assimilation and the successive transport of increments by the model over 2 months also bring added value from GEO-TIR and GEO-TIR2 over the sea in the LmT.
We quantify the vertically resolved impact of both GEO-TIR and GEO-TIR2 from the surface to 5 km over Europe during 2 months (July and August 2009). For O 3 , the impact of GEO-TIR is significant (the GEO-TIR assimilation run is closer to the nature run) from 1 to 5 km, whereas at the surface it is low. In general, the impact of the assimilation of GEO-TIR2 on O 3 is very low (the GEO-TIR2 assimilation runs are very close to the control runs for all experiments). For CO, the GEO-TIR assimilation runs are generally closer to the nature run, although the assimilation of GEO-TIR2 also has a positive impact above 1 km. At the surface, however, the assimilation of GEO-TIR provides significantly more improvement than the assimilation of GEO-TIR2.
We also analyse the impact of the assimilation of GEO-TIR on O 3 AQ scores at the surface. The assimilation of GEO-TIR reduces the percentage of false alarms and increases the percentage of good detections for all experiments, although the improvement can be slight.
Finally, the results of these OSSEs suggest that the assimilation of GEO-TIR into an AQ model can considerably improve the information on O 3 and CO fields in the LmT. However, the OSSE used in this study is based only on the assimilation of profiles; it could be improved by assimilating radiances and by including a much larger observing system comprising ground-based stations, sondes, balloons, aircraft, low Earth orbit satellites and other observations. Such a wider study is not attainable with current supercomputing capabilities but would give a more accurate assessment of the added value of GEO-TIR. Another perspective for the GEO-TIR instrument would be to add channels in the visible (Chappuis bands), as for the MAGEAQ instrument, and to perform an OSSE for O 3 combining this new instrument with ground-based measurements. Further OSSEs would be very useful to characterize how this combination of satellite and ground-based data could improve AQ monitoring and forecasting.