Validation of the Absorbing Aerosol Height Product from GOME-2 using CALIOP Aerosol Layer Information

Within the framework of aviation safety, knowledge of the location and height of volcanic ash layers is of extreme importance. Several ground-based instruments (such as lidars) can provide detailed information on the height and vertical extent of these ash layers, but only with limited spatial coverage. The biggest advantage of satellite instruments is their ability to provide near-daily global coverage, which makes them well suited for locating and tracking aerosol layers around the globe. Since the Global Ozone Monitoring Experiment 2 (GOME-2) instrument is carried on the MetOp series of operational satellites, it is designed to cover a long time period from 2007 until 2022 (and beyond) and global coverage is achieved within one day. The GOME-2 Absorbing Aerosol Height (AAH) is a new product for aerosol detection, developed by the Royal Netherlands Meteorological Institute (KNMI), which uses the Absorbing Aerosol Index (AAI) to detect the presence of absorbing aerosol and derives the actual height of the absorbing aerosol layer in the O2-A band using the Fast Retrieval Scheme for Clouds from the Oxygen A band (FRESCO) algorithm. The first results of a quantitative validation of the AAH product, focusing on case studies of volcanic eruptions, are presented here. For a total of 15 different volcanic eruptions, GOME-2 AAH data are compared to the minimum and maximum aerosol layer height provided by the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) for pixels within 100 km distance from each other. For GOME-2A and -2B, about 50 to 60% of the AAH pixels are within the EUMETSAT threshold requirements (for layers located lower than 10 km, the maximum absolute difference should be within 3 km; for layers located higher than 10 km, the maximum absolute difference should be within 4 km), while for GOME-2C this is about 70%. The optimal requirement threshold (for layers located lower than 10 km, the maximum absolute difference should be within 1 km; for layers located higher than 10 km, the maximum absolute difference should be within 2 km) is reached for GOME-2A, GOME-2B and GOME-2C in 17%, 28% and 41.5% of the cases. If only tropospheric aerosol species are studied, the results improve. This can also be seen when looking at the mean error of GOME-2. GOME-2A, GOME-2B and GOME-2C are able to represent the minimum CALIOP layer height with a mean error of -2.5 ± 5 km, -1.2 ± 5.9 km and -2 ± 5.8 km respectively. If the stratospheric aerosol layers are removed from the data, the errors obtained are -0.2 ± 3.6 km, -0.1 ± 5.4 km and -0.8 ± 3.8 km for GOME-2A, GOME-2B and GOME-2C respectively (for the minimum CALIOP layer height). The results from two
https://doi.org/10.5194/amt-2020-425 Preprint. Discussion started: 21 December 2020 © Author(s) 2020. CC BY 4.0 License.

or no radiation (SCI = SCattering Index). The advantage of this product is that it is not sensitive to surface type and that it can be defined in the presence of clouds, which is where most aerosol retrieval algorithms have problems. The aerosol types most clearly seen with the AAI are desert dust, biomass burning and volcanic ash. The AAI is most sensitive to the Aerosol Optical Depth (AOD) and the aerosol layer height. Generally, thick and/or high altitude aerosol layers produce larger AAI values than thin and/or lower altitude aerosol layers (Balis et al., 2016). The AAI however does not provide information on the altitude of aerosol layers.
To answer the need for global coverage of the altitude of volcanic ash layers, KNMI developed a new GOME-2 product, the Absorbing Aerosol Height (AAH). This product builds on the AAI product described above and derives the actual height of absorbing aerosol layers in the O2-A band using the Fast Retrieval Scheme for Clouds from the Oxygen A band (FRESCO; Wang et al., 2008; Wang et al., 2012) algorithm. First results of GOME-2A aerosol layer height validation were presented by Balis et al. (2016). They compared the GOME-2A layer height information with layer heights retrieved by ground-based lidar measurements and showed that GOME-2A underestimates the aerosol layer height observed by the lidars. However, their study was based on only one volcanic eruption case (the Eyjafjallajökull eruption in 2010) and they concluded that more dedicated validation campaigns were needed.
In this work, the new Absorbing Aerosol Height product from GOME-2 is validated against CALIOP data, focusing on case studies during multiple volcanic eruptions. Only GOME-2 AAH pixels with an AAI value higher than 4 are included in the validation study to ensure that there are enough absorbing aerosols in terms of absorption intensity (Tilstra et al., 2019b). The GOME-2 AAH values are compared to the CALIOP aerosol layer height for pixels located within a maximum distance of 100 km from each other. Sect. 2 describes the GOME-2 and CALIOP instruments and the method applied to validate GOME-2 AAH using CALIOP vertical layer information. The results from this validation exercise are presented and discussed in Sect. 3 and 4.

GOME-2 instrument information
The GOME-2 instrument onboard the MetOp satellite platforms is a nadir-looking and scanning UV-VIS spectrometer that measures backscattered solar light (Munro et al., 2016). The instrument measures in a spectral range from 240 to 790 nm with a spectral resolution of 0.26-0.51 nm. The MetOp satellites are flying in sun-synchronous orbits with equator crossing times of approximately 09:30 local time (descending node) and a repeat cycle of 29 days. The default swath width of the GOME-2 scan is 1920 km, which gives a nadir pixel size of 80 x 40 km and enables global coverage in about 1.5 days. The current primary GOME-2B (and also GOME-2C) is operated in this mode, whereas the older GOME-2A instrument is operated in a reduced swath with a swath width of 960 km and a nadir ground pixel size of 40 x 40 km. GOME-2C has been in orbit since the 7th of November 2018. A more detailed description of the instrument can be found in Munro et al. (2016).

GOME-2 Absorbing Aerosol Height
The GOME-2 Absorbing Aerosol Height (AAH) is a new product developed by KNMI within EUMETSAT's Atmospheric Composition Satellite Application Facility (AC SAF). This product builds on a previously developed product, the AAI, and derives the actual height of the absorbing aerosol layer in the O2-A band using the FRESCO algorithm. The AAH is very sensitive to cloud contamination because aerosols and clouds can prove difficult to distinguish. Therefore, the AAH is computed for different FRESCO cloud fractions. FRESCO is able to determine the height of an absorbing aerosol layer in the absence of clouds, but under certain conditions also in the presence of clouds (Wang et al., 2012). More details can be found in the Product User Manual (Tilstra et al., 2019b) and Algorithm Theoretical Basis Document (Tilstra et al., 2019a).

In summary, the AAH algorithm retrieves the following parameters: (1) CF: effective aerosol/cloud fraction
Two different aerosol/cloud layer heights (CH and SH) are determined by the AAH algorithm. It is up to the algorithm to decide which of the two is the best candidate to represent the actual AAH. To determine whether CH or SH should be reported as the AAH, the algorithm distinguishes three situations (regimes) and the effective cloud fraction is used to check in which of these regimes the solution is likely to be found:
The above scheme is based on the results of the study presented in Wang et al. (2012). Regime A refers to the situation in which there is only a low degree of cloud cover or in which the AOD is sufficiently large to compensate for the presence of a cloud layer below the aerosol layer. In this case the results reported in Wang et al. (2012) clearly show that CH is close to the real height of the aerosol layer in almost all cases. Exceptions are cases with low aerosol amounts, but these scenes were filtered out beforehand by demanding that the AAI must be higher than the threshold value of 4.0 index points. Regime C is the situation of a thick cloud layer present in the scene. In this case, an aerosol layer is only retrieved successfully when the aerosol layer is sufficiently thick. According to the results presented in Wang et al. (2012), the best value for the AAH is that of the cloud height. In most cases, however, the AAH is severely underestimated. The reliability is therefore characterized as "low". Finally, regime B is an intermediate regime, and the best estimate is the highest value from cloud height and scene height. The AAH found this way is likely to underestimate the actual aerosol layer height in some cases, and the reliability attributed to this regime is "medium".
Due to the use of the O2-A band in the FRESCO algorithm, the retrieval is insensitive to the signal above 15 km, hence the AAH is limited to a maximum value of 15 km (Wang et al., 2010).
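The regime logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the operational AAH code: the function and class names are hypothetical, the two cloud-fraction boundaries `cf_low` and `cf_high` are left as required arguments because their values are not given in this text, the "high" reliability label for regime A is inferred from the description, and applying the 15 km limit as a clip is an assumption.

```python
from dataclasses import dataclass

AAH_MAX_KM = 15.0  # upper limit of the AAH stated in the text


@dataclass(frozen=True)
class AahResult:
    height_km: float
    regime: str
    reliability: str


def select_aah(cf, ch_km, sh_km, *, cf_low, cf_high):
    """Pick the reported AAH from cloud height (CH) and scene height (SH).

    cf_low and cf_high are hypothetical regime boundaries on the effective
    cloud fraction; the real values must come from the product documentation.
    """
    if not 0.0 <= cf_low < cf_high <= 1.0:
        raise ValueError("expected 0 <= cf_low < cf_high <= 1")
    if cf <= cf_low:
        # Regime A: little cloud cover, CH is taken as the layer height.
        regime, reliability, height = "A", "high", ch_km
    elif cf <= cf_high:
        # Regime B: intermediate, take the higher of CH and SH.
        regime, reliability, height = "B", "medium", max(ch_km, sh_km)
    else:
        # Regime C: thick cloud layer, CH is reported with low reliability.
        regime, reliability, height = "C", "low", ch_km
    # Assumption: the 15 km limit is applied by clipping the height.
    return AahResult(min(height, AAH_MAX_KM), regime, reliability)
```

For example, `select_aah(0.5, 6.0, 8.0, cf_low=0.25, cf_high=0.75)` falls in regime B and reports the larger of the two heights; the boundary values 0.25 and 0.75 are placeholders chosen only for this illustration.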
The accuracy requirements for the AAH product, as defined in the Product Requirements Document (Hovila et al., 2019), can be found in Table 1. The GOME-2 AAH product is available in near-real time (NRT) and offline processing, from the Level-1 data generated from the GOME-2 instruments onboard the MetOp-A, MetOp-B and MetOp-C satellite platforms.
A suite of algorithms has been developed to identify aerosol and cloud layers and to retrieve a variety of optical and microphysical properties. The Scene Classification Algorithm (SCA) consists of a set of algorithms that perform typing of the detected layers based on layer height and layer-integrated properties. If the layer is classified as aerosol, SCA uses a decision tree to classify the aerosol type. "Type" stands for a mixture of aerosol components that is characteristic of a region or an air mass. The mixture observed at a given location depends on local aerosol sources, wind trajectories and remote sources of aerosol, the state of internal and external mixing, chemical transformation processes that may have occurred during transport, and the state of hydration.
In this study, CALIOP version 4.20 Vertical Feature Mask (VFM) data are used. This version of data allows for the detection of stratospheric aerosol layers (which was not possible in previous versions; Kim et al., 2018). The stratospheric aerosol subtyping algorithm performs well at identifying volcanic ash and sulfate above the tropopause (Kim et al., 2018). Note that below the tropopause, ash and sulfate plumes are given tropospheric aerosol subtypes: volcanic ash is often classified as dust or polluted dust and volcanic sulfate is often classified as elevated smoke (CALIPSO Users Guide at https://www-calipso.larc.nasa.gov/resources/calipso_users_guide/qs/cal_lid_l2_all_v4-20.php). As a result, contiguous aerosol features crossing the tropopause will have aerosol subtypes which switch from tropospheric to stratospheric subtypes, depending on the relationship between the attenuated backscatter centroid altitude of the layer identified by the feature finder and the tropopause altitude. Weakly scattering stratospheric aerosol layers which are not classified as polar stratospheric aerosol are classified as "sulfate/other". Therefore, layers that are, in fact, ash and/or smoke could be misclassified as "sulfate/other" if they are weakly scattering (layer integrated attenuated backscatter less than 0.001 sr⁻¹).
The VFM product provides the latitude and longitude of the laser footprint (at the temporal midpoint of a 15 shot average for each 5 km layer of the feature classification flag data), the profile time, a day/night flag, a land/water flag and a feature classification flag. This feature classification flag is stored as a 16 bit integer and provides an assessment of
(a) the feature type: e.g. cloud vs aerosol vs stratospheric layer
(b) the feature subtype (see Table 3)
(c) layer ice-water phase
(d) amount of horizontal averaging required for layer detection
(e) type and subtype quality assurance flag
In this study, only data which are defined by the CALIOP retrieval data as tropospheric or stratospheric aerosol are used and cloud layers are excluded from our analysis.
Table 4.
GOME-2 AAH and CALIOP aerosol layer height were compared when the distance between the center pixel of GOME-2 and CALIOP was less than or equal to 100 km, to maximize the probability of both instruments observing the same aerosol layer. There was no threshold used to limit the time difference between both satellite overpasses. Only AAH data with AAI>4 are validated to ensure that there are enough absorbing aerosols in terms of absorption intensity (Tilstra et al., 2019b). Using this approach, one GOME-2 AAH pixel can be compared to different CALIOP overpasses and also to different CALIOP vertical layers at the same location.
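The collocation criteria described above (AAI above 4, at most 100 km between the GOME-2 pixel center and the CALIOP footprint, no time limit) can be illustrated with a short sketch. The dictionary keys, the function names and the use of a spherical Earth with a great-circle (haversine) distance are assumptions made for this example; the text does not specify how the distance was computed.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, spherical approximation
MAX_DISTANCE_KM = 100.0    # collocation distance used in this study
MIN_AAI = 4.0              # only pixels with AAI > 4 are validated


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def collocate(gome_pixels, caliop_layers):
    """Return (gome_pixel, caliop_layer, distance_km) for every valid pair.

    Both inputs are lists of dicts with hypothetical keys 'lat' and 'lon';
    GOME-2 pixels also carry 'aai'. One GOME-2 pixel may be paired with
    several CALIOP layers, as in the approach described in the text.
    """
    pairs = []
    for g in gome_pixels:
        if g["aai"] <= MIN_AAI:
            continue  # not enough absorbing aerosol signal
        for c in caliop_layers:
            d = haversine_km(g["lat"], g["lon"], c["lat"], c["lon"])
            if d <= MAX_DISTANCE_KM:
                pairs.append((g, c, d))
    return pairs
```

The nested loop is kept for clarity; for full orbits a spatial index would be a more practical choice.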

General results
All studied cases are analyzed together and it is determined how well GOME-2A, GOME-2B and GOME-2C perform for these specific cases by taking into account the accuracy requirements defined in Table 1. Results from this exercise are shown in Table 5 and Fig. 1. Figure 1 limits the minimum CALIOP layer height to 15 km as this represents the detection limit for the GOME-2 AAH algorithm. It must be mentioned that the datasets for the three GOME-2 instruments are not the same (i.e. different volcanic cases are considered for each instrument) so caution must be taken when comparing them. Also, the dataset from GOME-2C is noticeably smaller than those for the other two instruments, due to its shorter time in orbit. For GOME-2A, -2B and -2C, the number of compared pixel pairs is 5985, 6705 and 427 respectively. A list of used data is given in Table 4.
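The threshold and optimal requirements stated in the abstract can be expressed as a simple check on each pixel pair. The sketch below is illustrative only: the function names are hypothetical, the height bracket is selected from the CALIOP layer height (the text does not state which height decides the bracket), and a layer at exactly 10 km is assigned to the upper bracket.

```python
# Maximum absolute difference in km: (layers below 10 km, layers at or above 10 km)
REQUIREMENTS_KM = {
    "threshold": (3.0, 4.0),
    "optimal": (1.0, 2.0),
}


def meets_requirement(aah_km, caliop_km, level="threshold"):
    """True if |AAH - CALIOP height| is within the requirement for this level."""
    low_limit, high_limit = REQUIREMENTS_KM[level]
    # Assumption: the reference (CALIOP) height selects the bracket.
    limit = low_limit if caliop_km < 10.0 else high_limit
    return abs(aah_km - caliop_km) <= limit


def fraction_meeting(pairs, level="threshold"):
    """Fraction of (aah_km, caliop_km) pairs that meet the requirement."""
    if not pairs:
        raise ValueError("no pixel pairs given")
    hits = sum(meets_requirement(a, c, level) for a, c in pairs)
    return hits / len(pairs)
```

Applied to all collocated pairs of one instrument, `fraction_meeting` gives the kind of percentage reported in this section.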
Overall, just about 50-60% of the AAH pixels from GOME-2A and -2B reach the threshold requirements. For GOME-2C, the number is higher, around 70%. The optimal requirement threshold is reached for GOME-2A, GOME-2B and GOME-2C in 17%, 28% and 41.5% of the cases (when comparing the AAH with the minimum CALIOP layer height). If only the tropospheric aerosol species (as defined by CALIOP) are studied, the results improve. This can also be seen from Table 5 (values between brackets). The accuracy requirement analysis was also performed per aerosol type (as defined by CALIOP).
The results and discussion can be found in the supplement (Tables S1, S2 and S3).
In Fig. 1 there is a cloud of points for which the CALIOP layer height is between 12 and 15 km and the corresponding GOME-2 AAH is much lower (< 5 km), especially for GOME-2A and GOME-2B. For GOME-2A, most of these pixels (85%) were classified as volcanic ash, sulfate or elevated smoke layers and are classified by GOME-2 as pixels with high reliability. For GOME-2B however, only 28% of the pixels were classified as stratospheric aerosol species by CALIOP but 95% of the pixels have a medium or low reliability level.
For the selected case studies, all GOME-2 and CALIOP pixels within a 100 km distance range were compared (AAH versus minimum (minC) and maximum (maxC) CALIOP layer height). The results (height difference as a function of the distance) are shown in Fig. 2 for GOME-2A, GOME-2B and GOME-2C. Similarly, the height differences as a function of difference in overpass time for GOME-2A, GOME-2B and GOME-2C are presented in Fig. 3. (In both Fig. 2 and Fig. 3, results also include comparisons with CALIOP layers higher than 15 km.) From Fig. 2 and Fig. 3, it can be concluded that for all three GOME-2 instruments, there is a large spread in the difference between the AAH and the CALIOP layer heights and there is no clear dependence on the distance or time difference between overpasses. Care must be taken in comparing the three instruments as the plots are not based on the same data for each instrument. For example, for GOME-2C, only data from the Raikoke eruption have been used.
The overall performance of the three GOME-2 instruments is shown in Table 6 and is expressed by the mean and standard deviation of the difference between GOME-2 AAH and CALIOP layer height.
GOME-2A, GOME-2B and GOME-2C are able to represent the minimum CALIOP layer height with a mean error of -2.5 ± 5 km, -1.2 ± 5.9 km and -2.0 ± 5.8 km respectively. For the maximum CALIOP layer height, the mean errors are -3.3 ± 5.1 km, -2.1 ± 5.9 km and -2.6 ± 5.9 km respectively. The high standard deviation is due to the inclusion of stratospheric aerosol species. It is clear that for the 'stratospheric' aerosol types (volcanic ash, sulfate and elevated smoke), the AAH is in most cases not able to represent the height of these layers. Layers higher than 15 km cannot be detected by GOME-2 as the FRESCO algorithm is currently not sensitive at these altitude levels. The performance for the tropospheric aerosol subtypes is much better. If the stratospheric aerosol layers are removed from the data, the errors become -0.2 ± 3.6 km, -0.1 ± 5.4 km and -0.8 ± 3.8 km for GOME-2A, GOME-2B and GOME-2C respectively for the minimum CALIOP layer height and -1.0 ± 3.6 km, -1.0 ± 5.4 km and -1.4 ± 3.9 km for GOME-2A, GOME-2B and GOME-2C respectively for the maximum CALIOP layer height. Only the height of dust layers seems to be problematic. The height of the other species is approximated by GOME-2A to within ~5 km. For GOME-2B, the differences tend to be somewhat larger, but since the datasets are not identical, this could be due to the contents of the dataset. Table 7 shows the mean and standard deviation of the difference between GOME-2 AAH and CALIOP layer height for the different reliability levels (high, medium and low reliability level; as defined in Sect. 2.1.2). Most AAH pixels in this study are classified as having medium reliability (74%, 72% and 84% of the pixels for GOME-2A, -2B and -2C respectively).
For GOME-2A, the mean and standard deviation are lowest for the pixels with low reliability. A possible explanation can be found in the fact that for the low reliability GOME-2A dataset, the mean height of the CALIOP aerosol layers was clearly lower (around 3 km) than for the high and medium dataset (with a mean CALIOP layer height of 7 and 6 km respectively). For GOME-2B and -2C, the performance of the high reliability pixels is better than for the other reliability levels. For each AAH pixel, the error on the AAH is also given. Figure 4 shows the AAH plus and minus this error and the minimum and maximum CALIOP layer height as a function of the GOME-2 AAH for all three instruments. In the plots, the height of the CALIOP layers is limited to 15 km, which is the detection limit of GOME-2 as a result of the application of the FRESCO algorithm. On average, the errors are quite small: 0.4 km, 0.4 km and 0.3 km for GOME-2A, -2B and -2C respectively. Figure 5 shows the boxplots of the differences between the AAH from each GOME-2 instrument and the minimum CALIOP layer height for the different aerosol types (as defined by CALIOP). All boxplot results need to be analyzed with caution as they are based only on specific case studies. Especially in the case of GOME-2C, only a very limited amount of data was examined. However, even with only case studies, a clear difference can already be seen between the tropospheric and stratospheric aerosol species, with differences between GOME-2 AAH and CALIOP layer height clearly higher for volcanic ash, sulfate and elevated smoke. Within the tropospheric aerosol species, differences are also obvious. Dust and polluted dust have a larger spread compared to aerosol types that typically can be found very close to the surface (e.g. clean marine).
Table 7: Overview of the mean difference (mean) and its standard deviation (stdev) between GOME-2 AAH and CALIOP minimum (mindif) and maximum (maxdif) layer height for the different reliability levels. Behind each reliability level, the available number of GOME-2 data points is given.
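The statistics summarized in Table 7 can be reproduced from collocated pairs with a few lines of code. The sketch below is not the authors' processing code. It assumes that the difference is defined as AAH minus CALIOP height (so that negative values mean GOME-2 underestimates the layer height, consistent with the signs reported here), that the sample standard deviation is used, and that each record is a dict with the hypothetical keys shown. Note that it counts pixel pairs per reliability level, which may differ from the number of GOME-2 data points given in the table.

```python
from collections import defaultdict
from statistics import mean, stdev


def difference_stats(records):
    """Mean, standard deviation and count of AAH-minus-CALIOP differences.

    Each record is a dict with hypothetical keys 'reliability', 'aah_km',
    'caliop_min_km' and 'caliop_max_km'. Results are grouped by reliability
    level and reported for mindif and maxdif separately.
    """
    groups = defaultdict(lambda: {"mindif": [], "maxdif": []})
    for r in records:
        g = groups[r["reliability"]]
        g["mindif"].append(r["aah_km"] - r["caliop_min_km"])
        g["maxdif"].append(r["aah_km"] - r["caliop_max_km"])

    stats = {}
    for level, diffs in groups.items():
        stats[level] = {
            # The sample standard deviation needs at least two values.
            name: (mean(v), stdev(v) if len(v) > 1 else float("nan"), len(v))
            for name, v in diffs.items()
        }
    return stats
```

Filtering the records by CALIOP aerosol type before calling `difference_stats` would give the tropospheric-only statistics in the same way.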

Figure 4: GOME-2 AAH plus (in grey) and minus (in black) its error and the minimum (in blue) and maximum (in red) CALIOP layer height as a function of GOME-2 AAH for GOME-2A (upper left), GOME-2B (upper right) and GOME-2C (lower middle).
CALIOP pixels are only shown up to a height of 15 km, which is the detection limit of GOME-2.

Case studies
The most important findings for a few of the volcanic case studies listed in Table 4 will be presented here. The focus will be on the eruption of the Calbuco volcano in 2015 and the Sarychev Peak eruption in 2009.

Calbuco eruption
The Servicio Nacional de Geología y Minería reported that an eruption from Calbuco occurred on 23 April 2015 around 01h00, which lasted six hours and generated an ash plume that rose higher than 15 km and drifted towards the N, NE and E
Figure 7 shows the observed aerosol layer height from CALIOP and the AAH detected by GOME-2A as a function of latitude for the 23rd of April 2015. All CALIOP pixels from Fig. 7 were classified as volcanic ash (with the exception of one pixel classified as elevated smoke). The volcanic ash was detected by CALIOP between 13.2 and 18.6 km while the AAH detected by GOME-2A for pixels within 100 km of CALIOP pixels was between 10.5 and 14.5 km (Table 8).
The time difference between the CALIOP and GOME-2A overpass was around 4 hours and the closest GOME-2A pixel was located about 26 km from a CALIOP pixel. GOME-2A was not entirely able to capture the volcanic ash layer detected by CALIOP as it was located at an altitude higher than 15 km.
On the 24th of April 2015, CALIOP detected volcanic ash at heights between 13.1 and 17.5 km, dust between 0.2 and 5.6 km, polluted dust between 0.2 and 5.5 km, and sulfate and elevated smoke between 14.5 and 14.9 km (Table 8). Figure 8 shows the observed volcanic ash and dust layer height from CALIOP and the AAH detected by GOME-2A as a function of latitude. The time difference between both overpasses is around 8 hours and the closest GOME-2A pixel is located ~5 km from a CALIOP pixel. For this case it seems that GOME-2A does not see the volcanic species, but more likely the tropospheric dust and/or the polluted dust layer detected by CALIOP.
Figure 9 shows the location of the overpass pixels of GOME-2B and CALIOP located within 100 km from each other for the 23rd and 24th of April 2015. Figure 10 shows the observed aerosol layer height from CALIOP and the AAH detected by GOME-2B as a function of latitude for the 23rd of April 2015. All CALIOP pixels from Fig. 10 were classified as volcanic ash (with the exception of one pixel of elevated smoke). The AAH detected by GOME-2B was between 9.1 and 14.7 km (Table 9). The time difference between the CALIOP and GOME-2B overpass was between 4 and 5 hours and the closest GOME-2B pixel is located ~40 km away from a CALIOP pixel. Again, due to the inability of GOME-2 to observe layers higher than 15 km, the volcanic ash layer's height was underestimated by the AAH from GOME-2B.

GOME-2B
The observed volcanic ash and dust layer height from CALIOP and the AAH detected by GOME-2B as a function of latitude for the 24th of April 2015 are shown in Fig. 11. The time difference between both overpasses is around 8 hours and the closest GOME-2B pixel is located ~8 km away from a CALIOP pixel. The exact layer heights can be found in Table 9. On this day, the high reliability AAH pixels of GOME-2B follow the height of the volcanic ash layer, whereas the medium reliability AAH matches the tropospheric dust and/or the polluted dust layer heights from CALIOP (Fig. 11).

Sarychev Peak
On the 14th of June 2009, a large eruption of the Sarychev Peak produced an ash plume that rose to an altitude of 12 km a.s.l.
an altitude of 8 km a.s.l. Data from the 14th and 16th of June 2009 from GOME-2A were studied. Figure 12 shows the position of the GOME-2A and CALIOP overpasses near the Sarychev Peak volcano on the 14th and 16th of June 2009.

GOME-2A
On the 14th of June 2009, CALIOP detects clean marine, dust, polluted dust and dusty marine aerosol layers (Table 10). No volcanic species have been observed. Fig. 13 shows the observed aerosol layer height from CALIOP and the AAH detected by GOME-2A as a function of latitude for the 14th of June 2009. The GOME-2A AAH loosely follows the height of the CALIOP dust and dusty marine layers (Fig. 14). The time difference between GOME-2A and CALIOP overpasses is quite large (15 h) and the distance between the closest GOME-2A and CALIOP pixels is 11 km.

On the 16th of June 2009, CALIOP detects dust, polluted dust, smoke, dusty marine and volcanic ash layers (Table 10). Even though a volcanic ash layer was observed by CALIOP at an altitude lower than 15 km (which should be detectable by GOME-2A), the AAH from GOME-2A does not match the height of this layer (Fig. 15). It seems that the AAH from GOME-2A agrees more with the height of the CALIOP dust and polluted dust layers (Fig. 15 and Table 10). The time difference between GOME-2A and CALIOP overpasses is large (15 h) and the distance between the closest GOME-2A and CALIOP pixels is 23 km.
The above results show that for the Sarychev Peak eruption, the GOME-2A AAH more or less follows the height of the CALIOP (polluted) dust or dusty marine layers. Even though a volcanic ash layer is present below 15 km, within the detection limits of GOME-2, the instrument does not capture this layer. The time difference between the overpasses is large, so it should be taken into account that the instruments are not looking at the same air mass.

Discussion
A few issues and complications encountered during the validation of GOME-2 AAH need to be addressed. First of all, it was challenging to find collocations both in space and time between GOME-2 and CALIOP overpasses. CALIOP has a very narrow footprint (100 m) and a far from global daily coverage, whereas GOME-2 has a near global and daily coverage with ground pixels with a footprint of 80 x 40 km². Perfect collocations were therefore difficult to find, hence it was decided to set a threshold of 100 km for the maximum distance between the center of a GOME-2 pixel and the CALIOP coordinates. If we allowed for a larger distance threshold, the size of the dataset would increase; however, it would also become more difficult to ensure that both satellites are looking at the 'same' air mass.
Currently, no threshold is fixed for the time difference between overpasses as this would limit our dataset even more.
However, by accepting all time differences, it might be possible that GOME-2 and CALIOP are not looking at the same air mass. Apart from finding collocations, not every volcanic eruption has GOME-2 and/or CALIOP overpasses within its plume and without trajectory modelling it is difficult to determine whether overpasses should observe volcanic species in their path. It was in some cases decided to look for overpasses further away from the actual volcano site, but again, it was challenging to state with absolute certainty that volcanic species should be present at that location.
Another factor limiting the dataset was that only cases with AAI higher than 4 were taken into account to ensure that the amount of absorbing aerosols is high enough (as discussed in Tilstra et al., 2019b).
Difficulties also arose from the aerosol type classification used by CALIOP, which is partly based on the position of the layers in the atmosphere. CALIOP distinguishes between tropospheric (clean marine, dust, polluted continental, clean continental, polluted dust, smoke and dusty marine types) and stratospheric (PSC aerosol, volcanic ash, sulfate and elevated smoke) aerosol layers. It is known that due to this distinction based on altitude, volcanic aerosol types in the troposphere are sometimes misclassified as dust or polluted dust. As a consequence, not only the height of volcanic ash layers needs to be taken into account, but also the height of dust and polluted dust layers, without being completely sure whether the dust layers are actually misclassified or not.
The performance of the AAH algorithm in representing the general aerosol layer height is far from optimal, as shown by the results of the requirement analysis. The target threshold is only reached in 39%, 45% and 53% of the cases for GOME-2A, -2B and -2C respectively. The algorithm performs better in the troposphere, where the percentages increase to 52%, 47% and 57% and the mean errors improve (to -0.2 ± 3.6 km, -0.1 ± 5.4 km and -0.8 ± 3.8 km for GOME-2A, -2B and -2C respectively for the minimum CALIOP layer height). It is shown that the algorithm performs better in retrieving the aerosol layer height for specific aerosol types (mainly the marine and continental aerosols). However, for the species that are of interest in this study (i.e. volcanic ash in the stratosphere and (polluted) dust in the troposphere), GOME-2 AAH shows the largest deviation from CALIOP aerosol layer heights. This indicates that GOME-2 is more sensitive to the signal of certain aerosol species.
It was already shown in Balis et al. (2016) that GOME-2A seems to strongly underestimate the ground-based values. There it was stated that it is highly likely that the large GOME-2 pixel size smooths out any small scale variability in the plume height, which can be observed by the narrow measurement of CALIOP. Michailidis et al. (2020) validated the GOME-2 AAH by comparison with the aerosol layer height obtained from EARLINET stations and found a mean bias of -0.18 ± 1.68 km for 172 screened collocations. This bias is smaller than the one we obtained. However, the validation strategy is different. Michailidis et al. (2020) used ground-based measurements for their validation, whereas we compare with satellite retrievals. Also, Michailidis et al. (2020) included AAH values for AAI between 2 and 4 in their study, while this study is limited to AAI higher than 4. The allowed time and distance thresholds are also different for both studies (100 km and no time limit (this study) vs 150 km distance and a 5 hour time limit (Michailidis et al., 2020)). This study focuses on the height of volcanic ash layers, whereas Michailidis et al. (2020) focused more on the general agreement in layer height for all aerosol types. The two studies therefore address different aspects of the available dataset.
This study was based on a selection of case studies, representing volcanic events with a clear AAI signal (>4), which are important for aviation safety. The statistics presented in Sect. 3 are only valid for this dataset and cannot be extrapolated to the entire GOME-2 AAH dataset. At the moment, the product should be used carefully when assessing the height of volcanic layers and interpretations should only be made on a qualitative scale.

Within the framework of aviation safety, it is important to know the height of the volcanic ash layers. The GOME-2 AAH product was developed to provide a near global image of the height of absorbing aerosol layers. We presented the results of a validation exercise in which the GOME-2 AAH is validated using the aerosol layer height provided by CALIOP for a series of 15 confirmed volcanic eruptions. It is important to mention that only GOME-2 pixels for which the AAI was higher than 4 were taken into account. Also, a maximum distance of 100 km between the GOME-2 center pixel and the CALIOP overpass was allowed. No threshold was defined for the time difference between GOME-2 and CALIOP overpasses.
Overall, GOME-2A, GOME-2B and GOME-2C are able to represent the minimum CALIOP layer height with a mean error of -2.5 ± 5 km, -1.2 ± 5.9 km and -2 ± 5.8 km respectively. For the maximum CALIOP layer height, the mean errors are -3.3 ± 5.1 km, -2.1 ± 5.9 km and -2.6 ± 5.9 km respectively. The high standard deviation is due to the inclusion of stratospheric aerosol species. If these stratospheric aerosol types are removed from the dataset, the errors become -0.2 ± 3.6 km, -0.1 ± 5.4 km and -0.8 ± 3.8 km for GOME-2A, GOME-2B and GOME-2C respectively for the minimum CALIOP layer height and -1.0 ± 3.6 km, -1.0 ± 5.4 km and -1.4 ± 3.9 km for GOME-2A, GOME-2B and GOME-2C respectively for the maximum CALIOP layer height. In the GOME-2 AAH product, reliability flags are used to define the confidence level of the AAH. It would be expected that the high reliability AAH pixels have a better agreement with the CALIOP layer height; however, this was not always the case (e.g. for GOME-2A), which can be related to the difference in observed cases for the three sensors.
Some more conclusions could be drawn from looking at the volcanic case studies individually. It is obvious that the AAH product from GOME-2 does not work for elevated volcanic ash layers at altitudes higher than 15 km, due to the fact that the FRESCO algorithm using the O2-A band is currently not sensitive to the signal at these altitudes. The Calbuco volcanic ash layer observed by CALIOP on the 23rd of April 2015, at altitudes above 15 km, could therefore not be captured by GOME-2A. On the other hand, GOME-2B was able to see, to some extent, the lower part of the absorbing volcanic aerosol layer on the 24th of April 2015. In the Calbuco case study, the AAH from GOME-2A pixels agreed with the height of layers classified by CALIOP as dust or polluted dust on the 24th of April 2015. It remains difficult to distinguish whether these layers were volcanic aerosol layers misclassified by CALIOP, or whether they were indeed dust or polluted dust. The study using the Sarychev Peak eruption data showed that, even though volcanic aerosols were present at altitudes below 15 km, GOME-2A was not able to pick up their signal.
When different types of aerosol layers are present, the AAH often coincides with one of the CALIOP species, but in most cases not with the layer classified as volcanic ash (as shown in Table 6 and Tables S1-S3 of the supplement). GOME-2 is in some cases able to capture the dust layer well, but not always. The GOME-2A, -2B and -2C AAH pixels fall in the optimal threshold category in about 20%, 25% and 46% of all dust cases respectively. In conclusion, the GOME-2 AAH often underestimates the height of volcanic layers and as a result, the current product should be considered with care when using it for aviation safety purposes. Nevertheless, taking these uncertainties into account, the product can be considered as an important added value for near-real time monitoring of volcanic ash layers.