Automated detection of atmospheric NO2 plumes from satellite data: a tool to help infer anthropogenic combustion emissions

We use a convolutional neural network (CNN) to identify plumes of nitrogen dioxide (NO2), a tracer of combustion, from NO2 column data collected by the TROPOspheric Monitoring Instrument (TROPOMI). This approach allows us to exploit efficiently the growing volume of satellite data available to characterise Earth’s climate. For the purposes of demonstration, we focus on data collected between July 2018 and June 2020. We train the deep learning model using six thousand 28×28-pixel images of TROPOMI data (corresponding to ≃266×133 km) and find that the model can identify plumes with a success rate of more than 90%. Over our study period, we find over 310,000 individual NO2 plumes, of which ≃19% are found over mainland China. We have attempted to remove the influence of open biomass burning using correlative high-resolution thermal infrared data from the Visible Infrared Imaging Radiometer Suite (VIIRS). We relate the remaining NO2 plumes to large urban centres, oil and gas production, and major power plants. We find no correlation between NO2 plumes and the location of natural gas flaring. We also find persistent NO2 plumes from regions where inventories do not currently include emissions. Using an established anthropogenic CO2 emission inventory, we find that our NO2 plume distribution captures 92% of total CO2 emissions, with the remaining 8% mostly due to a large number of small sources (<0.2 gC/m²/day) to which our NO2 plume model is less sensitive. We argue the underlying CNN approach could form the basis of a Bayesian framework to estimate anthropogenic combustion emissions.


The Paris Agreement (PA) is the current intergovernmental vehicle that describes a progressive reduction in greenhouse gas (GHG) emissions to mitigate dangerous climate change, defined as an increase of more than two degrees Celsius in global mean temperature above pre-industrial values. Whether it will achieve its stated goals depends on the commitment of its signatories to establish and, more importantly, realise stringent plans that effectively reduce national GHG emissions. The PA includes two main activities: quinquennial Global Stocktakes (GST) and Nationally Determined Contributions (NDC) that describe pledged emission reductions during successive GSTs. Given the implications of non-compliance and the need to make large and rapid emission reductions, measurement, reporting and verification (MRV) systems are being developed that will help guide nations on the effectiveness of policies (Janssens-Maenhout et al., 2020). The main focus of these MRV systems is anthropogenic emissions of carbon dioxide (CO2) and methane. One of the challenges faced by all these MRV systems is separating the anthropogenic and natural components of CO2 and methane fluxes. Here, we use a deep learning model to automatically identify satellite-observed plumes of nitrogen dioxide (NO2), a proxy for combustion, to locate combustion hotspots, e.g. the oil and gas industry, cities, and power plants.
Burning of fossil fuels, representing emissions of 9-10 PgC/yr (Friedlingstein et al., 2020), has been shown unequivocally to impact Earth's climate via rising atmospheric levels of gases such as CO2 and methane that can absorb and radiate infrared radiation. The distribution of these emissions is heterogeneous across the globe, disproportionately focused on cities, oil and gas extraction facilities, energy generation facilities, and flows of physical trade that rely heavily on shipping and road transportation (Poore and Nemecek, 2018). Compiled inventories, which rely on self-reporting, provide estimates of these emissions but depend on assumptions about quantities such as fuel consumption, combustion efficiencies and emission rates that can sometimes lead to inaccurate values. Cities are responsible for almost three quarters of the fossil fuel contribution to atmospheric CO2 (Edenhofer et al., 2014), but questions remain about the veracity of reported emissions (e.g., Gurney et al., 2021) and the disproportionate role of a small number of super-emitters (e.g., Duren et al., 2019). We know where most power plants are geographically located, but new and large coal-fired power plants continue to be built and commissioned in countries such as China and India, potentially compromising their short-term climate ambitions within the PA. The rate of their construction often outpaces updates to inventory estimates. International shipping represents only a few percent of global CO2 emissions, but these emissions appear to be rising (International Marine Organization, 2020). The importance of accurate emission estimates becomes even greater at smaller geographical and temporal scales. Reported annual country-level emissions of CO2 tend to be reasonably accurate, but are typically not sufficiently detailed to support targeted policy development.
Given the importance of establishing accurate national and sub-national emission baselines from which to reduce emissions as part of the PA, it is essential we have a robust measurement-based approach to estimate emissions of CO2 and methane to complement inventory estimates.

A growing body of work has been using satellite observations to study point sources of CO2 (Bovensmann et al., 2010; Kort et al., 2012; Hakkarainen et al., 2016; Nassar et al., 2017; Broquet et al., 2018; Brunner et al., 2019; Kuhlmann et al., 2019; Zheng et al., 2019; Wang et al., 2019; Kuhlmann et al., 2020; Strandgren et al., 2020; Wang et al., 2020; Wu et al., 2020; Yang et al., 2020; Ye et al., 2020; Zheng et al., 2020) and methane (Varon et al., 2019; de Gouw et al., 2020; Varon et al., 2021), taking advantage of global measurement coverage, subject to clear skies. Even with the 0.3% precision of CO2 columns detected by the NASA Orbiting Carbon Observatory-2 instrument, dilution of point-source emissions across a 3 km² grid box means that the column directly overhead may be elevated while measurements immediately downwind are not, except under exceptional circumstances. Other studies have recognized this shortcoming and have taken advantage of trace gases that are co-emitted with CO2 and methane during the combustion process. For many industrial combustion processes, air provides the source of molecular oxygen necessary for the fuel to burn. While molecular nitrogen (N2) in the air does not take part in the combustion reaction, the temperatures involved can thermally dissociate N2 to facilitate the production of NO (and to a lesser extent NO2). In the absence of widespread use of scrubbers that remove nitrogen oxides from combustion exhaust, and with the subsequent influence of photochemistry that rapidly interconverts NO and NO2, NO2 is widely assumed to be a robust proxy for combustion CO2 (Reuter et al., 2019; Liu et al., 2020; Hakkarainen et al., 2021; Ialongo et al., 2021).

The main advantage of using NO2 as a tracer of combustion is its atmospheric e-folding lifetime, which ranges from hours to a day in the lower troposphere. Consequently, any major surface emissions will result in an observable plume close to the point of emission.
All of these studies are individual case studies or small collections of them, reflecting the difficulty of locating CO2 plumes and coincident measurements of NO2. This piecemeal approach is inconsistent with the vast volume of data being produced by the current generation of satellite instruments, in particular the TROPOspheric Monitoring Instrument (TROPOMI), and limits our ability to quantify the changing influence of CO2 hotspots on the global carbon cycle. Here, we address this issue by using a deep learning algorithm to automatically detect NO2 plumes. This work builds on earlier remote sensing image detection studies that use machine learning, e.g. Lary et al. (2016); Maxwell et al. (2018). As we show, the number of plumes found in any one year is O(10⁵), allowing us to study more systematically how NO2 can be used to study combustion emissions of carbon. Although the NO2 plume detection algorithm does not quantify anthropogenic emissions of CO2 or methane, it provides a method to refine the development of future MRV systems, which can directly feed into policy decisions.
In section 2 we discuss the TROPOMI NO2 and thermal anomaly data that we use to identify anthropogenic plumes of NO2. We also describe the deep learning method we use, including our approach to supervised learning, which underpins our ability to automatically detect NO2 plumes. In section 3 we report the performance of our NO2 plume detection method, and use the ensemble of plumes to assess how well it detects CO2 emissions described by an established inventory. We conclude the paper in section 4, including a discussion of next steps.

Data and Methods
We describe the TROPOMI retrieved data of NO2 columns that we use to study combustion, and the VIIRS biomass burning data we use to isolate the influence of fossil fuel combustion. We also describe the development of our deep learning model to detect NO2 plumes.
We use level 2 tropospheric column NO2 data retrieved from the TROPOspheric Monitoring Instrument (TROPOMI), launched in 2017. We use two years of NO2 column data from July 2018 to June 2020. These data are taken from the Sentinel-5P Pre-Operations Data Hub (https://s5phub.copernicus.eu/dhus/). For further information about these level 2 data products we refer the reader to studies dedicated to NO2 (Boersma et al., 2010; Van Geffen et al., 2015; Lorente et al., 2017; Zara et al., 2018).
TROPOMI is a UV-Vis-NIR-SWIR spectrometer aboard the Copernicus Sentinel-5 Precursor (S5-P) satellite, which is in a Sun-synchronous orbit with a local equatorial overpass time of 13:30. TROPOMI has a swath width of 2600 km divided into 450 across-track pixels, which during our study period have dimensions of 7 km×3.5 km (across×along track) for NO2. This sampling strategy results in near-daily global coverage (Veefkind et al., 2012), subject to cloud-free scenes. In this study, we only use pixels with a quality flag >0.75, as recommended by the TROPOMI Level 2 Product User Manuals.

VIIRS Thermal Anomaly Data
We use thermal anomaly data from the Visible Infrared Imaging Radiometer Suite (VIIRS) on board the Suomi National Polar-orbiting Partnership (NPP) satellite, launched in 2011, as a proxy to identify NO2 plumes from biomass burning. We use the 375 m Level-2 VNP14 product from https://firms.modaps.eosdis.nasa.gov/download/. VIIRS provides near twice-daily global coverage at a spatial resolution of 750 m. During the study period we found 16,056,612 vegetation fires detected by VIIRS, after discarding low-confidence data.
We attribute an NO2 plume to biomass burning if it is within 15 km of a biomass burning scene identified by VIIRS. We chose that distance criterion because it corresponds to approximately two TROPOMI pixels and should account for any offset error in determining the plume centre. We find that a 5-10 km adjustment to this criterion does not significantly affect our results. Development of a more sophisticated method, taking account of other trace gas measurements, is outside the scope of this study.
For the purposes of this study, we discard biomass burning scenes to focus on anthropogenic combustion sources, but we acknowledge that the converse of this approach is also scientifically valid.
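The 15 km distance criterion described above can be illustrated with a short sketch. It assumes a spherical Earth and a great-circle (haversine) distance, neither of which is specified in the text; the function names and the fire list format are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical Earth is an assumption
FIRE_RADIUS_KM = 15.0     # distance criterion quoted in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_biomass_burning(plume, fires, radius_km=FIRE_RADIUS_KM):
    """True if any fire detection lies within radius_km of the plume location.

    plume is a (lat, lon) tuple; fires is an iterable of (lat, lon) tuples
    for the same day (hypothetical input format).
    """
    return any(
        haversine_km(plume[0], plume[1], f_lat, f_lon) <= radius_km
        for f_lat, f_lon in fires
    )


# A fire 0.1 degrees of latitude away (about 11 km) flags the plume;
# one 0.2 degrees away (about 22 km) does not.
near = is_biomass_burning((10.0, 20.0), [(10.1, 20.0)])
far = is_biomass_burning((10.0, 20.0), [(10.2, 20.0)])
```

With roughly 16 million fire detections, a brute-force search like this would be slow in practice; a spatial index would be the natural refinement, but that is beyond what the text describes.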

Deep Learning Model to Identify NO2 Plumes
To automatically detect plumes of NO2 from TROPOMI data, we used a convolutional neural network (CNN) based on a deep learning model that contains four convolutional and two fully connected (FC) layers.
CNNs first use a series of convolutional layers, each with multiple filters which extract features (e.g., lines, orientation, clustering) from small sections of the input image. Each layer has an increasing number of filters and finds higher levels of features (progressively incomprehensible to humans). Maximum pooling layers are added between convolutional layers to reduce the spatial size of the convolved feature, reducing the computational power required. This is achieved by passing a 2×2 pixel kernel over the image and extracting the maximum value, helping extract dominant features. After the convolution, the data are passed to multiple FC layers that learn which features are important in categorising the image. The final FC layer is the output layer, which returns a categorisation of the input image along with a confidence in the result.
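As a minimal illustration of the 2×2 maximum pooling step described above, the following sketch pools a small feature map. It assumes non-overlapping windows (a stride of 2), which the text does not state, and the toy values are invented for the example.

```python
def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling of a 2-D list with even dimensions."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Toy 4x4 feature map (made-up values); pooling halves each dimension.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)  # [[4, 2], [2, 8]]
```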
Figure 1 shows a simplified schematic of the CNN architecture we use to create our plume identification model. The input image is first passed through two convolutional layers with 32 and 64 filters, respectively, followed by a maximum pooling layer. We then apply dropout, randomly dropping 50% of the units, which helps to prevent overfitting of the data (as recommended by Srivastava et al. (2014)). The remaining data are then passed through two more convolutional layers of 128 and 256 filters, respectively, and another maximum pooling layer. This is followed by a second 50% dropout and flattening of the array to one dimension to be fed into the FC layer that contains 512 nodes. The output of each layer is passed through a Rectified Linear Unit (ReLU) activation function before going into the next layer. The output of the last FC layer is passed into a softmax function to calculate the probabilities that the image does or does not contain a plume. The optimiser used here is an AdamOptimizer, which helps to reduce the cost calculated by cross entropy. The model has a total of 4,892,770 trainable parameters.
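To make the layer sequence concrete, the sketch below traces the feature-map size and counts trainable parameters for the architecture described above. The kernel size (3×3), unpadded stride-1 convolutions, single-channel input and two output classes are all assumptions, since the text does not give them. Under these assumptions the total is 2,486,530 rather than the 4,892,770 reported, so the authors' kernel sizes or padding presumably differ.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases for one convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out


def trace_model(size=28, channels=1, kernel=3,
                filters=(32, 64, 128, 256), fc_nodes=512, classes=2):
    """Return (flattened length, trainable parameter count).

    Assumes unpadded stride-1 convolutions and a 2x2 max pool after every
    second convolutional layer. Dropout and pooling add no parameters.
    """
    total = 0
    for i, n_filters in enumerate(filters):
        total += conv_params(kernel, channels, n_filters)
        size -= kernel - 1        # unpadded convolution shrinks each side
        channels = n_filters
        if i % 2 == 1:            # pool after the 2nd and 4th conv layers
            size //= 2
    flat = size * size * channels
    total += flat * fc_nodes + fc_nodes     # hidden FC layer
    total += fc_nodes * classes + classes   # softmax output layer
    return flat, total


flat_length, n_params = trace_model()
```

Most of the parameters sit in the first FC layer, so the total is very sensitive to the spatial size that survives the pooling stages.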

To train our CNN model we use example images of TROPOMI NO2 that can be classified as containing a plume or not. Each image is 28×28 pixels (approximately corresponding to 200 km×100 km) and was individually normalised to remove the influence of the magnitude of NO2 features, a step that also ensures the model inputs have a similar data distribution and therefore improves the model efficiency and accuracy. We acknowledge that normalising each individual image could potentially lead to false detection if the background noise resembles a plume; the alternative of normalising the images to a standard value decreases the model's ability to detect smaller emission sources and may lead to a larger number of false negatives. Figure 2 shows three example images from the level 2 TROPOMI NO2 data considered to contain plumes that were used in the training dataset. For an emission source to create a plume detectable by TROPOMI, the source must be subject to winds strong enough to disperse the emissions across multiple pixels within the lifetime of NO2. We anticipate that the number of occurrences where these conditions are not met will be relatively small compared to the entire dataset and therefore should not have an adverse effect on our results.
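The per-image normalisation can be sketched as follows. The text does not say which normalisation was used; min-max scaling to [0, 1] is assumed here purely for illustration, with invalid pixels represented as None (also an assumption).

```python
def normalise_image(image):
    """Scale one image to [0, 1] using its own minimum and maximum.

    Invalid pixels (None) are ignored when finding the range and are
    passed through unchanged. Min-max scaling is an assumed choice.
    """
    valid = [v for row in image for v in row if v is not None]
    lo, hi = min(valid), max(valid)
    span = hi - lo
    if span == 0:
        # A perfectly flat image carries no structure; map it to zeros.
        return [[None if v is None else 0.0 for v in row] for row in image]
    return [[None if v is None else (v - lo) / span for v in row]
            for row in image]


# Two toy images with the same shape but a tenfold difference in magnitude
# normalise to the same result, which is the intent described in the text.
weak = normalise_image([[2.0, 4.0], [6.0, None]])
strong = normalise_image([[20.0, 40.0], [60.0, None]])
```

This also shows the trade-off noted above: after scaling, a noisy low-signal scene and a strong plume occupy the same value range, which is how background noise could be mistaken for a plume.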
Determining whether an image contains a plume or not is a non-trivial task that is subject to human judgement and is consequently prone to error. Plumes are highly variable in both size and shape, can potentially be obscured by other features in the image, and, in some instances, multiple plumes can be found within a single image. In the first instance, we used a crowd-sourcing approach in which we posted images on https://plume-spotter.herokuapp.com and asked participants to determine whether an image contained a plume. A total of 41 participants classified 1565 unique images and created 13,750 classifications, a mean of 8.8 classifications per image. This is further described in Appendix A. However, we found that this approach did not yield sufficient agreement between participants.
Due to the lack of agreement in our crowd-sourced approach to plume identification, we created a dataset for this study based on the authors' judgement. Subjective judgement of the images could lead to small variations in repeated experiments and therefore a more rigorous approach may be needed for future applications. We selected a total of 6086 images (3043 of which contained a plume) from across the globe for all times of year to minimize regional and seasonal biases. We used an iterative process to select images to train the model. We started with an initial set of images, randomly selected images that contained at least one plume, and corrected the classification if necessary, ensuring that an equal number of true and false images were included in the training set. The images were then randomly split in an 80:20 ratio to train the CNN model and test the trained model. We find that the resulting CNN model achieves an accuracy of >90% when compared against the test data.
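A random 80:20 split of a labelled image set of this size might look like the sketch below. The fixed seed, the rounding rule and the placeholder labels are assumptions made for reproducibility of the example, not details from the study.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for repeatability
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder dataset: 6086 (image_id, has_plume) pairs, half with a plume,
# mirroring the class balance described in the text.
labelled = [(i, i % 2 == 0) for i in range(6086)]
train, test = split_train_test(labelled)
```

A purely random split does not guarantee that each subset keeps the exact 50:50 class balance; splitting the two classes separately would enforce it if that matters.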
Using the developed plume identification model, we processed two years (July 2018-June 2020) of TROPOMI tropospheric NO2 data, resulting in 18 million individual 28×28-pixel images. Prior to running the model, we discarded images that included >40% invalid pixels, i.e., data that did not meet the TROPOMI quality threshold described above; this quality control step reduced the number of processed images to approximately 7.2 million. We then passed these images to our CNN model, which returned a Boolean variable that describes whether a plume was identified and a confidence level associated with the identification. We discard images for which the confidence is <75%. We find that our results are moderately sensitive to this value, with an approximate 10% change in the number of plumes found when changing the confidence threshold by ±15%. This confidence threshold can be adjusted to increase the number of identified plumes, but at the expense of the confidence of the plumes being reported. For each image in which a plume was identified, we extract the geographical coordinates of the plume by identifying the image pixel with the maximum value. We acknowledge this method could lead to inaccuracies, as the maximum pixel value in the image will not necessarily correspond to the origin of the plume and may not identify all plumes (e.g., images that contain multiple plumes), but we consider this to be a minor source of error. The area of one TROPOMI NO2 pixel is approximately 24 km², so the plume origin could easily fall within this area. Each image has an associated timestamp from the satellite, allowing us to build a dataset of the location and time of plumes spotted by TROPOMI.
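The screening and location steps above can be summarised in a short sketch. The thresholds are those quoted in the text; the function names, the list-of-lists image format and the way the classifier output is passed in are hypothetical. In the pipeline described, the invalid-pixel screen is applied before the model is run, whereas here both checks are folded into one function for brevity.

```python
QA_THRESHOLD = 0.75          # pixels with quality flag > 0.75 are valid
MAX_INVALID_FRACTION = 0.40  # discard images with > 40% invalid pixels
MIN_CONFIDENCE = 0.75        # discard detections below 75% confidence


def invalid_fraction(qa):
    """Fraction of pixels that fail the quality threshold."""
    flat = [q for row in qa for q in row]
    return sum(1 for q in flat if q <= QA_THRESHOLD) / len(flat)


def plume_pixel(no2, qa):
    """(row, col) of the largest valid NO2 value, or None if none is valid."""
    best = None
    for r, row in enumerate(no2):
        for c, value in enumerate(row):
            if qa[r][c] > QA_THRESHOLD and (
                best is None or value > no2[best[0]][best[1]]
            ):
                best = (r, c)
    return best


def locate_plume(no2, qa, is_plume, confidence):
    """Apply both screens, then return the plume pixel (or None)."""
    if invalid_fraction(qa) > MAX_INVALID_FRACTION:
        return None
    if not is_plume or confidence < MIN_CONFIDENCE:
        return None
    return plume_pixel(no2, qa)


# Toy 2x2 image: the largest value (9) sits on an invalid pixel, so the
# reported location is the largest valid value (3) at row 1, column 0.
no2 = [[1, 9], [3, 2]]
qa = [[0.9, 0.5], [0.9, 0.9]]
location = locate_plume(no2, qa, is_plume=True, confidence=0.8)
```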

Results
First, we assess the performance of the CNN model to identify plumes on global and regional spatial scales. We then use the locations of these plumes to study their ability to identify anthropogenic combustion sources of CO2.

CNN Model Performance
Over our two-year study period, the CNN model identified 310,020 images that contained at least one plume. After extracting the geographical locations for each plume location, we identified 62,040 (20%) images that were within 15 km of an active fire as determined by VIIRS thermal anomaly data and categorise these NO2 plumes as being associated with biomass burning. We assign the remaining 247,980 NO2 plumes as originating from anthropogenic combustion. Figure 3 shows the location of NO2 plumes from fossil fuel and biomass burning over our study period. We find anthropogenic combustion is widespread across the globe (Figure 3a) with a focus over northern mid-latitudes, India and China, as expected.

Global Scale Plume Distributions
We also find coherent distributions of NO2 plumes over the ocean along established shipping routes. On the global scale, this is particularly noticeable in the Bay of Bengal between southern India and Southeast Asia, and between the Cape of Good Hope and north-west Africa and eastern Brazil. Shipping lanes are clearer on the regional scales we report below.
The cluster of plumes over northern Alaska (Figure 3a) is an excellent example of a geographic region where NO2 emissions are dwarfed by other point sources on a global scale and so will not typically appear as a hot spot using other detection methods. We believe these are genuine detections, which we link to petroleum extraction activities in the National Petroleum Reserve-Alaska in the Alaska North Slope region.
The distribution of biomass burning NO2 plumes (Figure 3b) identified using the CNN model and VIIRS data highlights geographical regions where we expect seasonal fire activity, with a high density of plumes over western, central and eastern Africa, Colombia, Venezuela, Brazil and Australia. We account for the seasonal variation of fire activity by using daily VIIRS data, and remove fire-influenced scenes from those identified by our CNN. We acknowledge that a number of plumes that we classify as anthropogenic combustion occur in locations where we expect biomass burning, e.g., central Australia, various regions across the tropics, Siberia, and North America. We also acknowledge that anthropogenic plumes could be incorrectly labelled as biomass burning, especially where these emission types are co-located. While this suggests that our use of VIIRS is imperfect, we find that our approach broadly achieves its goal.
Figure 4 shows anthropogenic combustion plumes we identify from July 2018 to June 2020 over Europe, the contiguous US and southern Canada, China, and the Middle East. We have broadly classified these hotspots as major urban areas, power stations, and flaring regions. We identify major urban areas (populations >200,000) based on data from https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-populated-places/, fossil fuel power stations taken from the Global Power Plant Database (KTH Royal Institute of Technology, 2018), and oil and gas flaring regions based on data from https://skytruth.org/flaring/. We acknowledge that the power station database will be incomplete due to data availability and reliability across the globe (Byers et al., 2019). The location of oil and gas flaring used here, determined by nighttime thermal anomaly data from VIIRS, is clustered spatially and temporally and therefore may not coincide with the TROPOMI local overpass time of 13:30.
The highest density of plumes is found over large cities, e.g., Paris, Madrid, Riyadh, Beijing, Los Angeles and New York, and over busy ports such as Rotterdam, Porto, Cairo and Hong Kong (Figure 4a, b, c). Ship tracks are clearly seen through the  (Figure 4b). The poor correspondence between flaring regions and NO2 plumes may be due to differences in the overpass times of the data used, as discussed above.

In general, the location of the NO2 plumes and their coincidence with cities, power plants, and established shipping routes gives us confidence in the CNN model we have developed. Discrepancies between known sources and the large areas of clustered NO2 plumes, especially over China and India, and power plants that do not have any associated plumes suggest that inventories being used to identify power plants are out of date. Further discrepancies may be due to detecting sources outwith the inventories used in this analysis (e.g. small settlements with large industrial emissions). Achieving this level of detail using conventional plume detection methods would be difficult.
Table 1 shows the top ten countries with the most fossil fuel plumes identified over the two-year study period. China contains the most plumes, representing 20% of all the plumes found during our study period. These plumes are mainly located around the highly urbanised and heavily industrial east of China (Figure 4c), encompassing Beijing, Hebei and Shenyang in the north east. India is a close second with 17%, where most plumes are over New Delhi, Mundra Port in the north-western Gujarat region, and large coal mining areas to the north-east of the country (Figure 3a). Russia is responsible for 12% of plumes, spread over multiple cities and fossil fuel extraction works across the west of the country. The Middle East, including Iran, Saudi Arabia, and Iraq, is collectively responsible for more than 26% of plumes. These plumes are mostly coincident with known regions of petroleum extraction and processing. Plumes over eastern Egypt appear to follow the Nile and stop abruptly before Sudan. Plumes over the US mainly coincide with major urban areas and flaring regions, with clusters found over some of the major oil and gas extraction sites, e.g., San Juan Basin, Permian Basin, Niobrara Formation, and Bakken Formation. There is also some evidence of oil and gas extraction over Mexico, e.g. Burro Picachos and Sabinas, and over Kazakhstan, e.g. the Aktobe oil fields.
We acknowledge that statistics reported here will reflect the number of cloud-free days over specific regions. The frequency of global plume detections changes from month to month but does not show any seasonal cycle (not shown), even though there is a seasonal cycle of plume detections at high latitudes due to low sun angles during winter. Over our study period, the monthly mean number of fossil fuel NO2 plumes is 11,100 and of biomass burning NO2 plumes is 2,787. The largest numbers of fossil fuel plumes and biomass burning plumes were found during March 2019 and August 2018, respectively. The persistence of plume detection locations (Figure 4) provides confidence that we are observing point sources. We find a total of 21,802 plumes detected over the oceans, mostly focused along ship tracks.
Table 2 shows the 20 cities across the globe with the most fossil fuel plumes identified over our two-year study period. As previously discussed, cities that are likely to have more high-quality (cloud-free) retrievals are more likely to have plumes
What Fraction of Anthropogenic CO2 Emissions are Identified Using NO2 Plumes?
The low frequency of corresponding TROPOMI NO2 measurements and satellite observations of CO2 precludes any meaningful statistical analysis of CO2:NO2 (not shown). We anticipate this will improve with the launch of new satellites, particularly with the Copernicus CO2 constellation (CO2M) due for launch in 2025 and the Japanese Global Observing SATellite for Greenhouse gases and Water cycle (GOSAT-GW) due for launch in 2023.
To help us understand the fraction of global anthropogenic CO2 emissions that are identified using our plume identification model, we sample the Open-source Data Inventory for Anthropogenic CO2 (ODIAC, 2020 release; Oda and Maksyutov, 2011; Oda et al., 2018; Oda and Maksyutov, 2021) where there is an NO2 plume. We use the monthly 1°×1° ODIAC gridded land CO2 emissions dataset for 2018 and 2019, and in the absence of 2020 data we use the 2019 ODIAC emissions for January-June 2020 to compare against the plume dataset. We do not anticipate that the COVID-19 related lockdowns of 2020 will significantly impact our results, as the reduction in CO2 emissions was smaller than expected (Tollefson, 2021). We sample the ODIAC dataset between 50°S and 50°N to remove the impact of fewer observations during winter months. For this comparison, we assume that all anthropogenic combustion sources of CO2 in the ODIAC dataset co-emit NO2 and therefore can be used as geographical validation for the plume detection dataset.
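Sampling a 1°×1° monthly grid at plume locations, with the latitude limit and the 2019-for-2020 substitution described above, could be sketched as follows. The grid layout (row 0 starting at 90°S, column 0 at 180°W), the dictionary of monthly grids and the plume tuple format are all assumptions; the real ODIAC files may be organised differently.

```python
import math


def grid_index(lat, lon, resolution=1.0):
    """Row and column of the cell containing (lat, lon).

    Assumed layout: row 0 starts at 90S, column 0 starts at 180W.
    """
    n_rows = int(180 / resolution)
    n_cols = int(360 / resolution)
    row = min(int(math.floor((lat + 90.0) / resolution)), n_rows - 1)
    col = int(math.floor((lon + 180.0) / resolution)) % n_cols
    return row, col


def sample_emissions(monthly_grids, plumes, lat_limit=50.0):
    """Emission value for each distinct (month, cell) that holds a plume.

    monthly_grids maps (year, month) to a 2-D list of emissions;
    plumes is an iterable of (lat, lon, year, month) tuples.
    """
    sampled = {}
    for lat, lon, year, month in plumes:
        if abs(lat) > lat_limit:
            continue                      # outside 50S-50N
        # No 2020 inventory: reuse the matching 2019 month, as in the text.
        key = (2019, month) if year == 2020 else (year, month)
        row, col = grid_index(lat, lon)
        sampled[(key, row, col)] = monthly_grids[key][row][col]
    return sampled


# Toy inventory with a single non-zero cell (made-up value).
grid = [[0.0] * 360 for _ in range(180)]
grid[90][180] = 2.5
plumes = [(0.5, 0.5, 2020, 3), (60.0, 0.5, 2019, 3)]
sampled = sample_emissions({(2019, 3): grid}, plumes)
```

Keying the result by month and cell means that several plumes in the same cell and month count that cell's emissions once.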
We sample the ODIAC dataset at the location and month of each NO2 plume identified using our model. Figure 5 shows the cumulative percentage of total emissions and the corresponding emission rate described as a function of the percentage of sources (from large to small) for all the ODIAC dataset and the ODIAC dataset sampled at the NO2 plume locations. We find that for ODIAC emissions, large sources (>1 gC/m²/day) account for approximately 25% of all sources but contribute approximately 90% to the total emissions. The ODIAC emissions sampled by the NO2 plumes accounted for 92% of all global CO2 emissions, described by 56% of all sources. The remaining 8% of emissions, described by 44% of all sources, typically have emissions <0.18 gC/m²/day. This suggests that our method of identifying NO2 plumes is biased towards the largest end of the emission spectrum and is less sensitive to the smallest emissions. This limit of detection does not lead to a large discrepancy in the total emissions being sampled by the NO2 plumes, reflecting the disproportionate role of large emission sources on the total emission budget. Figure 6 shows the ODIAC emissions where no NO2 plumes were detected. Out of these undetected sources, 95% have emission rates <0.18 gC/m²/day and only nine locations have an emission rate >1 gC/m²/day, denoted by the green circles. Figure 5 also shows that 10% of our plumes do not correspond to ODIAC CO2 emissions. This is due mostly to plumes over the ocean associated with ship tracks (Figures 3 and 4), but there will be instances where fires have not been removed using our VIIRS criterion (described above) and possibly false detections. Here, we also consider the possibility that the emission inventory is incomplete for some reason.
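The cumulative curve described for Figure 5 and the fraction of emissions captured by plume locations can be computed as in the sketch below. The emission values are invented toy numbers chosen only to show the calculation; they are not ODIAC data.

```python
def cumulative_share(emissions):
    """Cumulative percentage of total emissions, sources sorted large to small."""
    ordered = sorted(emissions, reverse=True)
    total = sum(ordered)
    shares, running = [], 0.0
    for value in ordered:
        running += value
        shares.append(100.0 * running / total)
    return shares


def sampled_percentage(all_emissions, sampled_emissions):
    """Percentage of total emissions found in the sampled subset."""
    return 100.0 * sum(sampled_emissions) / sum(all_emissions)


# Toy inventory: one large source dominates the total (made-up values).
inventory = [1, 1, 8]
curve = cumulative_share(inventory)          # [80.0, 90.0, 100.0]
captured = sampled_percentage(inventory, [8, 1])  # 90.0
```

In this toy case the largest third of the sources carries 80% of the emissions, which is the same kind of skew that lets a detector biased towards large sources still capture most of the total.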
with these hotspots is outside the scope of this paper, we hypothesize based on satellite imagery from Google Maps that these are regions of fossil fuel extraction and processing (coal in China and oil and gas in Mali, Saudi Arabia and Iraq). Having the ability to detect these plumes automatically provides a method of frequently updating emission inventories. Although these clusters of plumes could be persistent errors from highly reflective features such as salt lakes and solar panels, it is unlikely that they would appear as plume-shaped anomalies and therefore they are less likely to be picked up by the CNN model. As well as errors in the TROPOMI retrieval leading to false detections in the final dataset, errors may also occur during the creation of the model (e.g. mislabelled training data). A single plume data point may not represent a real-life plume and should be considered in the context of other data (e.g. frequent recurrence, land use, proximity to other sources). Further refinement of the training dataset, model parameters and data analysis stages will reduce the number of false detections, and feedback to the TROPOMI community could help reduce the number of retrieval errors.

Discussion and Conclusions
We have developed a convolutional neural network (CNN) to identify plumes of atmospheric nitrogen dioxide (NO2), a tracer of combustion. We have trained the model using a small subset of available images from the TROPOspheric Monitoring Instrument, aboard Sentinel-5P. The resulting CNN, capable of identifying plumes with a success rate >90%, reveals a rich distribution of plumes across the globe, which correspond to large city centres, power plants, oil and gas production, and shipping routes. Many of these features would be difficult to isolate without the use of a deep learning model.
The impetus for our study is using NO2 as a tracer for anthropogenic emissions of CO2 and methane from combustion. We aim to demonstrate the potential of this method to exploit NO2 observations in conjunction with other relevant data and known relationships with gases such as CO2 and methane to improve emission estimates. The main advantage of using NO2 is its comparatively short atmospheric lifetime, allowing us to relate elevated values to local emissions. We have attempted to remove biomass burning using thermal anomaly data, which are often used to locate open biomass burning. This is not a perfect method, but our results suggest it works reasonably well. To evaluate our ability to observe anthropogenic emissions of CO2 we have used the Open-source Data Inventory for Anthropogenic CO2 (ODIAC) (Oda et al., 2018), an established emission inventory used widely by the community. We have chosen this approach because we found the number of coincident measurements of TROPOMI NO2 and OCO-2/GOSAT CO2 was not sufficient to generate meaningful statistics (not shown).

By sampling ODIAC at the locations of NO2 plumes, we find that the CNN model captures 92% of global anthropogenic CO2 emissions. The remaining 8% of emissions, mostly from sources <0.2 gC/m²/day, provide an effective limit of detection for our method.
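The captured fraction quoted above is, in essence, a ratio of two sums over the inventory grid. The following minimal sketch illustrates that arithmetic under simplifying assumptions: the inventory is a small 2-D list of total emissions per grid cell (flux multiplied by cell area), and a Boolean mask of the same shape marks cells in which at least one plume was detected. The function name `captured_fraction` and the numbers are invented for illustration and are not ODIAC values.

```python
def captured_fraction(emissions, plume_mask):
    """Fraction of gridded emissions located in cells flagged as containing a plume.

    emissions: 2-D list of total emissions per grid cell (any consistent unit).
    plume_mask: 2-D list of booleans with the same shape as emissions.
    """
    total = 0.0
    captured = 0.0
    for emission_row, mask_row in zip(emissions, plume_mask):
        for emission, has_plume in zip(emission_row, mask_row):
            total += emission
            if has_plume:
                captured += emission
    # Guard against an empty or all-zero inventory.
    return captured / total if total > 0 else 0.0


# Made-up 2x2 inventory: two large sources with plumes, two small ones without.
emissions = [[6.0, 0.25], [0.25, 1.5]]
plume_mask = [[True, False], [False, True]]
fraction = captured_fraction(emissions, plume_mask)
```

In this toy case the two small sources account for the uncaptured share, mirroring the interpretation of the remaining emissions as an effective detection limit.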
Our use of NO2 to describe anthropogenic emissions of CO2 and methane relies on them being co-emitted. We find no evidence in the literature of NOx scrubbers being used for power plants, although they are used by the chemical industry, a sector that represents a comparatively small emission of CO2. The validity of using NO2 as a proxy for CO2 emissions may change in the future as non-catalytic reduction and low-NOx burner technologies begin to mature. We find no correlation between NO2 plumes and the location of natural gas flaring, which is unexpected because flaring is a major form of combustion and should therefore be a significant source of NO2. We have no explanation for this observation, unless flaring occurs at preferential times of day that do not coincide with the early afternoon overpass time of TROPOMI. Our approach will also miss direct CO2 and methane emissions, e.g. pipeline leaks and coal mines (Palmer et al., 2021). For these sources, we still have to rely on CO2 and methane data with high spatial resolution (Varon et al., 2021). Conversely, we find persistent NO2 plumes in regions where ODIAC does not currently include CO2 emissions; these plumes may be real or may be false positives. False positives can result from data retrieval errors or from human error in the supervised learning strategy necessary to develop the CNN model. Based on their location and on inspection of satellite imagery provided by Google Maps, we suggest these plumes are likely to be associated with new areas where fossil fuels are being extracted or combusted for energy generation. This demonstrates how NO2 plumes could be used to inform emission inventories about the location of new point sources across the globe. More generally, it is important that domain experts evaluate data products developed by deep learning models to minimize the influence of false positives.
The NO2 plume detection algorithm does not quantify anthropogenic emissions of CO2 or methane, but it provides a method to refine the development of measurement, reporting and verification systems that form the backbone of the Paris Agreement.

The launch of the Copernicus CO2 service, including a constellation of satellites that will measure CO2, methane, and NO2, will result in a step-change in the number of coincident measurements and will thereby improve our ability to use NO2 simultaneously with CO2 and methane to quantify anthropogenic emissions of CO2 and methane.

Appendix A: Supervised Learning Using Crowd Sourcing
We created an online tool (https://plume-spotter.herokuapp.com) that briefly described what a plume is, gave a few examples of what plumes can look like, and then displayed 18 images in a 6×3 grid. These images were selected at random from an initial set of 1565 unique images compiled by the authors. Each participant was invited to click on the images in the grid that they considered to contain a plume and then to submit their selection. Once the results were submitted, the participant was asked to classify 18 more random images. We then used these results to determine how many true (contains a plume) or false (does not contain a plume) classifications each image received. We designed this method to reduce the influence of human error and individual judgement on what could be considered a plume. For this crowd sourcing experiment, there were approximately 580 images for which 0-10% of classifications were true, i.e. high confidence that these images do not contain a plume. There were approximately 130 images for which 90-100% of classifications were true, i.e. high confidence that these images contain a plume. For the remaining (≃800) images there was little agreement between the participants about whether they included a plume. Since the majority of the images from the initial dataset did not have a high level of agreement on whether they contained a plume, we decided that this dataset was not suitable to train our model.
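The agreement categories described above amount to thresholding the share of true classifications for each image. The sketch below illustrates that step with the 10% and 90% limits quoted in the text; the function name `agreement_class`, the category labels and the vote counts are hypothetical, and the handling of images that received no classifications is an assumption.

```python
def agreement_class(true_votes, total_votes, low=0.10, high=0.90):
    """Classify an image by the share of participants who marked it as a plume.

    low and high follow the 0-10% and 90-100% ranges quoted in the text.
    Returning "unrated" for images with no votes is an assumption of this sketch.
    """
    if total_votes == 0:
        return "unrated"
    share = true_votes / total_votes
    if share <= low:
        return "no_plume"      # high confidence that the image contains no plume
    if share >= high:
        return "plume"         # high confidence that the image contains a plume
    return "ambiguous"         # participants disagreed


# Made-up tallies: image id -> (true classifications, total classifications).
votes = {"img_001": (0, 12), "img_002": (11, 12), "img_003": (6, 12)}
labels = {name: agreement_class(t, n) for name, (t, n) in votes.items()}
```

Only images in the two high-confidence categories would be candidates for training labels under this scheme; the ambiguous group corresponds to the images on which participants disagreed.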
Going forward, this experiment could be refined to improve the results and to give us more confidence in the classifications of the images. The experiment assumed that an image did not contain a plume unless the participant changed the classification. This meant that if a participant did not look at an image, it was still recorded as not containing a plume. We also noticed that what was considered a plume changed depending on the surrounding images. If participants are unsure whether an image contains a plume, they may be more likely to keep the classification as false if a surrounding image contains a clearer plume, and more likely to change it to true if the surrounding images definitely do not contain a plume.
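One possible refinement, sketched here as an assumption rather than as a description of the online tool, is to record an explicit answer for every image so that an image a participant skipped contributes no vote at all instead of a default "false". The response format and the function name `tally_explicit` are hypothetical.

```python
def tally_explicit(responses):
    """Count votes per image, ignoring images the participant did not assess.

    responses: iterable of (image_id, answer) with answer in
               {"plume", "no_plume", None}; None means the image was skipped.
    Returns {image_id: (true_votes, total_votes)}.
    """
    counts = {}
    for image_id, answer in responses:
        true_votes, total_votes = counts.get(image_id, (0, 0))
        if answer is None:
            # Keep the image in the table, but do not record a vote for it.
            counts.setdefault(image_id, (0, 0))
            continue
        if answer == "plume":
            true_votes += 1
        counts[image_id] = (true_votes, total_votes + 1)
    return counts


# Made-up responses: image "a" is skipped once, image "b" is never assessed.
responses = [("a", "plume"), ("a", None), ("a", "no_plume"), ("b", None)]
tallies = tally_explicit(responses)
```

With this bookkeeping, image "b" is left without a classification rather than being counted as a confident non-plume. Showing images one at a time, rather than in a grid, would be a separate way to reduce the influence of neighbouring images.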