Neural-network-based estimation of regional-scale anthropogenic CO2 emissions using an Orbiting Carbon Observatory-2 (OCO-2) dataset over East and West Asia

Atmospheric carbon dioxide (CO2) is the most significant greenhouse gas, and its concentration is continuously increasing, mainly as a consequence of anthropogenic activities. Accurate quantification of CO2 is critical for addressing the global challenge of climate change and for designing mitigation strategies aimed at stabilizing CO2 emissions. Satellites provide the most effective way to monitor the concentration of CO2 in the atmosphere. In this study, we utilized the concentration of the column-averaged dry-air mole fraction of CO2, i.e., XCO2 retrieved from a CO2 monitoring satellite, the Orbiting Carbon Observatory-2 (OCO-2), and the net primary productivity (NPP) provided by the Moderate Resolution Imaging Spectroradiometer (MODIS) to estimate the anthropogenic CO2 emissions using the Generalized Regression Neural Network (GRNN) over East and West Asia. OCO-2 XCO2, MODIS NPP, and the Open-Data Inventory for Anthropogenic Carbon dioxide (ODIAC) CO2 emission datasets for a period of 5 years (2015–2019) were used in this study. The annual XCO2 anomalies were calculated from the OCO-2 retrievals for each year to remove the larger background CO2 concentrations and seasonal variability. The XCO2 anomaly, NPP, and ODIAC emission datasets from 2015 to 2018 were then used to train the GRNN model, and, finally, the anthropogenic CO2 emissions were estimated for 2019 based on the NPP and XCO2 anomalies derived for the same year. The estimated and the ODIAC CO2 emissions were compared, and the results showed good agreement in terms of spatial distribution. The CO2 emissions were estimated separately over East and West Asia. In addition, correlations between the ODIAC emissions and XCO2 anomalies were also determined separately for East and West Asia, and East Asia exhibited relatively better results. The results showed that satellite-based XCO2 retrievals can be used to estimate the regional-scale anthropogenic CO2 emissions, and the accuracy of the results can be enhanced by further improvement of the GRNN model with the addition of more CO2 emission and concentration datasets.

Published by Copernicus Publications on behalf of the European Geosciences Union.

Climate change is one of the greatest challenges to the future of Earth, and it stems from global warming, which is accelerated by anthropogenic emissions of greenhouse gases (Lamminpää et al., 2019). The major warming effects are caused [...] Moreover, defining the uncertainty in the inventory datasets is also a challenging task, and the intercomparisons of various inventories do not necessarily reveal all of the uncertainties, as different inventories sometimes use common sources of information (Konovalov et al., 2016). It is becoming increasingly important to find efficient and reliable ways of monitoring CO2 reduction progress and to evaluate how well specific CO2 reduction policies are working.
Satellites provide the most effective way of monitoring atmospheric CO2 with great spatiotemporal resolution. Several satellites such as the Greenhouse Gases Observing Satellite (GOSAT), GOSAT-2, the Orbiting Carbon Observatory-2 (OCO-2), OCO-3, and TanSat are orbiting the Earth and are dedicated to monitoring atmospheric CO2 (Crisp, 2015; Liu et al., 2018; Matsunaga et al., 2019; Taylor et al., 2020; Bao et al., 2020; Hong et al., 2021; Yang et al., 2018). These satellites calculate the average atmospheric CO2 concentration in the path of sunlight reflected by the surface using spectrometers carried onboard. OCO-2 measures the CO2 optical depth with bands centered around 1.6 and 2.0 µm and determines the O2 optical depth using the A-band, which is centered around 0.76 µm (O'Dell et al., 2012). The information from these bands is combined to calculate the column-averaged dry-air mole fraction of CO2 (XCO2) (Crisp et al., 2012). Several studies suggest that XCO2 can be used to detect the CO2 concentration induced by anthropogenic activities by removing the background concentration from the satellite XCO2 retrievals (Bovensmann et al., 2010; Hakkarainen et al., 2019; Keppel-Aleks et al., 2013). These studies have reported an enhancement of nearly 2 ppm over megacities and high-density urban regions in the US and China. The XCO2 retrievals derived from the satellite measurements show a positive correlation with the CO2 emission inventories (Hakkarainen et al., 2016; Yang et al., 2019), which implies that these space-based observations can be used to assess the anthropogenic CO2 emissions by highlighting the anthropogenic component of the XCO2 concentration.
Asia is home to the world's most populous nations with the highest CO2 emissions. East Asia, in particular China, significantly contributes to the global carbon budget and has accounted for ∼ 30 % of the overall growth in global CO2 emissions over the past 15 years (EDGAR, 2017). This increment in the CO2 levels is mainly due to the rapid economic growth and anthropogenic activities (Shan et al., 1997). China has pledged to make aggressive efforts to reduce the CO2 emissions per unit gross domestic product (GDP) by 60 %–65 % relative to 2005 levels, and to peak carbon emissions overall, by 2030 (UNFCCC, 2015). West Asia is also a region with higher rates of anthropogenic CO2 emissions (Mustafa et al., 2020), and some of its countries, such as Iran, Saudi Arabia, and Turkey, are listed among the 10 largest CO2 emitting nations in the world. Several studies have been carried out to estimate the CO2 emissions using various machine learning techniques, but most of them do not deal with the spatial distribution. Rao (2021) estimated the CO2 emissions using a Support Vector Machine (SVM). Zhonghan et al. (2018) predicted the CO2 flux emissions based on published data including latitude, age, potential net primary productivity (NPP), and mean depth using the Back Propagation Neural Network (BPNN) and Generalized Regression Neural Network (GRNN) models. Yang et al. (2019) estimated the anthropogenic CO2 emissions using GOSAT XCO2 retrievals over China, and the results showed good agreement between the estimated values and the ODIAC CO2 emission dataset. In this study, we have improved the model initially developed by Yang et al. (2019) to estimate the regional-scale anthropogenic CO2 emissions using OCO-2 XCO2 retrievals over East and West Asia. MODIS NPP, OCO-2, and ODIAC CO2 datasets were obtained for a period of 5 years from January 2015 to December 2019.
XCO2 anomalies were calculated from the OCO-2 retrievals for each year; the GRNN model was trained using XCO2 anomalies, MODIS NPP, and ODIAC CO2 emissions with 4 years of data from 2015 to 2018; and then anthropogenic CO2 emissions were estimated for the year 2019 based on 2019 NPP and XCO2 anomalies. Atmospheric CO2 monitoring satellites can detect and analyze the anthropogenic CO2 signatures, and the satellite-based estimation of anthropogenic CO2 emissions can be helpful in investigating the carbon emissions as a data-driven method, which is different from the conventional method of calculating an emission inventory. Although the estimation of anthropogenic CO2 emissions using satellite datasets is a challenging task, as some other factors such as the atmospheric transport and the terrestrial ecosystem play notable roles in controlling the spatial distribution of atmospheric CO2 (Cao et al., 2017), this data-driven method can still provide meaningful help with respect to quantifying anthropogenic CO2 emissions that will be important for evaluating the effects of anthropogenic CO2 emission reduction at regional as well as global scales.
The remainder of this paper is structured as follows: the details of the datasets and methods are provided in Sect. 2, and the results, including the estimated CO2 emissions, an evaluation of these emissions, and the correlation between ODIAC CO2 emissions and XCO2 anomalies are discussed in Sect. 3.

2.1 Datasets

The Orbiting Carbon Observatory-2 (OCO-2) was launched by the National Aeronautics and Space Administration (NASA) on 2 July 2014 to monitor the concentration of atmospheric CO2 at regional and global levels (Crisp, 2015). It carries a three-channel imaging grating spectrometer that collects high-resolution, bore-sighted spectra of reflected sunlight. Spectra are collected in the molecular oxygen A-band at 0.765 µm and the CO2 bands at 1.61 and 2.06 µm (Hakkarainen et al., 2019). Information from all of these bands is combined to calculate the XCO2. The spatial resolution of OCO-2 is 2.25 km × 1.29 km. More details about the instrument design, calibration approach, in-orbit performance, and measurement principles are provided in a previous study (Crisp, 2015). In this study, we used the OCO-2 Atmospheric Carbon Observations from Space (ACOS)/XCO2 version 10r product that was generated using the ACOS Level 2 Full Physics (L2FP) retrieval algorithm, which used a Bayesian optimal estimation framework to derive estimates of XCO2 from spectral measurements of reflected solar radiation (O'Dell et al., 2012; Crisp et al., 2012). A comprehensive study on the validation of OCO-2 XCO2 retrievals against the Total Carbon Column Observing Network (TCCON) CO2 dataset reported an absolute median difference of less than 0.4 ppm and a root-mean-square (RMS) difference of less than 1.5 ppm between the two datasets (Wunch et al., 2017). Similar experiments have been carried out for the validation of different versions of OCO-2 XCO2 products, and the results have shown that the OCO-2 dataset was consistent and reliable for atmospheric CO2 monitoring (Kiel et al., 2019; O'Dell et al., 2018). The quality and the quantity of the XCO2 product have been improved with the developments in the ACOS FP retrieval algorithm. The latest OCO-2 XCO2 product has a single-sounding precision of ∼ 0.8 ppm over land and ∼ 0.5 ppm over water, and RMS biases of 0.5–0.7 ppm over both land and water (O'Dell et al., 2021). The evolution of the ACOS L2FP retrieval algorithm from v7 to v10 is summarized in Table 1.
No major changes were made in the ACOS v9 L2FP retrieval algorithm relative to v8 except for the sampling of the meteorological prior. The trace gas absorption coefficient tables (ABSCO) were updated in various versions of the ACOS L2FP retrieval algorithms. The source of the prior meteorology was changed from the European Centre for Medium-Range Weather Forecasts (ECMWF) in ACOS v7 to the NASA Global Modeling and Assimilation Office (GMAO) Goddard Earth Observing System (GEOS) Forward Processing for Instrument Teams (FP-IT) products for v8 and v9. The aerosol prior source was changed from the GMAO Modern-Era Retrospective analysis for Research and Applications (MERRA) product in v7–9 to Goddard Earth Observing System 5 (GEOS5) FP-IT in v10. Moreover, an additional stratospheric aerosol layer was introduced in ACOS v8–10. The prior value of aerosol optical depth (AOD) for each retrieved aerosol type was lowered from 0.0375 in v7 to 0.0125 in v8–10. The CO2 prior developed by the Total Carbon Column Observing Network (TCCON) team using the GGG2014 algorithm remained the same in v7, v8, and v9 of the algorithm. Another major change was switching the land surface model from a purely Lambertian land surface model to a bidirectional reflectance distribution function (BRDF) model (Taylor et al., 2021).

ODIAC dataset
ODIAC is a global fossil fuel CO2 (FFCO2) emission dataset with a monthly 1 km × 1 km resolution over land and an annual 1° × 1° resolution for international bunkers from the year 2000 onward (Oda et al., 2018). It shares country-scale estimates with the Carbon Dioxide Information Analysis Center (CDIAC) but distributes the emissions differently within the countries and includes gridded international bunker emissions (Oda and Maksyutov, 2015). CDIAC distributes the CO2 emissions based on the population density, whereas ODIAC incorporates power plant profiles and nighttime light observations for emission distribution (Wang et al., 2020). ODIAC shows better agreement with the US bottom-up inventory (Gurney et al., 2009) than CDIAC, and it is commonly used in flux inversions (Crowell et al., 2019; Lauvaux et al., 2016; Maksyutov et al., 2013; Takagi et al., 2011). In this study, we used the 2020 version of the ODIAC emission dataset, which is freely available and can be downloaded from http://db.cger.nies.go.jp/dataset/ODIAC/.

2.2 Methods
The estimation of anthropogenic CO2 emissions includes three major steps, as shown in Fig. 1: the first step includes highlighting the part of the XCO2 concentration influenced by anthropogenic activities; the second step involves setting up the GRNN model using the XCO2, NPP, and ODIAC datasets; and the final step is the validation of estimated CO2 emissions against the actual ODIAC emission dataset.

The OCO-2 XCO2 dataset was downloaded from the EARTHDATA platform (https://earthdata.nasa.gov/); to ensure the reliability of the data, screening and filtering of the dataset was carried out following the instructions given in the OCO-2 Data User Guide (DUG). Each sounding that is processed using the ACOS L2FP retrieval algorithm is assigned either a "good" (0) or "bad" (1) quality flag based on screening criteria derived from comparisons with TCCON and modeled CO2 fields. It is generally advised that users should use the good-quality soundings for regional- and local-scale studies because the soundings flagged as bad-quality might include biases that compromise their utility for the application. In this study, the OCO-2 XCO2 retrievals were included if (i) they were flagged good (flag of 0) and (ii) the standard deviation of the good soundings for the day was less than 2 ppm.

CO2 has a larger background concentration and a longer atmospheric lifetime than other greenhouse gases (Hakkarainen et al., 2019). As a result, XCO2 varies by only about 2 % over the seasonal cycle and from pole to pole. In addition, XCO2 variations influenced by anthropogenic activities are also smaller on the scale of satellite soundings (2–4 km²). Therefore, high precision is critical for the accurate quantification of the XCO2 anomalies related to anthropogenic activities. To highlight the emission areas, CO2 seasonal variability and the large background concentrations must be removed.
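The two screening criteria described above (a quality flag of 0 and a daily standard deviation of the good soundings below 2 ppm) can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function name `screen_soundings`, the flat input arrays, and the use of the population standard deviation are all assumptions made for the example.

```python
import numpy as np

def screen_soundings(day, xco2, quality_flag, max_daily_std=2.0):
    """Return a boolean mask of soundings that pass both criteria.

    day          : day identifier of each sounding (any hashable/sortable label)
    xco2         : XCO2 of each sounding in ppm
    quality_flag : 0 for "good", 1 for "bad"
    All names here are hypothetical; they are not taken from the OCO-2 files.
    """
    day = np.asarray(day)
    xco2 = np.asarray(xco2, dtype=float)
    good = np.asarray(quality_flag) == 0          # criterion (i)
    keep = np.zeros(xco2.shape, dtype=bool)
    for d in np.unique(day[good]):
        in_day = good & (day == d)
        # criterion (ii): spread of the good soundings on that day
        # (assumption: population standard deviation, ddof=0)
        if np.std(xco2[in_day]) < max_daily_std:
            keep[in_day] = True
    return keep
```

A day whose good soundings scatter by 2 ppm or more is dropped as a whole, which matches the wording of criterion (ii) but is only one possible reading of it.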
To highlight the areas associated with the anthropogenic CO2 emission, XCO2 anomalies were calculated by subtracting the daily XCO2 median (daily background) from the individual XCO2 observation, a method suggested by previous studies (Hakkarainen et al., 2016, 2019):

$$\mathrm{XCO_2}(\text{anomaly}) = \mathrm{XCO_2}(\text{individual}) - \mathrm{XCO_2}(\text{daily median}) \qquad (1)$$

This equation calculated the XCO2 anomalies for each observation. Subtraction of the daily background concentration removes the seasonal variability. The space-based soundings are irregularly distributed and have spatiotemporal gaps because a large number of the satellite observations are removed after screening for clouds and other artifacts. To deal with the spatiotemporal gaps, kriging interpolation was used, and a mapping dataset was generated with a spatial resolution of 0.5° × 0.5° (latitude × longitude) and a temporal resolution of 16 d. Finally, the mean for each grid cell was calculated for each year from 2015 to 2019. Taking the annual mean of the XCO2 anomaly removes the remaining seasonal variation (Hakkarainen et al., 2016). The annually averaged XCO2 anomalies were resampled to a grid with a spatial resolution of 1° × 1° (latitude × longitude) and used along with the 1° × 1° (latitude × longitude) ODIAC emission dataset to set up the GRNN model.
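The anomaly calculation and the subsequent averaging onto a regular grid can be illustrated with a short sketch. It assumes that the daily median is taken over all soundings passed to the function for that day, and it replaces the kriging step with simple cell averaging, so it shows the bookkeeping rather than the interpolation used in the study. The function names `xco2_anomalies` and `grid_mean` are hypothetical.

```python
import numpy as np

def xco2_anomalies(day, xco2):
    """Individual XCO2 minus the median of all soundings on the same day."""
    day = np.asarray(day)
    xco2 = np.asarray(xco2, dtype=float)
    anomaly = np.empty_like(xco2)
    for d in np.unique(day):
        in_day = day == d
        anomaly[in_day] = xco2[in_day] - np.median(xco2[in_day])
    return anomaly

def grid_mean(lat, lon, values, cell_deg=1.0):
    """Average the values that fall into the same cell_deg x cell_deg cell.

    Returns a dict {(lat_index, lon_index): mean}, where the indices are
    floor(lat / cell_deg) and floor(lon / cell_deg).
    Assumption: plain binning, NOT the kriging interpolation of the paper.
    """
    i = np.floor(np.asarray(lat, dtype=float) / cell_deg).astype(int)
    j = np.floor(np.asarray(lon, dtype=float) / cell_deg).astype(int)
    sums, counts = {}, {}
    for key, v in zip(zip(i.tolist(), j.tolist()), values):
        sums[key] = sums.get(key, 0.0) + float(v)
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}
```

Calling `grid_mean` once per year on the anomalies of that year gives an annual mean per cell, which is the quantity the GRNN model takes as input.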

During the process of photosynthesis, living plants convert CO2 into sugar molecules that they use for food. In the process of making food, they also release the oxygen we breathe. Plant productivity plays a crucial role in the global carbon cycle by absorbing the CO2 released by anthropogenic activities. The net primary productivity (NPP) shows how much CO2 is absorbed by plants during photosynthesis minus how much CO2 is released during respiration. A negative NPP value means that CO2 is released into the atmosphere, and a positive value represents the absorption of atmospheric CO2. To improve the model results, an NPP dataset (MOD17A3HGF) provided by MODIS has also been used in this study. It provides information about annual NPP and is distributed by NASA's Land Processes Distributed Active Archive Center (LP DAAC). The NPP dataset with a spatial resolution of 500 m was downloaded from the LP DAAC website (https://lpdaac.usgs.gov/products/mod17a3hgfv006/). The annual NPP is derived from the sum of all 8 d Net Photosynthesis (PSN) products (MOD17A2H) from the given year. The MODIS NPP dataset was reprojected and resampled to the spatial resolution [...] for each year and used along with the ODIAC and OCO-2 datasets to train the GRNN model as well as to predict the CO2 emissions.

XCO2 variations are primarily influenced by anthropogenic activities and terrestrial ecosystems, and there is both linear and nonlinear mapping between the XCO2 and the emissions. We adopted the GRNN algorithm to represent the nonlinear mapping between the independent variables (XCO2 anomaly and NPP) and the dependent variable (CO2 emissions). The GRNN is a memory-based network that provides estimates of continuous variables and converges to an underlying regression surface.
The regression of a dependent variable on an independent variable is the computation of the most probable value of the dependent variable for each value of the independent variable based on a finite number of possibly noisy measurements of the independent variable and the associated values of the dependent variable. The dependent and the independent variables are usually vectors (Rooki, 2016). The architecture of the GRNN is shown in Fig. 2. It consists of four layers: an input layer, a hidden layer, a summation layer, and a decision layer. In the input layer, each neuron corresponds to an independent variable that is expressed as a mathematical function, and the independent variable values are standardized. The standardized values of the independent variable are then transferred to the neurons in the hidden layer. In this layer, each neuron stores the values of the dependent and independent variables and calculates a scalar function. The third layer, known as the summation layer, contains two neurons: the denominator summation unit, which sums the weight values being received from the hidden layer, and the numerator summation unit, which sums the weight values multiplied by the actual target-dependent variable value for each hidden neuron. Finally, the target-dependent value is obtained in the decision layer by dividing the value accumulated in the numerator summation unit by the value in the denominator summation unit. To develop a neural network, the dependent and the independent training variables must be standardized so that all training data will have the same order of magnitude in the input layer (Yang et al., 2019). The scalar function calculated by hidden neuron $i$ is

$$\theta_i = \exp\left(-\frac{\sum_{j=1}^{p}\left(x_j - x_{ij}\right)^2}{2\sigma^2}\right),$$

where $x_j$ and $x_{ij}$ are the $j$th components of the input vector $x$ and of the stored training vector $x_i$, $p$ is the dimension of the variable vector $x_i$, and $\sigma$ is the spread parameter. An optimal spread parameter value is obtained after several runs by keeping the mean squared error of the estimated values at a minimum (Rooki, 2016).
In this study, values of spread parameters were optimized using the "Holdout Method". More details about the Holdout Method are provided in a previous study (Specht, 1991). The weight of the denominator neuron was set to 1.0. The predicted target dependent variable was defined by the following equation:

$$\hat{y}(x) = \frac{\sum_{i=1}^{n} y_i\,\theta_i}{\sum_{i=1}^{n} \theta_i},$$

where $\theta_i$, the value calculated with the scalar function in hidden neuron $i$, is weighted with the corresponding value of the training sample $y_i$, and $n$ denotes the number of training samples.
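The four-layer structure described above reduces to a kernel-weighted average of the training targets, which a few lines of NumPy can illustrate. The sketch below assumes the standard Gaussian kernel of Specht (1991) for the hidden layer and reads the Holdout Method as leave-one-out validation (each training sample is predicted from all the others); both are assumptions about the general technique, not a reproduction of the authors' implementation. The function names `standardize`, `grnn_predict`, `holdout_mse`, and `select_sigma` are hypothetical.

```python
import numpy as np

def standardize(X, mean=None, std=None):
    """Z-score the columns of X; reuse training mean/std for new data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0) if mean is None else mean
    std = X.std(axis=0) if std is None else std
    std = np.where(std == 0, 1.0, std)            # avoid division by zero
    return (X - mean) / std, mean, std

def _kernel(A, B, sigma):
    """Gaussian kernel values between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN estimate for each row of X_query (all inputs 2-D, shape (n, p))."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    theta = _kernel(X_query, X_train, sigma)      # hidden layer
    numerator = theta @ y_train                   # numerator summation unit
    denominator = theta.sum(axis=1)               # denominator summation unit
    return numerator / np.maximum(denominator, 1e-300)  # decision layer

def holdout_mse(X, y, sigma):
    """Mean squared error when each sample is predicted without itself."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = _kernel(X, X, sigma)
    np.fill_diagonal(theta, 0.0)                  # hold the sample out
    pred = (theta @ y) / np.maximum(theta.sum(axis=1), 1e-300)
    return float(np.mean((pred - y) ** 2))

def select_sigma(X, y, candidates):
    """Pick the candidate spread with the smallest holdout error."""
    return min(candidates, key=lambda s: holdout_mse(X, y, s))
```

In the setting of this study, the rows of `X_train` would hold the standardized XCO2 anomaly and NPP of each grid cell for 2015 to 2018, `y_train` the corresponding ODIAC emissions, and `X_query` the 2019 predictors. A small spread makes the estimate follow the nearest training samples, whereas a large spread pulls it toward the mean of all targets.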
3 Results and discussion

3.1 Spatial distribution of XCO2 observations and anomalies
The satellite-based observations are sensitive to clouds and aerosols; therefore, many observations are discarded during preprocessing due to the presence of clouds and aerosols (Mustafa et al., 2021b). Figure 3a and b show the quantity of XCO2 retrievals from 2015 to 2019 on a spatial grid of 0.5° × 0.5° (latitude × longitude) over West and East Asia, respectively. OCO-2 shows good spatial coverage over East Asia; however, the southern parts of the region, in particular the Tibetan Plateau, have a relatively lower number of XCO2 retrievals. The Tibetan Plateau is the most extensively elevated surface on Earth, and satellite measurements show larger uncertainties over this region (Yang et al., 2019). In the case of West Asia, the southern parts of the region have a lower number of XCO2 retrievals. A very large desert, the Rub' al Khali, is located in this area; it stretches across Saudi Arabia, Yemen, Oman, and the United Arab Emirates (UAE) and often experiences dust storms. The lower number of XCO2 retrievals in these parts of the region might be due to the ACOS XCO2 retrieval algorithm that excludes satellite measurements with a high aerosol optical depth and cloud optical thickness (Crisp et al., 2012; O'Dell et al., 2012).

Figure 3c shows the spatial distribution of the 5-year averaged XCO2 anomalies calculated using the method described in Sect. 2.2 over West Asia. The higher concentrations of XCO2 anomalies were observed over the central parts of the region that included Iran, Kuwait, Saudi Arabia, and Iraq. Iran and Saudi Arabia are listed among the top 10 CO2 emitting nations and produce over 6 % of the global CO2 emissions (Jalil, 2014). In addition, Iran, Saudi Arabia, and Iraq are the major fuel consumers of the region and contribute more than 60 % of the region's total fossil fuel CO2 emissions (Boden et al., 2017). Figure 3d shows the multiyear averaged XCO2 anomalies over East Asia.
The eastern parts of the region including eastern China, Japan, and South Korea show the highest concentrations of XCO2 anomalies. China's Beijing–Tianjin–Hebei area, Korea, and Japan are among the most populated urban regions in the world, with high amounts of anthropogenic emissions (Mustafa et al., 2020).

Figure 3e shows the monthly averaged XCO2 over East and West Asia. The monthly averaged XCO2 concentrations show seasonal fluctuations. Moreover, the XCO2 concentrations during each month are higher than those in the same month of the previous year, which reflects that the XCO2 concentration in the atmosphere is continuously increasing in both regions. The XCO2 concentration starts increasing from September and reaches its maximum value in April; it then starts decreasing and reaches its minimum value in August. The decrement in its concentration from May to August is due to several reasons; however, it is primarily owing to the strong photosynthesis and weak respiration rate of plants, which is enhanced during the monsoon or rainy season (Mustafa et al., 2020). The increment in the XCO2 concentration from September to April is likely to be caused by weak photosynthesis and strong respiration, the use of heating systems in winter, and strong microbial activity (Cao et al., 2017; Mustafa et al., 2021a).
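The monthly averaging and the month-by-month comparison with the previous year described above can be reproduced with a small helper. This is an illustrative sketch only; the function names `monthly_means` and `year_over_year_increase` and the flat input arrays are assumptions, not part of the study's code.

```python
import numpy as np

def monthly_means(year, month, xco2):
    """Mean XCO2 per (year, month), returned as a dict."""
    year = np.asarray(year)
    month = np.asarray(month)
    xco2 = np.asarray(xco2, dtype=float)
    out = {}
    for y, m in sorted(set(zip(year.tolist(), month.tolist()))):
        out[(y, m)] = float(xco2[(year == y) & (month == m)].mean())
    return out

def year_over_year_increase(means):
    """Difference between each month and the same month one year earlier."""
    return {(y, m): v - means[(y - 1, m)]
            for (y, m), v in means.items() if (y - 1, m) in means}
```

A positive value for every key of `year_over_year_increase` corresponds to the statement that each month is higher than the same month of the previous year.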

3.2 Estimated CO2 emissions
The annually averaged XCO2 anomalies, MODIS NPP, and ODIAC CO2 emission datasets for a period of 4 years from 2015 to 2018 were used as a training dataset for the GRNN model built to estimate the CO2 emissions using the method described in Sect. 2.2. The GRNN model was then applied to 2019 annually averaged XCO2 anomalies and NPP datasets to predict the CO2 emissions with the same unit as the ODIAC CO2 emissions. The analyses were carried out separately over East and West Asia. Figure 4a and b show the estimated values and the ODIAC CO2 emissions over East Asia, respectively. The results show that the estimated values and the inventory CO2 emissions exhibit nearly the same spatial distribution pattern. The eastern part of the region shows higher CO2 emissions, and the western and northern parts, in particular the Tibetan Plateau and Mongolia, show the minimum CO2 emissions. The pattern is also similar to the XCO2 anomalies distribution over East Asia (Fig. 3d). The estimated CO2 emissions have a relatively smoother distribution pattern compared with the ODIAC CO2 emissions, which might be due to the interpolation of the OCO-2 dataset. Figure 4c shows the difference between the estimated and the inventory CO2 emissions over East Asia. The estimated CO2 emissions are generally higher than the ODIAC CO2 emissions; however, they are lower over some parts of the region as well. Figure 4d shows the land cover distribution of East Asia provided by the Copernicus Global Land Service (Buchhorn et al., 2020). The predicted CO2 emissions are overestimated over most parts of the region; however, this overestimation is more significant over agricultural areas that are located near high-density regions, e.g., eastern China.
Eastern China, Japan, and Korea are known to be among the regions with the highest CO2 emissions, and this overestimation over the agricultural areas might be caused by the nearby CO2 emission sources, which raise the CO2 concentration of the nearby areas through atmospheric transport. Previous studies have demonstrated that the concentration of atmospheric CO2 is influenced by atmospheric transport (Cao et al., 2017; Kumar et al., 2014). The areas where the predicted CO2 emissions are underestimated are covered by agriculture, forest, and vegetation. This underestimation of the predicted CO2 emissions over these areas indicates the presence of uncertainties in the XCO2 anomalies that are likely to be produced by the CO2 uptake of the biosphere, which still remains in the XCO2 anomalies. In addition, the areas where the estimated CO2 emissions are overestimated have higher elevations. OCO-2 observations show larger uncertainties over elevated and mountainous areas, especially the Tibetan Plateau where the OCO-2 retrievals are significantly overestimated (Kong et al., 2019; Mustafa et al., 2020), and this might also contribute to the overestimation of the estimated CO2 emissions. The difference between the estimated and the ODIAC CO2 emissions ranged from −0.06 × 10⁹ to 3.2 × 10⁹ kg, and differences between −1 × 10⁹ and 1 × 10⁹ kg accounted for 84 % of the total number of grid cells. Yang et al. (2019) estimated the CO2 emissions using a similar machine learning approach with GOSAT XCO2 retrievals over China, and the differences between the estimated values and the ODIAC CO2 emissions were between −5 × 10⁹ and 5 × 10⁹ kg. Moreover, the predicted results from the abovementioned study exhibited lower CO2 emissions overall relative to the ODIAC emissions, in contrast to our results. Our study showed better results, which may be due to the fact that (i) we improved the predictive model with the addition of an NPP dataset (Fig. 4e), (ii) we utilized the higher-resolution XCO2 retrievals provided by OCO-2, and (iii) we incorporated the OCO-2 XCO2 retrievals processed using the latest version of the retrieval algorithm. The newer version of the ACOS L2FP retrieval algorithm has improved the quantity and the quality of the satellite-based observations (Taylor et al., 2021).

Figure 5a and b show the spatial distribution of satellite-based estimated CO2 emissions and the actual ODIAC CO2 emissions over West Asia, respectively. The spatial distribution pattern of both the estimated and the original CO2 emissions is similar with some differences in their magnitudes. CO2 emissions in the eastern parts are relatively larger compared with other parts of the region. Figure 5c shows the difference between the estimated values and the ODIAC CO2 emissions. The satellite-based estimated CO2 emissions are generally overestimated compared with the actual ODIAC CO2 emissions. The estimated CO2 emissions are notably larger over Iran and Saudi Arabia. Figure 5d shows the land cover distribution of West Asia. It can be seen that the predicted CO2 emissions are overestimated over the areas that are covered by either urban settlements or bare land. The overestimation of the estimated CO2 emissions over these areas is likely to be caused by atmospheric transport that influences the spatial distribution of atmospheric CO2 (Cao et al., 2017). Moreover, a large part of West Asia is covered by deserts, and these deserts have a notably lower number of OCO-2 retrievals (Fig. 3a).
The overestimation of the predicted CO2 emissions over the largest desert of the region, the Rub' al Khali, located in the southern parts, is likely to be caused by the uncertainties in the satellite-based XCO2 anomalies, and these uncertainties are likely to be produced due to a lower number of OCO-2 retrievals. In addition, a previous study also indicated that the ACOS XCO2 retrieval algorithm showed uncertainties over deserts (Bie et al., 2018). Similar to East Asia, the predicted CO2 emissions over West Asia are also underestimated over areas that are covered by agriculture or vegetation, and this underestimation might be due to the presence of CO2 uptake by the biosphere in the XCO2 anomalies calculated using the satellite-based retrievals. The difference between the estimated values and the ODIAC CO2 emissions ranged from −0.16 × 10⁹ to 2.8 × 10⁹ kg, and differences between −1 × 10⁹ and 1 × 10⁹ kg accounted for 88 % of the total number of grid cells.

3.3 Correlation analysis between OCO-2 XCO2 anomalies and ODIAC emissions

Figure 6 shows the correlation analysis between the ODIAC CO2 emissions and the XCO2 anomalies calculated using the OCO-2 retrievals over East and West Asia. Yang et al. (2019) found that the cluster of XCO2 changes derived from satellite-based observations showed a better and more significant correlation with the CO2 emissions relative to a single grid of XCO2, which might have been due to the fact that the atmospheric CO2 measurement is an instantaneous snapshot of the realistic atmosphere.
For the correlation analysis, we segmented the ODIAC emissions, which were binned every 0.3 t yr^-1 of lgE using mean emissions calculated from annual emissions during 2015–2019, and then carried out an analysis between the mean of the emissions and the mean of the XCO2 anomalies within the binned regions. The results showed a positive and significant correlation between the two datasets. Figure 6a and b show the spatial distribution of segmented ODIAC emissions over East Asia and the scatterplot between the mean of the emissions and the mean of the XCO2 anomalies, respectively. The two datasets show a positive and significant correlation with a coefficient of determination (R^2) of 0.81. The spatial distribution of segmented ODIAC emissions over West Asia and the scatterplot between the mean of the emissions and the mean of the XCO2 anomalies for this region are shown in Fig. 6c and d, respectively. The two datasets showed a good correlation with an R^2 of 0.60. Several studies have correlated satellite-based XCO2 anomalies with CO2 emissions (Fu et al., 2019; Shekhar et al., 2020). Yang et al. (2019) performed a correlation analysis between GOSAT-based XCO2 anomalies and the ODIAC CO2 emissions over China and found a significant correlation with an R^2 of 0.82, which increased up to 0.95 when the analysis was restricted to higher CO2 emission values. In our study, the correlation between the CO2 emissions and XCO2 anomalies is relatively low for West Asia, which might be due to uncertainties in the OCO-2 retrievals. A large part of West Asia is covered by deserts, and, as previously stated, Bie et al. (2018) reported that the ACOS XCO2 retrieval algorithm showed uncertainties over deserts.
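The binned correlation described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not confirmed by the text: lgE is taken to be log10 of the mean annual emission of a grid cell, the regression is run on the per-bin means of lgE and of the XCO2 anomaly, and the function name `binned_correlation` and the synthetic inputs are hypothetical stand-ins rather than OCO-2 or ODIAC data.

```python
import numpy as np


def binned_correlation(emissions, anomalies, bin_width=0.3):
    """Bin grid cells by lgE and correlate the per-bin means.

    Assumption: lgE is log10 of the mean annual emission of a grid cell.
    """
    emissions = np.asarray(emissions, dtype=float)
    anomalies = np.asarray(anomalies, dtype=float)

    # Keep only cells with a positive emission and a finite anomaly.
    valid = np.isfinite(emissions) & np.isfinite(anomalies) & (emissions > 0)
    lg_e = np.log10(emissions[valid])
    anom = anomalies[valid]

    # Assign every cell to a bin of width `bin_width` in lgE.
    bin_index = np.floor((lg_e - lg_e.min()) / bin_width).astype(int)
    occupied = np.unique(bin_index)

    # Per-bin means of lgE and of the XCO2 anomaly.
    mean_lg_e = np.array([lg_e[bin_index == b].mean() for b in occupied])
    mean_anom = np.array([anom[bin_index == b].mean() for b in occupied])

    # Ordinary least-squares line through the bin means and its R^2.
    slope, intercept = np.polyfit(mean_lg_e, mean_anom, 1)
    r_squared = np.corrcoef(mean_lg_e, mean_anom)[0, 1] ** 2
    return mean_lg_e, mean_anom, slope, intercept, r_squared


# Synthetic stand-in data (not OCO-2 or ODIAC values).
rng = np.random.default_rng(42)
lg_e_true = rng.uniform(2.0, 8.0, size=5000)
emissions = 10.0 ** lg_e_true
anomalies = 0.4 * lg_e_true + rng.normal(0.0, 1.0, size=5000)

mean_lg_e, mean_anom, slope, intercept, r_squared = binned_correlation(
    emissions, anomalies
)
print(f"bins: {mean_lg_e.size}, slope: {slope:.2f}, R^2: {r_squared:.2f}")
```

Averaging within bins suppresses the cell-level noise, which is consistent with the observation of Yang et al. (2019) that clusters of XCO2 changes correlate better with emissions than single grid cells do.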

Summary and conclusions
In this study, anthropogenic CO2 emissions were estimated using satellite datasets and a neural-network-based method. The study was carried out using ODIAC CO2 emissions, OCO-2 XCO2, and MODIS NPP datasets from 2015 to 2019. To remove the CO2 seasonal variability and the large background concentration from the OCO-2 XCO2 retrievals, XCO2 anomalies were calculated for each year. A GRNN model was then built; XCO2 anomalies, NPP, and CO2 emissions from 2015 to 2018 were used as a training dataset; and, finally, CO2 emissions were predicted for 2019 based on the NPP and XCO2 anomalies calculated for the same year. The analyses were carried out separately over East and West Asia. The satellite-based estimated values and the ODIAC CO2 emission datasets were compared, and the two datasets showed good agreement in terms of spatial distribution. The estimated CO2 emissions showed better results over East Asia than over West Asia, which might be due to uncertainties in the XCO2 retrievals: previous studies have reported that the ACOS XCO2 retrieval algorithm produced uncertainties over deserts. The predicted CO2 emissions were generally overestimated, and this overestimation was larger over areas close to high-density urban regions. The overestimation might be due to nearby high-emission CO2 sources that raised the XCO2 concentration through atmospheric transport. The satellite-based estimated CO2 emissions were underestimated over some parts of the regions, mostly areas covered by agricultural land and vegetation; this was likely caused by uncertainties in the calculated XCO2 anomalies arising from CO2 uptake by the biosphere. We compared our results with a previous study carried out using a similar predictive model incorporating GOSAT XCO2 retrievals.

The referenced study generally underestimated the predicted CO2 emissions, with larger differences relative to the ODIAC CO2 emissions, in contrast to our results. Our study showed relatively better results, which might be due to several reasons: (i) we improved the predictive model by adding an NPP dataset, (ii) we incorporated OCO-2 XCO2 retrievals, which have a higher spatial resolution than the GOSAT XCO2 retrievals, and (iii) we used an XCO2 product processed using the latest version of the ACOS L2FP retrieval algorithm. The newer version of the algorithm has improved the quantity and the quality of the XCO2 retrievals.
Moreover, a correlation analysis was carried out between the ODIAC CO2 emissions and the OCO-2 XCO2 anomalies, and the results were significant, with R^2 values of 0.81 and 0.60 over East and West Asia, respectively. These results were in agreement with previous studies. The results from our study suggest that CO2 emissions can be estimated using observations obtained from CO2 monitoring satellites. Currently, several satellites orbiting the Earth are dedicated to monitoring atmospheric CO2. Joint utilization of the observations from older and newer satellites, such as OCO-3, GOSAT-2, and TanSat, might reduce the spatiotemporal gaps and uncertainties. In future studies, we intend to improve the GRNN model via the addition of CO2 uptake datasets and the joint utilization of multi-sensor data.
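The train-then-predict workflow summarized in this section can be illustrated with a minimal Python sketch of a GRNN in its standard form, i.e., a Gaussian-kernel weighted average of the training targets. This is a sketch under stated assumptions and not the implementation used in the study: the smoothing parameter `sigma`, the feature standardization, the function name `grnn_predict`, and the synthetic inputs standing in for the XCO2 anomalies, NPP, and emissions are all hypothetical.

```python
import numpy as np


def grnn_predict(X_train, y_train, X_query, sigma=0.3):
    """Standard GRNN output: kernel-weighted mean of the training targets."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)

    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Shift each row by its minimum for numerical stability; the shift
    # cancels in the ratio below.
    d2 = d2 - d2.min(axis=1, keepdims=True)

    # Pattern-layer activations and the normalized weighted sum.
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    return (weights @ y_train) / weights.sum(axis=1)


# Synthetic stand-in data (assumption): column 0 plays the role of the
# XCO2 anomaly and column 1 the role of NPP; y plays the role of emissions.
rng = np.random.default_rng(0)
X_train_raw = np.column_stack(
    [rng.normal(0.0, 1.0, 400), rng.uniform(0.0, 1.0, 400)]
)
y_train = 2.0 * X_train_raw[:, 0] - X_train_raw[:, 1] + rng.normal(0.0, 0.1, 400)
X_2019_raw = np.column_stack(
    [rng.normal(0.0, 1.0, 100), rng.uniform(0.0, 1.0, 100)]
)

# Standardize both feature sets with the training statistics only.
mu, sd = X_train_raw.mean(axis=0), X_train_raw.std(axis=0)
Z_train = (X_train_raw - mu) / sd
Z_2019 = (X_2019_raw - mu) / sd

pred_2019 = grnn_predict(Z_train, y_train, Z_2019, sigma=0.3)
print(pred_2019[:5])
```

Because every prediction is a convex combination of the training targets, a GRNN of this form cannot extrapolate beyond the range of emissions seen during training, and `sigma` controls how strongly the estimate is smoothed across neighboring samples.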
Author contributions. FM carried out the analysis under the supervision of LB, with input and support from QW, NY, MS, MB, RWA, and RI. FM wrote the original article with feedback from all the coauthors.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
