Assessing the Feasibility of Using a Neural Network to Filter OCO-2 Retrievals at Northern High Latitudes

Satellite retrievals of XCO2 at northern high latitudes currently have sparser coverage and lower data quality than most other regions of the world. We use a neural network (NN) to filter OCO-2 B10 bias-corrected XCO2 retrievals and compare the quality of the filtered data to that of data screened with the standard B10 quality control filter. To assess the performance of the NN filter, we use Total Carbon Column Observing Network (TCCON) data at selected northern high latitude sites as a truth proxy. We find that the NN filter decreases the overall bias by 0.25 ppm (~50%), improves the precision by 0.18 ppm (~12%), and increases the throughput by 16% at these sites when compared to the standard B10 quality control filter. Most of the increased throughput comes from the spring, fall, and winter seasons. Throughput decreased during the summer, but as a result the summer bias and precision improved. The main drawback of the NN filter is that it lets through fewer retrievals at the highest latitude Arctic TCCON sites than the B10 quality control filter does, although the lower throughput improves the bias and precision there.


Introduction
Northern high latitude regions are undergoing considerable changes related to climate change. In the Arctic, the annual average temperature has increased three times more than the global annual average (Stocker et al., 2013). The Boreal forest (an important driver of the CO2 seasonal cycle) has seen its growing season lengthen due to climate change (Pulliainen et al., 2017), with an increase in the frequency and severity of forest fires (Seidl et al., 2017). Permafrost soils of the northern high latitudes are a large carbon reservoir, and some fraction of this carbon is vulnerable to release as CO2 and CH4 as the climate warms (Schuur et al., 2015). Changes in the carbon cycle will impact the climate, which in turn will impact the carbon cycle.
Understanding how the carbon cycle is changing at Boreal and Arctic latitudes, including this feedback loop, will be key to predicting future climate change.
In situ atmospheric measurements of CO2 can be used to study how the carbon cycle is changing. However, cost and logistical challenges present barriers to establishing measurement sites at high northern latitudes, limiting the amount of information available about the carbon cycle in the Arctic and Boreal regions. Remote sensing measurements from space can be used to complement the coverage of the current in situ networks (Olsen and Randerson, 2004). Current satellite missions such as the Greenhouse Gases Observing Satellite (GOSAT) (Yokota et al., 2009) and the Orbiting Carbon Observatory 2 (OCO-2) (Crisp et al., 2004) record solar absorption spectra reflected off the Earth's surface, which are used to retrieve column-averaged dry-air mole fractions of CO2 (XCO2), giving regional information on atmospheric CO2. These data can be used to learn about the carbon cycle but require low bias and high precision to be useful (Rayner and O'Brien, 2001).

The density of satellite retrievals of XCO2 from current missions is limited by the amount of available sunlight and the inability to measure through clouds. At high latitudes there is less sunlight available during the colder seasons, decreasing the number of spectra obtained when compared to the mid-latitudes. Furthermore, filtering and bias correction schemes are optimized for mid-latitudes, where more validation datasets are available. This has led to a filter that removes a larger fraction of the high-latitude data than data at mid-latitudes. Scenes with snow are also filtered out because they are thought to be problematic for the retrievals, which decreases the throughput during the colder seasons. In order to improve the quality and throughput of retrievals at high latitudes, in this study we focus on using high-latitude validation XCO2 retrievals to improve the filtering of Northern high-latitude OCO-2 bias-corrected XCO2 retrievals.
The study by Jacobs et al. (2020) showed that, by making modifications to the quality control filtering scheme and bias correction used by OCO-2, one can increase the throughput of OCO-2 retrievals (data version B9) (Kiel et al., 2019; O'Dell et al., 2018) in the Boreal region. This was done by changing limits on the features used in the quality control scheme created in O'Dell et al. (2018). These changes were validated by comparing OCO-2 XCO2 retrievals (Kiel et al., 2019; O'Dell et al., 2018) coincident with XCO2 retrievals from ground-based solar absorption spectra made by remote sensing instruments of the Total Carbon Column Observing Network (TCCON) (Wunch et al., 2011a).
Machine learning algorithms are useful for pattern recognition in complex data sets. Mandrake et al. (2013) was the first study to demonstrate the use of machine learning (using a genetic algorithm) to filter ACOS-GOSAT retrievals and multiple versions of the OCO-2 retrievals using warn levels. There is potential to apply different machine learning algorithms to the Northern high-latitude OCO-2 data set in order to improve the bias, precision, and throughput.
In this study, we investigate the feasibility of using a simple neural network to filter the current OCO-2 data version (B10) XCO2 retrievals at Northern high latitudes. Section 2 outlines the coincidence criteria between OCO-2 and TCCON retrievals and explains how the retrieved XCO2 is adjusted for different averaging kernels and a priori information when comparing OCO-2 to TCCON. Section 3 describes the architecture of the neural network and how it is trained to filter the OCO-2 bias-corrected XCO2 retrievals. In Section 4, the NN-filtered OCO-2 retrievals are compared to the B10 quality control (qc_flag) retrievals to assess the performance of the NN filter. Finally, we discuss the results of the study and future work to improve the NN filtering.
https://doi.org/10.5194/amt-2021-145 Preprint. Discussion started: 19 July 2021 © Author(s) 2021. CC BY 4.0 License.

Coincidence Criteria
The OCO-2 satellite was launched on July 2, 2014 and has been making measurements since mid-September 2014. The instrument on board the satellite is a three-channel, imaging, grating spectrometer that records spectra of reflected sunlight in three spectral bands centered at 0.765 µm, 1.62 µm, and 2.04 µm. These spectra are processed using a "full-physics" retrieval algorithm that retrieves a profile of CO2 (which is used to calculate XCO2) and other geophysical information. In this study we use OCO-2 data that have been processed using the B10 version of the full-physics retrieval algorithm, with the retrieval output and sounding information contained in the B10 lite files. All soundings used in the study were recorded from September 2014 to July 2020.
TCCON is a global network of ground-based Fourier Transform Infrared (FTIR) spectrometers that record direct solar absorption spectra. The high-resolution spectra are processed using the GGG2014 retrieval algorithm, which scales the a priori profile of the gas of interest until the spectrum calculated by the forward model best matches the spectrum recorded by the FTIR (Wunch et al., 2015). GGG2014 retrieves XCO2, XCH4, XCO, XN2O, XHF, and XH2O from a single spectrum. Selected XCO2 TCCON retrievals made in the Boreal and Arctic regions were used as a truth proxy to compare to OCO-2 retrievals.
Filtering for XHF <= 150 ppt was done to avoid the impact of the polar vortex on the TCCON retrievals. Arctic sites such as Eureka and Ny Ålesund routinely record solar absorption spectra while under polar vortex conditions during the spring months.
In some years the polar vortex can reach as far south as 40°N (Whaley et al., 2013). Boreal sites such as East Trout Lake have recorded solar spectra under polar vortex conditions, but on fewer days than at the Arctic sites. Since the GGG2014 retrieval algorithm does a profile-scaling retrieval (Wunch et al., 2015), it relies on good knowledge of the shape of the profile of the gases of interest. The GGG2014 profiles are built without knowledge of the impact of the polar vortex on the shape of the profiles. When XCO2 is retrieved from spectra measured through polar vortex conditions, the shape of the a priori profile generated by the GGG2014 retrieval algorithm will likely be incorrect. This is less of an issue for OCO-2 retrievals because OCO-2 performs a profile retrieval (O'Dell et al., 2018).
The TCCON sites used in this study are not directly influenced by anthropogenic pollution but are still influenced by biomass burning plumes. At sites like East Trout Lake, major enhancements in XCO over background levels are measured, typically in late summer when measurements are made through forest fire plumes. Even a remote Arctic site like Eureka sees forest fire plumes during the summer months (Viatte et al., 2013). In an attempt to avoid a situation where a coincident TCCON measurement is influenced by a plume and the OCO-2 measurement is not, we filter out any TCCON measurement where XCO is elevated above ~150 ppb.
We use the B10 "lite" OCO-2 data product, where the XCO2 values have been corrected for various biases, such as footprint-to-footprint biases and biases that depend on features of the atmosphere, surface, or retrieval algorithm. OCO-2 XCO2 data are also scaled by a global offset term that was derived using the OCO-2 target mode retrievals coincident with TCCON retrievals. In our study, we use all OCO-2 spectra that are coincident with the TCCON spectra acquired in nadir and glint modes over land. The coincidence criteria are: an OCO-2 measurement must be within 150 km of a TCCON station, the absolute temperature difference between the TCCON and OCO-2 temperature profiles at 700 hPa must be <= 2 K (Wunch et al., 2011), and the time difference between the TCCON and OCO-2 measurements must be <= 2 h to avoid the impact of the XCO2 diurnal cycle.
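The three coincidence criteria above can be expressed as a short filter. The sketch below is illustrative only: the record structure and field names are our own, not the actual B10 lite-file or TCCON variable names.

```python
from dataclasses import dataclass

# Hypothetical sounding record; field names are illustrative.
@dataclass
class Sounding:
    distance_km: float  # great-circle distance to the TCCON station
    t700_K: float       # temperature of the profile at 700 hPa
    time_h: float       # measurement time in hours (UTC)

def is_coincident(oco2: Sounding, tccon: Sounding) -> bool:
    """Apply the three coincidence criteria from the text."""
    return (oco2.distance_km <= 150.0                 # within 150 km
            and abs(oco2.t700_K - tccon.t700_K) <= 2.0  # |dT at 700 hPa| <= 2 K
            and abs(oco2.time_h - tccon.time_h) <= 2.0)  # |dt| <= 2 h
```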
To compare TCCON and OCO-2 retrievals, one has to take into account that GGG2014 and the OCO-2 retrieval algorithm obtain information about atmospheric CO2 from different spectral regions (which have peak sensitivity at different altitudes) and use different a priori information. To adjust the OCO-2 bias-corrected XCO2 retrievals to take into account the a priori profile used in the GGG2014 retrieval, the following formula is used:

$$\tilde{x}_{\mathrm{CO_2}}^{\mathrm{OCO\text{-}2}} = x_{\mathrm{CO_2}}^{\mathrm{OCO\text{-}2}} + \sum_{j} h_j^{\mathrm{OCO\text{-}2}} \left(1 - a_j^{\mathrm{OCO\text{-}2}}\right)\left(\vec{x}_{a,j}^{\,G} - \vec{x}_{a,j}^{\,\mathrm{OCO\text{-}2}}\right) \qquad (1)$$

where $x_{\mathrm{CO_2}}^{\mathrm{OCO\text{-}2}}$ is the original bias-corrected XCO2 value found in the lite files, $h^{\mathrm{OCO\text{-}2}}$ is the OCO-2 pressure weighting vector, $a^{\mathrm{OCO\text{-}2}}$ is the OCO-2 total column averaging kernel vector, $\vec{x}_{a}^{\,\mathrm{OCO\text{-}2}}$ is the XCO2 a priori profile used in the OCO-2 retrieval, and $\vec{x}_{a}^{\,G}$ is the XCO2 a priori profile used in the GGG2014 retrieval, interpolated onto the OCO-2 retrieval pressure grid.
OCO-2 retrieves information about CO2 from the strong CO2 band (centered at 2.04 µm) and the weak CO2 band (centered at 1.62 µm) (O'Dell et al., 2018). TCCON retrieves information from two weak CO2 bands, centered at 1.62 and 1.57 µm (Wunch et al., 2011b), but not from the strong CO2 band. This results in the OCO-2 retrievals having different vertical sensitivities compared to the TCCON retrievals. To take this into account when comparing OCO-2 and TCCON retrievals, the following formula is used to adjust the TCCON retrieved XCO2:

$$\tilde{x}_{\mathrm{CO_2}}^{\mathrm{TCCON}} = x_{\mathrm{CO_2},a}^{G} + \sum_{j} h_j^{\mathrm{OCO\text{-}2}}\, a_j^{\mathrm{OCO\text{-}2}} \left(\gamma - 1\right) \vec{x}_{a,j}^{\,G} \qquad (2)$$

where $x_{\mathrm{CO_2},a}^{G}$ is the integrated a priori profile used in the GGG2014 retrieval, $h^{\mathrm{OCO\text{-}2}}$ is the OCO-2 pressure weighting vector, $a^{\mathrm{OCO\text{-}2}}$ is the OCO-2 total column averaging kernel vector, $\vec{x}_{a}^{\,G}$ is the XCO2 a priori profile used in the GGG2014 retrieval, and $\gamma$ is the TCCON XCO2 value divided by $x_{\mathrm{CO_2},a}^{G}$. Ideally $\gamma$ should be the scaling factor determined by the GGG2014 retrieval, but this value does not take into account the airmass-dependence correction and aircraft calibration factor applied in post-processing to the retrieved XCO2. The vectors $a^{\mathrm{OCO\text{-}2}}$ and $\vec{x}_{a}^{\,G}$ have been interpolated onto a 20-layer pressure grid using the surface pressure measured at the TCCON site.
The bias between coincident TCCON and OCO-2 retrievals is calculated by taking the difference between $\tilde{x}_{\mathrm{CO_2}}^{\mathrm{OCO\text{-}2}}$ (Eq. 1) and $\tilde{x}_{\mathrm{CO_2}}^{\mathrm{TCCON}}$ (Eq. 2):

$$\Delta \mathrm{XCO_2} = \tilde{x}_{\mathrm{CO_2}}^{\mathrm{OCO\text{-}2}} - \tilde{x}_{\mathrm{CO_2}}^{\mathrm{TCCON}} \qquad (3)$$
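As a concrete numerical illustration of the comparison methodology, the following sketch applies the prior adjustment, averaging-kernel smoothing, and differencing on a toy 3-layer grid (the real comparison uses a 20-layer grid). All profile values, weights, and kernels here are invented for illustration and are not from any actual sounding.

```python
# Toy 3-layer example of the OCO-2/TCCON comparison adjustments (ppm).
h = [0.2, 0.5, 0.3]               # OCO-2 pressure weighting vector (sums to 1)
a = [0.9, 1.0, 0.8]               # OCO-2 column averaging kernel
xa_oco2 = [400.0, 402.0, 404.0]   # OCO-2 a priori XCO2 profile
xa_ggg = [401.0, 402.0, 403.0]    # GGG2014 a priori profile on the same grid

x_oco2 = 405.0                    # bias-corrected XCO2 from the lite file
x_tccon = 404.5                   # TCCON XCO2
xa_ggg_col = sum(hj * xj for hj, xj in zip(h, xa_ggg))  # integrated GGG prior
gamma = x_tccon / xa_ggg_col                            # TCCON scale factor

# Adjust OCO-2 to the GGG2014 a priori (prior substitution).
x_oco2_adj = x_oco2 + sum(hj * (1 - aj) * (xg - xo)
                          for hj, aj, xg, xo in zip(h, a, xa_ggg, xa_oco2))

# Smooth TCCON with the OCO-2 averaging kernel.
x_tccon_adj = xa_ggg_col + sum(hj * aj * (gamma - 1) * xg
                               for hj, aj, xg in zip(h, a, xa_ggg))

# The bias between the coincident pair.
delta_xco2 = x_oco2_adj - x_tccon_adj
```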

Neural Network Architecture and Training
To filter the OCO-2 data, we use a three-layer neural network (NN) that consists of an input layer, a hidden layer, and an output layer. The design of the NN is based on the book by Nielsen (2015). The input layer takes the values of the features of the OCO-2 retrievals that are given in the B10 lite files. Table 1 lists all the features used by the NN. The hidden layer contains the "neurons" where the calculations are done. Each input is connected to a neuron by a weight. The calculation for a single neuron $k$ in a NN with $K$ neurons is given by:

$$z_k = \sum_{i} w_{k,i}\, x_i + b_k \qquad (4)$$

where $x_i$ is the value of feature $i$, $w_{k,i}$ is the weight on feature $i$ for neuron $k$, and $b_k$ is the bias associated with neuron $k$. There is a total of 37 neurons, which is the total number of features plus one. An activation function is commonly applied to the neuron in order to introduce some non-linearity into the neuron calculation and to make sure that small changes in the values of $w_{k,i}$ and $b_k$ result in small changes in the final output values when training the NN (Nielsen, 2015). The sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}} \qquad (5)$$

is used as the activation function. Each neuron is linked to the final output value by a weight $v_k$. The output value is given by:

$$\hat{y} = \sigma\!\left(\sum_{k} v_k\, \sigma(z_k) + c\right) \qquad (6)$$

where $c$ is the offset and everything else is as described before. Applying the sigmoid activation function in Eq. (6) ensures that $\hat{y}$ will have a value between 0 and 1. This is useful for binary classification: we use the NN to classify an OCO-2 retrieval as either "good" or "bad" by treating a calculated value close to 0 as "good" and a calculated value close to 1 as "bad".
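A minimal forward pass for this architecture might look like the following. This is a sketch based on the equations above, not the authors' code; the trained weights for the actual 36 features and 37 neurons are those given in the supplementary file.

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def nn_output(x, w, b, v, c):
    """Forward pass of the single-hidden-layer NN.

    x: standardized feature values (length = number of features)
    w: w[k][i], weight from feature i to hidden neuron k
    b: b[k], bias of hidden neuron k
    v: v[k], weight from hidden neuron k to the output
    c: output offset
    """
    y = c
    for k in range(len(w)):
        # Weighted sum of inputs for neuron k, plus its bias.
        z_k = sum(w_ki * x_i for w_ki, x_i in zip(w[k], x)) + b[k]
        y += v[k] * sigmoid(z_k)
    # Output in (0, 1); ~0 means "good" retrieval, ~1 means "bad".
    return sigmoid(y)
```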

For the NN to work, the values of $w_{k,i}$, $b_k$, $v_k$, and $c$ need to be determined. This was done by using a subset of the OCO-2 coincident retrievals to train the NN. The coincident data set consists of co-located OCO-2 soundings at the following TCCON sites: East Trout Lake (et), Eureka (eu), Bremen (br), Białystok (bi), Sodankylä (so), Ny Ålesund (sp), and Rikubetsu (rj). We withhold the Park Falls (pa) data set so that it can be a completely independent source of validation. The coincident data were split into three datasets: training, testing, and validation. For the training and testing data, 20% of the data were randomly selected to go into each data set, with the remaining 60% used for validating the results. In order to train the NN, one needs to know the input values of the training data set (which are the values of the features in the B10 lite files) and the expected output value ($y$). The expected output value was set to $y = 0$ if the difference between a coincident OCO-2 and TCCON retrieval is within ±2.5 ppm and set to $y = 1$ if the magnitude of the difference is > 2.5 ppm. Fig. 2a shows the histogram of the difference between coincident OCO-2 and TCCON retrievals as well as the boundaries separating data into expected values of 0 and 1. All data between the red dashed lines were set to $y = 0$ (or "good"), and data outside of the boundary were set to $y = 1$ (or "bad").
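The labelling rule and the 20/20/60 split described above can be sketched as follows. The ±2.5 ppm threshold and split fractions come from the text; the function names and the fixed random seed are our own.

```python
import random

def label(delta_xco2: float, threshold: float = 2.5) -> int:
    """Expected output y: 0 ("good") if |ΔXCO2| <= 2.5 ppm, else 1 ("bad")."""
    return 0 if abs(delta_xco2) <= threshold else 1

def split(records, seed: int = 0):
    """Randomly assign 20% / 20% / 60% to training / testing / validation."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n20 = len(shuffled) // 5
    return shuffled[:n20], shuffled[n20:2 * n20], shuffled[2 * n20:]
```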
To achieve the best results when training the NN, we standardize the values of the input features so that each feature has a similar range of values. This is helpful because the features have different units and orders of magnitude and, if left as is, the NN will place much more importance on features that have large absolute values than on features with smaller values. To standardize the input features, the following formula is used:

$$x_i' = \frac{x_i - \mu_i}{\sigma_i} \qquad (7)$$

where $x_i$ is as before, $\mu_i$ is the mean of the values of feature $i$ from the training data set, and $\sigma_i$ is the standard deviation of the values of feature $i$ from the training data set. This means that $x_i'$ is used in Eq. (4) instead of $x_i$. The supplementary Excel file (sheet "Standardize values") contains the $\mu_i$ and $\sigma_i$ for each of the features, to be used to standardize the data before it is inputted into the NN.
To determine the values of $w_{k,i}$, $b_k$, $v_k$, and $c$, they are initially set randomly to values between ±1. Using the training data set, $\hat{y}$ is calculated for all the data using these initial values. The performance of the NN is then determined by comparing the calculated value ($\hat{y}$) to the expected output value ($y$) using the cross-entropy (log loss) cost function:

$$C = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \ln \hat{y}_n + \left(1 - y_n\right) \ln\left(1 - \hat{y}_n\right) \right] \qquad (8)$$

where $N$ is the total number of OCO-2 retrievals in the training data set. If $\hat{y}_n = y_n$ for all retrievals, then $C$ will equal zero, meaning the values of $w_{k,i}$, $b_k$, $v_k$, and $c$ perfectly determine whether an OCO-2 retrieval is good or bad. This is unlikely to happen for the initial values of those variables since they are set randomly, so $C$ will be > 0. To minimize the value of $C$, the values of $w_{k,i}$, $b_k$, $v_k$, and $c$ are adjusted using the partial derivatives of the cost function with respect to each (i.e., $\partial C / \partial w_{k,i}$, $\partial C / \partial b_k$, $\partial C / \partial v_k$, and $\partial C / \partial c$). In principle, this should be iterated until $C = 0$, but in practice the classification used to set up the training data is not perfect. The assumption made when setting up the classification of the training data is that if $-2.5\ \mathrm{ppm} < \Delta \mathrm{XCO_2}\ \text{(Eq. 3)} < 2.5\ \mathrm{ppm}$ then it is a good OCO-2 retrieval, but this might not be true.
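The standardization and cost function above translate directly into code. This is a sketch, not the authors' implementation; the per-feature means and standard deviations would come from the supplementary file.

```python
import math

def standardize(x, mu, sigma):
    """Rescale each feature using the training-set mean and standard deviation."""
    return [(xi - mi) / si for xi, mi, si in zip(x, mu, sigma)]

def log_loss(y_true, y_hat) -> float:
    """Cross-entropy cost averaged over the N training retrievals."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_hat)) / n
```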
It could be the case that the OCO-2 retrieval has adjusted its parameters as much as possible to achieve the best possible fit to the measured spectra, but the retrieved parameters deviate from the true values while still providing an integrated profile that is close to the TCCON XCO2. Such a retrieval would be mis-classified as good, so the cost function will never reach 0. To stop training the NN, two cutoffs were placed: the maximum number of iterations is 5000, or the difference in accuracy between the training and testing data falls below 3%. When training the NN, the accuracy of the training data and the testing data is calculated on each iteration and compared. Since the data were set up in a binary classification (i.e., 0 or 1), on each iteration, if a calculated value was <= 0.1 (unitless) the classification was set to 0, and if > 0.1 the classification was set to 1. These classification values were compared to the expected classification values on each iteration to get the accuracy of the training and testing data sets.
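The 0.1 cutoff and the per-iteration accuracy calculation can be sketched as below; the function names are our own.

```python
def classify(y_hat, cutoff: float = 0.1):
    """Binarize the NN output: <= 0.1 -> 0 ("good"), > 0.1 -> 1 ("bad")."""
    return [0 if p <= cutoff else 1 for p in y_hat]

def accuracy(y_true, y_pred) -> float:
    """Fraction of retrievals whose binarized output matches the label."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)
```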
The testing dataset is not used when determining the values of $w_{k,i}$, $b_k$, $v_k$, and $c$; rather, it is used as an independent data source to make sure that the NN is not overfitting the training data. The derived values of $w_{k,i}$, $b_k$, $v_k$, and $c$ can be found in the supplementary Excel file, with the values of $w_{k,i}$ in sheet w1, $b_k$ in sheet b1, $v_k$ in sheet w2, and $c$ in sheet b2.
Figure 3 shows $\Delta \mathrm{XCO_2}$ as a function of the $\hat{y}$ value calculated by the NN for all three data sets. Fig. 3a shows that the OCO-2 retrievals with calculated values close to 0 have the smallest spread in $\Delta \mathrm{XCO_2}$, while calculated values close to 1 have the largest spread. This pattern is seen in all three of the datasets. The density plot shown in Fig. 3b confirms that for most of the data the calculated values are <= 0.1. There is no clear separation of the data (i.e., good retrievals <= 0.1 and bad retrievals >= 0.9), as one would expect from a binary classifier. This could be because there are many combinations of feature values that can lead to a bad retrieval. Another possibility is that most of the training data were classified as good, so there are more examples of good retrievals than bad retrievals to learn from.

Validation
To validate the NN filtering, the validation data set was separated into two data sets. One data set was the OCO-2 bias-corrected XCO2 values filtered using the NN filter, and the other was filtered using the B10 qc_flag=0. Since the validation data set was not used in the training of the NN, it is an independent data set kept aside to assess the performance of the NN filter. Table 2
shows the bias, scatter, and number of retrievals for the entire validation data set (All) and at each site when applying either the NN or qc_flag filter. The overall XCO2 bias using the NN filter is half that of the qc_flag filter, the scatter has decreased by 0.18 ppm, and the throughput has increased by 16%. The NN filter reduces the bias at every site except Eureka and Rikubetsu. The precision is better at every site when the NN filter is applied to the validation data. The throughput has increased at every site when the NN filter is used, except for the Arctic sites (Eureka and Ny Ålesund). Park Falls data were not used to train the NN filter because the site is slightly outside of the Boreal domain, and they are used as a completely independent data set to validate the NN filter. When the NN filter is applied to Park Falls data, the bias remains the same, the precision decreases by 0.09 ppm, and the throughput increases by ~20%.

The reduction in throughput at the Arctic sites occurs because the distribution of data at the Arctic sites differs from that at all other sites, as shown in Fig. 2b. The peaks of the histograms for the Arctic sites are closer to the boundaries used to classify the training data as "good" or "bad", so almost half of the data are set to "bad" when training the NN. Fig. 4 shows the pass rate for the NN filter given the value of the solar zenith angle (Fig. 4a), sensor zenith angle (Fig. 4b), and altitude standard deviation (Fig. 4c). In all three plots the data are binned, with the blue dots showing the number of OCO-2 soundings that pass the NN filter divided by the total amount of data in each bin, multiplied by 100. The pink bars are the histogram of OCO-2 soundings coincident with Eureka TCCON data. Fig. 4a shows that the coincident OCO-2 soundings are made at solar zenith angles between 58° and 85°, with the blue dots showing that between 0% and 30% of the soundings with these values pass the NN filter.
Similarly, Fig. 4b shows that the coincident OCO-2 soundings are made at high sensor zenith angles, which are less likely to pass the NN filter. Most of the coincident OCO-2 soundings at Eureka are made over land that contains significant topographic variability. Fig. 4c shows the altitude standard deviation, which is the standard deviation of the elevation (in meters) of the field of view of the sounding. The plot shows that at an altitude standard deviation of ~50 m, only 30% of the soundings pass the NN filter. The combination of high airmass and variable topography decreases the throughput at Eureka.
For further validation, the seasonal bias, scatter, and number of retrievals that pass the filters at each site are compared. Fig. 5 shows the bias at each site for spring, summer, fall, and winter when the NN filter is applied to the validation data (solid bars) and when the qc_flag filter (dashed bars) is used on the same validation data set. For most sites and seasons, the magnitudes of the biases for the two filtering schemes are similar, although in most cases the NN filter has a lower absolute bias than the qc_flag filter. The NN filter significantly improves the bias at Sodankylä and Rikubetsu during spring, Ny Ålesund during summer, and Rikubetsu and Bremen during winter. Both the NN filter and the qc_flag show a positive bias between OCO-2 and TCCON in summer. The NN filter is able to reduce this summer bias, but it still remains. At Park Falls the bias between the two filters is similar for the different seasons, with the qc_flag showing a lower bias in summer and the NN filter decreasing the bias in winter.
Figure 6 shows the precision at each site for spring, summer, fall, and winter when the different filters are applied to the validation data. The precision is very similar (i.e., within 0.2 ppm) for most sites during the different seasons. The NN filter improves the precision (by more than 0.2 ppm) at Rikubetsu during spring, Eureka and Ny Ålesund during summer, and Białystok and Rikubetsu during fall. However, the qc_flag filter has much better precision at Sodankylä during spring when compared to the NN filter.
Figure 7 shows the number of retrievals that pass each filter for the different sites during spring, summer, fall, and winter. At most sites, the NN filter lets through more retrievals than the qc_flag filter during spring, fall, and winter. In summer the qc_flag filter has a slightly higher throughput than the NN filter at most sites. This decrease in throughput during summer helps improve the bias and precision, as seen in Figs. 5b and 6b. There is a significant increase in throughput at East Trout Lake during spring with the NN filter, and it even produces some retrievals in winter. At Park Falls the throughput has increased in spring, fall, and winter but is significantly decreased during summer. The decrease in summer occurs because the NN filter is trained on data that show a bias during summer, which it decreases by filtering out more data than the qc_flag filter. Even though Park Falls is not in the Boreal domain, its scene type (forest) is similar to East Trout Lake. The NN has no information on the time of year, but it does have information on the surface type through the albedo values, which change with the time of year. It is most likely that what the NN learned from the East Trout Lake data is influencing how the NN filters the data at Park Falls.
Some of the increase in throughput with the NN filter during spring, fall, and winter can be explained by the fact that the qc_flag filter tries to filter out spectra that have been recorded over snow scenes. The snow_flag, found in the B10 lite files, is used to indicate the presence of snow in the scene. We applied this snow_flag to the NN-filtered data to see if the NN filter removes all soundings over snow. From the validation data set, 3219 retrievals have snow_flag = 1, with 785 of those retrievals passing the NN filter. This means that the NN filter passes about 24% of the OCO-2 soundings made over snow. This is much lower than the general case (all scenes), where greater than 40% of the data pass both the NN and qc_flag filters. The bias over snow scenes compared to TCCON, from all the retrievals that pass the NN filter in the validation data set, is 0.13 ± 1.44 ppm. Since the precision is lower over snow, it makes sense that the throughput over snow is lower than in the general case. At Park Falls, 1032 soundings that pass the NN filter are snow scenes, with 727 in winter and 302 in spring. Thus a significant fraction of the throughput during winter at Park Falls comes from snow scenes. The bias of snow scenes at Park Falls was found to be 0.12 ± 1.41 ppm.
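The quoted ~24% snow-scene pass rate follows directly from the stated counts:

```python
# Snow-scene pass rate for the NN filter, from the counts given in the text.
snow_total = 3219    # validation retrievals with snow_flag = 1
snow_passed = 785    # of those, retrievals that pass the NN filter
pass_rate = 100.0 * snow_passed / snow_total  # ~24%
```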
The NN filter was applied to all OCO-2 B10 data at latitudes greater than 45°N to determine the throughput in the Boreal and Arctic regions. Fig. 8 shows the percent difference (NN minus qc_flag, divided by qc_flag, and multiplied by 100) between the number of soundings that pass the filters. The maximum value of the percent difference was capped at 100%. The throughput with the NN filter is greater than with the qc_flag filter in spring and winter, while the throughput with the qc_flag filter is greater than with the NN filter in summer and fall. This is consistent with what is seen at the individual TCCON sites. Over Greenland the throughput has increased with the NN filter regardless of season, because the qc_flag filter removes all data over Greenland with the snow_flag filter. During fall, the throughput has increased at latitudes greater than 70°N because the NN filter is letting through soundings that were recorded over snow scenes.
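The capped percent-difference metric used in Fig. 8 can be written as a small helper. The zero-denominator guard (for regions such as Greenland, where the qc_flag filter passes no soundings) is our own assumption about how such bins would be displayed at the cap.

```python
def percent_difference(n_nn: int, n_qc: int, cap: float = 100.0) -> float:
    """(NN - qc_flag) / qc_flag * 100, capped at +100% as in Fig. 8.

    When the qc_flag filter passes no soundings (n_qc == 0), the relative
    increase is unbounded, so such bins are shown at the cap (an assumption).
    """
    if n_qc == 0:
        return cap
    return min(100.0 * (n_nn - n_qc) / n_qc, cap)
```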

Discussion and Conclusions
In this study, a neural network was used to filter the OCO-2 bias-corrected XCO2 data collected near northern high-latitude TCCON stations, as described in Section 3. The performance of the NN filter was assessed by comparing the bias, precision, and throughput to those of the quality-control-filtered data. There was an improvement in the bias, precision, and throughput both overall and at most sites, as well as improvements in the bias in different seasons. However, the NN filter decreases the throughput at Eureka because it finds that OCO-2 soundings made at high solar zenith angles, high sensor zenith angles, and over variable topography are problematic.

This study shows the potential of using a neural network to filter OCO-2 retrievals, which could be useful in future filtering schemes for OCO-2 or other satellite missions. However, there are potential drawbacks to the methodology presented here. We focus on data near northern high-latitude TCCON stations and so do not sample globally representative ranges of surface properties or airmasses. Fig. 1 shows the limited coverage that the TCCON sites provide, with no coverage over Greenland and most of the Eurasian Boreal region. The effectiveness of the NN filter depends on how well the NN is trained. We train the NN using OCO-2 data coincident with TCCON data, so the NN filter is trained only under atmospheric conditions observed at the northern high-latitude TCCON sites. We have shown that this way of training the NN is effective when validated against northern high-latitude TCCON data. When the NN filter was applied to Park Falls data, which were not used in the training of the NN, we found that the bias was similar to the qc_flag filter, with a decrease of 0.09 ppm in precision, but a 20% increase in throughput. Although the throughput increased in the spring, fall, and winter seasons, it decreased substantially during summer. The decrease in throughput in summer led to improved bias and precision values at all the TCCON sites used in the training of the NN, but not at Park Falls. This is because the NN has found a pattern that improves the training data set but is less applicable to Park Falls data. The qc_flag filter lets through almost twice as much data as the NN filter during summer, with a decrease in precision of only 0.09 ppm compared to the NN filter. The NN filter is therefore sub-optimal at Park Falls (during summer), and if one applied the NN filter to data that are not similar to the northern high-latitude data used to train the NN, it would not be as effective.
The effectiveness of the NN filter is dependent on the data set used to train the NN. In this study we assume that the TCCON data represent the truth and that any bias we see is in the OCO-2 retrievals. The NN tries to decrease the bias it sees as much as possible, and if there is a bias in the TCCON data, it will attribute this to a bias in the OCO-2 data and treat that data as bad. One way to be less influenced by the TCCON data is to use a small-area approximation, where XCO2 is assumed constant within a small region (O'Dell et al., 2018). While the absolute value of the retrieval cannot be evaluated using a small-area analysis, variability within the small area can, and this would vastly increase the dataset size used in the NN and improve the range of surface properties, atmospheric conditions, and airmasses represented by the training dataset. This small-area approach will be investigated in a future study.
The accuracy of XCO2 observations over the northern high latitudes, and the loss of data there due to filtering, has been a longstanding issue with OCO-2 and GOSAT data versions to date, which has limited the scientific community's ability to apply these data to investigate important northern high-latitude carbon cycle science questions. This paper demonstrates that a neural network approach can be used to increase the number of soundings at northern high latitudes, while also improving the bias, precision, and throughput depending on the site.
Continual efforts to improve northern high-latitude retrievals and filtering will benefit not only current missions but also future XCO2 missions like MicroCarb (Pasternak et al., 2017), GOSAT-GW (Kasahara et al., 2020), and CO2M (Sierk et al., 2019), which will make global observations that include the northern high latitudes, and even more so missions under consideration like AIM-North (Nassar et al., 2019), which is dedicated to observing the Arctic and Boreal atmosphere.

Author Contributions
JM designed the study, developed the neural network code, analysed the results, and wrote the paper. RN provided input into the analysis. DW provided insight into the use of TCCON data, comparisons between TCCON and OCO-2, as well as the overall analysis. CO provided critical analysis of the choice of coincidence criteria, filtering of OCO-2 data, and insight into the OCO-2 bias correction. RK, IM, JN, CP, KS, and DW are involved with the operation of the TCCON sites, data processing, and use of TCCON data in this study. All authors read and provided feedback to JM on the paper.

Competing Interests
Kimberly Strong and Justus Notholt are associate editors of AMT. There are no other conflicts of interest.