The Berkeley Environmental Air-quality and CO2 Network: field calibrations of sensor temperature dependence and assessment of network scale CO2 accuracy

The majority of global anthropogenic CO2 emissions originate in cities. We have proposed that dense networks are a strategy for tracking changes to the processes contributing to urban CO2 emissions and suggested that a network with ∼2 km measurement spacing and ∼1 ppm node-to-node precision would be effective at constraining point, line and area sources within cities. Here we report on an assessment of the accuracy of the Berkeley Environmental Air-quality and CO2 Network (BEACO2N) CO2 measurements over several years of deployment. We describe a new procedure for improving network accuracy that accounts for and corrects the temperature-dependent zero offset of the Vaisala CarboCap GMP343 CO2 sensors used. With this correction we show that a total error of 1.6 ppm or less can be achieved for networks that have a calibrated reference location and 3.6 ppm for networks without a calibrated reference.

2 ppm, making the sensors potentially useful for ambient air-quality monitoring. Recently, Müller et al. (2020) evaluated the potential applications of a low-cost CO2 NDIR sensor network for resolving site-specific CO2 signals in Switzerland. The calibration method of Müller et al. (2020) involved laboratory chamber calibrations of over 300 low-cost NDIR CO2 sensors and ambient co-location with a reference instrument prior to deployment, as well as regular monitoring and drift correction during a 2-year deployment period. Shusterman et al. (2016) developed an in situ method for calibrating and correcting for individual instrument biases and temporal drifts of the Vaisala CarboCap GMP343 CO2 instruments deployed in the BEACO2N nodes. Using this method, Shusterman et al. (2018) demonstrated that the BEACO2N network could provide highly sensitive detection of changes to traffic emissions at a scale relevant to policy concerns. Shusterman et al. (2018) also illustrated the efficacy of the BEACO2N network in showing both regional CO2 emissions and local CO2 enhancements at the scale of a single neighborhood. In an analysis of the BEACO2N observations for 6 weeks before and after the COVID-19 shutdown, Turner et al. (2020) showed that a 25% change in emissions is easily derived by an inverse model and that hourly variations in emissions can be inferred.
The use of a large number of low-cost CO2 sensors introduces challenges regarding accuracy and inconsistent behavior between instruments that often require labor-intensive regular calibration, data correction and filtering, and validation by comparison to a smaller number of frequently calibrated high-accuracy instruments. In particular, the low-cost NDIR absorption sensor used in each BEACO2N node (Vaisala CarboCap GMP343) is susceptible to temporal drift and fluctuations due to environmental variables that present challenges to achieving a goal of 1 ppm network error (van Leeuwen, 2010; Shusterman et al., 2016). Correction of the Vaisala CarboCap GMP343 instruments (Vaisala, hereafter) for changes in pressure, temperature, and humidity is required for accurate measurements (Vaisala, 2013). The typical correction for pressure and temperature accounts for changes in the number density of CO2 according to the ideal gas law (van Leeuwen, 2010; Vaisala, 2013; Shusterman et al., 2016). The humidity effect on measured CO2 is accounted for by considering the dilution effect of water vapor according to Dalton's law of partial pressures (van Leeuwen, 2010; Vaisala, 2013; Shusterman et al., 2016). However, even after accounting for these factors, reported corrected CO2 concentrations for the Vaisala instrument have been observed to exhibit a strong temperature dependence of up to 1 ppm/°C (van Leeuwen, 2010). Using a laboratory calibration procedure, van Leeuwen (2010) found that a linear correction was necessary to account for the residual temperature dependence. However, correcting for the temperature dependence using lab calibrations is labor intensive, as the temperature dependence is unique for each Vaisala sensor. Regular laboratory temperature calibration would also be required to account for temporal variations in the temperature correction as sensors age.
For a high-density urban network like BEACO2N, this would require substantial time investment by trained personnel. The associated high labor costs defeat the purpose of using low-cost sensors. In situ field calibration of the Vaisala sensors thus presents a more attractive method for correcting for the temperature dependence of the CO2 measurements.
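The standard pressure, temperature, and humidity corrections described above can be illustrated with a minimal Python sketch. It assumes the raw reading is a number-density-based value referenced to 25 °C and 1013.25 hPa; those reference values, the function name `to_dry_stp`, and its argument names are illustrative assumptions and are not taken from the Vaisala documentation or from the BEACO2N processing code.

```python
# Assumed reference conditions (NOT stated in the text): 25 degC, 1013.25 hPa.
T_REF_K = 298.15
P_REF_HPA = 1013.25


def to_dry_stp(co2_raw_ppm, temp_k, p_tot_hpa, p_h2o_hpa):
    """Convert a raw NDIR reading to a dry-air value at reference T and P.

    Hypothetical helper: the exact form used by the network is an assumption.
    """
    # Ideal gas law: number density scales with P/T, so rescale to reference.
    density_factor = (P_REF_HPA / p_tot_hpa) * (temp_k / T_REF_K)
    # Dalton's law: water vapor dilutes CO2, so divide by the dry-air fraction.
    dry_fraction = 1.0 - p_h2o_hpa / p_tot_hpa
    return co2_raw_ppm * density_factor / dry_fraction


# At reference conditions in dry air, the reading is unchanged.
unchanged = to_dry_stp(400.0, T_REF_K, P_REF_HPA, 0.0)
# Humid air (20 hPa water vapor) raises the dry-air value slightly.
humid = to_dry_stp(400.0, T_REF_K, P_REF_HPA, 20.0)
```

The residual temperature dependence discussed in this paper is whatever remains after a correction of this general kind has been applied.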

BEACO2N network
The Berkeley Environmental Air-quality and CO2 Network (BEACO2N) Bay Area deployment currently consists of 73 nodes spaced at ∼2 km intervals with locations in Alameda, San Francisco, Contra Costa, Sonoma, Sacramento, and Solano counties.
Each node contains a Vaisala CarboCap GMP343 CO2 sensor, along with a Shinyei PPD42NS nephelometric particulate matter sensor and several Alphasense electrochemical sensors for measuring CO, NO, NO2, and O3 (CO-B4, NO-B4, either NO2-B42F or NO2-B43F, and either Ox-B421 or Ox-B431). The most recent version adds a Plantower PMS 5003 aerosol sensor. Sensors are assembled into compact, weatherproof enclosures with air flow through the enclosure provided by two 30 mm fans. Data is compiled with a Raspberry Pi microprocessor and an Adafruit Metro Mini microcontroller. Data is acquired every 5 or 10 s and is transferred to a central server via an Ethernet or Wi-Fi connection. Observations are posted on the BEACO2N website within a few hours of measurement time (http://beacon.berkeley.edu).
The calibration procedure for the Vaisala CarboCap GMP343 CO2 sensor is as outlined in Shusterman et al. (2016, 2018). Briefly, deployed Vaisala sensors operate with the internal relative humidity (RH), temperature, and pressure compensation set to "off" and the oxygen correction set to "on", with oxygen input as 20.95%. A post hoc multiplicative scale factor is applied to convert the raw CO2 outputs to the mole fraction of CO2 that would be measured if the observed air parcel were dried and brought to standard temperature and pressure ([CO2]_STP). Raw CO2 data is adjusted using temperature (T) measured by the internal thermometer of the Vaisala. Water vapor pressure (P_H2O) and air pressure (P_tot) are obtained from the pressure and dew point temperature measured inside each node enclosure by a Bosch Sensortec Adafruit BME280 sensor. The [CO2]_STP is then adjusted to account for temporal drift in the instrument "zero" by comparing the background signal of the Vaisala CO2 measurement at each node to a reference Picarro G2301 system, located at the Richmond Field Station in Richmond, CA (Fig. 1) (Eq. 1-2).
where m_t is the temporal drift (ppm day^-1) and b is a constant atemporal bias.
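As a rough illustration of the drift adjustment, the sketch below fits a straight line to the offset between a node's background signal and the reference, then subtracts the fitted line from the node data. How the background offsets are extracted is not shown here; the `offsets` input, the function names, and the sign convention (node minus reference) are assumptions made for this example only.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_drift(days, co2_stp, offsets):
    """Fit offset = m_t * day + b, then subtract the fitted offset.

    Assumption: offsets are (node background - reference background) in ppm.
    """
    m_t, b = fit_line(days, offsets)
    corrected = [c - (m_t * d + b) for d, c in zip(days, co2_stp)]
    return m_t, b, corrected


# Synthetic example: a 5 ppm bias that grows by 0.05 ppm per day.
days = [0.0, 10.0, 20.0, 30.0]
offsets = [5.0, 5.5, 6.0, 6.5]
co2_stp = [420.0 + o for o in offsets]
m_t, b, corrected = remove_drift(days, co2_stp, offsets)
```

In this toy case the fit recovers the imposed drift and bias, and the corrected series returns to the 420 ppm background.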

Picarro reference instrument
A "supersite" with reference grade instruments is operated within the BEACO2N Bay Area network to provide a reference for the network calibration. Instruments are installed within a temperature-controlled instrument shelter at the U.C. Berkeley Richmond Field Station. Measurements include basic meteorology, NOx (Thermo 42CTL with a molybdenum NO2 to NO converter), O3 (Teledyne/API T400), CO2, CH4, and CO (Picarro G2401 cavity ring down analyzer). Air is sampled through Teflon tubes mounted to a small tower affixed to the trailer roof, for a combined height of 6 meters above the ground. The co-located BEACO2N node is attached outside of the trailer to the same tower.
The NOx and Picarro analyzer calibrations are checked against reference gases every two to three weeks.

Identification of a temperature-dependent error in Vaisala measurements
There exists an additional temperature dependence among the Vaisala CarboCap GMP343 instruments that varies between instruments. The temperature dependence was first identified from observations of CO2 diurnal cycles at certain Bay Area BEACO2N sites that were out of phase or larger in magnitude than the diurnal cycles at nearby nodes or measured by the Picarro. The presence of a temperature dependence in suspect Vaisala instruments was confirmed by examining the relationship between temperature in the node and the difference between baseline CO2 signals measured by the Vaisala and the Picarro reference instrument.

Diurnal cycles of urban CO2 typically exhibit a daily maximum at night or mid-morning (depending on influence from traffic emissions) due to mixing in a shallow nighttime planetary boundary layer (PBL), and reach a minimum during the day as PBL height increases and vegetation takes up CO2 (Idso et al., 2002; Coutts et al., 2007; Turnbull et al., 2015; Shusterman et al., 2016). The presence of an additional temperature dependence in the Vaisala CO2 instrument is particularly pronounced in the measurements obtained with the sensor located at the East Bay Municipal Utility District (EBMUD) BEACO2N site during 2020 (Fig. 2). The magnitude of the diurnal cycle at EBMUD is larger and out of phase with the Picarro reference instrument (Fig. 2a). The result of this temperature dependence at EBMUD (Fig. 2c) is a diurnal cycle that peaks midday (Fig. 2b). Figure 2b compares the CO2 diurnal cycle at EBMUD with the nearby urban site Laney College (Fig. 1). In contrast to EBMUD, Laney College exhibits a daily maximum in the mid-morning, a pattern more consistent with typical urban CO2 behavior (Idso et al., 2002; Coutts et al., 2007; Turnbull et al., 2015; Shusterman et al., 2016).

The Vaisala temperature dependence varies in magnitude and sign. Figure 3 shows the CO2 mixing ratios and temperature dependence at the Montclair Elementary School site. Compared to the Picarro instrument, this site also demonstrates higher amplitude diurnal cycles (Fig. 3a), but these diurnal cycles are in phase with the reference instrument. Unlike EBMUD, the Montclair site exhibits a negative temperature dependence (Fig. 3c). Figure 3b shows the diurnal cycles at Montclair and the nearby node located at College Preparatory School (CPS). The comparison of these two sites suggests there may indeed be an amplification of the diurnal cycle at Montclair caused by a negative temperature dependence of the Vaisala instrument.

Temperature correction method
The goal of our approach to accounting for temperature dependence of the Vaisala instruments is to rely exclusively on the network itself and, if available, supplementary reference instruments, such as a Picarro, for derivation of correction factors to null sensor temperature dependence.

The method we developed builds on our method for accounting for drift in the instrument zero. To derive a temperature factor, we use hourly averaged [CO2]_STP and node measurements of temperature (T). It is important to note that a major factor contributing to the temperature inside the node is whether the node is placed in the sun or shade.
A linear regression of Δ[CO2]_{T10%} against T provides a slope (m_T) and intercept (b_T) for a moving three-week time window. We considered the possibility that the instrument response to temperature could be a zero shift and/or a change in the response to CO2. We were able to achieve similar results assuming the temperature effect is entirely due to one or the other of these possibilities. As there is already substantial drift in the instrument zero, we proceed under the assumption that the effect can be entirely attributed to the temperature dependence of the instrument zero. The median of m_T is calculated for the deployment period of the Vaisala sensor to determine the temperature-corrected offset and CO2 mixing ratios of Vaisala CO2 measurements, based on an additive error correction (Eq. 4-5). When the offset bias, the temperature-dependent slope, or the time-dependent drift in the instrument zero is observed to shift dramatically during a deployment period, the deployment is manually separated into different periods that are calibrated separately.
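The moving-window regression and median slope described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the network's implementation: the `deltas` list stands in for Δ[CO2]_{T10%} values that are taken as given, the one-day window step is an arbitrary choice, and the final subtraction of m_T times T is only one plausible reading of the additive correction.

```python
import math
from statistics import median


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def windowed_slopes(days, temps, deltas, window_days=21.0, step_days=1.0):
    """Slope m_T of delta vs. temperature in each moving three-week window."""
    slopes = []
    start, end = min(days), max(days)
    while start + window_days <= end:
        idx = [i for i, d in enumerate(days) if start <= d < start + window_days]
        t = [temps[i] for i in idx]
        # Skip windows with too few points or no temperature spread.
        if len(idx) >= 2 and max(t) > min(t):
            m_T, _b_T = fit_line(t, [deltas[i] for i in idx])
            slopes.append(m_T)
        start += step_days
    return slopes


# Synthetic hourly record (42 days) with an imposed 0.6 ppm/degC dependence.
hours = range(42 * 24)
days = [h / 24.0 for h in hours]
temps = [20.0 + 8.0 * math.sin(2.0 * math.pi * h / 24.0) for h in hours]
deltas = [3.0 + 0.6 * t for t in temps]

m_T_median = median(windowed_slopes(days, temps, deltas))
# Assumed additive correction: remove the temperature-dependent part.
offset_T = [d - m_T_median * t for d, t in zip(deltas, temps)]
```

Using the median of the window slopes, rather than a single fit over the whole deployment, keeps a few anomalous windows from dominating the correction.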
An example calibration, demonstrating m_T and [CO2]_{T offset} over time at EBMUD 2020, is shown in Figure S1. Following calculation of the temperature-corrected offset, the temporal drift slope and intercept of this corrected offset are calculated and corrected using the methods described above, resulting in the generation of the temperature- and drift-corrected CO2 offset.
The majority of the BEACO2N nodes examined demonstrated a strong linear relationship between Δ[CO2]_{T10%} and node temperature. However, the node at Elsa Widenmann Elementary School appeared to show a strong negative temperature dependence only on particularly warm days (Fig. 4a,c). The temperature dependence of Δ[CO2]_{T10%} for this node better fit a quadratic than a linear relationship. To account for nodes with a non-linear temperature dependence, in cases where a quadratic fit improves the R^2 of the fit by more than 0.2, the [CO2]_{T offset} and [CO2]_{T,drift corrected} are calculated via Eq. 7-8.
m_1T and m_2T are the first and second terms of the quadratic fit of Δ[CO2]_{T10%} against T.
We attempted to determine a relationship between Vaisala sensor age and temperature-dependence slope, but m_T was only weakly correlated with sensor age (r ≈ 0.3). We did, however, find some evidence that older sensors were more likely to have a larger temperature dependence. For sensors less than 3 years past their initial deployment, 90% had m_T < 1 ppm/°C and 64% had m_T < 0.5 ppm/°C. For sensors older than 3 years, 75% had m_T < 1 ppm/°C and 47% had m_T < 0.5 ppm/°C.
with and without an adjustment for a temperature-dependent zero offset. With the application of the temperature correction, the magnitudes of the diurnal cycles are reduced and demonstrate much better agreement in amplitude and phase with the Picarro instrument. The resulting diurnal cycle at EBMUD is much more typical of an urban site, with a maximum occurring in the mid-morning (Fig. 5c). At Montclair, the magnitude of the diurnal cycle is reduced, reaching a maximum of ∼430 ppm CO2 during the early morning and a minimum of ∼412 ppm CO2 during midday, a pattern much more aligned with the diurnal cycle exhibited at CPS (Fig. 3b, Fig. 5c).
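The rule for switching from a linear to a quadratic temperature fit (an R^2 improvement of more than 0.2) can be sketched as follows. The least-squares solver, the function names, and the synthetic data are assumptions for illustration; only the 0.2 threshold comes from the text.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via normal equations."""
    n = degree + 1
    # Augmented normal-equation matrix [A | b].
    a = [
        [sum(xi ** (r + c) for xi in x) for c in range(n)]
        + [sum(yi * xi ** r for xi, yi in zip(x, y))]
        for r in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def r_squared(x, y, coeffs):
    """Coefficient of determination for a polynomial with the given coefficients."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum(
        (yi - sum(c * xi ** k for k, c in enumerate(coeffs))) ** 2
        for xi, yi in zip(x, y)
    )
    return 1.0 - ss_res / ss_tot


def choose_model(temps, deltas, min_gain=0.2):
    """Return ('quadratic', coeffs) only if R^2 improves by more than min_gain."""
    lin = polyfit(temps, deltas, 1)
    quad = polyfit(temps, deltas, 2)
    gain = r_squared(temps, deltas, quad) - r_squared(temps, deltas, lin)
    return ("quadratic", quad) if gain > min_gain else ("linear", lin)


temps = [float(t) for t in range(10, 41)]
linear_node = [2.0 + 0.5 * t for t in temps]           # purely linear response
curved_node = [-0.05 * (t - 25.0) ** 2 for t in temps]  # strongly curved response
```

Calling `choose_model(temps, linear_node)` keeps the linear fit, while `choose_model(temps, curved_node)` selects the quadratic one, since a straight line explains almost none of the symmetric curvature.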

Evaluation of calibration
Following confirmation of the effectiveness of the temperature correction method on the sensor deployed at EBMUD in 2020 (EBMUD 2020, hereafter sensors will be referred to following the notation: site year) and Montclair 2018, we examined the temperature-corrected CO2 data at the Laney College BEACO2N site during the spring (March-June) of three different years when different Vaisala CarboCap GMP343 instruments were deployed at the site. Given the hypothesis that the observed temperature dependence is due to temperature-dependent errors in the Vaisala CO2 signal, a successful calibration should be sensor specific, rather than site specific. Figure 6b demonstrates the different Δ[CO2]_{T10%} temperature dependence during three different years with different instrument deployments. Each deployment has a distinct offset and slope of Δ[CO2]_{T10%} vs. temperature. During all deployment years, the temperature correction results in better agreement between the reference instruments and the Vaisala data (e.g. 6/15/2018), while preserving local signals (e.g. 4/14/2020) (Fig. 6a). The correction is also effective for the data record before deployment of the Picarro reference instrument in August 2017, when the Exploratorium CO2 Buoy, located in the San Francisco Bay, was used as a reference instrument (Fig. 6a). The correction of the CO2 diurnal cycle at Laney College is most notable during 2017, although midday levels of CO2 are reduced in the corrected data for 2018 and 2020 as well (Fig. 6c).
Elementary School were compared. The resulting temperature-dependent percent differences of CO2 between adjacent sites are reduced to approximately 0-2% from 1-5% (Fig. S3, S6). Temperature corrections also result in better agreement in CO2 mixing ratios between adjacent sites in Richmond (Fig. 7 and Fig. S2) and in Vallejo (Fig. S4, S5). The results were identical when a multiplicative correction term, rather than additive, was considered (i.e. if the temperature effect was assumed to be on the CO2 signal magnitude rather than entirely on the instrument zero).

Comparison of nearest-neighbor sites
To assess the improvement in the network precision following application of the temperature-dependence correction, we combined observations from the entire Bay Area network using data from all of 2020. All sites with available data for more than one month of 2020 were included. Nearest neighbor pairs of each site were identified, where nearest neighbors to an individual site were considered as the closest BEACO2N sites within a 2 km radius of the site. There are 53 unique nearest neighbor pairs.
For each nearest neighbor pair X and Y, an array of the fractional differences between sites was calculated. This was done using the measurements both before and after correction for temperature dependence.
Figures 8a and 8d show the fractional differences of each nearest neighbor pair as a histogram calculated using [CO2]_{drift corrected} and [CO2]_{T,drift corrected}, respectively. Most nearest neighbor site pairs exhibit a distribution of fractional differences centered close to zero, with both positive and negative tails (Fig. 8a,d).
Further analysis was performed to confirm that the temperature correction method eliminates any temperature-dependent disagreement between nearest neighboring sites. The nearest neighbor fractional differences of CO2 data were separated into 2°C temperature bins. For each temperature bin, the absolute value of the mean fractional difference between each nearest neighbor pair, using either [CO2]_{drift corrected} or [CO2]_{T,drift corrected}, was calculated. We then averaged the mean fractional difference in each temperature bin over all nearest neighbor pairs. A plot of the resulting network mean percent difference vs. temperature is shown in Figures 8c and 8f, using [CO2]_{drift corrected} and [CO2]_{T,drift corrected} data, respectively. In the original data, the mean percent differences were greatest at both high and low temperatures. In the temperature-corrected data, there is no clear dependence of nearest neighbor mean percent differences on temperature. The mean percent difference at all temperatures is also reduced.
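The temperature-binned comparison of neighboring sites can be illustrated with a short Python sketch. The normalization of the fractional difference (here, the pair mean) is an assumption, since the defining equation is not reproduced in this text; the function names and sample values are likewise hypothetical.

```python
import math
from collections import defaultdict


def fractional_differences(x, y):
    """Fractional difference of paired readings, normalized by the pair mean.

    Assumption: the normalization used by the authors may differ.
    """
    return [(a - b) / ((a + b) / 2.0) for a, b in zip(x, y)]


def binned_abs_mean(temps, fracs, bin_width=2.0):
    """|mean fractional difference| per temperature bin, keyed by lower bin edge."""
    bins = defaultdict(list)
    for t, f in zip(temps, fracs):
        bins[math.floor(t / bin_width) * bin_width].append(f)
    return {edge: abs(sum(v) / len(v)) for edge, v in sorted(bins.items())}


# Toy pair: the sites agree when cool and disagree by 10 ppm when warm.
site_x = [410.0, 420.0, 430.0, 440.0]
site_y = [410.0, 420.0, 420.0, 430.0]
temps = [10.5, 11.0, 20.2, 21.9]

fracs = fractional_differences(site_x, site_y)
by_bin = binned_abs_mean(temps, fracs)
```

Averaging the per-bin values over all neighbor pairs would then give a network-wide curve of mean difference against temperature, which should be flat if the temperature correction has worked.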

Assessment of the network error
Turner et al. (2016) suggested that a network uncertainty of ∼1 ppm CO2 would be compatible with relevant constraints on point, line, and area CO2 sources of 147, 45, and 9 tC hr^-1, respectively. Assessing network error in the field is, however, a complex problem. We approach the problem by exploring differences between adjacent nodes, which should be an upper limit to the uncertainty. Although the site-to-site variation is strongly influenced by local emissions sources, there are also strong correlations with changes in urban-, synoptic-, and global-scale CO2 signals that are spatially coherent across pairs of adjacent nodes. Variances between adjacent nodes are due to a combination of true site-specific signals and instrument biases.
It is therefore difficult to know the minimum variance in adjacent nodes for a hypothetical "perfect" measurement. For nearest neighbor sites, the majority of the CO2 signal should show near zero difference, representing the background signal. In the observation record we would also expect moments when either site in a pair has a larger signal, driven by local emission sources and meteorology. Sites closer to the highway also typically have larger CO2 signals. In the following section we describe a procedure for evaluating network error and summarize the improvements following inclusion of the temperature correction described above.

Site variance and correlation lengthscales
To evaluate the network error, a semivariogram was constructed for [CO2]_{T,drift corrected} (Fig. 9). Using data from all sites with more than three days of available data during the summer of 2020, we calculated the semivariance between CO2 measurements at each BEACO2N node and all other sites in the Bay Area network. Summer months were chosen because the average and diurnal variability of CO2 mixing ratios are reduced, meaning that measured site variances are relatively more influenced by instrument error, rather than by "true" atmospheric variance, than in the winter. In Figure 9 the square root of the semivariance is plotted against the distance separating the BEACO2N nodes and fitted with an exponential model. The Picarro reference instrument at the Richmond Field Station was included in this analysis.
Using the root semivariance as a correlation metric, in temperature-corrected data, the e-folding length scale for variation is 1.2 ± 0.3 km (1.7 ± 0.7 km using semivariance as a correlation metric, not shown), supporting the BEACO2N hypothesis that 2 km node spacing in a dense network will capture important elements of local variability. The temperature correction results in a maximum root semivariance of 5.5 ± 2 ppm (reduced from 8 ppm in the uncorrected data). Extrapolated to a distance of zero, the temperature correction method has a predicted root semivariance of 1.3 ± 0.9 ppm, representing the network error. This analysis suggests that the desired ∼1 ppm network error has been achieved with the application of the temperature correction.
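To make the semivariogram quantities concrete, the sketch below computes a root semivariance for one pair of series and evaluates an exponential model at a given separation. The specific functional form (a zero-distance intercept rising toward a plateau with an e-folding length) is an assumption, as the fitted equation is not given in the text; the parameter values plugged in are the ones reported above.

```python
import math


def root_semivariance(x, y):
    """sqrt(gamma) with gamma = (1 / 2N) * sum((x_i - y_i)^2)."""
    n = len(x)
    gamma = sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * n)
    return math.sqrt(gamma)


def exponential_model(distance_km, zero_intercept, plateau, length_km):
    """Assumed exponential form: rises from zero_intercept toward plateau."""
    rise = 1.0 - math.exp(-distance_km / length_km)
    return zero_intercept + (plateau - zero_intercept) * rise


# Reported values: 1.3 ppm at zero separation, 5.5 ppm maximum, 1.2 km length.
at_zero = exponential_model(0.0, 1.3, 5.5, 1.2)
at_one_length = exponential_model(1.2, 1.3, 5.5, 1.2)
at_node_spacing = exponential_model(2.0, 1.3, 5.5, 1.2)

# Two toy series that differ by 2 ppm at every hour.
pair_value = root_semivariance([400.0, 402.0], [402.0, 400.0])
```

Under this assumed form, the value at zero separation is the part of the site-to-site difference that cannot be attributed to distance, which is why it is read as the network error.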
Length scales for correlations (r^2) between sites calculated by Shusterman et al. (2018) during the summer of 2017 were larger than the 1.2 km length scale identified here for root semivariance (1.7 km for semivariance). To compare more directly, we also applied the method of Shusterman et al. (2018) to the temperature-corrected CO2 data for the summer of 2020. We examined the correlation of CO2 concentrations for every pairing of Bay Area sites during this period for all hours, during the day, and during the night (Fig. S7). The e-folding distance for the decay of r^2 correlation coefficients was 2.8 km for all times, 3.7 km during the day, and 2.8 km at night. This is in good agreement with the length scales of 2.9 km at all times, 3.6 km during the day, and 2.2 km at night found by Shusterman et al. (2018). The base-line correlation for sites separated by more

Contribution of instrument error to site variance
We can also represent the network instrument error by examining the sources contributing to the semivariance between nearest neighboring sites. The semivariance (γ_nn) of nearest neighboring sites can be expected to have contributions from both "true" variations in emissions and meteorology and erroneous differences caused by instrument error. To estimate the portion of the semivariance resulting from atmospheric phenomena, an analogous quantity for the hourly variations in CO2 was calculated for each site according to Eq. 9, where N is the number of hours of data and [CO2]_h and [CO2]_{h+1} are the measured mixing ratios of CO2 at each hour and one hour later, respectively. The individual instrument error was then calculated according to Eq. 10.
The resulting upper-bound instrument error from the median of individual instrument errors for the Bay Area network is 2.5 ± 0.5 ppm (the corresponding estimate for non-temperature-corrected data is 4.5 ± 0.9 ppm). We consider this an upper bound because hourly variations in the CO2 signal reflect the atmospheric changes at an individual site, which may not match the atmospheric changes at the nearest neighbor sites. Variations in emissions or wind velocity may result in larger "true" differences between a site and its nearest neighbor than are represented by the site's hourly variability.
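One way to read the decomposition above is sketched below: an hour-to-hour semivariance stands in for atmospheric variability and is subtracted from the neighbor-pair semivariance, with the square root of the remainder taken as the instrument error. Both formulas are assumptions, because Eq. 9 and Eq. 10 are not reproduced in this text; the function names and data are hypothetical.

```python
import math


def pair_semivariance(x, y):
    """gamma_nn = (1 / 2N) * sum((x_i - y_i)^2) for a site and its neighbor."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * len(x))


def hourly_semivariance(co2_hourly):
    """Assumed Eq. 9 analogue: semivariance of consecutive hourly readings.

    Uses the number of consecutive pairs (hours - 1) as the sample count.
    """
    pairs = list(zip(co2_hourly, co2_hourly[1:]))
    return sum((later - now) ** 2 for now, later in pairs) / (2.0 * len(pairs))


def instrument_error(gamma_nn, gamma_hourly):
    """Assumed Eq. 10 analogue: residual after removing atmospheric variance."""
    return math.sqrt(max(gamma_nn - gamma_hourly, 0.0))


site = [410.0, 412.0, 410.0, 412.0]
neighbor = [413.0, 415.0, 407.0, 409.0]

gamma_nn = pair_semivariance(site, neighbor)
gamma_h = hourly_semivariance(site)
error_ppm = instrument_error(gamma_nn, gamma_h)
```

If the true atmospheric difference between the two sites is larger than the hourly variability at one site, too little is subtracted and the result overestimates the instrument error, which matches the upper-bound interpretation in the text.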
To reduce the influence from short-term atmospheric variations, the network error was also estimated using an individual site's root mean squared error (RMSE_i) as a metric for "true" atmospheric variation (Eq. 11) and a "paired" RMSE (RMSE_paired) using the mean CO2 signal of its nearest neighbor site ([CO2]_nn) as a measure of total variation (Eq. 12).
The site error was then calculated according to Eq. 13. The resulting network instrument errors were between 0.5 ppm and 4 ppm, with a median of 1.6 ± 0.4 ppm, in good agreement with the error calculated from the semivariogram fit. Based on these analyses, we estimate the network error of the Bay Area BEACO2N network to be less than 1.6 ppm, close to our stated goal of 1 ppm network error.

Application to other city networks
The BEACO2N network has recently been extended to several other cities, and will further expand to additional locations in coming years. Currently, locations where BEACO2N nodes are deployed (in addition to the Bay Area) are Houston (19 nodes, network start 11/2017), Glasgow, in collaboration with the University of Strathclyde (>20 nodes, network start 5/2021), New York City (8 nodes, network start 4/2018), and Los Angeles, in collaboration with the University of Southern California (12 nodes, network start 5/2021). The goal of the network is to be self-calibrated, as not all locations at which the nodes will be deployed have a highly precise and frequently calibrated reference instrument. As such, an alternative method of obtaining a reference for the determination of drift, offset, and temperature dependence is needed.

Bay Area Tests
We observe good agreement between the Picarro reference instrument during 2020 and [CO2]^med_STP (Fig. 10). The mean percent difference considering all 2020 data is 0.46%, representing an accuracy error of 2 ppm at 420 ppm CO2 (Fig. 10d). We also do not see evidence of a temperature-dependent offset between the Picarro reference instrument and [CO2]^med_STP. The precision of the Bay Area network is negligibly affected when the network median is used as the reference, with the mean of the absolute value of the average fractional differences of all nearest neighbor pairs equal to 0.015 ± 0.008 (compared to 0.013 ± 0.007 with the Picarro as reference) (Fig. S8). The resulting maximum root semivariance is 5.5 ± 2 ppm and the extrapolated root semivariance at zero km separation is 0.8 ± 0.9 ppm, approximately equal to the values calculated when the Picarro is used as a reference. The network accuracy is, however, more appreciably altered. Figure 11 shows the fractional difference between [CO2]_{T,drift corrected} determined using the Picarro and [CO2]^med_STP as a reference at each site. The resulting mean percent difference is 0.51 ± 0.02%, representing a network accuracy error of 2 ppm at 420 ppm CO2.
This accuracy error is mainly driven by small differences in the offsets (2 ppm on average) and m_T (0.2 ppm/°C on average, see Supporting Information) between [CO2]_{T,drift corrected} calculated using the Picarro and [CO2]^med_STP as a reference. These results suggest that the network precision can be expected to remain near 1 ppm CO2 with the use of [CO2]^med_STP as a reference, but an additional accuracy error of 2 ppm may be introduced.
Analysis of the Bay Area network was performed on the 36 nodes with sufficient data availability for 2020. However, the newly established networks have fewer nodes than in the Bay Area. To use [CO2]^med_STP as a reference, we must have sufficient nodes from which to calculate the network median. To evaluate this, for n = 1-26, a random subset of n Bay Area nodes was selected 100 times. For each of the 100 random subsets of n nodes, the mean fractional difference was calculated between the network median CO2 and the median calculated using the subset. The average and standard error of the 100 mean fractional differences was then calculated. The results of this analysis are presented in Figure 12. We suggest that a minimum of 7 nodes with m_T less than 1 ppm/°C is required for the accuracy error to be lower than 2%. For less than 1% error, at least 12 nodes are required.
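The subset experiment described above can be sketched with the Python standard library. The data layout (a dictionary of equal-length hourly series per node), the use of the absolute fractional difference, and all names are assumptions for illustration; the 100 random draws per subset size follow the text.

```python
import random
from statistics import mean, median


def hourly_median(series_by_node, nodes):
    """Median across the chosen nodes at each hour."""
    hours = len(series_by_node[nodes[0]])
    return [median([series_by_node[n][h] for n in nodes]) for h in range(hours)]


def subset_median_error(series_by_node, n, trials=100, seed=0):
    """Average mean |fractional difference| between subset and full medians."""
    rng = random.Random(seed)
    all_nodes = sorted(series_by_node)
    full = hourly_median(series_by_node, all_nodes)
    errors = []
    for _ in range(trials):
        subset = rng.sample(all_nodes, n)
        sub = hourly_median(series_by_node, subset)
        errors.append(mean([abs(s - f) / f for s, f in zip(sub, full)]))
    return mean(errors)


# Synthetic network: 10 nodes sharing one signal, each with a constant bias.
network = {
    "node%02d" % k: [420.0 + k + (h % 3) for h in range(24)] for k in range(10)
}
error_one_node = subset_median_error(network, 1)
error_five_nodes = subset_median_error(network, 5)
error_all_nodes = subset_median_error(network, 10)
```

In this toy network the error shrinks as more nodes contribute to the median and vanishes when every node is included, which is the qualitative behavior the node-count recommendation relies on.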

Houston
Data from the Houston network was subsequently calibrated using [CO2]^med_STP as a reference for determination of temperature dependence, drift, and offset. Temperature dependence calibration of each site in the Houston network was performed twice. All sites were first included in [CO2]^med_STP and sites with m_T greater than 1 ppm/°C were identified. These sites were then excluded from [CO2]^med_STP and each site was re-calibrated. Histograms of the fractional differences between nearest neighbor sites are shown in Figure 13. The average mean percent difference between nearest neighbors was 2 ± 1%. Though this is considerably larger than the differences between nearest neighbors in the Bay Area network, it is not immediately clear whether the difference is caused by greater precision error in Houston or by differing meteorology and CO2 sources that cause greater differences between CO2 mixing ratios at adjacent sites. We attempted to perform a similar instrumental error analysis, but there are currently insufficient overlapping CO2 data in Houston for uncertainty analysis. However, we do not have reason to expect the instrument errors would be any larger in the Houston network.
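The two-pass procedure used for Houston can be outlined as below. The slope estimator is passed in as a placeholder callable because the full calibration is not reproduced here; comparing the absolute value of m_T against the 1 ppm/°C threshold is an assumption (the text states only "greater than"), and all names and numbers are hypothetical.

```python
from statistics import median


def network_median(series_by_site, sites):
    """Median across the chosen sites at each time step."""
    steps = len(series_by_site[sites[0]])
    return [median([series_by_site[s][i] for s in sites]) for i in range(steps)]


def two_pass_reference(series_by_site, estimate_m_T, max_abs_slope=1.0):
    """Build a median reference, drop high-slope sites, then rebuild it.

    estimate_m_T(site, series, reference) is a hypothetical callable that
    returns the temperature slope for one site against the given reference.
    """
    sites = sorted(series_by_site)
    first_ref = network_median(series_by_site, sites)
    slopes = {s: estimate_m_T(s, series_by_site[s], first_ref) for s in sites}
    kept = [s for s in sites if abs(slopes[s]) <= max_abs_slope]
    return kept, network_median(series_by_site, kept)


# Toy inputs: site C has a large temperature slope and should be excluded.
toy_series = {"A": [420.0, 421.0], "B": [422.0, 423.0], "C": [450.0, 440.0]}
toy_slopes = {"A": 0.2, "B": -0.4, "C": 1.8}


def lookup_slope(site, series, reference):
    """Stub estimator that returns a preset slope for the toy example."""
    return toy_slopes[site]


kept_sites, second_ref = two_pass_reference(toy_series, lookup_slope)
```

Each site would then be re-calibrated against the second-pass reference, so that strongly temperature-dependent sensors do not contaminate the median they are corrected against.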

Conclusions
We have assessed the accuracy of the BEACO2N network following in situ calibration of the temperature dependence in Vaisala CO2 sensors. We report a network instrument error of 1.6 ppm CO2 or less, near the desired 1 ppm network error suggested by Turner et al. (2016).
A method for correcting Vaisala instrument temperature dependence in BEACO2N has been established and evaluated using sites across the San Francisco Bay Area network. The method corrects observations from individual instruments so that they exhibit a temperature dependence in their lowest temperature-based 10th percentile of CO2 data that is equivalent to that of a reference site, thus correcting erroneous instrument temperature dependence while preserving true diurnal cycles and local signals. This field calibration of temperature dependence can be entirely internal to the network and does not necessarily require a reference instrument, although the addition of a reference instrument provides greater network accuracy. The implementation of the temperature correction method produces more reasonable diurnal cycles, diurnal cycles that are maintained for sites influenced by similar emissions sources, and better agreement between adjacent sites. We additionally describe methods for characterizing network scale uncertainties and site-to-site biases. The average variation between adjacent sites was found to be 1.3% following implementation of the temperature correction (compared to 2.5% prior to the correction). The temperature correction greatly improves the precision of CO2 measurements in the BEACO2N network.
We show that the network precision can be maintained at 1.3% even in locations without a high-cost reference instrument, using the network median as a reference, provided that there are at least 12 sites with small temperature dependencies. This has important implications for the expansion of BEACO2N to additional cities globally, as well as for other dense low-cost CO2 networks. However, without a reference instrument, the network accuracy error is larger, at ∼±2 ppm.
Author contributions. JK, HLF, CN, and PW collected the data used in this analysis. ERD composed the manuscript and designed and executed the analysis in consultation with JK and KW. KW also aided with data processing and implementation of the temperature calibration method. JK and RCC provided additional manuscript feedback and RCC supervised the project.
is the average and 95% confidence interval of the absolute values of each neighboring pair's mean fractional difference. b) Histogram of the fractional differences between all aggregated nearest neighbor sites with the temperature correction applied. The mean and error indicated are the mean and 95% confidence interval, respectively, of the distribution.