Non-dispersive infrared (NDIR) sensors are a low-cost way
to observe carbon dioxide concentrations in air, but their specified
accuracy and precision are not sufficient for some scientific applications.
An initial evaluation of six SenseAir K30 carbon dioxide NDIR sensors in a
lab setting showed that without any calibration or correction, the sensors
have an individual root mean square error (RMSE) between

Carbon dioxide (CO

Recent research efforts have focused more locally, using networks of observing sites with instrumented towers similar to those used for global monitoring but applied to the urban environment (Pataki et al., 2003; Briber et al., 2013; Kort et al., 2013; McKain et al., 2012; Turnbull et al., 2015). High-accuracy observations from these tower sites are then used in inversions to estimate the total greenhouse gas flux from the urban area in question (McKain et al., 2012; Bréon et al., 2015; Lauvaux et al., 2016). However, because the cost of these networks is comparable to that of networks at the global scale, the observation towers are still sited at a relatively low density of typically 3 to 12 sites in a single metropolitan area (McKain et al., 2012; Kort et al., 2013; Turnbull et al., 2015; Bréon et al., 2015). Observing system simulation experiments have found that, depending on the methodology used, a higher spatial density of observations in these urban regions better constrains the inversion estimates, even if the absolute uncertainty of the observations is higher (Turner et al., 2016; Wu et al., 2016; Lopez-Coto et al., 2017). Total network cost must nevertheless be balanced against the inversion constraint gained.

Recently, a wave of small, low-cost sensors using various technologies has
become commercially available; some of them measure trace gases or
particulate matter in addition to traditional meteorological variables.
Evaluation and implementation of some of these new low-cost sensors
demonstrate their promise for ambient air monitoring (Eugster and Kling,
2012; Holstius et al., 2014; Piedrahita et al., 2014; Young et al., 2014;
Wang et al., 2015; Shusterman et al., 2016). Many of these instruments are
based on electrochemical reactions to measure the concentrations of trace
gases. With the advent of widely available and low-cost mid-infrared light sources
and detectors, a small group of non-dispersive infrared (NDIR) CO

In this paper, one of these small NDIR CO

To test the validity of using low-cost sensors for scientific applications,
a sensor package consisting of various off-the-shelf components was
assembled. The K30 sensor module (K30) from SenseAir (Sweden) is the
low-cost NDIR CO

Certain commercial equipment, instruments or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

The K30 is a microprocessor-controlled device with on-board signal averaging and has a measurement range of 0 to 10 000 ppm, observation frequency of 0.5 Hz, and resolution of 1 ppm. The manufacturer's stated accuracy of the K30 sensor is

To compare the performance of the K30 to better-performing research
instrumentation, a greenhouse gas analyzer based on cavity-enhanced
absorption spectrometry (CEAS) was used as the control. The LGR-24A-FGGA
fast greenhouse gas analyzer from Los Gatos Research (LGR, San Jose, CA)
provides CO

It is important to note that there are differences in how CEAS works
compared to NDIR, most notably that the LGR and other CEAS instruments have
a controlled cavity where pressure and temperature are kept nearly constant
(with a standard deviation of under 0.5 torr and 0.1

For data collection, a Raspberry Pi (RPi) computer is used (Raspberry Pi
Foundation, 2015). The RPi is a credit-card-sized (approximately

Photograph of a Raspberry Pi computer (top), a SenseAir
K30 (NDIR) CO

Archiving and comparing multiple datasets proved challenging, so steps were taken to ensure that each compared value corresponds to the same observation time. All of the RPis use an internet server to synchronize their time, and the LGR uses an internal battery-backed clock that was set to the same time as the RPis at the beginning of the experiment. Because of various complications, including the exact LGR start time and the potential for delays in the RPi's Linux operating system, the data collection times of each K30 sensor package and the LGR are asynchronous. Additionally, power issues can corrupt parts of the plain text data files stored on the RPi's SD card with random characters. Thus, a post-processing procedure was developed that filters extraneous characters, synchronizes each dataset based on recorded time stamps and averages it over selected time periods. These new datasets can then be directly compared without missing or out-of-phase data points.
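As an illustration only, the filtering, synchronization and averaging steps could be sketched as follows. The file layout (one `YYYY-MM-DD HH:MM:SS,value` record per line, in UTC), the 60 s window and all function names are assumptions made for this sketch; they are not taken from the processing code used in this study.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Assumed record layout: "YYYY-MM-DD HH:MM:SS,value" (hypothetical format).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(-?\d+(?:\.\d+)?)$")


def parse_clean(lines):
    """Keep only lines that match the expected pattern; drop corrupted ones."""
    records = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # extraneous characters or a partial line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        stamp = stamp.replace(tzinfo=timezone.utc)
        records.append((stamp.timestamp(), float(match.group(2))))
    return records


def bin_average(records, window_s=60):
    """Average values in fixed windows, keyed by window start (epoch seconds)."""
    bins = defaultdict(list)
    for t, value in records:
        bins[int(t // window_s) * window_s].append(value)
    return {start: sum(vals) / len(vals) for start, vals in bins.items()}


def align(a, b):
    """Return (window_start, a_mean, b_mean) for windows present in both."""
    return [(k, a[k], b[k]) for k in sorted(a.keys() & b.keys())]
```

Binning both instruments onto the same window boundaries before pairing them means that asynchronous sample times and dropped lines only change how many points enter each average, not which averages are compared.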

Allan variance (Allan, 1966) is a measure of the time-averaged stability
between consecutive measurements or observations, often applied to clocks
and oscillators. In addition, an Allan variance analysis can be used to
determine the optimum averaging interval for a dataset to minimize noise
without sacrificing signal. Figure 2 shows the Allan deviation (the square
root of the variance) for one K30's raw 2 s data when exposed to a
known reference gas. The original 2 s data show the maximum noise,
with a standard deviation comparable to the manufacturer's specifications of

Allan variance analysis for an NDIR (K30) CO
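A minimal sketch of how an Allan deviation curve can be computed from evenly spaced samples is given below. It uses the non-overlapping estimator, in which the series is split into consecutive bins of `m` samples and half the mean squared difference between adjacent bin means is taken. Applying the estimator directly to mole fraction readings, and the function names, are assumptions of this sketch rather than a description of the analysis code behind Fig. 2.

```python
import math


def allan_deviation(samples, m):
    """Non-overlapping Allan deviation for an averaging factor of m samples."""
    n_bins = len(samples) // m
    if n_bins < 2:
        raise ValueError("need at least two full bins for this averaging factor")
    # Mean of each consecutive, non-overlapping bin of m samples.
    means = [sum(samples[i * m:(i + 1) * m]) / m for i in range(n_bins)]
    # Allan variance: half the mean squared difference of adjacent bin means.
    squared = [(means[i + 1] - means[i]) ** 2 for i in range(n_bins - 1)]
    return math.sqrt(0.5 * sum(squared) / len(squared))


def allan_curve(samples, dt_s, factors):
    """Return (averaging time in s, Allan deviation) for each usable factor."""
    return [
        (m * dt_s, allan_deviation(samples, m))
        for m in factors
        if len(samples) // m >= 2
    ]
```

For 2 s data, `allan_curve(samples, 2.0, [1, 5, 15, 30, 60])` would give the deviation at averaging times from 2 s to 2 min; the averaging time at which the curve reaches its minimum is the usual choice for the optimum averaging interval.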

The need to quickly and effectively evaluate a relatively large number of
sensors under conditions with relatively stable CO

For a continuous period of approximately 4 weeks in spring 2016, six K30
sensor packages as described in Sect. 2 were deployed alongside the LGR in
the rooftop room, all sampling room air. The LGR was also connected to a
mass flow controller and standard tank to periodically provide a reference
for stability (details in Sect. 3). For the reference dataset, the dry
CO

To evaluate the K30 NDIR sensor performance compared to a research-grade
analyzer, the control dataset first needs to be calibrated and corrected for
drift. To calibrate the LGR, the dataset was corrected after the experiment
concluded using a two-point calibration curve derived from two
NIST-traceable gas standards, one with a CO

In addition to the calibration described above, there was a need to quantify
any drift in the LGR analyzer. During the experiment period, the LGR was
attached to a tee connector. Most of the time, its included pump pulled
ambient air from the aforementioned evaluation chamber, but every 23 to
47 h it received calibration gas from a reference tank of breathing air,
initially for 1 h and later for 10 min to conserve the tank. The tank was
connected to a Dasibi model 5008 calibrator, which was used to schedule
the input of calibration gas. This breathing air
tank is assumed to have a fixed CO

Stability of the Los Gatos Fast Greenhouse Gas Analyzer
shown over a 30-day period. Excess breathing air with a fixed CO

Continuous 1 min time series data during the
evaluation experiment.

In Fig. 3, the ambient data from the LGR have been filtered out to show only
each calibration period performed during the month-long experiment. The data
during each calibration period were averaged (over either 10 min or
1 h depending on the calibration period) and the averages are plotted
in Fig. 3. While the mean mole fraction observed during each calibration
varies slightly from day to day, the recorded value trended upward by over
1.2 ppm over the 30-day period. This observed
drift, while not insignificant, is well within the manufacturer's
specifications for this analyzer. However, the observed standard deviation
of the 2 s points used in each average (the error bars in Fig. 3)
remained relatively constant throughout the period with a mean standard
deviation of

Figure 4 shows the original time series of data recorded during the
evaluation experiment described in Sect. 2.2. The top panel shows raw
CO

Over this 4-week period, the LGR observed an ambient variation of
CO

From the difference plot (Fig. 4, bottom panel), there are some important
things to note. First and foremost, each individual K30 sensor has a
distinct zero offset. A few of the sensors are approximately the same as the
LGR, but many can have an offset that is as much as 5 % (20 ppm) from the
LGR. The differences between each K30 and the LGR all have standard
deviations between 4 and 6 ppm and RMSEs between
5 and 21 ppm. This means that after accounting for the offset of each
individual K30, the practical accuracy of the K30 CO

In Fig. 4, the difference between the LGR and each K30 is shown in the
bottom panel below time series of environmental data from the evaluation
chamber. Just like in the difference plot, each of the environmental
variables features two distinct timescales of variability. There is a
diurnal cycle of each variable, as well as synoptic-scale variability
attributed to weather systems that occurs on the order of 1 week. Because
the observed CO

Each individual K30 sensor's original observed CO

Calibration curve of K30-1 vs. LGR for 1 min averages
without any environmental correction; only span and zero offset are
corrected. Solid line is the best fit; dashes represent the

This process is repeated for each environmental variable pressure (

A continuous time series of 1 min averages as well as scatter plots for K30-1 compared to the LGR instrument during each step of the successive regression described in Sect. 5.1. Cumulative, in order from top to bottom: the original dataset, after correcting for span and offset, after correcting for pressure, after correcting for temperature and, finally, after correcting for water vapor. The root mean square error (RMSE) of the K30 data compared to the LGR at each step is annotated to the upper left of the scatter plot. This regression contains all data points observed in the evaluation period.

Root mean square error in ppm between the CEAS LGR and each K30 NDIR sensor's 1 min averaged data for the original dataset before correction, at each step of the successive regression correction (correcting for (1) zero/span, (2) atmospheric pressure, (3) temperature and (4) water vapor mixing ratio) and after the multivariate regression correction. Each value shown is for a regression calculated using data from the entire evaluation period.

Alternatively, a multivariate linear regression statistical method can be
used to calculate the regression coefficients for each K30 sensor. This
results in five correction coefficients

Difference plots for K30-1 compared to the LGR during each step of the successive regression described in Sect. 5.1 and shown in Fig. 6 for 1 min averages. Cumulative, in order from top to bottom: the original dataset, after correcting for span and offset, after correcting for pressure, after correcting for temperature and, finally, after correcting for water vapor.

A continuous time series of 1 min averages as well as
scatter plots for K30-1 compared to the LGR for the multivariate
regression described in Sect. 5.2.
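For readers who want to reproduce this kind of correction, a minimal least-squares sketch is given below. It assumes a linear model in which the reference mole fraction is an intercept plus four slopes multiplying the K30 reading, pressure, temperature and water vapor mixing ratio (five coefficients in total). The function names are hypothetical, and the plain normal-equation solver is used only to keep the example dependency-free; a library least-squares routine would normally be preferable.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_multivariate(predictors, reference):
    """Fit reference = b0 + sum_j b_j * predictor_j; return [b0, b1, ...].

    predictors is a list of equal-length columns, assumed here to be
    [k30_co2, pressure, temperature, water_vapor].
    """
    n = len(reference)
    means = [sum(col) / n for col in predictors]
    y_mean = sum(reference) / n
    # Centering the columns keeps the normal equations better conditioned.
    centered = [[v - m for v in col] for col, m in zip(predictors, means)]
    y_c = [y - y_mean for y in reference]
    k = len(predictors)
    xtx = [
        [sum(u * v for u, v in zip(centered[i], centered[j])) for j in range(k)]
        for i in range(k)
    ]
    xty = [sum(u * y for u, y in zip(centered[i], y_c)) for i in range(k)]
    slopes = solve_linear(xtx, xty)
    intercept = y_mean - sum(s * m for s, m in zip(slopes, means))
    return [intercept] + slopes


def apply_correction(coeffs, predictors):
    """Apply fitted coefficients to obtain corrected mole fractions."""
    intercept, slopes = coeffs[0], coeffs[1:]
    return [
        intercept + sum(s * v for s, v in zip(slopes, row))
        for row in zip(*predictors)
    ]


def rmse(a, b):
    """Root mean square error between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

Fitting on one period and calling `apply_correction` on another is also a simple way to check how well a set of coefficients generalizes in time.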

There are two observations to note based on the evaluation and analysis.
First, both before and after the multivariate regressions, there are
frequent shifts in the sign of the difference between each K30 and the LGR;
these sudden changes occur at or around sunrise most days. Because of the
rapid change in atmospheric CO

Atmospheric inversion methods often use hourly-averaged data from tower observations (McKain et al., 2012; Bréon et al., 2015; Lauvaux et al., 2016), so after the multivariate regression was applied, the K30 and LGR datasets were further averaged to 10 min and hourly resolution. The average RMSE for the six K30s is 2.3 ppm for the 1 min data, 2.0 ppm for 10 min averages and 1.8 ppm for hourly averages. Throughout this analysis period, one of the six K30s evaluated performed consistently worse than the others, and after removing it from the averages the RMSE values dropped to 1.9, 1.6 and 1.5 ppm for 1 min, 10 min and hourly averages, respectively. Thus, by using hourly averages and discarding underperforming sensors, the average RMSE of the difference between the LGR and a K30 NDIR sensor can be reduced to approximately 1.5 ppm.
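The effect of the averaging interval on RMSE can be explored with a short sketch like the one below. It assumes that the corrected K30 series and the LGR series are already aligned, gap-free 1 min averages of equal length; the function names are illustrative only.

```python
import math


def block_means(values, size):
    """Average consecutive, non-overlapping blocks of `size` samples."""
    n_blocks = len(values) // size
    return [sum(values[i * size:(i + 1) * size]) / size for i in range(n_blocks)]


def rmse(a, b):
    """Root mean square error between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rmse_by_window(sensor, reference, sizes=(1, 10, 60)):
    """RMSE after averaging both 1 min series to each window size (in min)."""
    return {
        size: rmse(block_means(sensor, size), block_means(reference, size))
        for size in sizes
    }
```

Longer windows average out uncorrelated noise in both instruments, which is why the RMSE decreases as the interval grows; persistent biases are not reduced by averaging.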

The RMSE of all six K30 NDIR sensors when compared to the LGR over the entire experiment as a function of the number of days of data used in the regression analysis. The colored dots represent each K30's RMSE, and the box plot shows the median in red, the first and third quartiles within the box and the min and max values on the whiskers.

The RMSEs described above and in Table 1 are for regressions calculated over the entire experiment period of approximately 4 weeks. One goal of this work is to develop a methodology to evaluate individual sensors quickly so that they can be used in scientific applications. In Fig. 9, the RMSE calculated over the entire month for all six K30s is plotted with respect to the number of days used in the multivariate regression from Sect. 5.2. While the RMSE generally decreases with increasing regression length, it drops significantly from its initial values after a regression period of just a few days. Once a few diurnal cycles of varying amplitude have been incorporated, as well as the synoptic-scale variations in the atmosphere (with a timescale of around 1 week), the regression stabilizes. Thus, a regression length of around 2 weeks is recommended to maximize correction while minimizing the amount of time the sensor needs to run concurrently with the LGR.

As depicted in Fig. 8, a continuous time series as well
as scatter plots for K30-1 compared to the LGR for the multivariate
regression described in Sect. 5.2.

In Fig. 10, a multivariate regression is applied to the same K30 as
described in the aforementioned sections and shown in Figs. 6, 7 and 8, but
the coefficients are calculated using only data from the first 15 days. The
change in the RMSE between the two regressions is 0.1 ppm, going from 1.8 ppm when using all data points to 1.9 ppm when using only approximately the
first half. This small but not insignificant change is most likely
attributable to the fact that during the first half of the evaluation period,
the ambient CO

All of the final RMSEs calculated in this analysis are from using individual regression coefficients for each K30 sensor. However, it would be beneficial to determine whether a generalized set of regression coefficients could be applied to any K30 sensor and what the resulting RMSEs over the evaluation period would be. To calculate the generalized coefficients, the four slopes (one for each variable) and the intercept were each averaged across five of the sensors. K30-3 was omitted because it was the poorest performing sensor and its coefficients were significantly different from those of the other five. After correction using the same set of coefficients, the RMSEs of the six sensors ranged from 3.1 ppm to as high as 23.9 ppm. The final RMSEs in some cases were higher than with the original, uncorrected data. Similar results were observed when the multivariate regression coefficients were calculated using the mean concentration of the five sensors. Thus, it appears that for each K30 sensor, an independent evaluation must be completed to provide observations with a sufficient level of quality.
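The averaging step behind such a generalized coefficient set is simple enough to sketch. The layout assumed here, a dictionary mapping a sensor label to its list `[intercept, slope_1, ..., slope_4]`, and the function name are hypothetical and chosen only for illustration.

```python
def generalized_coefficients(per_sensor, exclude=()):
    """Average each coefficient across sensors, skipping excluded labels.

    per_sensor maps a sensor label to its coefficient list; every list must
    use the same ordering, e.g. [intercept, slope_1, ..., slope_4].
    """
    kept = [coeffs for name, coeffs in per_sensor.items() if name not in exclude]
    if not kept:
        raise ValueError("no sensors left to average")
    # zip(*kept) groups the i-th coefficient of every remaining sensor.
    return [sum(column) / len(kept) for column in zip(*kept)]
```

Calling `generalized_coefficients(per_sensor, exclude={"K30-3"})` mirrors the omission of the outlying sensor; the resulting set can then be applied to every sensor and scored with the same RMSE metric used for the individual fits.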

The K30 is a small, low-cost NDIR CO

In the future, further analysis will be performed evaluating the K30 as well
as other low-cost CO

The original data used in this analysis are available in a zip file at

The authors declare that they have no conflict of interest.

We acknowledge support for this project from the FLAGG-MD grant from the NIST Greenhouse Gas Measurements program (Cooperative Agreement no. 70NANB14H333). The authors wish to thank the undergraduate and graduate students at the University of Maryland who helped with this analysis. Additionally, we would like to thank all of the members of the NIST Greenhouse Gas Measurements program including Subhomoy Ghosh, Israel Lopez-Coto, Kimberly Mueller, Kuldeep Prasad, James Whetstone and Tamae Wong for their help.

Edited by: Szymon Malinowski
Reviewed by: two anonymous referees