Articles | Volume 15, issue 11
Atmos. Meas. Tech., 15, 3353–3376, 2022
Research article
09 Jun 2022

Performance characterization of low-cost air quality sensors for off-grid deployment in rural Malawi

Ashley S. Bittner1, Eben S. Cross2, David H. Hagan2, Carl Malings3, Eric Lipsky4, and Andrew P. Grieshop1
  • 1Department of Civil, Construction and Environmental Engineering, North Carolina State University, Raleigh, NC 27606, USA
  • 2QuantAQ, Inc., Somerville, MA 02143, USA
  • 3NASA Postdoctoral Program Fellow, Goddard Space Flight Center, Greenbelt, MD 20771, USA
  • 4Department of Energy Engineering, Penn State Greater Allegheny University, McKeesport, PA 15132, USA

Correspondence: Andrew P. Grieshop (


Low-cost gas and particulate matter sensor packages offer a compact, lightweight, and easily transportable solution to address global gaps in air quality (AQ) observations. However, regions that would benefit most from widespread deployment of low-cost AQ monitors often lack the reference-grade equipment required to reliably calibrate and validate them. In this study, we explore approaches to calibrating and validating three integrated sensor packages before a 1-year deployment to rural Malawi using colocation data collected at a regulatory site in North Carolina, USA. We compare the performance of five computational modeling approaches to calibrate the electrochemical gas sensors: k-nearest neighbors (kNN) hybrid, random forest (RF) hybrid, high-dimensional model representation (HDMR), multilinear regression (MLR), and quadratic regression (QR). For the CO, Ox, NO, and NO2 sensors, we found that kNN hybrid models returned the highest coefficients of determination and lowest error metrics when validated. Hybrid models were also the most transferable approach when applied to deployment data collected in Malawi. We compared kNN hybrid calibrated CO observations from two regions in Malawi to remote sensing data and found qualitative agreement in spatial and annual trends. However, ARISense monthly mean surface observations were 2 to 4 times higher than the remote sensing data, partly due to proximity to residential biomass combustion activity not resolved by satellite imaging. We also compared the performance of the integrated Alphasense OPC-N2 optical particle counter to a filter-corrected nephelometer using colocation data collected at one of our deployment sites in Malawi. We found the performance of the OPC-N2 varied widely with environmental conditions, with the worst performance associated with high relative humidity (RH >70 %) conditions and influence from emissions from nearby residential biomass combustion. 
We did not find obvious evidence of systematic sensor performance decay after the 1-year deployment to Malawi. Data recovery (30 %–80 %) varied by sensor and season and was limited by insufficient power and access to resources at the remote deployment sites. Future low-cost sensor deployments to rural, low-income settings would benefit from adaptable power systems, standardized sensor calibration methodologies, and increased regional regulatory-grade monitoring infrastructure.

1 Introduction

Ambient air pollution is a leading cause of morbidity and premature mortality in sub-Saharan Africa (SSA) (Murray et al., 2020). Air pollution in SSA is expected to increase over time given regional growth in population and energy demand combined with a biomass-fuel-dominated energy mix (Shikwambana and Tsoeleng, 2020; Stevens and Madani, 2016; Liousse et al., 2014; Amegah and Agyei-Mensah, 2017). However, regulatory air quality (AQ) monitoring is uncommon in many SSA countries, partially due to the high cost of reference-grade equipment (Amegah, 2018; Petkova et al., 2013). Remote sensing is a valuable tool to address these data gaps (El-Nadry et al., 2019), but satellite observations alone have various shortcomings relative to in situ measurements (Martin et al., 2019). Additional validation with reliable surface measurements is required, particularly in SSA (Malings et al., 2020; Hersey et al., 2015). In the meantime, low-cost gas and particulate sensor packages provide an affordable, compact, and easily transportable approach to supplement air quality networks in regions where reference grade instrumentation is not accessible. Malawi, located in southeastern Africa, provides a relevant context to investigate how low-cost sensors (LCSs) can be used to address the global dearth of AQ observations. The Malawi Bureau of Standards published ambient air quality limits based on World Health Organization guidelines in 2005 (Mapoma and Xie, 2013; Malawi Bureau of Standards, 2005), but there is no regulatory air quality monitoring program in the country to date. Previous studies of AQ in Malawi have primarily focused on indoor air quality or were unable to capture long-term trends (Fullerton et al., 2009, 2011; Jary et al., 2017; Mapoma and Xie, 2013). A dependable and affordable LCS monitoring network in Malawi could provide data to monitor the evolution of air quality and establish baselines for future AQ management.

Given the potential applications, LCS deployments are becoming common (Giordano et al., 2021). However, as the cost of LCSs decreases, so too will their selectivity, linearity, and accuracy. Electrochemical gas sensors are prone to interference and cross-sensitivities. Interference occurs when sensors respond to changes in temperature (T) and relative humidity (RH). Cross-sensitivities occur when sensors respond to the presence of gases other than the target analyte (Lewis et al., 2016; Mead et al., 2013). Failure to properly account for these during calibration can result in substantial measurement error under ambient conditions (Lewis et al., 2016; Cross et al., 2017; Castell et al., 2017; Mead et al., 2013). The calibration and application of LCS technologies to augment existing regulatory monitoring networks has been widely explored (Cross et al., 2017; Hagan et al., 2018; Malings et al., 2019a, b; Mead et al., 2013; Zimmerman et al., 2018; Li et al., 2021), but historically there has been little standardization in calibration approach or performance evaluation (Castell et al., 2017; Duvall et al., 2021a, b; Morawska et al., 2018; Rai et al., 2017). In response to this, the U.S. Environmental Protection Agency (EPA) recently released two reports outlining testing protocols, metrics, and target values to evaluate the performance of ozone and fine particulate matter (PM2.5) sensors for non-regulatory supplemental and informational monitoring applications in the U.S. (Duvall et al., 2021a, b). Unfortunately, there is no similar guidance for validating LCSs for deployments in settings without in situ regulatory monitors. The deployment and evaluation of LCS packages in areas without existing AQ monitoring infrastructure is a growing research area (Chatzidiakou et al., 2019; Hagan et al., 2019; Subramanian et al., 2020, 2018). 
A lack of in situ regulatory monitors requires colocation, calibration, and validation at another site, potentially under a set of environmental conditions different from those of the target deployment environment. Advancements in laboratory chamber calibration may help resolve this issue. In a controlled environment, gas sensors can be exposed to and calibrated for a range of environmental conditions (i.e., gas concentration, RH, T, pressure, etc.), which may allow LCS cross-sensitivity and interference to be measured and controlled for before deployment (Williams et al., 2014b; Spinelle et al., 2016, 2015; Lewis et al., 2016). However, studies of low-cost particle sensors have observed better performance under laboratory versus field conditions (Rai et al., 2017). For example, previous long-term field assessments of the Alphasense OPC-N2 optical particle counter have observed large variability with changing seasons, environmental conditions, and background pollution levels (Bulot et al., 2019; Rai et al., 2017; Sousan et al., 2016). Low-cost optical particle sensors can systematically overestimate mass concentrations under high RH (>70 %) conditions due to hygroscopic growth of the particles (Crilley et al., 2018; Di Antonio et al., 2018), with errors ranging from 100 % to 500 % depending on aerosol hygroscopicity (Hagan and Kroll, 2020). Further, the complex chemical, physical, and optical properties of aerosol can complicate the field evaluation of low-cost particle sensors. For the Alphasense OPC-N2, particle composition may impact the sensor output by as much as a factor of 30 (Rai et al., 2017; Sousan et al., 2016). A recent modeling effort by Hagan and Kroll (2020) found that the optical properties and particle size distribution of the source aerosol can result in errors of up to 100 % and 90 %, respectively, in mass measurements made by low-cost optical particle sensors. 
Measurement errors were highest for strongly absorbing aerosol dominated by small (<300 nm) particles. These traits can be characteristic of aerosol emitted by biomass burning (Reid et al., 2005), a dominant source of ambient PM throughout SSA (Marais and Wiedinmyer, 2016; Queface et al., 2011; Liousse et al., 2014). Therefore, stringent quality assurance is necessary to ensure the validity of LCS particle measurements in this environment.

In this study, we calibrated and evaluated the “ARISense”, a moderate-cost integrated gas, particle, and meteorological sensor package (Aerodyne, Inc.) for long-term field deployment to Malawi. Our overarching goal was to assess the viability of augmenting and maintaining a small, temporary network of LCS monitors, until a more formal governmental regulatory monitoring system can be established. Given that comparison to regulatory grade equipment in Malawi was not possible, the objective of this work was to devise an alternative methodology to evaluate the ARISense technology (Sect. 2.1) for accuracy, precision, and stability over the 1-year pilot deployment. In Sect. 2.3 and 2.4, we describe colocations of the gas sensors (in North Carolina, USA) and particle sensor (in Mulanje, Malawi) with reference or semi-reference instruments (described in Sect. 2.2). We use colocation data and quantitative assessment metrics (described in Sect. 2.5) to compare the performance of five modeling approaches to calibrate the gas sensors (Sect. 3.1) and to estimate error in the particle sensor data (Sect. 3.2). After deployment to Malawi (described in Sect. 2.6), we qualitatively assess how the ARISense performed in the field using contextual information about nearby emission sources, diurnal trends, and an intercomparison of calibrated gas model observations (Sect. 3.3 and 3.4). In Sect. 3.5 and 3.6, we compare the deployment results to remote sensing and reanalysis data products and to surface measurements from similar environments in SSA. Finally, in Sect. 3.7, we qualitatively assess the long-term stability of the sensor readings and calibration models in Malawi by comparing ambient data collected 1 year apart at the same location. In concluding (Sect. 4), we draw on these pilot results to characterize the benefits, limitations, and robustness of this technology and methodology for our application: collecting AQ data in understudied and low-resource regions. 
Additionally, we offer guidance on considerations to improve future remote deployment efforts. Detailed analysis and discussion of more than 3 years of data collected in Malawi will be presented in a forthcoming complementary publication.

2 Methods

The ARISense sensor packages were colocated with reference instruments in North Carolina (NC) before and after deployment to Malawi. One ARISense monitor was colocated with a semi-reference PM instrument at a deployment site in Malawi to assess the performance of the integrated OPC-N2. Instrumentation, colocation, and calibration are covered in Sect. 2.1–2.4. Performance assessment metrics are given in Sect. 2.5. Calibrated ARISense monitors were deployed to Malawi (Sect. 2.6) and compared to remote sensing data products (Sect. 2.7).

2.1 ARISense sensor packages

The ARISense monitoring package (Fig. S1 in the Supplement) integrated the following sensors from Alphasense Ltd., UK: carbon monoxide (CO-B4), nitric oxide (NO-B4), nitrogen dioxide (NO2-B43F), total oxidants (Ox-B421), and the OPC-N2 optical particle counter. The ARISense package reported voltage readings from electrochemical gas sensor working electrodes (WEs) and auxiliary electrodes (AEs). Sensor differential voltage (ΔV) was calculated as WE − AE. The Alphasense OPC-N2 recorded counts in 16 size bins spanning particle diameters from 0.38 to 17.5 µm, meaning the OPC-N2 primarily measures coarse-mode aerosol particles (>2 µm) and some accumulation-mode (0.1 to 2 µm) aerosol particles (Badura et al., 2018; Crilley et al., 2018; Sousan et al., 2016). Although the OPC-N2 has embedded algorithms to convert count measurements into mass concentrations of PM1.0, PM2.5, and PM10 (particulate matter with aerodynamic diameters less than 1.0, 2.5, and 10 µm, respectively), we instead integrated the bin count data manually, converting them first to number concentration (cm−3) assuming unity measurement efficiency across the bin range and then to mass concentration assuming spherical particles with uniform density (1.65 g cm−3). The values reported as PM2.5 are in fact PM2 (particles smaller than 2.0 µm), because the adjacent bin boundaries at 2.0 and 2.99 µm did not allow for direct estimates of PM2.5. However, this was only one of many contributing sources of error in approximating true mass concentration with the Alphasense OPC-N2. Given the minimum cut-off diameter, we were unable to measure (nor did we try to estimate) the mass from particles smaller than 0.38 µm.
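The count-to-mass conversion described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function names, the flow rate and sample time arguments, and the use of the geometric mean of each bin's boundaries as its representative diameter are all assumptions. Only the spherical-particle assumption, the unity counting efficiency, and the density of 1.65 g cm−3 come from the text.

```python
import math

DENSITY_G_CM3 = 1.65  # uniform particle density stated in the text


def number_concentration(counts, flow_cm3_per_s, sample_time_s):
    """Convert per-bin counts to particles per cm^3.

    Assumes unity counting efficiency; the flow rate and sample time
    are hypothetical inputs, not values taken from the paper.
    """
    sampled_volume_cm3 = flow_cm3_per_s * sample_time_s
    return [c / sampled_volume_cm3 for c in counts]


def mass_concentration(bin_edges_um, number_conc_cm3,
                       density=DENSITY_G_CM3, max_diameter_um=None):
    """Sum per-bin mass (ug m^-3) for spherical particles.

    With diameter in um, number concentration in cm^-3, and density in
    g cm^-3, the unit conversion factors cancel, so
    mass = density * (pi / 6) * d^3 * N is already in ug m^-3.
    Bins whose upper edge exceeds max_diameter_um are excluded.
    """
    total = 0.0
    for lo, hi, n in zip(bin_edges_um[:-1], bin_edges_um[1:], number_conc_cm3):
        if max_diameter_um is not None and hi > max_diameter_um:
            break
        d = math.sqrt(lo * hi)  # assumed representative diameter of the bin
        total += density * math.pi / 6.0 * d ** 3 * n
    return total
```

Calling `mass_concentration(edges, conc, max_diameter_um=2.0)` stops at the 2.0 µm boundary, which mirrors why the reported quantity is PM2 rather than PM2.5: the next bin extends to 2.99 µm and cannot be split.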

We used four ARISense monitors in this study: serial numbers ARI013, ARI014, ARI015 (v1.0, 2017), and ARI023 (v2.0, 2018). The monitors were powered by solar panels charging external batteries and recorded data to an internal USB device. Details and images are provided in Sect. S1 of the Supplement. Additional environmental and meteorological sensors (i.e., T, RH, pressure, solar intensity, and noise) and the system design are described in Cross et al. (2017).

2.2 Reference instrumentation

Gas concentration measurements for NOx/NO/NO2 (Teledyne Model T200UP), CO (Thermo Scientific Model 48i-TLE), and ozone (Ecotech Federal Equivalent Method instrument) were obtained from reference instruments operated by the North Carolina Department of Environmental Quality (NC-DEQ) and the U.S. EPA.

The semi-reference MicroPEM (RTI International) instrument was used to assess the performance of the OPC-N2 in Malawi. The MicroPEM, equipped with T and RH sensors, sampled (0.50 L min−1, 100 % duty cycle) via a PM2.5 inlet into a nephelometer (0.1 Hz) and 25 mm PTFE filter. In previous evaluation studies, after gravimetric correction, the MicroPEM real-time nephelometer agreed with fixed-site reference monitors across a wide range of ambient PM concentrations (Du et al., 2019; Williams et al., 2014a). However, deployments observed baseline (zero) drift and poor performance at RH conditions above 94 % (Williams et al., 2014a; Zhang et al., 2018). To account for baseline drift, the MicroPEM was zeroed before each deployment using a HEPA filter. Additional details on the MicroPEM sensor, filter analysis, and quality assurance are provided in Sect. 1 of the Supplement.

2.3 Gas sensor colocation and calibration

Before deployment to Malawi, ARI013, ARI014, and ARI015 were colocated with EPA and NC-DEQ reference instruments (Fig. S2) at a near-highway site near Durham, North Carolina, USA (35.865° N, 78.820° W) between 29 May and 15 June 2017 (boreal summer, i.e., a warm, mild season). ARI013 and ARI014 were colocated for 17 d. ARI015 was colocated for only 8 d due to a defect identified early in the colocation. All data were recorded at 1 min resolution. Colocation site details are provided in Sect. 2 of the Supplement.

The pre-deployment colocation data were used to train, assess, and compare the performance of five modeling approaches to convert the raw voltage data to concentration units and to account for sensor interference and cross-sensitivities. Outlying data points in the raw ARISense gas sensor voltage data due to noise and power cycling were visually identified and removed. Raw NO sensor data collected within 8 h of a power cycle were also removed due to the extended warmup time of the NO-B4 sensor. ARISense data were time-aligned with the reference data, and both data sets were averaged to 5 min resolution. A random 70 % of the colocation data were used for model training, and the remaining 30 % were withheld for testing. Performance assessment metrics were calculated only for the withheld data.
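The averaging and hold-out steps can be illustrated with a short sketch. It assumes a gap-free, evenly spaced 1 min series that is already time-aligned with the reference data; the function names and the fixed random seed are hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def block_average(values, block=5):
    """Average consecutive 1 min samples into `block`-minute means.

    Only complete blocks are kept; a trailing partial block is dropped.
    """
    return [mean(values[i:i + block])
            for i in range(0, len(values) - block + 1, block)]


def train_test_split(n_samples, train_fraction=0.7, seed=0):
    """Randomly assign sample indices to training and withheld test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = round(train_fraction * n_samples)
    return sorted(indices[:cut]), sorted(indices[cut:])
```

Performance metrics would then be computed only on the rows indexed by the second list, matching the withheld 30 % described above.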

Individual calibration models were built for each gas sensor (Ox, NO, NO2, CO) in each monitor (ARI013, ARI014, ARI015) using five modeling approaches: k-nearest neighbors (kNN) hybrid (Hagan et al., 2018), random forest (RF) hybrid (Malings et al., 2019a), high-dimensional model representation (HDMR) (Cross et al., 2017), quadratic regression (QR) (Malings et al., 2019a), and multi-linear regression (MLR). The five models were selected for consideration based on their performance in previous studies. The kNN hybrid model was found to enable accurate measurements even when pollutant levels were higher than encountered during calibration (Hagan et al., 2018). Given that we expected levels of some pollutants to be higher in Malawi than during calibration in NC, we expected kNN hybrid models to be well suited for our application. Further, the kNN hybrid approach is expected to be widely applicable to a range of pollutants, sensors, and environments (Hagan et al., 2018). In a calibration and validation study conducted by Malings et al. (2019a), RF hybrid models were recommended for any low-cost monitor using electrochemical sensors similar to their sensor package, the Real-time Affordable Multi-Pollutant (RAMP) monitor. Given that the RAMP and ARISense monitors use the same electrochemical sensors and have similar integrated designs, we expected RF hybrid models to perform well for our data set. HDMR models were found to effectively model interference effects derived from the variable ambient gas concentration mix and changing environmental conditions over three seasons for the sensor types used in the ARISense package (Cross et al., 2017). Finally, MLR and QR are simple, popular calibration approaches, and they were included in this study for that reason.
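To make the idea of a hybrid model concrete, the sketch below shows one way a nearest-neighbor model might be combined with a linear fit so that predictions remain sensible outside the calibration range. This arrangement is an assumption made for illustration only: it uses a single input (ΔV), a hypothetical class name, and a simple range check, and it is not the implementation of Hagan et al. (2018) or of this study, whose actual inputs are listed in Table 1.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


class HybridKnnLinear:
    """Illustrative hybrid: kNN inside the training range, linear outside."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, delta_v, concentration):
        self.delta_v = list(delta_v)
        self.concentration = list(concentration)
        self.slope, self.intercept = fit_line(self.delta_v, self.concentration)
        self.lo, self.hi = min(self.delta_v), max(self.delta_v)
        return self

    def predict_one(self, v):
        # Outside the calibrated signal range a kNN model can only return
        # values it has already seen, so fall back to the linear fit.
        if v < self.lo or v > self.hi:
            return self.slope * v + self.intercept
        pairs = sorted(zip(self.delta_v, self.concentration),
                       key=lambda p: abs(p[0] - v))
        return mean(c for _, c in pairs[:self.k])
```

The design point is the fallback branch: a pure nearest-neighbor model saturates at the largest concentration in its training set, which matters when deployment levels are expected to exceed those seen during colocation.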

The modeling inputs are summarized in Table 1. O3 models were designed to account for sensor cross-sensitivity to NO2 (Cross et al., 2017). Note that references to “O3” indicate estimates made from calibrating the Ox sensor data. References to “Ox” indicate raw voltage measurements from the total oxidant sensor. “Ozone” is used when referring to the gaseous air pollutant. For our study, the CO HDMR models were set to allow only first-dimensional interactions, as second-order interactions were observed to lead to spurious results for data collected outside the bounds of training data (see Sect. 3.3 for more information on deployment conditions). For the CO sensors, this effectively made the HDMR model equivalent to the MLR model. Therefore, the statistical metrics achieved by both models were identical and are shown as overlaid points in Fig. 2a.

Table 1. Calibration modeling inputs for each gas sensor (CO, carbon monoxide; NO, nitric oxide; NO2, nitrogen dioxide; Ox, oxidants) and model combination, where “all” indicates k-nearest neighbors (kNN) hybrid, random forest (RF) hybrid, high-dimensional model representation (HDMR), multi-linear regression (MLR), and quadratic regression (QR). ΔV is the voltage difference between the working electrode (WE) voltage and the auxiliary electrode (AE) voltage measured by each electrochemical gas sensor. RH stands for relative humidity, T stands for temperature, and DP stands for dew point.

a kNN hybrid only.
b RF hybrid only.


2.4 OPC-N2 colocation and calibration

ARI023 was colocated with a MicroPEM in an ambient, combustion-source-influenced environment on a house rooftop (4 m a.g.l.) in Mikundi village in Mulanje District, Malawi (16.056° S, 35.535° E) between 25 July 2018 and 7 August 2018 (austral winter – cool, dry season). We collected 130 h of colocation data over three multi-day collection periods (i.e., three PTFE filters). A 75 % completeness requirement was applied before the raw 1 min data were averaged to 1 and 24 h intervals. Sub-daily averaging intervals were used to assess the OPC-N2 for near real-time (1 min) and diurnal trend (1 h) monitoring applications. A bin-wise RH-correction algorithm based on κ–Köhler theory was applied to correct for hygroscopic growth under high RH conditions, initially assuming particle density (ρ) equal to 1.65 g cm−3 and aerosol hygroscopicity (κ) of 0.6 (Di Antonio et al., 2018). To observe sensitivity of this correction to the assumed hygroscopicity, the density was held constant at 1.65 g cm−3 and the κ value was varied (κ = 0.15, 0.6, and 1). To observe variability due to the assumed source of the aerosol, the density and hygroscopicity were varied to approximate ammonium nitrate, dust, wildfire, and background aerosols. Aerosol property assumptions (κ and density) are based on Hagan and Kroll (2020) and Petters and Kreidenweis (2007).
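The core of a κ–Köhler correction is the diameter growth factor, which can be sketched as below. The sketch assumes water activity equals RH/100 (that is, the Kelvin curvature term is neglected) and shrinks each bin boundary by the growth factor; the function names are hypothetical, and the exact algorithm of Di Antonio et al. (2018) as applied in this study may differ in detail.

```python
def growth_factor(rh_percent, kappa=0.6):
    """Wet-to-dry diameter ratio from kappa-Kohler theory.

    Assumes water activity = RH / 100 (Kelvin effect neglected).
    Valid for RH below 100 %.
    """
    a_w = rh_percent / 100.0
    return (1.0 + kappa * a_w / (1.0 - a_w)) ** (1.0 / 3.0)


def dry_bin_edges(wet_edges_um, rh_percent, kappa=0.6):
    """Shrink measured (wet) bin boundaries to their dry equivalents."""
    g = growth_factor(rh_percent, kappa)
    return [edge / g for edge in wet_edges_um]
```

Repeating the call with κ = 0.15, 0.6, and 1 at a fixed RH reproduces the kind of sensitivity test described above: a larger κ gives a larger growth factor and therefore a stronger downward correction of the apparent particle size.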

2.5 Assessment metrics

We adapted performance metrics and target values from recently published U.S. EPA guidelines (Duvall et al., 2021a, b) to assess ARISense performance (Table S1 in the Supplement). The EPA guidelines suggest using linearity, bias, precision, and error metrics to assess air sensor performance, and they offer target values for each. We use the U.S. EPA target values as quantitative markers to indicate satisfactory or unsatisfactory sensor performance; however, given the differences in our study compared to the U.S. EPA methodology, we do not consider these categorizations to be definitive. Further, we emphasize that even if a sensor meets or surpasses the performance target values for each metric, this does not constitute endorsement by the U.S. EPA. Their guidelines were developed for Ox and PM2.5 air sensors, and we used these to assess the ARISense Ox-B421 and OPC-N2 sensors, respectively. Although there are no formal guidelines for CO, NO, and NO2 sensors at the time of writing, for coherency we opt to assess those sensors using a similar approach.

The coefficient of determination (R2), an indicator of the correlation between estimated and true concentrations, was used to assess linearity. The root-mean-square error (RMSE) was used to assess error in the estimated measurements compared to the true values. The coefficient of variation (CV) was used to assess precision. Finally, to assess bias, a linear regression model (y=mx+b) was fit using the ARISense measurements as the dependent variable (y) and the reference measurements as the input variable (x), and the resulting slope (m) and intercept (b) were calculated. Quantitative descriptions for each metric are given in Sect. 3 of the Supplement.
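For reference, three of these metrics can be computed as in the sketch below. The function names are hypothetical, and R2 is written in the 1 − SS_res/SS_tot form, which can go negative when the estimates fit worse than the reference mean; this is an assumed formulation, since the authoritative definitions are those in Sect. 3 of the Supplement. CV is omitted because its exact formulation is given only there.

```python
import math
from statistics import mean


def rmse(reference, estimate):
    """Root-mean-square error of the estimates against the reference."""
    return math.sqrt(mean((e - r) ** 2 for r, e in zip(reference, estimate)))


def r_squared(reference, estimate):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ref_mean = mean(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def slope_intercept(reference, estimate):
    """Fit estimate = m * reference + b by ordinary least squares."""
    x_mean, y_mean = mean(reference), mean(estimate)
    sxx = sum((x - x_mean) ** 2 for x in reference)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(reference, estimate))
    m = sxy / sxx
    return m, y_mean - m * x_mean
```

A sensor that reads a constant 0.5 units high would return a slope of 1 and an intercept of 0.5, which separates the offset (bias) from random scatter captured by RMSE.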

In addition, prediction intervals between the OPC-N2 and MicroPEM data were calculated to provide a statistical confidence interval to interpret OPC-N2 sensor measurements collected after the evaluation period (Bean, 2021). We calculated 68 % (1-sigma) prediction intervals for the ARISense using colocation data from ARI023 (Table 2) collected at the Village 2 site (Fig. 1d). The 1 h averaged observations were used to fit a linear model, which required a Box–Cox transformation (Box and Cox, 1964) to obtain normally distributed residuals (Fig. S3). Details are given in Sect. 3 of the Supplement.
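A prediction interval of this kind might be computed along the following lines. The sketch makes several assumptions that are not confirmed by the text: the Box–Cox transformation is applied to the response variable, λ is supplied by the user rather than estimated, and the 1-sigma interval uses a multiplier of 1 on the residual standard error instead of a Student t quantile. The function names are hypothetical; the procedure actually used is described in Sect. 3 of the Supplement.

```python
import math
from statistics import mean


def boxcox(y, lam):
    """Box-Cox transform of a positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def inv_boxcox(z, lam):
    """Inverse Box-Cox transform (requires lam * z + 1 > 0 for general lam)."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def fit_prediction_interval(x, y, lam):
    """Fit boxcox(y) = a + b * x and return an interval function."""
    z = [boxcox(v, lam) for v in y]
    n = len(x)
    x_mean, z_mean = mean(x), mean(z)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z)) / sxx
    intercept = z_mean - slope * x_mean
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))

    def interval(x_new, width=1.0):
        """Return (lower, centre, upper) on the original scale."""
        centre = intercept + slope * x_new
        half = width * s * math.sqrt(1.0 + 1.0 / n + (x_new - x_mean) ** 2 / sxx)
        return tuple(inv_boxcox(centre + d, lam) for d in (-half, 0.0, half))

    return interval
```

Because the interval is built on the transformed scale and then back-transformed, it is asymmetric about the central estimate on the original concentration scale.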

Table 2. Project timeline of colocations, deployment, and emissions monitoring experiments. The description under each period indicates the activity conducted during that timeframe. The location of the activity is given in parentheses.

* Data from emissions monitoring experiments not discussed in this paper.
Note that n/a stands for not applicable.


Figure 1. (a) Satellite map of Malawi in southeastern Africa, (b) three ARISense monitoring sites in Malawi, (c) satellite map of Village 1, and (d) satellite map of Village 2. Blue markers indicate ARISense monitoring sites. Red crosses indicate the location of known biomass cookstoves within 50 m of the monitoring site. The image source is Google Earth Pro Version University, Village 1, and Village 2, Malawi, South-eastern Africa. Borders and labels layer. Accessed: June 5, 2020. © Google Earth 2021.

2.6 Deployment to Malawi

ARI013, ARI014, and ARI015 were deployed to their respective monitoring locations in Malawi from July 2017 to July 2018 (shown as blue markers in Fig. 1). The three locations were selected to provide measures of regional variation and replicates in two paired village sites. ARI013 (Village 2 site) and ARI014 (Village 1 site) were deployed <5 km apart (Fig. S5) in two rural villages in Mulanje, Malawi, adjacent to private residences. ARI015 (University site) was deployed >375 km northwest of the village sites at a rural university campus ∼30 km from the capital city (Fig. S6). Additional satellite images are given in Sect. 4 of the Supplement.

Almost all rural households in Malawi (99.7 %) use solid fuels (e.g., firewood, charcoal) for cooking (National Statistics Office, 2017). Emissions from widespread biomass cookstove use are known to impact local ambient air quality (Aung et al., 2016; Zhou et al., 2011; Amegah and Agyei-Mensah, 2017). Homes regularly using biomass cookstoves within 50 m of the monitoring sites were visually identified at the onset of the study (shown with red crosses in Fig. 1c–d).

A timeline of the ARISense colocations and deployments is given in Table 2. After the 1-year ambient deployment was completed, the ARISense were used for high-concentration emissions monitoring experiments in rural Malawi in July and August 2018. The details of those experiments (i.e., number of experiments, duration, approximate CO concentrations) are discussed in Sect. 5 of the Supplement. We explore the impact of these experiments on sensor operation, but we do not discuss the data itself in this paper.

At the conclusion of the emissions monitoring experiments, ARI013 and ARI014 were returned to NC and were colocated with reference instruments at the near-highway Durham, NC, site used in the pre-deployment colocation (described in Sect. 2.3). ARI015 was relocated to a new monitoring site in Malawi.

2.7 Remote sensing and reanalysis data

Two publicly available NASA data products were obtained from the Goddard Earth Sciences Data and Information Services Center (GES-DISC) Interactive Online Visualization and Analysis Infrastructure (GIOVANNI): (1) area-averaged, monthly multispectral CO surface mixing ratio (daytime / descending) from MOPITT and (2) monthly averaged CO surface concentration (ENSEMBLE) from MERRA-2, henceforth referred to as “MOPITT” and “MERRA-2”, respectively. MOPITT is a calibrated satellite observation and MERRA-2 is a global reanalysis data product. MERRA-2 is the output of an atmospheric chemistry model that has assimilated other data, including satellite data, in making its estimations. Monthly averaged MOPITT and MERRA-2 observations were compared to ARISense CO surface data collected at the Village and University locations. Given the physical proximity of Village 1 and Village 2, and the similarity in monthly mean CO concentration at each site (Fig. S7), the average of the data sets (Village Mean) was used. Additional details are given in Sect. 6 of the Supplement.

3 Results and discussion

3.1 Gas sensor performance during colocation

Raw gas sensor voltages (5 min averaged data) from all three ARISense monitors (ARI013, ARI014, ARI015), excluding the Ox sensor in ARI015, were highly correlated (R2>0.8) during the pre-deployment colocation, suggesting changes in sensor response were due to environmental changes, not sensor-to-sensor variability (Fig. S9). The sensors in ARI013 and ARI014 were most closely correlated (R2>0.9). The raw ARI015 Ox sensor data showed weaker temperature dependence and the lowest correlation (R2<0.6) with Ox sensors in ARI013 and ARI014 (Fig. S9).

Figure 2 shows two performance metrics representing each sensor–model combination for the three ARISense. Data points toward the lower-left corner of each Fig. 2 panel indicate better performance. Results from all ARISense-sensor–model combinations for all five performance metrics are given in Tables S4–S6. We found that performance varied by ARISense monitor, but none of the ARISense consistently performed better than the others. Overall performance varied by gas sensor type and modeling approach. The calibrated NO2 sensors in all three ARISense were the least correlated with reference measurements compared to the other gas sensors. Only the ARI015 NO2 sensor, calibrated by the RF hybrid model, surpassed the target value for the linearity metric (R2>0.8). Further, no NO2 sensor–model combination met the bias target values for slope and intercept. For all three ARISense, the calibrated NO2 sensors underestimated the true concentration compared to the reference (0.26<m<0.71). However, all NO2 sensor–model combinations met the error target (RMSE <5 ppb) and approached the precision metric target.

Figure 2. Performance comparison of gas sensors (a) CO, (b) NO, (c) NO2, and (d) O3 as calibrated by the five types of modeling approaches adopted for this study (kNN hybrid, RF hybrid, HDMR, MLR, QR). The model type is indicated by color and marker shape. An individual data point represents the paired metrics (RMSE and R2) for one ARISense monitor. Since there are three ARISense (ARI013, ARI014, ARI015) monitors, there are three markers for each gas sensor–model combination. RMSE is root-mean-square error. R2 is the coefficient of determination (−∞ < R2 ≤ 1). The lower-left corner region of each panel indicates the highest performance based on these metrics.


At the other end of the performance spectrum, the calibrated Ox sensors performed best among the gas sensors during the pre-deployment colocation. Nearly all Ox sensor–model combinations attained similar linearity and error metrics (0.85<R2<0.99 and 2< RMSE <5 ppb, well within the target values). Only the ARI015 Ox sensor calibrated by the RF hybrid model failed to meet the RMSE target value, yet it returned the highest R2 value compared to the other models. Additionally, all Ox sensor–model combinations met the slope and intercept target values for bias. For the kNN hybrid model, the calibrated O3 observations had a slope approximating 1 (m>0.98) and an intercept of 0, suggesting minimal bias. Only the precision values (37 % < CV <54 %) were outside the EPA guideline target range (CV <30 %).

Most NO sensor–model combinations met the target value for the bias, error, and linearity metrics, but precision was low for all combinations assessed, with most CV values >100 %. This suggests that the variation in the NO data set was in the raw sensor or reference measurements, rather than the modeling approaches. The MLR model was associated with the worst performance for all three NO sensors compared to the other models. However, for ARI015, all NO sensor–model combinations surpassed the target for every metric except precision. Again, the ARI015 gas sensor–RF hybrid model combination was the outlier compared to ARI013 and ARI014 sensor–model combinations (Table S6). We hypothesize that the shorter colocation period of ARI015 (8 d compared to 17 d of colocation for ARI013 and ARI014) led some of the sensor–model combinations to be overfit or poorly constrained.

Most CO sensor–model combinations met or approached the target values for bias, linearity, and precision. The U.S. EPA-recommended Ox target values for these three indicators (Table S1) can be used to compare against the CO sensor values to approximate performance, but we surmise that the error target value (RMSE ≤5 ppb) cannot. The U.S. EPA National Ambient Air Quality Standards suggest CO concentrations are 1–2 orders of magnitude larger than ambient ozone or NOx concentrations. By extension, we posit that a reasonable error target value for the CO sensor is 50 ppb. Except for the CO–kNN hybrid model combination, most CO sensor–model combinations did not meet our adapted error target value. However, considering the magnitude differences, the CO sensor–model combinations performed similarly to the NO, NO2, and Ox sensors in terms of error. The CO RMSE values (40–70 ppb) were correspondingly 1 order of magnitude larger than NO, NO2, and O3 RMSE values (2–7 ppb).

For the suite of gas sensors in the ARISense monitors, we found the kNN hybrid model to be the best among the modeling approaches used in the pre-deployment colocation testing (Fig. 2). In almost all cases, the kNN hybrid model returned higher R2 values, slope values closer to 1, and lower RMSE values than any other model. The RF hybrid model attained similar and occasionally higher R2 values than the kNN hybrid, but it had higher (and therefore worse) RMSE values by comparison. Further, the kNN hybrid model showed the least inter-monitor variation in performance. In Fig. 2b–d, the kNN hybrid points are closely clustered together, suggesting that this model was able to attain similar performance for each of the three ARISense. Conversely, the other models, in particular the RF hybrid and MLR, showed a wide range in performance across the three ARISense. Even if another model was able to attain performance metrics higher than the kNN hybrid (e.g., HDMR and MLR CO models in Fig. 2a), it was only for one of the three ARISense monitors and never all three. Additionally, the MLR failed to meet target values for some ARISense–gas sensor combinations (Fig. 2a–b). Taken together, these findings suggest the kNN hybrid model is the best choice among these five modeling approaches for our application, given that we sought an approach uniformly applicable to all the gas sensors and all three ARISense.
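The two metrics plotted in Fig. 2, and the inter-monitor spread used above to compare models, can be sketched as follows. This is a minimal illustration that assumes R2 is computed as 1 − SS_res/SS_tot (consistent with its stated range of −∞ to 1); the function names and the numbers are hypothetical and are not taken from the study.

```python
import math

def rmse(reference, predicted):
    """Root-mean-square error of calibrated values against the reference."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)

def r_squared(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

def spread(values):
    """Range (max - min) of one metric across monitors."""
    return max(values) - min(values)

# Hypothetical paired values (ppb), for illustration only.
reference = [10.0, 20.0, 30.0, 40.0]
calibrated = [12.0, 18.0, 33.0, 39.0]

print(rmse(reference, calibrated))       # error metric
print(r_squared(reference, calibrated))  # linearity metric

# Hypothetical RMSE values for one model across three monitors.
print(spread([3.1, 3.4, 3.2]))
```

A small `spread` for both metrics across the three monitors corresponds to the tight clustering of kNN hybrid points described for Fig. 2b–d.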

3.2 OPC-N2 performance during colocation

Pre-deployment colocation PM2.5 measurements in North Carolina (where no reference monitor or data were available) from ARI013, ARI014, and ARI015 suggest the Alphasense OPC-N2 sensors in each monitor responded similarly (R2>0.9) when in the same environment (Fig. S10). ARI013 PM2.5 mass concentration measurements were higher than measurements made by ARI014 and ARI015 (slope >1), despite all ARISense being in the same location. ARI015 underestimated the mass at low concentrations compared to ARI013 and ARI014 (nonlinear clustering at concentrations <5 µg m−3 in Fig. S10a and c). The OPC-N2 sensors in ARI014 and ARI015 showed the highest similarity (slope = 1 ± 0.05, R2 = 0.96).

Figure 3 shows scatterplots of the ARI023 OPC-N2 and MicroPEM 1 min, 1 h, and 24 h averaged data collected during colocation at the Village 2 site in Malawi (individual 1 min scatterplots for each of the three tests are shown in Fig. S11). RH correction partially mitigated the impact of overestimation due to hygroscopic growth but did not remove the artifact entirely (Fig. S12). RH correction improved the precision and error metrics, bringing RMSE within the target value (≤7 µg m−3) for the 24 h averaged data (Table S7). Increasing the averaging interval had a similar effect, but this alone was insufficient to bring RMSE within the target range. Linearity was well below the target value (R2>0.7) for all averaging intervals, and RH correction did little to improve performance for this metric. For this data set, changes in bias and linearity appeared to be driven by averaging interval. For example, the OPC-N2 RH-corrected 1 min data met the target for slope and intercept, but the 1 and 24 h averaged data met neither of these targets. Particularly for the 24 h averaged data, a few high-leverage points in the small sample drove the metric values (Fig. 3c); however, close 1:1 agreement between the instruments was observed for 4 of the 7 colocation days. These results highlight the value of longer and more representative colocations. At least two 30 d colocations would be needed, during the hot and dry (September to October) and warm and wet (November to April) seasons, to characterize this specific site.
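One way to see why a longer averaging interval lowers RMSE without necessarily bringing it within target is to block-average a synthetic series. The sketch below is illustrative only: the series, the ±4 alternating noise, and the constant +2 offset are hypothetical assumptions, not study data. Averaging removes the zero-mean noise but leaves the constant offset untouched.

```python
import math

def block_mean(values, block):
    """Average consecutive, non-overlapping blocks of `block` samples."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values) - block + 1, block)]

def rmse(reference, measured):
    """Root-mean-square error, treating `reference` as the true value."""
    n = len(reference)
    return math.sqrt(sum((m - r) ** 2 for r, m in zip(reference, measured)) / n)

# Hypothetical 1 min data covering 2 h: a slowly rising reference signal.
reference = [10.0 + 0.01 * i for i in range(120)]
# Hypothetical sensor: constant +2 offset plus alternating +/-4 noise.
sensor = [r + 2.0 + (4.0 if i % 2 == 0 else -4.0)
          for i, r in enumerate(reference)]

rmse_1min = rmse(reference, sensor)
rmse_1h = rmse(block_mean(reference, 60), block_mean(sensor, 60))

print(rmse_1min)  # includes both the noise and the offset
print(rmse_1h)    # noise averages out; the offset remains
```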

Figure 3. Scatterplots of RH-corrected PM2.5 mass concentration measurements from the OPC-N2 versus filter mass-corrected PM2.5 measurements from the MicroPEM at 1 min (a), 1 h (b), and 24 h (c) averaging intervals. Data points are colored according to RH (%) conditions. N represents the number of data points. Linear fit lines and regression coefficients (m, b) are given in red as Y = mx + b. Additional metric values are inset: R2 is the coefficient of determination, RMSE is root-mean-square error (units of µg m−3) assuming the MicroPEM is the reference instrument, and CV is coefficient of variation. The dashed black line is a 1:1 line.


Even after RH correction, the OPC-N2 overestimated mass concentrations compared to the nephelometer when RH was ≥70 %. Conversely, the OPC-N2 often underestimated mass when RH was ≤30 %. These effects were most noticeable at higher time resolutions (Fig. 3a–b). The effects of RH were tempered by a longer averaging interval; however, for a particularly humid day at this site, the 24 h mass concentration was overestimated by a factor of 3 (Fig. 3c). Notably, the moderate RH outliers in the 24 h average scatterplot suggest that other factors in addition to RH were contributing to error in the OPC-N2 observations.

To explore other contributors to variable OPC-N2 performance, Fig. 4 shows performance for RH-corrected data stratified by environmental conditions (wind direction, ambient concentration, and RH). Wind direction and concentration (Fig. 4a–b) were selected to explore the possible effect of nearby cookstove emissions, while Fig. 4c highlights the remaining effect of RH even after correction. We hypothesized that ambient concentration and wind direction might impact OPC-N2 performance given that the site was periodically exposed to cookstove emissions from the Village 2 site household kitchen (within 15 m to NW) and from adjacent residences (within 50 m to the SSW in Fig. 1d). Figure 4 shows that wind direction was associated with performance variation, although to a lesser degree than RH. Slightly increased performance was observed for northerly winds. Nearby cookstove use potentially explained the decreased performance associated with southerly winds. Four of the five morning cooking periods observed in the time series data were associated with wind blowing from the SE–S–SW (Fig. S14). Figure 4b shows that ambient concentration had a modest impact on OPC-N2 performance metrics. Linearity was expected to increase with concentration, particularly given that the high-concentration bin (20–105 µg m−3) spanned a larger interval than the other bins. Precision within each concentration bin was low. The CV values were well beyond the recommended target value (CV <30 %). The OPC-N2 frequently underestimated the ambient mass concentration compared to the MicroPEM, particularly during higher concentration periods dominated by near-field biomass burning (i.e., slope = 0.4 for measurements between 20 and 105 µg m−3). During periods of cookstove influence, the size distribution, hygroscopicity, and optical properties of the measured aerosol were likely altered. 
Assumptions about the source aerosol (density and hygroscopicity) used in the RH correction were found to affect inferred OPC-N2 performance compared to the MicroPEM (though not in a predictable fashion). For example, higher linearity and lower RMSE were observed when the particle composition was assumed to be highly hygroscopic (κ=1), yet the least bias was observed at the lowest hygroscopicity assessed (κ=0.15). Further, when the aerosol was assumed to be characteristic of wildfire (rather than ammonium nitrate, dust, or background in origin), the bias between the OPC-N2 and MicroPEM disappeared (slope = 1.02), yet the error metric was among the highest in the four aerosol categories and was above the target value (Table S10). These findings suggest more research is warranted to explore how changing aerosol characteristics (both assumed and actual) impact optical particle sensor performance. Summary statistics for each performance assessment metric are given in Tables S8–S10 in Sect. 8 of the Supplement.
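The sensitivity to the assumed hygroscopicity can be illustrated with the standard κ-Köhler volume growth relation, V_wet/V_dry = 1 + κ a_w/(1 − a_w), with the Kelvin term neglected. The sketch below is not necessarily the exact correction applied in this study: it assumes reported mass scales with particle volume (constant density), and the function names and the 30 µg m−3 input are hypothetical.

```python
def kappa_growth_volume(rh_percent, kappa):
    """Wet-to-dry particle volume ratio from kappa-Koehler theory.

    Assumes water activity equals RH/100 (Kelvin effect neglected).
    """
    aw = rh_percent / 100.0
    if not 0.0 <= aw < 1.0:
        raise ValueError("RH must be in the range [0, 100) %")
    return 1.0 + kappa * aw / (1.0 - aw)

def rh_corrected_mass(raw_mass, rh_percent, kappa):
    """Divide the raw mass by the volume growth ratio.

    Simplifying assumption: mass is proportional to volume.
    """
    return raw_mass / kappa_growth_volume(rh_percent, kappa)

# Hypothetical raw reading of 30 ug m-3 at 80 % RH, corrected with the
# lowest and highest kappa values mentioned in the text.
for kappa in (0.15, 1.0):
    print(kappa, rh_corrected_mass(30.0, 80.0, kappa))
```

Under these assumptions the same raw reading is reduced far more for κ=1 than for κ=0.15, which shows how strongly the assumed aerosol type can shift the inferred bias.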

Figure 4. Performance comparison of the RH-corrected Alphasense OPC-N2 compared to the MicroPEM under different environmental conditions: (a) wind direction, (b) ambient concentration, and (c) relative humidity during colocation at the Village 2 site in Mulanje, Malawi. An individual data point represents the paired metrics (RMSE and R2) for the OPC-N2 for a specific range of each condition. The histograms (inset) show the normalized frequency distributions for the ranges of each condition recorded during the colocation period. The colored markers in each panel correspond to the colored histogram bins. The metrics were calculated from 1 h averaged RH-corrected OPC-N2 PM2.5 concentrations compared to the MicroPEM filter mass-corrected nephelometer. RMSE is root-mean-square error, assuming the MicroPEM concentrations as the true values. R2 is the coefficient of determination. The lower-left corner of each panel indicates the highest performance based on these metrics.


At this deployment site, the OPC-N2 performed the best compared to the MicroPEM during dry conditions (20 % to 40 % RH) and when measuring background aerosol rather than source emissions (Fig. S14, presumed based on time series data). However, this latter result might be partially due to the coincident effects of high RH in this environment (Fig. 7). Figure 4c shows OPC-N2 behavior was affected by changes in ambient RH. In general, performance decreased with increasing RH, and this effect remained even after RH correction. For RH of 20 % to 40 %, RH-corrected OPC-N2 performance approached or exceeded the target values for the linearity, error, and precision metrics (Table S8). After RH increased past 70 %, the R2 value approached zero and the RMSE increased beyond the target value. Unfortunately, the inset histogram of Fig. 4c shows that an RH range of 60 % to 80 % was typical for this site during colocation.
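Stratifying an error metric by RH, as in Fig. 4c, amounts to grouping the paired observations by RH bin before computing the metric. A minimal sketch, with hypothetical bin edges and data (none of the numbers come from the study):

```python
import math

def rmse_by_rh_bin(reference, sensor, rh, edges):
    """RMSE per RH bin; bin k covers edges[k] <= RH < edges[k + 1].

    Returns None for bins that contain no observations.
    """
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        sq_err = [(s - r) ** 2
                  for r, s, h in zip(reference, sensor, rh)
                  if lo <= h < hi]
        result[(lo, hi)] = math.sqrt(sum(sq_err) / len(sq_err)) if sq_err else None
    return result

# Hypothetical hourly pairs (ug m-3) and the RH (%) recorded with them.
reference = [10.0, 10.0, 10.0, 10.0]
sensor = [11.0, 9.0, 20.0, 30.0]
rh = [25.0, 35.0, 75.0, 85.0]

print(rmse_by_rh_bin(reference, sensor, rh, edges=[20, 40, 70, 100]))
```

The same grouping could be applied to wind direction or concentration bins by swapping the stratifying variable.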

We found that the OPC-N2 at this specific site underestimated mass concentration compared to the MicroPEM, based on less than unity slope values. The performance was variable at low ambient concentrations and largely dependent on RH (Fig. S13). However, outside of very humid (RH >70 %) conditions, the RH-corrected OPC-N2 could estimate the PM2.5 mass concentration within about 10 µg m−3 of the MicroPEM value for real-time, hourly, and daily monitoring purposes (based on RMSE in Table S7). The findings from this section highlight the importance of quality assurance for low-cost optical particle sensor mass concentration measurements, especially those made in environments with highly variable meteorology and nearby ultrafine aerosol sources. For this site, contextual information on meteorology and emissions sources and their diurnal patterns helped interpret and evaluate the measurements.

3.3 Gas sensor performance during deployment

Given that RH, T, dew point (DP), and differential voltage were inputs to the calibration models, the ranges of these values during colocation in NC should mimic the ranges expected during deployment in Malawi. Otherwise, the model is required to extrapolate beyond its training bounds, which could lead to non-physical results (e.g., negative concentration values). Further, the performance assessment statistics derived from the colocation cannot be expected to hold for conditions far beyond those experienced during the performance characterization. Overall, the colocation and deployment settings exhibited a similar range of environmental conditions (Figs. S15–S16), but T and RH ranges in NC (15 to 40 °C and 20 % to 80 %) were less extreme than in Malawi (10 to 45 °C and 10 % to 95 %). In Malawi, the ARISense experienced more time at lower temperatures (T<25 °C), lower gaseous concentrations (other than CO), and lower ambient pressure (5 to 15 kPa lower depending on the location). Although the ARISense were deployed at a higher elevation in Malawi than during the colocation in North Carolina (625 m versus 120 m a.s.l.), all models were built using the differential voltages (WE-AE) of each electrochemical gas sensor. Therefore, the pressure-related shifts in the WE and AE baseline were not expected to pose an issue for the calibrated Malawi data. The variation in pressure was within the operating range given on the sensor specification sheets (80 to 120 kPa) and was stated not to have long-term impacts by the manufacturer (Alphasense FAQs, 2021). Further, others have shown no statistically significant change in electrochemical sensor sensitivity due to changes in pressure (Popoola et al., 2016). Even so, we did not have the laboratory chamber data to investigate this potential issue.
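A simple screen for extrapolation is to count how often each deployment input falls outside the range spanned by the training data. The sketch below uses the T and RH range endpoints quoted above as the training bounds; the deployment samples and the function name are hypothetical, and a per-variable check like this ignores joint (bivariate) coverage.

```python
def fraction_outside_training(training, deployment):
    """Fraction of deployment samples outside the training min-max range."""
    lo, hi = min(training), max(training)
    outside = sum(1 for x in deployment if x < lo or x > hi)
    return outside / len(deployment)

# Training ranges reported for the NC colocation (endpoints only).
training = {"T_degC": [15.0, 40.0], "RH_percent": [20.0, 80.0]}
# Hypothetical deployment samples, for illustration only.
deployment = {"T_degC": [10.0, 20.0, 30.0, 45.0],
              "RH_percent": [10.0, 50.0, 70.0, 95.0]}

for name in training:
    print(name, fraction_outside_training(training[name], deployment[name]))
```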

3.3.1 Bivariate histograms

Figure 5 shows bivariate distributions of T, RH, and gas sensor differential voltage data collected in NC and Malawi. In addition to capturing interactions between variables, Fig. 5 shows that the individual sensors in each ARISense responded differently even when in the same environment during the NC colocation. Compared to ARI013 and ARI014, the Ox sensor in ARI015 showed weaker temperature dependence (Fig. 5c). Since ARI015 had a shorter colocation period, it could be hypothesized that if ARI015 were present in the colocation environment for the same amount of time as ARI013 and ARI014, its response would look more like the ranges measured by the other sensors. However, this cannot fully explain the variation between individual sensors. For example, there is considerable variation between the ARI013 and ARI014 NO2 differential voltage ranges (grey regions in Fig. 5g–h), despite having identical colocation periods. Further, the raw CO sensor data for all three monitors showed much less inter-sensor variation (grey regions in Fig. 5d–f), even despite the shorter colocation period of ARI015. This inter-sensor variation, which appears largest for the NO2 sensors, may partially explain the lower performance of this gas sensor group during calibration model performance testing compared to the other gas sensor types (Fig. 2).

Figure 5. Bivariate distributions of gas sensor calibration model data inputs (RH; T; and Ox, CO, NO, and NO2 differential voltage) for each ARISense monitor using kernel density estimation. Density is reflected in the color scheme, where darker colors indicate more data points in that region. Training data collected during colocation in North Carolina are shown in grey, and data collected during deployment to Malawi are shown in color. ARI013 was deployed to the Village 2 site, ARI014 to the Village 1 site, and ARI015 to the University site. Regions where the deployment distributions overlap with the NC colocation distributions indicate the regimes for which the calibration models were trained. Regions where the deployment location distributions extend beyond the NC colocation distributions indicate regimes where the calibration models must extrapolate to estimate pollutant concentrations. These regions are indicated by overlaid markers “x” and “+” and are discussed in the text.


There were notable regimes in Malawi that required the calibration models to extrapolate beyond NC training conditions. NO differential voltage responses in NC and Malawi did not completely overlap (Fig. 5g–i), especially in the low-concentration regime (i.e., V near 0 mV), which was more frequent in Malawi. The colocation site in NC was 10 m from an 8-lane freeway (Saha et al., 2018); therefore, NOx concentrations were higher than in rural Malawi, where vehicles and industry are rare. However, for ARI014 in Village 1, there was a higher NO2 response in the deployment environment compared to the colocation environment. This could be partially explained by sensor interference by RH and T, which was more extreme (i.e., beyond the training ranges) in Malawi (Fig. S17). Figure 5e shows that the maximum ARI014 CO differential voltage in Malawi (350 mV) was 3 times higher than the maximum voltage registered in NC (100 mV). This high CO regime is denoted by a cross in Fig. 5e. This difference was consistent with observations of nearby sources (Fig. 1c–d). ARI014 was deployed in more densely populated Village 1, adjacent to more biomass cookstove activity than ARI013 or ARI015 (Fig. 1c). In general, we expected higher CO in Malawi than in NC, where biomass burning is less common and emissions from other sources (e.g., vehicles) are controlled by strict federal regulation.

The Ox differential voltage ranges were the most dissimilar between the colocation and deployment environments. The most frequent regimes, the heaviest-shaded regions in Fig. 5a–c, did not fully overlap for any of the ARISense. In NC, the relationship between the Ox sensor voltage and ambient temperature was positive and monotonic. Higher temperatures generally facilitate ozone production; therefore, this relationship fit our expectation for an urban site in a single season. However, the positive relationship between Ox sensor voltage and temperature did not always hold in the deployment sites. In Fig. 5a–c, a high-temperature–low-ozone regime in Malawi (regions denoted by a “+” marker) that was not present in the NC data can be seen. Further, for all three Malawi sites, the minimum Ox sensor voltages were lower (−10< Vmin<0) than minima in the NC colocation.

3.3.2 Diurnal trends

Since the deployment sites did not have reference data for quantitative comparison, we calculated and compared the annual mean diurnal trends of each pollutant at each site, as predicted by the five models, to qualitatively assess the transferability of the calibration models to Malawi. Our definition of a transferable model required that it produce (a) non-negative concentration values and (b) diurnal trends consistent with our first-hand observations of nearby emission sources and their timing, previous observations of diurnal trends in regions with widespread biomass cookstove use (Dionisio et al., 2010; McFarlane et al., 2021; Subramanian et al., 2020), and knowledge of atmospheric chemistry. Non-physical predictions from a given model may indicate that differences between the colocation and deployment environments were too large to extrapolate, and therefore any deployment results calibrated by that model are likely not reliable. Alternatively, coherency among the concentration values and trends estimated by the models may suggest that the deployment results are robust against variation in the modeling approaches. This analysis can contribute to our confidence in the estimated concentration values and trends but cannot address or estimate the quantitative error. Diurnal trends in Fig. 6 suggest the kNN hybrid model was the most transferable for interpreting deployment data for all gas sensors. However, both the kNN and RF hybrid models predicted similar trends and values for most sensors. The MLR and HDMR models also predicted similar trends but sometimes predicted negative values.
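The two transferability criteria can be screened numerically: the share of negative calibrated values, and the hour of day at which the mean diurnal profile peaks. This is a minimal sketch with hypothetical function names and data; it does not reproduce the analysis behind Fig. 6.

```python
from collections import defaultdict

def diurnal_mean(hours, values):
    """Mean value for each hour of day present in the data."""
    buckets = defaultdict(list)
    for h, v in zip(hours, values):
        buckets[h].append(v)
    return {h: sum(vs) / len(vs) for h, vs in sorted(buckets.items())}

def fraction_negative(values):
    """Share of calibrated values that are non-physical (below zero)."""
    return sum(1 for v in values if v < 0) / len(values)

# Hypothetical calibrated CO values (ppb) with their local hour of day.
hours = [6, 6, 12, 12, 18, 18]
co_ppb = [300.0, 500.0, 100.0, 140.0, 250.0, 350.0]

profile = diurnal_mean(hours, co_ppb)
peak_hour = max(profile, key=profile.get)

print(profile)
print(peak_hour)                  # compare with known cooking times
print(fraction_negative(co_ppb))  # criterion (a): should be near zero
```

Comparing `peak_hour` across models for the same sensor gives a rough check of the coherency discussed above.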

Figure 6. Diurnal trends of calibrated gas measurements (rows) at each site (columns) in the three deployment environments. RF hybrid stands for random forest hybrid, kNN hybrid stands for k-nearest neighbors hybrid, HDMR stands for high-dimensional model representation, MLR stands for multilinear regression, and QR stands for quadratic regression (the QR model was built for and applied to CO data only). The thick line indicates hourly mean, and the shaded region indicates interquartile range. Midnight is the zero hour. The hours are in local time.


Calibrated CO data showed the highest coherency across model predictions and were rarely non-physical (Fig. 6). All models predicted similar diurnal trends specific to each site. Knowledge of the nearby emission sources and activity patterns lends support to the calibrated CO data. For example, the village monitors were adjacent to widespread household biomass cookstove activity, coincident with the concentration peaks seen in the diurnal data. This diurnal cooking pattern was observed in both CO and OPC-N2 data (Figs. 6 and 7, respectively) at both village sites and was measured in complementary emissions monitoring work. Further, ARI014 was in a more densely populated village than ARI013, contributing to higher CO peaks (Fig. 1c). The QR model overestimated CO peaks compared to other models for the Village 1 data, likely because the model training set did not include high concentration data (Fig. 5e) and the quadratic term was not well constrained. Despite the calibrated CO measurements in Malawi being higher than the concentrations experienced in NC, particularly for ARI014 in Village 1, we expect that the calibrated CO measurements from Malawi are credible (excluding the QR model). We provide the following reasons for justification: (a) the manufacturers report that the sensor response is expected to be linear up to 500 ppm (Alphasense, Ltd., 2019), (b) RH/T interference induced on the CO-B4 sensor, approximately 0.2 mV ppb−1 (Lewis et al., 2016), has less relative influence on overall sensor readings in the higher voltage (i.e., concentration) regime, (c) all modeling approaches (other than QR) predicted highly similar diurnal trends and concentration values, and (d) there were known CO emission sources, with diurnal usage patterns matching the observed trends, near the monitoring sites. 
This suggests, for this specific sensor under these conditions, that these modeling approaches (other than QR) could reliably extrapolate beyond the training data limits to provide reasonable measurements in the deployment environment.

The calibrated NOx data showed less coherency than the CO data. NO2 trends were similar across the sites, and concentrations were rarely negative, but calibrated NO trends varied across models and the lower-performing models (HDMR and MLR) often predicted negative values. The better models identified in the NC colocation, kNN and RF hybrid, suggested that mean ambient NOx levels in Malawi were low (<15 ppb). We have lower confidence in the calibrated NOx measurements in Malawi for the following reasons: (a) the calibrated observations (5 to 20 ppb) were on the same order as the noise level reported on the sensor specification sheets (15 ppb), and (b) the model predictions lacked coherency. Low ambient NOx levels and a lack of representative data in the NC colocation data likely contributed to the non-physical concentrations predicted by some models in Malawi.

The calibrated Ox sensors performed the best during colocation testing compared to the other gas sensors, but in Malawi the calibration models frequently returned non-physical values and showed inconsistent annual diurnal trends between the models and across the sites. For ARI014 and ARI015, most O3 trends were consistent in shape and magnitude and were aligned with the expected diurnal trend (i.e., peaking at midday). Peaks in the mean concentration were between 10 and 30 ppb, plateauing from 10:00 to 15:00 LT. The RF hybrid model at the ARI015 University site estimated the O3 peak to occur earlier in the day compared to the other models and sites. This may be the result of a spurious relationship between Ox voltage and DP in the colocation data set on which the RF Hybrid model was trained, which held at the Village sites but not at the University site. At the Village 2 site (ARI013), there was a change in raw differential voltage response after December 2017 that caused all Ox models to fail for the second half of the deployment. All models either consistently predicted negative values, values <1 ppb, or failed to reproduce the expected diurnal trend (i.e., peaking around 09:00 LT rather than 12:00 LT). Only Ox data collected before December 2017 resulted in reasonable calibrated values and trends (Fig. S18). Notably, Ox data collected after December 2017 corresponded with the high-temperature–low-ozone regime (Fig. S19) shown in Fig. 5a–c. Despite the Ox differential voltage data spanning a similar range in both NC and Malawi, there was little overlap in the ozone dimension at comparable concentration, RH, and T conditions. Since ozone is a secondary pollutant driven by complex atmospheric processes and multiple precursors, the ambient conditions that increase or decrease ozone formation in one region may not hold in another environment. 
Although the calibrated Ox sensors performed better than the other gas sensors in NC, the models were tuned for a set of conditions that did not hold in Malawi. This suggests that for these Ox sensors and these modeling approaches, a lack of environmentally similar colocation data compromised our ability to reliably interpret calibrated O3 measurements in this specific deployment environment.

3.4 OPC-N2 performance during deployment

To evaluate the long-term performance of the OPC-N2 during deployment in Malawi, we examined the representativeness of the colocation conditions for the full year of conditions experienced during deployment. Figures S20–S21 show normalized histograms of the T, RH, and PM2.5 mass concentration observed during the colocation and the full-year deployment in Malawi, suggesting the two data sets spanned a similar range of environmental conditions. However, the colocation occurred during the cool, dry season, and RH minima and maxima (regimes associated with deficient performance during colocation; see Sect. 3.2) were more extreme during the 1-year deployment in Malawi.

Figure 7 shows the annual diurnal trend of the mean PM2.5 mass concentration, with 1-sigma prediction intervals, using 1 h averaged, RH-corrected data from each deployment location. Peak PM2.5 concentrations were observed around 06:00 LT at all sites, when morning biomass cookstove activity coincided with high RH (and more atmospherically stable) conditions. Figure 6 shows that the diurnal trends of ambient CO (another pollutant emitted by biomass burning) were similar to the PM2.5 diurnal trends at each site. Again, the largest peaks were observed at the more densely populated ARI014 Village 1 site. The prediction intervals were widest between 05:00 and 07:00 LT, indicating overall low confidence in OPC-N2 measurements during this period. Afternoon and overnight means, coinciding with drier conditions, were similar across all three sites, and prediction intervals were narrowest during afternoons. Data from the more remote locations (ARI013 and ARI015) suggest background concentrations of PM2.5 in rural Malawi were low (5 to 15 µg m−3), but the OPC-N2 could not reliably quantify peak concentrations that were high and variable, depending on the timing and presence of nearby sources and changes in ambient meteorology (especially RH). Despite this, qualitative data from the OPC-N2 sensors were sufficient to identify nearby source activity and indicate periods when ambient concentrations were likely high enough to be harmful to human health (and at least partially driven by cooking activities associated with higher exposure concentrations).

Figure 7. Diurnal trends of the integrated mean PM2.5 mass concentration measured by the OPC-N2 in each ARISense at each deployment site (left axis) and the annual relative humidity at the Village 2 site (right axis). Error bars represent the calculated 1σ (68 %) prediction interval of the hourly mean value. The red text annotation indicates the upper limit of the Village 1 prediction interval at 06:00 LT (beyond the range of the y axis shown). For RH data, the thick line indicates hourly mean, and the shaded region indicates interquartile range.


3.5 Comparison of ARISense CO to remote sensing and reanalysis data

Given the absence of additional in situ surface data, we rely on satellites and models to estimate surface air quality for comparison of our results. To contribute to the literature on surface-to-satellite comparisons over Africa, we compared calibrated ARISense CO observations to a satellite observation (MOPITT) and a model estimate (MERRA-2) in our study region. We confirmed that all three data sets reported similar annual qualitative trends, although they disagreed in magnitude. This analysis was limited to CO, given that the calibrated CO observations were the most dependable of the ARISense gas data and that NASA remote sensing data products were more readily available for CO compared to O3 or NOx.

Figure 8 shows the mean monthly CO from the University (ARI015) and Village Mean (average of ARI013 and ARI014) sites compared to that from two area-averaged remote sensing products: CO surface mixing ratio from MOPITT and CO surface concentration from MERRA-2. All three data sets were compared from July 2017 to July 2018, focusing on differences between the peak agricultural burning (September to October) and non-burning (December to July) seasons. November and August were excluded from either description (peak burning or non-burning) for the following reasons: (a) a review of fire studies in the region consistently reported September and October as the dominant months of the burning season (Nieman et al., 2021), (b) August and November mark the beginning and end of the fire season, respectively, and therefore cannot be considered non-burning months, (c) the exclusion of August and November better captures strong seasonal differences, providing a measurable benchmark to compare the satellite and surface data, and (d) ARISense data for the Village sites were unavailable for November 2017 (see Sect. 3.7 for more on the difficulties of deployment). The MERRA-2 data set was complete for the full year of interest, but MOPITT was missing data for the Village Mean region in February and March 2018. The NASA data sets were more similar to one another at the Village Mean site compared to the University site. At both sites, MOPITT reported higher CO concentrations than MERRA-2, especially in the peak burning season.

Figure 8. Monthly carbon monoxide (CO) concentration (ppb) reported by the surface ARISense (Tukey boxplots) and remote sensing data products (lines and markers indicating mean monthly value) at the (a) Village Mean and (b) University sites. The tops and bottoms of the boxes indicate 75th and 25th percentiles, whiskers show the 9th and 91st percentiles, the middle line indicates the median, and stars indicate mean. The ARISense surface data were at least 80 % complete for each month, except where noted with a percentage text label. Data for July 2017 and July 2018 were averaged. Village Mean represents the average of ARI014 (Village 1) and ARI013 (Village 2) data. The annual mean from each data source is given on the right axis. MOPITT (multispectral CO surface mixing ratio daytime / descending) is a satellite measurement, and MERRA-2 (CO surface concentration ENSEMBLE) is a global reanalysis product.


All three data sets (MOPITT, MERRA-2, and ARISense) indicated that annual mean CO concentrations were slightly higher overall at the University site than at the Village site, although this was less pronounced in MERRA-2. Similarly, all three data sets showed increased ambient concentrations during the peak burning season compared to the non-burning season at both sites. For ARISense, MOPITT, and MERRA-2 observations, respectively, peak season means were larger than non-burning season means by 160, 130, 60 ppb (Village Mean) and 190, 115, 50 ppb (University). Although the ARISense indicated larger absolute differences between seasons, the relative increase at both sites was only about 50 % of the non-burning season mean, while MOPITT and MERRA-2 reported increases of 125 % and 75 %, respectively. This could be explained by ARISense proximity to small-scale combustion activity not resolved by satellite imaging. Satellite-based observations approximate ambient background concentrations, which increased during the peak season due to regional agricultural burning. Meanwhile, the ARISense were exposed to ambient background concentrations as well as nearby biomass cookstove emissions, which presumably remained consistent throughout the year, showing a lower relative seasonal increase during the peak burning season. Quantitative disagreement between surface and remote CO observations was highest during the burning season, especially at the University site (Fig. 8). Remote sensing data suggested higher CO concentrations at the University compared to the Village Mean during non-burning periods, but during the peak burning season this difference shrank and similar concentrations were observed across both sites. Conversely, differences between ARISense observations grew by about 6 % during the peak season. MERRA-2 and MOPITT concentrations were highest in September, consistent with ARISense data at the University site but not the Village Mean site, which peaked in October. 
However, 90 % of the October CO data were missing for the Village site.
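The contrast between absolute and relative seasonal increases can be illustrated with a minimal sketch. The seasonal means below are assumptions chosen only so that the differences match the 160 and 130 ppb reported for the Village Mean site; the seasonal means themselves are not stated in this section, and the function name is hypothetical.

```python
def seasonal_increase(peak_mean_ppb, non_burning_mean_ppb):
    """Return (absolute increase in ppb, relative increase in %)."""
    absolute = peak_mean_ppb - non_burning_mean_ppb
    relative = 100.0 * absolute / non_burning_mean_ppb
    return absolute, relative

# Assumed (illustrative) seasonal means in ppb, not values from the study.
# A surface monitor near local sources has a high year-round baseline.
surface = seasonal_increase(peak_mean_ppb=480.0, non_burning_mean_ppb=320.0)
# A satellite product approximates a lower regional background.
satellite = seasonal_increase(peak_mean_ppb=234.0, non_burning_mean_ppb=104.0)

print(f"surface:   +{surface[0]:.0f} ppb ({surface[1]:.0f} % of baseline)")
print(f"satellite: +{satellite[0]:.0f} ppb ({satellite[1]:.0f} % of baseline)")
```

Under these assumed baselines, the larger absolute increase at the surface corresponds to the smaller relative increase, because the same regional enhancement is divided by a baseline that already includes local combustion emissions.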

Monthly mean CO ARISense values were 2 to 4 times higher than those reported by MOPITT and MERRA-2. We found differences of 175 % to 200 % between the annual mean CO concentration from ARISense and MOPITT, depending on the site, and even larger differences (up to 360 %) with MERRA-2. Differences between MOPITT and MERRA-2 were smaller (30 % to 35 %). There are few comparable studies available to explain these differences, which are greater than previously reported in the literature available for SSA. One study in South Africa reported relative differences of ± 40 % between ground-based CO measurements and Aura satellite observations at Cape Point station (Toihir et al., 2015). Many studies found good agreement (within 10 %–20 % bias) between ground measurements and MOPITT observations, but this was for total column CO, and the observations were not limited to comparisons over Africa (Buchholz et al., 2017; Emmons et al., 2009, 2004; Yurganov et al., 2008, 2010). However, these studies found negative satellite bias when intense biomass plumes affected observations, when CO levels were low in the Southern Hemisphere, or when atmospheric CO levels changed rapidly (Buchholz et al., 2017; Emmons et al., 2004; Yurganov et al., 2008, 2010). Each of these conditions could be expected to occur in the southern African troposphere, potentially explaining differences observed between the ARISense and remote sensing observations in this study.

This comparison of low-cost sensor surface data, satellite observations, and model estimates in Malawi suggests each of these resources can give consistent information on qualitative, long-term trends in a region lacking ground-based reference monitoring. However, because of inherent differences in spatial and temporal resolution, each observation will disagree in magnitude. Satellite retrievals and real-time surface measurements do not result in directly comparable quantities. Satellite data are collected as a once-daily flyover observation, averaged over an approximately 12 000 km2 area (corresponding to 1° spatial resolution). In contrast, the ARISense data were 1 min resolution, fixed-site, long-term point measurements at the surface. Further, the ARISense data were collected near visually identified biomass emission sources and were not representative of background conditions. Meanwhile, the satellite observations provide an estimate of regional background conditions. Despite these differences, the MOPITT, MERRA-2, and ARISense data sets agreed on the long-term seasonal trends present in this region and even corroborated site-to-site differences (e.g., higher mean CO at University compared to Village Mean site). These findings suggest the ARISense captured synoptic-scale variation in CO, but comparison to remote sensing data does not allow for a quantitative assessment of data collected at higher temporal resolutions.
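As a rough check on the satellite footprint quoted above, the following sketch estimates the area of a grid cell, assuming the footprint is a 1° × 1° latitude-longitude cell on a spherical Earth. The latitude of 16° S is an assumed approximate value for southern Malawi, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def grid_cell_area_km2(lat_deg, cell_deg=1.0):
    """Approximate area of a cell_deg x cell_deg cell centered at lat_deg."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0  # about 111 km per degree
    north_south_km = cell_deg * km_per_deg
    # East-west extent shrinks with the cosine of latitude.
    east_west_km = cell_deg * km_per_deg * math.cos(math.radians(lat_deg))
    return north_south_km * east_west_km

area = grid_cell_area_km2(lat_deg=-16.0)  # assumed latitude, not from the text
print(f"Approximate cell area: {area:.0f} km2")
```

The result is on the order of 12 000 km2, which illustrates why a single satellite value cannot resolve emissions from individual households near a point monitor.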

3.6 Comparison to other ambient measurements in SSA

The annual median (July 2017 to July 2018) surface concentrations in Malawi estimated by the ARISense sensors were 9 to 11 ppb for NOx, 4 to 15 ppb for O3, and 240 to 330 ppb for CO, depending on the site. Surface concentrations and diurnal trends of ARISense CO and PM in Malawi were comparable to those reported by studies in Kenya, Rwanda, Ethiopia, Uganda, and South Africa (Delmas et al., 1999; DeWitt et al., 2019; Laakso et al., 2008; McFarlane et al., 2021; Nthusi, 2017; Scheel et al., 1998; Subramanian et al., 2020; Toihir et al., 2015). However, comparison of O3 concentrations suggested the calibrated ARISense observations underestimated actual concentrations. ARISense NOx observations were similar to those of two other studies (Delmas et al., 1999; Laakso et al., 2008), but overall there were few comparable data available to assess NOx concentrations in Africa.

ARISense CO observations were similar to regional CO concentrations in central Africa (measured by aircraft), found to be in the range of 250–400 ppb (Delmas et al., 1999). A long-term ambient study at the Rwanda Climate Observatory found a mean CO concentration of 215 ppb from May 2015 to January 2017 (DeWitt et al., 2019), only slightly lower than our findings in Malawi. Another LCS study in Kigali, Rwanda, observed a range in ambient CO concentrations, from 225 to 500 ppb at their rural and urban sites (Subramanian et al., 2020), spanning the concentration range we observed at our rural and semi-urban sites in Malawi.

Both studies in Rwanda found mean ambient O3 concentrations of 30 to 40 ppb (DeWitt et al., 2019; Subramanian et al., 2020). For a “relatively clean background site located in dry savannah in South Africa the annual median (July 2006 to July 2007) trace gas concentrations were equal to 1.4 ppb for NOx, 36 ppb for O3 and 105 ppb for CO” (Laakso et al., 2008). Background levels of NOx and CO at this site were lower than the ARISense annual means, yet background O3 was in line with the Rwandan studies. This suggests regional ozone concentrations in central and southern Africa are presently about 30–40 ppb. The annual mean ARISense O3 values were up to a factor of 10 lower; however, we identified quality-assurance issues in the calibrated O3 values, particularly for the second half of the deployment data. Therefore, the ARISense data are likely to be an underestimate of the true ambient values.

The relatively clean background site in South Africa (Laakso et al., 2008) had NOx concentrations up to a factor of 10 lower (1.4 ppb) than ARISense measurements in Malawi, but aerial measurements made during intense savanna fire activity in central Africa found NOy present in the range of 4–10 ppb (Delmas et al., 1999). Together, these studies suggest that the ARISense NOx concentrations (9–11 ppb) may be reasonable for our non-background, biomass-emission-influenced sites in Malawi.

Conversely, the annual median PM1, PM2.5, and PM10 concentrations (9.0, 10.5 and 18.8 µg m−3, respectively) at the background site in South Africa (Laakso et al., 2008) were comparable to ARISense observations in Malawi. The annual median ARISense RH-corrected PM1, PM2.5 and PM10 concentrations were between 4 and 7, 6 and 10, and 13 and 20 µg m−3, respectively, across all three sites. It is possible that actual concentrations of fine PM were higher at the sites in Malawi, considering that concentrations of gaseous emission tracer species (i.e., CO, NOx) were higher compared to regional background levels found by other studies. However, given the high minimum cut-off diameter of the OPC-N2, this particle sensor would have been unable to detect ultrafine particles emitted from biomass burning. Average ambient PM2.5 concentrations (measured with an Alphasense OPC-N2) were found to be between 11 and 24 µg m−3 at various sites in Kenya, with higher pollution episode concentrations ranging from 35 to 51 µg m−3 (Nthusi, 2017). Median ARISense PM2.5 concentrations were also comparable to US embassy measurements in Ethiopia and Uganda (DeWitt et al., 2019). Taken together, these comparisons suggest PM levels in rural Malawi are comparable to regional measurements made across SSA, but localized impacts from biomass cookstoves can result in higher concentrations of fine PM, which are difficult to accurately quantify with the OPC-N2. In all, although these comparisons are not a substitute for quantitative evaluation of the ARISense in Malawi, they provide a benchmark for comparison and suggest that the CO, NOx, and PM ARISense observations are reasonable for this region. At the same time, they cement our conclusion that ARISense O3 observations are likely erroneous for this environment.

3.7 Performance of ARISense sensor packages over time

Total data recovery for the 1-year deployment varied by site, season, and sensor, with rates ranging from 30 % to 80 % (Fig. S22). Average recovery for the 1-year deployment was around 60 %, with the highest recovery at the University site (80 %) and lowest at the Village 1 site (40 %). Data across all sites had the highest completeness (>70 %) in the cool and dry (June–July–August) and the cool and wet season (March–April–May). Data losses were mostly explained by power outages, software failures, and sensor equilibration times required after a power outage (Fig. S23). Power outages were common in the warm and wet season (December–January–February) due to insufficient solar intensity resulting from extended periods of heavy cloud cover. At the ARI014 site, insufficient power led to an unanticipated diurnal cycle wherein the monitor would shut off in the early morning hours and require a few hours of solar power before turning on again. This daily cycle, coupled with the 8 h long NO sensor re-equilibration time, led to almost 0 % NO data recovery in the second half of the deployment for Village 1. In all, nearly 50 % of data losses at the ARI014 site were due to insufficient power or failure to write data to file. Corrupt USB storage devices, which we were slow to replace due to ongoing civil unrest (The Guardian, 2017), resulted in significant data losses in the hot and dry season (September–October–November) at the two Village sites. Individual sensor failure was rare, but 2 months of ARI014 Ox data were lost to electrochemical sensor drift and one OPC-N2 (ARI013) failed in the last 3 months of deployment due to an insect nest clogging the OPC-N2 inlet. In all, we recorded 6992 h of data at the University site (ARI015), 5860 h at Village 2 (ARI013), and 4720 h at Village 1 (ARI014). 
Future deployments should include insect screens over all sensor inlets and improved battery storage and power systems that run at a longer duty cycle in the case of insufficient solar (e.g., power on only once battery is fully charged) to minimize the impact of sensor equilibration times on data recovery.
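The interaction between a daily power cycle and sensor re-equilibration can be sketched as follows. This is a simplified illustration, assuming at most one shutdown per day and no valid data during re-equilibration; the powered hours per day are hypothetical and were not measured in this form, and the function name is an assumption.

```python
def valid_data_fraction(powered_hours_per_day, equilibration_hours):
    """Fraction of each day with usable data for one sensor.

    Assumes at most one shutdown per day and that no valid data are
    produced while the sensor re-equilibrates after power returns.
    """
    if powered_hours_per_day >= 24.0:
        return 1.0  # no daily shutdown, so no re-equilibration penalty
    usable_hours = max(0.0, powered_hours_per_day - equilibration_hours)
    return usable_hours / 24.0

NO_EQUILIBRATION_HOURS = 8.0  # NO sensor re-equilibration time stated in the text

# Hypothetical powered hours per day; not measured values from the deployment.
for powered in (24.0, 16.0, 10.0, 8.0):
    fraction = valid_data_fraction(powered, NO_EQUILIBRATION_HOURS)
    print(f"{powered:4.0f} h powered -> {100 * fraction:.0f} % usable NO data")
```

Under these assumptions, a monitor that is powered for 8 h or less per day yields no usable NO data at all, which is consistent with the near-zero NO recovery described for Village 1 and with the suggestion to power on only once the battery can sustain a long run.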

Since the monitors were deployed to their sites for >1 year, there was observation overlap in seasonally similar data collected 1 year apart. To gain insight into sensor stability, we compared the data collected in the first month (July 2017) to the final month (July 2018) of the deployment, given that ambient environmental conditions were similar in July of both years (additional details in Sect. 11 of the Supplement). It is not possible to know if the range of gas concentrations was significantly different between July 2017 and July 2018. We explored this analysis on the assumption that inter-annual variability in ambient concentrations was minimal. Bivariate distributions of the raw differential voltage readings from July 2017 and July 2018 showed that the most frequent observations (i.e., heaviest-shaded regions) were approximately the same in both years (Fig. S25). Observable differences in the voltage measurements could be partially explained by known environmental differences. For example, the Ox sensor voltages in July 2018 were lower on average than in 2017, but this was consistent with lower temperatures and higher RH in 2018 compared to 2017. However, there was potential evidence of slightly reduced or altered responses in individual sensors, particularly the NO sensors in ARI013 and ARI015 and the CO sensors in ARI013 and ARI014. For these sensors, the 2018 distributions had less spread than the 2017 distributions, suggesting either less variation in ambient concentrations in 2018 or decreased sensitivity in the sensors. Diurnal plots from both years showed that the raw mean voltages and trends were consistent (Fig. S26). However, again the most noticeable differences were in the individual CO and NO sensors identified from the bivariate distributions. For example, the CO peaks measured at mealtimes by ARI013 and ARI014 were about 50 mV lower in 2018 than 2017.
These differences could be explained by lower concentrations in 2018 than 2017, changes in the raw sensor response over the 1-year period, or both. Without reference equipment, we were unable to investigate sensor drift and decay more rigorously. This qualitative analysis suggests individual sensor responses were altered during the 1-year deployment, but there was no unambiguous evidence for systematic deterioration within or across the electrochemical sensor groups used in the ARISense.
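A year-to-year comparison of diurnal profiles of this kind could be sketched as below. This is not the authors' analysis code; the function names and the voltage values are hypothetical, and the sketch only shows how hourly means from two Julys might be differenced.

```python
from collections import defaultdict
from statistics import mean

def diurnal_profile(records):
    """Mean value per hour of day from (hour_of_day, value) pairs."""
    by_hour = defaultdict(list)
    for hour, value in records:
        by_hour[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}

def profile_difference(profile_a, profile_b):
    """Hourly difference (b minus a) for hours present in both profiles."""
    return {h: profile_b[h] - profile_a[h] for h in profile_a if h in profile_b}

# Synthetic differential voltages in mV (illustrative only, not study data).
july_2017 = [(6, 300.0), (6, 320.0), (12, 200.0)]
july_2018 = [(6, 260.0), (6, 260.0), (12, 200.0)]

change = profile_difference(diurnal_profile(july_2017), diurnal_profile(july_2018))
print(change)
```

A negative difference at a mealtime hour, as in this synthetic case, cannot by itself distinguish lower ambient concentrations from reduced sensor sensitivity, which is why the text treats the comparison as qualitative.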

In general, the calibrated observations followed the trends identified from the raw sensor voltage readings. Calibrated CO data trends were consistent for both years, with the models responding as expected to the lower voltage readings in 2018 compared to 2017. For ARI013 and ARI014, the calibrated CO peaks at mealtimes were accordingly lower by about 100 ppb in 2018 (Fig. S27). However, although the raw Ox sensor trends in 2018 and 2017 were consistent for all the ARISense (Fig. S26), the kNN hybrid model calibrated O3 data were highly irregular between the 2 years (Fig. S27). For example, the calibrated O3 data for July 2017 showed the expected diurnal pattern (concentration increasing with solar intensity) with plateaus between 15 and 40 ppb depending on the site. However, in July 2018, although the raw Ox diurnal data looked similar to 2017, the calibrated data for ARI013 and ARI015 showed midday values between 0 and 5 ppb, and the diurnal trend for ARI013 showed a flat line (i.e., not correlated with solar activity). This finding, that raw Ox sensor voltages were similar year to year while the calibrated O3 values were not, provides further evidence that the lack of comparable T, RH, and ozone colocation data contributed to the non-physical O3 trends observed during the second half of the deployment at the ARI013 and ARI015 sites.

Before their return to NC, ARI013 and ARI014 were used for high-concentration emissions monitoring experiments after the 1-year ambient monitoring campaign was completed (Table 2). The reference monitor data from the post-deployment colocation in NC (August 2018 to May 2019) were intended to enable investigation of changes in ARI013 and ARI014 raw sensor response and model performance. However, the resulting data instead demonstrated that the sensors had been severely degraded during the high-concentration exposures. In the post-colocation data, the raw differential voltage gas sensor responses in ARI013 and ARI014 were well correlated with each other (R2=0.7 to 0.9) (excluding the ARI013 Ox sensor which was clearly degraded; see Fig. S28) but less correlated than during the pre-colocation comparison (R2=0.9 to 0.99). To facilitate comparison with the pre-colocation performance metrics shown in Fig. 2 and Tables S4–S6, the performance metrics for the post-deployment colocation are given in Tables S11 and S12. Despite showing inter-sensor consistency, the raw differential sensor voltages (other than CO) made by ARI013 and ARI014 were poorly correlated with reference measurements (Figs. S29–S30). Inspection of the time series showed that the ARISense NO sensors tracked some spikes in the time-aligned NO reference data, but the NO2 and Ox sensors did not track reference data trends (Figs. S31–S32). The time series of the differential voltage and temperature data suggest the gas sensors in ARI013 and ARI014 were responding similarly to changes in T and RH, but they were no longer sensitive to changes in the target gas (Fig. S31). This may explain why the sensors in ARI013 and ARI014 were still well correlated with each other and not correlated with reference measurements. The calibrated CO data were the only data still roughly correlated with CO reference measurements, although the calibrated CO data showed aberrant features (Figs. S33–S34). 
These ambient sensors (except for the CO sensor) were likely affected by high concentrations of PM and volatile gases (e.g., hydrocarbons, formaldehyde) co-emitted during the biomass burning experiments. Exceedingly high concentrations of emissions can chemically degrade or contaminate the sensors; for example, the catalyst or electrolyte can be affected or depleted by repeated interactions with high concentrations of non-target species emissions. Further, if there were high concentrations of fine semivolatile PM permeating the inlet and flow line, it could condense and block or attenuate the sample flow rate. The Ox, NO, NO2 sensors were permanently altered by the biomass burning emission experiments in Malawi, leading to poor performance during post-deployment colocations with reference instruments in NC. Given these dramatic changes in sensor responses, the models were unable to generate reasonable concentration values from sensor signals, and consequently we were unable to use the post-deployment colocation data set to quantitatively assess long-term model performance. The partial exception to this was for the kNN hybrid calibrated CO data, which were correlated with the reference data (R2=0.5), suggesting that the CO sensors might retain some function after additional colocation and recalibration.
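The pattern described above, in which two degraded sensors remain correlated with each other but not with the reference, can be illustrated with a small sketch. Here R2 is taken as the squared Pearson correlation, which is an assumption about the metric; all series are synthetic, with both sensor signals constructed to depend only on temperature.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Synthetic data (illustrative only): both sensors respond to temperature alone.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0, 25.0, 20.0, 15.0]
sensor_a = [2.0 * t + 1.0 for t in temperature]
sensor_b = [1.5 * t + 3.0 for t in temperature]
# Reference gas concentration varies independently of temperature.
reference = [5.0, 1.0, 4.0, 2.0, 5.0, 3.0, 1.0, 4.0]

print(f"sensor A vs sensor B:  R2 = {r_squared(sensor_a, sensor_b):.2f}")
print(f"sensor A vs reference: R2 = {r_squared(sensor_a, reference):.2f}")
```

In this synthetic case the inter-sensor agreement is high while agreement with the reference is negligible, showing that inter-sensor consistency alone does not demonstrate sensitivity to the target gas.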

4 Conclusions

Our experience showed that LCS networks are a viable method to collect novel surface AQ data in regions without reference equipment, but this approach requires strict data quality procedures to ensure the conclusions drawn from the resulting data are valid. Performance assessment in NC suggested the calibrated ARISense sensor packages (excluding the NO2 sensor) would be suitable for supplemental air monitoring based on U.S. EPA metrics and target values. However, performance during the pre-deployment NC assessment did not reflect performance in Malawi. For this deployment site, we found that detailed information about nearby sources and their diurnal emission patterns, ambient meteorological data, and a familiarity with air pollutant behavior were helpful when qualitatively assessing LCS performance in a region where quantitative assessment was not an option. A lack of coherency in diurnal trends between calibration model predictions and frequent non-physical concentration values (Fig. 6) showed that LCS measurements made in deployment environments different from the colocation environment can be unreliable and may lead to biased information about the deployment environment. For example, although the Ox sensors showed the highest performance of all sensor types during colocation testing, and the measured RH, temperature, and Ox voltage ranges were similar in the colocation and deployment environments, the calibrated O3 data in Malawi were unreliable. The colocation data were collected in an urban area near a highway, and the deployment data were collected in a rural area heavily impacted by biomass burning emissions. The resulting difference in ozone precursor emissions could have contributed to the deficient performance of the calibration models in the deployment environment. We expect our experience in Malawi may generalize to other regions, suggesting that additional research is needed to address the issue of LCS calibration for secondary pollutants.

We found that the kNN hybrid modeling approach performed the best for NC and when applied to data collected in Malawi. However, the general lack of standardization in LCS calibration and assessment approaches complicated and extended the calibration process for our study. Although there have been advancements in calibration methods, the difficulty of identifying and applying a singular best calibration model remains a common issue among LCS users (Topalović et al., 2019; Lewis and Edwards, 2016; Giordano et al., 2021). From an end user perspective, the burden of calibration easily becomes overwhelming. There is presently no clear guidance on which model would be appropriate for which sensor under which circumstances. This limits the potential user base of LCS technologies, complicates our ability to generalize findings across different studies, and may even lead to inferior quality measurements. Given the wide range in potential LCS technologies and deployment conditions, it is not possible to fully generalize the viability and sensitivity of the ARISense to another LCS package deployed in a different area. Nonetheless, we surmise that LCSs are most useful when they are carefully selected and calibrated for a single purpose and location, for which the environmental and pollutant conditions are at least partially characterized.

This pilot deployment also provided lessons regarding the design and deployment of low-cost AQ monitoring systems for off-grid applications. The ARISense packages survived the 1-year deployment to Malawi and enabled collection of a large, novel data set; however, they suffered individual sensor failures and frequent power losses. Given that 20 % to 50 % of the deployment data were lost due to insufficient power and corrupt data storage systems, for future solar-powered deployment efforts we suggest that the power system be designed to allow for primary and secondary data recovery goals (i.e., a back-up plan to prioritize the most desirable data in the event of insufficient power). Further, we were frequently restricted in troubleshooting and repair operations by spotty cellular connection, limited human resources, and our inability to remotely locate and procure appropriate equipment. A repair kit with basic equipment (e.g., pre-programmed USB devices, alternate SIM cards, hand tools with attachments specific to each LCS) stored in a nearby, secure location would have allowed for quicker troubleshooting and repair. We suggest that in addition to solar power limitations, other potential confounding factors like extreme weather and limited technical capacity and assistance availability be considered before deployment to remote locations. We found that the more closely located the monitor was to a trained local assistant, the lower the overall data losses were.

The responses of the gas sensors were not remarkably different after 1 year of deployment (Figs. S26–S27), assuming actual concentrations did not vary significantly from 2017 to 2018. However, except for CO, repeated exposure to high-concentration biomass emissions completely degraded the sensors. Key manufacturer specifications indicated that the CO sensor was the most robust. The CO sensor exposure limit was 40 times higher than that of the Ox, NO, and NO2 sensors. Further, the maximum temperature and RH range for the CO sensor was 50 °C and 90 %, respectively, and only 40 °C and 85 % for the Ox, NO, and NO2 sensors. During deployment, the maximum ranges were occasionally exceeded for every sensor except CO. Operation beyond specified conditions, combined with ∼100 h of exposure to high-concentration gases during the post-deployment emissions monitoring experiments, damaged the three less robust sensors (NO, NO2, Ox) and made them unsuitable for future use. We caution end users to carefully select an appropriate sensor package given pilot information about the emission sources in their target site.

A growing body of literature highlights the potential value of LCS technologies for sub-Saharan Africa and other low-resource settings (Subramanian and Garland, 2021; Wernecke and Wright, 2021; Rahal, 2020; Sewor et al., 2021; Awokola et al., 2020). We found that our LCS surface observations were consistent with the only other available data sources in this region (remote sensing data and model products) and data from similar studies across SSA. This suggests LCSs have a key role to play in providing reliable information on general air quality conditions and trends in regions without a historical record. Advancements in machine learning techniques show how LCSs can be used for source identification and attribution in regions where little quantitative information currently exists on dominant emission sources (Hagan et al., 2019; Thorson et al., 2019). While LCSs in SSA show promise, many of the issues experienced in this study stemmed from a lack of in situ reference monitors. Additional reference-grade monitors throughout the region may help circumvent issues related to calibration modeling and quality assurance. A regional shared facility would enable periodic regionally representative colocations without requiring every country to establish its own regulatory network. Recent research has improved our ability to synthesize data from networks of LCS through computational calibration solutions that minimize the need to transport and colocate each individual monitor separately and increase the spatiotemporal resolution beyond that of reference networks (Buehler et al., 2021; Malings et al., 2019a; Kelly et al., 2021; Considine et al., 2021; Sahu et al., 2021). Concurrently, policy-focused researchers are helping to bridge the gap between governments and AQ scientists by creating comprehensive frameworks that provide systematic procedures to establish regulatory AQ monitoring networks in regions without them (Gulia et al., 2020; Pinder et al., 2019). 
In the meantime, we found support from local universities, which helped maintain the pilot deployment of this LCS network. We expect that any AQ program in SSA will benefit from building long-term, local capacity and knowledge transfer systems for training on-site staff and for receiving their feedback and guidance.

Code availability

The basic random forest hybrid and quadratic regression model code is available as a Supplement to Malings et al. (2019a) (Malings et al., 2018). The k-nearest neighbors hybrid, high-dimensional model representation, and multi-linear regression model code are proprietary products of QuantAQ, Inc.; contact David H. Hagan with inquiries.

Data availability

The data set used in this analysis is available as an open-access Dryad repository (Bittner et al., 2022). The repository hosts pre-processed ARISense and reference data sets from the pre-deployment and post-deployment colocations, pre-processed RH-corrected OPC-N2 and MicroPEM data sets from the Malawi colocation, and collated ARISense data sets from the 1-year deployment at each of the three monitoring sites in Malawi. Please contact the corresponding author regarding raw data inquiries.


The supplement related to this article is available online at:

Author contributions

APG was responsible for conceptualization and funding acquisition. APG, ESC, DHH, and ASB developed the methodology. EL, APG, and ASB executed the deployment experiments. ESC, DHH, and APG provided supervision. DHH and CM developed software. ASB, ESC, EL, and APG performed data analytics and visualization. ASB wrote the original draft. CM, DHH, EL, ESC, and APG participated in review and editing.

Competing interests

Eben Cross and David Hagan are the co-founders of QuantAQ, a for-profit company which marketed the ARISense (since discontinued) and is actively developing and marketing sensor-based instrumentation.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements


This work benefitted from contributions from Elliott Hall, who executed gravimetric filter analysis and contributed to computational analysis of the MicroPEM and OPC-N2 colocation data sets, and from Jillian McNaught through her contribution to the acquisition of the GIOVANNI data sets. Carl Malings would like to thank Naomi Zimmerman and the Carnegie Mellon University RAMPs team for their assistance in developing low-cost sensor calibration approaches. Ashley Bittner would like to thank Ky Tanner for contributing to gravimetric filter analysis, Wyatt M. Champion for his contribution to Fig. 1, Nathan Williams (Carnegie Mellon University) for logistical support with ARISense repair, and all members of the Grieshop Atmosphere and Environment Lab. For their assistance in coordinating the colocation periods in North Carolina, we would like to thank the North Carolina Department of Environmental Quality and the U.S. Environmental Protection Agency and all dedicated employees including Sue Kimbrough (U.S. EPA), Richard Snow (U.S. EPA), Kay Roberts (NC-DEQ), Timothy Skelding (NC-DEQ), Joette Steger (NC-DEQ), and Vitaly Karpusenko (NC-DEQ). Finally, we would like to thank all project principal investigators, including Pamela Jagger, Charles Jumbe, Thabbie Chilongo, Rob Bailis, Jason West, and Adrian Ghilardi; principal interpreter and field work assistant Twapa Ghambi; equipment assistants Dominic Raphael and Misheck Mtaya; and all study participants from the villages of Mikundi and Makaula in Mulanje, Malawi.

Financial support

This research has been supported by the National Science Foundation (grant no. 1617359), the U.S. Environmental Protection Agency (grant nos. 83628601 and R836286), the Heinz Endowments (grant nos. E2375 and E3145), and the Agence Nationale de la Recherche (grant no. ANR-18-MPGA-0011).

Review statement

This paper was edited by Cléo Quaresma Dias-Junior and reviewed by two anonymous referees.

References


Alphasense, Ltd.: CO-B4 Carbon Monoxide Sensor Technical Specification, Doc. Ref. COB4/JUL19 datasheet (last access: 2 June 2022), 2019.

Alphasense FAQs, last access: 11 October 2021.

Amegah, A. K.: Proliferation of low-cost sensors. What prospects for air pollution epidemiologic research in Sub-Saharan Africa?, Environ. Pollut., 241, 1132–1137, 2018.

Amegah, A. K. and Agyei-Mensah, S.: Urban air pollution in Sub-Saharan Africa: Time for action, Environ. Pollut., 220, 738–743, 2017.

Aung, T. W., Jain, G., Sethuraman, K., Baumgartner, J., Reynolds, C. C., Grieshop, A. P., Marshall, J. D., and Brauer, M.: Health and Climate-Relevant Pollutant Concentrations from a Carbon-Finance Approved Cookstove Intervention in Rural India, Environ. Sci. Technol., 50, 7228–7238, 2016.

Awokola, B. I., Okello, G., Mortimer, K. J., Jewell, C. P., Erhart, A., and Semple, S.: Measuring Air Quality for Advocacy in Africa (MA3): Feasibility and Practicality of Longitudinal Ambient PM2.5 Measurement Using Low-Cost Sensors, Int. J. Environ. Res. Public Health, 17, 7243, 2020.

Badura, M., Batog, P., Drzeniecka-Osiadacz, A., and Modzel, P.: Evaluation of Low-Cost Sensors for Ambient PM2.5 Monitoring, J. Sensors, 2018, 5096540, 2018.

Bean, J. K.: Evaluation methods for low-cost particulate matter sensors, Atmos. Meas. Tech., 14, 7369–7379, 2021.

Bittner, A., Cross, E. S., Hagan, D. H., Malings, C., Lipsky, E., and Grieshop, A.: Data accompanying: Performance characterization of low-cost air quality sensors for off-grid deployment in rural Malawi, Dryad [data set], 2022.

Box, G. E. P. and Cox, D. R.: An Analysis of Transformations, J. Roy. Stat. Soc. B Met., 26, 211–252, 1964. 

Buchholz, R. R., Deeter, M. N., Worden, H. M., Gille, J., Edwards, D. P., Hannigan, J. W., Jones, N. B., Paton-Walsh, C., Griffith, D. W. T., Smale, D., Robinson, J., Strong, K., Conway, S., Sussmann, R., Hase, F., Blumenstock, T., Mahieu, E., and Langerock, B.: Validation of MOPITT carbon monoxide using ground-based Fourier transform infrared spectrometer data from NDACC, Atmos. Meas. Tech., 10, 1927–1956, 2017.

Buehler, C., Xiong, F., Zamora, M. L., Skog, K. M., Kohrman-Glaser, J., Colton, S., McNamara, M., Ryan, K., Redlich, C., Bartos, M., Wong, B., Kerkez, B., Koehler, K., and Gentner, D. R.: Stationary and portable multipollutant monitors for high-spatiotemporal-resolution air quality studies including online calibration, Atmos. Meas. Tech., 14, 995–1013, 2021.

Bulot, F. M. J., Johnston, S. J., Basford, P. J., Easton, N. H. C., Apetroaie-Cristea, M., Foster, G. L., Morris, A. K. R., Cox, S. J., and Loxham, M.: Long-term field comparison of multiple low-cost particulate matter sensors in an outdoor urban environment, Sci. Rep-UK, 9, 7497, 2019.

Castell, N., Dauge, F. R., Schneider, P., Vogt, M., Lerner, U., Fishbain, B., Broday, D., and Bartonova, A.: Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates?, Environ. Int., 99, 293–302, 2017.

Chatzidiakou, L., Krause, A., Popoola, O. A. M., Di Antonio, A., Kellaway, M., Han, Y., Squires, F. A., Wang, T., Zhang, H., Wang, Q., Fan, Y., Chen, S., Hu, M., Quint, J. K., Barratt, B., Kelly, F. J., Zhu, T., and Jones, R. L.: Characterising low-cost sensors in highly portable platforms to quantify personal exposure in diverse environments, Atmos. Meas. Tech., 12, 4643–4657, 2019.

Considine, E. M., Reid, C. E., Ogletree, M. R., and Dye, T.: Improving accuracy of air pollution exposure measurements: Statistical correction of a municipal low-cost airborne particulate matter sensor network, Environ. Pollut., 268, 115833, 2021.

Crilley, L. R., Shaw, M., Pound, R., Kramer, L. J., Price, R., Young, S., Lewis, A. C., and Pope, F. D.: Evaluation of a low-cost optical particle counter (Alphasense OPC-N2) for ambient air monitoring, Atmos. Meas. Tech., 11, 709–720, 2018.

Cross, E. S., Williams, L. R., Lewis, D. K., Magoon, G. R., Onasch, T. B., Kaminsky, M. L., Worsnop, D. R., and Jayne, J. T.: Use of electrochemical sensors for measurement of air pollution: correcting interference response and validating measurements, Atmos. Meas. Tech., 10, 3575–3588, 2017.

Delmas, R. A., Druilhet, A., Cros, B., Durand, P., Delon, C., Lacaux, J. P., Brustet, J. M., Serça, D., Affre, C., Guenther, A., Greenberg, J., Baugh, W., Harley, P., Klinger, L., Ginoux, P., Brasseur, G., Zimmerman, P. R., Grégoire, J. M., Janodet, E., Tournier, A., Perros, P., Marion, Th., Gaudichet, A., Cachier, H., Ruellan, S., Masclet, P., Cautenet, S., Poulet, D., Biona, C. B., Nganga, D., Tathy, J. P., Minga, A., Loemba-Ndembi, J., and Ceccato, P.: Experiment for Regional Sources and Sinks of Oxidants (EXPRESSO): An overview, J. Geophys. Res., 104, 30609–30624, 1999.

DeWitt, H. L., Gasore, J., Rupakheti, M., Potter, K. E., Prinn, R. G., Ndikubwimana, J. D. D., Nkusi, J., and Safari, B.: Seasonal and diurnal variability in O3, black carbon, and CO measured at the Rwanda Climate Observatory, Atmos. Chem. Phys., 19, 2063–2078, 2019.

Di Antonio, A., Popoola, O. A. M., Ouyang, B., Saffell, J., and Jones, R. L.: Developing a Relative Humidity Correction for Low-Cost Sensors Measuring Ambient Particulate Matter, Sensors-Basel, 18, 2790, 2018.

Dionisio, K. L., Arku, R. E., Hughes, A. F., Vallarino, J., Carmichael, H., Spengler, J. D., Agyei-Mensah, S., and Ezzati, M.: Air Pollution in Accra Neighborhoods: Spatial, Socioeconomic, and Temporal Patterns, Environ. Sci. Technol., 44, 2270–2276, 2010.

Duvall, R., Clements, A., Hagler, G., Kamal, A., Vasu Kilar, Goodman, L., Frederick, S., Johnson Barkjohn K., VonWald, I., Greene, D., and Dye, T.: Performance Testing Protocols, Metrics, and Target Values for Fine Particulate Matter Air Sensors: Use in Ambient, Outdoor, Fixed Site, Non-Regulatory Supplemental and Informational Monitoring Applications, U.S. EPA Office of Research and Development, Washington, DC, EPA/600/R-20/280, 2021a. 

Duvall, R., Clements, A., Hagler, G., Kamal, A., Vasu Kilar, Goodman, L., Frederick, S., Johnson Barkjohn K., VonWald, I., Greene, D., and Dye, T.: Performance Testing Protocols, Metrics, and Target Values for Ozone Air Sensors: Use in Ambient, Outdoor, Fixed Site, Non-Regulatory and Informational Monitoring Applications, U.S. EPA Office of Research and Development, Washington, DC, EPA/600/R-20/279, 2021b. 

Du, Y., Wang, Q., Sun, Q., Zhang, T., Li, T., and Yan, B.: Assessment of PM2.5 monitoring using MicroPEM: A validation study in a city with elevated PM2.5 levels, Ecotox. Environ. Safe., 171, 518–522,, 2019. 

El-Nadry, M., Li, W., El-Askary, H., Awad, M. A., and Mostafa, A. R.: Urban Health Related Air Quality Indicators over the Middle East and North Africa Countries Using Multiple Satellites and AERONET Data, Remote Sens., 11, 2096,, 2019. 

Emmons, L. K., Deeter, M. N., Gille, J. C., Edwards, D. P., Attié, J.-L., Warner, J., Ziskin, D., Francis, G., Khattatov, B., Yudin, V., Lamarque, J.-F., Ho, S.-P., Mao, D., Chen, J. S., Drummond, J., Novelli, P., Sachse, G., Coffey, M. T., Hannigan, J. W., Gerbig, C., Kawakami, S., Kondo, Y., Takegawa, N., Schlager, H., Baehr, J., and Ziereis, H.: Validation of Measurements of Pollution in the Troposphere (MOPITT) CO retrievals with aircraft in situ profiles, J. Geophys. Res-Atmos., 109, D03309,, 2004. 

Emmons, L. K., Edwards, D. P., Deeter, M. N., Gille, J. C., Campos, T., Nédélec, P., Novelli, P., and Sachse, G.: Measurements of Pollution In The Troposphere (MOPITT) validation through 2006, Atmos. Chem. Phys., 9, 1795–1803,, 2009. 

Fullerton, D. G., Semple, S., Kalambo, F., Suseno, A., Malamba, R., Henderson, G., Ayres, J. G., and Gordon, S. B.: Biomass fuel use and indoor air pollution in homes in Malawi, Occup. Environ. Med., 66, 777–783,, 2009. 

Fullerton, D. G., Suseno, A., Semple, S., Kalambo, F., Malamba, R., White, S., Jack, S., Calverley, P. M., and Gordon, S. B.: Wood smoke exposure, poverty and impaired lung function in Malawian adults, Int. J. Tuberc. Lung D., 15, 391–398, 2011. 

Giordano, M. R., Malings, C., Pandis, S. N., Presto, A. A., McNeill, V. F., Westervelt, D. M., Beekmann, M., and Subramanian, R.: From low-cost sensors to high-quality data: A summary of challenges and best practices for effectively calibrating low-cost particulate matter mass sensors, J. Aerosol Sci., 158, 105833,, 2021. 

Gulia, S., Khanna, I., Shukla, K., and Khare, M.: Ambient air pollutant monitoring and analysis protocol for low and middle income countries: An element of comprehensive urban air quality management framework, Atmos. Environ., 222, 117120,, 2020. 

Hagan, D. H. and Kroll, J. H.: Assessing the accuracy of low-cost optical particle sensors using a physics-based approach, Atmos. Meas. Tech., 13, 6343–6355,, 2020. 

Hagan, D. H., Isaacman-VanWertz, G., Franklin, J. P., Wallace, L. M. M., Kocar, B. D., Heald, C. L., and Kroll, J. H.: Calibration and assessment of electrochemical air quality sensors by co-location with regulatory-grade instruments, Atmos. Meas. Tech., 11, 315–328,, 2018. 

Hagan, D. H., Gani, S., Bhandari, S., Patel, K., Habib, G., Apte, J. S., Hildebrandt Ruiz, L., and Kroll, J. H.: Inferring Aerosol Sources from Low-Cost Air Quality Sensor Measurements: A Case Study in Delhi, India, Environ. Sci. Technol. Lett., 6, 467–472,, 2019. 

Hersey, S. P., Garland, R. M., Crosbie, E., Shingler, T., Sorooshian, A., Piketh, S., and Burger, R.: An overview of regional and local characteristics of aerosols in South Africa using satellite, ground, and modeling data, Atmos. Chem. Phys., 15, 4259–4278,, 2015. 

Jary, H. R., Aston, S., Ho, A., Giorgi, E., Kalata, N., Nyirenda, M., Mallewa, J., Peterson, I., Gordon, S. B., and Mortimer, K.: Household air pollution, chronic respiratory disease and pneumonia in Malawian adults: A case-control study, Wellcome Open Res., 2, 103,, 2017. 

Kelly, K. E., Xing, W. W., Sayahi, T., Mitchell, L., Becnel, T., Gaillardon, P.-E., Meyer, M., and Whitaker, R. T.: Community-Based Measurements Reveal Unseen Differences during Air Pollution Episodes, Environ. Sci. Technol., 55, 120–128,, 2021. 

Laakso, L., Laakso, H., Aalto, P. P., Keronen, P., Petäjä, T., Nieminen, T., Pohja, T., Siivola, E., Kulmala, M., Kgabi, N., Molefe, M., Mabaso, D., Phalatse, D., Pienaar, K., and Kerminen, V.-M.: Basic characteristics of atmospheric particles, trace gases and meteorology in a relatively clean Southern African Savannah environment, Atmos. Chem. Phys., 8, 4823–4839,, 2008. 

Lewis, A. and Edwards, P.: Validate personal air-pollution sensors, Nature, 535, 29–31,, 2016. 

Lewis, A. C., Lee, J. D., Edwards, P. M., Shaw, M. D., Evans, M. J., Moller, S. J., Smith, K. R., Buckley, J. W., Ellis, M., Gillot, S. R., and White, A.: Evaluating the performance of low cost chemical sensors for air pollution research, Faraday Discuss., 189, 85–103,, 2016. 

Li, J., Hauryliuk, A., Malings, C., Eilenberg, S. R., Subramanian, R., and Presto, A. A.: Characterizing the Aging of Alphasense NO2 Sensors in Long-Term Field Deployments, ACS Sens., 6, 2952–2959,, 2021. 

Liousse, C., Assamoi, E., Criqui, P., Granier, C., and Rosset, R.: Explosive growth in African combustion emissions from 2005 to 2030, Environ. Res. Lett., 9, 035003,, 2014. 

Malawi Bureau of Standards: Malawi Standard: Industrial Emissions, Emissions Limits for Stationary and Mobile Sources-Specification, MS737: 2011, Blantyre, p. 73, (last access: 3 June 2022), 2005. 

Malings, C., Tanzer, R., Hauryliuk, A., Kumar, S. P. N., Zimmerman, N., Kara, L. B., Presto, A. A., and Subramanian, R.: Supplementary Data for “Development of a General Calibration Model and Long-Term Performance Evaluation of Low-Cost Sensors for Air Pollutant Gas Monitoring” (abridged version) (2.0), Zenodo [code],, 2018. 

Malings, C., Tanzer, R., Hauryliuk, A., Kumar, S. P. N., Zimmerman, N., Kara, L. B., Presto, A. A., and R. Subramanian: Development of a general calibration model and long-term performance evaluation of low-cost sensors for air pollutant gas monitoring, Atmos. Meas. Tech., 12, 903–920,, 2019a. 

Malings, C., Tanzer, R., Hauryliuk, A., Saha, P. K., Robinson, A. L., Presto, A. A., and Subramanian, R.: Fine particle mass monitoring with low-cost sensors: Corrections and long-term performance evaluation, Aerosol Sci. Tech., 0, 1–15,, 2019b. 

Malings, C., Westervelt, D. M., Hauryliuk, A., Presto, A. A., Grieshop, A., Bittner, A., Beekmann, M., and R. Subramanian: Application of low-cost fine particulate mass monitors to convert satellite aerosol optical depth to surface concentrations in North America and Africa, Atmos. Meas. Tech., 13, 3873–3892,, 2020. 

Mapoma, H. and Xie, X.: State of Air Quality in Malawi, J. Environ. Prot., 4, 1258–1264,, 2013. 

Marais, E. A. and Wiedinmyer, C.: Air Quality Impact of Diffuse and Inefficient Combustion Emissions in Africa (DICE-Africa), Environ. Sci. Technol., 50, 10739–10745,, 2016. 

Martin, R. V., Brauer, M., van Donkelaar, A., Shaddick, G., Narain, U., and Dey, S.: No one knows which city has the highest concentration of fine particulate matter, Atmos. Environ.: X, 3, 100040,, 2019. 

McFarlane, C., Isevulambire, P. K., Lumbuenamo, R. S., Ndinga, A. M. E., Dhammapala, R., Jin, X., McNeill, V. F., Malings, C., Subramanian, R., and Westervelt, D. M.: First Measurements of Ambient PM2.5 in Kinshasa, Democratic Republic of Congo and Brazzaville, Republic of Congo Using Field-calibrated Low-cost Sensors, Aerosol Air Qual. Res., 21, 200619–200619,, 2021. 

Mead, M. I., Popoola, O. A. M., Stewart, G. B., Landshoff, P., Calleja, M., Hayes, M., Baldovi, J. J., McLeod, M. W., Hodgson, T. F., Dicks, J., Lewis, A., Cohen, J., Baron, R., Saffell, J. R., and Jones, R. L.: The use of electrochemical sensors for monitoring urban air quality in low-cost, high-density networks, Atmos. Environ., 70, 186–203,, 2013. 

Morawska, L., Thai, P. K., Liu, X., Asumadu-Sakyi, A., Ayoko, G., Bartonova, A., Bedini, A., Chai, F., Christensen, B., Dunbabin, M., Gao, J., Hagler, G. S. W., Jayaratne, R., Kumar, P., Lau, A. K. H., Louie, P. K. K., Mazaheri, M., Ning, Z., Motta, N., Mullins, B., Rahman, M. M., Ristovski, Z., Shafiei, M., Tjondronegoro, D., Westerdahl, D., and Williams, R.: Applications of low-cost sensing technologies for air quality monitoring and exposure assessment: How far have they gone?, Environ. Int., 116, 286–299,, 2018. 

Murray, C. J. L., Aravkin, A. Y., Zheng, P., et al.: Global burden of 87 risk factors in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019, The Lancet, 396, 1223–1249,, 2020. 

National Statistics Office: The Fourth Integrated Household Survey: Household Socio-economic Characteristics Report, Republic of Malawi, IHS4, p. 109, (last access: 3 June 2022), 2017. 

Nieman, W. A., van Wilgen, B. W., and Leslie, A. J.: A reconstruction of the recent fire regimes of Majete Wildlife Reserve, Malawi, using remote sensing, Fire Ecol., 17, 4,, 2021. 

Nthusi, V.: Nairobi Air Quality Monitoring Sensor Network Report – April 2017,, 2017. 

Petkova, E. P., Jack, D. W., Volavka-Close, N. H., and Kinney, P. L.: Particulate matter pollution in African cities, Air Qual. Atmos. Health, 6, 603–614,, 2013. 

Petters, M. D. and Kreidenweis, S. M.: A single parameter representation of hygroscopic growth and cloud condensation nucleus activity, Atmos. Chem. Phys., 7, 1961–1971,, 2007. 

Pinder, R. W., Klopp, J. M., Kleiman, G., Hagler, G. S. W., Awe, Y., and Terry, S.: Opportunities and challenges for filling the air quality data gap in low- and middle-income countries, Atmos. Environ., 215, 116794,, 2019. 

Popoola, O. A. M., Stewart, G. B., Mead, M. I., and Jones, R. L.: Development of a baseline-temperature correction methodology for electrochemical sensors and its implications for long-term stability, Atmos. Environ., 147, 330–343,, 2016. 

Queface, A. J., Piketh, S. J., Eck, T. F., Tsay, S.-C., and Mavume, A. F.: Climatology of aerosol optical properties in Southern Africa, Atmos. Environ., 45, 2910–2921,, 2011. 

Rahal, F.: Low-cost sensors, an interesting alternative for air quality monitoring in Africa, Clean Air Journal, 30, 2,, 2020. 

Rai, A. C., Kumar, P., Pilla, F., Skouloudis, A. N., Di Sabatino, S., Ratti, C., Yasar, A., and Rickerby, D.: End-user perspective of low-cost sensors for outdoor air pollution monitoring, Sci. Total Environ., 607–608, 691–705,, 2017. 

Reid, J. S., Koppmann, R., Eck, T. F., and Eleuterio, D. P.: A review of biomass burning emissions part II: intensive physical properties of biomass burning particles, Atmos. Chem. Phys., 5, 799–825,, 2005. 

Saha, P. K., Khlystov, A., and Grieshop, A. P.: Downwind evolution of the volatility and mixing state of near-road aerosols near a US interstate highway, Atmos. Chem. Phys., 18, 2139–2154,, 2018. 

Sahu, R., Nagal, A., Dixit, K. K., Unnibhavi, H., Mantravadi, S., Nair, S., Simmhan, Y., Mishra, B., Zele, R., Sutaria, R., Motghare, V. M., Kar, P., and Tripathi, S. N.: Robust statistical calibration and characterization of portable low-cost air quality monitoring sensors to quantify real-time O3 and NO2 concentrations in diverse environments, Atmos. Meas. Tech., 14, 37–52,, 2021. 

Scheel, H. E., Brunke, E.-G., Sladkovic, R., and Seiler, W.: In situ CO concentrations at the sites Zugspitze (47 N, 11 E) and Cape Point (34 S, 18 E) in April and October 1994, J. Geophys. Res-Atmos., 103, 19295–19304,, 1998. 

Sewor, C., Obeng, A. A., and Amegah, A. K.: Commentary: The Ghana Urban Air Quality Project (GHAir): Bridging air pollution data gaps in Ghana, Clean Air Journal, 31, 1,, 2021. 

Shikwambana, L. and Tsoeleng, L. T.: Impacts of population growth and land use on air quality. A case study of Tshwane, Rustenburg and Emalahleni, South Africa, S. Afr. Geogr. J., 102, 209–222,, 2020. 

Sousan, S., Koehler, K., Hallett, L., and Peters, T. M.: Evaluation of the Alphasense optical particle counter (OPC-N2) and the Grimm portable aerosol spectrometer (PAS-1.108), Aerosol Sci. Tech., 50, 1352–1365,, 2016. 

Spinelle, L., Gerboles, M., and Aleixandre, M.: Performance Evaluation of Amperometric Sensors for the Monitoring of O3 and NO2 in Ambient Air at ppb Level, Chem. Engineer. Trans., 120, 480–483,, 2015. 

Spinelle, L., Gerboles, M., Aleixandre, M., and Bonavitacola, F.: Evaluation of Metal Oxides Sensors for the Monitoring of O3 in Ambient Air at Ppb Level, Procedia Engineer., 54, 319–324,, 2016. 

Stevens, T. and Madani, K.: Future climate impacts on maize farming and food security in Malawi, Sci. Rep., 6, 36241,, 2016. 

Subramanian, R. and Garland, R.: Editorial: The powerful potential of low-cost sensors for air quality research in Africa, Clean Air Journal, 31, 1,, 2021. 

Subramanian, R., Ellis, A., Torres-Delgado, E., Tanzer, R., Malings, C., Rivera, F., Morales, M., Baumgardner, D., Presto, A., and Mayol-Bracero, O. L.: Air Quality in Puerto Rico in the Aftermath of Hurricane Maria: A Case Study on the Use of Lower Cost Air Quality Monitors, ACS Earth Space Chem., 2, 1179–1186,, 2018. 

Subramanian, R., Kagabo, A. S., Baharane, V., Guhirwa, S., Sindayigaya, C., Malings, C., Williams, N. J., Kalisa, E., Li, H., Adams, P., Robinson, A. L., DeWitt, H. L., Gasore, J., and Jaramillo, P.: Air pollution in Kigali, Rwanda: spatial and temporal variability, source contributions, and the impact of car-free Sundays, Clean Air Journal, 30, 2,, 2020. 

The Guardian: UN moves staff after mobs kill five in Malawi vampire scare, 9 October 2017,,aid%20agencies%20and%20NGOs%20work, (last access: 24 May 2022), 2017. 

Thorson, J., Collier-Oxandale, A., and Hannigan, M.: Using A Low-Cost Sensor Array and Machine Learning Techniques to Detect Complex Pollutant Mixtures and Identify Likely Sources, Sensors, 19, 3723,, 2019. 

Toihir, A. M., Venkataraman, S., Mbatha, N., Sangeetha, S. K., Bencherif, H., Brunke, E.-G., and Labuschagne, C.: Studies on CO variation and trends over South Africa and the Indian Ocean using TES satellite data, S. Afr. J. Sci., 111, 1–9, 2015.  

Topalović, D. B., Davidović, M. D., Jovanović, M., Bartonova, A., Ristovski, Z., and Jovašević-Stojanović, M.: In search of an optimal in-field calibration method of low-cost gas sensors for ambient air pollutants: Comparison of linear, multilinear and artificial neural network approaches, Atmos. Environ., 213, 640–658,, 2019. 

Wernecke, B. and Wright, C.: Commentary: Opportunities for the application of low-cost sensors in epidemiological studies to advance evidence of air pollution impacts on human health, Clean Air Journal, 31, 1,, 2021. 

Williams, R., Kaufman, A., Hanley, T., Rice, J., and Garvey, S.: Evaluation of Field-deployed Low Cost PM Sensors, U.S. Environmental Protection Agency, Washington, DC, EPA/600/R-14/464 (NTIS PB 2015-102104), 2014a. 

Williams, R., Long, R., Beaver, M., Kaufman, A., Zeiger, F., Heimbinder, M., Hang, I., Yap, R., Acharya, B., Ginwald, B., Kupcho, K., Robinson, S., Zaouak, O., Aubert, B., Hannigan, M., Piedrahita, R., Masson, N., Moran, B., Rook, M., Heppner, P., Cogar, C., Nikzad, N., and Griswold, W.: Sensor Evaluation Report, U.S. Environmental Protection Agency, Washington, DC, EPA/600/R-14/143 (NTIS PB2015-100611), 2014b. 

Yurganov, L., McMillan, W., Grechko, E., and Dzhola, A.: Analysis of global and regional CO burdens measured from space between 2000 and 2009 and validated by ground-based solar tracking spectrometers, Atmos. Chem. Phys., 10, 3479–3494,, 2010. 

Yurganov, L. N., McMillan, W. W., Dzhola, A. V., Grechko, E. I., Jones, N. B., and Van der Werf, G. R.: Global AIRS and MOPITT CO measurements: Validation, comparison, and links to biomass burning variations and carbon cycle, J. Geophys. Res-Atmos., 113, D09301,, 2008. 

Zhang, T., Chillrud, S. N., Pitiranggon, M., Ross, J., Ji, J., and Yan, B.: Development of an approach to correcting MicroPEM baseline drift, Environ. Res., 164, 39–44,, 2018. 

Zhou, Z., Dionisio, K. L., Arku, R. E., Quaye, A., Hughes, A. F., Vallarino, J., Spengler, J. D., Hill, A., Agyei-Mensah, S., and Ezzati, M.: Household and community poverty, biomass use, and air pollution in Accra, Ghana, P. Natl. Acad. Sci. USA, 108, 11028–11033,, 2011. 

Zimmerman, N., Presto, A. A., Kumar, S. P. N., Gu, J., Hauryliuk, A., Robinson, E. S., Robinson, A. L., and R. Subramanian: A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring, Atmos. Meas. Tech., 11, 291–313,, 2018. 

Short summary
We present findings from a 1-year pilot deployment of low-cost integrated air quality sensor packages in rural Malawi using calibration models developed during colocation with US regulatory monitors. We compare the results with data from remote sensing products and previous field studies. We conclude that while the remote calibration approach can help extract useful data, great care is needed when assessing low-cost sensor data collected in regions without reference instrumentation.