Reply on RC1: response to detailed comments from authors

RC1. Alphasense NO2-A43F electrochemical NO2 sensors (and Alphasense NO2-B43F) have a known cross-sensitivity to ozone (Spinelle et al.). Although the Praxis Urban sensor system and the St Ebbe’s monitoring site do not appear to measure ozone, the study fails to mention/address this concern. While inclusion of this variable into feature training could restrict the spread of this model to other networks, it could greatly enhance the performance of the NO2 model. Spinelle et al. also found that sensors from the same manufacturer can behave differently in the same environmental conditions. This manuscript would greatly benefit from applying your model to more than one sensor to demonstrate its capability to nullify discrepancies from sensor to sensor. (Spinelle, L.; Gerboles, M.; Kotsev, A.; Signorini, M. Evaluation of Low-Cost Sensors for Air Pollution Monitoring: Effect of Gaseous Interfering Compounds and Meteorological Conditions; Publications Office of the European Union: Luxembourg, 2017. https://doi.org/10.2760/548327)

We think this comment is very valuable for context and for developing further learnings. We agree that it is worthwhile adding a note on cross-sensitivity with ozone and will include a reference to Spinelle et al. (2017) in the revised manuscript. We confirm that ozone data are available at the St Ebbe's monitoring station and agree that these data would help in evaluating the effectiveness of ozone as an additional training feature for the RF model and could improve correction model performance. However, only 6 of the 16 sensors deployed across our network have ozone monitoring capability, and ozone monitoring was not the focus for the application of low-cost sensor data to local air quality management.
Documenting the performance of the models as they stand is valuable as a demonstration of the performance achievable with the constrained approach presented (i.e. without training on ozone cross-sensitivity), not least because this is representative of many real-world low-cost sensor applications in which NO2-only electrochemical sensor networks are in operation.
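As an illustration only (not part of the study), the potential value of ozone as an additional feature can be sketched on simulated data. The sketch below assumes a raw sensor signal that responds to both NO2 and ozone, and compares a correction fitted with and without the ozone feature. It uses ordinary least squares as a simple stand-in for the RF model; the cross-sensitivity coefficient `K_O3`, the concentration ranges and the noise level are hypothetical values chosen for the example, not measured properties of any sensor.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        for k in range(n):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def rmse(pred, y):
    return (sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)) ** 0.5

rng = random.Random(0)                              # fixed seed for repeatability
N = 500
no2 = [rng.uniform(5, 60) for _ in range(N)]        # simulated reference NO2
o3 = [rng.uniform(10, 80) for _ in range(N)]        # simulated ozone
K_O3 = 0.5                                          # hypothetical cross-sensitivity
raw = [a + K_O3 * b + rng.gauss(0, 2) for a, b in zip(no2, o3)]

train, test = slice(0, 350), slice(350, N)

# Correction without ozone: reference NO2 predicted from the raw signal only
X_a = [[r] for r in raw]
beta_a = fit_ols(X_a[train], no2[train])
rmse_without = rmse(predict(beta_a, X_a[test]), no2[test])

# Correction with ozone as an additional feature
X_b = [[r, o] for r, o in zip(raw, o3)]
beta_b = fit_ols(X_b[train], no2[train])
rmse_with = rmse(predict(beta_b, X_b[test]), no2[test])

print(f"RMSE without ozone feature: {rmse_without:.2f}")
print(f"RMSE with ozone feature:    {rmse_with:.2f}")
```

On real co-location data the size of any improvement would depend on the actual cross-sensitivity and on how strongly ozone co-varies with the other training features, so the simulated gap should not be read as an expected result.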

RC1. It is unclear how this model could be applied to sensors throughout a network. Would each sensor need to spend x number of months at a reference site to develop the model prior to deployment? How well would a baseline established at the reference site transfer to the deployment site?
Author response. For deployment in real-world situations I would anticipate that the model, or a variant thereof, would be trained for each 'local' network and that this model would be directly deployable across that local network, e.g. within a town or small city where the influencing variables are likely to be consistent. The correction model itself is constrained by the diversity of the data used to train it, both in terms of sensor-to-sensor variability and in terms of the pollution/environmental conditions to which the sensors are exposed (mainly NO2 and RH). The more diverse the training data, the greater the applicability of the model. One of the main challenges for most applications, and particularly in a study environment such as Oxford, which has relatively good air quality, is the under-representation of higher pollution events in the training datasets, which may result in over-correction (under-prediction) of real-world concentrations. In an ideal situation one could imagine co-location at low, medium, high and very high pollution conditions, but as I am sure you are aware such situations are almost impossible to engineer.

RC1 Line 163: "The filtering criteria presented in Table 1 were identified empirically from an analysis of typical sensor performance from the sensor network and from similar parameters logged at the St Ebbe's AURN station" It is not fully clear how these criteria were chosen. Was this based on limits set by the sensor manufacturer? Please clarify. It would also be useful to state the sample population percentage that was removed based on these criteria, as you did on line 188.
Author response. Thank you for this comment. We clarify that these criteria were developed independently of the manufacturer. Please see Sections 2.3.1 to 2.3.4 for an explanation of the derivation of the filter criteria and associated techniques. We will add a footnote to Table 1 to reflect this.

RC1 Line 69: "multiple linear regression (MLR) models have been successfully used with variable results" Conflicting statement, please clarify.
Author response. We suggest modifying this to "multiple linear regression (MLR) models have been developed with variable results".

RC1 Line 136: Please provide more information regarding the location of the sensor relative to the reference instrumentation.