Mobile Air Quality Monitoring and Comparison to Fixed Monitoring Sites for Quality Assurance
Andrew R. Whitehill
Melissa Lunden
Brian LaFranchi
Surender Kaushik
Paul A. Solomon
Abstract. Air pollution monitoring using mobile ground-based measurement platforms can provide high-quality spatiotemporal air pollution information. As mobile air quality monitoring campaigns extend to entire fleets of vehicles and embrace lower-cost air quality sensors, it is important to address the quality assurance needs for validating these measurements. We explore collocation-based evaluation of air quality instruments in a mobile platform against fixed regulatory sites, both when the mobile platform is parked at the fixed regulatory site and when it is moving at distances of meters to kilometers from the site. We demonstrate agreement within 4 ppbv (for NO2 and O3 + NO2) to 10 ppbv (for NO) when using a running median of 40 hourly differences between the moving mobile platform measurements and the stationary site measurements. The comparability is strong when only measurements from residential roads are used but is only slightly diminished when all roads except highways are included in the analysis. We present a method for assessing mobile measurements of ozone (O3), nitrogen dioxide (NO2), and odd oxygen (OX = O3 + NO2) on an ongoing basis through comparisons with fixed regulatory sites.
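A minimal sketch of the comparison approach described in the abstract is given below, assuming pandas DataFrames of time-stamped mobile and fixed-site measurements. The column names, the 2 km distance buffer, and the road-type filter are illustrative assumptions, not the authors' exact implementation.

```python
import pandas as pd

def running_median_difference(mobile: pd.DataFrame, fixed: pd.DataFrame,
                              pollutant: str = "NO2", window: int = 40) -> pd.Series:
    """Running median of hourly (mobile - fixed) concentration differences."""
    # Keep mobile samples collected near the fixed site on lower-traffic roads
    # (hypothetical filter columns; thresholds are assumptions).
    near = mobile[(mobile["distance_km"] <= 2.0) &
                  (mobile["road_type"] != "highway")]

    # Average the mobile samples to hourly values and align them with the
    # hourly fixed-site data.
    mobile_hourly = near.set_index("time")[pollutant].resample("1h").mean()
    fixed_hourly = fixed.set_index("time")[pollutant].resample("1h").mean()
    diff = (mobile_hourly - fixed_hourly).dropna()

    # Median over the most recent `window` valid hourly differences.
    return diff.rolling(window=window, min_periods=window).median()
```

A sustained shift of this running median beyond the agreement levels reported above (roughly 4 ppbv for NO2 and OX, 10 ppbv for NO) could then flag a possible calibration drift in the mobile instrument.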
Status: open (until 07 Jun 2023)
RC1: 'Comment on amt-2023-82', Anonymous Referee #1, 25 May 2023
In their manuscript “Mobile Air Quality Monitoring and Comparison to Fixed Monitoring Sites for Quality Assurance”, Whitehill and coauthors present comparison measurements of ozone, NO, NO2, and Ox (O3 + NO2) measured on board mobile platforms against those measured at stationary air quality network stations. The mobile platform measurements were performed both while parked near the network stations and during mobile measurements in the vicinity of the network stations.
Linear regression analysis of the data from the mobile and the stationary platforms showed good agreement, albeit with small systematic negative biases for O3 and NO2, and large coefficients of determination for ozone, NO2, and Ox between the data sets from the two platforms. NO data, however, showed much poorer agreement and correlation compared to the other variables.
Generally, agreement between the mobile platform data and the network station data was better when the mobile platform data were collected closer to the network station. Nevertheless, for the abovementioned variables, reasonable agreement between the data was found for distances of up to 10 km from the network station.
These results were used to develop a method to quality-assure mobile measurement data from long-term field measurements by using data acquired within a certain distance of an air quality network station and taking median values over 40 hours of data.
The manuscript is well structured and clearly written. The results show that, for some pollutants and under certain conditions, comparison of mobile data with data from a nearby air quality network station might be used to detect calibration drifts or malfunctions of mobile instruments. For this purpose, the manuscript will be useful for applications where mobile measurements are conducted over extended times within a certain area that also contains a fixed monitoring site. However, the poor comparison results for NO, a pollutant dominated by local sources, show that such a quality assurance approach can only be applied to pollutants with a very homogeneous spatial distribution over an area several kilometers in extent. Strong spatial variability of pollutant concentrations will make such an approach impossible. This limitation will likely apply not only to NO (which was measured in the study) but also to particle-related variables like BC or particle number concentration, which also show large spatial inhomogeneity. It would be desirable for the authors to assess their approach more critically with respect to such limitations, also in the general sections of their manuscript like the abstract and the conclusions section. This critical assessment should also include the fact that a regionally homogeneous distribution of pollutants – which is necessary for the suggested in-field quality assurance method – also likely makes it less necessary to map out the pollutant distribution with a narrow-gridded driving pattern, which takes a lot of effort to perform. The balance between mapping out pollutants with sufficient spatial resolution and having the possibility of using “remote” stationary measurements for quality assurance – potentially by applying sufficient temporal averaging – should be discussed.
I recommend publication in AMT after these and several other minor issues have been addressed, as detailed below.
Detailed comments:
Section 2 – Overview of Methods:
According to previous publications and the Aclima website, black carbon and particle number concentration were also measured on Aclima vehicles. It is not clear whether this was also the case during the measurements used for this study. Since these particle-related variables in particular probably have a very large spatial (and temporal) variability, it would be very relevant to include such data in this assessment of the comparability of mobile and stationary measurements.
Section 3.1 – Mobile platforms parked at reference site - methods:
How large was the distance between the inlets of the mobile measurement platform and the stationary measurement setup? What were the altitudes above ground level of both inlets? Further below (caption of Figure 2) it is stated that the distance was between 80 and 145 m at one site and between 10 and 85 m at the other. Why did the distances vary so much? For both sites, the largest distances of the parking locations from the measurement sites are larger than the distance to the closest roads. Why was the vehicle not parked directly at the site for the comparison? How large is the influence of this distance on the comparability of the results?
Section 3.2 – Mobile platforms parked at reference site - Results and Discussion; Line 188-193:
Can the typical differences between the car measurements and the fixed site measurements be explained by the sampling situation? Are these differences arbitrary differences or can they be explained by the sampling environment (e.g. traffic), meteorological conditions (e.g. wind direction) or other external influences? Do these differences and the variability of these differences (i.e. the r2 values) between the car and fixed site measurements reflect the spatial inhomogeneity of the respective pollutant concentrations? How are they related? How do they depend on the distance between both sampling locations? As a consequence, can (mobile) measurements of spatial inhomogeneity of pollutants be used to estimate how well such a measurement comparison (or the quality assurance approach presented further below) will work for a certain pollutant?
Section 4.1 – Mobile platforms driving around a fixed reference site in Denver – Methods:
Line 198: Impact from emission plumes of mobile sources is not only a problem for parked vehicles but even more so for driving vehicles (i.e. mobile measurements), which are often surrounded by other vehicles driving on the same road.
Line 212: Can “lower traffic” be quantified somehow? An “empty” highway probably affects the measurements less than congestion in front of a traffic light or at an intersection on a residential road.
Table 2: This is rather detailed information which could be shifted into the supplement.
Figure 3: In this figure, correlations between 1-min averaged ozone concentrations measured at a fixed site and during mobile measurements at distances of up to 10 km are shown. For distances larger than maybe 100 m, it is clear that different air parcels are compared in every data point. Therefore, these correlation plots are not a comparison of the performance of the measurement instruments in the car and at the fixed site, but an investigation of the spatial homogeneity of the ozone concentrations. Since ozone is rather homogeneously distributed, such a comparison can be used for quality assurance of the instrument (at the same time, it is questionable why small-meshed mobile measurements should be made in such a case). For other pollutants like NO (but likely also BC or PNC), the correlations are (or would likely be) much weaker, and consequently the quality assurance approach would not work. All of this should be treated and critically discussed in a paper that focuses on the possibility of using such measurements as a quality assurance approach (not only in a short sentence explaining why NO shows much weaker correlations – line 262/263).
In addition, I think all of these correlation plots could go into the supplement, and instead a plot showing r2 (and slope, in another panel) versus distance for the various road types should be shown here. The same is true for Figure 4.
Line 268-270: I agree that it is important to exclude local emission plumes from the averages. I wonder why the highly time-resolved data were not used to exclude such plumes before averaging, as was shown earlier in this journal for mobile measurements (e.g. Drewnick et al., AMT 2012).
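As an illustration of the kind of pre-averaging plume filter alluded to here, a minimal sketch follows. It is not the Drewnick et al. (2012) algorithm; the window length, quantile, and threshold factor are illustrative assumptions, and the input is assumed to be a 1 s concentration series with a DatetimeIndex.

```python
import pandas as pd

def remove_local_plumes(series: pd.Series, window: str = "120s",
                        quantile: float = 0.25, factor: float = 2.0) -> pd.Series:
    """Drop 1 s points that sit far above a slowly varying background estimate."""
    # Rolling low quantile as a crude local background: plumes pull the mean up,
    # but a low quantile is largely insensitive to short spikes.
    background = series.rolling(window, center=True, min_periods=10).quantile(quantile)
    # Keep only points within `factor` times the background; points where no
    # background could be estimated are dropped as well.
    return series[series <= factor * background]
```

The filtered series could then be averaged to 1 min or 1 h intervals before comparison with the fixed-site data.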
Line 372-374: To me it seems rather doubtful that a few seconds of data can reasonably represent a one-hour average. How would the comparison (mobile/stationary) change or improve if a minimum data coverage were introduced?
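A minimum-coverage requirement of the kind suggested here could look like the following sketch (the 300 s threshold and the 1 s input resolution are assumptions, not values from the manuscript):

```python
import pandas as pd

def hourly_mean_with_coverage(series: pd.Series, min_seconds: int = 300) -> pd.Series:
    """Hourly means of 1 s data, masked where fewer than `min_seconds` valid points exist."""
    grouped = series.resample("1h")
    hourly = grouped.mean()
    counts = grouped.count()
    # Hours represented by only a handful of seconds of driving near the site
    # are set to NaN and thus excluded from the mobile/stationary comparison.
    return hourly.where(counts >= min_seconds)
```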
Section 5.3 – Using driving data for ongoing performance evaluation:
If I understand this performance evaluation approach correctly, it assumes a constant drift of the calibration of the mobile instruments over longer time intervals or a malfunction which causes a change in the calibration or the response of these instruments, in order to be able to detect the bias. What about temperature-related drifts in instrument calibrations, which would possibly occur repeatedly over the course of a day? Would such biases be detected by this approach?
Figure 8 – caption: shouldn’t it be “running median” instead of “running mean”?
Line 450-455: How do you know that these DeltaX-values actually reflect changes in the calibrations of the mobile instruments and consequently could be used to correct for those? Couldn’t there be other reasons for these differences, like local sources or an issue with the stationary instruments?
Citation: https://doi.org/10.5194/amt-2023-82-RC1
RC2: 'Comment on amt-2023-82', Anonymous Referee #2, 26 May 2023
The comment was uploaded in the form of a supplement: https://amt.copernicus.org/preprints/amt-2023-82/amt-2023-82-RC2-supplement.pdf
RC3: 'Comment on amt-2023-82', Anonymous Referee #3, 31 May 2023
Paper summary, in my words:
This paper presents an assessment of measurements made from mobile platforms against fixed-site regulatory measurements as a comparison benchmark. The authors describe approaching the above in two different ways: parked collocation between mobile and fixed-site, and in-motion comparison between mobile and fixed-site monitors across different buffer sizes, road classifications, and averaging intervals. The stated broader vision of the paper is to provide a framework for identifying and addressing instrumental problems (e.g. drift) in a certain type of dataset (multiple mobile monitors doing highly-routinized driving over long durations).
When comparing the stationary mobile monitoring data with the fixed-site monitoring data in Denver, the authors quantify the agreement between the Aclima cars and each fixed site, concluding that the stationary mobile monitor does well for O3, fairly well for Ox (though not as well as hypothesized), OK for NO2, and poorly for NO. The conclusions are based on r2 values and slopes from OLS as metrics. The authors discuss the results in the context of regional vs. primary local pollution.
The authors then use in-motion mobile monitoring data to assess by-pollutant variability across road category and spatial aggregation buffer size in two different locations (Denver and California). The stated difference in these analyses is that the Denver reference monitors had higher time resolution (1-minute) compared to those in California (1-hour), but had less temporal and seasonal coverage.
The general conclusion from the Denver in-motion analysis is that mobile data works similarly to data collected while parked when comparing against a fixed-site. Measurements collected closer to the fixed-site monitor tended to be better than data collected further afield. The authors allude to a tradeoff between including more measurements (which tend to make the comparisons more robust, all else equal) and including measurements made further away (which tend to be weaker comparisons due to spatial variability).
The authors then address the above by considering how much data, temporally speaking, are needed at a single fixed buffer size to assess whether or not a mobile monitor agrees with a fixed-site monitor. They conclude that, for all pollutants, 40 hours of data are required.
Overall analysis of paper:
I think it is important for the field to continue developing best practices in mobile monitoring to ensure data quality and distill meaning. This manuscript does work in that regard that is novel enough to warrant publication, after significant revision. There are only likely to be more of these big-scale mobile monitoring efforts, and moving towards transparent, algorithmic QA practices is a good thing.
Unfortunately, the analysis here does not offer much depth or applicability for the key thing that mobile monitoring has the power to uncover—spatial variability in pollutant concentrations for primary emitted pollutants, such as NO, which performs poorly in all comparisons outlined by the authors. Presumably things like black carbon, particle number, methane/ethane, many types of VOCs, and certain primary organic aerosol constituents would all behave more like NO than O3. The authors do not offer much in the way of explanation of what their results mean for (arguably) the most important pollutant that they measured, given the “strong suit” of what mobile monitoring can address. This seems like a real hole in the manuscript that I think needs to be (much) better addressed and discussed.
Relatedly, the results that do compare most favorably are for pollutants that are much less spatially variable. While it isn’t a bad thing to perform this analysis on these pollutants (e.g. O3), and in fact serves as a good ‘base case check,’ these are not the type of pollutants that one goes to the great lengths of having a fleet of mobile monitors for. Again, I think this is a conceptual hole in the manuscript.
Lastly, I have a lot of more detailed criticisms about figures, tables, etc. that make this paper not publishable in its current form. I suggest both a systematic re-assessment of the results in this paper in the context of primary pollutant spatial variability and addressing the detailed “minor” issues below, followed by a re-review, before publication.
Big(ger) picture stuff
-There are a lot of pollutants, including ones of interest to many (PN, BC) that are mentioned as being measured, but are nowhere in the results. Why? Please address/discuss.
-Section 3 - the lack of agreement between NO from the car and the fixed-site must (assuming both instruments work/were calibrated, as they are stated to) be due to spatial variability between the parking spot and the monitor. The parking location is mentioned as a range of distances, and clearly where the car is parked (and/or which way winds were coming, etc.) should explain the disagreement. This should be addressed in some detail.
Additionally, given the reliance on the difference measurements in Section 5, I would expect more of a discussion of the difference measurements in the last part of Section 3. It appears that we are looking at an ozone monitor that has a systematic offset from the fixed-site?
-Section 4: Suggestions for improving the scatterplots are presented below. But more generally, I think that a figure that synthesizes the findings beyond the many scatterplots should be present, e.g. R2 vs. buffer size, for all pollutants. As mentioned above, I think much more emphasis needs to be spent on what these results mean for pollutants that are expected to be spatially heterogeneous. Are we really looking at quality-checking the analyzer in a mobile lab compared to the ground-truth fixed-site, or are we simply comparing different air masses with different concentrations? This paper is presented as a means of quality-checking baseline instrument performance. But a lot of the pollutant comparisons are really assessments of their spatial variability, which need not have anything to do with instrument performance. Throughout the manuscript these ideas seem conflated.
-Section 5: a lot of the value of Section 5 compared to Section 4 seems to be: how does the above approach work with the more-common, longer-duration (hourly) fixed-site measurements? And the difference in temporal coverage between the two datasets (Denver and California) is what allows the authors to pursue this question. However, the authors then average all of their measurements to hourly, but say that even very brief intervals (1 s) can be used for these hourly averages. To me, this does not seem justified, especially for pollutants with spatial and/or temporal variability (which again are largely the kind of pollutants of interest to the mobile monitoring community). Moreover, I would think that shorter-duration mobile monitoring averages would be more scattered than longer-duration averages, all else equal. Is this so? How does this vary by pollutant? The authors should justify this choice, I think.
Also (from Section 5), Figure 8 seems to be a major distilled result that the authors are driving towards with all of the previous analyses. However, some of the fundamental aspects of this figure were hard for me to understand. Why is the “range of median differences” the important quantity here? And should this quantity be normalized in some way, as Figure 8 makes it seem like NO agrees more than O3 for a window size of 20 hours or less?
Things to change/reconsider in tables and figures:
-Table S1 (and S2): I don’t think anyone will particularly care about Start Time and End Time. However, it would be useful for the reader to be able to see how many minute-avg. points each stationary sampling period yielded—please add this.
-Figure S1: What values are the whiskers signifying? This should be stated somewhere. What do the points (outliers? defined how?) signify? This should also be stated. Looking at the NO plots from Figure 1, it seems a bit hard to believe that there would only be one outlier point at each site in the deltaNO quantity—is something off here? Also, this should be in the main text and not the SI, in my opinion. Also, I am surprised that the results in this figure are not connected to the analysis at the end, or discussed in that context—clearly there is a systematic offset between the mobile monitor and each fixed-site measurement of ozone, which warrants some discussion.
-Figure 1: This comment will apply to all of the scatterplots: For figure 1 you are showing the one:one line, but not showing the OLS results. For many of the other similar plots you show OLS results but not the one:one line. Given the variety of scales used (between pollutants, spatial aggregations, etc.), I strongly recommend adding both of these (in different colors) to each plot, as quick visual reference points for the reader.
-Figure 2: This figure is meant to establish the spatial context of the two sites, including the length scale and the position of the parking spot relative to the stationary monitor. I don’t think this figure accomplishes these goals. There should be distance scale bars. Given the emphasis on distance-from metrics in later sections, it would also be helpful to see how these sites fit into any larger land-use context. Also, given the large range of car-to-site distances, some kind of areal shading makes more sense than a single point marking the parking location. Also, this figure should come before Figure 1. Other things that were not discussed in the text at all (and possibly should be) would also be useful to show here: north arrows and wind-rose insets (the latter seem potentially quite important for the La Casa site, though again the reader has no way of knowing what is west of the monitoring location, assuming these maps are oriented with north “up”).
-Table 2: While this information is important, this seems less relevant for the main text than pretty much everything currently in the supplement. I suggest moving this to the SI.
-Figure 2 (and 3): again, please add one:one lines, and again the resolution is too low to be useful. I would also suggest some kind of different plotting style, given that it is too difficult for the reader to see the data for e.g. 100m-Major. A bigger-picture comment mentioned above, but I will re-emphasize it here: that ozone (a regional pollutant not expected to have much spatial variability) behaves well is not surprising. I’m not saying it isn’t worth presenting this result, but it makes no sense to me to present O3 in the main text and NO in the SI, given that NO is spatially variable, and hence of interest to this whole endeavor in a way that O3 really isn’t.
-Table 3: Why would you omit 100m, 3000m, etc. here? You make a point to display the plots in the previous figure, but then only show a subset of the regression results. Whatever goes in Figure 2 should go in Table 3 (or vice versa).
-Figure 4: shouldn’t N for the individual non-highway categories (e.g. major, residential) sum to the N shown in the bottom row “Non-Highway” category? I notice that, for the 100m column in this figure, they do not.
-Figure 5: This map also does not effectively communicate what the authors presumably intend, and it needs significant work. I am guessing (though it is not explicit) that the colored roads are where the Aclima cars drive? Or are these a subset of the roads driven, namely the sections that fall within the largest buffer size? It’s unclear. There should be scale bars and a north arrow here. At this level of zoom it’s impossible to tell residential from major roads. I would consider substantially re-working this figure to best communicate the relevant ideas—where the reference monitors are (probably label them on the map) and how the buffers around each compare to the spatial extent of the domains (so perhaps just draw some rings). I would let the inset maps “do the talking” as far as the road categorization goes, because it’s very difficult to get a sense of that at this level of zoom. There are other minor, but still important, aspects that need work here; for example, the marker size for the reference monitors—one of the most important pieces of information you are trying to convey—is a very small fraction of the font size of the legend.
-Figure 6: I don’t understand what exactly the authors are trying to convey with this figure. And why show only Livermore, but not the other monitoring sites that are mentioned? I suggest showing each monitoring station. I think some text is needed to very clearly outline the message of the figure (e.g., “As you can see from Figure 6, the driving routes are concentrated around the fixed site. Also, you can see how each of the three monitors (West Oakland, Livermore, San Francisco) compare to each other in terms of vehicle-related land use in the vicinity of the monitor, as well as where we drove.”). These maps should also have scale bars, north arrows, and buffers indicating the relevant buffer sizes used in the analysis. All of that would make this figure convey much more spatial context than it currently does.
-Figure 8: The figure caption says mean, while the y-axis says median. I also suggest adding as horizontal lines the bias from the zero/span checks, for comparison (as is done in the text).
Other minor issues:
-self-sampling for the parked periods is not addressed, but should be
-did the authors apply some of their framework to any of their individual monitors/cars, in order to see if there were any drift/QA issues? This seems like the obvious thing to have done. At the least, it should be explained why this was not done.
-“2. Overview of Methods” should probably be a bit more precise given the multiple following “Methods” sections, and the fact that some of the text contained in the multiple Results sections is also “Methods.” I suggest a change that makes clear this section is really mostly about the instrumentation itself, and not any of the data analysis methods, experimental design, etc.
Citation: https://doi.org/10.5194/amt-2023-82-RC3