Atmospheric Measurement Techniques An interactive open-access journal of the European Geosciences Union
Volume 11, issue 1
Atmos. Meas. Tech., 11, 291-313, 2018
© Author(s) 2018. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article 15 Jan 2018

A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring

Naomi Zimmerman¹, Albert A. Presto¹, Sriniwasa P. N. Kumar¹, Jason Gu², Aliaksei Hauryliuk¹, Ellis S. Robinson¹, Allen L. Robinson¹, and R. Subramanian¹
  • ¹ Center for Atmospheric Particle Studies, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  • ² Sensevere LLC, Pittsburgh, PA 15222, USA

Abstract. Low-cost sensing strategies hold the promise of denser air quality monitoring networks, which could significantly improve our understanding of personal air pollution exposure. Additionally, low-cost air quality sensors could be deployed to areas where limited monitoring exists. However, low-cost sensors are frequently sensitive to environmental conditions and pollutant cross-sensitivities, which have historically been poorly addressed by laboratory calibrations, limiting their utility for monitoring. In this study, we investigated different calibration models for the Real-time Affordable Multi-Pollutant (RAMP) sensor package, which measures CO, NO2, O3, and CO2. We explored three methods: (1) laboratory univariate linear regression, (2) empirical multiple linear regression, and (3) machine-learning-based calibration models using random forests (RF). Calibration models were developed for 16–19 RAMP monitors (varied by pollutant) using training and testing windows spanning August 2016 through February 2017 in Pittsburgh, PA, US. The random forest models matched (CO) or significantly outperformed (NO2, CO2, O3) the other calibration models, and their accuracy and precision were robust over time for testing windows of up to 16 weeks. Following calibration, average mean absolute error on the testing data set from the random forest models was 38 ppb for CO (14 % relative error), 10 ppm for CO2 (2 % relative error), 3.5 ppb for NO2 (29 % relative error), and 3.4 ppb for O3 (15 % relative error), and Pearson r versus the reference monitors exceeded 0.8 for most units. Model performance is explored in detail, including a quantification of model variable importance, accuracy across different concentration ranges, and performance in a range of monitoring contexts including the National Ambient Air Quality Standards (NAAQS) and the US EPA Air Sensors Guidebook recommendations of minimum data quality for personal exposure measurement.
A key strength of the RF approach is that it accounts for pollutant cross-sensitivities. This highlights the importance of developing multipollutant sensor packages (as opposed to single-pollutant monitors); we determined this is especially critical for NO2 and CO2. The evaluation reveals that only the RF-calibrated sensors meet the US EPA Air Sensors Guidebook recommendations of minimum data quality for personal exposure measurement. We also demonstrate that the RF-model-calibrated sensors could detect differences in NO2 concentrations between a near-road site and a suburban site less than 1.5 km away. From this study, we conclude that combining RF models with carefully controlled state-of-the-art multipollutant sensor packages as in the RAMP monitors appears to be a very promising approach to address the poor performance that has plagued low-cost air quality sensors.
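
The abstract reports three evaluation metrics for the calibrated sensors: mean absolute error (MAE), a relative error in percent, and Pearson r against the reference monitors. The following is a minimal sketch of how such metrics could be computed, not the authors' code. It assumes that relative error means MAE divided by the mean reference concentration, which the abstract does not state explicitly. The function names and the four-point data set are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean


def mean_absolute_error(reference, calibrated):
    """Average absolute difference between calibrated and reference values."""
    return mean(abs(c - r) for r, c in zip(reference, calibrated))


def relative_error_percent(reference, calibrated):
    """MAE normalized by the mean reference concentration, in percent.

    Assumption: this normalization is one plausible reading of the
    "relative error" quoted in the abstract.
    """
    return 100.0 * mean_absolute_error(reference, calibrated) / mean(reference)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up concentrations (ppb) for illustration only.
reference_ppb = [10.0, 20.0, 30.0, 40.0]
calibrated_ppb = [12.0, 18.0, 33.0, 39.0]

mae = mean_absolute_error(reference_ppb, calibrated_ppb)
rel = relative_error_percent(reference_ppb, calibrated_ppb)
r = pearson_r(reference_ppb, calibrated_ppb)
print(f"MAE = {mae:.1f} ppb, relative error = {rel:.0f} %, r = {r:.2f}")
```

Reporting MAE alongside a normalized error makes results comparable across pollutants whose ambient levels differ by orders of magnitude, such as CO2 in ppm and NO2 in ppb.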

Short summary
Low-cost sensors promise neighborhood-scale air quality monitoring but have been plagued by inconsistent performance for precision, accuracy, and drift. CMU and SenSevere collaborated to develop the RAMP, which uses electrochemical sensors. We present a machine learning algorithm that overcomes previous performance issues and meets US EPA's data quality recommendations for personal exposure for NO2 and tougher "supplemental monitoring" standards for CO & ozone across 19 RAMPs for several months.