Estimation of Missing Values in Time Series of Air Pollution Data in Tehran City

Document Type : Research Paper

Authors

Department of Spatial Information Systems, Faculty of Surveying Engineering and Geospatial Information, College of Engineering, University of Tehran, Tehran, Iran.

Abstract

Air pollution has become one of the most critical problems in densely populated cities. Every year it causes many residents to suffer from lung problems, and it can have irreparable effects on citizens' health. Air pollution monitoring devices in cities record pollutant concentrations hourly. Technical faults in these devices sometimes prevent important data from being recorded, which leaves missing values in the data. In this study, these missing values were estimated. For this purpose, air pollution records in Tehran were examined, covering the concentrations of PM2.5, PM10, SO2, NO2, O3 and CO. The LANN (local average of nearest neighbors) algorithm, which is used for estimating and forecasting univariate time series, was implemented for all pollutants and compared with other methods. In another part of the study, the other pollutants were used as additional inputs, and a neural network was used to estimate the missing values of each pollutant. The RMSE index was used to evaluate and compare the algorithms. The RMSE of the LANN method was 30 to 50% lower, depending on the pollutant, than that of simpler models including the mean, linear regression and LOCF (last observation carried forward). The neural network algorithm had a lower RMSE than the other methods in estimating PM2.5 values, with an RMSE of 7.78.
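The comparison described in the abstract can be illustrated with a minimal sketch. It hides known values in a short series, fills them with three simple imputers, and scores each with RMSE. Everything here is an assumption made for illustration: the toy series, the function names, and in particular `impute_neighbor_average`, which is only a simplified reading of the local-average-of-nearest-neighbors idea (the mean of the nearest observed value on each side of a gap) and not necessarily the authors' implementation. Linear regression and the neural network are left out.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def impute_mean(series):
    """Replace every gap (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

def impute_locf(series):
    """Last observation carried forward; a leading gap stays None."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def impute_neighbor_average(series):
    """Simplified LANN-style fill (assumption): average the nearest
    observed value before and after each gap; use one side at the edges."""
    out = list(series)
    n = len(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        prev = next((series[j] for j in range(i - 1, -1, -1) if series[j] is not None), None)
        nxt = next((series[j] for j in range(i + 1, n) if series[j] is not None), None)
        neighbors = [x for x in (prev, nxt) if x is not None]
        out[i] = sum(neighbors) / len(neighbors) if neighbors else None
    return out

# Hypothetical hourly concentrations; two values are hidden to act as gaps.
truth = [20.0, 22.0, 26.0, 30.0, 28.0, 24.0, 21.0, 19.0]
masked_idx = [2, 5]
series = [None if i in masked_idx else v for i, v in enumerate(truth)]

def score(imputer):
    """RMSE of an imputer, measured only on the hidden positions."""
    filled = imputer(series)
    return rmse([truth[i] for i in masked_idx], [filled[i] for i in masked_idx])

scores = {
    "mean": score(impute_mean),
    "locf": score(impute_locf),
    "neighbor_average": score(impute_neighbor_average),
}

for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {value:.3f}")
```

Scoring only the hidden positions matters: an RMSE over the full series would be diluted by the observed values, which every imputer reproduces exactly.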

