Optimization of Meteorological Variables to Predict Air Pollutant Concentrations for Use in Artificial Neural Network Model to Reduce the Cost and Time of Analysis

Document Type: Research Paper


1 Department of Environment, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran

2 Department of Climatology, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran


Air pollution is one of the most serious and harmful problems facing human societies and has caused many environmental problems. Air quality changes daily: even when the amount of pollutants entering the air is constant, factors that govern atmospheric conditions, such as wind speed, wind direction, the thermal profile of the air mass, the solar energy available for photochemical reactions, and the duration of wind or rainfall, alter air quality. The atmosphere has a limited capacity and cannot absorb the variety of wastes and toxins that humans now discharge into it.
A multi-collinearity test was performed in SPSS to remove redundant input variables. Collinearity among the independent variables was measured with two indicators, the Variance Inflation Factor (VIF) and tolerance. The VIF indicates how much the variance of an estimated regression coefficient is inflated relative to the case in which the independent variables are uncorrelated; a value close to one indicates no collinearity. Tolerance is the share of a variable's variance that is not explained by the other independent variables. A tolerance close to one means that only a small part of the variable's variance is explained by the other independent variables, and a tolerance close to zero means that the variable is almost a linear combination of them. If the VIF of an independent variable exceeds 10 and its tolerance is below 0.2, the model can be said to suffer from multi-collinearity. These two statistics were used to quantify the severity of the multi-collinearity.
The Forward Selection (FS) method, which is based on regression, was used to select the best subsets of input variables. The FS technique has been used by many researchers to build powerful predictive models.
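The collinearity screening described above (VIF and tolerance) can be illustrated with a short sketch. It is not the SPSS procedure used in the study; it assumes the standard definitions, in which each independent variable is regressed on the others, tolerance is 1 - R² of that regression, and VIF is the reciprocal of tolerance. The function names and the small data set are made up for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X, with intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(bj * v for bj, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def vif_and_tolerance(X):
    """Return one (VIF, tolerance) pair per column of X."""
    result = []
    for j in range(len(X[0])):
        target = [row[j] for row in X]
        others = [[v for i, v in enumerate(row) if i != j] for row in X]
        tol = 1.0 - r_squared(others, target)
        vif = 1.0 / tol if tol > 0 else float("inf")
        result.append((vif, tol))
    return result


# Made-up data: the third column is almost the sum of the first two.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, -1, 1, -1, 1, -1, 1, -1]
noise = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0, 0.1, -0.1]
independent = [[x, y] for x, y in zip(a, b)]
collinear = [[x, y, x + y + e] for x, y, e in zip(a, b, noise)]

for vif, tol in vif_and_tolerance(collinear):
    print(f"VIF = {vif:.1f}, tolerance = {tol:.4f}")
```

With the two weakly related columns alone, the VIF values stay near one; adding the nearly redundant third column pushes every VIF far above 10, which is the situation the screening step is meant to detect.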
The FS method ranks the independent variables by their degree of dependence with the dependent variable. The variable with the strongest dependence is taken as the first input, and the variables with weaker dependence form the subsequent inputs. This step is repeated n-1 times to evaluate the effect of each variable on the model output, yielding subsets of the input variables for predicting the outputs. Based on the linear relationships between the variables, several models were created for each pollutant. A Multi-Layer Perceptron (MLP) network was then used to predict the pollutants in MATLAB. To reduce errors and increase forecasting accuracy, both the independent and the dependent data were normalized between zero and one. Neural networks are nonlinear models that are widely used for system identification, time-series prediction, and pattern recognition. They can serve as flexible tools for nonlinear regression and are generally composed of one or more layers with varying numbers of neurons. A neural network typically consists of three layers: an input layer that distributes the data through the network, a hidden layer that processes the data, and an output layer that produces the results for the given inputs. Seven models were evaluated for each pollutant, except for O3, for which eight models were evaluated because of the effect of NO2. The MLP results show that MODEL 1, with FB = 0.0170, IOA = 0.967, NMSE = 0.100 and the highest R2 = 0.7341, was suitable for same-day prediction of CO. For one-day-ahead prediction of CO, MODEL 2 and MODEL 5 have the highest R2 values, but MODEL 2 has a higher IOA and lower FB and NMSE than MODEL 5, so MODEL 2 is more suitable for one-day-ahead prediction of CO.
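The variable ranking and the zero-to-one normalization described above can be sketched as follows. This is a simplified reading of the description, not the authors' exact procedure: it assumes that "dependence" is measured by the absolute Pearson correlation with the pollutant and that each candidate model adds the next-ranked variable to the previous one. The variable names and values are hypothetical.

```python
from math import sqrt


def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: no spread to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def nested_input_sets(predictors, target):
    """Rank predictors by |correlation| with the target and build nested subsets.

    The k-th subset holds the k most strongly related variables, so n
    predictors give n candidate input sets (MODEL 1 ... MODEL n).
    """
    ranked = sorted(
        predictors,
        key=lambda name: abs(pearson(predictors[name], target)),
        reverse=True,
    )
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


# Hypothetical daily values, for illustration only.
target = [10, 8, 6, 5, 3, 1]  # pollutant concentration
predictors = {
    "humidity": [5, 1, 4, 2, 6, 3],
    "temperature": [3, 5, 4, 6, 5, 7],
    "wind_speed": [1, 2, 3, 4, 5, 6],
}

# Normalize inputs and output to [0, 1] before they are fed to the network.
scaled_target = min_max(target)
scaled_predictors = {name: min_max(vals) for name, vals in predictors.items()}

for k, subset in enumerate(nested_input_sets(scaled_predictors, scaled_target), 1):
    print(f"MODEL {k}: {subset}")
```

Each printed subset would then be used as the input layer of a separate network, and the resulting models compared on their error statistics.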
For PM10, MODEL 4 has the highest IOA (0.960) and one of the lowest FB values (0.00151) among the models for same-day prediction, and the lowest NMSE (0.487) and RMSE (0.0718) for one-day-ahead prediction, so MODEL 4 is selected as the optimal model for predicting PM10. For SO2, MODEL 3 has the lowest FB = -0.00302 and NMSE = 0.135 and the highest IOA = 0.943 and R2 = 0.6118, so this model is well suited to same-day prediction of SO2 concentration. MODEL 6, with the lowest NMSE = 0.105 and FB = -0.0048 and the highest IOA = 0.972, is the suitable model for one-day-ahead prediction of SO2. For NO2, MODEL 2 and MODEL 3 perform best among the models in both same-day and one-day-ahead prediction. A comparison of the two models shows that they are equivalent in terms of the minimum and maximum values of the statistics; the test-phase RMSE, which is lower in MODEL 2 than in MODEL 6, indicates that MODEL 2 is the more appropriate model for predicting NO2. For O3, MODEL 7 in same-day forecasting and MODEL 5 in one-day-ahead forecasting have the same IOA value. MODEL 5 has the minimum NMSE = 0.00120 and FB = 0.00137, while MODEL 7 has the highest R2 = 0.711, so the input combination of MODEL 5 is considered optimal. The results of this study show that predicting any of the pollutants does not require all seven variables retained by the multi-collinearity test. The optimal number of independent variables differs from one pollutant to another.
Therefore, we can conclude that selecting the effective independent variables with the FS method reduces the cost and time of analysis and increases the accuracy of the pollutant predictions.
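The statistics used above to compare the models (FB, IOA, NMSE, RMSE and R2) are not defined in the abstract. The sketch below assumes the definitions commonly used in air quality model evaluation: fractional bias taken as observed minus predicted, Willmott's index of agreement, and R2 as the squared Pearson correlation. The sign convention for FB and the exact R2 definition used in the study may differ.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def evaluate(observed, predicted):
    """Return FB, NMSE, IOA, RMSE and R2 for paired observed/predicted values."""
    o_bar, p_bar = mean(observed), mean(predicted)
    pairs = list(zip(observed, predicted))
    sq_err = [(o - p) ** 2 for o, p in pairs]
    mse = mean(sq_err)

    # Fractional bias (assumed convention: observed minus predicted).
    fb = 2.0 * (o_bar - p_bar) / (o_bar + p_bar)
    # Normalized mean square error.
    nmse = mse / (o_bar * p_bar)
    # Index of agreement (Willmott): 1 means perfect agreement.
    ioa = 1.0 - sum(sq_err) / sum(
        (abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in pairs
    )
    # Coefficient of determination as squared Pearson correlation (assumption).
    cov = sum((o - o_bar) * (p - p_bar) for o, p in pairs)
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)

    return {"FB": fb, "NMSE": nmse, "IOA": ioa, "RMSE": sqrt(mse), "R2": r2}


# Made-up values: every prediction is one unit too high.
scores = evaluate([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
for name, value in scores.items():
    print(f"{name} = {value:.4f}")
```

Under these definitions a good model has FB and NMSE close to zero, a low RMSE, and IOA and R2 close to one, which is the pattern used in the abstract to pick the optimal model for each pollutant.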
Keywords: Multi-collinearity, Air pollution, FS technique, Multi-Layer Perceptron Network, Kermanshah

