Publisher: University of Tehran
Journal: Journal of Environmental Studies (محیط شناسی)
ISSN: 1025-8620
Volume 46, Issue 1, pages 195-216
Date: 2022-07-15
Title: Optimization of Meteorological Variables to Predict Air Pollutant Concentrations for Use in Artificial Neural Network Model to Reduce the Cost and Time of Analysis
Persian title (translated): Optimization of meteorological variables for predicting air pollutant concentrations, to reduce the cost and time of computation in the artificial neural network model
Article ID: 79559
DOI: 10.22059/jes.2021.300440.1007998
Language: Persian (FA)
Authors:
Afsaneh Ghasemi, Department of Environment, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran
Jamil Amanollahi, Department of Environment, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran
Mohammad Darand, Department of Climatology, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran
Type: Journal Article

Air pollution is one of the most serious and harmful problems facing human societies and has caused many environmental problems. Air quality changes daily even when the amount of pollutants entering the air is constant, because weather conditions such as wind speed, wind direction, the thermal profile of the air mass, the amount of solar energy available for photochemical reactions, and the duration of wind or rainfall alter it. The atmosphere has a limited capacity and cannot absorb the range of wastes and toxins that humans now discharge into it. A multi-collinearity test was performed in SPSS to remove redundant input variables. Collinearity among the independent variables was measured with two indicators, the Variance Inflation Factor (VIF) and tolerance. The VIF indicates how much the variance of an estimated regression coefficient is inflated compared with a model in which the independent variables are uncorrelated. If its value is close to one, there is no collinearity. Tolerance is the share of a variable's scatter that is not explained by the other independent variables.
If its value is close to one, only a small part of the variable's scatter is explained by the other independent variables; if it is close to zero, the variable is almost a linear combination of the other independent variables. If the VIF of an independent variable is greater than 10 and its tolerance is less than 0.2, the model can be said to suffer from multi-collinearity. Tolerance and VIF results were used to quantify the severity of the multi-collinearity. The Forward Selection (FS) method, which is based on regression, was used to select the best subsets of input variables. Many researchers have used the FS technique to build powerful predictive models. The method ranks the independent variables by their degree of dependence with the dependent variable. The variable with the strongest dependence is taken as the first input, and the variables with weaker dependence form the subsequent inputs. This step is repeated n-1 times to evaluate the effect of each variable on the model output, yielding subsets of the input variables for predicting the outputs. The linear relations between the variables produced several models for each pollutant. A Multi-Layer Perceptron (MLP) network was then used in Matlab to predict the pollutants. To reduce errors and increase forecasting accuracy, both the independent and dependent data were normalized between zero and one. Neural networks are nonlinear models that are widely used to identify systems, forecast time series, and recognize patterns. They serve as flexible tools for nonlinear regression and are generally composed of one or more layers with different numbers of neurons.
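The collinearity screening and forward selection steps described above can be illustrated with a minimal sketch. It is not the authors' SPSS or Matlab workflow: the data are synthetic, the function names (`r_squared`, `tolerance_and_vif`, `forward_selection_order`) are hypothetical, and the selection rule shown (greedily adding the variable that raises R2 the most) is one common reading of regression-based forward selection. Tolerance is computed as 1 - R2 of each variable regressed on the others, and VIF as its reciprocal.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / ss_tot


def tolerance_and_vif(X):
    """Tolerance (1 - R_j^2) and VIF (1 / tolerance) for each column of X."""
    n_vars = X.shape[1]
    tol = np.empty(n_vars)
    for j in range(n_vars):
        others = np.delete(X, j, axis=1)  # all columns except column j
        tol[j] = 1.0 - r_squared(others, X[:, j])
    return tol, 1.0 / tol


def forward_selection_order(X, y):
    """Greedy ordering: at each step add the variable that raises R^2 most."""
    remaining = list(range(X.shape[1]))
    chosen = []
    while remaining:
        best = max(remaining, key=lambda j: r_squared(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Synthetic stand-ins for meteorological inputs (assumed, not the study data).
rng = np.random.default_rng(0)
n = 200
temp = rng.normal(size=n)
humidity = rng.normal(size=n)
dew_point = 0.9 * temp + 0.1 * rng.normal(size=n)  # nearly collinear with temp
wind = rng.normal(size=n)
X = np.column_stack([temp, humidity, dew_point, wind])

# Synthetic "pollutant" driven mostly by wind, then humidity.
y = 2.0 * wind - 1.0 * humidity + 0.1 * rng.normal(size=n)

tol, vif = tolerance_and_vif(X)
order = forward_selection_order(X, y)

print("tolerance:", np.round(tol, 3))
print("VIF:", np.round(vif, 2))
print("forward selection order (column indices):", order)
```

In this toy setup, the near-duplicate pair (temperature and dew point) gets a high VIF and a low tolerance, while the nested subsets `order[:1]`, `order[:2]`, and so on play the role of the candidate input sets that would then be passed to the neural network.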
The structure of a neural network typically consists of three layers: the input layer, which distributes the data in the network; the hidden layer, which processes the data; and the output layer, which extracts the results for specific inputs. Seven models were evaluated for each pollutant, except O3, for which eight models were calculated because of the effect of NO2. The results of the MLP analysis show that MODEL 1, with FB = 0.0170, IOA = 0.967, NMSE = 0.100, and the highest R2 = 0.7341, was suitable for same-day prediction of CO. For one-day-ahead prediction of CO, MODEL 2 and MODEL 5 have the highest R2 values, but MODEL 2 has a higher IOA and lower FB and NMSE than MODEL 5, so MODEL 2 is more suitable. For PM10, MODEL 4 has the maximum IOA = 0.960 and FB = 0.00151, one of the lowest values among the models, in same-day prediction, and the lowest NMSE = 0.487 and RMSE = 0.0718 in one-day-ahead prediction, so MODEL 4 is selected as the optimal model for PM10. For SO2, MODEL 3 has the lowest FB = -0.00302 and NMSE = 0.135 and the highest IOA = 0.943 and R2 = 0.6118, so it is well suited to same-day prediction of SO2 concentration. MODEL 6, with the lowest NMSE = 0.105 and FB = -0.0048 and the highest IOA = 0.972, is the suitable model for one-day-ahead prediction of SO2. For NO2, MODEL 2 and MODEL 3 show the highest performance among the models in both same-day and one-day-ahead prediction.
A comparison of the two models mentioned for NO2 shows that they perform equally on the minimum and maximum values of the statistics; the RMSE of the test phase, however, is lower in model 2 than in model 6, which indicates that model 2 is the more appropriate model for predicting NO2. For O3, MODEL 7 in same-day forecasting and MODEL 5 in one-day-ahead forecasting have the same IOA value; NMSE = 0.00120 and FB = 0.00137 reach their minimum in MODEL 5, while R2 = 0.711 is highest in MODEL 7, so the input combination of MODEL 5 is considered optimal. The results of this study show that predicting any of the pollutants does not require all seven variables retained by the multi-collinearity test. The optimal number of independent variables differed from pollutant to pollutant. We therefore conclude that selecting the effective independent variables with the FS method reduces the cost and time of analysis and increases the accuracy of the pollutant predictions.
Keywords: Multi-collinearity, Air pollution, FS technique, Multi-Layer Perceptron Network, Kermanshah

Kermanshah is one of Iran's polluted cities because of its industries, traffic, and dust storms. In this study, five pollutants (PM10, CO, O3, NO2, and SO2) were predicted with a multi-layer perceptron neural network for two time horizons, the same day and the next day. The independent data comprise seven meteorological quantities: temperature, relative humidity, visibility, wind speed, dew point, pressure, and precipitation. The collinearity test and the forward selection technique were used to remove redundant input variables and to build a subset of the variables that are influential in prediction. The optimal model for each pollutant was selected using the RMSE, NMSE, IOA, R2, and FB indices.
The results show that model 2, with 6 independent quantities, is the optimal model for predicting the concentrations of carbon monoxide and nitrogen dioxide, and that model 5, with 3 input quantities, is a suitable model for predicting ozone. Model 6, with two input variables, was the most suitable model for predicting sulfur dioxide, and model 4, with 4 input variables, was the most suitable for predicting particulate matter (PM10). The results of this study show that using the forward selection technique to optimize the number of variables increases accuracy and reduces the cost of prediction.
https://jes.ut.ac.ir/article_79559_e1f7554c6f8801293559fdd0df0f2b8c.pdf
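The abstract ranks the models by RMSE, NMSE, IOA, R2, and FB without defining these indices. The sketch below uses definitions that are common in air quality model evaluation and are assumed here, not taken from the article: root mean square error, normalized mean square error, index of agreement, squared Pearson correlation, and fractional bias. The sign convention for FB (observed minus predicted) and the function name `evaluation_metrics` are also assumptions.

```python
import numpy as np


def evaluation_metrics(obs, pred):
    """Common model-evaluation indices for observed vs. predicted values.

    The formulas are widely used conventions, assumed here for illustration.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    obs_mean, pred_mean = obs.mean(), pred.mean()

    rmse = np.sqrt(np.mean(err ** 2))
    # NMSE: mean squared error scaled by the product of the two means.
    nmse = np.mean(err ** 2) / (obs_mean * pred_mean)
    # FB: fractional bias, zero when the means match (sign convention assumed).
    fb = 2.0 * (obs_mean - pred_mean) / (obs_mean + pred_mean)
    # IOA: index of agreement, one for a perfect match.
    denom = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    ioa = 1.0 - np.sum(err ** 2) / denom
    # R2: squared Pearson correlation between observations and predictions.
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2

    return {"RMSE": rmse, "NMSE": nmse, "FB": fb, "IOA": ioa, "R2": r2}


# Toy example: predictions that are uniformly one unit too high.
metrics = evaluation_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

Under these definitions, a good model has RMSE, NMSE, and FB near zero and IOA and R2 near one, which matches how the abstract compares the candidate models. The toy example also shows why several indices are used together: a constant offset leaves R2 at one while FB and IOA reveal the bias.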