Assessing the accuracy of MLR, PCR, ARIMA, and MLP in predicting the aerosols optical depth

Document Type : Research Paper

Authors

1 Department of Environment Science, Faculty of Natural Resources, University of Kurdistan, Iran

2 Department of Environmental Sciences, Faculty of Natural Resources, University of Kurdistan, Iran

3 Department of Climatology, Faculty of Natural Resources, University of Kurdistan, Iran

Abstract

Introduction:
Atmospheric aerosols originate from natural sources, such as volcanic activity, dust, and salt particles from seas and oceans, and from human activities such as industry, transportation, and fuel consumption. Aerosols play a very important role in radiative transfer and in the chemical processes that control the Earth's climate. Among international studies in this area, Olcese et al. (2015) used an artificial neural network model (MLP). They used previous values of aerosol optical depth (AOD) at two stations as inputs to the network in order to estimate AOD under cloudy conditions and in situations where little data is available. The method was used to predict AOD at a wavelength of 440 nm at nine stations on the east coast of the United States for the period 1999 to 2012. The R² of 0.85 between observed and predicted AOD indicates good performance of this model. To date, no study in Iran has estimated AOD using different models such as multiple linear regression (MLR), principal component regression (PCR), artificial neural networks, and the autoregressive integrated moving average (ARIMA) model. Therefore, this research examines AOD estimation in two cases: estimation for areas with no pyranometer station, and long-term estimation of future values at stations equipped with a solar radiation detector.
Materials and methods:
In this study, pyranometer data for the study area were obtained from the meteorological office in the center of Kurdistan province for the period 2005/01/01 to 2016/12/31. The total number of available records for this period was 4382. Since no solar radiation measurement was available for some days of the year, the number of records used for Sanandaj city was reduced to 3956.
Study area:
Sanandaj is the capital of Kurdistan province. The city is located at 35 degrees 20 minutes north latitude and 47 degrees east longitude (relative to the Greenwich meridian), at an elevation of 1373.4 meters above sea level.
Multiple Linear Regression Model:
Multiple linear regression examines the relationship between a dependent variable and several independent variables; here the relationship was derived in the SPSS software. AOD served as the dependent variable, and meteorological quantities such as temperature, relative humidity, wind speed, and atmospheric boundary layer height were taken as independent variables. The general formula for the MLR model is as follows:
$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$$
Here $Y$ is the dependent variable, $x_1, \dots, x_n$ denote the independent variables, $\beta_0, \dots, \beta_n$ are the constant coefficients, and $\varepsilon$ denotes the residual.
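The MLR formula above can be illustrated with a minimal least-squares sketch. The study itself fitted the model in SPSS; the NumPy code, the function names `fit_mlr` and `predict_mlr`, the synthetic data, and the coefficient values below are all assumptions made purely for illustration and are not the study's data or results.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bn*xn + eps."""
    # Prepend a column of ones so the first coefficient is the intercept b0.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta  # [b0, b1, ..., bn]

def predict_mlr(beta, X):
    """Evaluate the fitted linear model on a predictor matrix X."""
    return beta[0] + X @ beta[1:]

# Synthetic stand-in data (NOT the study's measurements): four predictors
# playing the role of temperature, humidity, wind speed and layer height.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_beta = np.array([0.5, 0.04, -0.13, 0.02, -0.06])  # hypothetical values
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.01, size=200)

beta_hat = fit_mlr(X, y)
residuals = y - predict_mlr(beta_hat, X)  # the epsilon term of the formula
print(beta_hat)
```

With an intercept in the model, the residuals of a least-squares fit average to zero, which is a quick sanity check on any fitted MLR equation.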
Principal Component Regression Model:
Principal component regression combines principal component analysis (PCA) and multiple linear regression (MLR). The model is:
$$y = \varphi \beta_{PCR} + e$$
where $\varphi$ is the $n \times K$ matrix of principal component scores, $\beta_{PCR}$ is the vector of coefficients of the first $K$ components, and $e$ is the $n \times 1$ vector of random errors. The coefficients of the components are estimated by ordinary least squares (OLS) as follows:
$$\beta_{PCR} = (\varphi' \varphi)^{-1} \varphi' y = (L^2)^{-1} \varphi' y$$
Here $L^2$ is the diagonal matrix whose $k$-th diagonal element is the eigenvalue $\lambda_k$ associated with the $k$-th component. Expressed in terms of the original predictors, this leads to the following equation:

$$\beta_{PCR} = \sum_{k=1}^{K} \frac{v_k u_k'}{d_k}\, y, \qquad K < \min(n, p)$$

where $u_k$ and $v_k$ are the $k$-th left and right singular vectors of the $n \times p$ predictor matrix and $d_k$ is the corresponding singular value, with $d_k^2 = \lambda_k$. In this model the original variables are transformed into new components that are independent of each other, so that any two components have zero correlation; these components are then used as the predictor variables.
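The PCR sum over components can be sketched with a singular value decomposition. This is a minimal illustration under stated assumptions: the predictors are mean-centred, the function name `pcr_fit` and the synthetic data are hypothetical, and nothing here reproduces the study's actual computation.

```python
import numpy as np

def pcr_fit(X, y, K):
    """PCR coefficients on the centred original predictors, using K components."""
    Xc = X - X.mean(axis=0)          # centre predictors (assumed preprocessing)
    yc = y - y.mean()
    # Thin SVD: Xc = U @ diag(d) @ Vt
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :K] * d[:K]        # component scores, equal to Xc @ V[:, :K]
    # beta = sum over k of v_k * (u_k' y) / d_k
    beta = Vt[:K].T @ ((U[:, :K].T @ yc) / d[:K])
    return beta, scores

# Synthetic stand-in data (NOT the study's measurements).
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))
y = X @ np.array([0.04, -0.13, 0.02, -0.06]) + rng.normal(scale=0.05, size=150)

beta_k2, scores_k2 = pcr_fit(X, y, K=2)   # truncated model, K < p
beta_full, scores = pcr_fit(X, y, K=4)    # all components, for comparison only

# Reference: plain OLS on the same centred data.
beta_ols, *_ = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)
gram = scores.T @ scores                  # diagonal if components are uncorrelated
```

Keeping every component reproduces the ordinary least-squares coefficients, so the benefit of PCR comes only from dropping the trailing components; the off-diagonal entries of `gram` being zero reflects the zero correlation between components.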
Autoregressive Integrated Moving Average Model:
The autoregressive integrated moving average (ARIMA) model is one of the most important methods for time series forecasting and was introduced by Box and Jenkins in 1970. ARIMA is a data-driven model, meaning that it relies on the structure of the data themselves, and it faces limitations if the data contain significant nonlinear relationships. Within these limits, ARIMA can produce forecasts for a time series. It is a forecasting method grounded in statistical theory and, owing to advantages such as high precision and strong adaptability, it is applicable in many fields.
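As a rough illustration of the autoregressive part only, the sketch below fits a pure AR(p) model by least squares and produces a one-step forecast. It deliberately omits the differencing and moving-average parts of a full ARIMA model; the function names `fit_ar` and `forecast_next`, the simulated series, and its coefficients are hypothetical and unrelated to the study's data.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p}."""
    y = np.asarray(series, dtype=float)
    # Column j holds lag j+1, i.e. y_{t-(j+1)} for t = p, ..., n-1.
    lags = np.column_stack([y[p - j - 1 : len(y) - j - 1] for j in range(p)])
    A = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(A, y[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]

def forecast_next(series, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    recent = np.asarray(series[-p:], dtype=float)[::-1]  # y_{t-1}, ..., y_{t-p}
    return coef[0] + recent @ coef[1:]

# Simulate a stationary AR(3) series with hypothetical coefficients.
rng = np.random.default_rng(2)
true_coef = np.array([0.05, 0.5, 0.2, 0.1])
y = [0.25, 0.25, 0.25]
for _ in range(3000):
    nxt = (true_coef[0] + true_coef[1] * y[-1] + true_coef[2] * y[-2]
           + true_coef[3] * y[-3] + rng.normal(scale=0.02))
    y.append(nxt)

coef_hat = fit_ar(y, p=3)
next_value = forecast_next(y, coef_hat)
```

The only input is the past record of the series itself, which is why an autoregressive model needs no meteorological predictors.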
Artificial Neural Networks Model:
The multilayer perceptron (MLP) is the best known and most widely used type of neural network; in most cases it transfers input signals through the network to the output. In this kind of multilayer network the layers are connected so that the outputs of the first layer act as the inputs of the second layer, the outputs of the second layer are the inputs of the third layer, and so on until the last layer, whose outputs are the final outputs of the network.
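The layer-to-layer flow described above can be sketched as a forward pass. The two hidden layers of 24 and 33 neurons follow the configuration reported in the results for Sanandaj; the four inputs, the tanh activation, the random untrained weights, and the name `mlp_forward` are assumptions for illustration, since the abstract does not specify them.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Feed x through the network: each layer's output is the next layer's input."""
    a = x
    # Hidden layers: affine transform followed by a nonlinearity (tanh assumed).
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)
    # Output layer: linear, giving one real-valued prediction per sample.
    return a @ weights[-1] + biases[-1]

# Layer sizes: 4 inputs (assumed), hidden layers of 24 and 33 neurons, 1 output.
layer_sizes = [4, 24, 33, 1]
rng = np.random.default_rng(3)
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

x = rng.normal(size=(5, 4))   # five synthetic input samples
out = mlp_forward(x, weights, biases)
print(out.shape)
```

Training, which adjusts the weights and biases to minimize the prediction error, is not shown here.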
Discussion of Results & Conclusions
The first model was multiple linear regression. According to the results of this model, AOD in the study city has a direct relationship with temperature and wind speed at the 850 hPa level and an inverse relationship with relative humidity and boundary layer height (BLH). The coefficient of determination obtained by this model was low, which is attributed to the linear structure assumed for the data. The fitted equation is as follows:
$$\mathrm{AOD} = 0.458 + 0.039\,T_{850} - 0.127\,RH_{850} + 0.021\,Speed_{850} - 0.064\,BLH$$
For this model, R² = 0.071, RMSE = 0.1698 and MAE = 0.1498 were obtained in the training phase, and R² = 0.096, RMSE = 0.1703 and MAE = 0.1494 in the testing phase. The results of both phases indicate the low accuracy of the MLR model in predicting AOD in Sanandaj city.
The second model used in this research was principal component regression. In this model AOD has a direct relationship with temperature and wind speed and an inverse relationship with the other parameters, relative humidity and boundary layer height. The equation extracted for the PCR model is as follows:
$$\mathrm{AOD} = 0.457 + 0.041\,T_{850} - 0.126\,RH_{850} + 0.021\,Speed_{850} - 0.065\,BLH$$
For this model, R² = 0.071, RMSE = 0.1699 and MAE = 0.15 were obtained in the training phase, and R² = 0.069, RMSE = 0.1694 and MAE = 0.1484 in the testing phase. According to these results, the MLR and PCR models give very similar estimates of AOD for stations with no pyranometer.
The third model was the autoregressive integrated moving average model. This model performed best in estimating AOD at the station with no pyranometer. The equation obtained for the ARIMA model is as follows:
$$\mathrm{AOD} = 0.0061 + 0.7084\,y_{t-1} + 0.0572\,y_{t-2} + 0.2189\,y_{t-3}$$
For this model, R² = 0.91, RMSE = 0.0501 and MAE = 0.033 were obtained in the training phase, and R² = 0.89, RMSE = 0.086 and MAE = 0.0374 in the testing phase.
The fourth model was the artificial neural network. Two hidden layers were used in this model. The optimal number of neurons for the study area varied with the available data; for Sanandaj city, 24 and 33 neurons were determined as optimal for long-term (one-year) estimation of AOD at the station with no pyranometer. For this model, R² = 0.75, RMSE = 0.1162 and MAE = 0.0921 were obtained in the training phase, and R² = 0.63, RMSE = 0.14 and MAE = 0.113 in the testing phase.
It can be concluded that, for estimating AOD in an area with a pyranometer, it is better to use the autoregressive approach than to go through the training and testing phases of the other models, because, as shown, the only data the autoregressive approach requires is the AOD record at the station itself. In general, the results of this research showed that using different, efficient models can be a suitable solution for estimating AOD both in regions with a pyranometer and in areas without one.
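The three scores reported for each model can be computed as in the sketch below. It uses the common definition R² = 1 − SS_res/SS_tot; the abstract does not say which definition of R² was used (the squared correlation coefficient is another frequent choice), so that is an assumption, and the sample values are made up for illustration.

```python
import numpy as np

def r2_score(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean square error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return np.sqrt(np.mean((obs - pred) ** 2))

def mae(obs, pred):
    """Mean absolute error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return np.mean(np.abs(obs - pred))

# Made-up observed and predicted AOD values, for illustration only.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
print(r2_score(observed, predicted), rmse(observed, predicted), mae(observed, predicted))
```

RMSE and MAE are in the same units as AOD, so lower values mean closer predictions, while R² measures the share of the variance in the observations that the model explains.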
