Electricity Consumption Using Time Series
In this project we will predict the electricity consumption of Turkey. The dataset provides hourly data from December 31, 2015 until September 2020. Methods such as Holt-Winters, ARIMA, and neural networks will be used to forecast electricity consumption. After performing different tests, we found that the ARIMA method performs best, so we base the predictions on it. The dataset was obtained from: https://www.kaggle.com/
The Data
First we load the data. Initially the date and time are in separate columns, so we combine them into a single column.
## 'data.frame': 41232 obs. of 2 variables:
## $ Date : POSIXct, format: "2015-12-31 00:00:00" "2015-12-31 01:00:00" ...
## $ Consumption..MWh.: num 29591 27785 26517 26092 25872 ...
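A minimal sketch of this loading step; the file name, the original column names, and the date format are assumptions, since the report does not show them:

datos <- read.csv("consumption.csv")                     # hypothetical file name
datos$Date <- as.POSIXct(paste(datos$Date, datos$Time),  # merge the separate date and time columns
                         format = "%d.%m.%Y %H:%M")      # assumed input format
datos <- datos[, c("Date", "Consumption..MWh.")]         # keep a single date column
str(datos)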
A fundamental step with time series is to check for empty or null observations, so we verify that every date has a value.
## [1] "2016-03-27 03:00:00 CST"
We note that one timestamp is missing, so we fill it with the average of the previous 48 hours; if we used the overall average, it could be an unrepresentative value for the analysis. Additionally, the series contains several zero values, which may be due to general electrical failures or some recording error, and we apply the same smoothing to them.
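A sketch of this 48-hour smoothing; the helper name is ours, and we assume the missing timestamp has already been inserted as an NA row:

suavizar <- function(x, idx, ventana = 48) {
  # replace the value at position idx with the mean of the previous `ventana` hours
  x[idx] <- mean(x[(idx - ventana):(idx - 1)], na.rm = TRUE)
  x
}
malos <- which(is.na(datos$Consumption..MWh.) | datos$Consumption..MWh. == 0)
for (i in malos) datos$Consumption..MWh. <- suavizar(datos$Consumption..MWh., i)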
Working hour by hour is computationally expensive, since we have more than 40,000 values, so for convenience we aggregate the analysis to a daily format. This is only for computational reasons; if more precision at the hourly level were eventually required, the same analysis could be done per hour.
Now let’s see graphically how the series behaves.
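A sketch of the daily aggregation and plot; whether the author summed or averaged the hourly values is not stated (here we sum), and frequency = 7 anticipates the periodicity analysis below:

datos$Dia <- as.Date(datos$Date)
diario <- aggregate(Consumption..MWh. ~ Dia, data = datos, FUN = sum)  # total MWh per day
serie <- ts(diario$Consumption..MWh., frequency = 7)                   # weekly seasonality
plot(serie, main = "Daily electricity consumption", ylab = "MWh")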
Before actually building the models, let's check the assumptions. This mostly concerns methods like ARIMA, whose assumptions require normality of the differenced series.
library(nortest)                   # provides pearson.test and lillie.test
pearson.test(diff(serie))$p.value  # Pearson chi-square test of normality
## [1] 5.219748e-171
lillie.test(diff(serie))$p.value   # Lilliefors (Kolmogorov-Smirnov) test of normality
## [1] 6.392955e-75
Both the visual inspection and the statistical tests reject the hypothesis of normality. This could eventually affect the results, but we continue building the models. Now we decompose the time series into its usual components: trend, seasonality, and random movement. In the following graph, if our data and the random component showed the same behavior, that would be an indication that the predictions will not be very precise.
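The decomposition can be obtained with R's decompose function; a sketch:

componentes <- decompose(serie)  # splits the series into trend, seasonal and random parts
plot(componentes)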
The seasonality is clear: consumption tends to rise for a certain time, then fall, then rise again (though not to the same level as before), and this cycle repeats.
Now let’s analyze the periodicity of our series.
## [1] 192.000000 6.995951 172.800000
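The report does not show how these periods were computed; one possibility is to rank the frequencies of the periodogram by spectral power, a sketch:

esp <- spec.pgram(as.numeric(serie), plot = FALSE)  # periodogram of the daily series
orden <- order(esp$spec, decreasing = TRUE)         # highest spectral power first
1 / esp$freq[orden[1:3]]                            # top three candidate periods, in days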
The best candidate periods are 192 days and roughly 7 days; we take the 7-day period, mainly for interpretability (a weekly consumption cycle).
ARIMA
#auto.arima(serie.train)
The results using the R function auto.arima:
#Series: serie.train
#ARIMA(5,1,2)
#Coefficients:
# ar1 ar2 ar3 ar4 ar5 ma1 ma2
# 0.0885 -0.6879 -0.2704 -0.2725 -0.5313 -0.3998 0.6290
#s.e. 0.0368 0.0220 0.0265 0.0208 0.0255 0.0474 0.0195
#sigma^2 = 1.741e+09: log likelihood = -20327.34
#AIC=40670.69 AICc=40670.77 BIC=40714.13
Now we apply a brute-force calibration to find a better ARIMA model, which gives the following result:
#calibrar.arima(serie.train, serie.test, periodo = 7)
#Call:
#arima(x = datos, order = c(3, 0, 0), seasonal = list(order = c(0, 1, 0), period = 7))
#Coefficients:
# ar1 ar2 ar3
# 1.0951 -0.2112 -0.1188
#s.e. 0.0243 0.0360 0.0245
#sigma^2 estimated as 997420726: log likelihood = -19789.96, aic = 39587.93
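calibrar.arima is a custom function not shown in the report; a minimal sketch of what such a brute-force search could look like (the parameter ranges and the RMSE criterion are assumptions):

calibrar.arima <- function(train, test, periodo = 7, max.orden = 3) {
  mejor <- NULL
  mejor.error <- Inf
  for (p in 0:max.orden) for (d in 0:1) for (q in 0:max.orden)
    for (P in 0:1) for (D in 0:1) for (Q in 0:1) {
      modelo <- tryCatch(
        arima(train, order = c(p, d, q),
              seasonal = list(order = c(P, D, Q), period = periodo)),
        error = function(e) NULL)
      if (is.null(modelo)) next
      pred <- predict(modelo, n.ahead = length(test))$pred
      error <- sqrt(mean((as.numeric(test) - as.numeric(pred))^2))  # RMSE on the test set
      if (error < mejor.error) { mejor <- modelo; mejor.error <- error }
    }
  mejor
}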
Holt-Winters
As with ARIMA, we apply the function provided by R and also use brute force to find the best Holt-Winters model.
## Holt-Winters exponential smoothing without trend and without seasonal component.
##
## Call:
## HoltWinters(x = datos, alpha = 0.3, beta = FALSE, gamma = FALSE)
##
## Smoothing parameters:
## alpha: 0.3
## beta : FALSE
## gamma: FALSE
##
## Coefficients:
## [,1]
## a 910442.6
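A sketch of the analogous brute-force search for Holt-Winters, stepping each smoothing parameter over a grid (the step size and the RMSE criterion are assumptions):

calibrar.HW <- function(train, test, paso = 0.1) {
  mejor <- NULL
  mejor.error <- Inf
  valores <- c(list(FALSE), as.list(seq(paso, 0.9, by = paso)))  # FALSE or 0.1 .. 0.9
  for (alpha in seq(paso, 0.9, by = paso)) for (beta in valores) for (gamma in valores) {
    modelo <- tryCatch(HoltWinters(train, alpha = alpha, beta = beta, gamma = gamma),
                       error = function(e) NULL)
    if (is.null(modelo)) next
    pred <- predict(modelo, n.ahead = length(test))
    error <- sqrt(mean((as.numeric(test) - as.numeric(pred))^2))  # RMSE on the test set
    if (error < mejor.error) { mejor <- modelo; mejor.error <- error }
  }
  mejor
}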
Neural Networks
We use R's forecast package, whose nnetar function fits a neural network autoregression; we set 5 hidden nodes. The number of nodes can be modified, but after testing several values we found 5 to work best.
library(forecast)                                       # provides nnetar
modelo.redes <- nnetar(serie.train, size = 5)           # NNAR model with 5 hidden nodes
pred.redes <- predict(modelo.redes, h = 30, PI = TRUE)  # 30-day forecast with prediction intervals
Relative Error table and graph
Let's compare all the methods used: we look for the lowest MSE, the lowest relative error, and the correlation closest to 1.
##                           MSE     RMSE         RE      CORR
## AUTO.ARIMA        1668421462 40846.32 0.03031724 0.7416341
## ARIMA BRUTE FORCE  149502696 12227.13 0.01019876 0.9710208
## HOLT-WINTERS      4042762184 63582.72 0.05749640 0.1730538
## HW BRUTE FORCE    2312567770 48089.16 0.03738585        NA
## NEURAL NETWORK    6142210386 78372.26 0.07811688 0.7320864
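A sketch of how these metrics could be computed for each model, assuming pred and real are aligned series of the same length:

errores <- function(pred, real) {
  pred <- as.numeric(pred); real <- as.numeric(real)
  data.frame(MSE  = mean((real - pred)^2),
             RMSE = sqrt(mean((real - pred)^2)),
             RE   = sum(abs(real - pred)) / sum(abs(real)),  # relative error
             CORR = cor(real, pred))
}
errores(pred.redes$mean, serie.test)  # e.g. for the neural network model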
There is no doubt that the brute-force ARIMA model is the one that performs best.
Prediction graph
There is an interesting behavior: only the ARIMA methods manage to capture the rising and falling pattern well. We can also see how the model found by auto.arima underestimates the drops in consumption.
Conclusions
In this project we found interesting results using time series models. The most important is that, by relying on computational power, we could improve on the results of R's default functions. Another important lesson is to always try every feasible model, because a priori we do not know which one is better. Theoretically we might expect one method to be superior, but in practice this is not always the case, which is why we should try everything possible and use computational power in our favor.