Autoregressive Models – Part I (Definitions 😞 and a Practical Example 😊!)
- dadoaentender
- Nov 5
- 4 min read

Definitions:
Autoregressive models use past values of a given variable to predict its future values. In other words, the "raw material" of an autoregressive model is a time series (a topic covered in the 1st edition of “Dado a Entender”). In this type of model, the variable of interest is explained by its own past values (hence the prefix “auto” in the term autoregressive). The goal is to estimate future values of this variable from the patterns of temporal dependence present in the data.
Before we continue, it's worth explaining the concept of regression (after all, we're talking about autoregressive models). Regression is a statistical method used to numerically model the relationship between a dependent variable (what we want to predict) and one or more independent variables (the factors we believe influence that variable). In autoregressive models, the difference is that previous values of the variable itself are used as the "explanatory factors." For example, when we try to predict gasoline or ethanol consumption from the consumption values of previous months, we are using an autoregressive model.
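To make the idea of "regression on the variable's own past" concrete, here is a minimal sketch in Python. It assumes the simplest possible case, where each value depends only on the previous one, and fits that relationship by ordinary least squares. The function names and the consumption numbers are hypothetical, invented for illustration (they are not ANP data).

```python
def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares."""
    x = series[:-1]  # previous values (the "explanatory factor")
    y = series[1:]   # current values (what we want to predict)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def predict_next(series, c, phi):
    """One-step-ahead prediction from the last observed value."""
    return c + phi * series[-1]


# Hypothetical monthly consumption values, for illustration only.
consumption = [100.0, 104.0, 103.0, 108.0, 110.0, 109.0, 114.0, 116.0]
c, phi = fit_ar1(consumption)
print("next month (predicted):", predict_next(consumption, c, phi))
```

Real autoregressive models usually look back more than one step, but the principle is the same: the past values play the role of the independent variables.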
Autoregressive models must evaluate the components of a time series. We discussed these components in the previous edition (have you subscribed to "Dado a Entender" yet? I think you should... 😉). One of the simplest models that meets this requirement is... ETS (Exponential Triple Smoothing), also known as the Holt-Winters model.
ETS is a statistical/mathematical model that combines three main components:
Error (E): Measures the discrepancy between actual and predicted values. It can be additive (simple) or multiplicative (proportional).
Trend (T): Captures the direction (growth or decline) of the data over time. It can also be additive or multiplicative.
Seasonality (S): Reflects recurring patterns in the data at regular intervals (e.g., monthly, quarterly). It can be additive, multiplicative, or nonexistent.
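For readers who like to see the gears turning, the sketch below shows how the components above can be combined in the additive case, using the classic Holt-Winters update equations. It is a simplified illustration, not the exact algorithm used by Excel or by any particular library: the initialization from the first two seasons is a deliberately crude assumption, the smoothing weights (alpha, beta, gamma) are fixed by hand instead of being optimized, and the quarterly data are hypothetical.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: return `horizon` forecasts past the end of `series`."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")

    # Crude initialisation (an assumption of this sketch) from the first two seasons.
    first, second = series[:m], series[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)
    seasonal = [value - level for value in first]

    for t in range(m, len(series)):
        observed = series[t]
        s_old = seasonal[t % m]  # last estimate for this position in the season
        previous_level = level
        # Level: deseasonalised observation blended with the previous projection.
        level = alpha * (observed - s_old) + (1 - alpha) * (level + trend)
        # Trend: latest change in level blended with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
        # Seasonality: latest deviation from the level blended with the old estimate.
        seasonal[t % m] = gamma * (observed - level) + (1 - gamma) * s_old

    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % m] for h in range(horizon)]


# Hypothetical quarterly data (three years), for illustration only.
sales = [30.0, 42.0, 55.0, 38.0, 33.0, 46.0, 60.0, 41.0, 37.0, 50.0, 64.0, 45.0]
print(holt_winters_additive(sales, season_length=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```

Each forecast is simply the last level, plus the trend projected forward, plus the seasonal adjustment for that position in the cycle.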
ETS adjusts these components to minimize the mean squared error (MSE) of the predictions, using optimization algorithms such as gradient descent. 😨?😨?😨?
Okay... Let's take it easy! In future editions we'll talk more about MSE and gradient descent. For now, accept that MSE is a metric for evaluating model performance and that gradient descent is a method for adjusting the model so that the MSE goes down. This optimization uses the largest contiguous part of the time series, starting from its chronological beginning (the training data). To assess whether the trained model handles unseen data well, it's important to reserve a portion at the end of the time series (not used in training) and check whether the model's predictions satisfactorily approximate the real data. This step is called "model testing".
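The train/test idea can be sketched in a few lines of Python. To keep things short, this example makes two loud simplifications: it uses simple exponential smoothing (a single level, no trend or seasonality) instead of the full ETS, and it picks the smoothing weight by trying a handful of candidate values (a grid search) instead of gradient descent. All names and numbers are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def one_step_forecasts(series, alpha):
    """Simple exponential smoothing: forecast each point from the points before it."""
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)  # forecast made before seeing `value`
        level = alpha * value + (1 - alpha) * level
    return forecasts, level


def fit_alpha(train, candidates):
    """Pick the candidate alpha with the lowest one-step-ahead MSE on the training data."""
    best_alpha, best_error = None, float("inf")
    for alpha in candidates:
        forecasts, _ = one_step_forecasts(train, alpha)
        error = mse(train[1:], forecasts)
        if error < best_error:
            best_alpha, best_error = alpha, error
    return best_alpha, best_error


# Hypothetical series, for illustration only.
series = [50.0, 52.0, 51.0, 53.0, 55.0, 54.0, 56.0, 57.0, 56.0, 58.0, 59.0, 60.0]

# Chronological split: the beginning trains the model, the end tests it.
split = len(series) - 4
train, test = series[:split], series[split:]

alpha, train_mse = fit_alpha(train, [i / 10 for i in range(1, 10)])
_, final_level = one_step_forecasts(train, alpha)
test_forecast = [final_level] * len(test)  # this simple model forecasts a flat line
print("alpha:", alpha, "train MSE:", train_mse, "test MSE:", mse(test, test_forecast))
```

Note that the split is never shuffled: in time series, the test data must come after the training data, otherwise the model would be "predicting" the past with knowledge of the future.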
ETS is available for use in programming languages such as Python and R. In addition, ETS is also available in Excel, making it widely accessible, even for those without programming experience.
Practical Example:
I chose to present an example based on Excel's ETS to appeal to a wider audience. However, I strongly recommend the programming languages mentioned earlier if you are interested in modeling less "well-behaved" time series with algorithms more sophisticated than ETS.
For the following example, I used the gasoline sales time series provided by ANP (National Agency of Petroleum, Natural Gas and Biofuels). This time series starts in January 2012 and, when I published this article, ran through October 2024. It shows the monthly volume (in m³) of gasoline sold in each Brazilian state. In the example, I only used gasoline sales data from the state of São Paulo.
Figure 1 illustrates the use of ETS in Excel. Columns “A” and “B” of the spreadsheet contain, respectively, the month/year of each data point and the volume of gasoline sold in the corresponding period (data extracted from the ANP time series). To proceed with the parameterization, simply click on any non-empty cell in columns “A” or “B” and then choose Data -> Forecast Sheet. Excel attempts to identify seasonality automatically, but it's also possible to define it manually.
In the example, I defined the start of the forecast as November 2022 and the end as November 2024. Thus, the model was trained with data from January 2012 to October 2022 and tested with data from November 2022 to October 2024. During the parameterization stage, Excel displays a graph with a preview of the prediction (bold orange line with the prediction and simple orange lines indicating the confidence interval).

After clicking the "Create" button, Excel creates a data table with the time series data and the predictions made, including the confidence intervals (which become wider, indicating the possibility of larger errors for predictions further into the future). Additionally, a graph is generated with this information, shown in Figure 2.

Figure 3 presents the analysis of the model's performance when subjected to new data (test data). The table shows:
Gasoline sales (m³): actual values from the test section;
Forecast (Gasoline Sales (m³)): values predicted by the model;
Absolute Percentage Error: the percentage difference between the actual and predicted values.
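The calculation behind the last column, and the average reported from it, can be sketched as follows. The function names are my own and the numbers are hypothetical, chosen only to show the arithmetic (they are not the values from Figure 3).

```python
def absolute_percentage_errors(actual, predicted):
    """Absolute percentage error for each actual/predicted pair."""
    return [100 * abs(a - p) / abs(a) for a, p in zip(actual, predicted)]


def mean_absolute_percentage_error(actual, predicted):
    """Average of the absolute percentage errors."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)


# Hypothetical test-period values in m³, for illustration only.
actual_sales = [800000.0, 760000.0, 820000.0]
forecast_sales = [780000.0, 790000.0, 815000.0]

for month_error in absolute_percentage_errors(actual_sales, forecast_sales):
    print(f"{month_error:.2f}%")
print(f"average: {mean_absolute_percentage_error(actual_sales, forecast_sales):.2f}%")
```

Because each error is divided by the actual value, this metric is undefined for months where the actual value is zero, which is not a concern for a series like monthly gasoline sales.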

The mean absolute percentage error (MAPE) was 6.98%, indicating a fairly reasonable performance for a simple model like ETS. Despite this, some observations are important:
The error was greater in certain months, such as May 2023 (12.35%) and April 2024 (11.98%). This may reflect real fluctuations or limitations of the model in capturing more complex patterns.
The well-behaved nature of the series (without major swings or disruptions) favored the performance of the ETS.
Conclusion:
In this article, I discussed the concepts and use of autoregressive models in time series forecasting. I also presented an example of applying the ETS autoregressive model to forecasts in a real time series.
Although ETS proved adequate in this example, it's important to remember that more sophisticated models may be needed for more complex or unstable time series. But that's a topic for another post 😉!
References:
Dritsaki, C., Niklis, D., and Stamatiou, P. Oil consumption forecasting using ARIMA models: an empirical study for Greece. International Journal of Energy Economics and Policy, 11(4):214–224, 2021.
Hyndman, R. J. and Athanasopoulos, G. Forecasting: Principles and Practice. OTexts, 2018.
Barros, A. C., Ferreira, P. G. C., Mattos, D. M. d., Oliveira, I. C. d., and Duca, V. E. L. d. A. Time Series Analysis in R: An Introductory Course. FGV IBRE, 2018.
ANP. Sales of petroleum derivatives and biofuels. Technical report, https://www.gov.br/anp/pt-br/centrais-de-conteudo/dados-abertos/vendas-de-derivados-de-petroleo-e-biocombustiveis, 2024.


