Time Series and Forecasting Intro#

OBJECTIVES

  • Know how to approach a time series forecasting problem.

  • Be able to make several baseline time series models.

  • Make exponential smoothing models including Holt-Winters.

import pandas_datareader as pdr
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# if the import fails, install with: !pip install pandas-datareader

Forest and Trees#

The data below contains information on individuals and whether they have heart disease (AHD). Your objective is to set up a pipeline that one hot encodes the categorical features and leaves the numeric features alone. Fit a decision tree and a random forest on the data with all default settings, and explore the confusion matrices on the test data. What do you think the best evaluation metric for this setting is, and which estimator did the better job?

heart = pd.read_csv('https://raw.githubusercontent.com/jfkoehler/nyu_bootcamp_fa24/refs/heads/main/data/Heart.csv', index_col = 0)
heart.head()
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.pipeline import Pipeline
from sklearn import set_config
set_config(transform_output="pandas")
X = heart.drop('AHD', axis = 1)
y = heart['AHD']
cat_cols = X.select_dtypes(include = 'object').columns.tolist()
transformer = make_column_transformer(
    (OneHotEncoder(sparse_output=False), cat_cols),
    remainder = 'passthrough'
)
tree_pipe = Pipeline([
    ('transformer', transformer),
    ('tree', DecisionTreeClassifier())
])
forest_pipe = Pipeline([
    ('transformer', transformer),
    ('forest', RandomForestClassifier())
])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 11)
tree_pipe.fit(X_train, y_train)
forest_pipe.fit(X_train, y_train)
fig, ax = plt.subplots(1, 2, figsize = (20, 5))
ConfusionMatrixDisplay.from_estimator(tree_pipe, X_test, y_test, ax = ax[0])
ConfusionMatrixDisplay.from_estimator(forest_pipe, X_test, y_test, ax = ax[1])
ax[0].set_title('Decision Tree')
ax[1].set_title('Random Forest');

GridSearchCV#

While we have models working, there are many hyperparameters that could be altered. To experiment with different hyperparameter settings we can use GridSearchCV to compare the resulting performance.

X = heart[['Age', "Sex", "RestBP", "Chol"]]
y = heart['AHD']
tree = DecisionTreeClassifier()
params = {'max_depth': [1, 2, 3, 4, 5]}
from sklearn.model_selection import GridSearchCV
#instantiate grid search object
tree_grid = GridSearchCV(tree, param_grid = params)
#fit on X and y
tree_grid.fit(X, y)
#score the model
tree_grid.score(X, y)
#best parameters?
tree_grid.best_params_
#all results as DataFrame
pd.DataFrame(tree_grid.cv_results_)
#recode the labels numerically so 'Yes' is the positive class for recall
y = y.map({'Yes': 1, 'No': 0})
#repeat with different scoring metric
tree_grid = GridSearchCV(tree, param_grid = params, scoring='recall')
tree_grid.fit(X, y)
#best parameters
tree_grid.best_params_

Time Series Models#

housing = pd.read_csv('https://raw.githubusercontent.com/jfkoehler/nyu_bootcamp_fa24/refs/heads/main/data/housing_stock.csv')
housing.info()
#make sure datetime
housing.head(2)
#set date as index
housing['DATE'] = pd.to_datetime(housing['DATE'])
housing.set_index('DATE', inplace = True)
housing.head(2)

Exploratory Methods#

  • rolling

  • diff

  • shift

  • pct_change

housing.plot()
#smooth with mean
housing.rolling(12).mean().plot()
#difference
housing.diff(12).plot()
#shift
housing.shift(1)
#pct_change
housing.pct_change().plot()

Autocorrelation#

Correlation between \(x_t\) and \(x_{t + k}\), the value \(k\) lags later.
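As a quick illustration on a synthetic monthly-style series (the numbers below are made up, not the housing data), pandas' Series.autocorr computes the lag-k correlation directly:

```python
import numpy as np
import pandas as pd

# synthetic series with a 12-step seasonal cycle plus a little noise
rng = np.random.default_rng(0)
t = np.arange(120)
s = pd.Series(10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, size=120))

# Series.autocorr(lag=k) is the correlation between x_t and x_{t+k}
print(s.autocorr(lag=12))  # near +1: the pattern repeats every 12 steps
print(s.autocorr(lag=6))   # near -1: half a cycle out of phase
```

Strong spikes at multiples of the seasonal period are exactly what the ACF plots below look for.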

# !pip install sktime[all_extras]
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(housing);
from sktime.transformations.series.acf import AutoCorrelationTransformer
from sktime.datasets import load_airline
y = load_airline()
transformer = AutoCorrelationTransformer(n_lags=36)
y_hat = transformer.fit_transform(y.values)
plt.plot(y_hat, 'o')
plt.vlines(range(len(y_hat)), 0, y_hat)

Common types of time series forecasting problems#

  • Forecast a single value of the time series. (use historic values of y to predict y at one point in the future)

  • Forecast a whole range of future values. (use historic values of y to predict y at many points in the future)

  • Use other features to aid in your forecast. (use historic y and X to predict future values of y)

  • Forecast multiple time series simultaneously

Before modeling#

  • Get the time period into datetime dtype format

  • put it in the index

  • sort the index

Modeling considerations#

  • Make sure you are clear what you are predicting (one value, a range of values, one-step-ahead that updates every time period?)

  • Make sure you aren’t leaking information from the future into your training data

  • Make null baseline models

Example: Store sales data from Walmart#

Let’s forecast one single value one time period into the future#

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Prepare data#

Read in the dataset, set the index as a datetime index, sort it, and take a look at it.#

data = pd.read_csv('https://raw.githubusercontent.com/jfkoehler/nyu_bootcamp_fa24/main/data/train.csv',
                   parse_dates=['Date'], index_col='Date')
data.head()
data.index
data.sort_index(inplace=True)
data.head(2)

Rename sales column#

data.rename(columns={'Weekly_Sales': 'sales'}, inplace=True)
data.head(2)

Filter the DataFrame to Store 1 weekly sales only#

store_1_data = data[data['Store'] == 1].copy()
store_1_data.head()
store_1_data.tail(3)

Aggregate all departments to compute the total sales per store. Keep only that column.

Each department is in its own row, so you can aggregate sales by date.

store_1_sales = store_1_data.groupby(store_1_data.index)[['sales']].sum()
store_1_sales.head(3)

Plot it#

store_1_sales.plot();

Split the dataset into train and test sets.#

Subset the data to train and test. The test set must come after the training set. Why?

Let’s make the two datasets with sktime’s temporal_train_test_split#

from sktime.split import temporal_train_test_split
X_train, X_test = temporal_train_test_split(store_1_sales, test_size=12)
X_train.head()
X_train.tail(2)
X_test.head(2)
X_test.tail(2)
X_train.shape
X_test.shape
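The same temporal split can also be done by hand with .iloc; a minimal sketch on a stand-in DataFrame (the dates and numbers here are made up):

```python
import numpy as np
import pandas as pd

# stand-in for store_1_sales: 100 weeks of fake sales
idx = pd.date_range('2010-02-05', periods=100, freq='W-FRI')
sales = pd.DataFrame({'sales': np.arange(100.0)}, index=idx)

test_size = 12
train = sales.iloc[:-test_size]   # everything except the final 12 weeks
test = sales.iloc[-test_size:]    # the final 12 weeks

# every training date precedes every test date: no future leaks into training
assert train.index.max() < test.index.min()
```

This only works because the index is sorted, which is why sorting comes before splitting.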

Plot the two DataFrames

X_train['sales'].plot(label='train')
X_test['sales'].plot(title='Weekly Sales for Store 1', label='test')
plt.legend()
plt.ylim(1.1E6, 2.5E6);

Model 1: Naive Forecast#

Take the last time period’s sales and estimate the same value for all future time periods. This method is a naive forecast.

\[{\Large \hat y_{t+1} = y_t}\]

Make a predictions DataFrame and set the predicted values equal to the last value in the training sales data.

from sktime.forecasting.naive import NaiveForecaster
X_train.tail(1)

Save our results in a df_predictions DataFrame.#

forecaster = NaiveForecaster()
forecaster.fit(X_train)
forecaster.predict(fh = np.arange(1, X_test.shape[0] + 1))
df_predictions = X_test.copy()
df_predictions.head(3)

Take the last training dataset value and make that the guess for all future time periods.#

df_predictions['last'] = forecaster.predict(fh = np.arange(1, X_test.shape[0] + 1)).to_numpy().ravel()
df_predictions.head()

Plot the historic data, the actual “future” data, and the predicted “future” data.#

df_predictions.tail()
plt.figure(figsize = (15, 5))

plt.plot(X_train['sales'], label='Train')
plt.plot(X_test['sales'], label='Test')
plt.plot(df_predictions['last'], label='Naive Last')

plt.legend()
plt.title('Naive Forecast');

Score our model#

On RMSE.

from sklearn.metrics import mean_squared_error
np.sqrt(mean_squared_error(df_predictions['sales'], df_predictions['last']))

That’s the most basic baseline.#

Model 2: Seasonality#

Forecast each value with the observation 52 periods prior: with weekly sales data, the seasonal pattern repeats every 52 weeks.

X_train.head()
X_train.tail()
df_predictions['seasonal'] = X_train.iloc[-52:-40].values  #the same 12 weeks, one year (52 weeks) earlier
X_test.shape
X_train.iloc[-52:-40].shape
df_predictions.head()
plt.figure(figsize = (15, 5))

plt.plot(X_train['sales'], label='Train')
plt.plot(X_test['sales'], label='Test')
plt.plot(df_predictions['last'], label='Naive Last')
# plt.plot(df_predictions['simple_mean'], label='Historical Mean')
plt.plot(df_predictions['seasonal'], label='Seasonal')

plt.legend()
plt.title('Baseline Forecasts');
np.sqrt(mean_squared_error(df_predictions['sales'], df_predictions['seasonal']))
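The slice above is one case of the general seasonal naive recipe: repeat the last full season. A sketch in plain numpy (toy numbers; seasonal_naive is a made-up helper name):

```python
import numpy as np

def seasonal_naive(history, sp, horizon):
    """Forecast step h with the value sp observations earlier,
    i.e. repeat the last full season out to the horizon."""
    history = np.asarray(history, dtype=float)
    last_season = history[-sp:]
    reps = int(np.ceil(horizon / sp))
    return np.tile(last_season, reps)[:horizon]

print(seasonal_naive([1, 2, 3, 4, 5, 6], sp=3, horizon=4))  # [4. 5. 6. 4.]
```

For the Walmart data this would be sp=52 and horizon=12, which is exactly the 12-value slice used above.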

Model 3: Simple Exponential Smoothing#

Smoothing just means using a moving average. Read more on moving averages in Hyndman’s Forecasting: Principles and Practice.

Simple exponential smoothing uses weighted averages where the weights decrease exponentially for older data points.

\[{\hat y_{t+1} = \alpha y_t + \alpha (1-\alpha)y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + ...}\]

The one-step-ahead forecast for time \(t+1\) is a weighted average of all of the observations in the series \((y_1, \dots, y_t)\). The rate at which the weights decrease is controlled by the parameter \(\alpha\), which is between 0 and 1; larger values of \(\alpha\) mean faster decay and more weight on recent observations.

Older values matter less.

If \(\alpha = 0.5\):

  • What is the weight of the most recent observation (t)?

  • What is the weight of (t-1)?

  • What is the weight of (t-2)?
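You can check your answers numerically: from the formula, the weight on the observation k steps back is α(1−α)^k. A quick sketch (ses_weight is a made-up helper name):

```python
def ses_weight(alpha, k):
    """Weight on the observation k steps back in simple exponential smoothing."""
    return alpha * (1 - alpha) ** k

for k in range(3):
    print(k, ses_weight(0.5, k))  # 0 0.5, 1 0.25, 2 0.125
```

Note the weights form a geometric series that sums to 1, which is what makes the forecast a proper weighted average.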

Let’s use sktime (which wraps statsmodels) for more advanced time series models to make life easier.#

from sktime.forecasting.exp_smoothing import ExponentialSmoothing
ses = ExponentialSmoothing()
ses.fit(X_train)
ses.predict(fh = np.arange(1, X_test.shape[0] + 1))

Forecast with .predict#

Pass a forecasting horizon fh listing the steps ahead you want predicted.

ses_forecast = ses.predict(fh = np.arange(1, X_test.shape[0] + 1))
ses_forecast.head()
df_predictions['ses'] = ses_forecast.to_numpy().ravel()
df_predictions.head(2)

Plot#

plt.figure(figsize = (15, 5))

plt.plot(X_train['sales'], label='Train')
plt.plot(X_test['sales'], label='Test')
plt.plot(df_predictions['last'], label='Naive Last')
plt.plot(df_predictions['seasonal'], label='Seasonal')
plt.plot(df_predictions['ses'], label='Simple Exponential Smoothing')

plt.legend()
plt.title('Baseline Forecasts');

You can play with the smoothing level, but SES forecasts the same number every time, so this isn’t going to be a winner.
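To see why the forecast is flat, SES can be written recursively: the smoothed value blends each new observation with the previous smoothed value, and every future forecast equals the final smoothed value. A sketch (ses_point_forecast is a made-up helper name):

```python
def ses_point_forecast(y, alpha):
    """Simple exponential smoothing in recursive form:
    smoothed = alpha * new observation + (1 - alpha) * previous smoothed value."""
    smoothed = y[0]
    for value in y[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed  # SES forecasts this same number for every future step

print(ses_point_forecast([10, 12, 11, 13], alpha=0.5))  # 12.0
```

With no trend or seasonal terms in the model, there is nothing to extrapolate, hence the flat line.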


Model 4: Holt-Winters’ (additive and multiplicative versions)#

  • Part of the exponential smoothing family of algorithms.

  • It’s also called Triple Exponential Smoothing

Holt-Winters’ can handle trend and seasonality. It weights recent values more than old ones. This is a very strong classical time series model. 🎉

Params to choose:#

  • number of seasonal_periods (sp)

  • whether the seasonal effect is additive or multiplicative

  • whether the trend is additive or multiplicative

hw = ExponentialSmoothing(trend = 'add', seasonal = 'mul', sp = 52)
hw.fit(X_train)
hw.predict(fh = np.arange(1, X_test.shape[0] + 1))
# hw.summary()
df_predictions['hw'] = hw.predict(fh = np.arange(1, X_test.shape[0] + 1)).to_numpy().ravel()
df_predictions.head(3)

Plot#

plt.figure(figsize=(12, 8))

plt.plot(X_train['sales'], label='Train')
plt.plot(X_test['sales'], label='Test')
plt.plot(df_predictions['seasonal'], label='Seasonal')
plt.plot(df_predictions['hw'], label="Holt-Winters'")

plt.legend()
plt.title('Baseline Forecasts');

Evaluate#

np.sqrt(mean_squared_error(df_predictions['sales'], df_predictions['hw']))

Why did Holt-Winters’ work well?#

A time series dataset can be decomposed into its trend, seasonality, and residual components.

Let’s break down the components using statsmodels seasonal_decompose function.

from statsmodels.tsa.seasonal import seasonal_decompose
seasonal_decompose(store_1_sales, period=52).plot();

Holt-Winters’ uses exponential smoothing and takes into account the trend and seasonality to forecast future sales.

You have to choose the number of seasonal periods, and whether the trend and seasonal components are additive or multiplicative.


Seasonality#

Additive vs. Multiplicative Seasonality#

Seasonality can present itself in two ways:

  1. Additive: the peaks and valleys keep the same magnitude over time.

  2. Multiplicative: the peaks and valleys change in magnitude over time.

You can plot the data over time to see which kind of seasonal effect the data show:

Additive Seasonality#

[Figure: additive seasonality, with peaks and valleys of constant size over time]

Multiplicative Seasonality#

[Figure: multiplicative seasonality, with peaks and valleys that grow with the level of the series]
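A quick synthetic sketch of the difference (all numbers are made up):

```python
import numpy as np

t = np.arange(120)
season = np.sin(2 * np.pi * t / 12)
trend = 100 + t

additive = trend + 10 * season               # swings keep the same size
multiplicative = trend * (1 + 0.1 * season)  # swings grow as the level rises

# peak-to-trough size of the seasonal swing, first cycle vs last cycle
print(np.ptp((additive - trend)[:12]), np.ptp((additive - trend)[-12:]))
print(np.ptp((multiplicative - trend)[:12]), np.ptp((multiplicative - trend)[-12:]))
```

In the additive series the swing size is constant; in the multiplicative one it scales with the trend.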

Alternatively, grid search it, or use a function that grid searches for you!

We’ll see some ways to deal with the trend in the next lesson. For now, just know that Hyndman says don’t use multiplicative for trend.

The Holt-Winters method takes into account both trend and seasonality to forecast future sales.

With this method, we will apply exponential smoothing to the seasonal components as well as the level and trend components.

Here are the constituent parts of the additive model:

Level Equation

$$L_t = \alpha (y_t - S_{t-s}) + (1 - \alpha)(L_{t-1} + b_{t-1})$$

Trend Equation

$$b_t = \beta (L_t - L_{t-1}) + (1 - \beta)b_{t-1}$$

Seasonality Equation

$$S_t = \gamma(y_t-L_t)+(1-\gamma)S_{t-s}$$

Combine these into a single equation:

Forecast Equation

$${ F_{t+k} = L_t + kb_t+S_{t+k-s}}$$

(Where α, β, and γ are the smoothing parameters — each between 0 and 1 — and s is the length of the seasonal cycle.)

  • The trend equation captures the overall direction of sales.

  • The level equation is a weighted average between the seasonally adjusted observation and the non-seasonal forecast for time t.

  • The seasonal equation is a weighted average between the current seasonal index and the seasonal index of the same season s time periods ago.
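The update equations above can be sketched as a simple loop. This is an illustrative toy, not statsmodels' fitted implementation (which optimizes the smoothing parameters and uses a more careful initialization); holt_winters_additive is a made-up helper name:

```python
import numpy as np

def holt_winters_additive(y, alpha, beta, gamma, s, horizon):
    """Additive Holt-Winters: run the level/trend/seasonal updates over y,
    then forecast with F_{t+h} = L_t + h*b_t + S_{t+h-s}."""
    y = np.asarray(y, dtype=float)
    level = y[:s].mean()                            # crude initial level
    trend = (y[s:2 * s].mean() - y[:s].mean()) / s  # crude initial trend
    season = list(y[:s] - level)                    # crude initial seasonal indices
    for t in range(s, len(y)):
        prev_level = level
        level = alpha * (y[t] - season[t - s]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * (y[t] - level) + (1 - gamma) * season[t - s])
    n = len(y)
    # reuse the most recent seasonal index for each forecast step
    return np.array([level + h * trend + season[n - s + (h - 1) % s]
                     for h in range(1, horizon + 1)])
```

On a series that really is linear trend plus a repeating additive pattern, the states converge toward the true level, trend, and seasonal indices, and the forecasts extend all three.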


Write out the train, test, and prediction DataFrames to pickle file format to use later#

X_train.to_pickle('./data/train.pkl')
X_test.to_pickle('./data/test.pkl')
df_predictions.to_pickle('./data/df_predictions.pkl')

If my observations are by the minute and there is a big spike in users once per week, what number should I use for seasonality?#

60*24*7  #10,080 minute-level observations per weekly cycle

Exercise#

Below, data on shampoo sales is loaded and displayed. Your task is to build an Exponential Smoothing model to predict the last 6 months of sales. How does your model compare to a baseline? Plot the results.

from sktime.datasets import load_shampoo_sales
shampoo = load_shampoo_sales()
shampoo.info()
shampoo.plot(grid = True, title = 'Shampoo Sales');

Check out the great free ebook:#

Forecasting: Principles and Practice by Rob J Hyndman & George Athanasopoulos to learn more about time series. Hyndman is also behind much of the time series code and methodology in R’s forecast package, which many Python packages draw on.

Summary#

  • Always sort your time series data by date

  • Make sure you aren’t leaking information from the future into your training data

  • Make baseline models first

  • Consider the tradeoffs between complexity and performance when choosing which model to use

  • Holt-Winters’ Exponential Smoothing is really powerful and doesn’t have too many parameters to choose

  • The seasonal period is the number of observations it takes for a pattern to repeat