predict_in_sample of auto_arima produces fitted-values fluctuating around zero #140

JahangirVajedsamiei · 2019-05-08T15:29:51Z

Description

predict_in_sample of auto_arima produces fitted values that fluctuate around zero and do not follow the pattern of the real data (see the blue line in Actual Results). The expected result was produced with sm.ARIMA (statsmodels) using the same parameters that auto_arima selected.

Steps/Code to Reproduce

import matplotlib.pyplot as plt
import pmdarima as pm

model_auto = pm.auto_arima(array, start_p=0, start_q=0, max_p=10, max_q=10, max_d=3, error_action="ignore", seasonal=False, D=None, trace=True, stepwise=True, enforce_stationarity=False, enforce_invertibility=False, maxiter=5000)
model_auto.summary()
model_auto.fit(array)
preds = model_auto.predict_in_sample(array)
plt.plot(preds, color='blue')

from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(array, order=(1,1,2)).fit(disp=0)
predict = model.predict(typ='levels')
plt.plot(array, color='lightblue')
plt.plot(predict,color='green')

Expected Results

Actual Results

Versions

tgsmith61591 · 2019-05-08T15:44:53Z

My guess is #138 is going to address this. We weren't un-integrating the predictions back into the original endogenous space.

In particular, this fix
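
To illustrate what "un-integrating" means here, the following is a minimal sketch with made-up numbers. It is not pmdarima's implementation; the series `y` and the prediction noise are assumptions chosen only to show why predictions left in the differenced space sit near the size of the step changes rather than near the level of the data.

```python
import numpy as np

# Toy series in the original (level) space; values are made up for illustration.
y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

# First-order differencing (d=1): the space the ARMA part of the model works in.
y_diff = np.diff(y)  # [2., 3., 4., 5.]

# Hypothetical one-step-ahead predictions of the differences.
pred_diff = y_diff + np.array([0.1, -0.2, 0.3, -0.1])

# Un-integrate: each predicted level is the previous observed level
# plus the predicted change.
pred_levels = y[:-1] + pred_diff

print(pred_diff)    # on the scale of the step changes, not of y
print(pred_levels)  # on the same scale as y[1:]
```

For a stationary-looking series the differences hover around zero, which would match the symptom reported above if the predictions were never mapped back to levels.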

JahangirVajedsamiei · 2019-05-08T16:32:09Z

Thank you for the fast reply! I look forward to the fix.
Best,

tgsmith61591 · 2019-05-12T21:13:46Z

@JahangirVajedsamiei could you provide a reproducible data example so I can start addressing this?

JahangirVajedsamiei · 2019-05-12T22:48:13Z

> @JahangirVajedsamiei could you provide a reproducible data example so I can start addressing this?

Here is an example dataframe in Excel format. It is the respiration rate of a marine mussel over time.
Thank you in advance!
array.xlsx

fmv1992 · 2019-05-16T14:35:59Z

I'm also affected by this: .predict_in_sample returns values around 1e-10.

tgsmith61591 · 2019-05-16T14:40:28Z

See above. #138 should hopefully address this in the next release.

[MRG+1] Fixes #140 (hopefully)

suprememingjie · 2021-09-22T09:22:41Z

@tgsmith61591
Hi,
Could you tell me why my first in-sample prediction always starts with 0?

code below:
fitted, conf_int = arima_fit.predict_in_sample(return_conf_int=True, alpha=0.05)
fitted, conf_int = arima_fit.predict_in_sample(return_conf_int=True, alpha=0.05,dynamic=True)

I have tried both; the results are the same.

Many Thanks
Oli

tgsmith61591 · 2021-09-22T18:36:34Z

This was addressed a while ago. What version are you running?

suprememingjie · 2021-09-23T01:44:20Z

> This was addressed a while ago. What version are you running?

pmdarima version 1.8.2
python 3.6.8

tgsmith61591 · 2021-09-23T01:46:42Z

Could you provide a minimally reproducible example including code and data?

suprememingjie · 2021-09-23T02:08:19Z

```python
import numpy as np
import pandas as pd
import pmdarima as pm
from pmdarima import auto_arima
import matplotlib.pyplot as plt

# Data file attached to this comment: monthly-beer-production-in-austr.csv
df = pd.read_csv('monthly-beer-production-in-austr.csv')
df.Month = pd.to_datetime(df.Month)
df = df.set_index("Month")
input_df = df[:-12]
fc_df = df[-12:]

arima = auto_arima(input_df, seasonal=True, m=12, max_p=7, max_d=5, max_q=7, max_P=4, max_D=4, max_Q=4)
arima_fit = arima.fit(input_df.values)
fitted, conf_int = arima_fit.predict_in_sample(return_conf_int=True, alpha=0.05)
plt.plot(input_df)

hd_series = pd.Series(fitted, index=input_df.index)
hdlower_series = pd.Series(conf_int[:, 0], index=input_df.index)
hdupper_series = pd.Series(conf_int[:, 1], index=input_df.index)

plt.plot(hd_series, color='red')
plt.fill_between(hdlower_series.index,
                 hdlower_series,
                 hdupper_series,
                 color='k', alpha=.15)
plt.show()
```

tgsmith61591 · 2021-09-24T12:39:05Z

@suprememingjie your final model results in an ARIMA(order=(5, 1, 4), seasonal_order=(1, 0, 1, 12)). Since there is a differencing term (d=1), it is impossible for us to predict a value for the first observation, because the model was fit over lagged values. This produces some junky values for the first d indices when predicting in-sample. Rather than truncate them and leave them out, we return them as zeroes so that the results match the ground truth dimensionally. It may make more sense for us to return the first d samples as np.nan so it's clear what's happening... we'll think on that.

When I omit the first result, here's what I see:

In [21]: fitted, conf_int = arima_fit.predict_in_sample(return_conf_int=True, alpha=0.05)
    ...: plt.plot(input_df)
    ...:
    ...: hd_series = pd.Series(fitted[1:], index=input_df.index[1:])
    ...: hdlower_series = pd.Series(conf_int[1:, 0], index=input_df.index[1:])
    ...: hdupper_series = pd.Series(conf_int[1:, 1], index=input_df.index[1:])
    ...:
    ...: plt.plot(hd_series,color='red')
    ...: plt.fill_between(hdlower_series.index,
    ...: hdlower_series,
    ...: hdupper_series,
    ...: color='k', alpha=.15)
    ...: plt.show()
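
As an alternative to slicing with `[1:]`, the first d entries can be masked with np.nan, which keeps the arrays the same length as the input index. This is a minimal sketch under assumptions: `mask_first_d` is a hypothetical helper, not part of pmdarima, and the sample values below are made up.

```python
import numpy as np

def mask_first_d(fitted, d, conf_int=None):
    """Hypothetical helper: return copies with the first d entries set to NaN."""
    fitted = np.asarray(fitted, dtype=float).copy()
    fitted[:d] = np.nan
    if conf_int is None:
        return fitted
    conf_int = np.asarray(conf_int, dtype=float).copy()
    conf_int[:d, :] = np.nan
    return fitted, conf_int

# Made-up in-sample predictions whose first value is the zero placeholder.
fitted = np.array([0.0, 93.1, 95.4, 101.2])
masked = mask_first_d(fitted, d=1)
print(masked)
```

In the example above, d could be read from the fitted model's order (the middle element of `(5, 1, 4)`). Matplotlib leaves gaps at NaN values, so the masked series can be plotted against the full `input_df.index` without the leading zero.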

By the way, as a quick side note, the auto_arima function returns a fitted model, so this line in your code is unnecessary and can be removed:

# no need to fit this again:
arima_fit = arima.fit(input_df.values)

suprememingjie · 2021-09-26T03:07:34Z

@tgsmith61591 Aha, I figured it out the day before yesterday, before I saw your comment. I did the same as you suggested and omitted the first observation, which produces the same plot. Anyway, thanks for your kind help. I hope you, your team, and your country are well.

Best wishes

tgsmith61591 added the 🪲 : bug label May 9, 2019

tgsmith61591 mentioned this issue May 12, 2019

[MRG+1] Confidence intervals for in-sample predictions #138

Merged

tgsmith61591 added a commit that referenced this issue Jul 12, 2019

Fixes #140 (hopefully)

aa249bd

tgsmith61591 mentioned this issue Jul 12, 2019

[MRG+1] Fixes #140 (hopefully) #166

Merged

7 tasks

tgsmith61591 closed this as completed in #166 Jul 14, 2019

tgsmith61591 added a commit that referenced this issue Jul 14, 2019

Merge pull request #166 from tgsmith61591/more-issue-140

a2398fd

[MRG+1] Fixes #140 (hopefully)

theabc50111 mentioned this issue Dec 18, 2022

How to make predict_in_sample() derived from the first d term of original data when differencing order (d) isn't zero #533

Open
