predict_in_sample of auto_arima produces fitted-values fluctuating around zero #140

Closed
JahangirVajedsamiei opened this issue May 8, 2019 · 13 comments · Fixed by #166

@JahangirVajedsamiei

Description

predict_in_sample on a model returned by auto_arima produces fitted values that fluctuate around zero and do not follow the pattern of the real data (see the blue line in Actual Results). The expected result was produced with sm.ARIMA using the same parameters that auto_arima selected.

Steps/Code to Reproduce

import matplotlib.pyplot as plt
import pmdarima as pm

# `array` is the observed series (see the attached array.xlsx further down)
model_auto = pm.auto_arima(array, start_p=0, start_q=0, max_p=10, max_q=10, max_d=3, error_action="ignore", seasonal=False, D=None, trace=True, stepwise=True, enforce_stationarity=False, enforce_invertibility=False, maxiter=5000)
model_auto.summary()
model_auto.fit(array)
preds = model_auto.predict_in_sample(array)
plt.plot(preds, color='blue')

from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(array, order=(1,1,2)).fit(disp=0)
predict = model.predict(typ='levels')
plt.plot(array, color='lightblue')
plt.plot(predict,color='green')


Expected Results

Screen Shot 2019-05-08 at 5 19 07 PM

Actual Results

Screen Shot 2019-05-08 at 5 15 20 PM

Versions

@tgsmith61591
Member

tgsmith61591 commented May 8, 2019

My guess is #138 is going to address this. We weren't un-integrating the predictions back into the original endogenous space.

In particular, this fix
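To illustrate what "un-integrating" means here, the following is a minimal NumPy sketch for the simplest case, d=1. It is not the code from #138; the helper name `undifference_one_step` and the toy series are assumptions made for illustration. Predictions made on the differenced scale are small numbers scattered around zero, and adding each one to the previous observed level puts them back on the scale of the original series.

```python
import numpy as np


def undifference_one_step(y, diff_preds):
    """Turn one-step-ahead predictions of the first difference into level predictions.

    diff_preds[t] is the predicted change y[t + 1] - y[t], so the level
    prediction for y[t + 1] is y[t] + diff_preds[t]. (Hypothetical helper,
    not part of pmdarima.)
    """
    y = np.asarray(y, dtype=float)
    diff_preds = np.asarray(diff_preds, dtype=float)
    if diff_preds.shape[0] != y.shape[0] - 1:
        raise ValueError("expected one differenced prediction per observation after the first")
    return y[:-1] + diff_preds


# Toy series (made up): the levels sit far from zero.
y = np.array([10.0, 12.0, 15.0, 14.0, 18.0])

# Stand-in for predictions on the differenced scale; here simply the true
# first differences, which are small values scattered around zero.
diff_preds = np.diff(y)

# Back on the scale of y: one prediction for each observation after the first.
level_preds = undifference_one_step(y, diff_preds)

print(diff_preds)
print(level_preds)
```

Note that there is no level prediction for the very first observation, because there is no earlier level to add the predicted change to.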

@JahangirVajedsamiei
Author

Thank you for the fast reply! I look forward to the fix.
Best,

@tgsmith61591
Member

@JahangirVajedsamiei could you provide a reproducible data example so I can start addressing this?

@JahangirVajedsamiei
Author

> @JahangirVajedsamiei could you provide a reproducible data example so I can start addressing this?

Here is an example dataframe in Excel. It contains the respiration rate of a marine mussel over time.
Thank you in advance!
array.xlsx

@fmv1992

fmv1992 commented May 16, 2019

I'm also affected by this: .predict_in_sample returns values around 1e-10.

@tgsmith61591
Member

See above. #138 should hopefully address this in the next release.

tgsmith61591 added a commit that referenced this issue Jul 12, 2019
tgsmith61591 added a commit that referenced this issue Jul 14, 2019
@suprememingjie

@tgsmith61591
Hi
Could you tell me why my first in-sample prediction always starts with 0?

code below:
fitted, conf_int = arima_fit.predict_in_sample(return_conf_int=True, alpha=0.05)
fitted, conf_int = arima_fit.predict_in_sample(return_conf_int=True, alpha=0.05,dynamic=True)

I have tried both; the results are the same.

Many Thanks
Oli

@tgsmith61591
Member

This was addressed a while ago. What version are you running?

@suprememingjie

> This was addressed a while ago. What version are you running?

pmdarima version 1.8.2
python 3.6.8

@tgsmith61591
Member

Could you provide a minimally reproducible example including code and data?

@suprememingjie

suprememingjie commented Sep 23, 2021

`
import numpy as np
import pandas as pd
from pmdarima import auto_arima
import pmdarima as pm
import matplotlib.pyplot as plt
# Data file: monthly-beer-production-in-austr.csv

df = pd.read_csv('monthly-beer-production-in-austr.csv')
df.Month = pd.to_datetime(df.Month)
df = df.set_index("Month")
input_df=df[:-12]
fc_df=df[-12:]

arima=auto_arima(input_df, seasonal=True, m=12,max_p=7, max_d=5,max_q=7, max_P=4, max_D=4,max_Q=4)
arima_fit=arima.fit(input_df.values)
fitted, conf_int = arima_fit.predict_in_sample(return_conf_int=True, alpha=0.05)
plt.plot(input_df)

hd_series = pd.Series(fitted, index=input_df.index)
hdlower_series = pd.Series(conf_int[:, 0], index=input_df.index)
hdupper_series = pd.Series(conf_int[:, 1], index=input_df.index)

plt.plot(hd_series,color='red')
plt.fill_between(hdlower_series.index,
                 hdlower_series,
                 hdupper_series,
                 color='k', alpha=.15)
plt.show()
`

@tgsmith61591
Member

tgsmith61591 commented Sep 24, 2021

@suprememingjie your final model is an ARIMA(order=(5, 1, 4), seasonal_order=(1, 0, 1, 12)). Since there is a differencing term (d=1), it is impossible for us to predict a value for the first observation, because the model was fit over lag values. This produces junk values for the first d indices when predicting in-sample. Rather than truncating them, we return them as zeroes so that the results match the ground truth dimensionally. It may make more sense for us to return the first d samples as np.nan so it's clear what's happening... we'll think on that.

When I omit the first result, here's what I see:

In [21]: fitted, conf_int = arima_fit.predict_in_sample(return_conf_int=True, alpha=0.05)
    ...: plt.plot(input_df)
    ...:
    ...: hd_series = pd.Series(fitted[1:], index=input_df.index[1:])
    ...: hdlower_series = pd.Series(conf_int[1:, 0], index=input_df.index[1:])
    ...: hdupper_series = pd.Series(conf_int[1:, 1], index=input_df.index[1:])
    ...:
    ...: plt.plot(hd_series,color='red')
    ...: plt.fill_between(hdlower_series.index,
    ...:                  hdlower_series,
    ...:                  hdupper_series,
    ...:                  color='k', alpha=.15)
    ...: plt.show()

[Image: plot of the in-sample fit with the first observation omitted]

By the way, as a quick side note, the auto_arima function returns a fitted model, so this line in your code is unnecessary and can be removed:

# no need to fit this again:
arima_fit = arima.fit(input_df.values)
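
The handling of the first d in-sample values described above can be generalized a little. The sketch below is only an illustration: `mask_first_d` and `drop_first_d` are hypothetical helpers, not pmdarima functions, and the numbers are made up. It shows the two options mentioned in this comment, marking the first d placeholder values as np.nan or dropping them together with their index labels.

```python
import numpy as np


def mask_first_d(fitted, d):
    """Return a float copy of `fitted` with the first d entries set to NaN."""
    out = np.array(fitted, dtype=float)  # np.array makes a copy
    if d > 0:
        out[:d] = np.nan
    return out


def drop_first_d(fitted, index, d):
    """Drop the first d predictions together with their index labels."""
    return np.asarray(fitted)[d:], np.asarray(index)[d:]


# Made-up in-sample predictions; the leading 0.0 stands for the placeholder.
fitted = np.array([0.0, 93.1, 95.4, 101.2])
index = np.array(["1956-01", "1956-02", "1956-03", "1956-04"])

# Assumption: with a fitted pmdarima model, d would be the middle element
# of its (p, d, q) order; here it is set by hand.
d = 1

masked = mask_first_d(fitted, d)
trimmed, trimmed_index = drop_first_d(fitted, index, d)

print(masked)
print(trimmed, trimmed_index)
```

Masking keeps the output the same length as the input series, which is convenient when building a pandas Series on the original index; dropping matches the `fitted[1:]` approach used in the session above.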

@suprememingjie

@tgsmith61591 Aha, I worked it out the day before yesterday, before I saw your comment. I also did the same as you suggested: omitting the first observation produces the same plot. Anyway, thanks for your kind help. I hope you, your team and your country are well.

Best wishes

4 participants