-
Notifications
You must be signed in to change notification settings - Fork 869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/comp lags feat order #2272
Fix/comp lags feat order #2272
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey 👋
sorry I saw just later that you were just moving the create_multivariate_linear_timeseries
method in the other script.
I left some comment about this method.
Feel free to ignore if they don't have to do with the PR :-)
darts/tests/utils/tabularization/test_create_lagged_training_data.py
Outdated
Show resolved
Hide resolved
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2272 +/- ##
==========================================
- Coverage 94.02% 94.01% -0.02%
==========================================
Files 138 138
Lines 14152 14177 +25
==========================================
+ Hits 13307 13328 +21
- Misses 845 849 +4 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice @madtoinou, thanks for fixing this, and adding the missing tests 👍
I had some suggestions, mainly:
- Add changelog entry
- moving the auto regressive logic into tabularization, and handle the entire X creation in there
- slight refactoring of the tests to avoid duplicate code
@@ -775,10 +784,31 @@ def create_lagged_component_names( | |||
|
|||
components = get_single_series(variate).components.tolist() | |||
if isinstance(variate_lags, dict): | |||
if "default_lags" in variate_lags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have access here to all input TimeSeries, can't we just construct the full dict for the user?
Not sure if it's required, probably handled by the RegressionModel, I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the RegressionModel
already handles it (_generate_lags()
), I wanted to avoid code duplication (and circular imports) while preventing unexpected behavior if a user calls this method with parameters not directly extracted from a model.
darts/utils/data/tabularization.py
Outdated
@@ -940,18 +974,38 @@ def _create_lagged_data_by_moving_window( | |||
# `t + lag_i` within this window is, therefore, `-1 + lag_i + min_lag_i`: | |||
if isinstance(lags_i, list): | |||
lags_to_extract = np.array(lags_i, dtype=int) + min_lag_i - 1 | |||
# Feats are already grouped by lags and ordered | |||
reordered_feats = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of generating this array for every series, you can also make use of slicing.
For example in this case, the final lags_feats_order
can be slice(None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in order to do so, would I need to make sure that none of the lags are ordered by components? Not sure to understand how I can use slice()
here.
# Check that outputs match: | ||
assert np.allclose(expected_X, X[:, :, 0]) | ||
assert np.allclose(expected_y, y[:, :, 0]) | ||
assert feats_times.equals(times[0]) | ||
|
||
# passing lags a dictionary | ||
lags_as_dict = self.convert_lags_to_dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like in most of the tests, this code is duplicated from the non-dict lags case. We could add the logic as local function inside each test (including lagged feature extraction and checks), and then just that call it once with non-dict lags and once with dict lags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, let me know what you think
…/darts into fix/comp_lags_feat_order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great now, thanks a lot for fixing this @madtoinou 🚀
Summary
Order of the extracted features are not consistent when lags were defined component-wise compared to the shared lags approach.
Other Information
create_lagged_component_names()