-
Notifications
You must be signed in to change notification settings - Fork 15
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add 'split.data.by.bins.vector' and fix miscellaneous bugs in splitting
Modify 'split.data.time.based' to be able to split by activity-based bins. Rename the function to 'split.data.by.time.or.bins'. Introduce wrapper functions 'split.data.by.bins.vector' and 'split.data.time.based' to call 'split.data.by.time.or.bins'. Add 'include.duplicate.ids' parameter in 'split.get.bins.activity.based' to obtain bins covering all data elements from 'df' by which the split is being performed, regardless of the elements ids uniqueness. In 'split.data.activity.based', after calculating the bins to place data elements into, replace the time-based splitting by 'split.data.by.bins.vector'. Time-based splitting is incorrect for the case that the date of the last element in a bin is the same as the date of the first element of the next bin. Adjust calculation of 'offset.end' in 'split.data.activity.based' to fix a bug where because of a short last window the end offset would cross the border of the last window, overlapping into the second last. Because of this overlap the last sliding windows would not be calculated as expected. This works towards #239. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
- Loading branch information
Showing
1 changed file
with
107 additions
and
27 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters