construct.ranges for cumulative ranges #265

Open
bockthom opened this issue Jul 24, 2024 · 4 comments

@bockthom
Collaborator

bockthom commented Jul 24, 2024

Description

In coronet, we have a function construct.ranges that takes a list of revisions and creates range names out of it, as in the following example:

> bins = c("2020-01-01", "2020-04-01", "2020-07-01", "2020-10-01", "2020-12-31")
> construct.ranges(bins, sliding.window = FALSE)
[1] "2020-01-01-2020-04-01" "2020-04-01-2020-07-01" "2020-07-01-2020-10-01" "2020-10-01-2020-12-31"

This function is able to construct sliding-window ranges, but not to construct cumulative ranges.

We have a dedicated function construct.cumulative.ranges, but this function has a completely different interface (it takes a start date, an end date, and a time period), similar to construct.consecutive.ranges and construct.overlapping.ranges.
However, the function construct.ranges itself (which takes just a vector of dates) is not capable of constructing cumulative ranges.

Therefore, I suggest enhancing the function construct.ranges with an additional parameter to construct cumulative ranges. If adding a new parameter introduces more problems than benefits, an additional function might also be helpful, but then we have the problem of naming conflicts with the existing functions. So, I'd be glad if we could find a suitable way to enhance the existing function construct.ranges.

Desired output for construct.ranges with cumulative ranges:

[1] "2020-01-01-2020-04-01" "2020-01-01-2020-07-01" "2020-01-01-2020-10-01" "2020-01-01-2020-12-31"
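
As a rough illustration of the desired behavior, here is a minimal sketch in plain R. The helper name cumulative.ranges.from.revs is hypothetical and not part of coronet; it only shows that every cumulative range shares the first revision as its start.

```r
## Hypothetical helper (not part of coronet): every range starts at the
## first revision and ends at one of the later revisions.
cumulative.ranges.from.revs = function(revs) {
    ## at least two revisions are needed to form a range
    if (length(revs) < 2) {
        stop("At least two revisions are needed to build a range.")
    }
    start = revs[1]
    ends = revs[-1]
    ## 'paste' recycles the single start value for every end value
    return(paste(start, ends, sep = "-"))
}

bins = c("2020-01-01", "2020-04-01", "2020-07-01", "2020-10-01", "2020-12-31")
ranges = cumulative.ranges.from.revs(bins)
print(ranges)
```

For the bins above, this yields the four ranges listed as the desired output.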

Motivation

Constructing ranges in a cumulative way is particularly useful when analyzing commit-interaction data, but also in many other use cases.
In general, enhancing the currently existing function would provide an easy way to construct range-data objects cumulatively by simply passing a list of fixed bins to the range-construction function, and passing the resulting ranges to split.data.time.based.by.ranges afterwards.

bockthom added this to the v4.5 milestone Jul 24, 2024
MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Sep 3, 2024
Works towards se-sic#265.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
@MaLoefUDS
Contributor

MaLoefUDS commented Sep 3, 2024

I have done a prototype implementation here and some tests here. In particular, let me know whether sliding windows and cumulative ranges are mutually exclusive; I would need to slightly update my implementation in that case.

Edit: Also let me know if this addition fits for you with my currently open wish-wash PR or if we should wait for a new one.

@bockthom
Collaborator Author

bockthom commented Sep 3, 2024

I have done a prototype implementation

The implementation looks good to me (except for two typos/inconsistencies).

and some tests

The structure of the tests looks good, but I did not have time yet to find out whether the behavior in the tests is correct or not.

In particular, let me know whether sliding windows and cumulative ranges are mutually exclusive

I've seen that you already have tests for the combination of cumulative ranges and sliding windows, but just from looking at the tests I cannot judge whether such a combination is useful or not. Could you please post a small example directly showing what the ranges look like in such a case?

Also let me know if this addition fits for you with my currently open wish-wash PR or if we should wait for a new one.

If the implementation stays as small as it is currently, I'd go for adding it to your "open wish-wash PR". But let's discuss this tomorrow.

@MaLoefUDS
Contributor

Regarding sliding-window ranges and our recent discussion: sliding.window in construct.ranges differs somewhat from what we understand by sliding.window in splitting.

  • In splitting, we define regions to split our data into. If we specify sliding.window, we additionally split each of these regions into two halves and connect every second boundary.
  • In construct.ranges, when sliding.window is specified, we assume that the input revisions already result from splitting based on sliding windows. Therefore, we do not split them into halves again; we only connect every second revision.
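
The difference between the two bullet points can be sketched on abstract numeric boundaries. Both helpers below (halve.bins and pair.revisions) are hypothetical names made up for this illustration; this is not coronet's code, only one reading of the description above.

```r
## Hypothetical helpers on numeric boundaries, for illustration only.

## Insert the midpoint of every region between its two boundaries.
halve.bins = function(bins) {
    mids = (head(bins, -1) + tail(bins, -1)) / 2
    return(sort(c(bins, mids)))
}

## Combine each revision with the one 'offset' positions later.
pair.revisions = function(revs, offset) {
    starts = head(revs, -offset)
    ends = tail(revs, -offset)
    return(paste(starts, ends, sep = "-"))
}

bins = c(0, 10, 20, 30)

## Splitting-style: halve the regions first, then connect every second boundary.
split.style = pair.revisions(halve.bins(bins), offset = 2)

## construct.ranges-style: the input is assumed to be halved already,
## so the revisions are only paired, not halved again.
halved = c(0, 5, 10, 15, 20, 25, 30)
construct.style = pair.revisions(halved, offset = 2)

print(split.style)
print(construct.style)
```

Both calls produce the same half-overlapping ranges; the only difference is who is responsible for the halving step.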

For the cumulative ranges, this means the following (example):

We want to split data into the following bins: 2016-01-01 - 2017-01-01, 2017-01-01 - 2018-01-01, and 2018-01-01 - 2019-01-01. We also specify sliding.window = TRUE and therefore, in the end, receive network(-split)s that have the following bounds: 2016-01-01 - 2017-01-01, 2016-07-01 - 2017-07-01, 2017-01-01 - 2018-01-01, 2017-07-01 - 2018-07-01, and 2018-01-01 - 2019-01-01 (which is also exactly the output of construct.ranges(..., sliding.window = TRUE)).
When we construct ranges and specify to construct cumulative ranges, all resulting ranges begin at the start of the earliest range, i.e., the resulting ranges would be 2016-01-01 - 2017-01-01, 2016-01-01 - 2017-07-01, 2016-01-01 - 2018-01-01, 2016-01-01 - 2018-07-01, and 2016-01-01 - 2019-01-01.
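
The example can be reproduced with a small stand-alone sketch. The function construct.ranges.sketch and its cumulative parameter are assumptions made for illustration; they are not the prototype implementation mentioned above and may differ from it, for instance in how too-short inputs are handled.

```r
## Hypothetical sketch, not coronet's implementation; the name of the
## 'cumulative' parameter is an assumption.
construct.ranges.sketch = function(revs, sliding.window = FALSE, cumulative = FALSE) {
    ## with sliding windows, every second revision is connected
    offset = if (sliding.window) 2 else 1
    if (length(revs) <= offset) {
        stop("Not enough revisions to build a range.")
    }
    ends = revs[(offset + 1):length(revs)]
    if (cumulative) {
        ## all ranges start at the earliest revision
        starts = rep(revs[1], length(ends))
    } else {
        starts = revs[1:(length(revs) - offset)]
    }
    return(paste(starts, ends, sep = "-"))
}

## revisions as they result from sliding-window splitting of the three yearly bins
revs = c("2016-01-01", "2016-07-01", "2017-01-01", "2017-07-01",
         "2018-01-01", "2018-07-01", "2019-01-01")

sliding = construct.ranges.sketch(revs, sliding.window = TRUE)
cumulative.sliding = construct.ranges.sketch(revs, sliding.window = TRUE, cumulative = TRUE)

print(sliding)
print(cumulative.sliding)
```

The end points of both results are identical; only the start points differ, which is why the cumulative variant of sliding-window ranges grows in half-year steps.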

Taking everything into account, I think cumulative sliding-window ranges may be as useful as cumulative regular ranges, depending on the use case, but I'm not entirely sure ^^

@bockthom
Collaborator Author

bockthom commented Sep 6, 2024

Ok, let's keep the option to construct cumulative ranges for sliding-window ranges.
(I don't think that this will actually be used, but, in general, the resulting ranges look reasonable.)

MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Sep 10, 2024
Works towards se-sic#265.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Sep 11, 2024
Works towards se-sic#265.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Sep 12, 2024
Works towards se-sic#265.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>