Merge pull request #249 from MaLoefUDS/dev
Miscellaneous improvements to splitting

Reviewed-by: Thomas Bock <bockthom@cs.uni-saarland.de>
bockthom authored Feb 29, 2024
2 parents 0693d7a + d7c1de6 commit 56ff0b3
Showing 13 changed files with 447 additions and 281 deletions.
11 changes: 7 additions & 4 deletions NEWS.md
@@ -6,28 +6,31 @@

### Added

-- Add issue-based artifact-networks (PR #244, 98a93ee721a293410623aafe46890cfba9d81e72, 771bcc8d961d419b53a1e891e9dc536371f1143b, 368e79264adf5a5358c04518c94ad2e1c13e212b)
+- Add issue-based artifact-networks, in which issues form vertices connected by edges that represent issue references. If possible, disambiguate duplicate JIRA issue references that originate from [codeface-extraction](https://github.com/se-sic/codeface-extraction) (PR #244, PR #249, 98a93ee721a293410623aafe46890cfba9d81e72, 771bcc8d961d419b53a1e891e9dc536371f1143b, 368e79264adf5a5358c04518c94ad2e1c13e212b, fa3167c289c9785f3a5db03d9724848f1441a63d, 4646d581d5e1f63260692b396a8bd8f51b0da48fda, ed77bd726bf92e06c2fc9145a5847787a8d0588b)
- Add a new `split.data.by.bins` function (not to be confused with a previously existing function that had the same name and was renamed in this context), which splits data based on given activity-based bins (PR #244, ece569ceaf557bb38cd0cfad437b69b30fe8a698, ed5feb214a123b605c9513262f187cfd72b9e1f4)
- Add new `assert.sparse.matrices.equal` function to compare two sparse matrices for equality for testing purposes (PR #248, 9784cdf12d1497ee122e2ae73b768b8c334210d4, d9f1a8d90e00a634d7caeb5e7f8f262776496838)
-- Add tests for file `util-networks.misc.R` for issue #242 (PR #248, f3202a6f96723d11c170346556d036cf087521c8, 030574b9d0f3435db4032d0e195a3d407fb7244b, 380b02234275127297fcd508772c69db21c216de, 8b803c50d60fc593e4e527a08fd4c2068d801a48, 7335c3dd4d0302b024a66d18701d9800ed3fe806, 6b600df04bec1fe70c272604f274ec5309840e65)
+- Add tests for file `util-networks-misc.R` for issue #242 (PR #248, f3202a6f96723d11c170346556d036cf087521c8, 030574b9d0f3435db4032d0e195a3d407fb7244b, 380b02234275127297fcd508772c69db21c216de, 8b803c50d60fc593e4e527a08fd4c2068d801a48, 7335c3dd4d0302b024a66d18701d9800ed3fe806, 6b600df04bec1fe70c272604f274ec5309840e65)
- Add the possibility to simplify edges of multiple-relation networks into a single edge at all instead of a single edge per relation (PR #250, 2105ea89b5227e7c9fa78fea9de1977f2d9e8faa)
+- Add `get.bin.dates.from.ranges` function to convert date ranges into bins format (PR #249, a1842e9be46596321ee86860fd87d17a3c88f50f, 858b1812ebfc3194cc6a03c99f3ee7d161d1ca15)
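
The conversion from date ranges into the bins format can be pictured with a minimal base-R sketch. It assumes that each range is a two-element `POSIXct` vector holding a start and an end date, which may differ from the representation used in the library. The helper `ranges.to.bins` is hypothetical and is not the function added in this commit.

```r
## Hypothetical helper for illustration only; not the coronet implementation.
## Assumption: 'ranges' is a list of two-element POSIXct vectors c(start, end).
ranges.to.bins = function(ranges) {
    ## collect all start and end dates; do.call(c, ...) keeps the POSIXct class
    dates = do.call(c, ranges)
    ## adjacent ranges share a boundary, so drop duplicates and sort
    bins = sort(unique(dates))
    return(bins)
}

ranges = list(
    as.POSIXct(c("2016-01-01 00:00:00", "2016-02-01 00:00:00"), tz = "UTC"),
    as.POSIXct(c("2016-02-01 00:00:00", "2016-03-01 00:00:00"), tz = "UTC")
)
bins = ranges.to.bins(ranges)
```

Two adjacent ranges yield three bin boundaries here, because the shared boundary date is kept only once.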

### Changed/Improved

-- Enhance testing data by adding `add_link` and `referenced_by` issue events which connect issues to form edges in issue-based artifact-networks (PR #244, 9f840c040d552e8639aa82c3dd537c189679b348, ea4fe8d3c84f948af6147cf0137e80181ebb7a1e)
+- Enhance testing data by adding `add_link` and `referenced_by` issue events, which connect issues to form edges in issue-based artifact-networks. This includes duplicate edge information in JIRA data as produced by [codeface-extraction](https://github.com/se-sic/codeface-extraction) (PR #244, 9f840c040d552e8639aa82c3dd537c189679b348, ea4fe8d3c84f948af6147cf0137e80181ebb7a1e, 6eb731102301b1af08f4affb40d1f8df94500e34)
- Add input validation for the `bins` parameter in `split.data.time.based` and `split.data.by.bins` (PR #244, ed0a5302ea8c8934d7200b95be7ac1446305af07, 5e5ecbac44d07927b953ae9d4330a616f8224ba7)
- Rename `split.data.by.bins` into `split.dataframe.by.bins` as this is what it does (PR #244, ed5feb214a123b605c9513262f187cfd72b9e1f4)
- Enhance `get.author.names.from.network` and `get.author.names.from.data` to always have the same output format. Now it doesn't depend on the `global` flag anymore (PR #248, d87d32564156f13c83ebe3361c2b68e5d0ac16ac, ddbfe68d3e628e82f34e09b36fffe886646986c5)
- Change `util-tensor.R` to correctly use the new output format of `get.author.names.from.network` (PR #248, 72b663ebf7169c0da5c687fe215529f3be0c08c5)
- Throw an error in `convert.adjacency.matrix.list.to.array` if the function is called with wrong parameters (PR #248, ece2d38b4972745af3a83e06f32317a06465a345, 1a3e510df15f5fa4e920e9fce3e0e162c27cd6d1)
- Rename `compare.networks` to `assert.networks.equal` to better match the purpose of the function (PR #248, d9f1a8d90e00a634d7caeb5e7f8f262776496838)
- Explicitly add R version 4.3 to the CI test pipeline (9f346d5bc3cfc553f01e5e80f0bbe51e1dc2b53e)
+- Simplify call chain-, and branching-routes in network-splitting functions and consequently set the `bins` attribute on every output network-split (while minimizing recalculations) (PR #249, a1842e9be46596321ee86860fd87d17a3c88f50f)
+- Test for the presence and validity of the `bins` attribute on network-, and data-splits (PR #249, c064affcfff2eb170d8bdcb39d837a7ff62b2cbd, 93051ab848ec94de138b0513dac22f6da0d20885)

### Fixed

- Reformat `event.info.1` column of issue data according to the <issue-%source-%id> format, if the content of the `event.info.1` field references another issue (PR #244, 62ff9d0f31adbefb3381936237dc4ab984e33acb)
- Fix an issue in activity-based splitting where elements close to the border of bins might be assigned to the wrong bin. The issue was caused by the usage of `split.data.time.based` inside `split.data.activity.based` to split data into the previously derived bins, when elements close to bin borders share the same timestamps. It is fixed by replacing `split.data.time.based` by `split.data.by.bins` (PR #244, ece569ceaf557bb38cd0cfad437b69b30fe8a698)
-- Remove the last range when using a sliding-window approach and the last range's elements are fully contained in the second last range (PR #244, 48ef4fa685adf6e5d85281e5b90a8ed8f6aeb197)
+- Remove the last range when using a sliding-window approach and the last range's elements are fully contained in the second last range (PR #244, 48ef4fa685adf6e5d85281e5b90a8ed8f6aeb197, 943228fbc91eed6854dacafa7075441e58b22675)
- Rename vertex attribute `IssueEvent` to `Issue` in multi-networks, to be consistent with bipartite-networks (PR #244, 26d7b7e9fd6d33d1c0a8a08f19c5c2e30346a3d9)
- Fix `get.expanded.adjacency` to work if the provided author list does not contain all authors from network and add a warning when that happens since it causes some authors from the network to be lost in the resulting matrix (PR #248, ff59017e114b10812dcfb1704a19e01fc1586a13)
- Fix `get.expanded.adjacency.matrices` to have correct names for the columns and rows (PR #248, e72eff864a1cb1a4aecd430e450d4a6a5044fdf2)
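
The bin-border problem described in the activity-based splitting fix can be reproduced with a small base-R sketch. It does not call the library's splitting functions. The data frame, the bin size of three, and all variable names are assumptions chosen for illustration.

```r
## Illustrative base-R sketch; it does not use coronet's splitting functions.
## Elements 3 and 4 share a timestamp and sit on either side of a bin border.
df = data.frame(
    id = 1:6,
    date = as.POSIXct(c("2016-01-01 10:00:00", "2016-01-01 11:00:00",
                        "2016-01-01 12:00:00", "2016-01-01 12:00:00",
                        "2016-01-01 13:00:00", "2016-01-01 14:00:00"), tz = "UTC")
)

## activity-based bins: three elements per bin, in order of occurrence
activity.amount = 3
bin.by.activity = ceiling(seq_len(nrow(df)) / activity.amount)

## bin borders derived from that assignment: the date of the first element of each bin
borders = df[["date"]][!duplicated(bin.by.activity)]

## re-splitting by time only: an element lands in the last bin whose border is not after its date
bin.by.time = findInterval(as.numeric(df[["date"]]), as.numeric(borders))

## splitting by the precomputed bin labels keeps the intended assignment
split.by.activity = split(df[["id"]], bin.by.activity)
split.by.time = split(df[["id"]], bin.by.time)
```

In this sketch, `split.by.activity` keeps element 3 in the first bin, whereas `split.by.time` moves it into the second bin because its timestamp equals the border. Splitting by the already derived bin labels instead of re-deriving membership from timestamps avoids this.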
@@ -12,6 +12,8 @@
"ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"commented";"Olaf";"olaf@example.org";"2013-06-01 06:50:26";"open";"[""unresolved""]"
"ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"resolution_updated";"Björn";"bjoern@example.org";"2013-06-01 06:53:06";"fixed";"""unresolved"""
"ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"referenced_by";"Thomas";"thomas@example.org";"2017-05-21 12:00:00";"ZEPPELIN-332";"""issue"""
"ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"add_link";"Thomas";"thomas@example.org";"2017-05-21 12:00:00";"ZEPPELIN-332";"""issue"""
"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"referenced_by";"Thomas";"thomas@example.org";"2017-05-21 12:00:00";"ZEPPELIN-328";"""issue"""
"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"add_link";"Thomas";"thomas@example.org";"2017-05-21 12:00:00";"ZEPPELIN-328";"""issue"""
"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"created";"Björn";"bjoern@example.org";"2016-07-12 16:01:30";"open";"[""unresolved""]"
"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-12 16:02:30";"open";"[""unresolved""]"
@@ -12,6 +12,8 @@
"ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"commented";"Olaf";"olaf@example.org";"2013-06-01 06:50:26";"open";"[""unresolved""]"
"ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"resolution_updated";"Björn";"bjoern@example.org";"2013-06-01 06:53:06";"fixed";"""unresolved"""
"ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"referenced_by";"Thomas";"thomas@example.org";"2017-05-21 12:00:00";"ZEPPELIN-332";"""issue"""
"ZEPPELIN-328";"[ZEPPELIN-328] Interpreter page should clarify the % magic syntax for interpreter group.name";"[""issue"", ""bug""]";"closed";"[""fixed""]";"2013-04-21 23:52:09";"2013-05-25 20:02:08";"[""GUI"", ""Interpreters""]";"add_link";"Thomas";"thomas@example.org";"2017-05-21 12:00:00";"ZEPPELIN-332";"""issue"""
"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"referenced_by";"Thomas";"thomas@example.org";"2017-05-21 12:00:00";"ZEPPELIN-328";"""issue"""
"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"add_link";"Thomas";"thomas@example.org";"2017-05-21 12:00:00";"ZEPPELIN-328";"""issue"""
"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"created";"Björn";"bjoern@example.org";"2016-07-12 16:01:30";"open";"[""unresolved""]"
"ZEPPELIN-332";"[ZEPPELIN-332] CNFE when running SQL query against Cassandra temp table";"[""issue"", ""bug""]";"open";"[""unresolved""]";"2016-07-12 16:01:30";"";"[""Interpreters""]";"commented";"Björn";"bjoern@example.org";"2016-07-12 16:02:30";"open";"[""unresolved""]"
24 changes: 20 additions & 4 deletions tests/test-networks-artifact.R
@@ -15,7 +15,7 @@
## Copyright 2017-2019 by Claus Hunsen <hunsen@fim.uni-passau.de>
## Copyright 2018 by Barbara Eckl <ecklbarb@fim.uni-passau.de>
## Copyright 2018 by Jakob Kronawitter <kronawij@fim.uni-passau.de>
-## Copyright 2023 by Maximilian Löffler <s8maloef@stud.uni-saarland.de>
+## Copyright 2023-2024 by Maximilian Löffler <s8maloef@stud.uni-saarland.de>
## Copyright 2024 by Leo Sendelbach <s8lesend@stud.uni-saarland.de>
## All Rights Reserved.

@@ -117,18 +117,34 @@ patrick::with_parameters_test_that("Network construction of an issue-based artif
type = TYPE.ARTIFACT)
 ## 2) edges
 edges = data.frame(
-    from = c("<issue-github-3>", "<issue-github-3>", "<issue-jira-ZEPPELIN-332>"),
-    to = c("<issue-github-2>", "<issue-github-6>", "<issue-jira-ZEPPELIN-328>"),
+    from = c("<issue-github-3>", "<issue-github-3>", "<issue-jira-ZEPPELIN-328>"),
+    to = c("<issue-github-2>", "<issue-github-6>", "<issue-jira-ZEPPELIN-332>"),
     date = get.date.from.string(c("2016-08-07 15:30:00", "2016-08-07 15:37:02", "2017-05-21 12:00:00")),
     artifact.type = c("IssueEvent", "IssueEvent", "IssueEvent"),
-    issue.id = c("<issue-github-3>", "<issue-github-3>", "<issue-jira-ZEPPELIN-332>"),
+    issue.id = c("<issue-github-3>", "<issue-github-3>", "<issue-jira-ZEPPELIN-328>"),
     event.name = c("add_link", "add_link", "add_link"),
     author.name = c("Thomas", "Karl", "Thomas"),
     weight = c(1, 1, 1),
     type = TYPE.EDGES.INTRA,
     relation = "issue"
 )

+## 3) when constructing directed networks, we cannot deduplicate jira edges
+if (test.directed) {
+    edges = rbind(edges, data.frame(
+        from = "<issue-jira-ZEPPELIN-332>",
+        to = "<issue-jira-ZEPPELIN-328>",
+        date = get.date.from.string("2017-05-21 12:00:00"),
+        artifact.type = "IssueEvent",
+        issue.id = "<issue-jira-ZEPPELIN-332>",
+        event.name = "add_link",
+        author.name = "Thomas",
+        weight = 1,
+        type = TYPE.EDGES.INTRA,
+        relation = "issue"
+    ))
+}

## configurations
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("issues.only.comments", FALSE)
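
The test above expects one JIRA edge in the undirected case and two in the directed case. One way to picture why is the following base-R sketch, which treats an edge and its reverse at the same date as duplicates only when direction carries no meaning. The helper `deduplicate.edges` and its key construction are assumptions for illustration and are not taken from the library.

```r
## Hypothetical helper for illustration only; not part of coronet.
deduplicate.edges = function(edges, directed) {
    if (directed) {
        ## direction carries meaning, so A -> B and B -> A are distinct edges
        return(edges)
    }
    ## build an order-independent key from the two endpoints and the date
    first = pmin(edges[["from"]], edges[["to"]])
    second = pmax(edges[["from"]], edges[["to"]])
    key = paste(first, second, edges[["date"]], sep = "|")
    ## keep the first occurrence of every key
    return(edges[!duplicated(key), , drop = FALSE])
}

edges = data.frame(
    from = c("<issue-jira-ZEPPELIN-328>", "<issue-jira-ZEPPELIN-332>"),
    to = c("<issue-jira-ZEPPELIN-332>", "<issue-jira-ZEPPELIN-328>"),
    date = c("2017-05-21 12:00:00", "2017-05-21 12:00:00"),
    stringsAsFactors = FALSE
)

undirected.edges = deduplicate.edges(edges, directed = FALSE)
directed.edges = deduplicate.edges(edges, directed = TRUE)
```

With these two mirrored rows, the undirected call keeps only the first edge (from `ZEPPELIN-328` to `ZEPPELIN-332`), while the directed call keeps both.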