Merge pull request #252 from Leo-Send/pullrequest
Adding commit-interactions

Reviewed-by: Thomas Bock <bockthom@cs.uni-saarland.de>
Reviewed-by: Christian Hechtl <hechtl@cs.uni-saarland.de>
hechtlC authored Apr 23, 2024
2 parents 49c0d2c + ee54b1a commit 2b38824
Showing 15 changed files with 769 additions and 15 deletions.
12 changes: 12 additions & 0 deletions NEWS.md
@@ -2,6 +2,18 @@

# coronet – Changelog

## unversioned

### Added

- Add commit-interaction data and add functions `read.commit.interactions` for reading, as well as `get.commit.interactions`, `set.commit.interactions` and utility functions for working with commit-interaction data (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, b4fd2a29c9b5fd561b1106c6febb54a32b0085ab, fd0aa05f824b93545ae8e05833b95b3bd9809286, bca35760eb0aac86c04923f2d534b2d8cece204e) as well as tests for these features (PR #252, eeba7e29932bc973513c963fb9e716e9230d570f, 8bb39f4df39b49dfaff8f19feb6db5e5fbd81fac, 54b6f655248720436af116fe72521f9cb0348429, 7a5497aaf9114017d1b3b9b68b6cccd7ca8ac114, 7b8585f87675795822c07230192d6454de31dcc7, ef725407bf8818c8fff96ea6f343338b7162cbe0)
- Add commit-interaction networks that can be created with `create.author.network` and `create.artifact.network` if the `author.relation` or the `artifact.relation`, respectively, is configured to be `commit.interaction` (PR #252, d82857fbebd1111bb16588a4223bb24a8dcd07de, 329d97ec3de36a9e1bcadc0c7a53c1d92e8b481c) as well as tests for these features (PR #252, 07e7ed744209b0251217fa8f7f35d9b9875face2, 7068cfa10d993dcae3f5e3f76f8cafa99fa8b350)
- Add helper function for prefixing function names with file names in `util-read.R` (PR #252, f8ea987b138173cf0509c7910e0572d8ee1b3f1f)

### Changed/Improved

### Fixed

## 4.4

### Announcement
13 changes: 13 additions & 0 deletions README.md
@@ -142,6 +142,8 @@ Alternatively, you can run `Rscript install.R` to install the packages.
- `jsonlite`: For parsing the issue data
- `rTensor`: For calculating EDCPTD centrality
- `Matrix`: For sparse matrix representation of large adjacency matrices
- `fastmap`: For fast implementation of a map
- `purrr`: For fast implementation of a mapping function

### Submodule

@@ -264,6 +266,11 @@ Relations determine which information is used to construct edges among the verti
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), source-code artifacts are connected when they reference each other (i.e., one artifact calls a function contained in the other artifact).
* For bipartite networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), authors get linked to all source-code artifacts they have changed in their respective commits (same as for the relation `cochange`).

- `commit.interaction`
* For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who contribute to interacting commits are connected with an edge.
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), artifacts are connected when there is an interaction between two commits that occur in the artifacts.
* This relation does not apply to bipartite networks.
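
As a rough illustration of the two network variants above, the following base-R sketch derives edge lists from a commit-interaction table. The rows reuse values from the test data of this change. The direction of the author edges (from the interacting author to the base author) and the variable names `author.edges` and `artifact.edges` are assumptions made for illustration, not coronet's actual implementation.

```r
## Commit-interaction rows, reduced to the columns needed here.
## The values mirror the test data; the full data frame has more columns.
interactions = data.frame(
    file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
    base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
    interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
    base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
    stringsAsFactors = FALSE
)

## Author relation: connect the authors of the two interacting commits.
## (The edge direction is an assumption of this sketch.)
author.edges = data.frame(
    from = interactions[["interacting.author"]],
    to = interactions[["base.author"]],
    stringsAsFactors = FALSE
)

## Artifact relation (artifact type 'file'): connect the files in which
## the two interacting commits occur.
artifact.edges = data.frame(
    from = interactions[["file"]],
    to = interactions[["base.file"]],
    stringsAsFactors = FALSE
)

print(author.edges)
print(artifact.edges)
```

In coronet itself, the relation is selected via `author.relation` or `artifact.relation` in the `NetworkConf`, and `commit.interactions` has to be enabled in the `ProjectConf`.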

#### Edge-construction algorithms for author networks

When constructing author networks, we use events in time (i.e., commits, e-mails, issue events) to model interactions among authors on the same artifact as edges. Therefore, we group the events on artifacts, based on the configured relation (see the [previous section](#relations)).
@@ -597,6 +604,12 @@ There is no way to update the entries, except for the revision-based parameters.
- `custom.event.timestamps.locked`:
* Lock custom event timestamps to prevent them from being read if empty or not yet present when calling the getter.
* [`TRUE`, *`FALSE`*]
- `commit.interactions`:
* Allow construction of author and artifact networks using commit-interaction data
* [`TRUE`, *`FALSE`*]
- `commit.interactions.filter.global`:
* Filter out entries from commit-interaction data that are not matched to a specific function or file
* [*`TRUE`*, `FALSE`]
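
A minimal sketch of what filtering global entries could look like on a plain data frame. It assumes that unmatched entries carry the placeholder `"GLOBAL"` in the `file` or `base.file` column, as in the test data of this change; the actual filtering logic in coronet may differ.

```r
## Commit-interaction rows as they appear in the test data (subset of columns).
interactions = data.frame(
    func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"),
    file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
    base.func = c("test2.c::test2", "test2.c::test2",
                  "test3.c::test_function", "test2.c::test2"),
    base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
    stringsAsFactors = FALSE
)

## Mark rows in which either side could not be matched to a specific file.
## (Assumption: such rows use the placeholder "GLOBAL".)
is.global = interactions[["file"]] == "GLOBAL" | interactions[["base.file"]] == "GLOBAL"

## Keep only the rows that are matched on both sides.
filtered = interactions[!is.global, , drop = FALSE]

print(nrow(filtered)) # 2
```

With the filter disabled, all four rows are kept; with it enabled, only the two rows matched to `test2.c` remain.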

### NetworkConf

5 changes: 4 additions & 1 deletion install.R
@@ -19,6 +19,7 @@
## Copyright 2020-2023 by Thomas Bock <bockthom@cs.uni-saarland.de>
## Copyright 2019 by Anselm Fehnker <fehnker@fim.uni-passau.de>
## Copyright 2021 by Christian Hechtl <hechtl@cs.uni-saarland.de>
## Copyright 2024 by Leo Sendelbach <s8lesend@stud.uni-saarland.de>
## All Rights Reserved.
##
## Adapted from https://github.com/siemens/codeface/blob/be382e9171fb91b4aa99b99b09b2ef64a6dba0d5/packages.r
@@ -44,7 +45,9 @@ packages = c(
     "viridis",
     "jsonlite",
     "rTensor",
-    "Matrix"
+    "Matrix",
+    "fastmap",
+    "purrr"
 )


1 change: 1 addition & 0 deletions tests/README.md
@@ -16,6 +16,7 @@ We have two test projects you can use when writing your tests:
* Commit messages
* Pasta
* Synchronicity
* Commit interactions
* Custom event timestamps in `custom-events.list`
* Revisions
2. - Casestudy: `test_empty`
Empty file.
@@ -0,0 +1,55 @@
scope: REGION
result-map:
test_function:
demangled-name: test_function
file: test3.c
num-instructions: 30
insts:
- base-hash:
region: 45620620587549
function: test_function
commit: 1143db502761379c2bfcecc2007fc34282e7ee61
repository: test-repo
interacting-hashes:
- region: 87546092348456
commit: 5a5ec9675e98187e1e92561e1888aa6f04faa338
repository: test-repo
amount: 2
callees:
- test_callee
commits:
- commit: 3383d8e5561dfc6fb2b65e0a194df94ccb5e08af
repository: test-repo
test2:
demangled-name: test2
file: test2.c
num-instructions: 26
insts:
- base-hash:
region: 50956672345141
commit: 3a0ed78458b3976243db6829f63eba3eead26774
repository: test-repo
interacting-hashes:
- region: 98750276234511
commit: 0a1a5c523d835459c42f33e863623138555e2526
repository: test-repo
amount: 1
- base-hash:
region: 67230588834344
commit: 0a1a5c523d835459c42f33e863623138555e2526
repository: test-repo
interacting-hashes:
- region: 33295067820043
function: test2
commit: 418d1dc4929ad1df251d2aeb833dd45757b04a6f
repository: test-repo
- region: 20194653678423
function: test2
commit: d01921773fae4bed8186b0aa411d6a2f7a6626e6
repository: test-repo
amount: 3
callees:
- test_callee
commits:
- commit: 3383d8e5561dfc6fb2b65e0a194df94ccb5e08af
repository: test-repo
83 changes: 83 additions & 0 deletions tests/test-data.R
@@ -20,6 +20,7 @@
## Copyright 2021 by Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
## Copyright 2022 by Jonathan Baumann <joba00002@stud.uni-saarland.de>
## Copyright 2023 by Maximilian Löffler <s8maloef@stud.uni-saarland.de>
## Copyright 2024 by Leo Sendelbach <s8lesend@stud.uni-saarland.de>
## All Rights Reserved.


@@ -98,6 +99,13 @@ test_that("Compare two ProjectData objects on empty data", {
    proj.data.two$set.project.conf.entry("commit.messages", "message")
    proj.data.two$get.commit.messages()
    expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects (commit.messages).")

    proj.data.one$set.project.conf.entry("commit.interactions", TRUE)
    proj.data.one$get.commit.interactions()
    expect_false(proj.data.one$equals(proj.data.two), "Two non-identical ProjectData objects (commit.interactions).")
    proj.data.two$set.project.conf.entry("commit.interactions", TRUE)
    proj.data.two$get.commit.interactions()
    expect_true(proj.data.one$equals(proj.data.two), "Two identical ProjectData objects (commit.interactions).")
})

test_that("Compare two ProjectData objects on non-empty data", {
@@ -511,3 +519,78 @@ test_that("Create RangeData objects from Codeface ranges and check data path", {

    expect_identical(range.paths, expected.paths, "RangeData data paths")
})

test_that("Compare two ProjectData Objects with commit.interactions", {
    ## configuration object for the datapath
    proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")
    proj.conf$update.value("commit.interactions", TRUE)
    proj.conf$update.value("commits.filter.untracked.files", FALSE)
    proj.conf$update.value("commits.filter.base.artifact", FALSE)
    proj.conf$update.value("commit.interactions.filter.global", FALSE)

    proj.data.one = ProjectData$new(project.conf = proj.conf)
    proj.data.two = proj.data.one$clone(deep = TRUE)

    ## test if the project data is equal and the commit interactions are as well
    expect_equal(proj.data.one$get.commit.interactions(), proj.data.two$get.commit.interactions())
    expect_true(proj.data.one$equals(proj.data.two))

    ## change commit interactions of one project data and assert that equality check fails
    proj.data.two$set.commit.interactions(create.empty.commit.interaction.list())
    expect_false(proj.data.one$equals(proj.data.two))

    ## change commit data in one to test if commit-interactions are correctly updated
    ## call get.commit.interactions() once to restore read interactions
    proj.data.two$get.commit.interactions()

    ## change commits in one project data
    commit.data = proj.data.one$get.commits()
    commit.data[["hash"]][[5]] = 1
    proj.data.one$set.commits(commit.data)

    ## use isTRUE to compress result of all.equal into a single boolean
    expect_false(isTRUE(all.equal(proj.data.one$get.commit.interactions(),
                                  proj.data.two$get.commit.interactions())))

    ## The data frame should still have 4 entries:
    expect_true(nrow(proj.data.one$get.commit.interactions()) == 4)
    ## after cleanup is called, the data frame should only have 3 entries:
    proj.data.one$cleanup.commit.interactions()
    expect_true(nrow(proj.data.one$get.commit.interactions()) == 3)

    ## set commit list of one project data to empty and test that last
    ## two rows of result data frame are empty
    proj.data.two$set.commits(create.empty.commits.list())

    ## create empty data frame of correct size
    commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8))
    ## assure that the correct type is used
    for(i in seq_len(8)) {
        commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]])
    }
    ## set everything except for authors as expected
    colnames(commit.interactions.data.expected) = c("commit.hash", "base.hash", "func", "file",
                                                    "base.func", "base.file", "base.author",
                                                    "interacting.author")
    commit.interactions.data.expected[["commit.hash"]] =
        c("0a1a5c523d835459c42f33e863623138555e2526",
          "418d1dc4929ad1df251d2aeb833dd45757b04a6f",
          "5a5ec9675e98187e1e92561e1888aa6f04faa338",
          "d01921773fae4bed8186b0aa411d6a2f7a6626e6")
    commit.interactions.data.expected[["base.hash"]] =
        c("3a0ed78458b3976243db6829f63eba3eead26774",
          "0a1a5c523d835459c42f33e863623138555e2526",
          "1143db502761379c2bfcecc2007fc34282e7ee61",
          "0a1a5c523d835459c42f33e863623138555e2526")
    commit.interactions.data.expected[["func"]] = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2")
    commit.interactions.data.expected[["file"]] = c("GLOBAL", "test2.c", "GLOBAL", "test2.c")
    commit.interactions.data.expected[["base.func"]] = c("test2.c::test2", "test2.c::test2",
                                                         "test3.c::test_function", "test2.c::test2")
    commit.interactions.data.expected[["base.file"]] = c("test2.c", "test2.c", "test3.c", "test2.c")

    expect_equal(proj.data.two$get.commit.interactions(), commit.interactions.data.expected)

    ## reactivate filtering of commit interactions
    proj.data.two$set.project.conf.entry("commit.interactions.filter.global", TRUE)
    expect_true(nrow(proj.data.two$get.commit.interactions()) == 2)
})
98 changes: 98 additions & 0 deletions tests/test-networks-artifact.R
@@ -212,3 +212,101 @@ patrick::with_parameters_test_that("Network construction of an empty 'comments-o
    "directed: FALSE" = list(test.directed = FALSE),
    "directed: TRUE" = list(test.directed = TRUE)
))

patrick::with_parameters_test_that("Network construction with commit-interactions as relation, artifact type 'file'", {
    ## configuration object for the datapath
    proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "file")
    proj.conf$update.value("commit.interactions", TRUE)
    proj.conf$update.value("commits.filter.untracked.files", FALSE)
    proj.conf$update.value("commits.filter.base.artifact", FALSE)
    proj.conf$update.value("commit.interactions.filter.global", FALSE)
    proj.data = ProjectData$new(project.conf = proj.conf)

    net.conf = NetworkConf$new()
    net.conf$update.values(updated.values = list(artifact.relation = "commit.interaction",
                                                 artifact.directed = test.directed))

    network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf)
    network.built = network.builder$get.artifact.network()
    ## build the expected network
    vertices = data.frame(
        name = c("test2.c", "test3.c", "GLOBAL"),
        kind = "File",
        type = TYPE.ARTIFACT
    )
    edges = data.frame(
        from = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
        to = c("test2.c", "test2.c", "test3.c", "test2.c"),
        func = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"),
        hash = c("0a1a5c523d835459c42f33e863623138555e2526",
                 "418d1dc4929ad1df251d2aeb833dd45757b04a6f",
                 "5a5ec9675e98187e1e92561e1888aa6f04faa338",
                 "d01921773fae4bed8186b0aa411d6a2f7a6626e6"),
        base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774",
                      "0a1a5c523d835459c42f33e863623138555e2526",
                      "1143db502761379c2bfcecc2007fc34282e7ee61",
                      "0a1a5c523d835459c42f33e863623138555e2526"),
        base.func = c("test2.c::test2", "test2.c::test2",
                      "test3.c::test_function", "test2.c::test2"),
        base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
        interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
        weight = c(1, 1, 1, 1),
        type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
        relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
    )
    network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices)

    expect_true(igraph::identical_graphs(network.built, network))
}, patrick::cases(
    "directed: FALSE" = list(test.directed = FALSE),
    "directed: TRUE" = list(test.directed = TRUE)
))

patrick::with_parameters_test_that("Network construction with commit-interactions as relation, artifact type 'function'", {
    ## configuration object for the datapath
    proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, "function")
    proj.conf$update.value("commit.interactions", TRUE)
    proj.conf$update.value("commits.filter.untracked.files", FALSE)
    proj.conf$update.value("commits.filter.base.artifact", FALSE)
    proj.conf$update.value("commit.interactions.filter.global", FALSE)
    proj.data = ProjectData$new(project.conf = proj.conf)

    net.conf = NetworkConf$new()
    net.conf$update.values(updated.values = list(artifact.relation = "commit.interaction",
                                                 artifact.directed = test.directed))

    network.builder = NetworkBuilder$new(project.data = proj.data, network.conf = net.conf)
    network.built = network.builder$get.artifact.network()
    ## build the expected network
    vertices = data.frame(
        name = c("test2.c::test2", "test3.c::test_function", "GLOBAL"),
        kind = "Function",
        type = TYPE.ARTIFACT
    )
    edges = data.frame(
        from = c("GLOBAL", "test2.c::test2", "GLOBAL", "test2.c::test2"),
        to = c("test2.c::test2", "test2.c::test2",
               "test3.c::test_function", "test2.c::test2"),
        hash = c("0a1a5c523d835459c42f33e863623138555e2526",
                 "418d1dc4929ad1df251d2aeb833dd45757b04a6f",
                 "5a5ec9675e98187e1e92561e1888aa6f04faa338",
                 "d01921773fae4bed8186b0aa411d6a2f7a6626e6"),
        file = c("GLOBAL", "test2.c", "GLOBAL", "test2.c"),
        base.hash = c("3a0ed78458b3976243db6829f63eba3eead26774",
                      "0a1a5c523d835459c42f33e863623138555e2526",
                      "1143db502761379c2bfcecc2007fc34282e7ee61",
                      "0a1a5c523d835459c42f33e863623138555e2526"),
        base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
        base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
        interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
        weight = c(1, 1, 1, 1),
        type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
        relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
    )
    network = igraph::graph.data.frame(edges, directed = test.directed, vertices = vertices)

    expect_true(igraph::identical_graphs(network.built, network))
}, patrick::cases(
    "directed: FALSE" = list(test.directed = FALSE),
    "directed: TRUE" = list(test.directed = TRUE)
))