Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix edge-attribute handling when simpliyfing networks #251

Closed
bockthom opened this issue Feb 8, 2024 · 8 comments
Closed

Fix edge-attribute handling when simpliyfing networks #251

bockthom opened this issue Feb 8, 2024 · 8 comments

Comments

@bockthom
Copy link
Collaborator

bockthom commented Feb 8, 2024

In PR #250, we introduce a new possibility for simplifying multi-relation edges. However, there are no tests yet for this new feature, and also the edge-attribute handling still needs to be adjusted accordingly. Hence, in this issue, I describe the missing steps that should be approached after PR #250 is merged:

Let me start with an example:
Assume that we have an author network containing two vertices A and B, which has multiple relations, namely "mail" and "cochange". Assume that there are three cochange edges and four mail edges between A and B.
Then the default simplification reduced the three cochange edges to one cochange edge, and the four mail edges to one mail edge. The new "simplify.multiple.relations" parameter instead simplifies the three cochange and four mail at edges to one single edges that represents all mail and cochange edges together.

Problem: Edge attributes need to be handled properly for the new case. The edge-attribute handling for simplification is specified here:

coronet/util-networks.R

Lines 55 to 70 in a0c3a52

EDGE.ATTR.HANDLING = list(
## network-analytic data
weight = "sum",
type = "first",
relation = "first",
## commit data
changed.files = "sum",
added.lines = "sum",
deleted.lines = "sum",
diff.size = "sum",
artifact.diff.size = "sum",
## everything else
"concat"
)

Currently, we only take the "first" relation - which was not a problem, was only edges of the same relation were simplified to one edge. Now, when we simplify edges of different relations to the same edge, "first" is wrong, as we need a unique vector of all relations instead of just taking the first one. Unfortunately, the edge-attribute handling of igraph does not provide a dedicated function for that. However, igraph allows to provide a custom function that is applied when handling edge attributes during simplification. Hence, such a custom function is necessary for relation here.

However, changing the edge-attribute handling will also break some other parts of coronet, which assume that the edge attribute relation are only a single element - but now it also can be a vector. For example, this might break the following lines, in which the data sources are determined from the relations - this might break if there are vectors of relations instead of single elements:

coronet/util-networks.R

Lines 1673 to 1689 in a0c3a52

get.data.sources.from.relations = function(network) {
## get all relations in the network
data.sources = unique(igraph::E(network)$relation)
## map them to data sources respectively using the defined translation constant
data.sources = sapply(data.sources, function(relation) {
## check for a \code{character(0)} relation and abort if there is one
if (length(relation) == 0) {
logging::logwarn("There seems to be an empty relation in the network. Cannot proceed.")
return (NA)
}
## use the translation constant to get the appropriate data source
return(RELATION.TO.DATASOURCE[[relation]])
}, USE.NAMES = FALSE) # avoid element names as we only want the data source's name
return(data.sources)

It is important, that the new functionality is backwards compatible, this is, if the default simplification procedure is used, then everything should work as before, and only if there are multiple relations combined into a single edge, we need to find a way that all the functions are also able to deal with a vector of relations instead of a single edge.

In addition, PR #250 also lacks of tests. So proper tests should also be added 😉

@MaLoefUDS
Copy link
Contributor

I have implemented a simple disambiguation of the relation attribute here. After executing all tests, I did not find any breaking sections in coronet (as of right now, I did not add any tests myself for this). Especially for the case you mentioned, im not sure why it should break. Manual inspection showed it working just fine 😄

@bockthom
Copy link
Collaborator Author

bockthom commented Feb 22, 2024

I have implemented a simple disambiguation of the relation attribute here.

I've never used modifyList, but it seems to be helpful :) However, I am not sure if your implementation works as intended in all cases. One characteristic that I forgot mentioning before was sorting: We need to make sure that c("cochange, "issue") and c("issue","cochange") are treated equal, which is why we should sort the unique vectors afterwards, to be on the safe side.

After executing all tests, I did not find any breaking sections in coronet (as of right now, I did not add any tests myself for this). Especially for the case you mentioned, im not sure why it should break. Manual inspection showed it working just fine 😄

Hm, I am not convinced yet. In the case mentioned above, the following line returns different values in different cases:

data.sources = unique(igraph::E(network)$relation) 

If every edge represents a single relation, then data.sources can be the following list:
list("issue", "mail", "cochange")
However, if we simplify multi-relation edges, it can be the following list:
list(c("cochange", "issue"), c("issue"), c("issue","mail"), c("cochange", "issue", "mail"))
In this case, we have four edges with four different unique relations, but this does not resemble the data sources, but a combination of them. I am not sure if the subsequent sapply call (see the cited code snippet above) can deal with these vectors that consist of multiple elements properly. Please check again what happens in this function in such a case.

MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Feb 28, 2024
PR se-sic#250 introduces the 'simplify.multiple.relations' configuration option,
which can alter the edge-simplification algorithm for multi-networks.
Correspondingly, the edge-attribute handling does need adjustment to
work with this configuration option.

Disambiguate and sort the 'relation' edge-attribute when simplifying
multi-networks with the 'simplify.multiple.relations' option set.

This works towards se-sic#251.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Feb 28, 2024
As the 'relation' edge-attribute potentially holds a list of relation
sources, it needs to be unlisted first before it can be disambiguated.

This works towards se-sic#251.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
@MaLoefUDS
Copy link
Contributor

Oh by the way, I have fixed that one function affected by the changes to the relation attribute. I forgot that pushes to that branch do not notify anyone of you, but please look at this to oversee the fix. I have started to working on tests for the new simplification and will let you know more details in our next meeting 😄

@bockthom
Copy link
Collaborator Author

bockthom commented Mar 5, 2024

I had a quick look at the last two commits on the referenced branch. Looks good to me!
Regarding the edge attribute handling, an additional idea comes to my mind now: Wouldn't it be possible to move function relation.handling directly to the constant EDGE.ATTR.HANDLING? If this would be possible, then we would make sure that this is handled the same way whenever EDGE.ATTR.HANDLING is used (wherever this will happen in the future), and we would get rid of the modifyList call here. Just an idea, not sure if it works, but it is worth a try 😄

@MaLoefUDS
Copy link
Contributor

Im not entirely sure, I understand what you mean. Obviously, we cannot move relation.handling into EDGE.ATTR.HANDLING's definition, as it is not always the way we want to work. So I assume you mean that at the point of simplifying with the simplify.multiple.relations parameter set, then we overwrite the default relation handling function with relation.handling?

If I assumed correctly: I used modifyList especially because it does not alter the EDGE.ATTR.HANDLING object but returns a new one, because I did not want to break things by accident. However, I see your point in that it would provide benefit when overriding it at this point. So I checked it out in code and found that even when overriding it by doing EDGE.ATTR.HANDLING$relation = relation.handling it will not persist. I tried having two consecutive calls to simplify.network in one test and even though overriding works in both calls individually, it does not persist into the next calls, i.e., in the second call to simplify.network, EDGE.ATTR.HANDLING will have returned to its default. I am not sure what R does there and if it can still be done, maybe you know more :).

@bockthom
Copy link
Collaborator Author

bockthom commented Mar 6, 2024

Obviously, we cannot move relation.handling into EDGE.ATTR.HANDLING's definition, as it is not always the way we want to work.

Why not? I don't see any situation at which changing the definition of EDGE.ATTR.HANDLING would break. Calling "sort" and "unique" should not change anything in the previously existing way of simplification. But maybe I overlook something that I don't have in mind - do you have a certain situation in mind in which replacing the definition would break?

@MaLoefUDS
Copy link
Contributor

Well I supposed when simplifying multi-relational networks the classical way, but upon looking at the code, I think it would retain functionality. Apart from that, EDGE.ATTR.HANDLING is only ever accessed in one other location:

coronet/util-networks.R

Lines 910 to 915 in a0c3a52

} else if (!igraph::is.directed(authors.net) && igraph::is.directed(artifacts.net)) {
logging::logwarn(paste0("Author network is undirected, but artifact network is not.",
"Converting artifact network..."))
artifacts.net = igraph::as.undirected(artifacts.net, mode = "each",
edge.attr.comb = EDGE.ATTR.HANDLING)
}

I am not sure on the effects of changing the relation attribute in this call.

@bockthom
Copy link
Collaborator Author

bockthom commented Mar 6, 2024

According to the documentation of igraph::as.undirected, edge.attr.comb should only be effective if mode is "collapse" or "mutual". As "each" is used in this case, edge.attr.comb should not have any effect. Even when we would assume that we would use another mode here, I don't see any reason why the relation in artifact networks should be handled differently than in author networks.

MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Mar 7, 2024
As the 'relation' edge-attribute can only be singular values if
'simplify.multiple.relations' is not set, we can set the previously
introduced lambda as default.

This works towards se-sic#251.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Mar 7, 2024
PR se-sic#250 introduces the 'simplify.multiple.relations' configuration option,
which can alter the edge-simplification algorithm for multi-networks.
Correspondingly, the edge-attribute handling does need adjustment to
work with this configuration option.

Disambiguate and sort the 'relation' edge-attribute when simplifying
multi-networks with the 'simplify.multiple.relations' option set.

This works towards se-sic#251.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Mar 7, 2024
As the 'relation' edge-attribute potentially holds a list of relation
sources, it needs to be unlisted first before it can be disambiguated.

This works towards se-sic#251.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
MaLoefUDS added a commit to MaLoefUDS/coronet that referenced this issue Mar 7, 2024
As the 'relation' edge-attribute can only be singular values if
'simplify.multiple.relations' is not set, we can set the previously
introduced lambda as default.

This works towards se-sic#251.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants