-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RepositoryProvenance: Remove the originalVcsInfo
property
#3927
Conversation
33f14f3
to
bc8c1c6
Compare
I believe we need to reword the commit message to provide a more clear and convincing rationale why the ort/model/src/main/kotlin/Provenance.kt Lines 72 to 73 in e5b135d
So how can we now still match the package in that case? |
bc8c1c6
to
dfd50b5
Compare
I'll finish this PR after #3991 is in master. |
dfd50b5
to
cd8fe9c
Compare
I've added a comment to improve the KDoc for the |
cd8fe9c
to
83b586a
Compare
83b586a
to
1536a59
Compare
1536a59
to
263e5d8
Compare
@sschuberth @mnonnenmacher this is no more on hold, now ready for review again. |
model/src/main/kotlin/Provenance.kt
Outdated
@@ -82,7 +82,7 @@ data class RepositoryProvenance( | |||
|
|||
// If pkg.vcsProcessed equals originalVcsInfo or vcsInfo this provenance was definitely created when | |||
// downloading this package. | |||
if (pkg.vcsProcessed == originalVcsInfo || pkg.vcsProcessed == vcsInfo) return true | |||
if (pkg.vcsProcessed == originalVcsInfo || pkg.vcsProcessed.equalsIgnoreResolvedRevision(vcsInfo)) return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we simply omit this check instead, as what you're implementing in lines 118 / 119 basically is what we already do i lines 91 / 92, or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function I added below in line 118 / 119 does not compare revision
against resolvedRevision
, like line 91 does. I thought about the logic in lines 88 - 92 and decided to not touch it in this PR as it makes things even more complicate. I'd prefer to limit this commit to just fixing that one existing check, of course ackknowledging that this matches() function can be improved later. Would that be ok?
model/src/main/kotlin/Provenance.kt
Outdated
* for the tag again. | ||
* The original [VcsInfo] equal to the [meta-data of the package][Package.vcsProcessed], so with an empty resolved | ||
* revision (as of 835b197). It is different from [vcsInfo] only in terms of the [resolved revision] | ||
* [VcsInfo.resolvedRevision]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- All VcsInfo properties except for the resolved revision must be equal
Here, I'm having a problem with the argumentation. If the scan results says to have a scan result for id "foo" of (unresolved) revision "master", then we actually must never use than scan result, as we have no clue whether "master" changed since the time it was scanned.
In other words, shouldn't be exactly the opposite: We do not care about the revision
s, but the resolvedRevision
s must match in order to use a stored scan result.
What am I missing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message:
does not involde
"does not involve"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I'm having a problem with the argumentation. If the scan results says to have a scan result for id "foo" of (unresolved) revision "master", then we actually must never use than scan result, as we have no clue whether "master" changed since the time it was scanned.
To my understanding the implementation as-is, even before my recent work on this provenance area of code, behaves like I described it. That is, even if the revision is "master", a stored scan result can be used.
One can argue which of the approaches is better, but I believe there is no clear answer:
- In terms of correctness one could argue for any of the two options why it is better.
- In terms of execution speed, not re-downloading `ScanResult" for master clearly is better.
So, I believe the implementation as-is is a good choice. What do you think?
We do not care about the revisions, but the resolvedRevisions must match in order to use a stored scan result.
That's not possible, because a package does not have a resolvedRevision
. It would be possible, if we'd re-resolve
the revisions before each scan result look-up and / or cache the resolutions along with a separate max cache age.
To me this is a different topic separate from this PR.
263e5d8
to
4ddeb1e
Compare
Change requests addressed or commented on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed by voice.
In ORT speak the actually downloaded revision is called resolved revision. Signed-off-by: Frank Viernau <frank.viernau@here.com>
VcsInfo has a dedicated property for the revision coming from the package meta data named "revision", and another dedicated property for the revision resolved by the downloader, namely "resolvedRevision". Do not mix up these two purposes. Note: Since `Provenance.matches()` involves the comparison of the revision, the scan result storage implementation won't match "old" entries for packages without a revision in their meta-data anymore. Signed-off-by: Frank Viernau <frank.viernau@here.com>
The property `pkg.vcsProcessed.resolvedRevision` is always blank because it reflects only the package meta-data while the `resolvedRevision` is populated only by the downloader, so not part of the package meta-data. The property `vcsInfo.resolvedRevision` is never blank because it originates from the return value of a successful download of the downloader. This downloader guarantees to return a non-blank resolved revision for successful downloads from a VCS. According to the above, the `takeIf` condition always evaluates to true, so it is safe to remove it. Signed-off-by: Frank Viernau <frank.viernau@here.com>
The previously used expression (`pkg.vcsProcessed == vcsInfo`) always evaluates to false, because `vcsInfo` always has a resolved revision while `pkg.vcsProcessed` hasn't. Fix that by excluding the resolved revision from the comparison. Signed-off-by: Frank Viernau <frank.viernau@here.com>
The purpose of the provenance is to point to some code location. That location is either a source artifact, a VCS location / revision or an unknown location. The main use case in ORT for provenance is to describe what has been downloaded or scanned. For that purpose `originalVcsInfo` is irrelevant, because `vcsInfo` already describes exactly what has been downloaded. Remove `originalVcsInfo` to make provenance be usable for that single purpose. If it should become necessary to also have the original provenance a separate attribute can be added elsewhere. Note that finding a scan result for a given package does not involve the `originalVcsInfo`, in fact all its attribute values are contained also in `vcsInfo` except for the `resolvedRevision`. Signed-off-by: Frank Viernau <frank.viernau@here.com>
Use the code branches only for constructing the provenance, but not anymore for constructing the scan result. Signed-off-by: Frank Viernau <frank.viernau@here.com>
Signed-off-by: Frank Viernau <frank.viernau@here.com>
4ddeb1e
to
ee39c70
Compare
Change requests addressed or commented on.
@sschuberth @mnonnenmacher , I updated my PR to (hopefully exactly) what we discussed. |
The purpose of the provenance is to point to some code location. That
location is either a source artifact, a VCS location / revision or
an unknown location.
The main use case in ORT for provenance is to describe what has been
downloaded or scanned. For that purpose the original VCS information is
irrelevant. Remove it to make provenance be usable for that single
purpose. If it should become necessary to also have the original
provenance a separate attribute can be added elsewhere.
Note that finding a scan result for a given package does not involve the
originalVcsInfo
, in fact all its attribute values are contained alsoin
vcsInfo
except for theresolvedRevision
.Note: The change is marked as breaking, because "old" scan result storage entries for packages without a revision in their meta-data won't be matched anymore. They would then "automatically" be re-scanned.