Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RepositoryProvenance: Remove the originalVcsInfo property #3927

Merged
merged 7 commits into from
May 26, 2021

Conversation

fviernau
Copy link
Member

@fviernau fviernau commented Apr 23, 2021

The purpose of the provenance is to point to some code location. That
location is either a source artifact, a VCS location / revision or
an unknown location.

The main use case in ORT for provenance is to describe what has been
downloaded or scanned. For that purpose the original VCS information is
irrelevant. Remove it to make provenance be usable for that single
purpose. If it should become necessary to also have the original
provenance a separate attribute can be added elsewhere.

Note that finding a scan result for a given package does not involve the
originalVcsInfo, in fact all its attribute values are contained also
in vcsInfo except for the resolvedRevision.

Note: The change is marked as breaking, because "old" scan result storage entries for packages without a revision in their meta-data won't be matched anymore. They would then "automatically" be re-scanned.

@fviernau fviernau requested review from tsteenbe and a team as code owners April 23, 2021 07:59
@fviernau fviernau force-pushed the ditch-original-vcs branch 3 times, most recently from 33f14f3 to bc8c1c6 Compare April 23, 2021 10:32
@sschuberth
Copy link
Member

I believe we need to reword the commit message to provide a more clear and convincing rationale why the originalVcsInfo field is not needed anymore. In particular as the field's documentation states:

* would be no way to match the package to the [Provenance] without downloading the source code and searching
* for the tag again.

So how can we now still match the package in that case?

model/src/main/kotlin/Provenance.kt Outdated Show resolved Hide resolved
model/src/main/kotlin/ScanResult.kt Outdated Show resolved Hide resolved
@fviernau
Copy link
Member Author

fviernau commented May 6, 2021

I'll finish this PR after #3991 is in master.

@fviernau
Copy link
Member Author

fviernau commented May 7, 2021

I believe we need to reword the commit message to provide a more clear and convincing rationale why the originalVcsInfo field is not needed anymore. In particular as the field's documentation states:

* would be no way to match the package to the [Provenance] without downloading the source code and searching
* for the tag again.

So how can we now still match the package in that case?

I've added a comment to improve the KDoc for the originalVcsInfo property, to make it less mis-leading before removing it.
I've also added a short explanation to the commit message of the commit which removes originalVcsInfo. @sschuberth please see if you like it.

@fviernau fviernau requested a review from sschuberth May 7, 2021 10:58
@fviernau fviernau added the breaking change Pull requests that break compatibility with previous behavior label May 7, 2021
@fviernau
Copy link
Member Author

@sschuberth @mnonnenmacher this is no more on hold, now ready for review again.

downloader/src/main/kotlin/Downloader.kt Outdated Show resolved Hide resolved
@@ -82,7 +82,7 @@ data class RepositoryProvenance(

// If pkg.vcsProcessed equals originalVcsInfo or vcsInfo this provenance was definitely created when
// downloading this package.
if (pkg.vcsProcessed == originalVcsInfo || pkg.vcsProcessed == vcsInfo) return true
if (pkg.vcsProcessed == originalVcsInfo || pkg.vcsProcessed.equalsIgnoreResolvedRevision(vcsInfo)) return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we simply omit this check instead, as what you're implementing in lines 118 / 119 basically is what we already do i lines 91 / 92, or?

Copy link
Member Author

@fviernau fviernau May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function I added below in line 118 / 119 does not compare revision against resolvedRevision, like line 91 does. I thought about the logic in lines 88 - 92 and decided to not touch it in this PR as it makes things even more complicate. I'd prefer to limit this commit to just fixing that one existing check, of course ackknowledging that this matches() function can be improved later. Would that be ok?

* for the tag again.
* The original [VcsInfo] equal to the [meta-data of the package][Package.vcsProcessed], so with an empty resolved
* revision (as of 835b197). It is different from [vcsInfo] only in terms of the [resolved revision]
* [VcsInfo.resolvedRevision].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. All VcsInfo properties except for the resolved revision must be equal

Here, I'm having a problem with the argumentation. If the scan results says to have a scan result for id "foo" of (unresolved) revision "master", then we actually must never use than scan result, as we have no clue whether "master" changed since the time it was scanned.

In other words, shouldn't be exactly the opposite: We do not care about the revisions, but the resolvedRevisions must match in order to use a stored scan result.

What am I missing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message:

does not involde

"does not involve"

Copy link
Member Author

@fviernau fviernau May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I'm having a problem with the argumentation. If the scan results says to have a scan result for id "foo" of (unresolved) revision "master", then we actually must never use than scan result, as we have no clue whether "master" changed since the time it was scanned.

To my understanding the implementation as-is, even before my recent work on this provenance area of code, behaves like I described it. That is, even if the revision is "master", a stored scan result can be used.
One can argue which of the approaches is better, but I believe there is no clear answer:

  • In terms of correctness one could argue for any of the two options why it is better.
  • In terms of execution speed, not re-downloading `ScanResult" for master clearly is better.

So, I believe the implementation as-is is a good choice. What do you think?

We do not care about the revisions, but the resolvedRevisions must match in order to use a stored scan result.

That's not possible, because a package does not have a resolvedRevision. It would be possible, if we'd re-resolve
the revisions before each scan result look-up and / or cache the resolutions along with a separate max cache age.
To me this is a different topic separate from this PR.

@fviernau fviernau requested a review from sschuberth May 11, 2021 10:09
@fviernau fviernau dismissed sschuberth’s stale review May 12, 2021 10:06

Change requests addressed or commented on.

Copy link
Member

@sschuberth sschuberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed by voice.

In ORT speak the actually downloaded revision is called resolved
revision.

Signed-off-by: Frank Viernau <frank.viernau@here.com>
VcsInfo has a dedicated property for the revision coming from the
package meta data named "revision", and another dedicated property for
the revision resolved by the downloader, namely "resolvedRevision".
Do not mix up these two purposes.

Note: Since `Provenance.matches()` involves the comparison of the
revision, the scan result storage implementation won't match "old"
entries for packages without a revision in their meta-data anymore.

Signed-off-by: Frank Viernau <frank.viernau@here.com>
The property `pkg.vcsProcessed.resolvedRevision` is always blank because
it reflects only the package meta-data while the `resolvedRevision` is
populated only by the downloader, so not part of the package meta-data.

The property `vcsInfo.resolvedRevision` is never blank because it
originates from the return value of a successful download of the
downloader. This downloader guarantees to return a non-blank resolved
revision for successful downloads from a VCS.

According to the above, the `takeIf` condition always evaluates to
true, so it is safe to remove it.

Signed-off-by: Frank Viernau <frank.viernau@here.com>
The previously used expression (`pkg.vcsProcessed == vcsInfo`) always
evaluates to false, because `vcsInfo` always has a resolved revision
while `pkg.vcsProcessed` hasn't. Fix that by excluding the resolved
revision from the comparison.

Signed-off-by: Frank Viernau <frank.viernau@here.com>
The purpose of the provenance is to point to some code location. That
location is either a source artifact, a VCS location / revision or
an unknown location.

The main use case in ORT for provenance is to describe what has been
downloaded or scanned. For that purpose `originalVcsInfo` is
irrelevant, because `vcsInfo` already describes exactly what has been
downloaded. Remove `originalVcsInfo` to make provenance be usable for
that single purpose. If it should become necessary to also have the
original provenance a separate attribute can be added elsewhere.

Note that finding a scan result for a given package does not involve the
`originalVcsInfo`, in fact all its attribute values are contained also
in `vcsInfo` except for the `resolvedRevision`.

Signed-off-by: Frank Viernau <frank.viernau@here.com>
Use the code branches only for constructing the provenance, but not
anymore for constructing the scan result.

Signed-off-by: Frank Viernau <frank.viernau@here.com>
Signed-off-by: Frank Viernau <frank.viernau@here.com>
@fviernau fviernau requested a review from sschuberth May 21, 2021 10:23
@fviernau fviernau dismissed sschuberth’s stale review May 21, 2021 10:24

Change requests addressed or commented on.

@fviernau
Copy link
Member Author

fviernau commented May 21, 2021

@sschuberth @mnonnenmacher , I updated my PR to (hopefully exactly) what we discussed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Pull requests that break compatibility with previous behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants