-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow curations to be applied without rerunning the analyzer #6188
Comments
BTW, depending on the implementation, this could also solve #5637. |
Suggested name for the new ORT result section that could contain package curations, package configurations and resolutions: "corrections" |
To ensure consistency within a pipeline of the ORT tools it is important that configuration is only applied or replaced before it is used as input for a tool. For example, replacing package curations after running the scanner could lead to inconsistencies if the provenance metadata of a package is changed and as a result the scan result for that package does not match the provenance anymore. A summary of the order in which tools should be run and configuration should be added to the result file to ensure consistency in later tools in the pipeline (package curations are split into technical "metadata corrections" and "legal curations" based on the idea from #6187): Analyzer |
Based on my above comment I propose that we extend the
For both solutions it would be possible to add commands that replace the existing configuration, for example one command could update the curations in an @oss-review-toolkit/core-devs Any preference which direction we should take? |
I'm having a hard time to decide / make up an opinion:
On the one hand I like this as things that seemingly belong together are stored together, and we already have
This would work around the question how to deal with configuration used by different tools, and better match the also already existing So we already have a mix of (primarily) storing configuration by origin, or by consumer. Maybe we should in this context also consider to make the "origin" of configuration a property of the configuration itself. |
I have a tendency towards a top-level section for those properties, but I think we need a better name than "configuration" to separate it from the "tool configuration" in config.yml and .ort.yml. Above "corrections" was suggested, this works for curations and package configurations but not for resolutions. So maybe two top-level sections
I agree, at least for corrections and resolutions. For example, for curations we should store the provider. But for the tool configurations this might not be possible, because it can come from multiple sources like environment variables, command line parameters or config.yml, and this could be different for each property. To make a draft: ort:
...
corrections:
packageCurations:
...
packageConfigurations:
...
resolutions:
issues:
...
ruleViolations:
...
vulnerabilities:
... |
I want to bring to attention the following use case relevant for support workflows, e.g. when using
This IMO should be as fast as possible, so I fear if we
What do you think? edit: I've decided to write this down to an issue, see oss-review-toolkit/orthw-shell#60. |
The problem with that workflow is that depending on the curation changes it could not be sufficient to only re-run the reporter, but it might be required to also re-run the advisor, scanner, and evaluator to get correct results.
I think the curations should still be fetched by the analyzer command to not always require an extra step to fetch curations. The benefit of the separate command is that can be run independent of which other tool needs to be run afterwards, so it could be used to fix a purl before running the advisor, to fix a VCS URL before running the scanner, and so on. But yes, it introduces the overhead of an additional serialization and deserialization.
To implement this correctly #5668 needs to be done first. |
That's only for provenance curations. The workflow still provides a lot of value, even when these provenance curations don't take effect. It targets: fixing issues using data which has already slowly been gathered (not re-gathering all data from scratch again to be up-to-date) . |
It's not only about provenance, for the advisor purl curations could be relevant and for the evaluator basically any curated property could affect rule violations. I somehow assumed that when you wrote "Recompute the reports" that this would also include at least re-running the the evaluator, because the results of only re-running the reporter after changing curations should be completely predictable. My point is, I'm fine if orthw supports only a specific use-case, but ORT should support all of them. |
The use case for re-applying curations in the evaluator stage would work with the following approach in a general way which IMO would be quite nice
The above method can be implemented in a separate dedicated command as well as in the evaluator, which should be possible without any code duplication. Requirement: Each curation in the ORT file must be associated with a provider ID where it came from. |
Agree that ORT should support all of them. My point was more that ORT should also keep supporting seeing the effect of local changes quickly, which has been so far possible with |
I believe the we should make a solution which works for solving [1][2][3] altogether.
[1] oss-review-toolkit/orthw-shell#60 Note: I'm not certain whether corrections fit |
Proposed ORT result model as discussed in the developer meeting: ort:
resolvedConfiguration:
packageCurations:
providers:
- id: clearly-defined
metadata:
serverUrl: https://...
revision: abc
- id: local-file
metadata:
filename: curations.yml
data:
clearly-defined:
- curation1
- curation2
local-file:
- curation1
packageConfigurations:
...
issueResolutions:
...
ruleViolationResolutions:
...
vulnerabilityResolutions:
...
The metadata would come from a new function My preference for setting the id in config.yml would be: packageCurationProviders:
- name: File
id: 'ort-ort-config'
config:
path: '~/devel/ort-config'
- name: File
id: 'my-org-ort-config'
config:
path: '~/devel/my-org-ort-config' |
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
The `OrtResult` does not store the uncurated packages as part of the analyzer result, but only the curated packages along with the applied package curation data. This tightly couples the curations with the analyzer without need, because the analyzer does not need (to consume) any curations at all. Also, computing the respective uncurated package from each curated package is not always possible due to missing data [1]. So, curations currently cannot properly be (re-applied) without re-running the analyzer [2]. Furthermore, the current representation stores package curation data redundantly in case the curation applies to multiple packages. Given that, it makes sense to store the curations separately from the uncurated package. So, utilize the new toplevel `resolvedConfiguration` to store the package curations and change the analyzer result to contain uncurated instead of curated packages. Note that this partially implements [1] and [2]. Adjusting the logic which turns curated into uncurated packages, e.g. `toUncuratedPackage()`, is left for a future change to limit the size of this change. Apart from that [3] can be implemented by relatively easily without redundantly encoding the provider (for each curation data). [1] #5637 [2] #6188 [3] #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [1] and also mentioned in [2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [1]: #6188 (comment) [2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
Extend `ResolvedConfiguration` to associate the curations with the ID of the package curation provider to enable tracability of curations back to the provider. The separate `ResolvedConfiguration.provider` list is introduced to align with the idea of adding provider metadata, as outlined in [^1] and also mentioned in [^2]. This implementation also is a first step towards use cases involving: 1. Replacing the curations for a given provider ID with the given ones. 2. Re-resolve curations only for a particular provider ID. Both are left as TODO for future changes to limit the size of this change, while for 1. a TODO comment is left in the code. Fixes #5668. [^1] #6188 (comment) [^2]: #5668 Signed-off-by: Frank Viernau <frank_viernau@epam.com>
@mnonnenmacher @fviernau I lost a bit track of this since we've merged the resolved configuration stuff / the new way of storing curations in the ORT result. But going forward, how exactly do we plan to apply updated curations? IMO the partly implemented current approach of curation-override options per tool does not scale. I'd much more like to see a tool / command that can update the curations in a result file, and then that updated result file can be passed to other tools without specifying any override options. |
This option can be used to rather quickly check whether packages from an analyzer result can be downloaded without actually running the scanner / downloader. As such the option can also be used to more quickly verify curations after (re-)applying them to the analyzer result. For the latter, a proper solution yet needs to be implemented, see [1]. Note that the implementation is not complete yet. E.g. not all cases where a real download would succeed can be verified, as guessing revisions while keeping downloads to a minimum is difficult to implement for a dry run. [1]: #6188 Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
This option can be used to rather quickly check whether packages from an analyzer result can be downloaded without actually running the scanner / downloader. As such the option can also be used to more quickly verify curations after (re-)applying them to the analyzer result. For the latter, a proper solution yet needs to be implemented, see [1]. Note that the implementation is not complete yet. E.g. not all cases where a real download would succeed can be verified, as guessing revisions while keeping downloads to a minimum is difficult to implement for a dry run. [1]: #6188 Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Currently my preferred approach would be to introduce a new ORT CLI command like |
Ok, good, so we're in line about having a new command.
We already have the |
It can make the command difficult to use and implement if it can be used for two different things. For example, which options are relevant for which use case? |
That could be solved via prefixes to options (though I agree that might not be the nicest user experience), or we could use sub-subcommands, like the helper-cli already does. |
This option can be used to rather quickly check whether packages from an analyzer result can be downloaded without actually running the scanner / downloader. As such the option can also be used to more quickly verify curations after (re-)applying them to the analyzer result. For the latter, a proper solution yet needs to be implemented, see [1]. Note that the implementation is not complete yet. E.g. not all cases where a real download would succeed can be verified, as guessing revisions while keeping downloads to a minimum is difficult to implement for a dry run. [1]: #6188 Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
This option can be used to rather quickly check whether packages from an analyzer result can be downloaded without actually running the scanner / downloader. As such the option can also be used to more quickly verify curations after (re-)applying them to the analyzer result. For the latter, a proper solution yet needs to be implemented, see [1]. Note that the implementation is not complete yet. E.g. not all cases where a real download would succeed can be verified, as guessing revisions while keeping downloads to a minimum is difficult to implement for a dry run. [1]: #6188 Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
This option can be used to rather quickly check whether packages from an analyzer result can be downloaded without actually running the scanner / downloader. As such the option can also be used to more quickly verify curations after (re-)applying them to the analyzer result. For the latter, a proper solution yet needs to be implemented, see [1]. Note that the implementation is not complete yet. E.g. not all cases where a real download would succeed can be verified, as guessing revisions while keeping downloads to a minimum is difficult to implement for a dry run. [1]: #6188 Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Currently, the turn around-times to test (technical) curations (also see #6187) locally or on CI are rather high as curations are "baked" into Analyzer results. This means one needs to rerun the analyzer, even if the previous analysis was successful, just in order to get updated curation data applied to its results.
Just brainstorming some ideas how to address this (without any ordering implied):
The text was updated successfully, but these errors were encountered: