Allow curations to be applied without rerunning the analyzer #6188

Open
sschuberth opened this issue Dec 7, 2022 · 19 comments

@sschuberth
Member

sschuberth commented Dec 7, 2022

Currently, the turnaround times to test (technical) curations (also see #6187) locally or on CI are rather high, as curations are "baked" into Analyzer results. This means one needs to rerun the analyzer, even if the previous analysis was successful, just to get updated curation data applied to its results.

Just brainstorming some ideas how to address this (without any ordering implied):

  • Do not store curations as part of Analyzer results at all, but create a new "Curator" tool that takes the Analyzer result as input and adds curation data to a separate section in the ORT result file, similar to how every tool other than the Analyzer works.
  • Create a helper command that can patch up analyzer results with updated / different curations. Actually, there already is the PackageCurationsCommand's SetCommand, which could probably be used for this.
@sschuberth sschuberth added enhancement Issues that are considered to be enhancements analyzer About the analyzer tool model About the data model labels Dec 7, 2022
@sschuberth sschuberth changed the title from "Allow curations to get applied without rerunning the analyzer" to "Allow curations to be applied without rerunning the analyzer" Dec 12, 2022
@sschuberth
Member Author

BTW, depending on the implementation, this could also solve #5637.

@mnonnenmacher
Member

Suggested name for the new ORT result section that could contain package curations, package configurations and resolutions: "corrections"

@mnonnenmacher
Member

mnonnenmacher commented Dec 19, 2022

To ensure consistency within a pipeline of the ORT tools it is important that configuration is only applied or replaced before it is used as input for a tool. For example, replacing package curations after running the scanner could lead to inconsistencies if the provenance metadata of a package is changed and as a result the scan result for that package does not match the provenance anymore.

A summary of the order in which tools should be run and configuration should be added to the result file to ensure consistency in later tools in the pipeline (package curations are split into technical "metadata corrections" and "legal curations" based on the idea from #6187):

Analyzer
-> package metadata corrections
-> Scanner (uses provenance corrections), Advisor (potentially uses identifier corrections, e.g. purl)
-> package configurations, issue resolutions, vulnerability resolutions, legal curations
-> Evaluator
-> rule violation resolutions
-> Reporter
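
A minimal sketch of how this ordering could be used to decide which tools need to be re-run after a given kind of configuration changes. The stage names, the `FIRST_CONSUMER` table and the `stages_to_rerun` helper are hypothetical illustrations and not part of ORT; the scanner and advisor are simply listed one after the other here, although both follow the metadata corrections.

```python
# Hypothetical sketch: none of these names exist in ORT.
PIPELINE = ["analyzer", "scanner", "advisor", "evaluator", "reporter"]

# Earliest pipeline stage that consumes each kind of configuration,
# following the ordering summarized above.
FIRST_CONSUMER = {
    "package metadata corrections": "scanner",
    "package configurations": "evaluator",
    "issue resolutions": "evaluator",
    "vulnerability resolutions": "evaluator",
    "legal curations": "evaluator",
    "rule violation resolutions": "reporter",
}


def stages_to_rerun(changed_kind: str) -> list[str]:
    """Return the stages that should be re-run after `changed_kind` changes."""
    start = PIPELINE.index(FIRST_CONSUMER[changed_kind])
    return PIPELINE[start:]
```

With this table, a change to legal curations would only require the evaluator and reporter, while a change to package metadata corrections would require everything after the analyzer.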

@mnonnenmacher
Member

Based on my above comment I propose that we extend the OrtResult model to contain the configuration used by each tool. I see two options:

  • Extend the section for each tool to contain the configuration consumed by this tool. For example, the AnalyzerRun could contain the list of curations, or the EvaluatorRun the lists of PackageConfigurations and resolutions used during the run.
  • Extend the OrtResult to contain a separate section which contains all of the configuration values. For example, the analyzer adds curations to this model and later tools like the scanner or advisor only use curations from there but not from any other sources.

For both solutions it would be possible to add commands that replace the existing configuration, for example one command could update the curations in an OrtResult and also check if it already contains data that depends on them, like a scan result, and in this case either fail or print a warning that the result might be inconsistent.
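
A rough sketch of such a replace command, assuming a heavily simplified result model. The `OrtResult` class below is a stand-in with made-up fields, and `replace_curations` and `InconsistentResultError` are hypothetical names, not ORT's API.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class OrtResult:
    # Simplified stand-in for the real model; the field names are assumptions.
    curations: list = field(default_factory=list)
    scan_results: dict = field(default_factory=dict)  # package ID -> scan result


class InconsistentResultError(Exception):
    """Raised when replacing curations could invalidate existing data."""


def replace_curations(result, new_curations, strict=True):
    """Replace the curations, failing or warning if dependent data exists."""
    if result.scan_results:
        message = "The result already contains scan results that may depend on the old curations."
        if strict:
            raise InconsistentResultError(message)
        warnings.warn(message)

    result.curations = list(new_curations)
    return result
```

Whether to fail or only warn is exposed as the `strict` flag here, mirroring the two behaviors mentioned above.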

@oss-review-toolkit/core-devs Any preference which direction we should take?

@sschuberth
Member Author

I'm having a hard time deciding / forming an opinion:

> Extend the section for each tool to contain the configuration consumed by this tool.

On the one hand, I like this, as things that seemingly belong together are stored together, and we already have AnalyzerConfiguration etc. However, the question arises of how to deal with configuration that might be consumed by multiple tools.

> Extend the OrtResult to contain a separate section which contains all of the configuration values.

This would work around the question of how to deal with configuration used by different tools, and would better match the already existing RepositoryConfiguration (which in turn also contains an AnalyzerConfiguration). But what about the existing "top-level" AnalyzerConfiguration etc. then?

So we already have a mix of (primarily) storing configuration by origin, or by consumer. Maybe we should in this context also consider making the "origin" of configuration a property of the configuration itself.

@mnonnenmacher
Member

mnonnenmacher commented Dec 30, 2022

> I'm having a hard time deciding / forming an opinion:
>
> > Extend the section for each tool to contain the configuration consumed by this tool.
>
> On the one hand, I like this, as things that seemingly belong together are stored together, and we already have AnalyzerConfiguration etc. However, the question arises of how to deal with configuration that might be consumed by multiple tools.
>
> > Extend the OrtResult to contain a separate section which contains all of the configuration values.
>
> This would work around the question of how to deal with configuration used by different tools, and would better match the already existing RepositoryConfiguration (which in turn also contains an AnalyzerConfiguration). But what about the existing "top-level" AnalyzerConfiguration etc. then?

I have a tendency towards a top-level section for those properties, but I think we need a better name than "configuration" to separate it from the "tool configuration" in config.yml and .ort.yml. "Corrections" was suggested above; this works for curations and package configurations, but not for resolutions. So maybe two top-level sections, corrections and resolutions?

> So we already have a mix of (primarily) storing configuration by origin, or by consumer. Maybe we should in this context also consider making the "origin" of configuration a property of the configuration itself.

I agree, at least for corrections and resolutions. For example, for curations we should store the provider. But for the tool configurations this might not be possible, because it can come from multiple sources like environment variables, command line parameters or config.yml, and this could be different for each property.

To make a draft:

ort:
  ...
  corrections:
    packageCurations:
      ...
    packageConfigurations:
      ...
  resolutions:
    issues:
      ...
    ruleViolations:
      ...
    vulnerabilities:
      ...

@fviernau
Member

fviernau commented Jan 12, 2023

I want to bring to attention the following use case relevant for support workflows, e.g. when using orthw:

  1. Make changes of curations in your config repository
  2. Recompute the reports

This IMO should be as fast as possible, so I have two concerns:

  1. If we add a separate command to patch up curations, the turnaround time significantly increases for large ORT result files.
  2. If we only have the possibility to replace all curations, including querying remote curation providers if one uses them, the turnaround time suffers as well. To keep it low, one should be able to replace only the local curations but keep the ones queried from remote.

What do you think?

edit: I've decided to write this down to an issue, see oss-review-toolkit/orthw-shell#60.

@mnonnenmacher
Member

> I want to bring to attention the following use case relevant for support workflows, e.g. when using orthw:
>
> 1. Make changes of curations in your config repository
> 2. Recompute the reports

The problem with that workflow is that, depending on the curation changes, it might not be sufficient to only re-run the reporter; it might be required to also re-run the advisor, scanner, and evaluator to get correct results.

> This IMO should be as fast as possible, so I fear if we
>
> 1. Add a separate command to patch-up curations the turn-around time significantly increases for large ORT result files

I think the curations should still be fetched by the analyzer command, so that an extra step to fetch curations is not always required. The benefit of the separate command is that it can be run independently of which other tool needs to be run afterwards, so it could be used to fix a purl before running the advisor, to fix a VCS URL before running the scanner, and so on. But yes, it introduces the overhead of an additional serialization and deserialization.

> 2. Only have the possibility to replace all curations including querying remote curation providers if one uses them. But to keep turn-around time low, one should be able to replace only the local curations but keep the ones queried from remote.

To implement this correctly #5668 needs to be done first.

@fviernau
Member

fviernau commented Jan 13, 2023

> The problem with that workflow is that depending on the curation changes it could not be sufficient to only re-run the reporter, but it might be required to also re-run the advisor, scanner, and evaluator to get correct results.

That's only for provenance curations. The workflow still provides a lot of value, even when these provenance curations don't take effect. It targets fixing issues using data which has already slowly been gathered (not re-gathering all data from scratch again to be up-to-date).

@mnonnenmacher
Member

> > The problem with that workflow is that depending on the curation changes it could not be sufficient to only re-run the reporter, but it might be required to also re-run the advisor, scanner, and evaluator to get correct results.
>
> That's only for provenance curations. The workflow still provides a lot of value, even when these provenance curations don't take effect. It targets fixing issues using data which has already slowly been gathered (not re-gathering all data from scratch again to be up-to-date).

It's not only about provenance: for the advisor, purl curations could be relevant, and for the evaluator basically any curated property could affect rule violations. I somehow assumed that when you wrote "Recompute the reports", this would also include at least re-running the evaluator, because the results of only re-running the reporter after changing curations should be completely predictable. My point is, I'm fine if orthw supports only a specific use case, but ORT should support all of them.

@fviernau
Member

fviernau commented Jan 13, 2023

The use case for re-applying curations in the evaluator stage would work in a general way with the following approach, which IMO would be quite nice:

  1. The configuration contains an ID for each configured provider, e.g. the config for the providers gets extended. The OrtResult file contains the ORT config, containing the providers configured on CI.
  2. The user downloads scan-result.json from CI to fix issues with it.
  3. The user makes local changes to the package curations dir.
  4. The user runs the evaluator, specifying the IDs of the providers to re-apply:
    a. The evaluator command looks up the configuration corresponding to the provider ID in the local (not the CI) ORT configuration. This is necessary because the local configuration is different from the CI one, e.g. the path of the package curation dir differs. Credentials may differ as well.
    b. The evaluator fetches curations from all specified providers (IDs).
    c. The evaluator replaces the curations for the specified provider IDs, but keeps all curations for providers whose ID was not specified.
    d. The evaluator executes the evaluation as usual.

The above method can be implemented in a separate dedicated command as well as in the evaluator, which should be possible without any code duplication.

Requirement: Each curation in the ORT file must be associated with a provider ID where it came from.
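
A minimal sketch of steps 4a to 4c, assuming the ORT file keeps curations as a mapping from provider ID to a list of curations, and that local providers can be represented as zero-argument callables built from the local configuration. The `reapply_curations` function and its parameters are hypothetical.

```python
def reapply_curations(stored, provider_ids, local_providers):
    """Re-fetch curations for the given provider IDs and keep all others.

    stored: dict mapping provider ID -> list of curations from the ORT file.
    provider_ids: IDs of the providers whose curations should be replaced.
    local_providers: dict mapping provider ID -> zero-argument callable that
        returns the curations, created from the local (not the CI) configuration.
    """
    # Step 4a: every requested ID must be resolvable in the local configuration.
    unknown = [pid for pid in provider_ids if pid not in local_providers]
    if unknown:
        raise KeyError(f"No local configuration for provider IDs: {unknown}")

    # Steps 4b and 4c: fetch from the specified providers and replace only
    # their entries; curations of unspecified providers stay untouched.
    updated = dict(stored)
    for pid in provider_ids:
        updated[pid] = list(local_providers[pid]())
    return updated
```

The function returns a new mapping instead of mutating the input, so the same logic could be shared between a dedicated command and the evaluator.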

@fviernau
Member

> It's not only about provenance, for the advisor purl curations could be relevant and for the evaluator basically any curated property could affect rule violations. I somehow assumed that when you wrote "Recompute the reports" that this would also include at least re-running the evaluator, because the results of only re-running the reporter after changing curations should be completely predictable. My point is, I'm fine if orthw supports only a specific use-case, but ORT should support all of them.

Agree that ORT should support all of them. My point was more that ORT should also keep supporting seeing the effect of local changes quickly, which has so far been possible with the --package-curations-dir option of the evaluator. Anyhow, my above comment proposes a generic solution for replacing curations.

@fviernau
Member

fviernau commented Jan 13, 2023

I believe we should come up with a solution that solves [1][2][3] altogether.
This could be done by the following changes (the primary idea is to introduce IDs for providers):

  1. Change package array in analyzer result to store uncurated package meta-data,
    store applicable curations separately and apply them on-the-fly.
  2. Extend the package curation provider configuration by an identifier.
    Note: Introducing an id makes sense for having different configuration for the same
    provider, e.g. on developer machine and on CI: different clone paths of ort-config
    or different credentials. So, ORT files downloaded from CI can be seamlessly used
    on the local machine, e.g. download scan-result.json and re-apply curations for
    specified provider IDs.
    The configuration in ~/.ort/config/config.yml of a provider could then look like this:
packageCurationProviders:
- name: File
  config:
    id: 'ort-ort-config'
    path: '~/devel/ort-config'
- name: File
  config:
    id: 'my-org-ort-config'
    path: '~/devel/my-org-ort-config'
  3. ORT file keeps association between curations and provider ID
  • option 1: List of { providerId, entries[] }
ort:
  ...
  corrections:
    packageCurations:
    - providerId: my-org-ort-config
      entries:
      - 
      ...
    - providerId: ort-ort-config
      entries:
      -
      - 
      ...
  • option 2: Use map: providerId -> curation[]
ort:
  ...
  corrections:
    packageCurations:
    - my-org-ort-config:
      - 
      ...
    - ort-ort-config:
      -
      - 
      ...
  • option 3: Add a providerId property to all curation entries
ort:
  ...
  corrections:
    packageCurations:
    - id: "Maven:example:project:1.00"
      providerId: "my-org-ort-config"
      concludedLicense: "NONE"
    - id:
      ...
  4. Add a dedicated command for re-applying curations:
    1. Drop all curations for given IDs, keep the others
    2. Look-up providers by ID in ~/.ort/config/config.yml and create them
    3. Query providers and add results to the ORT file

[1] oss-review-toolkit/orthw-shell#60
[2] #5668
[3] #5637

Note: I'm not certain whether corrections fit packageConfigurations.pathExcludes. We may
consider dropping corrections and moving the children one level up.

@mnonnenmacher
Member

Proposed ORT result model as discussed in the developer meeting:

ort:
  resolvedConfiguration:
    packageCurations:
      providers:
      - id: clearly-defined
        metadata:
          serverUrl: https://...
          revision: abc
      - id: local-file
        metadata:
          filename: curations.yml
      data:
        clearly-defined:
        - curation1
        - curation2
        local-file:
        - curation1
    packageConfigurations:
      ...
    issueResolutions:
      ...
    ruleViolationResolutions:
      ...
    vulnerabilityResolutions:
      ...

The metadata would come from a new function PackageCurationProvider.getMetadata(): Map<String, String>.
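
A Python stand-in for how the proposed metadata function could feed the model above. The real interface would be Kotlin; `get_metadata`, `get_curations`, `FileProvider` and `resolve_package_curations` are hypothetical names used for illustration only.

```python
from abc import ABC, abstractmethod


class PackageCurationProvider(ABC):
    """Stand-in for the proposed interface; not ORT's actual API."""

    def __init__(self, provider_id):
        self.id = provider_id

    @abstractmethod
    def get_curations(self):
        """Return the list of curations offered by this provider."""

    @abstractmethod
    def get_metadata(self):
        """Return string key/value pairs describing this provider."""


class FileProvider(PackageCurationProvider):
    def __init__(self, provider_id, filename, curations):
        super().__init__(provider_id)
        self.filename = filename
        self._curations = list(curations)

    def get_curations(self):
        return list(self._curations)

    def get_metadata(self):
        return {"filename": self.filename}


def resolve_package_curations(providers):
    """Build the `packageCurations` section: provider list plus data per ID."""
    return {
        "providers": [{"id": p.id, "metadata": p.get_metadata()} for p in providers],
        "data": {p.id: p.get_curations() for p in providers},
    }
```

Keeping the provider list separate from the data avoids repeating the provider metadata for each curation.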

My preference for setting the id in config.yml would be:

packageCurationProviders:
- name: File
  id: 'ort-ort-config'
  config:
    path: '~/devel/ort-config'
- name: File
  id: 'my-org-ort-config'
  config:
    path: '~/devel/my-org-ort-config'

@fviernau fviernau self-assigned this Jan 16, 2023
fviernau added a commit that referenced this issue Jan 27, 2023
The `OrtResult` does not store the uncurated packages as part of the
analyzer result, but only the curated packages along with the applied
package curation data.

This tightly couples the curations with the analyzer without need, because
the analyzer does not need (to consume) any curations at all. Also,
computing the respective uncurated package from each curated package is
not always possible due to missing data [1]. So, curations currently
cannot properly be (re-)applied without re-running the analyzer [2].
Furthermore, the current representation stores package curation data
redundantly in case the curation applies to multiple packages.

Given that, it makes sense to store the curations separately from the
uncurated package. So, utilize the new toplevel `resolvedConfiguration`
to store the package curations and change the analyzer result to contain
uncurated instead of curated packages.

Note that this partially implements [1] and [2]. Adjusting the logic
which turns curated into uncurated packages, e.g.
`toUncuratedPackage()`, is left for a future change to limit the size of
this change. Apart from that, [3] can be implemented relatively easily
without redundantly encoding the provider (for each curation data).

[1] #5637
[2] #6188
[3] #5668

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit that referenced this issue Feb 8, 2023
Extend `ResolvedConfiguration` to associate the curations with the
ID of the package curation provider to enable traceability of curations
back to the provider. The separate `ResolvedConfiguration.provider` list
is introduced to align with the idea of adding provider metadata, as
outlined in [^1] and also mentioned in [^2].

This implementation also is a first step towards use cases involving:

1. Replacing the curations for a given provider ID with the given ones.
2. Re-resolve curations only for a particular provider ID.

Both are left as TODO for future changes to limit the size of this
change, while for 1. a TODO comment is left in the code.

Fixes #5668.

[^1]: #6188 (comment)
[^2]: #5668

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
@sschuberth
Member Author

@mnonnenmacher @fviernau I lost a bit track of this since we've merged the resolved configuration stuff / the new way of storing curations in the ORT result.

But going forward, how exactly do we plan to apply updated curations? IMO the partly implemented current approach of curation-override options per tool does not scale. I'd much more like to see a tool / command that can update the curations in a result file, and then that updated result file can be passed to other tools without specifying any override options.

sschuberth added a commit that referenced this issue May 17, 2023
This option can be used to check rather quickly whether packages from an
analyzer result can be downloaded without actually running the scanner /
downloader. As such, the option can also be used to verify curations
more quickly after (re-)applying them to the analyzer result. For the
latter, a proper solution still needs to be implemented, see [1].

Note that the implementation is not complete yet. E.g. not all cases
where a real download would succeed can be verified, as guessing
revisions while keeping downloads to a minimum is difficult to
implement for a dry run.

[1]: #6188

Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
@mnonnenmacher
Member

mnonnenmacher commented May 22, 2023

> @mnonnenmacher @fviernau I lost a bit track of this since we've merged the resolved configuration stuff / the new way of storing curations in the ORT result.
>
> But going forward, how exactly do we plan to apply updated curations? IMO the partly implemented current approach of curation-override options per tool does not scale. I'd much more like to see a tool / command that can update the curations in a result file, and then that updated result file can be passed to other tools without specifying any override options.

Currently my preferred approach would be to introduce a new ORT CLI command like resolve-configuration, but I'm open for other ideas. Such a command could provide options to re-resolve all contained resolved configurations or resolve only specific parts of the resolved configuration.

@sschuberth
Member Author

> Currently my preferred approach would be to introduce a new ORT CLI command

Ok, good, so we're in line about having a new command.

> but I'm open for other ideas.

We already have the config subcommand, and even if that currently only deals with global configuration, should we maybe also bundle resolved config stuff there to not get too many config-related subcommands?

@mnonnenmacher
Member

> We already have the config subcommand, and even if that currently only deals with global configuration, should we maybe also bundle resolved config stuff there to not get too many config-related subcommands?

It can make the command difficult to use and implement if it can be used for two different things. For example, which options are relevant for which use case?

@sschuberth
Member Author

> For example, which options are relevant for which use case?

That could be solved via prefixes to options (though I agree that might not be the nicest user experience), or we could use sub-subcommands, like the helper-cli already does.
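
To make the sub-subcommand idea concrete, here is a rough sketch of how options could be scoped per use case. It uses Python's `argparse` only for illustration (ORT's CLI is not implemented this way), and the names `show-global`, `resolve`, `--ort-file`, and `--only` are invented for this example and do not exist in ORT.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ort")
    commands = parser.add_subparsers(dest="command", required=True)

    # The existing "config" subcommand becomes a group of sub-subcommands.
    config = commands.add_parser("config")
    config_commands = config.add_subparsers(dest="config_command", required=True)

    # Hypothetical sub-subcommand for the global configuration use case.
    config_commands.add_parser("show-global")

    # Hypothetical sub-subcommand for (re-)resolving the resolved configuration.
    resolve = config_commands.add_parser("resolve")
    resolve.add_argument("--ort-file", required=True)
    resolve.add_argument(
        "--only",
        action="append",
        choices=["package-curations", "package-configurations", "resolutions"],
        help="Re-resolve only the given part; may be repeated.",
    )
    return parser
```

With this layout, each sub-subcommand only accepts the options that are relevant for its use case, so no option prefixes are needed.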
