Introduce ORT result scanner infrastructure #5325

@porsche-rbieniek commented May 4, 2022

This pull request gives a functional overview of the changes to the ORT infrastructure that are required to integrate ORT with the (commercial) Black Duck tool as a backend scanner.

At a high level, ORT needs infrastructure for integrating a scanner that does not work per package, as ScanCode does, but on the level of a complete analyzer result.

We built the Black Duck integration so that all projects in an ORT analyzer result become Black Duck projects organized in a project group. A project group is a Black Duck-specific container for grouping individual projects.

To make this work, the scanner has to operate on the whole analyzer result, descending through the projects and their dependencies.
We then build the Black Duck scan input per project, create the project groups and the projects in Black Duck, upload the dependencies, and start processing on Black Duck.
Once scanning on Black Duck is complete, we retrieve the results for all projects and create the ORT scan result in a single step.
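The flow above can be sketched roughly as follows, in Kotlin since ORT is written in Kotlin. This is a minimal sketch under stated assumptions: `Project`, `AnalyzerResultView`, `LibraryFindings` and the `ResultLevelBackend` interface are hypothetical stand-ins, not ORT's model classes or the Black Duck REST API, and `fetchFindings()` is assumed to block until processing on the backend has finished.

```kotlin
// Hypothetical, simplified stand-ins for the data an analyzer result provides.
// These are NOT ORT's real model classes.
data class Project(val id: String, val dependencies: Set<String>)
data class AnalyzerResultView(val projects: List<Project>)

// Library-level findings (no per-file locations), as described in this thread.
data class LibraryFindings(val licenses: Set<String>, val copyrights: Set<String>)

// Hypothetical backend client; not a real Black Duck API.
interface ResultLevelBackend {
    fun createProjectGroup(name: String): String
    fun createProject(groupId: String, projectId: String)
    fun uploadDependencies(projectId: String, dependencies: Set<String>)
    fun startProcessing(groupId: String)

    // Assumption: blocks until processing for the project has finished.
    fun fetchFindings(projectId: String): Map<String, LibraryFindings>
}

// Scans the whole analyzer result at once instead of package by package:
// one project group per run, one backend project per ORT project, and a single
// merge of all findings at the end.
fun scanAnalyzerResult(
    result: AnalyzerResultView,
    backend: ResultLevelBackend,
    groupName: String
): Map<String, LibraryFindings> {
    val groupId = backend.createProjectGroup(groupName)

    // Build the scan input per project and ship its dependencies.
    result.projects.forEach { project ->
        backend.createProject(groupId, project.id)
        backend.uploadDependencies(project.id, project.dependencies)
    }

    // Start processing only after all projects have been submitted.
    backend.startProcessing(groupId)

    // Retrieve the findings for all projects and merge them per library.
    val merged = mutableMapOf<String, LibraryFindings>()
    result.projects.forEach { project ->
        backend.fetchFindings(project.id).forEach { (id, findings) ->
            val previous = merged[id]
            merged[id] = if (previous == null) findings else LibraryFindings(
                previous.licenses + findings.licenses,
                previous.copyrights + findings.copyrights
            )
        }
    }
    return merged
}
```

The sketch shows why this kind of scanner needs the whole analyzer result up front: the project group, all projects and their dependencies must be submitted before processing starts, and results can only be mapped back once everything has been processed.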

We are aware of the ongoing changes to ORT's experimental scanner support (as discussed in the ORT developer meeting). We therefore submit this pull request as a technology demo to show what support we need from the ORT infrastructure.

Once the additional infrastructure is in place, we're happy to supply another pull request to provide the Black Duck integration.

As explained in issue oss-review-toolkit#5324, we need a scanner that can access the full ORT result and autonomously process it into a scan result structure.

This pull request shows how we solved this requirement using the current ORT infrastructure. It should be seen as an example and a basis for further discussion in the ORT developer community.

Signed-off-by: Rainer Bieniek <extern.rainer.bieniek@porsche.de>
@mnonnenmacher (Member) commented Jun 14, 2022

@porsche-rbieniek What data exactly do you need to send to Black Duck? Does ORT (1) need to download the source code of the projects and packages and upload it to Black Duck, or (2) send the identifiers and source code locations (VCS, source artifact URL) of the projects and packages to Black Duck, which then downloads the source code on its own?

@mnonnenmacher mnonnenmacher changed the title Introduce ORT result scanner infrastructre Introduce ORT result scanner infrastructure Jun 14, 2022
@porsche-rishisaxena

@mnonnenmacher
Once the ORT analyzer has been run on the code repository and analyzer-result.yml has been generated, the dependency graph in analyzer-result.yml is transformed into API requests that are sent to Black Duck.
Each software library and its version are looked up in the Black Duck database, and the response contains the SPDX license, license text, and copyright information.
Once we have the response, the data for each software library is consolidated and transformed so that it can be passed to the evaluator stage; the rest follows the standard ORT flow for generating reports.

Note: Black Duck does not download any software library; it checks its own storage to return the SPDX license name, license text, and copyright information. Authors are not part of the Black Duck data.

CC: @porsche-rbieniek

@mnonnenmacher (Member)

> @mnonnenmacher Once the ORT analyzer has been run on the code repository and analyzer-result.yml has been generated, the dependency graph in analyzer-result.yml is transformed into API requests that are sent to Black Duck.

Do you only send the dependencies or also the projects? And do you have to send them in bulk in a single request, or can it be one request per dependency?

> Each software library and its version are looked up in the Black Duck database, and the response contains the SPDX license, license text, and copyright information. Once we have the response, the data for each software library is consolidated and transformed so that it can be passed to the evaluator stage; the rest follows the standard ORT flow for generating reports.

Does the data fit into the ORT scan result model? E.g. do you get license and copyright data per file, or only for the whole library?

@porsche-rishisaxena

@mnonnenmacher

  1. Do you only send the dependencies or also the projects?
     • Projects and dependencies
  2. And do you have to send them in bulk in a single request, or can it be one request per dependency?
     • One request per dependency

> Does the data fit into the ORT scan result model? E.g. do you get license and copyright data per file, or only for the whole library?

  • Yes, the data fits the scan result model.
  • We get the license, and recently (last week) we found out how to extract copyright information (via a separate call), which we are now integrating into the data set.
  • License and copyright information is bound to the library, not to individual files.
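To make these answers concrete, here is a hedged Kotlin sketch of one license request plus a separate copyright request per dependency, with the findings attached to the library rather than to files. `Coordinates`, `LicenseInfo`, `LibraryResult` and `KnowledgeBaseClient` are hypothetical names for illustration only; they do not correspond to real Black Duck endpoints or ORT classes.

```kotlin
// Hypothetical types; not the Black Duck REST API and not ORT's model.
data class Coordinates(val name: String, val version: String)
data class LicenseInfo(val spdxId: String, val text: String)

interface KnowledgeBaseClient {
    // One request per dependency for the license (null if the library is unknown) ...
    fun lookupLicense(coordinates: Coordinates): LicenseInfo?

    // ... and a separate request for the copyright statements.
    fun lookupCopyrights(coordinates: Coordinates): List<String>
}

// Findings are bound to the library as a whole, not to individual files.
data class LibraryResult(
    val coordinates: Coordinates,
    val license: LicenseInfo?,
    val copyrights: Set<String>
)

// Looks up every dependency individually and consolidates the two responses.
fun lookUpDependencies(
    dependencies: Collection<Coordinates>,
    client: KnowledgeBaseClient
): List<LibraryResult> =
    dependencies.map { coordinates ->
        LibraryResult(
            coordinates = coordinates,
            license = client.lookupLicense(coordinates),
            // Normalize whitespace, drop empty entries, and deduplicate statements.
            copyrights = client.lookupCopyrights(coordinates)
                .map { it.trim() }
                .filter { it.isNotEmpty() }
                .toSet()
        )
    }
```

Because the results carry no file paths, mapping them into ORT's scan result model would need some library-level convention; how to do that is left open here.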

I am attaching a screenshot of a report for your review:
[Image: report screenshot]

@porsche-rbieniek

@mnonnenmacher (Member)

> 1. Do you only send the dependencies or also the projects?
>    * Projects and dependencies
> 2. And do you have to send them in bulk in a single request, or can it be one request per dependency?
>    * One request per dependency

In general, new scanner implementations should implement the new ScannerWrapper interface instead of extending the Scanner class, because Scanner will soon be replaced by ExperimentalScanner, which has now proven to be production-ready.
Based on your answers, your use case seems to match what PackageScannerWrapper was designed for. Implementations of this interface are called once per package and project, which also ensures that if multiple projects depend on the same package, the API is called only once for that package. In addition, ORT does not download any source code for such scanners, which fits your use case.
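A rough illustration of that deduplication, assuming a simplified, hypothetical `PerPackageScanner` interface: ORT's real `PackageScannerWrapper.scanPackage()` has a different signature, and ORT performs this scheduling itself, so a wrapper implementation would not need to write this loop.

```kotlin
// Hypothetical simplification of the per-package contract; ORT's real
// PackageScannerWrapper.scanPackage() takes different parameters.
fun interface PerPackageScanner {
    fun scanPackage(id: String): Set<String>
}

// Hypothetical: a project identifier plus the identifiers of its dependencies.
data class ProjectDeps(val projectId: String, val dependencies: Set<String>)

// Calls the scanner once per project and once per unique package, even if
// several projects depend on the same package. LinkedHashSet keeps the
// first-seen order and drops repeated identifiers.
fun scanAll(projects: List<ProjectDeps>, scanner: PerPackageScanner): Map<String, Set<String>> {
    val ids = LinkedHashSet<String>()
    projects.forEach { project ->
        ids += project.projectId
        ids += project.dependencies
    }
    return ids.associateWith { scanner.scanPackage(it) }
}
```

For two projects that both depend on `lib:a`, the scanner is invoked for `lib:a` only once.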
I expect you will need to provide credentials for the Black Duck API; see the FossId and related FossIdConfig classes for an example of how this can be implemented. Also, I think it does not make sense to store scan results created by this scanner in a ScanStorage, as they can simply be retrieved from the API again. This can be achieved by setting criteria to null in the implementation.
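For the credentials, a configuration class along these lines could work. This is only a sketch of the general idea: the option keys (`serverUrl`, `apiToken`, `timeoutSeconds`), the default timeout, and the `BackendConfig` name are hypothetical and are not taken from ORT's FossIdConfig.

```kotlin
// Hypothetical configuration for a remote scanner backend; option names are
// illustrative and not ORT's FossIdConfig keys.
data class BackendConfig(
    val serverUrl: String,
    val apiToken: String,
    val timeoutSeconds: Long
) {
    // Keep the secret out of logs: do not print the token in toString().
    override fun toString() =
        "BackendConfig(serverUrl=$serverUrl, apiToken=***, timeoutSeconds=$timeoutSeconds)"

    companion object {
        private const val DEFAULT_TIMEOUT_SECONDS = 60L

        // Builds the config from plain key/value options; fails fast if a
        // required option is missing.
        fun create(options: Map<String, String>): BackendConfig {
            val serverUrl = requireNotNull(options["serverUrl"]) { "Missing required option 'serverUrl'." }
            val apiToken = requireNotNull(options["apiToken"]) { "Missing required option 'apiToken'." }
            val timeout = options["timeoutSeconds"]?.toLongOrNull() ?: DEFAULT_TIMEOUT_SECONDS
            return BackendConfig(serverUrl.trimEnd('/'), apiToken, timeout)
        }
    }
}
```

Failing fast on missing credentials surfaces configuration errors before any API request is made.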
Please have a look at PackageScannerWrapper and check whether the scanPackage() function gets all the information you need to send to the Black Duck API.

@sschuberth (Member)

> We therefore submit this pull request as a technology demo to show what support we need from the ORT infrastructure.

> Once the additional infrastructure is in place, we're happy to supply another pull request to provide the Black Duck integration.

Given that this was submitted only as a demo and there have been no updates or follow-up PRs addressing @mnonnenmacher's comments, this is being closed as part of backlog grooming. Feel free to comment if you would like to contribute to this.

@sschuberth sschuberth closed this Jun 10, 2024
4 participants