Introduce ORT result scanner infrastructure #5325

porsche-rbieniek · 2022-05-04T15:53:29Z

This pull request is meant to provide a functional overview of the changes to the ORT infrastructure that are required to integrate ORT with the (commercial) BlackDuck tool as a backend scanner.

From a high-level perspective, we need an infrastructural way to integrate a scanner that does not work on a per-package level, like scancode does, but on the level of an analyzer result.

We have built the Blackduck integration so that all projects in an ORT analyzer result are organized as projects within a project group. A project group is a Blackduck-specific concept: a container that groups individual projects.

To make this logic work, we need to operate on the level of an analyzer result, where we descend through the projects and their dependencies.
We then build the Blackduck scan input per project, create the project groups and the projects in Blackduck, ship the dependencies and start the processing on Blackduck.
Once the scanning is complete (on Blackduck), we retrieve the results for all projects and create the ORT scan result in one step.
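A minimal Python sketch of this result-level flow follows. It is not the authors' implementation: `Project`, `AnalyzerResult`, `FakeBlackDuckClient`, and `scan_ort_result` are hypothetical stand-ins, and the client is an in-memory fake rather than the Blackduck REST API. It only illustrates the ordering: create the project group, ship the per-project scan input, then collect the results for all projects in one step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Project:
    # Hypothetical stand-in for an ORT project: an id plus its dependency ids.
    id: str
    dependencies: tuple[str, ...]


@dataclass
class AnalyzerResult:
    # Hypothetical stand-in for an analyzer result: just the projects to scan.
    projects: list[Project]


@dataclass
class FakeBlackDuckClient:
    # In-memory fake of the remote service; no real Blackduck API is modeled.
    groups: dict[str, list[str]] = field(default_factory=dict)
    scan_inputs: dict[str, list[str]] = field(default_factory=dict)

    def create_project_group(self, name: str) -> None:
        self.groups.setdefault(name, [])

    def create_project(self, group: str, project_id: str) -> None:
        self.groups[group].append(project_id)

    def upload_dependencies(self, project_id: str, deps: list[str]) -> None:
        self.scan_inputs[project_id] = deps

    def fetch_results(self, project_id: str) -> dict[str, str]:
        # Pretend every uploaded dependency resolves to a placeholder license.
        return {dep: "NOASSERTION" for dep in self.scan_inputs[project_id]}


def scan_ort_result(
    result: AnalyzerResult, client: FakeBlackDuckClient, group: str
) -> dict[str, dict[str, str]]:
    """Scan all projects of one analyzer result as a single project group."""
    client.create_project_group(group)
    # Phase 1: build the scan input per project and ship it.
    for project in result.projects:
        client.create_project(group, project.id)
        client.upload_dependencies(project.id, sorted(set(project.dependencies)))
    # Phase 2: collect the results for all projects in one step.
    return {project.id: client.fetch_results(project.id) for project in result.projects}
```

Because every project is submitted before any result is collected, a real implementation could wait for the remote processing of the whole group between the two phases; the sketch does not model that waiting.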

We are well aware of the ongoing changes in ORT in the area of experimental scanner support (as discussed in the ORT developer meeting). Therefore, we are submitting this pull request as a technology demo to show what support we need from the ORT infrastructure.

Once the additional infrastructure is in place, we're happy to supply another pull request providing the Blackduck integration.

As explained in issue oss-review-toolkit#5324, we need a scanner with the capability to access and autonomously process the full ORT result into a scan result structure. This pull request gives an insight into how we solved this requirement using the current ORT infrastructure. It should be seen as an example and a basis for further discussion in the ORT developer community.

Signed-off-by: Rainer Bieniek <extern.rainer.bieniek@porsche.de>

mnonnenmacher · 2022-06-14T17:24:50Z

@porsche-rbieniek What data exactly do you need to send to Blackduck? Does ORT (1) need to download the source code of the projects and packages and upload it to Blackduck, or (2) send the identifiers and source code location (VCS, source artifact URL) of the projects and packages to Blackduck and Blackduck downloads the source code on its own?

porsche-rishisaxena · 2022-06-23T08:59:48Z

@mnonnenmacher
Once the analyzer-result.yml has been generated by running the ORT analyzer on the code repository, the dependency graph data in analyzer-result.yml is transformed so that it can be sent to Blackduck as API requests.
Each software library and its version are looked up in the Blackduck database, which returns the SPDX license, license text, and copyright information in the response.
Once we have received the response, the data for each software library is consolidated and transformed so that it can be passed to the evaluator stage; the rest follows the standard ORT flow to produce the reports.

Note: Blackduck does not download any software library; it checks against its own storage to return the SPDX license name, license text, and copyright (C) information. Author information is not provided by Blackduck.

CC: @porsche-rbieniek
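The lookup-only flow described in this comment can be sketched as follows, assuming a library is identified by name and version. `LibraryInfo`, `KNOWLEDGE_BASE`, `lookup`, and `consolidate` are hypothetical; the dictionary stands in for Blackduck's own storage, and the `MIT` entry for `example-lib` is made-up sample data. No source code is fetched at any point.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryInfo:
    # Hypothetical per-library answer: SPDX license id, license text, copyright statements.
    spdx_id: str
    license_text: str
    copyrights: tuple[str, ...]


# Hypothetical stand-in for the service's own storage, keyed by (name, version).
# The sample entry is invented for illustration only.
KNOWLEDGE_BASE: dict[tuple[str, str], LibraryInfo] = {
    ("example-lib", "1.0.0"): LibraryInfo("MIT", "MIT License", ("Copyright (c) Example Authors",)),
}


def lookup(name: str, version: str) -> LibraryInfo | None:
    """Look up a library by name and version; unknown libraries yield None."""
    return KNOWLEDGE_BASE.get((name, version))


def consolidate(dependencies: list[tuple[str, str]]) -> dict[str, str]:
    """Map "name@version" to an SPDX id; unknown libraries get NOASSERTION."""
    consolidated: dict[str, str] = {}
    for name, version in dependencies:
        info = lookup(name, version)
        consolidated[f"{name}@{version}"] = info.spdx_id if info is not None else "NOASSERTION"
    return consolidated
```

The consolidated mapping is the kind of per-library data that would then be handed on to later stages such as the evaluator.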

mnonnenmacher · 2022-06-23T12:07:03Z

> @mnonnenmacher Once the analyzer-result.yml is generated by running the analyzer from ORT on the code repo. The analyzer-result.yml data in terms of dependency graph is transformed to be able to send API calls/request is sent to Blackduck.

Do you only send the dependencies or also the projects? And do you have to send them in bulk in a single request, or can it be one request per dependency?

> The software library and its version is looked up into the Blackduck database returning the SPDX, License Text, Copyright information in the response. Once we have got the response, the data against software library is consolidated and transformed for passing this information to evaluator stage and rest is followed as standard to get the reports from ORT itself.

Does the data fit into the ORT scan result model? E.g. do you get license and copyright data per file, or only for the whole library?

porsche-rishisaxena · 2022-06-23T13:17:56Z

@mnonnenmacher

> Do you only send the dependencies or also the projects?

Projects and dependencies.

> And do you have to send them in bulk in a single request, or can it be one request per dependency?

One request per dependency.

> Does the data fit into the ORT scan result model? E.g. do you get license and copyright data per file, or only for the whole library?

Yes, the data fits the scan result model.
We get the license, and recently (last week) we found out how to extract copyright information (via a separate call), which we are now integrating into the data set.
The license and copyright information are bound to the library, not to individual files.

I am attaching a report screenshot for your kind review:

@porsche-rbieniek
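Since license and copyright come from separate calls and apply to the library as a whole, a per-dependency result could be assembled roughly as below. `LibraryFinding` and `scan_library` are hypothetical, and the two callables stand in for the two API requests mentioned in the answer; how such library-level data would be mapped onto ORT's scan result model is left open here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LibraryFinding:
    # Hypothetical result type: findings belong to the library as a whole,
    # so there is deliberately no file path field.
    package_id: str
    license: str
    copyrights: tuple[str, ...]


def scan_library(
    package_id: str,
    fetch_license: Callable[[str], str],
    fetch_copyrights: Callable[[str], list[str]],
) -> LibraryFinding:
    """Combine the license call and the separate copyright call into one finding."""
    return LibraryFinding(
        package_id=package_id,
        license=fetch_license(package_id),
        # Deduplicate and sort so the result is stable across calls.
        copyrights=tuple(sorted(set(fetch_copyrights(package_id)))),
    )
```

Usage with stubbed API calls: `scan_library("pkg:a@1", lambda p: "MIT", lambda p: ["(c) B", "(c) A"])` yields one finding whose copyrights are `("(c) A", "(c) B")`.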

mnonnenmacher · 2022-06-23T16:03:39Z

> Do you only send the dependencies or also the projects?
>
> Project and Dependencies
>
> And do you have to send them in bulk in a single request, or can it be one request per dependency?
>
> one request per dependency

In general, any new scanner implementation should be done by implementing the new ScannerWrapper interface instead of extending the Scanner class. The reason is that Scanner will soon be replaced by the ExperimentalScanner, which has now proven to be production-ready.
Based on your answer, it seems that this matches what PackageScannerWrapper was designed for. Implementations of this interface are called once per package and project, which also ensures that if multiple projects depend on the same package, the API is called only once for that package. Also, ORT does not download any source code for such scanners, which fits the use case.
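The deduplication effect described here can be illustrated with a small Python model. `PackageScanner`, `CountingScanner`, and `scan_all` are hypothetical and do not reproduce ORT's actual `PackageScannerWrapper` interface or its scanner orchestration; they only show that each project and each distinct package triggers exactly one (fake) API call.

```python
from __future__ import annotations

from typing import Protocol


class PackageScanner(Protocol):
    # Hypothetical per-package interface; not ORT's real PackageScannerWrapper signature.
    def scan_package(self, package_id: str) -> str: ...


class CountingScanner:
    """Toy scanner that records every (fake) remote API call it would make."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def scan_package(self, package_id: str) -> str:
        self.calls.append(package_id)
        return f"result-for-{package_id}"


def scan_all(projects: dict[str, list[str]], scanner: PackageScanner) -> dict[str, str]:
    """Scan each project and each distinct dependency exactly once."""
    results: dict[str, str] = {}
    for project_id, dependencies in projects.items():
        for package_id in (project_id, *dependencies):
            # Skip anything already scanned, e.g. a package shared by several projects.
            if package_id not in results:
                results[package_id] = scanner.scan_package(package_id)
    return results
```

With two projects that both depend on `lib:x`, the scanner is asked about `lib:x` only once.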
I expect you will need to provide credentials for the BlackDuck API; have a look at the FossId and related FossIdConfig classes for an example of how this can be implemented. Also, I think it does not make sense to store scan results created by this scanner in a ScanStorage, as they can simply be retrieved from the API again. This can be achieved by setting criteria to null in the implementation.
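A rough sketch of these two suggestions, written in Python rather than ORT's Kotlin: credentials read from scanner options, and storage skipped when no criteria are set. `BlackDuckConfig`, the option keys `serverUrl` and `apiToken`, and `should_store` are all hypothetical and only loosely inspired by the idea behind FossIdConfig; the real FossId classes define ORT's actual conventions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlackDuckConfig:
    # Hypothetical config object; the option names are invented for illustration.
    server_url: str
    api_token: str = field(repr=False)  # keep the secret out of logs and reprs

    @classmethod
    def from_options(cls, options: dict[str, str]) -> BlackDuckConfig:
        missing = [key for key in ("serverUrl", "apiToken") if key not in options]
        if missing:
            raise ValueError(f"Missing required scanner options: {', '.join(missing)}")
        return cls(server_url=options["serverUrl"], api_token=options["apiToken"])


def should_store(criteria: object | None) -> bool:
    """Hypothetical helper: results go to a scan storage only when criteria are set."""
    return criteria is not None
```

Failing fast on missing options surfaces configuration errors before any API call is attempted.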
Please have a look at PackageScannerWrapper and check if the scanPackage() function retrieves all information you need to send to the BlackDuck API.

sschuberth · 2024-06-10T12:00:28Z

> Therefore we submit this pull request as a technology demo to show what support we need from the ORT infrastructure.
>
> Once the additional infrastructure is in place, we're happy to supply another pull request to provide the Blackduck integration

Given that this was submitted only as a demo, and there have been no updates / follow-up PR implementing @mnonnenmacher's comments, this is getting closed as part of backlog grooming. Feel free to comment if you would like to contribute to this.

porsche-rbieniek requested a review from a team as a code owner May 4, 2022 15:53

porsche-rbieniek mentioned this pull request May 4, 2022: Infrastructure for scanning a complete ORTResult #5324 (Closed)

mnonnenmacher marked this pull request as draft May 9, 2022 07:29

mnonnenmacher changed the title ~~Introduce ORT result scanner infrastructre~~ Introduce ORT result scanner infrastructure Jun 14, 2022

Etsija mentioned this pull request Jun 16, 2023: [DO-40] Describe how to attach DOS to the ORT process doubleopen-project/dos#88 (Closed)

sschuberth closed this Jun 10, 2024
