Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve unmanaged file checks #3145

Merged
merged 1 commit into from
May 3, 2022
Merged

Improve unmanaged file checks #3145

merged 1 commit into from
May 3, 2022

Conversation

sschuberth
Copy link
Member

Please have a look at the individual commit messages for the details.

analyzer/src/main/kotlin/managers/Unmanaged.kt Outdated Show resolved Hide resolved
@@ -40,6 +40,7 @@ import org.ossreviewtoolkit.model.VcsInfo
import org.ossreviewtoolkit.model.config.AnalyzerConfiguration
import org.ossreviewtoolkit.model.config.RepositoryConfiguration
import org.ossreviewtoolkit.model.readValue
import org.ossreviewtoolkit.spdx.VCS_DIRECTORIES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe doing this commit as-is is wrong. Let's consider an example:

project/package.json project/src/include/x.h project/src/include/x.cpp
If the above is a native project (where we do not support any package manager) applying this commit would eliminate the unmnaged project. All files would be associated with the package.json. Then you want to exclude the package.json, because the project is native only. If you do so, you effectively exclude also all native code as it gets only associated with the package-json.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's correct that before this PR an unmanaged project would have been created, the project/src/include/x.h and project/src/include/x.cpp files from your example would not be associated to it, but to the NPM project defined by project/package.json. In general, every file is associated to the "closest" project (defined by a definition file) when walking along the parents in directory tree.

So when you say

All files would be associated with the package.json.

then that's correct, but unless I'm terribly mistaken, it's exactly how it was before.

Copy link
Member

@fviernau fviernau Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would not be associated to it, but to the NPM project defined by project/package.json.

Are you sure this is true? I believe the files are assigned to both projects, unmanaged and project/package.json.

In general, every file is associated to the "closest" project

mind sharing a reference for this?

then that's correct, but unless I'm terribly mistaken, it's exactly how it was before.

I believe you are mistaken, but it could also be me. Let's figure it out. I should look into it again myself, but I have some urgent things right now todo.

Copy link
Member

@mnonnenmacher mnonnenmacher Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually all files are associated to all definition files found in their parent directories. For example if you have:

- README.md
- a
  - build.gradle
  - b
    - package.json
    - c
      - main.cpp

Then main.cpp would be associated with the Unmanaged, Gradle and NPM projects, because we never filter subdirectories that contain other projects. I have suggested we do that at least for Unmanaged in the past, because the idea of this fake package manager was to make sure that also files that cannot be associated to a project get scanned, but it was never implemented.

@fviernau The scenario you describe above would be a weird project setup, but if the package.json was in the root directory you would have exactly the same issue already with the existing code. A workaround could be to not exclude the project but only all dependency scopes, and I wonder if that would not be sufficient, because Unmanaged was never intended to solve the problem you describe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario you describe above would be a weird project setup

because?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario you describe above would be a weird project setup

because?

Because you shouldn't put *.h / *.cpp files which are C++ files in the directory tree of an (unrelated) NPM project where only *.js files are expected. Or is this some funky-special NPM project that implements a native addon and the *.h / *.cpp files are in fact part of the NPM project?

Copy link
Member

@fviernau fviernau Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you shouldn't put *.h / *.cpp files which are C++ files in the directory tree of an (unrelated) NPM project where only *.js files are expected. Or is this some funky-special NPM project that implements a native addon and the *.h / *.cpp files are in fact part of the NPM project?

Agree that one shouldn't do that setup (probably) but should ORT really say that it doesn't support such setups ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that one shouldn't do that setup (probably) but should ORT really say that it doesn't support such setups ?

Actually, I indeed would be fine with ORT not supporting such setups. Also in other areas (like maintaining metdata) we leverage ORT to foster best engineering practices. So why not here?

Anyway, that was also not your original question; your original question was why it's weird. And you seem to agree it's weird now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario you describe above would be a weird project setup

because?

Because why do you have a package.json lying around in your native code project?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would not be associated to it, but to the NPM project defined by project/package.json.

Are you sure this is true? I believe the files are assigned to both projects, unmanaged and project/package.json.

For reference, turns out that @fviernau was right, and scan findings are not only associated to the "closest" project, but to all projects on the way up to the root... also see #5365.

@sschuberth
Copy link
Member Author

I've split out the uncontroversial changes to #3377.

@sschuberth
Copy link
Member Author

@fviernau I've found a bug in the implementation and have slightly rewritten it for more clarity. Please have another look.

@fviernau
Copy link
Member

fviernau commented May 19, 2021

I gave this some thought again. The only issue I found so far is related to the semantics of OrtResult.isExcluded() and it's use in the notice template.

Currently the semantics is: "A project is excluded if and only if the definition file is excluded". See [1], [2].

This means that if I only exclude a the definition file of a project, but not the > 0 additionally associated files,
then the licenses of the project files will not be included in the notice file (assuming the are not associated to also with another project like the "Unknown" one), see [3], [4].

This exact situation I believe may break the NOTICE files in our internal setup.

I haven't checked the evaluator but I'd suspect an analog issue there.

What do you think @sschuberth @mnonnenmacher ?

Note: I believe it should be possible to exclude a definition file without excluding all project files. So, change the implementation so that a (project) file should be included in the notices if there is no matching path exclude for it.

[1] https://github.com/oss-review-toolkit/ort/blob/master/model/src/main/kotlin/OrtResult.kt#L111-L114
[2]

val definitionFilePath = ortResult.getDefinitionFilePathRelativeToAnalyzerRoot(project)

[3]
val excluded: Boolean by lazy { input.ortResult.isExcluded(id) }

[4]
Excluded projects and packages are ignored.

@codecov-commenter
Copy link

codecov-commenter commented May 20, 2021

Codecov Report

Merging #3145 (7a185fb) into main (f8f4192) will decrease coverage by 10.69%.
The diff coverage is n/a.

❗ Current head 7a185fb differs from pull request most recent head 0d903e4. Consider uploading reports for the commit 0d903e4 to get more accurate results

@@              Coverage Diff              @@
##               main    #3145       +/-   ##
=============================================
- Coverage     54.62%   43.92%   -10.70%     
+ Complexity     1504      513      -991     
=============================================
  Files           260      186       -74     
  Lines         13899     9584     -4315     
  Branches       1960     1467      -493     
=============================================
- Hits           7592     4210     -3382     
+ Misses         5581     4902      -679     
+ Partials        726      472      -254     
Impacted Files Coverage Δ
...lients/fossid-webapp/src/main/kotlin/model/Scan.kt 4.34% <0.00%> (-91.66%) ⬇️
...p/src/main/kotlin/model/result/FossIdScanResult.kt 10.00% <0.00%> (-90.00%) ⬇️
...in/model/identification/identifiedFiles/License.kt 12.50% <0.00%> (-87.50%) ⬇️
...in/model/identification/markedAsIdentified/File.kt 12.50% <0.00%> (-87.50%) ⬇️
...-webapp/src/main/kotlin/model/status/ScanStatus.kt 14.28% <0.00%> (-85.72%) ⬇️
...c/main/kotlin/scanners/fossid/FossIdScanResults.kt 0.00% <0.00%> (-84.22%) ⬇️
...in/kotlin/model/identification/common/Component.kt 8.33% <0.00%> (-83.98%) ⬇️
scanner/src/main/kotlin/scanners/fossid/FossId.kt 0.00% <0.00%> (-72.57%) ⬇️
scanner/src/main/kotlin/scanners/Licensee.kt 12.00% <0.00%> (-69.25%) ⬇️
downloader/src/main/kotlin/vcs/Mercurial.kt 23.52% <0.00%> (-63.27%) ⬇️
... and 300 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8f4192...0d903e4. Read the comment docs.

@mnonnenmacher
Copy link
Member

@sschuberth Unfortunately we would like to put this on hold from HERE side until we have found another solution for a special case where we have to exclude definition files, but must not exclude the source code from the same directory. Currently the Unmanaged project acts as a workaround for that.

@sschuberth sschuberth added analyzer About the analyzer tool on hold Pull requests that cannot currently be merged labels May 21, 2021
@sschuberth sschuberth force-pushed the improve-unmanaged branch 3 times, most recently from 8ff2d46 to 5dbbe2f Compare June 22, 2021 16:37
@sschuberth
Copy link
Member Author

As a side note, this PR also fixes the case where two projects (a real one and an unmanaged one) are created when analyzing only a single definition file. I.e.

    projects:
    - id: "PIP:::"
      definition_file_path: "analyzer/src/funTest/assets/projects/synthetic/pip/requirements.txt"
      declared_licenses: []
      declared_licenses_processed: {}
      vcs:
        type: ""
        url: ""
        revision: ""
        path: ""
      vcs_processed:
        type: "Git"
        url: "https://github.com/oss-review-toolkit/ort.git"
        revision: "41bbe0e10988f3761dc5075fc11899d675360734"
        path: "analyzer/src/funTest/assets/projects/synthetic/pip"
      homepage_url: ""
      scope_names: []
    - id: "Unmanaged::ort:41bbe0e10988f3761dc5075fc11899d675360734"
      definition_file_path: ""
      declared_licenses: []
      declared_licenses_processed: {}
      vcs:
        type: ""
        url: ""
        revision: ""
        path: ""
      vcs_processed:
        type: "Git"
        url: "https://github.com/oss-review-toolkit/ort.git"
        revision: "41bbe0e10988f3761dc5075fc11899d675360734"
        path: "analyzer/src/funTest/assets/projects/synthetic/pip/requirements.txt"
      homepage_url: ""
      scope_names: []

becomes just

    projects:
    - id: "PIP:::"
      definition_file_path: "analyzer/src/funTest/assets/projects/synthetic/pip/requirements.txt"
      declared_licenses: []
      declared_licenses_processed: {}
      vcs:
        type: ""
        url: ""
        revision: ""
        path: ""
      vcs_processed:
        type: "Git"
        url: "https://github.com/oss-review-toolkit/ort.git"
        revision: "8ff2d460412785e8374df4d093b12dbf8ecfc1db"
        path: "analyzer/src/funTest/assets/projects/synthetic/pip"
      homepage_url: ""
      scope_names: []

@sschuberth sschuberth removed the on hold Pull requests that cannot currently be merged label May 3, 2022
@sschuberth sschuberth dismissed fviernau’s stale review May 3, 2022 14:10

Concerns are probably obsolete by now.

@sschuberth
Copy link
Member Author

@mnonnenmacher please have a look again.

analyzer/src/main/kotlin/Analyzer.kt Show resolved Hide resolved
analyzer/src/main/kotlin/Analyzer.kt Outdated Show resolved Hide resolved
So far, if the directory structure was like

root-dir
 |
 +-.git
 |
 +-project-dir

still an unmanaged project would have been created because no definition
file is in the root directory, and the project directory is regarded as an
unmanaged "file", even though there are not really any unmanaged files in
the root.

Fix this by a introducing a check that removes all managed paths (or
ignored files) from all files in the root.

Signed-off-by: Sebastian Schuberth <sebastian.schuberth@bosch.io>
@mnonnenmacher mnonnenmacher added release notes Changes that should be mentioned in release notes breaking change Pull requests that break compatibility with previous behavior labels May 3, 2022
@mnonnenmacher
Copy link
Member

I have marked this with the breaking change label, because it could lead to unexpected results if the ort.yml contains configuration for the unmanaged project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer About the analyzer tool breaking change Pull requests that break compatibility with previous behavior release notes Changes that should be mentioned in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants