Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluator: Add support for "repolinter" style rules #5621

Closed
tsteenbe opened this issue Jul 28, 2022 · 8 comments
Closed

Evaluator: Add support for "repolinter" style rules #5621

tsteenbe opened this issue Jul 28, 2022 · 8 comments
Assignees
Labels
enhancement Issues that are considered to be enhancements evaluator About the evaluator tool

Comments

@tsteenbe
Copy link
Member

tsteenbe commented Jul 28, 2022

Goal: Have ORT better support common contribution process checks

Usage scenarios:

  • Check community file such as CONTRIBUTING.md, Code of Conduct, CODEOWNERS, pull request templates are present in your open source projects
  • Check if a project has a 'docs' and 'example' directories
  • Check if each C/C++ project follow agreed up ways of working scheme such as
    • All open source C/C++ code is included in an 'external' directory.
    • Within the external directory each package is contain within its own directory e.g. '/external/curl'
    • Each package directory has package.spdx.yml and LICENSE
    • Package known only for testing such as GoogleTest have been excluded via path exclude
    • Throw deprecation policy violation for deprecated package /external/legacy-sdk
  • Checking Git commits for organization's secrets or naming
  • Check if a CI is used

Came up with below API, inspired by https://github.com/todogroup/repolinter/blob/main/docs/rules.md.

  • hasCI() checks for presence of known CI files within project being scanned such as .gitlab-ci.yml or .github/workflows
  • hasMoreContributorsThan(integer) checks the number of unique names in version control log (Git log) of the project being scanned.
  • hasDirectory([glob/regex?]) checks for presence of directory within project being scanned
  • hasFile([glob/regex?]) to check for the presence of LICENSE, CONTRIBUTING.md, CHANGELOG.md, SECURITY.md, SUPPORT.md, NOTICE, CODE-OF-CONDUCT and README e.g. make sure the FOSS project has minimum files present to clarify its governance. I would like to be able to write a policy rule to check if project’s license is APACHE-2.0 then a NOTICE file must be present
  • hasFileWithContents([filepath glob/regex?, contents regex?]) checks if the contents of a file matches a given regular expression. Useful for searching for internal names or urls.
  • hasFileWithExtension([glob/regex?]) checks for presence of file with certain extensions within project being scanned e.g. hasFileWithExtension(‘.ttf’) to search for font files. See also https://github.com/todogroup/repolinter/blob/main/docs/rules.md#file-starts-with
  • hasFileWithLicense([filepath glob/regex, spdx license id]) to check for presence of file with certain license within project being scanned
  • usesGit() checks whether the project being scanned is managed by Git.
  • usesGitSubModules() checks whether the project being scanned uses Git sub modules.
  • usesSVN() checks whether the project being scanned is managed by Subversion.
  • usesMercurial() checks whether the project being scanned is managed by Mercurial.
  • hasCommitContains(contents regex?) checks if the contents of a version control log (Git history) matches a given regular expression for a project being scanned. Useful for searching for internal names or URLs.
@tsteenbe tsteenbe added enhancement Issues that are considered to be enhancements evaluator About the evaluator tool labels Jul 28, 2022
@tsteenbe
Copy link
Member Author

tsteenbe commented Jul 28, 2022

One might ask why not simply use repolinter as tool instead of implemented our own version.

My response would be:

  • Plan to use repolinter-style checks in combination with existing our existing policy rules based on labels (e.g. using product/delivery context)
  • Want to use ORT's HowToFix message to give dev teams guidance
  • Don't want to introduce a dependency on repolinter e.g. require ruby for simple logic
  • Having all policy rules in one file is much nicer/simpler user experience

@fviernau
Copy link
Member

@tsteenbe can we add as bullet point to use scenarios: checks for the project, out of scope "packages"?
And thus, also remove the bullet point "Check if new package have a minimum number of contributors"?

@fviernau
Copy link
Member

fviernau commented Aug 1, 2022

@tsteenbe a couple of questions to the use cases / API in the ticket description:

  1. "All open source C/C++ code is included in an 'external' directory."
  • How would you envision the logic for determining whether a source file / dir is "Open Source"?
  1. "Throw deprecation policy violation for deprecated package /external/legacy-sdk"
  • I presume the rule needs a list of deprecated packages. What are the requirements for such entry?
    • list of dirs: Assumes a specific dir a submodule is mounted in, which can break.
    • list of VCS URLs: Maybe too coarse grained.
    • based on the packages, e.g. SPDX packages. Assumption SPDX files are setup and trustworthy.
    • above does not even consider versions yet.
  1. "hasCommitContains(contents regex?)"
  • Can you please provide some real world use case for this?
  1. "Checking Git commits for organization's secrets or naming"
  • Do you have in mind anything in particular for the checks?

@tsteenbe
Copy link
Member Author

tsteenbe commented Aug 1, 2022

How would you envision the logic for determining whether a source file / dir is "Open Source"?

My assumption is that each code file in my organization would use "Coypright (C) Example, Inc." followed by on the next line "SPDX identifier: "LicenseRef-proprietary-my-org". We would then write helper functions in rules.kts hasFileFindings(file glob), hasMyOrgCoyrights(file glob) and hasFileWithLicense(file glob, "LicenseRef-proprietary-my-org") if hasFileFindings is true amd the other are false then I presume the code is open source.

"Throw deprecation policy violation for deprecated package /external/legacy-sdk" What are the requirements for such entry?

I was thinking to just do it on directory name and only in a next step code repository location. My presumption was that R&D teams can agree on the naming on libs within /external but that code repository information would not be useful. Think for example case were the R&D team simply creates a /external/curl within their own code repository.

"hasCommitContains(contents regex?)" I want to use for checking Git commits messages/history for organization's secrets or naming. Say we are open-sourcing a project and the team would like to preserve the Git history as much as possible, however as a company we don't want our internal code names for projects or customer to be made public. The project was originally developed for customer "paris" and code name was "oiseau" then I would like to hasCommitContains(paris|oiseau) where "paris" and "oiseau" would likely be a Kotlin set in rules.kts or passed in to the scan via labels.

@fviernau
Copy link
Member

fviernau commented Aug 1, 2022

My assumption is that each code file in my organization would use "Coypright (C) Example, Inc." followed by on the next line "SPDX identifier: "LicenseRef-proprietary-my-org". We would then write helper functions in rules.kts hasFileFindings(file glob), hasMyOrgCoyrights(file glob) and hasFileWithLicense(file glob, "LicenseRef-proprietary-my-org") if hasFileFindings is true and the other are false then I presume the code is open source.

Hm, from that writing I have a hunch that you're assuming that there would be some new rule type which gets called once per file in the source tree. However, for this use case I believe one should not create one violation per file as that would lead potentially to a large amount of violations. If I'm not missing anything, that would not be possible to achieve with a rule which gets called separately per file.

Whilst writing this I got the feeling that clarifying these details in a call would be more efficient.

@fviernau
Copy link
Member

fviernau commented Aug 2, 2022

@tsteenbe and me further dived into the use cases and we propose to do the following in scope of this ticket:

  1. Extend the rules API to realize the use cases
  2. Create actual real world policy rules in ort-config / examples rules
  • The rules must include error messages which make the violation actionable. This is important
    as the error messages typically impose additional requirements on the API.
  • The rules can assume they are controlled by some organization, e.g. the rule can prescribe how particular files have to be named like "the license file must always be named LICENSE".

The following rules should be added to ort-config or ORT's example rules, and the rules API shall be extended accordingly:

One rule for each of the following files, checking their existance

  • CONTRIBUTING.md, CONTRIBUTING.rst
  • LICENSE
  • README.md, README.rst

Note: We added .rst to the list, since .md has some limitation which .rst can work around.

One rule which checks whether there is CI setup

  • message
   This project does not seem to have any known CI configuration files.
   The following are known CI configurations. Directories:
   -
   Files:
   -
   -
   If you're a different CI system that is not recognized, please ask for support.
  • pseudo code:
require {
   +AnyOf(
     hasDirectory('.github'),
     hasFile('.gitlab-ci.yml'),
     hasDirectory('.circleci'),
     hasFile('.travis.yml'),
     hasFile('bitbucket-pipelines.yml'),
     hasFile('appveyor.yml')
     ...
   )

A rule which checks whether the repository contains tests

  • idea: throw a violation if the project sources seem not to have any test
  • rules uses function / heuristic: hasTests()
  • message:
This project does not seem to have any test directories or files.
If you do have tests, please ask for support.

A rule which checks for the existance of '.gitignore' in the root

  • message:
 It is generally recommended to have a '.gitignore' file. See the following
 repository https://github.com/github/gitignore.

A rule that checks that no dependencies commited which are used via package managers

  • idea: for NPM node_module dirs should not be commited, for Go the vendor dir should not be committed and so forth.
  • pseudo code:
   val moduleDirs = sourceTree.findDirectories('**/node_modules')

   if (moduleDirs.isNotEmpty()) {
        error(...)
   }

A rule that checks that the README.md has a section for License and contains a copyright
´´´
idea: hasFileWithContents("README.md", "## License")

A rule which flags non-inclusive language in commit messages

idea: check git history for non-inclusive language
see https://github.com/dialpad/inclusive-language
error message should mentioned the SHA1 of the commit which is problematic

@fviernau
Copy link
Member

fviernau commented Aug 2, 2022

As flagging non-inclusive language is a bigger topic, I've filed a GitHub issue as a basis for starting discussions: #5629.

fviernau added a commit to oss-review-toolkit/ort-config that referenced this issue Sep 15, 2022
The evaluator rules in this repository have been created based on ORT's
example policy rules, meanwhile they deviated, but are quite redundant.
Reduce the deviation use-case-wise by copying all rules from ORT's
examples which are missing in this repository using ORT revision [1].

This is the first step towards making the 'ort-config' repository the
single dedicated place to examplify policy rules. In the following ORT's
example rules will be deleted, or rather minimized and turned into
functional test assets, see also [2].

Note: The copied rules have been create as part of [3].

[1] 63e002ba57e7d49c96017fac2ff679de8a5b76df
[2] oss-review-toolkit/ort#5701
[3] oss-review-toolkit/ort#5621

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort-config that referenced this issue Sep 15, 2022
The evaluator rules in this repository have been created based on ORT's
example policy rules, meanwhile they deviated, but are quite redundant.
Reduce the deviation use-case-wise by copying all rules from ORT's
examples which are missing in this repository using ORT revision [1].

This is the first step towards making the 'ort-config' repository the
single dedicated place to examplify policy rules. In the following ORT's
example rules will be deleted, or rather minimized and turned into
functional test assets, see also [2].

Note: The copied rules have been create as part of [3].

[1] 63e002ba57e7d49c96017fac2ff679de8a5b76df
[2] oss-review-toolkit/ort#5701
[3] oss-review-toolkit/ort#5621

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort-config that referenced this issue Sep 15, 2022
The evaluator rules in this repository have been created based on ORT's
example policy rules, meanwhile they deviated, but are quite redundant.
Reduce the deviation use-case-wise by copying all rules from ORT's
examples which are missing in this repository using ORT revision [1].

This is the first step towards making the 'ort-config' repository the
single dedicated place to examplify policy rules. In the following ORT's
example rules will be deleted, or rather minimized and turned into
functional test assets, see also [2].

Note: The copied rules have been created as part of [3].

[1] 63e002ba57e7d49c96017fac2ff679de8a5b76df
[2] oss-review-toolkit/ort#5701
[3] oss-review-toolkit/ort#5621

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort-config that referenced this issue Sep 16, 2022
The evaluator rules in this repository have been created based on ORT's
example policy rules, meanwhile they deviated, but are quite redundant.
Reduce the deviation use-case-wise by copying all rules from ORT's
examples which are missing in this repository using ORT revision [1].

This is the first step towards making the 'ort-config' repository the
single dedicated place to examplify policy rules. In the following ORT's
example rules will be deleted, or rather minimized and turned into
functional test assets, see also [2].

Note: The copied rules have been created as part of [3].

[1] 63e002ba57e7d49c96017fac2ff679de8a5b76df
[2] oss-review-toolkit/ort#5701
[3] oss-review-toolkit/ort#5621

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
fviernau added a commit to oss-review-toolkit/ort-config that referenced this issue Sep 19, 2022
The evaluator rules in this repository have been created based on ORT's
example policy rules, meanwhile they deviated, but are quite redundant.
Reduce the deviation use-case-wise by copying all rules from ORT's
examples which are missing in this repository using ORT revision [1].

This is the first step towards making the 'ort-config' repository the
single dedicated place to examplify policy rules. In the following ORT's
example rules will be deleted, or rather minimized and turned into
functional test assets, see also [2].

Note: The copied rules have been created as part of [3].

[1] 63e002ba57e7d49c96017fac2ff679de8a5b76df
[2] oss-review-toolkit/ort#5701
[3] oss-review-toolkit/ort#5621

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
tsteenbe pushed a commit to oss-review-toolkit/ort-config that referenced this issue Sep 19, 2022
The evaluator rules in this repository have been created based on ORT's
example policy rules, meanwhile they deviated, but are quite redundant.
Reduce the deviation use-case-wise by copying all rules from ORT's
examples which are missing in this repository using ORT revision [1].

This is the first step towards making the 'ort-config' repository the
single dedicated place to examplify policy rules. In the following ORT's
example rules will be deleted, or rather minimized and turned into
functional test assets, see also [2].

Note: The copied rules have been created as part of [3].

[1] 63e002ba57e7d49c96017fac2ff679de8a5b76df
[2] oss-review-toolkit/ort#5701
[3] oss-review-toolkit/ort#5621

Signed-off-by: Frank Viernau <frank_viernau@epam.com>
@fviernau
Copy link
Member

The initial scope (ticket description) had been changed to the use case defined in the comment above: #5621 (comment). All except one have been implemented which I extracted to a separate ticket [1], so this can be closed as completed.

Note: For the implementation of the use cases see merged PRs linked above.

[1] #5853

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Issues that are considered to be enhancements evaluator About the evaluator tool
Projects
None yet
Development

No branches or pull requests

2 participants