Add support for Windows containers #181

Merged: 30 commits, Aug 4, 2021
Changes from 26 commits
Commits
354bf8b
Use newer version of container libraries
pombredanne May 25, 2021
624f8cd
Use new container-inspector structures
pombredanne May 25, 2021
927bd4b
Add minimal support for Windows containers
pombredanne May 25, 2021
ee5ea1b
Update Windows package getter
JonoYang Jun 26, 2021
fcfbe3c
Use newer version of container libraries
pombredanne May 25, 2021
7c66011
Update call to windows_helper to win_reg
JonoYang Jul 8, 2021
324cf3d
Create new pipeline for Windows Docker images
JonoYang Jul 9, 2021
4f1eac7
Add function to find packages at well-known paths
JonoYang Jul 16, 2021
198efed
Add step to tag known software in pipeline
JonoYang Jul 16, 2021
3a49932
Get version from path in tag_known_software #238
JonoYang Jul 16, 2021
1b46251
Troubleshoot regex patterns #238
JonoYang Jul 16, 2021
c80f06b
Report Program File contents as packages #238
JonoYang Jul 17, 2021
358b9ac
Update Windows-specific regex
JonoYang Jul 23, 2021
a632464
Do not ignore .mui files #238
JonoYang Jul 23, 2021
431d1e4
Filter using extension field rather than path #238
JonoYang Jul 26, 2021
04d45aa
Update scanpipe/pipes/docker.py
pombredanne Jul 23, 2021
9c74c4c
Fix scancode-toolkit pinned version in base.txt #238
JonoYang Jul 26, 2021
4636576
Create pipeline step to tag ignorable files #252
JonoYang Jul 27, 2021
b216c6e
Update formatting #238
JonoYang Jul 28, 2021
aaafc04
Generalize regex expressions #238
JonoYang Jul 29, 2021
730e808
Create new pipes for ignoring files #238
JonoYang Jul 29, 2021
546296f
Add more file extensions to ignore #238
JonoYang Jul 30, 2021
6baaeb0
Bump dep versions #238
JonoYang Jul 30, 2021
76c9e4f
Update docstring #238
JonoYang Aug 2, 2021
2220809
Improve regex used in tag_known_software #238
JonoYang Aug 2, 2021
a48eb4c
Adjust code for consistency across the codebase #181
tdruez Aug 3, 2021
c94fc15
Address PR comments #238
JonoYang Aug 3, 2021
ced8f38
Add is_media field to CodebaseResource #238
JonoYang Aug 3, 2021
53b128a
Simplify tag_media_files_as_unintersting() #238
JonoYang Aug 3, 2021
2645864
Refine windows pipes #238
tdruez Aug 4, 2021
6 changes: 3 additions & 3 deletions scanpipe/pipelines/docker.py
@@ -35,7 +35,7 @@ def steps(cls):
         return (
             cls.extract_images,
             cls.extract_layers,
-            cls.find_images_linux_distro,
+            cls.find_images_os_and_distro,
             cls.collect_images_information,
             cls.collect_and_create_codebase_resources,
             cls.collect_and_create_system_packages,
@@ -63,9 +63,9 @@ def extract_layers(self):
         if errors:
             self.add_error("\n".join(errors))

-    def find_images_linux_distro(self):
+    def find_images_os_and_distro(self):
         """
-        Finds the linux distro of input images.
+        Finds the operating system and distro of input images.
         """
         for image in self.images:
             image.get_and_set_distro()
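The rename has to touch both the entry in `steps()` and the method definition, because the pipeline declares its steps as references to its own methods. A minimal sketch of that pattern follows. `MiniPipeline`, `run`, and the step bodies are hypothetical stand-ins for illustration, not ScanCode.io's actual base class.

```python
class MiniPipeline:
    """Hypothetical stand-in for a step-based pipeline base class."""

    def __init__(self):
        self.log = []

    @classmethod
    def steps(cls):
        # Steps are plain function references, listed in execution order.
        return (
            cls.extract_images,
            cls.find_images_os_and_distro,
        )

    def run(self):
        # Call each step with this instance, in the declared order.
        for step in self.steps():
            step(self)
        return self.log

    def extract_images(self):
        self.log.append("extract_images")

    def find_images_os_and_distro(self):
        self.log.append("find_images_os_and_distro")


pipeline = MiniPipeline()
executed = pipeline.run()
```

In this sketch, a leftover `cls.find_images_linux_distro` entry would raise `AttributeError` as soon as `steps()` is called, which is why the tuple entry and the method are renamed together.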
81 changes: 81 additions & 0 deletions scanpipe/pipelines/windows_docker.py
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: Apache-2.0
+#
+# http://nexb.com and https://github.com/nexB/scancode.io
+# The ScanCode.io software is licensed under the Apache License version 2.0.
+# Data generated with ScanCode.io is provided as-is without warranties.
+# ScanCode is a trademark of nexB Inc.
+#
+# You may not use this software except in compliance with the License.
+# You may obtain a copy of the License at: http://apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software distributed
+# under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+# CONDITIONS OF ANY KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations under the License.
+#
+# Data Generated with ScanCode.io is provided on an "AS IS" BASIS, WITHOUT WARRANTIES
+# OR CONDITIONS OF ANY KIND, either express or implied. No content created from
+# ScanCode.io should be considered or used as legal advice. Consult an Attorney
+# for any legal advice.
+#
+# ScanCode.io is a free software code scanning tool from nexB Inc. and others.
+# Visit https://github.com/nexB/scancode.io for support and download.
+
+from scanpipe.pipelines.docker import Docker
+from scanpipe.pipes import docker
+from scanpipe.pipes import rootfs
+from scanpipe.pipes import windows
+
+
+class WindowsDocker(Docker):
Review comment (Contributor):
What about DockerWindows and docker_windows instead? This would keep all Docker-based pipelines grouped in the UI.
I'm not sure about this, though. What's your take?

Reply (Contributor):
I think it makes sense to rename the pipeline to DockerWindows for grouping.

+    """
+    A pipeline to analyze Windows Docker images.
+    """
+
+    @classmethod
+    def steps(cls):
+        return (
+            cls.extract_images,
+            cls.extract_layers,
+            cls.find_images_os_and_distro,
+            cls.collect_images_information,
+            cls.collect_and_create_codebase_resources,
+            cls.collect_and_create_system_packages,
+            cls.tag_known_software_packages,
+            cls.tag_uninteresting_codebase_resources,
+            cls.tag_program_files_dirs_as_packages,
+            cls.tag_empty_files,
+            cls.scan_for_application_packages,
+            cls.scan_for_files,
+            cls.analyze_scanned_files,
+            cls.tag_data_files_with_no_clues,
+            cls.tag_not_analyzed_codebase_resources,
+        )
+
+    def tag_known_software_packages(self):
JonoYang marked this conversation as resolved.
+        """
+        Flag files from well-known software packages by checking common install paths.
+        """
+        windows.tag_known_software(self.project)
+
+    def tag_uninteresting_codebase_resources(self):
+        """
+        Flag files that are known to be uninteresting.
+        """
+        docker.tag_whiteout_codebase_resources(self.project)
+        windows.tag_uninteresting_windows_codebase_resources(self.project)
+        rootfs.tag_ignorable_codebase_resources(self.project)
+        rootfs.tag_media_files_as_uninteresting(self.project)
+
+    def tag_program_files_dirs_as_packages(self):
+        """
+        Report the immediate subdirectories of `Program Files` and `Program
+        Files (x86)` as packages.
+        """
+        windows.tag_program_files(self.project)
+
+    def tag_data_files_with_no_clues(self):
+        """
+        If a file is a data file and has no clues towards its origin, mark as
+        uninteresting.
+        """
+        rootfs.tag_data_files_with_no_clues(self.project)
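The `tag_program_files_dirs_as_packages` step reports the immediate subdirectories of `Program Files` and `Program Files (x86)` as packages. The body of `windows.tag_program_files` is not part of this diff, so the following is only a rough sketch of the idea on plain path strings. The helper name `program_files_subdirs`, the constant, and the sample paths (including the leading `Files/` segment) are assumptions made for illustration.

```python
from pathlib import PurePosixPath

PROGRAM_FILES_DIRNAMES = ("Program Files", "Program Files (x86)")


def program_files_subdirs(paths):
    """
    Return the sorted (program files directory, subdirectory) pairs found
    in `paths`. Hypothetical helper for illustration only.
    """
    found = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        for index, part in enumerate(parts):
            # A file must sit below the subdirectory, so require at least
            # two more path segments after the Program Files directory.
            if part in PROGRAM_FILES_DIRNAMES and index + 2 < len(parts):
                found.add((part, parts[index + 1]))
                break
    return sorted(found)


paths = [
    "Files/Program Files/Git/bin/git.exe",
    "Files/Program Files/Git/LICENSE.txt",
    "Files/Program Files (x86)/Vim/vim82/vim.exe",
    "Files/Program Files/desktop.ini",
    "Files/Windows/System32/kernel32.dll",
]
subdirs = program_files_subdirs(paths)
```

Files that sit directly in `Program Files` (such as `desktop.ini` here) do not belong to any subdirectory and are skipped in this sketch.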
1 change: 0 additions & 1 deletion scanpipe/pipes/docker.py
@@ -61,7 +61,6 @@ def extract_layers_from_images(project, images):
     Returns the `errors` that may happen during the extraction.
     """
     errors = []
-
     for image in images:
         image_dirname = Path(image.extracted_location).name
         target_path = project.codebase_path / image_dirname
97 changes: 96 additions & 1 deletion scanpipe/pipes/rootfs.py
@@ -20,6 +20,7 @@
 # ScanCode.io is a free software code scanning tool from nexB Inc. and others.
 # Visit https://github.com/nexB/scancode.io for support and download.

+import fnmatch
 import logging
 import os
 from functools import partial
@@ -28,12 +29,14 @@
 from django.db.models import Q

 import attr
+from commoncode.ignore import default_ignores
 from container_inspector.distro import Distro

 from scanpipe import pipes
 from scanpipe.pipes import alpine
 from scanpipe.pipes import debian
 from scanpipe.pipes import rpm
+from scanpipe.pipes import windows

 logger = logging.getLogger(__name__)

@@ -48,6 +51,7 @@
     "opensuse": rpm.package_getter,
     "opensuse-tumbleweed": rpm.package_getter,
     "photon": rpm.package_getter,
+    "windows": windows.package_getter,
 }
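
The hunk above registers a Windows package getter alongside the existing Linux ones in a mapping keyed by distro identifier. A minimal sketch of that dispatch pattern follows. The getter functions, their return values, and `collect_packages` are hypothetical and only illustrate the lookup, not the project's real getters.

```python
def debian_getter(root_dir):
    # Hypothetical getter: yields (name, version) pairs found under root_dir.
    yield ("example-deb-package", "1.2.3")


def windows_getter(root_dir):
    # Hypothetical getter standing in for a Windows-specific implementation.
    yield ("example-windows-package", "1.0")


# One entry per supported distro identifier, several may share a getter.
PACKAGE_GETTERS = {
    "debian": debian_getter,
    "ubuntu": debian_getter,
    "windows": windows_getter,
}


def collect_packages(distro_id, root_dir):
    """Look up the getter for `distro_id` and return its packages as a list."""
    getter = PACKAGE_GETTERS.get(distro_id)
    if getter is None:
        raise ValueError(f'Distro "{distro_id}" is not supported.')
    return list(getter(root_dir))


windows_packages = collect_packages("windows", "/tmp/rootfs")
```

With this layout, supporting a new operating system is a one-line change to the mapping, which matches the shape of the change in this hunk.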


@@ -188,7 +192,7 @@ def has_hash_diff(install_file, codebase_resource):

 def scan_rootfs_for_system_packages(project, rootfs, detect_licenses=True):
     """
-    Given a `project` Project and an `rootfs` RootFs, scan the `rootfs` for
+    Given a `project` Project and a `rootfs` RootFs, scan the `rootfs` for
     installed system packages, and create a DiscoveredPackage for each.

     Then for each installed DiscoveredPackage file, check if it exists
@@ -336,3 +340,94 @@ def tag_uninteresting_codebase_resources(project):

     qs = project.codebaseresources.no_status()
     qs.filter(lookups).update(status="ignored-not-interesting")
+
+
+def tag_ignorable_codebase_resources(project):
+    """
+    Using the glob patterns from commoncode.ignore of ignorable files/directories,
+    tag codebase resources from `project` if their paths match an ignorable pattern.
+    """
+    lookups = Q()
+    for pattern in default_ignores.keys():
+        # Translate glob pattern to regex
+        translated_pattern = fnmatch.translate(pattern)
+        # PostgreSQL does not like parts of Python regex
+        if translated_pattern.startswith("(?s"):
+            translated_pattern = translated_pattern.replace("(?s", "(?")
+        lookups |= Q(rootfs_path__icontains=pattern)
+        lookups |= Q(rootfs_path__iregex=translated_pattern)
+
+    qs = project.codebaseresources.no_status()
+    qs.filter(lookups).update(status="ignored-default-ignores")
+
+
+def tag_data_files_with_no_clues(project):
+    """
+    Tags CodebaseResources that have a file type of `data` and no detected clues
+    to be uninteresting.
+    """
+    lookup = Q(
+        file_type="data",
+        copyrights=[],
+        holders=[],
+        authors=[],
+        licenses=[],
+        license_expressions=[],
+        emails=[],
+        urls=[],
+    )
+
+    qs = project.codebaseresources
+    qs.filter(lookup).update(status="ignored-data-file-no-clues")
+
+
+def tag_media_files_as_uninteresting(project):
+    """
+    Tags CodebaseResources that are media files to be uninteresting.
+
+    `mimes` and `types` are taken from TypeCode:
+    https://github.com/nexB/typecode/blob/main/src/typecode/contenttype.py#L528
+    """
+    mimes = (
Review comment (Contributor):
This should be imported directly from typecode.
That would require making https://github.com/nexB/typecode/blob/main/src/typecode/contenttype.py#L528 available as a module variable.
In the short term, we can keep it as-is and open a ticket on the typecode side.

Reply (Contributor):
@tdruez On the other hand, what if we added the is_media field from the license scan to the CodebaseResource model?

Reply (Contributor):
Sure, that sounds good.

Reply (Contributor):
I've added the is_media field to CodebaseResource, removed the part that excludes is_media from unsupported_fields in scanpipe.pipes.scancode.get_resource_info, and updated the test results.

+        "image",
+        "picture",
+        "audio",
+        "video",
+        "graphic",
+        "sound",
+    )
+
+    types = (
+        "image data",
+        "graphics image",
+        "ms-windows metafont .wmf",
+        "windows enhanced metafile",
+        "png image",
+        "interleaved image",
+        "microsoft asf",
+        "image text",
+        "photoshop image",
+        "shop pro image",
+        "ogg data",
+        "vorbis",
+        "mpeg",
+        "theora",
+        "bitmap",
+        "audio",
+        "video",
+        "sound",
+        "riff",
+        "icon",
+        "pc bitmap",
+        "image data",
+        "netpbm",
+    )
+
+    lookup = Q()
+    for mime_type in mimes:
+        lookup |= Q(mime_type__icontains=mime_type)
+    for file_type in types:
+        lookup |= Q(file_type__icontains=file_type)
+
+    qs = project.codebaseresources.no_status()
+    qs.filter(lookup).update(status="ignored-media-file")
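
For reference, the glob-to-regex step in `tag_ignorable_codebase_resources` can be tried outside Django with the standard library alone. The sketch below mirrors the translate-then-strip logic from the diff. The helper name `glob_to_portable_regex` and the sample pattern and paths are made up for illustration, and Python's `re` stands in for the database's regex engine, which may behave differently.

```python
import fnmatch
import re


def glob_to_portable_regex(pattern):
    """
    Translate a glob `pattern` to a regex and drop the inline `s` flag that
    `fnmatch.translate` adds. Hypothetical helper for illustration only.
    """
    translated = fnmatch.translate(pattern)
    if translated.startswith("(?s"):
        # "(?s:...)" becomes "(?:...)", a plain non-capturing group.
        translated = translated.replace("(?s", "(?", 1)
    return translated


regex = glob_to_portable_regex("*.pyc")
# re.IGNORECASE approximates the case-insensitive `iregex` lookup.
matches = bool(re.match(regex, "app/__pycache__/module.pyc", re.IGNORECASE))
rejects = re.match(regex, "app/module.py", re.IGNORECASE) is None
```

Dropping the `s` flag means `.` no longer matches newlines in Python, which should rarely matter for file paths.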