Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#270 Adjust prune workflow #288

Closed
wants to merge 1 commit into from
Closed

Conversation

chang-annie
Copy link
Contributor

Makes the following changes to the GitHub prune workfow:

  • updates container-retention-policy from v2 to v3.0.0 to allow usage of new skip-shas parameter
  • creates two new scripts that pulls SHAs for: all release versions, the latest snapshot version, the latest patch version (if exists)

dry-run: false
skip-shas: ${{ steps.concat-digests.outputs.skip_shas }}
image-names: "aissemble-* !aissemble-*-chart"
image-tags: "!1.7.0 !1.7.0-arm64 !1.7.0-amd64"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: we're skipping 1.7.0, 1.7.0-arm64, and 1.7.0-amd64 tags completely because the1.7.0 release was a special case where we manually generated the multi-arch images

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: and some parameter name changes required for upgrading from v2 to v3.0.0

- name: Concatenate digests
id: concat-digests
run: |
skip_shas="${{ steps.multi-arch-digests.outputs.multi-arch-digests }} ${{ steps.latest-snapshot-digests.outputs.latest-snapshot-digests }}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: concats the output of the two scripts into a single string that's passed to the skip-shas parameter of the container-rentention-policy

matching_patch_versions+=("$version")
fi

# If matching patch versions array is not empty
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: the logic to find the latest patch version is as follows:

  • parse the latest snapshot version (follows major.minor.0-SNAPSHOT syntax) to find the major and minor versions
  • find the previous minor version
  • if the previous minor is greater than or equal to 0, then the patch version uses the same major version and the pattern to look for is current_major.previous_minor.[1-9]-SNAPSHOT
  • if the previous minor version is less than 0, we need to also find the previous major version and the patch version pattern to check for becomes previous_major.(10 + previous_minor).[1-9]-SNAPSHOT
  • save all found patch versions to an array
  • if the array is not empty, sort the contents, find the latest patch version, and pull the SHA for that image:latest_patch_version and save to main digests array

Since there are no actual patch versions currently, this logic was tested locally by creating dummy data. Some examples below:

All Snapshot Versions:
[ "1.9.0-SNAPSHOT", "1.8.3-SNAPSHOT", "1.8.1-SNAPSHOT", "1.7.0-SNAPSHOT", "1.8.0-SNAPSHOT", "1.7.1-SNAPSHOT" ]
Latest Snapshot Version found: 1.9.0-SNAPSHOT
Previous major: 0
Previous minor: 8
Previous minor is greater than or equal to 0
Using Patch Major: 1 and Patch Minor: 8
Patch Pattern: 1\.8\.[1-9]-SNAPSHOT

Patch version 1.8.3-SNAPSHOT matches patch pattern
Patch version 1.8.1-SNAPSHOT matches patch pattern
Sorted matching patch versions: 1.8.3-SNAPSHOT 1.8.1-SNAPSHOT
Latest patch version found: 1.8.3-SNAPSHOT
All Snapshot Versions:
[ "1.9.0-SNAPSHOT", "1.8.3-SNAPSHOT", "1.8.1-SNAPSHOT", "2.0.0-SNAPSHOT", "1.9.1-SNAPSHOT", "1.9.4-SNAPSHOT" ]
Latest Snapshot Version found: 2.0.0-SNAPSHOT
Previous major: 1
Previous minor: -1
Previous minor is less than 0
Using Patch Major: 1 and Patch Minor: 9
Patch Pattern: 1\.9\.[1-9]-SNAPSHOT

Patch version 1.9.1-SNAPSHOT matches patch pattern
Patch version 1.9.4-SNAPSHOT matches patch pattern
Sorted matching patch versions: 1.9.4-SNAPSHOT 1.9.1-SNAPSHOT
Latest patch version found: 1.9.4-SNAPSHOT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a false assumption in this logic that the major.minor.patch number has to be 1-9 which is not always the case. Before we released aissemble 1.0.0 we were on version 0.13.6.

The logic should still be fine for when the major remains the same, but probably won't work for when the major changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right, I was assuming that our versioning would be 1-9 and sequential...

I think it make more sense if I just use the latest release version's major and minor values for the patch pattern.
For example, since the latest release is 1.8.0, the patch pattern would be 1\.8\.[1-9]+[0-9]*-SNAPSHOT?. What do you think?

Copy link
Contributor

@carter-cundiff carter-cundiff Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Honestly it's not the end of the world if we delete these patch release snapshots anyway. The one that really matters is the patch release itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carter-cundiff updated to new logic of using the latest release version's major and minor versions for the patch pattern. The latest release version is passed from the Fetch multi-platform package version SHAs step into GITHUB_OUTPUT and then used in the Fetch latest snapshot version SHAs step.

Copy link
Contributor

@carter-cundiff carter-cundiff Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this works for when the latest release is 1.0.1 and we're working on a patch release 1.0.2. Should probably just match on whatever the current patch is + 1 instead of .[1-9]+[0-9]*-SNAPSHOT?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would match on both, but it this scenario we want it to match on only 1.0.2-SNAPSHOT

# Fetch the base manifest SHA
# Inspect command will output Name, MediaType, Digest in the first three lines
# so we can use a regex to pull out the SHA
manifest_base_sha=$(docker buildx imagetools inspect "ghcr.io/boozallen/${name}:${version}" | head -n 3 | sed -n 's/^Digest: *//p')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: ideally we would use something more straightforward like docker buildx imagetools inspect ghcr.io/boozallen/${name}:${version} --format "{{json .Manifest}}" | jq -r .digest to pull the manifest base SHA. However, because we have buildx's provenance parameter set to min (aka true, the default value), there are two extra manifests that are added to the list with os and architecture set to unknown.
This causes the command to output an error like the following:

ERROR: failed to copy: httpReadSeeker: failed open: content at https://ghcr.io/v2/boozallen/aissemble-spark/manifests/sha256:c0ea773c38265bf3a80a133211aba2468dec87b8ce5b341d82611e2c81252147 not found: not found

where the referenced SHA points to the attestation/provenance.

The workaround here is to search the first 3 lines of the inspect output for the line beginning with Digest: and set that to the manifest_base_sha. An example of the first 3 lines below:

Name:      ghcr.io/boozallen/aissemble-spark:1.8.0
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:fb575ed4468a7828a23a8a7d2608977acc59a67e0dd36f9f2608316301554a95

Another potential workaround would be to use the GitHub API:

gh api \
        -H "Accept: application/vnd.github+json" \
        -H "-GitHub-Api-Version: 2022-11-28" \
        -H "Authorization: Token ${GITHUB_TOKEN}" \
        --paginate  /orgs/boozallen/packages/container/${name}/versions | jq '.[] | select(.metadata.container.tags[] == ${version}) | .name'

However, this would require (# images * # release versions) total API calls

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a Github issue we can link for this bug in the imagetools inspect tool? Maybe buildx#1509? Though it looks like that issue has been closed. I wonder if they have an open issue that would address the JSON output.

-H "Authorization: Token ${GITHUB_TOKEN}" \
--paginate "/orgs/boozallen/packages/container/${name}/versions" \
| jq -r '.[] | .metadata.container.tags[] | select(test("^\\d+\\.\\d+\\.\\d?$"))' \
| jq -R -s 'split("\n") | map(select(length > 0)) | map(select(. != "1.7.0" and . != "1.7.0-arm64" and . != "1.7.0-amd64"))')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: here, we're using the GitHub API to find all image names that follow the given regex pattern. We are excluding 1.7.0 release image tags completely because the 1.7.0 release was a special case where we manually generated the multi-arch images

-H "-GitHub-Api-Version: 2022-11-28" \
-H "Authorization: Token ${GITHUB_TOKEN}" \
--paginate "/orgs/boozallen/packages/container/${name}/versions" \
| jq -r '.[] | .metadata.container.tags[] | select(test("^\\d+\\.\\d+\\.\\d+?$"))' \
Copy link
Contributor

@carter-cundiff carter-cundiff Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal of this regex to match only the release versions (i.e 1.8.0) but not the SNAPSHOTs (i.e. 1.8.0-SNAPSHOT)?

If so, is the ? on the end of ^\\d+\\.\\d+\\.\\d+?$ necessary? Would think we could just do ^\\d+\\.\\d+\\.\\d+$

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, you're right! the ? is no longer needed - updated to ^\\d+\\.\\d+\\.\\d+$.

-H "-GitHub-Api-Version: 2022-11-28" \
-H "Authorization: Token ${GITHUB_TOKEN}" \
--paginate "/orgs/boozallen/packages/container/${name}/versions" \
| jq -r '.[] | .metadata.container.tags[] | select(test("^\\d+\\.\\d+\\.\\d+$"))' \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So given this regex will only match to 3 digits (i.e. 1.7.0), do we still need the check below for 1.7.0-arm64 and 1.7.0-amd64? In theory this regex shouldn't match on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, removed!

echo "Processing snapshot image ${name}:${latest_snapshot_version}"

# Fetch the latest snapshot version SHA
latest_snapshot_manifest_list_shas=$(docker buildx imagetools inspect --raw "ghcr.io/boozallen/${name}:${latest_snapshot_version}" | jq -r '.manifests[].digest' | paste -s -d ' ' -)
Copy link
Contributor

@carter-cundiff carter-cundiff Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why do we not have to same logic as above when we were getting the release SHA's like this:
(docker buildx imagetools inspect "ghcr.io/boozallen/${name}:${version}" | head -n 3 | sed -n 's/^Digest: *//p')?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to be pulling the manifest digest SHA for snapshot versions? I thought for snapshot versions we just need the manifest list SHAs but if that's not the case, I can update it to also pull in the manifest digest SHA.

so if the manifest is something like this:

Name:      ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:1de4a18694021c9c6e6194846be20c3fb790fd309e40089ea1b36e5efae1c860

Manifests:
  Name:        ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT@sha256:9a035cc18e05501f3cd9f0478e16cdf3d37f5cc39ed23b52ab5c692e9a876465
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/amd64

  Name:        ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT@sha256:fd1d5d47903d0b9a63233a092e1e06f3e99bb501d350dc23aa7858cfb55ac4fb
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/arm64

  Name:        ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT@sha256:08c2d2581e1a4596db460e5d9b1b0bcb0d476eaf392530df6b59020b5c2f13cd
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    unknown/unknown
  Annotations:
    vnd.docker.reference.digest: sha256:9a035cc18e05501f3cd9f0478e16cdf3d37f5cc39ed23b52ab5c692e9a876465
    vnd.docker.reference.type:   attestation-manifest

  Name:        ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT@sha256:6db9c6f6e8c87f30ad4ebc12be8c2635d7b20c2e6bb585af43bd9e5240376684
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    unknown/unknown
  Annotations:
    vnd.docker.reference.digest: sha256:fd1d5d47903d0b9a63233a092e1e06f3e99bb501d350dc23aa7858cfb55ac4fb
    vnd.docker.reference.type:   attestation-manifest

we'll be pulling out the following SHAs:

sha256:1de4a18694021c9c6e6194846be20c3fb790fd309e40089ea1b36e5efae1c860
sha256:9a035cc18e05501f3cd9f0478e16cdf3d37f5cc39ed23b52ab5c692e9a876465
sha256:fd1d5d47903d0b9a63233a092e1e06f3e99bb501d350dc23aa7858cfb55ac4fb 
sha256:08c2d2581e1a4596db460e5d9b1b0bcb0d476eaf392530df6b59020b5c2f13cd 
sha256:6db9c6f6e8c87f30ad4ebc12be8c2635d7b20c2e6bb585af43bd9e5240376684

Copy link
Contributor

@carter-cundiff carter-cundiff Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference in the manifest digest SHA vs manifest list SHA? I'm just confused why the logic for preserving released versions is different than preserving snapshot versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carter-cundiff I agree, I just added it into the Fetch latest snapshot version SHAs step. I was under the wrong impression that snapshot versions wouldn't always have a digest SHA. The new dry-run prune logs can be found here for review~

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder if we should just pull this logic out into a common shared function that we source in each script. Something like an get_image_shas that reads $1 as an image URI and returns the array with the base manifest SHA + referenced image SHAs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for instance I think we'd also need this same logic down in the patch snapshot logic for the same reasons outlined here.

@chang-annie chang-annie force-pushed the 270-adjust-prune-workflow branch 3 times, most recently from 88f6bee to 7493264 Compare August 22, 2024 22:50
account-type: org
org-name: boozallen
keep-at-least: 2
untagged-only: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: As I was reading this PR I again started to question the keep-n-most-recent setting. So I went back to the docs and re-read. I know I brought up this concern that tagged version might be deleted if they fell outside the range, but on second read I think that concern was unfounded. The docs say the setting controls what to keep "out of the most recent tagged images selected [for deletion]". However the key was this parameter (renamed/refactord to be tag-selection: untagged), as this prevents tags from being selected at all. Which means the keep parameter never even comes into play as no tags are ever selected.

I think I prefer our current approach anyway, as it also cleans up RC tags without any extra effort. But I just thought I'd bring it up since I know we went back and forth a bit on what the options were doing.

# Initialize array to store digest SHAs
declare -a digests

# Fetch all docker image names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A: Since we already rely on the release version being passed from the release_images.sh script, we could also pass the image names so we don't have to query for them twice.

- uses: actions/checkout@v4

# Prevents multi-platform release images from being pruned by identifying all manifest lists
- name: Fetch multi-platform package version SHAs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A: For consistency with the snapshot step, it might make sense to call this Fetch release version SHAs and set the id/output names to be release-digests

IFS='.' read -r major minor patch <<< "${LATEST_RELEASE_VERSION}"

# Patch pattern uses the latest release version's major and minor components
patch_pattern="${major}\.${minor}\.[1-9]+[0-9]*-SNAPSHOT?"
Copy link
Contributor

@ewilkins-csi ewilkins-csi Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A: I'd drop the plus as it's not doing much here. Also I don't think the ? is necessary? Looks like it's just making the T in SNAPSHOT optional.

Suggested change
patch_pattern="${major}\.${minor}\.[1-9]+[0-9]*-SNAPSHOT?"
patch_pattern="${major}\.${minor}\.[1-9][0-9]*-SNAPSHOT"

Copy link
Contributor

@ewilkins-csi ewilkins-csi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few smaller cleanup items. Looks good!

Copy link
Contributor

@ewilkins-csi ewilkins-csi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke offline with @chang-annie and there seems to be some decent actions already on the marketplace that will correctly handle multi-arch images instead of rolling our own. We're going to evaluate those to see if they make more sense than rolling our own here. (Certainly seems promising at first glance.)

@chang-annie
Copy link
Contributor Author

Closing this PR and will open a new one based on conversation and changes discussed with Emily.

@chang-annie chang-annie deleted the 270-adjust-prune-workflow branch August 26, 2024 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants