Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#270 Adjust prune workflow #288

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 59 additions & 0 deletions .github/scripts/release_images.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
#!/bin/bash

# Initialize array to store digest SHAs and all release versions
declare -a digests
declare -a all_release_versions

# Fetch all docker image names
image_names=$(gh api \
-H "Accept: application/vnd.github+json" \
-H "-GitHub-Api-Version: 2022-11-28" \
-H "Authorization: Token ${GITHUB_TOKEN}" \
--paginate "/orgs/boozallen/packages?package_type=container" | jq -r '.[] | select((.name | startswith("aissemble")) and (.name | endswith("-chart") | not)) | .name')

# For each docker image, find all release versions, excluding 1.7.0
for name in $image_names; do
release_versions=$(gh api \
-H "Accept: application/vnd.github+json" \
-H "-GitHub-Api-Version: 2022-11-28" \
-H "Authorization: Token ${GITHUB_TOKEN}" \
--paginate "/orgs/boozallen/packages/container/${name}/versions" \
| jq -r '.[] | .metadata.container.tags[] | select(test("^\\d+\\.\\d+\\.\\d+$"))' \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So given this regex will only match to 3 digits (i.e. 1.7.0), do we still need the check below for 1.7.0-arm64 and 1.7.0-amd64? In theory this regex shouldn't match on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, removed!

| jq -R -s 'split("\n") | map(select(length > 0)) | map(select(. != "1.7.0"))')

# Add release versions to all_release_versions array
all_release_versions+=("$release_versions")

# Loop through all release versions
for version in $(echo "$release_versions" | jq -r '.[]'); do
echo "Processing release image ${name}:${version}"

# Fetch the base manifest SHA
# Inspect command will output Name, MediaType, Digest in the first three lines
# so we can use a regex to pull out the SHA
manifest_base_sha=$(docker buildx imagetools inspect "ghcr.io/boozallen/${name}:${version}" | head -n 3 | sed -n 's/^Digest: *//p')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: ideally we would use something more straightforward like docker buildx imagetools inspect ghcr.io/boozallen/${name}:${version} --format "{{json .Manifest}}" | jq -r .digest to pull the manifest base SHA. However, because we have buildx's provenance parameter set to min (aka true, the default value), there are two extra manifests that are added to the list with os and architecture set to unknown.
This causes the command to output an error like the following:

ERROR: failed to copy: httpReadSeeker: failed open: content at https://ghcr.io/v2/boozallen/aissemble-spark/manifests/sha256:c0ea773c38265bf3a80a133211aba2468dec87b8ce5b341d82611e2c81252147 not found: not found

where the referenced SHA points to the attestation/provenance.

The workaround here is to search the first 3 lines of the inspect output for the line beginning with Digest: and set that to the manifest_base_sha. An example of the first 3 lines below:

Name:      ghcr.io/boozallen/aissemble-spark:1.8.0
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:fb575ed4468a7828a23a8a7d2608977acc59a67e0dd36f9f2608316301554a95

Another potential workaround would be to use the GitHub API:

gh api \
        -H "Accept: application/vnd.github+json" \
        -H "-GitHub-Api-Version: 2022-11-28" \
        -H "Authorization: Token ${GITHUB_TOKEN}" \
        --paginate  /orgs/boozallen/packages/container/${name}/versions | jq '.[] | select(.metadata.container.tags[] == ${version}) | .name'

However, this would require (# images * # release versions) total API calls

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a Github issue we can link for this bug in the imagetools inspect tool? Maybe buildx#1509? Though it looks like that issue has been closed. I wonder if they have an open issue that would address the JSON output.


echo "Manifest index SHA: ${manifest_base_sha}"

# Add to digests array
digests+=("$manifest_base_sha")

# Query the raw inpect output to get the nested manifest list SHAs
manifest_list_shas=$(docker buildx imagetools inspect --raw "ghcr.io/boozallen/${name}:${version}" | jq -r '.manifests[].digest' | paste -s -d ' ' -)

echo "Manifest List SHAs: $manifest_list_shas"

# Add to digests array
digests+=("$manifest_list_shas")
done
done

# Pass the latest_release_version to GITHUB_OUTPUT so snapshot_images.sh can use it
latest_release_version=$(echo "$all_release_versions" | jq -r .[] | sort -uV -r | head -n 1)
echo "latest-release-version=${latest_release_version}" >> "$GITHUB_OUTPUT"

# Join digests into a single string separated by spaces
digests_string=$(echo "${digests[*]}")

# Save the output to $GITHUB_OUTPUT
echo "multi-arch-digests=${digests_string}" >> "$GITHUB_OUTPUT"
79 changes: 79 additions & 0 deletions .github/scripts/snapshot_images.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
#!/bin/bash

# Initialize array to store digest SHAs
declare -a digests

# Fetch all docker image names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A: Since we already rely on the release version being passed from the release_images.sh script, we could also pass the image names so we don't have to query for them twice.

image_names=$(gh api \
-H "Accept: application/vnd.github+json" \
-H "-GitHub-Api-Version: 2022-11-28" \
-H "Authorization: Token ${GITHUB_TOKEN}" \
--paginate "/orgs/boozallen/packages?package_type=container" | jq -r '.[] | select((.name | startswith("aissemble")) and (.name | endswith("-chart") | not)) | .name')
chang-annie marked this conversation as resolved.
Show resolved Hide resolved

# For each docker image, find their snapshot versions by grabbing any tag that ends with "-SNAPSHOT"
for name in $image_names; do
all_snapshot_versions=$(gh api \
-H "Accept: application/vnd.github+json" \
-H "-GitHub-Api-Version: 2022-11-28" \
-H "Authorization: Token ${GITHUB_TOKEN}" \
--paginate "/orgs/boozallen/packages/container/${name}/versions" \
| jq -r '.[] | .metadata.container.tags[] | select(endswith("-SNAPSHOT"))'| jq -R -s 'split("\n")| map(select(length > 0))')

# Find the latest snapshot version by sorting all snapshot versions, then selecting the top
latest_snapshot_version=$(echo "$all_snapshot_versions" | jq -r '.[] | select(endswith("-SNAPSHOT"))' | sort -r | head -n 1)

echo "Processing snapshot image ${name}:${latest_snapshot_version}"

# Fetch the base manifest SHA
# Inspect command will output Name, MediaType, Digest in the first three lines
# Use regex to pull out the SHA
latest_snapshot_manifest_base_sha=$(docker buildx imagetools inspect "ghcr.io/boozallen/${name}:${latest_snapshot_version}" | head -n 3 | sed -n 's/^Digest: *//p')

echo "Manifest index SHA: ${latest_snapshot_manifest_base_sha}"

# Add to digests array
digests+=("$latest_snapshot_manifest_base_sha")

# Fetch the manifest list SHAs
latest_snapshot_manifest_list_shas=$(docker buildx imagetools inspect --raw "ghcr.io/boozallen/${name}:${latest_snapshot_version}" | jq -r '.manifests[].digest' | paste -s -d ' ' -)
Copy link
Contributor

@carter-cundiff carter-cundiff Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why do we not have to same logic as above when we were getting the release SHA's like this:
(docker buildx imagetools inspect "ghcr.io/boozallen/${name}:${version}" | head -n 3 | sed -n 's/^Digest: *//p')?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to be pulling the manifest digest SHA for snapshot versions? I thought for snapshot versions we just need the manifest list SHAs but if that's not the case, I can update it to also pull in the manifest digest SHA.

so if the manifest is something like this:

Name:      ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:1de4a18694021c9c6e6194846be20c3fb790fd309e40089ea1b36e5efae1c860

Manifests:
  Name:        ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT@sha256:9a035cc18e05501f3cd9f0478e16cdf3d37f5cc39ed23b52ab5c692e9a876465
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/amd64

  Name:        ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT@sha256:fd1d5d47903d0b9a63233a092e1e06f3e99bb501d350dc23aa7858cfb55ac4fb
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/arm64

  Name:        ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT@sha256:08c2d2581e1a4596db460e5d9b1b0bcb0d476eaf392530df6b59020b5c2f13cd
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    unknown/unknown
  Annotations:
    vnd.docker.reference.digest: sha256:9a035cc18e05501f3cd9f0478e16cdf3d37f5cc39ed23b52ab5c692e9a876465
    vnd.docker.reference.type:   attestation-manifest

  Name:        ghcr.io/boozallen/aissemble-spark:1.9.0-SNAPSHOT@sha256:6db9c6f6e8c87f30ad4ebc12be8c2635d7b20c2e6bb585af43bd9e5240376684
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    unknown/unknown
  Annotations:
    vnd.docker.reference.digest: sha256:fd1d5d47903d0b9a63233a092e1e06f3e99bb501d350dc23aa7858cfb55ac4fb
    vnd.docker.reference.type:   attestation-manifest

we'll be pulling out the following SHAs:

sha256:1de4a18694021c9c6e6194846be20c3fb790fd309e40089ea1b36e5efae1c860
sha256:9a035cc18e05501f3cd9f0478e16cdf3d37f5cc39ed23b52ab5c692e9a876465
sha256:fd1d5d47903d0b9a63233a092e1e06f3e99bb501d350dc23aa7858cfb55ac4fb 
sha256:08c2d2581e1a4596db460e5d9b1b0bcb0d476eaf392530df6b59020b5c2f13cd 
sha256:6db9c6f6e8c87f30ad4ebc12be8c2635d7b20c2e6bb585af43bd9e5240376684

Copy link
Contributor

@carter-cundiff carter-cundiff Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference in the manifest digest SHA vs manifest list SHA? I'm just confused why the logic for preserving released versions is different than preserving snapshot versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carter-cundiff I agree, I just added it into the Fetch latest snapshot version SHAs step. I was under the wrong impression that snapshot versions wouldn't always have a digest SHA. The new dry-run prune logs can be found here for review~

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder if we should just pull this logic out into a common shared function that we source in each script. Something like an get_image_shas that reads $1 as an image URI and returns the array with the base manifest SHA + referenced image SHAs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for instance I think we'd also need this same logic down in the patch snapshot logic for the same reasons outlined here.


echo "Manifest List: $latest_snapshot_manifest_list_shas"

# Add to digests array
digests+=("$latest_snapshot_manifest_list_shas")

# Find if there are any patch versions available
# Extract the major, minor, and patch components of the latest release version using IFS (Internal Field Separator)
IFS='.' read -r major minor patch <<< "${LATEST_RELEASE_VERSION}"

# Patch pattern uses the latest release version's major and minor components
patch_pattern="${major}\.${minor}\.[1-9]+[0-9]*-SNAPSHOT?"
Copy link
Contributor

@ewilkins-csi ewilkins-csi Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A: I'd drop the plus as it's not doing much here. Also I don't think the ? is necessary? Looks like it's just making the T in SNAPSHOT optional.

Suggested change
patch_pattern="${major}\.${minor}\.[1-9]+[0-9]*-SNAPSHOT?"
patch_pattern="${major}\.${minor}\.[1-9][0-9]*-SNAPSHOT"


for version in $(echo "$all_snapshot_versions" | jq -r '.[]'); do
if [[ ${version} =~ ^${patch_pattern}$ ]]; then
echo "Patch version ${version} matches patch pattern ${patch_pattern}"

# Add to array containing all matching patch versions
matching_patch_versions+=("$version")
fi

# If matching patch versions array is not empty
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: the logic to find the latest patch version is as follows:

  • parse the latest snapshot version (follows major.minor.0-SNAPSHOT syntax) to find the major and minor versions
  • find the previous minor version
  • if the previous minor is greater than or equal to 0, then the patch version uses the same major version and the pattern to look for is current_major.previous_minor.[1-9]-SNAPSHOT
  • if the previous minor version is less than 0, we need to also find the previous major version and the patch version pattern to check for becomes previous_major.(10 + previous_minor).[1-9]-SNAPSHOT
  • save all found patch versions to an array
  • if the array is not empty, sort the contents, find the latest patch version, and pull the SHA for that image:latest_patch_version and save to main digests array

Since there are no actual patch versions currently, this logic was tested locally by creating dummy data. Some examples below:

All Snapshot Versions:
[ "1.9.0-SNAPSHOT", "1.8.3-SNAPSHOT", "1.8.1-SNAPSHOT", "1.7.0-SNAPSHOT", "1.8.0-SNAPSHOT", "1.7.1-SNAPSHOT" ]
Latest Snapshot Version found: 1.9.0-SNAPSHOT
Previous major: 0
Previous minor: 8
Previous minor is greater than or equal to 0
Using Patch Major: 1 and Patch Minor: 8
Patch Pattern: 1\.8\.[1-9]-SNAPSHOT

Patch version 1.8.3-SNAPSHOT matches patch pattern
Patch version 1.8.1-SNAPSHOT matches patch pattern
Sorted matching patch versions: 1.8.3-SNAPSHOT 1.8.1-SNAPSHOT
Latest patch version found: 1.8.3-SNAPSHOT
All Snapshot Versions:
[ "1.9.0-SNAPSHOT", "1.8.3-SNAPSHOT", "1.8.1-SNAPSHOT", "2.0.0-SNAPSHOT", "1.9.1-SNAPSHOT", "1.9.4-SNAPSHOT" ]
Latest Snapshot Version found: 2.0.0-SNAPSHOT
Previous major: 1
Previous minor: -1
Previous minor is less than 0
Using Patch Major: 1 and Patch Minor: 9
Patch Pattern: 1\.9\.[1-9]-SNAPSHOT

Patch version 1.9.1-SNAPSHOT matches patch pattern
Patch version 1.9.4-SNAPSHOT matches patch pattern
Sorted matching patch versions: 1.9.4-SNAPSHOT 1.9.1-SNAPSHOT
Latest patch version found: 1.9.4-SNAPSHOT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a false assumption in this logic that the major.minor.patch number has to be 1-9 which is not always the case. Before we released aissemble 1.0.0 we were on version 0.13.6.

The logic should still be fine for when the major remains the same, but probably won't work for when the major changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right, I was assuming that our versioning would be 1-9 and sequential...

I think it make more sense if I just use the latest release version's major and minor values for the patch pattern.
For example, since the latest release is 1.8.0, the patch pattern would be 1\.8\.[1-9]+[0-9]*-SNAPSHOT?. What do you think?

Copy link
Contributor

@carter-cundiff carter-cundiff Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Honestly it's not the end of the world if we delete these patch release snapshots anyway. The one that really matters is the patch release itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carter-cundiff updated to new logic of using the latest release version's major and minor versions for the patch pattern. The latest release version is passed from the Fetch multi-platform package version SHAs step into GITHUB_OUTPUT and then used in the Fetch latest snapshot version SHAs step.

Copy link
Contributor

@carter-cundiff carter-cundiff Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this works for when the latest release is 1.0.1 and we're working on a patch release 1.0.2. Should probably just match on whatever the current patch is + 1 instead of .[1-9]+[0-9]*-SNAPSHOT?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would match on both, but it this scenario we want it to match on only 1.0.2-SNAPSHOT

if [ ${#matching_patch_versions[@]} -ne 0 ]; then
echo "Patch Versions array is not empty"
# Find the latest patch version
latest_patch_version=($(printf "%s\n" "${matching_patch_versions[@]}" | sort -V -r | head -n 1))

# Fetch the SHA
latest_patch_version_sha=$(docker buildx imagetools inspect --raw "ghcr.io/boozallen/${name}:${latest_patch_version}" | jq -r '.manifests[].digest' | paste -s -d ' ' -)

# Add to the digests array
digests+=("$latest_patch_version_sha")
fi
done
done

# Join digests into a single string separated by spaces
digests_string=$(echo "${digests[*]}")

# Save the output to $GITHUB_OUTPUT
echo "latest-snapshot-digests=${digests_string}" >> "$GITHUB_OUTPUT"
44 changes: 34 additions & 10 deletions .github/workflows/prune.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
# Prunes old aissemble-* untagged containers
# Prunes old aissemble-* containers, excluding those with the following tags:
# any release versions, the latest snapshot version, and the latest patch version

name: Prune ghcr.io

Expand All @@ -15,14 +16,37 @@ jobs:
runs-on: ubuntu-latest

steps:
- name: Prune
uses: snok/container-retention-policy@v2
# Required in order to access script files in this repository
- uses: actions/checkout@v4

# Prevents multi-platform release images from being pruned by identifying all manifest lists
- name: Fetch multi-platform package version SHAs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A: For consistency with the snapshot step, it might make sense to call this Fetch release version SHAs and set the id/output names to be release-digests

id: multi-arch-digests
env:
GITHUB_TOKEN: ${{ secrets.GHCR_IO_TOKEN }}
run: bash ${GITHUB_WORKSPACE}/.github/scripts/release_images.sh

# Prevents the latest snapshot images from being pruned
- name: Fetch latest snapshot version SHAs
id: latest-snapshot-digests
env:
GITHUB_TOKEN: ${{ secrets.GHCR_IO_TOKEN }}
LATEST_RELEASE_VERSION: ${{ steps.multi-arch-digests.outputs.latest-release-version }}
run: bash ${GITHUB_WORKSPACE}/.github/scripts/snapshot_images.sh

- name: Concatenate digests
id: concat-digests
run: |
skip_shas="${{ steps.multi-arch-digests.outputs.multi-arch-digests }} ${{ steps.latest-snapshot-digests.outputs.latest-snapshot-digests }}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: concats the output of the two scripts into a single string that's passed to the skip-shas parameter of the container-rentention-policy

echo "skip-shas=$skip_shas" >> $GITHUB_OUTPUT

- name: Prune old release versions
uses: snok/container-retention-policy@v3.0.0
with:
image-names: aissemble-*
cut-off: Two days ago UTC
account-type: org
org-name: boozallen
keep-at-least: 2
untagged-only: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: As I was reading this PR I again started to question the keep-n-most-recent setting. So I went back to the docs and re-read. I know I brought up this concern that tagged version might be deleted if they fell outside the range, but on second read I think that concern was unfounded. The docs say the setting controls what to keep "out of the most recent tagged images selected [for deletion]". However the key was this parameter (renamed/refactord to be tag-selection: untagged), as this prevents tags from being selected at all. Which means the keep parameter never even comes into play as no tags are ever selected.

I think I prefer our current approach anyway, as it also cleans up RC tags without any extra effort. But I just thought I'd bring it up since I know we went back and forth a bit on what the options were doing.

dry-run: false
skip-shas: ${{ steps.concat-digests.outputs.skip-shas }}
image-names: "aissemble-* !aissemble-*-chart"
image-tags: "!1.7.0 !1.7.0-arm64 !1.7.0-amd64"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: we're skipping 1.7.0, 1.7.0-arm64, and 1.7.0-amd64 tags completely because the1.7.0 release was a special case where we manually generated the multi-arch images

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I: and some parameter name changes required for upgrading from v2 to v3.0.0

cut-off: 2d
account: boozallen
token: ${{ secrets.GHCR_IO_TOKEN }}