NOTICE: We are considering deprecating the "stable" image tag #710

Open
PettitWesley opened this issue Jul 31, 2023 · 1 comment

Comments

@PettitWesley
Contributor

The AWS Distro for Fluent Bit team is considering deprecating our stable tag: https://github.com/aws/aws-for-fluent-bit#latest-stable-version

Request for Comments: Concerns with stable deprecation?

Please comment on this issue.

Stable deprecation proposal

Problems with Stable

Problem: Stable for all customers is too wide

If stable were targeted at individual user segments, each segment could get upgrades as soon as a new version was stable for its use cases. Currently, one user segment is blocked from obtaining upgrades whenever other user segments have issues.

Historically, we have continued to release new upstream code over time, so each new version introduces net-new code. Even as the team continues to fix known bugs, new versions can therefore have new instability issues. This has often made it slow to upgrade stable.

Problem: Stable is mutable

The official version consumption guidance in the AWS for Fluent Bit documentation calls out the following concern with stable:

Mixed deployments: If you are in the middle of a deployment when we release an update to the stable or latest mutable tags, some of your deployment may have deployed the previous version, and the rest will deploy the new version.

Problem: Stable is opaque

The official version consumption guidance in the AWS for Fluent Bit documentation calls out the following concern with stable:

Difficulty in determining which version was deployed: If you experience an issue, you will need to check the Fluent Bit log output to determine which specific version tag was deployed. This is because the stable and latest tags are mutable and change over time.

This concern becomes worse due to the above mixed deployment problem. If a customer has a cluster with multiple versions deployed, this is not immediately clear since all tasks/pods use the stable tag.
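As a rough illustration of why a shared mutable tag hides this, the sketch below groups workloads by the image digest they actually resolved to. It assumes you have already collected (workload name, image digest) pairs from your orchestrator; the names, digests, and helper functions are hypothetical and are not part of any AWS tooling.

```python
from collections import defaultdict


def group_by_digest(workloads):
    """Map each resolved image digest to the workloads running it.

    `workloads` is an iterable of (name, digest) pairs. How the pairs are
    collected is outside this sketch and depends on your orchestrator.
    """
    groups = defaultdict(list)
    for name, digest in workloads:
        groups[digest].append(name)
    return dict(groups)


def is_mixed_deployment(workloads):
    """Return True when more than one distinct digest is running."""
    return len(group_by_digest(workloads)) > 1


# Hypothetical data: every workload was launched with the same "stable" tag,
# but one of them pulled the image after the tag was moved.
workloads = [
    ("task-a", "sha256:aaa"),
    ("task-b", "sha256:aaa"),
    ("task-c", "sha256:bbb"),
]

print(is_mixed_deployment(workloads))
print(group_by_digest(workloads))
```

All three workloads report the same tag, so only the digest (or the Fluent Bit log output) reveals that two versions are running.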

This concern was recently brought up in a user complaint when we switched the eks-charts helm chart to use the stable tag:

Looks like App version is not reported anymore in the latest Helm chart version - just stable shown.
What version exactly the stable refers to ?

The stable is not SemVer compliant and to be honest is confusing as it's not clear what version from GitHub release is going to install.

The world after stable tag deprecation

Goals

Deprecate stable:

  • By January 1st, 2024, monthly image downloads for stable < 100,000
  • By March 1st, 2024, monthly image downloads for stable < 1,000

Delight customers with clear version consumption guidance:

  • Clear guidance for different user segments

Proposed Experience:

  • Enable only best practices: Our recommended best practice is that customers select a specific version and lock their deployments to that tag, and then choose when they want to upgrade.
  • Support new major.minor tags: While the safest best practice is to lock to a single immutable version tag, some customers may desire an experience that allows them to automatically receive bug fixes. We strive to strictly follow semantic versioning in our releases. This means that a major.minor.patch versioning scheme is followed, and new features and major changes are only released when we change the major or minor version. The patch version can be incremented for bug fixes. We also now have a process for Linux-only CVE re-builds where we append a 4th number, which is the re-build date. It is reasonable for customers to manually select a major.minor release series that they want to consume and then automatically receive patch and CVE fixes. We can release a new tag (a new alias, not a new image) that is just the major.minor version numbers and always points to the latest patch in the series.
    • For example, consider the following sequence of releases in time:
      • latest = 2.31.11 = 2.31
      • latest = 2.31.12 = 2.31
      • latest = 2.32.0 = 2.32; 2.31 = 2.31.12
      • latest = 2.32.1 = 2.32; 2.31 = 2.31.12
    • A customer can choose the 2.31 tag, which is mutable and will receive updates as long as we keep publishing 2.31 versions. However, when we make significant changes and release 2.32.0, the 2.31 tag is not updated, and the customer must make a manual decision on whether to upgrade.
  • Provide Version Consumption Guidance Targeted at specific user segments: We should not expect/require that every user reads all of our release notes and tracks our GitHub issues. We should provide version consumption guidance for large user segments that makes it easy for them to decide which latest version is stable for their needs. This may be done via GitHub issues or GitHub discussions.
  • Stable image tag is eventually deleted: Customers may continue to lock deployments to the stable tag, and old examples not owned by AWS may not be updated. We should not break anyone’s deployment without a considerable timeline, so stable deprecation will have phases. Initially, the stable image will continue to exist in our repo, but as an immutable tag that will no longer be upgraded. Our stable deprecation timeline will include a date when stable is then deleted. This timeline will be re-evaluated each month as we check the download rate of stable. Long term, we should not have a very old tag called stable in our repos, because that will lead to dangerous confusion.
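
The major.minor alias behavior in the release sequence above can be sketched roughly as follows. This is only an illustration of the proposed tag semantics, not the team's release tooling; the function names are hypothetical, and it assumes every version tag is purely numeric and dot-separated.

```python
def parse(tag):
    """Turn a dot-separated numeric tag such as "2.31.12" into a tuple of ints."""
    return tuple(int(part) for part in tag.split("."))


def resolve_aliases(released):
    """Return the concrete version behind each mutable alias.

    Each major.minor series points at its newest release, and "latest"
    points at the newest release overall.
    """
    aliases = {}
    for tag in released:
        version = parse(tag)
        series = ".".join(str(n) for n in version[:2])
        current = aliases.get(series)
        if current is None or parse(current) < version:
            aliases[series] = tag
    aliases["latest"] = max(released, key=parse)
    return aliases


# The release sequence from the example above.
released = ["2.31.11", "2.31.12", "2.32.0", "2.32.1"]
print(resolve_aliases(released))
# {'2.31': '2.31.12', '2.32': '2.32.1', 'latest': '2.32.1'}
```

Once 2.32.0 ships, the 2.31 alias stays on 2.31.12, so moving to 2.32 remains an explicit choice by the customer.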
@c-ameron

Hello,

Thanks for the write up on this.

I would actually like to keep stable. I've found it very useful in my use case, and even proposed that datadog implement it for their images.

In my use case, I'm using ECS Fargate, with this Fluent Bit image as a container for FireLens logging. We previously had an outage due to a memory leak issue when aws-for-fluent-bit 2.15 was released (#184). Initially, we marked the aws-for-fluent-bit container as non-essential. However, there is currently no way to restart non-essential containers in Fargate (aws/containers-roadmap#326). This means that, to prevent any issues again, we mark the Fluent Bit sidecar as essential. As a result, if the logging sidecar fails, the whole task restarts, not just the Fluent Bit container.

My fear is that if a new version is released with a bug, it could automatically bring down my Fargate tasks/applications because of an issue with the Fluent Bit container. For me, having the stable tag is really useful. I don't care about the latest features; I just want to make sure that, given the existing limitations/quirks on Fargate, my logging sidecar isn't going to crash my whole task due to an update to Fluent Bit.

I totally understand the other users' viewpoint on semver. Maybe a $major-stable tag would work? I see your points about stable being mutable, but in my case I don't really mind that; I just care that it's stable.

Thanks

@PettitWesley changed the title from "NOTICE: We are considering deprecation the "stable" image tag" to "NOTICE: We are considering deprecating the "stable" image tag" on Oct 5, 2023