-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move dev server image build into main CI workflow #141
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, had a couple more thoughts after chewing on this for a bit. I'm all for easier iteration with the model
repo, but I'm wondering if we can keep some of the protections we have in place and only work around them when we have to. For instance, with this change CI would now build server-dev
and server-sandbox
on every push to a branch, right? Usually we only need to use the image when running the FTs after merging to master, so we wouldn't need all the intermediate builds. This could waste some compute (and docker storage? are we charged for that?). Also, CI does these new builds before running the unit tests, so every push to branch would take longer to get basic correctness feedback even when not trying to quickly deploy to model...and we could potentially build buggy images.
Is there a way to keep our builds fast and cheap in the usual case, but still provide a convenient break-glass solution when a dev needs to quickly deploy to the model
repo? Github supports automatically running a given workflow if a branch matches some regex, and elsewhere in industry I've seen that used to enable specific DevX workflows like this. Maybe we could have a build-server-breakglass.yaml
workflow or something that's triggered if the branch name contains breakglass
or buildme
or something? We could also keep the new builds in build_test_push.yaml
with a branch name condition or something, but then we could potentially try to build and publish the same image twice. In a separate workflow, we could just use different suffixes entirely so we avoid conflicts (e.g. breakglass-server-dev
instead of server-dev
).
Anyways, lemme know what you think. Not trying to unnecessarily complicate things here, just want to discuss some of the potential downsides to the standard case where we're not trying to quickly deploy.
@bryanwilliams1025 great points, thanks for taking another look at this! I messed around with a different approach over the last week and think I found something that checks all of the boxes 🤞 You're correct that previous approach in this PR would be wasting a lot of disk space by forcing server + sandbox image pushes on every commit, vs. just when the func test ran. One minor clarification on compute time though - only the first step (building the "base" image) takes a long time. The other build phases are essentially just adding a docker This got me thinking about how I could accomplish what I wanted (getting server images I can test with built as quickly as possible in the main CI flow) while also not pushing any more images to GCR than we do today. Here's what I came up with:
This actually saves us How I tested:
How does this sound to you? |
uses: docker/build-push-action@v2 | ||
with: | ||
context: . | ||
push: true | ||
push: false | ||
load: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
load: true
is what tells docker to push/pull from the local docker repo. vs remote
@@ -54,6 +54,8 @@ jobs: | |||
- name: Set up Docker Buildx | |||
id: buildx | |||
uses: docker/setup-buildx-action@master | |||
with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was needed due to a weird detail of how load: true
works. Some context here, it failed without this and then worked when I added it, so I didn't dig too much deeper 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beautiful, i like the new approach! strikes a great balance between unblocking deploying to model
and keeping storage and build times minimal. appreciate you iterating here!
uses: docker/build-push-action@v2 | ||
with: | ||
context: . | ||
push: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we specify push: true
here and get rid of the following Push server-dev image
step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm great question, I vaguely recall hitting some issue with this but I can't find the relevant stack overflow post now lol, so let's try it 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, at least the error is explicit 😀
Error: buildx failed with: ERROR: push and load may not be set together at the moment
https://replicahq.atlassian.net/browse/RAD-6212
Currently, our CI setup doesn’t build a docker image that can be used to spin up routing servers in the model repo - this step happens only when you kick off a functional test CI flow manually. In theory, this separation is intended, as we technically don’t want to “stamp” a runnable server image with dev before we’ve even run unit tests.
However, in practice, I basically never sit and wait for all unit tests to pass when I’m iterating quickly and need a runnable image to use ASAP. Instead, I wait until the “unit test” image finished building (the first part of the docker build process), then right away kick off a functional test build to get a working server image I can plug into the model repo.
This change moves that server dev image building step to the main CI flow. The change technically goes against the intended pattern of our existing CI/functional test flow, but in practice would save me loads of time that’s currently wasted waiting for CI to pass a particular step and then manually kicking off another flow