Build worker container with custom conda channel #214

Draft: wants to merge 209 commits into main

Conversation

@cisaacstern (Collaborator) commented Aug 28, 2024

This is an experiment to see if this approach will allow us to build smaller worker containers for deployment.

The basic concept is that:

  • Our worker prototype container, as built from the CI envs in this repo, is unmanageably large (in the range of 3GB)
  • @daniefdz has helped me analyze this and we have concluded that the source of almost all of the bloat in this container is Python dependencies. Indeed, I took a look at disk usage in a shell on the container and the /env/python/lib/ directory for our conda env is ~2GB!
  • Currently many of our dependencies are installed with pip, because they are not available on conda-compatible channels
  • Conda/Mamba/Micromamba environment resolvers can potentially reduce duplication of dependencies if the packages they install are hosted on conda-compatible channels (which pip-installed packages are not)
  • Long-term, the public conda-forge channel is a possibility, but that channel does not move fast enough for our pace of iteration (I have hosted packages there before), and is better suited, IMHO, to distributing packages to users than to hosting packages for our own deployment needs.
  • An alternative for us is to host our own "build channel", which can also just be a local directory. The artifacts in that channel must be built using some conda-compatible build system. The most performant/contemporary one I've identified is rattler-build.
  • To make this work, we need to:
    1. Use rattler-build to build conda-compatible packages for all of our dependencies which do not already offer them. This needs to be done bottom-up, in terms of dependency resolution, because lower-level packages are used for subsequent builds of higher-order ones. (E.g. the local build of ps-mem is used to build lithops, the local build of our lonboard fork is used to build ecoscope, etc.) A rough sketch of this follows the dependency tree below.
    2. Define an environment.yml that is similar to the one we are currently using, except that it pulls all of these dependencies from the local channel (rather than pip)
    3. This may allow us to share binary dependencies more efficiently between packages in our environment, thus reducing bloat. It will also provide us a sound basis upon which to customize builds (for example... we could build GDAL in this pipeline without NetCDF support, because we are not using that).

So far, the tree of custom packages I have come up with looks like this:

psmem        er_client   lonboard (fork)
  |                 \    /
lithops            ecoscope
    \               /
    ecoscope-workflows
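
To make step 1 above concrete, the build loop looks very roughly like the following. The recipe paths, package names, and flags here are illustrative (not the exact CI invocation); the important part is that the output directory doubles as a local channel and is passed back in as a channel, so each higher-level build can resolve the packages built before it:

    # illustrative sketch -- check `rattler-build build --help` for exact flags
    mkdir -p tmp/conda-channel
    CHANNEL="file://$(pwd)/tmp/conda-channel"

    # bottom-up, per the tree above
    for pkg in ps-mem er_client lonboard lithops ecoscope ecoscope-workflows; do
      rattler-build build \
        --recipe "recipes/${pkg}/recipe.yaml" \
        --output-dir tmp/conda-channel \
        -c "$CHANNEL" \
        -c conda-forge
    done

    # step 2 is then just an environment.yml whose channels list the local
    # channel (file://...) ahead of conda-forge, instead of a pip section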

@cisaacstern (Collaborator, Author)

The core approach here definitely works from a technical standpoint, as partially shown in CI here already, as well as further in my local testing. I just need to get to the bottom of this tree of packages (which is not that long) to get a sense of whether it actually has a container size benefit for us. Will report back soon!

@cisaacstern (Collaborator, Author)

Given the scale of this PR, I concede that it is not as well documented (in prose, that is) as it could be, but I hope the clarity of the implementation makes up for that shortcoming.

In terms of what the implementation is, here are the main points:

  • Testing is now bifurcated into two ✌️ separate streams of work.
  • The first stream is based on a standard pip installation of ecoscope-workflows and covers all tests not marked as pytest.mark.requires_ecoscope_core. This covers the fundamental decorator tests, workflow compilation and execution logic, as well as all tasks which do not depend on ecoscope core. These tests run very quickly (~30 seconds) and give the developer a fast indication of whether something fundamental is broken, as well as serving as a helpful reference point for the type of work that is possible with a pip-only installation. (Both streams are sketched after this list.)
  • The second stream (the Build tests) is more time-consuming but more comprehensive. In this stream, we build a conda package for ecoscope-workflows, which we then use to create a test (micromamba) environment that also includes ecoscope core, allowing us to run all the tests (including the end-to-end workflow tests).
  • If the test-docker label is applied to the PR, the conda package for ecoscope-workflows is also used to build a docker container, in which the end-to-end tests are also run, both locally (i.e., within the GitHub Actions runner) and remotely (i.e., by deploying to GCP Cloud Run).
  • The container artifacts used for the docker tests are published here: https://github.com/orgs/wildlife-dynamics/packages?repo_name=ecoscope-workflows (currently private, lmk if you do not have access). This is useful for caching (often they can be reused) and for debugging (if something fails in CI, you can pull the same image that was used and run/test it locally).
  • As for the conda package for ecoscope-workflows, that is also downloadable for each workflow run; it is the Artifact labeled rattler-artifacts at the bottom of the workflow run summary page, e.g.: https://github.com/wildlife-dynamics/ecoscope-workflows/actions/runs/10734707742
  • This rattler-artifacts Artifact is a directory containing a custom conda channel with our ecoscope-workflows package. You can download it and use it to install into a conda environment like so (where file://$(pwd)/tmp/conda-channel is the unzipped artifact):
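    # --only-deps installs the dependencies of ecoscope-workflows rather than the
    # package itself; both the unzipped local channel and conda-forge are searched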
    micromamba install ecoscope-workflows \
    -c file://$(pwd)/tmp/conda-channel \
    -c conda-forge \
    --only-deps \
    --yes
  • Eventually we can establish a release process whereby we publish these artifacts to conda-forge or possibly https://prefix.dev/channels, but for now the local directory works. Note that the directory contains recipes for many other packages in addition to ecoscope-workflows; those are our dependencies which do not otherwise offer a conda package release on a public registry (that I am aware of). And even if they did, it's possible we'd still want to manage (at least some of) them ourselves, so we can fine-tune the transitive dependencies they bring in (i.e. opting for slimmer dependency sets than the official release might otherwise offer).
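
For reference, the two test streams boil down to roughly the following (not the exact CI steps; the env name and package selections here are illustrative):

    # Stream 1: plain pip install, skip anything that needs ecoscope core (~30s)
    pip install -e .
    pytest -m "not requires_ecoscope_core"

    # Stream 2: create a micromamba env from the built channel that also includes
    # ecoscope core, then run the full suite (including end-to-end workflow tests)
    micromamba create -n ecoscope-workflows-test \
      -c file://$(pwd)/tmp/conda-channel \
      -c conda-forge \
      ecoscope-workflows ecoscope pytest --yes
    micromamba run -n ecoscope-workflows-test pytest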

Note that there is a bit of noise in this PR because I ended up needing to bump pre-commit/mypy versions, which led to some linting changes that are not directly related to the above.

Next steps:

  • Wire in the script runner. This PR just got too big and I wasn't able to include that here. This will require its own Dockerfile, which will share a base with the existing images in this PR. The build/test process for this additional container can be attached to the Build workflow here as an additional job / testing pathway.
  • Define a separate workflow for building and pushing container image releases. Notably, the containers built here are very close to production containers, but not exactly what we will want to deploy to production. For example, the version of ecoscope-workflows included in them is just from the head ref of the PR, and not from a release. So, come to think of it, this is probably an on: release workflow, which can push an official conda package release to a public registry, then use that conda release to build and tag an associated worker container. (🤔 As I am thinking of it, these will be workflow-specific, so I can think about how to approach that as part of the script-runner work.)
  • Benchmark import and runtime speed of the ecoscope-workflows install in the conda environment. I have the general impression that it's a little laggy, but I'm not positive that's the case. If so, look into whether/how that lag might be improved. (A quick first check is sketched below.)
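
On the last point, a quick first read on import lag (with the conda env activated; the import name ecoscope_workflows is assumed here) is CPython's built-in import profiler:

    python -X importtime -c "import ecoscope_workflows" 2> importtime.log
    sort -t'|' -k2 -n importtime.log | tail -n 20   # 20 slowest cumulative imports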

@cisaacstern marked this pull request as ready for review September 6, 2024 08:41
@cisaacstern (Collaborator, Author)

To any prospective reviewers here ... as I mentioned to @walljcg earlier today... I am exploring https://pixi.sh/latest/ as an alternative to micromamba and, at the risk of still not being done with this... it seems like an INCREDIBLY good alternative (faster, more reproducible, simpler). The fundamental structure here can remain somewhat similar, but we can probably reduce effort, build time, and complexity by using pixi.sh. I will follow up with updates.

@cisaacstern (Collaborator, Author)

Ok yes so clearly pixi.sh is the way. TL;DR we can eliminate the entire step of caching a base image, because the solve from scratch is as fast as (or perhaps 30 seconds faster than) the micromamba solve building on the base.
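
For anyone curious before that lands, the pixi-based flow looks very roughly like this (command names from pixi's docs, but the channel path and packages here are illustrative):

    pixi init ecoscope-workflows-env && cd ecoscope-workflows-env
    pixi project channel add "file:///abs/path/to/tmp/conda-channel"
    pixi add ecoscope-workflows pytest
    pixi run pytest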

Almost done with that over in #220, can finish Monday.

Reviewers can hold off until that is done, I will re-ping here 😄

Labels
test-docker Request docker builds for this PR.