Build worker container with custom conda channel #214

Draft: wants to merge 209 commits into main

Conversation

@cisaacstern (Collaborator) commented Aug 28, 2024

This is an experiment to see if this approach will allow us to build smaller worker containers for deployment.

The basic concept is that:

  • Our worker prototype container, as built from the CI envs in this repo, is unmanageably large (in the range of 3GB)
  • @daniefdz has helped me analyze this and we have concluded that the source of almost all of the bloat in this container is Python dependencies. Indeed, I took a look at disk usage in a shell on the container and the /env/python/lib/ directory for our conda env is ~2GB!
  • Currently many of our dependencies are installed with pip, because they are not available on conda-compatible channels
  • Conda/Mamba/Micromamba environment resolvers can potentially reduce duplication of dependencies if the packages they install are hosted on conda-compatible channels (which pip-installed packages are not)
  • Long-term, the public conda-forge channel is a possibility, but that channel does not move fast enough for our pace of iteration (I have hosted packages there before), and is better suited, IMHO, to distributing packages to users than to hosting packages for our own deployment needs.
  • An alternative for us is to host our own "build channel", which can also just be a local directory. The artifacts in that channel must be built using some conda-compatible build system. The most performant/contemporary one I've identified is rattler-build.
  • To make this work, we need to:
    1. Use rattler-build to build conda-compatible packages for all of our dependencies which do not already offer them. This needs to be done bottom-up, in terms of dependency resolution, because lower-level packages are used for subsequent builds of higher-order ones. (E.g. the local build of ps-mem is used to build lithops, the local build of our lonboard fork is used to build ecoscope, etc.) A rough sketch of this follows the dependency tree below.
    2. Define an environment.yml that is similar to the one we are currently using, except that it pulls all of these dependencies from the local channel (rather than pip)
    3. This may allow us to share binary dependencies more efficiently between packages in our environment, thus reducing bloat. It will also provide us a sound basis upon which to customize builds (for example... we could build GDAL in this pipeline without NetCDF support, because we are not using that).

So far, the tree of custom packages I have come up with looks like this:

psmem        er_client   lonboard (fork)
  |                 \    /
lithops            ecoscope
    \               /
    ecoscope-workflows
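
To make step 1 above concrete, the build loop looks very roughly like the following. The recipe paths, package names, and flags here are illustrative (not the exact CI invocation); the important part is that the output directory doubles as a local channel and is passed back in as a channel, so each higher-level build can resolve the packages built before it:

    # illustrative sketch -- check `rattler-build build --help` for exact flags
    mkdir -p tmp/conda-channel
    CHANNEL="file://$(pwd)/tmp/conda-channel"

    # bottom-up, per the tree above
    for pkg in ps-mem er_client lonboard lithops ecoscope ecoscope-workflows; do
      rattler-build build \
        --recipe "recipes/${pkg}/recipe.yaml" \
        --output-dir tmp/conda-channel \
        -c "$CHANNEL" \
        -c conda-forge
    done

    # step 2 is then just an environment.yml whose channels list the local
    # channel (file://...) ahead of conda-forge, instead of a pip section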

@cisaacstern (Collaborator, Author)

The core approach here definitely works from a technical standpoint, as partially shown in CI here already, as well as further in my local testing. I just need to get to the bottom of this tree of packages (which is not that long) to get a sense of whether it actually has a container size benefit for us. Will report back soon!

@cisaacstern (Collaborator, Author)

Given the scale of this PR, I concede that it is not as well documented (in prose, that is) as it could be, but I hope the clarity of the implementation makes up for that shortcoming.

In terms of what the implementation is, here are the main points:

  • Testing is now bifurcated into two ✌️ separate streams of work.
  • The first stream is based on a standard pip installation of ecoscope-workflows and covers all tests not marked as pytest.mark.requires_ecoscope_core. This covers the fundamental decorator tests, workflow compilation and execution logic, as well as all tasks which do not depend on ecoscope core. These tests run very quickly (~30 seconds) and give the developer a fast indication of whether something fundamental is broken, as well as serving as a helpful reference point for the type of work that is possible with a pip-only installation. (Both streams are sketched after this list.)
  • The second stream (the Build tests) is more time-consuming but more comprehensive. In this stream, we build a conda package for ecoscope-workflows, which we then use to create a test (micromamba) environment that also includes ecoscope core, allowing us to run all the tests (including the end-to-end workflow tests).
  • If the test-docker label is applied to the PR, the conda package for ecoscope-workflows is also used to build a docker container, in which the end-to-end tests are also run, both locally (i.e., within the GitHub Actions runner) and remotely (i.e., by deploying to GCP Cloud Run).
  • The container artifacts used for the docker tests are published here: https://github.com/orgs/wildlife-dynamics/packages?repo_name=ecoscope-workflows (currently private, lmk if you do not have access). This is useful for caching (often they can be reused) and for debugging (if something fails in CI, you can pull the same image that was used and run/test it locally).
  • As for the conda package for ecoscope-workflows, that is also downloadable for each workflow run; it is the Artifact labeled rattler-artifacts at the bottom of the workflow run summary page, e.g.: https://github.com/wildlife-dynamics/ecoscope-workflows/actions/runs/10734707742
  • This rattler-artifacts Artifact is a directory containing a custom conda channel with our ecoscope-workflows package. You can download it and use it to install into a conda environment like so (where file://$(pwd)/tmp/conda-channel is the unzipped artifact):
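    # --only-deps installs the dependencies of ecoscope-workflows rather than the
    # package itself; both the unzipped local channel and conda-forge are searched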
    micromamba install ecoscope-workflows \
    -c file://$(pwd)/tmp/conda-channel \
    -c conda-forge \
    --only-deps \
    --yes
  • Eventually we can establish a release process whereby we publish these artifacts to conda-forge or possibly https://prefix.dev/channels, but for now the local directory works. Note that the directory contains recipes for many other packages in addition to ecoscope-workflows; those are our dependencies which do not otherwise offer a conda package release on a public registry (that I am aware of). And even if they did, it's possible we'd still want to manage (at least some of) them ourselves, so we can fine-tune the transitive dependencies they bring in (i.e. opting for slimmer dependency sets than the official release might otherwise offer).
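
For reference, the two test streams boil down to roughly the following (not the exact CI steps; the env name and package selections here are illustrative):

    # Stream 1: plain pip install, skip anything that needs ecoscope core (~30s)
    pip install -e .
    pytest -m "not requires_ecoscope_core"

    # Stream 2: create a micromamba env from the built channel that also includes
    # ecoscope core, then run the full suite (including end-to-end workflow tests)
    micromamba create -n ecoscope-workflows-test \
      -c file://$(pwd)/tmp/conda-channel \
      -c conda-forge \
      ecoscope-workflows ecoscope pytest --yes
    micromamba run -n ecoscope-workflows-test pytest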

Note that there is a bit of noise in this PR because I ended up needing to bump pre-commit/mypy versions, which led to some linting changes that are not directly related to the above.

Next steps:

  • Wire in the script runner. This PR just got too big and I wasn't able to include that here. This will require its own Dockerfile, which will share a base with the existing images in this PR. The build/test process for this additional container can be attached to the Build workflow here as an additional job / testing pathway.
  • Define a separate workflow for building and pushing container image releases. Notably, the containers built here are very close to production containers, but not exactly what we will want to deploy to production. For example, the version of ecoscope-workflows included in them is just from the head ref of the PR, and not from a release. So, come to think of it, this is probably an on: release workflow, which can push an official conda package release to a public registry, then use that conda release to build and tag an associated worker container. (🤔 As I am thinking of it, these will be workflow-specific, so I can think about how to approach that as part of the script-runner work.)
  • Benchmark import and runtime speed of the ecoscope-workflows install in the conda environment. I have the general impression that it's a little laggy, but I'm not positive that's the case. If so, look into whether/how that lag might be improved. (A quick first check is sketched below.)
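
On the last point, a quick first read on import lag (with the conda env activated; the import name ecoscope_workflows is assumed here) is CPython's built-in import profiler:

    python -X importtime -c "import ecoscope_workflows" 2> importtime.log
    sort -t'|' -k2 -n importtime.log | tail -n 20   # 20 slowest cumulative imports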

@cisaacstern marked this pull request as ready for review September 6, 2024 08:41
@cisaacstern (Collaborator, Author)

To any prospective reviewers here ... as I mentioned to @walljcg earlier today... I am exploring https://pixi.sh/latest/ as an alternative to micromamba and, at the risk of still not being done with this... it seems like an INCREDIBLY good alternative (faster, more reproducible, simpler). The fundamental structure here can remain somewhat similar, but we can probably reduce effort, build time, and complexity by using pixi.sh. I will follow up with updates.

@cisaacstern (Collaborator, Author)

Ok yes so clearly pixi.sh is the way. TL;DR we can eliminate the entire step of caching a base image, because the solve from scratch is as fast as (or perhaps 30 seconds faster than) the micromamba solve building on the base.
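
For anyone curious before that lands, the pixi-based flow looks very roughly like this (command names from pixi's docs, but the channel path and packages here are illustrative):

    pixi init ecoscope-workflows-env && cd ecoscope-workflows-env
    pixi project channel add "file:///abs/path/to/tmp/conda-channel"
    pixi add ecoscope-workflows pytest
    pixi run pytest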

Almost done with that over in #220, can finish Monday.

Reviewers can hold off until that is done, I will re-ping here 😄

Labels
test-docker Request docker builds for this PR.