Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multi-arch docker images #399

Merged
merged 14 commits into from
Jun 28, 2024
Merged

Conversation

yuvipanda
Copy link
Member

@yuvipanda yuvipanda commented Oct 19, 2022

  • Install appropriate version of mambaforge based on build architecture
  • Generate lockfiles on base-notebook and pangeo-notebook for:
    • osx-64 (x86_64)
    • osx-arm64 (Apple Sillicon M1/M2/M3 chips)
    • linux-aarch64 (arm64)
    • ppc64le (POWER8/9)

Ref #396

@github-actions
Copy link
Contributor

Binder 👈 Try on Mybinder.org!
Binder 👈 Try on Pangeo GCP Binder!
Binder 👈 Try on Pangeo AWS Binder!

The version of mambaforge appropriate for the architecture that
the docker build is running on is built this way.

Ref pangeo-data#396
@yuvipanda
Copy link
Member Author

conda-lock also needs to know though.

@yuvipanda yuvipanda marked this pull request as draft October 19, 2022 03:24
@yuvipanda
Copy link
Member Author

/condalock

@yuvipanda yuvipanda changed the title Don't hardcode CPU architecture in dockerfile Add support for multi-arch docker images Oct 19, 2022
- name: Specify which architectures to build lockfiles for
run: |
# space delimited list of architectures to send to conda-lock
echo "BUILD_ARCHS='64 aarch64 ppc64le'" > ${GITHUB_ENV}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it'd be best to just start with one additional? aarch64 and add others down the line if need be

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottyhq yeah i agree I mostly added both to test what packages are needed to be fixed for ppc64le

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's build just for linux-aarch64 first as mentioned at #396 (comment)

conda-lock lock --mamba -k explicit -f environment.yml -p linux-64
../generate-packages-list.py conda-linux-64.lock > packages.txt
for ARCH in ${BUILD_ARCHS}; do
conda-lock lock --mamba -k explicit -f environment.yml -p linux-${ARCH}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be a good time to switch to a unified lockfile and actually add conda-lock to the base environment. (Our current usage of conda-lock pre-dates the unified lockfile format now shown here https://github.com/conda-incubator/conda-lock#basic-usage). Installing the environment with conda-lock would actually recognize the host platform and choose the correct set of packages:

conda-lock lock -f environment.yml -p linux-64 aarch64
conda-lock install -n pangeo-notebook

the drawback is that if someone just has conda or mamba installed they dont recognize the new unified lockfile format... so you could not do conda env create -f conda-lock.yml ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottyhq ok that's pretty cool and we clearly should just move to that.

Copy link
Member

@weiji14 weiji14 Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking that we might need to update from conda-lock=1.4 to conda-lock=2.x first at

Also FYI, mamba doesn't support installing from the unified conda-lock.yml, but micromamba does according to https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html#conda-lock-yaml-spec-files. See also mamba-org/mamba#1884 (comment)

Makefile Outdated
conda-lock lock --mamba -k explicit -f environment.yml -p linux-aarch64; \
../generate-packages-list.py conda-linux-aarch64.lock > packages-aarch64.txt; \
conda-lock lock --mamba -k explicit -f environment.yml -p linux-ppc64le; \
../generate-packages-list.py conda-linux-ppc64le.lock > packages-ppc64le.txt; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above on unified lockfile. but i still really like a simple text file as well though! maybe we create a artifacts subdirectory to organize files not actually used by docker build?

Makefile Outdated
conda-lock lock --mamba -k explicit -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-aarch64; \
../generate-packages-list.py conda-linux-aarch64.lock > packages-aarch64.txt; \
conda-lock lock --mamba -k explicit -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-ppc64le; \
../generate-packages-list.py conda-linux-ppc64le.lock > packages-ppc64le.txt; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I have rarely used this Makefile and am always just building things via GitHub Actions ;) I've put in a little more time to embracing docker buildx and think it might be the right tool to evolve this repo. in particular the bake tool for several related images https://github.com/scottyhq/pangeo-buildx#usage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottyhq I completely agree - I love that repo and think that's what this should be :D What do you think needs to happen to complete that transition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Mainly just more time :) I abandoned it for a bit with the "dont fix what isn't broken" rationale :) but it's also nice to keep evolving so I'll try to open a PR here this week and would love your review

@weiji14 weiji14 linked an issue Jun 27, 2023 that may be closed by this pull request
@scottyhq scottyhq marked this pull request as ready for review June 26, 2024 17:08
@scottyhq scottyhq requested a review from weiji14 June 26, 2024 17:08
@scottyhq
Copy link
Member

@yuvipanda @weiji14 I'm thinking we should go ahead and merge this. Likely there will be some things to work out, but linux and mac ARM support will be a nice addition

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuvipanda @weiji14 I'm thinking we should go ahead and merge this. Likely there will be some things to work out, but linux and mac ARM support will be a nice addition

Yep, agree with going ahead with this. Let me just test running conda-lock locally on some of these images to check that there are no serious issues. Also a few minor suggestions below 👇

base-image/Dockerfile Outdated Show resolved Hide resolved
pangeo-notebook/environment.yml Outdated Show resolved Hide resolved
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, tested things locally, and locking on base-notebook and pangeo-notebook worked, but ml-notebook and pytorch-notebook didn't. Maybe we just do base-/pangeo- for now in this PR, and I can do a follow-up one for ml-/pytorch-?

.github/workflows/CondaLock.yml Outdated Show resolved Hide resolved
Comment on lines 58 to 59
cd ml-notebook
conda-lock lock -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-64 -p linux-aarch64 -p osx-64 -p osx-arm64
Copy link
Member

@weiji14 weiji14 Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tensorflow doesn't have a linux-aarch64 package yet - conda-forge/tensorflow-feedstock#136

Could not lock the environment for platform linux-aarch64
Could not solve for environment specs
The following package could not be installed
└─ tensorflow >=2.15.0 cuda120* does not exist (perhaps a typo or a missing channel).

There are osx-64/osx-arm64 packages on conda-forge, though only CPU builds. Not sure if the osx-arm64 build actually uses the Apple Sillicon GPU.

Comment on lines 67 to 68
cd pytorch-notebook
conda-lock lock -f environment.yml -f ../pangeo-notebook/environment.yml -f ../base-notebook/environment.yml -p linux-64 -p linux-aarch64 -p osx-64 -p osx-arm64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pytorch does have linux-aarch64/osx-64/osx-arm64 packages on conda-forge, but we set a cuda120 pin in the environment.yml file so locking fails:

Could not lock the environment for platform linux-aarch64
Could not solve for environment specs
The following packages are incompatible
├─ pytorch >=2.1.0 cuda120* is not installable because it requires
│  └─ __cuda, which is missing on the system;
└─ torchvision >=0.16.1 cuda120* does not exist (perhaps a typo or a missing channel).

scottyhq and others added 3 commits June 26, 2024 17:12
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Comment on lines +45 to +46
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will need to use virtualization, until someone can pay us to use native arm64 runners (https://github.blog/changelog/2024-06-24-github-actions-ubuntu-24-04-image-now-available-for-arm64-runners) 🙂

@scottyhq
Copy link
Member

Added some commits to update the Makefile and Actions to actually build for multiple platforms, and it seems to be working!

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving since this is mostly ready, but have a look at the suggestions first. We'll probably need to disable osx-64 for ml-notebook and pytorch-notebook because of the (Linux-only) CUDA pin set in the environment.yml file.

Comment on lines +45 to +46
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will need to use virtualization, until someone can pay us to use native arm64 runners (https://github.blog/changelog/2024-06-24-github-actions-ubuntu-24-04-image-now-available-for-arm64-runners) 🙂

.github/workflows/CondaLock.yml Outdated Show resolved Hide resolved
.github/workflows/CondaLock.yml Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
base-image/Dockerfile Outdated Show resolved Hide resolved
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
@scottyhq scottyhq merged commit fc92d8b into pangeo-data:master Jun 28, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Images availability for OS/ARCH other than linux/amd64
4 participants