Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

images: some refactoring and version bumps for arm64 compatible Dockerfiles #423

Merged
merged 1 commit into from
Sep 14, 2021

Conversation

consideRatio
Copy link
Collaborator

@consideRatio consideRatio commented Aug 21, 2021

This is a step along the way for #421 about compatibility on arm64 machines like Raspberry Pi.

@consideRatio consideRatio changed the title images: some refactoring and arm64 support images: some refactoring and version bumps for arm64 compatible Dockerfiles Aug 21, 2021
@@ -1,18 +1,19 @@
FROM python:3.8.6-slim-buster as dependencies
FROM python:3.8.11-slim-buster as dependencies
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General maintenance keeping the base Python version updated

LABEL MAINTAINER="Jim Crist-Harif"

RUN apt-get update \
&& apt-get install -y tini \
&& apt-get clean \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get clean is automatically run in this and most images via a config (/etc/apt/apt.conf.d/docker-clean).

Comment on lines -10 to +14
aiohttp==3.7.2 \
aiohttp==3.7.4 \
colorlog \
cryptography \
traitlets==4.3.3 \
traitlets==5.0.5 \
pyyaml \
kubernetes-asyncio==11.0.0
kubernetes-asyncio==12.1.1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Systematic version updates as I had to update aiohttp in order to have a arm64 (aka. aarch64) compatible conda package in the dask-gateway image.

@@ -2,27 +2,34 @@
FROM debian:buster-slim as miniconda
LABEL MAINTAINER="Jim Crist-Harif"

ARG CONDA_VERSION=py38_4.8.3
ARG CONDA_SHA256=879457af6a0bf5b34b48c12de31d4df0ee2f06a8e68768e5758c3293b2daf688
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of checksums as amd64 / arm64 will have different checksums and it is a bit tedious to keep track of both.

Comment on lines 16 to +23
# - Create user dask
RUN useradd -m -U -u 1000 dask

# - Install tini
# - Install miniconda build dependencies
# - Download miniconda and check the sha256 checksum
# - Install miniconda
RUN apt-get update \
&& apt-get install -y tini wget bzip2 \
&& rm -rf /var/lib/apt/lists/*
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split this apart from a section below making so many different things in the same RUN layer that it became a bit hard to read.

@@ -32,39 +39,40 @@ RUN useradd -m -U -u 1000 dask \
&& echo "aggressive_update_packages: []" >> /home/dask/.condarc \
&& find /opt/conda/ -follow -type f -name '*.a' -delete \
&& /opt/conda/bin/conda clean -afy \
&& chown -R dask:dask /opt/conda \
&& apt-get autoremove --purge -y wget bzip2 \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By not removing wget / bzip2 the image will not grow further because adding + removing won't help if its not done in the same layer, and since it is no longer in the same layer I don't remove these binaries later.

The reason for having one more RUN statements was for readability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a rough idea of how much larger the image is now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rough idea is that it is minimal (<1MB) and not worth worrying about. I did not confirm this thoroughly in an e2e build comparison though.

$ apt-cache --no-all-versions show wget | grep '^Size: '
Size: 348824

$ apt-cache --no-all-versions show bzip2 | grep '^Size: '
Size: 34064

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so it seems like 992 kB + 195 kB.

$ apt-cache --no-all-versions show wget | grep Installed-Size
Installed-Size: 992

$ apt-cache --no-all-versions show bzip2 | grep Installed-Size
Installed-Size: 195

Comment on lines -49 to +56
ARG DASK_VERSION=2021.7.1
ARG DISTRIBUTED_VERSION=2021.7.1
ARG DASK_VERSION=2021.8.1
ARG DISTRIBUTED_VERSION=2021.8.1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just bumped these while I was at it so all the conda/pip packages installed were updated across now that I had to update at least aiohttp.

Comment on lines +74 to +75


Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability as separation between the build steps

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me!

We recently added Arm support to the Dask docker images in dask/dask-docker#166.

I wonder if there are any learnings we can take from that PR, or anything we can reuse to ensure a bit more consistency between projects.

@consideRatio
Copy link
Collaborator Author

consideRatio commented Aug 23, 2021

We recently added Arm support to the Dask docker images in dask/dask-docker#166.

I wonder if there are any learnings we can take from that PR, or anything we can reuse to ensure a bit more consistency between projects.

Ah one key thing to keep consistent is a transition to mamba and mambaforge to install it. I'd be happy to help with that as well but it felt like out of scope for this PR to do that also.

Oh @holdenk and I worked on these kinds of things in jupyter/docker-stacks also btw! :) 👋 Happy to see you contribute here as well @holdenk!!! Want to help by contributing a review of this PR also?

@consideRatio
Copy link
Collaborator Author

I can address the conda to mamba transition as well, but I think it should be a dedicated PR. I opened #432 to represent this.

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, one question about image sizes inline.

I'd love it if we could come up with a policy about what library versions to use in the Dockerfiles, to avoid having repeated conversations about versions. Is there any reason not to use the latest version of all libraries (dask, distributed, aiohttp, etc) at the time of release?

One maybe special consideration is Python itself. Right now we're on 3.8.6 (bumping to 3.8.11). Is there any reason to keep that on 3.8? Or should we move to 3.9?

@@ -32,39 +39,40 @@ RUN useradd -m -U -u 1000 dask \
&& echo "aggressive_update_packages: []" >> /home/dask/.condarc \
&& find /opt/conda/ -follow -type f -name '*.a' -delete \
&& /opt/conda/bin/conda clean -afy \
&& chown -R dask:dask /opt/conda \
&& apt-get autoremove --purge -y wget bzip2 \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a rough idea of how much larger the image is now?

@consideRatio
Copy link
Collaborator Author

consideRatio commented Sep 1, 2021

I'd love it if we could come up with a policy about what library versions to use in the Dockerfiles, to avoid having repeated conversations about versions. Is there any reason not to use the latest version of all libraries (dask, distributed, aiohttp, etc) at the time of release?

UPDATE: @TomAugspurger I created #437 regarding this question. Let's move this discussion to there.

I think a good policy is to use a requirements.in file that is automatically refrozen to requirements.txt by dependabot, and then we install such requirements.txt in the dockerfile. That helps users get info about what version has been used in each release via git history, we avoid a maintenance burden, and we get security warnings via automation etc.

We have a mix of conda/pip in these files though, making my idea a bit troublesome. I'd consider going 100% pip in these images or 100% conda and hope there is a good tool like there is for pip that dependabot can use for an environment.yml file

@consideRatio
Copy link
Collaborator Author

Go for merge on this?

Copy link
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @consideRatio!

@jcrist jcrist merged commit 8dd5b70 into dask:main Sep 14, 2021
@consideRatio
Copy link
Collaborator Author

Thanks @jacobtomlinson @TomAugspurger @jcrist for your review efforts!

@consideRatio consideRatio deleted the pr/support-image-build-on-arm64 branch March 25, 2022 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants