
Docker: multi-version CUDA #270

Merged: 9 commits merged into OCR-D:master on Feb 4, 2022
Conversation

@bertsky commented Jul 23, 2021

Implements #263

@bertsky commented Jul 23, 2021

Too bad: This currently yields CUDA_ERROR_SYSTEM_DRIVER_MISMATCH in Tensorflow. Should have checked earlier (in core-cuda)...

@bertsky commented Jul 23, 2021

> Too bad: This currently yields CUDA_ERROR_SYSTEM_DRIVER_MISMATCH in Tensorflow. Should have checked earlier (in core-cuda)...

It seems that the choice of nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04 as base image now requires at least nvidia-driver-470 on the host system. I have 440 and 465 on the systems available to me, and neither of them can run the image. That means we are making a sacrifice here: to be able to support the newest Tensorflow/CUDA as well, we are forcing all host systems to get a newer driver. (It just might be that upgrading the driver is easier than upgrading CUDA. But it's still quite inconvenient.)
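For reference, a minimal way to check which kernel driver the host is actually running (assuming the Nvidia driver utilities, i.e. nvidia-smi, are installed at all):

nvidia-smi --query-gpu=driver_version --format=csv,noheader

The header of plain nvidia-smi additionally shows the highest CUDA version that driver supports.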

@bertsky commented Jul 27, 2021

If you have the Nvidia repository as a package source, you can just update to cuda-drivers-470, which will take care of all dependencies. (But a fresh installation might work, too.)
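For example, on an Ubuntu host that already has the Nvidia CUDA apt repository configured, a sketch of the upgrade would be (package name follows Nvidia's driver metapackages; reboot afterwards so the new kernel module gets loaded):

sudo apt-get update && sudo apt-get install cuda-drivers-470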

Anyway, this does work (based on a locally built ocrd/core-cuda from OCR-D/core#704).

for venv in /usr/local/sub-venv/headless-tf*; do . $venv/bin/activate && python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"; done

– yields True 3x
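Side note: tf.test.is_gpu_available() is deprecated in newer Tensorflow releases; on those, an equivalent sketch of the same check would be:

for venv in /usr/local/sub-venv/headless-tf*; do . $venv/bin/activate && python -c "import tensorflow as tf; print(bool(tf.config.list_physical_devices('GPU')))"; done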

@bertsky commented Jan 17, 2022

> Conflicting files: core

How are you supposed to keep PRs alive that involve subrepos, then? I guess I'll have to update OCR-D/core#704 each time core master changes, and then in turn update this PR.
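Concretely, re-pointing the core submodule here after each update would look roughly like this (a sketch; <updated-branch> stands for whatever branch OCR-D/core#704 lives on):

cd core && git fetch origin && git checkout <updated-branch> && cd .. && git add core && git commit -m "update core submodule"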

@bertsky commented Feb 4, 2022

So to sum up, we have two drawbacks here:

  • the base image size for the -cuda variants becomes even larger (for ocrd/core-cuda it's already 12 GB)
  • the host system needs a recent kernel driver to run the images (even for older CUDA)

But

  • considering what we gain here,
  • and how urgent this is (with detectron2 vs CUDA dependency bertsky/ocrd_detectron2#7 now even blocking our ocrd/all:maximum-cuda build),
  • and that this can probably go away as soon as we build thin images,
  • and that non-Docker and non-CUDA Docker setups are not even affected,
  • and that it also provides a solution for native installations (i.e. running make cuda-ubuntu or merely make cuda-ldconfig as fixup),
  • and that dragging along these PRs with other changes (esp. if you want to combine them with other branches) is a lot of effort,

I'd say let's merge!

@kba merged commit 2b41f68 into OCR-D:master on Feb 4, 2022