
Docker: multi-version CUDA #270

Merged: 9 commits merged into OCR-D:master on Feb 4, 2022
Conversation

@bertsky commented Jul 23, 2021

Implements #263

@bertsky commented Jul 23, 2021

Too bad: This currently yields CUDA_ERROR_SYSTEM_DRIVER_MISMATCH in Tensorflow. Should have checked earlier (in core-cuda)...

@bertsky commented Jul 23, 2021

> Too bad: This currently yields CUDA_ERROR_SYSTEM_DRIVER_MISMATCH in Tensorflow. Should have checked earlier (in core-cuda)...

It seems that the choice of nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04 as base image now requires at least nvidia-driver-470 on the host system. I have 440 and 465 on the systems available to me, and neither of them can run the image. That means we are making a sacrifice here: to be able to support the newest Tensorflow/CUDA as well, we are forcing all host systems to get a newer driver. (It just might be that upgrading the driver is easier than upgrading CUDA. But it's still quite inconvenient.)
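For reference, a minimal way to check which kernel driver the host is actually running (assuming the Nvidia driver utilities, i.e. nvidia-smi, are installed at all):

nvidia-smi --query-gpu=driver_version --format=csv,noheader

The header of plain nvidia-smi additionally shows the highest CUDA version that driver supports.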

@bertsky commented Jul 27, 2021

If you have the Nvidia repository as a package source, you can just update to cuda-drivers-470, which will take care of all dependencies. (But a fresh installation might work, too.)
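For example, on an Ubuntu host that already has the Nvidia CUDA apt repository configured, a sketch of the upgrade would be (package name follows Nvidia's driver metapackages; reboot afterwards so the new kernel module gets loaded):

sudo apt-get update && sudo apt-get install cuda-drivers-470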

Anyway, this does work (based on a locally built ocrd/core-cuda from OCR-D/core#704).

for venv in /usr/local/sub-venv/headless-tf*; do . $venv/bin/activate && python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"; done

– yields True 3x
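Side note: tf.test.is_gpu_available() is deprecated in newer Tensorflow releases; on those, an equivalent sketch of the same check would be:

for venv in /usr/local/sub-venv/headless-tf*; do . $venv/bin/activate && python -c "import tensorflow as tf; print(bool(tf.config.list_physical_devices('GPU')))"; done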

@bertsky commented Jan 17, 2022

> Conflicting files: core

How are you supposed to keep PRs alive that involve subrepos, then? I guess I'll have to update OCR-D/core#704 each time core master changes, and then in turn update this PR.
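Concretely, re-pointing the core submodule here after each update would look roughly like this (a sketch; <updated-branch> stands for whatever branch OCR-D/core#704 lives on):

cd core && git fetch origin && git checkout <updated-branch> && cd .. && git add core && git commit -m "update core submodule"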

@bertsky commented Feb 4, 2022

So to sum up, we have two drawbacks here:

  • the base image size for the -cuda variants becomes even larger (for ocrd/core-cuda it's already 12 GB)
  • the host system needs a recent kernel driver to run the images (even for older CUDA)

But

  • considering what we gain here,
  • and how urgent this is (with detectron2 vs CUDA dependency bertsky/ocrd_detectron2#7 now even blocking our ocrd/all:maximum-cuda build),
  • and that this can probably go away as soon as we build thin images,
  • and that non-Docker and non-CUDA Docker setups are not even affected,
  • and that it also provides a solution for native installations (i.e. running make cuda-ubuntu or merely make cuda-ldconfig as fixup),
  • and that dragging along these PRs with other changes (esp. if you want to combine them with other branches) is a lot of effort,

I'd say let's merge!

@kba merged commit 2b41f68 into OCR-D:master on Feb 4, 2022