Colab demo? / Headless server version? #6

Closed · INF800 opened this issue Jan 14, 2022 · 18 comments

INF800 commented Jan 14, 2022

If there isn't a Colab demo already, I will send a PR. Please let me know if there are any OOM issues or other technical issues that I may face.

Tom94 (Collaborator) commented Jan 14, 2022

No, we haven't looked into Colab at all, actually. It would be great to have, thank you!

The CPU memory usage is relatively tame. GPU memory usage is on the order of the size of the dataset (plus a few GB for temporary training, inference, and render buffers).

Unfortunately, the codebase is not particularly optimized for being memory-economical. We've been spoiled by the 3090's 24GB.

Example GPU RAM usages:

  • NeRF (bundled fox dataset): 7.56 GB
  • SDF (bundled armadillo mesh): 1.73 GB

You can read out the GPU RAM usage at the top of the UI:
[screenshot of the testbed UI showing GPU RAM usage]

pwais (Contributor) commented Jan 14, 2022

With Colab, you might need to get lucky and get a V100 to get anywhere (might be Colab Pro only?)... the P100s and K80s don't have Tensor cores, and somebody else found that you can't seem to build tiny-cuda-nn with Pascal or Maxwell: NVlabs/tiny-cuda-nn#10

Tensor cores were introduced with Volta? So you'd need a V100, Titan V, or RTX 20xx or better to try this project.
Edit: sounds like tiny-cuda-nn might require Turing tensor cores, so no V100 support :( #13

What would be really cool is if tiny-cuda-nn and/or this project could provide fused ops / a network that does not require tensor cores and works on the older GPU architectures -- it would be slower, but probably still faster than alternatives (PyTorch / TensorFlow etc.). TensorRT has fused ops for the older architectures, and these might provide easy drop-ins (at least, likely for inference).

myagues (Contributor) commented Jan 17, 2022

It should be possible to run on Colab now that lower compute capabilities are allowed, but I'm stuck at compilation with the following error:

[ 98%] Linking CXX executable testbed
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libGL.so: undefined reference to `_glapi_tls_Current'
collect2: error: ld returned 1 exit status
CMakeFiles/testbed.dir/build.make:115: recipe for target 'testbed' failed
make[2]: *** [testbed] Error 1
CMakeFiles/Makefile2:199: recipe for target 'CMakeFiles/testbed.dir/all' failed
make[1]: *** [CMakeFiles/testbed.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[100%] Linking CXX shared library pyngp.cpython-37m-x86_64-linux-gnu.so
[100%] Built target pyngp
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2

Here is a link for reproducing it.

Tom94 (Collaborator) commented Jan 17, 2022

Progress! Thanks for reporting!

  • Does python3 scripts/run.py --scene data/nerf/fox run?
  • There's currently a half-working CMake option NGP_BUILD_WITH_GUI=OFF that's supposed to build the project without GUI (and thus without linking to OpenGL and GLFW). I'm saying "half-working", because I haven't removed all references to GL-related symbols yet -- in light of your problem I'll prioritize this tomorrow and report back.

Edit: you can now run cmake -DNGP_BUILD_WITH_GUI=OFF <remaining params> to build instant-ngp without linking GLFW, ImGUI, and OpenGL for headless operation. Hopefully this will work around the linker error you encountered.

pwais (Contributor) commented Jan 17, 2022

FWIW, at least EGL works in Colab; see e.g. the pyrender demo notebook: https://colab.research.google.com/drive/1pcndwqeY8vker3bLKQNJKr3B-7-SYenE?usp=sharing

There's no X11, though. It would be pretty nice to have imgui over websocket for Colab / Jupyter (e.g. via https://github.com/ggerganov/imgui-ws -- see the in-browser demos), but I don't see that anybody has tried that yet.

Tom94 changed the title from "Colab demo?" to "Colab demo? / Headless server version?" on Jan 18, 2022

Tom94 (Collaborator) commented Jan 18, 2022

If someone with access to a K80 machine could check whether it runs now, that'd be appreciated. :)

pwais (Contributor) commented Jan 18, 2022

🔥 🔥 🔥 Thanks @Tom94 !! 🔥 🔥 🔥

I had to remove transforms_val.json from the lego scene to avoid an OOM during training, and I made some small mods for test time. I think the testbed just loads all images / rays into GPU memory (even temporary memory) for all of the dataset's transforms_*.json meta files, whether or not they get used, so skipping those might help avoid some OOMs.

Overall: the K80 is about 60x slower than a 30-series GPU, but also about 60x cheaper at the time of writing (YMMV, but check eBay). I've seen 100x slowdowns for PyTorch stuff, so 60x is pretty good.

(What about the K40? Note that the K40 seems to be compute capability 3.5, while the K80 is compute capability 3.7. A K80 is basically two K40s on the same card. At the time of writing, an AWS p2.xlarge with a single K80 (two separate devices, 11 GB memory each) is ~$0.90/hr, or $0.25/hr at spot price. In the Google Colab free version or on Kaggle, you're likely to get a K80 or slightly better.)

Other than that one training change, here's what I see for NeRF lego training out of the box on a K80:

python3 scripts/run.py --scene=data/nerf_synthetic/lego/ --mode=nerf --screenshot_transforms=data/nerf_synthetic/lego/transforms_test.json --screenshot_w=800 --screenshot_h=800 --screenshot_dir=data/nerf_synthetic/lego/screenshots --save_snapshot=data/nerf_synthetic/lego/snapshot.msgpack --n_steps=1000
21:38:39 INFO     Loading NeRF dataset from
21:38:39 INFO       data/nerf_synthetic/lego/transforms_test.json
21:38:39 INFO       data/nerf_synthetic/lego/transforms_train.json
21:38:39 SUCCESS  Loaded 300 images of size 800x800 after 0s
21:38:39 INFO       cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
21:38:39 INFO     Loading network config from: /opt/instant-ngp/configs/nerf/base.json
21:38:39 INFO     GridEncoding:  Nmin=16 b=1.38191 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 37. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 37. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
21:38:39 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
21:38:39 INFO     Color model:   3--[SphericalHarmonics]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
21:38:39 INFO       total_encoding_params=12196240 total_network_params=9728
Screenshot transforms from  data/nerf_synthetic/lego/transforms_test.json
Training:  34%|███████████████████████████████▉                                                               | 336/1000 [01:44<03:45,  2.94step/s, loss=0.00474]

Final train time (as reported) was 05:54 with loss=0.00275.

nvidia-smi during training:

|   1  Tesla K80           Off  | 00000000:04:00.0 Off |                    0 |
| N/A   45C    P0    92W / 149W |  11271MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |

So the K80 is about 60x slower than a 30-series GPU (6 seconds -> 360 seconds). In my experience, PyTorch stuff (high I/O) shows a 50x-100x lag, so this is pretty nice! Clearly the implementation helps a ton.

Once the model finishes training, I do get an OOM when rendering tries to start:
RuntimeError: Could not allocate memory: CUDA Error: cudaMalloc(&rawptr, n_bytes+DEBUG_GUARD_SIZE*2) failed with error out of memory

For rendering, I did this:

I see moderate GPU memory usage:

|   1  Tesla K80           Off  | 00000000:04:00.0 Off |                    0 |
| N/A   43C    P0    93W / 149W |   3319MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |

Rendering is about 11 s per frame: 5/200 [00:55<36:09, 11.12s/it]

Most importantly, the render looks good: after 1000 iterations it's no different from other GPUs.
[rendered test image]

Tom94 (Collaborator) commented Jan 19, 2022

Awesome, thank you so much for testing!

You don't actually need to delete transforms_val.json et al. You can directly pass the path of the training transforms to testbed -- then it will train from just that one .json file rather than all it finds in the folder.

In the above, I believe you ended up also training on the testing transforms, so there's more memory to be saved by not loading their respective images.

pwais (Contributor) commented Jan 19, 2022

@Tom94 oh, my bad! I haven't been able to use the GUI yet, so I didn't know --scene would concatenate everything (perhaps that doesn't happen in the GUI?). The README just provides the lego dir. So the command should be:

python3 scripts/run.py --scene=data/nerf_synthetic/lego/transforms_train.json ...

That does save memory, and, erm, results in more correct training too :) 🎉

myagues (Contributor) commented Jan 19, 2022

Can confirm it works in Colab (link) (with a T4); the only downside is that it takes some 5-10 minutes of compile time, given that Colab allocates only 2 CPUs.

Maybe an approach could be copying the compiled folder to the user's GDrive, so it can be reused in later runs and avoid recompilation, hoping you get the same GPU in the Colab lottery; see the sketch below.
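
For instance, a minimal sketch of that caching idea (the cache path and build commands are assumptions; it also assumes the compiled build is only reusable across sessions that hand out the same GPU type):

from google.colab import drive
import os, shutil, subprocess

drive.mount('/content/drive')
repo = '/content/instant-ngp'
cache = '/content/drive/MyDrive/instant-ngp-build-T4'  # hypothetical: one cache per GPU type

if os.path.isdir(cache):
    shutil.copytree(cache, os.path.join(repo, 'build'))  # reuse a previous compile
else:
    subprocess.run(['cmake', '.', '-B', 'build'], cwd=repo, check=True)
    subprocess.run(['cmake', '--build', 'build', '--config', 'RelWithDebInfo', '-j', '2'],
                   cwd=repo, check=True)
    shutil.copytree(os.path.join(repo, 'build'), cache)  # save for the next session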

pwais (Contributor) commented Jan 19, 2022

The repo builds & works in Docker with nvidia/cuda base image(s) (#20), so it's likely that building a binary in a CUDA 11.2 base image (seems that's what Colab was using there) could work in Colab.

5-10 minutes isn't that bad, though; there are many Colab notebooks, like Nerfies (https://colab.research.google.com/github/google/nerfies/blob/main/notebooks/Nerfies_Capture_Processing.ipynb), that can take 30 minutes or more to set up or run.

Hugging Face Spaces wouldn't offer the notebook environment, but since this project has its own nice GUI, it might be a better match: https://huggingface.co/spaces/launch

tlightsky commented:
(Quoting myagues's compilation report above, ending in: libGL.so: undefined reference to `_glapi_tls_Current'.)

I met the exact same issue in Colab.

Tom94 (Collaborator) commented Jan 30, 2022

Hi there, you can avoid this error by compiling testbed without GUI support:

cmake -DNGP_BUILD_WITH_GUI=OFF <remaining params>

This way, it won't try to link to OpenGL, which you presumably don't need when running in Colab. (You can still render out images as numpy arrays.)
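
As a hedged sketch of what headless use then looks like from Python (this assumes the default build layout, where the compiled pyngp module lands in build/):

import sys
sys.path.append('build')  # directory containing pyngp.cpython-*.so after compiling
import pyngp as ngp       # imports without OpenGL when built with NGP_BUILD_WITH_GUI=OFF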

loboere commented Feb 1, 2022

How can I see the rendering in Colab?

pwais (Contributor) commented Feb 1, 2022

How can I see the rendering in Colab?

It's gonna be really hard to do that :( There might be a path through websockets (e.g. https://github.com/ggerganov/imgui-ws) or perhaps some way of standing up an X server / VNC on Colab. The GUI is pretty killer, though, so it could be worth the hassle.

Tom94 (Collaborator) commented Feb 2, 2022

If rendering only a single image (or a handful) is desired, you can call testbed.render(width, height, spp=8, linear=False) to get a numpy array that you can imshow or similar.

spp refers to "samples per pixel" and it'll mostly wind up doing anti-aliasing for you, aside from getting rid of a little bit of raymarching noise. If performance is a concern, you can set it to 1 for fastest rendering.

Note that the returned colors will be sRGB if linear == False, which is likely what you want if you'd like to directly display the image or save it as png/jpg. Use linear colors only if you want to tonemap the image yourself.
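
For example, here's a minimal sketch of saving such a frame (this assumes render() returns an HxWx4 float array in [0, 1], with sRGB colors when linear=False; the file name is arbitrary):

import numpy as np
from PIL import Image

frame = testbed.render(800, 800, spp=8, linear=False)  # sRGB colors as floats
rgb = (np.clip(frame[..., :3], 0.0, 1.0) * 255).astype(np.uint8)  # drop alpha, quantize
Image.fromarray(rgb).save('render.png')  # or imshow(rgb) in a notebook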

loboere commented Feb 5, 2022

(Quoting the testbed.render(width, height, spp=8, linear=False) suggestion above.)

How do I run testbed.render(width, height, spp=8, linear=False)? I get this error:

NameError                         Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 testbed.render(width, height, spp=8, linear=False)

NameError: name 'testbed' is not defined

Tom94 (Collaborator) commented Feb 5, 2022

You'll have to first instantiate a testbed object and train it (or load a snapshot) before rendering makes sense. I recommend consulting scripts/run.py for an example of how it can be used; its screenshot functionality uses the .render method.
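
To illustrate, a minimal sketch modeled on scripts/run.py (the exact pyngp entry points may differ between versions, so treat this as an outline rather than the definitive API):

import sys
sys.path.append('build')  # where the compiled pyngp module lives
import pyngp as ngp

testbed = ngp.Testbed(ngp.TestbedMode.Nerf)
testbed.load_training_data('data/nerf_synthetic/lego/transforms_train.json')
testbed.load_snapshot('data/nerf_synthetic/lego/snapshot.msgpack')  # or train from scratch
frame = testbed.render(800, 800, spp=8, linear=False)  # numpy array, ready for imshow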

NVlabs locked and limited conversation to collaborators on Feb 16, 2022
Tom94 converted this issue into discussion #147 on Feb 16, 2022

This issue was moved to a discussion. You can continue the conversation there.
