Slow loading time #385

Atcold · 2017-10-28T04:37:53Z

Any idea why require 'cudnn' may take 45 seconds on my machine?

```
th> require 'cunn';
                                                                      [0.9818s]
th> require 'cudnn';
                                                                      [44.7415s]
```

Edit: Oh, maybe this is related.
Edit2: System info is the following.

Distributor ID: CentOS
Description:    CentOS Linux release 7.4.1708 (Core) 
Release:        7.4.1708
Codename:       Core
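To put numbers on this outside the REPL, here is a minimal Python sketch that times a fresh interpreter loading each module, with cunn as a baseline. It assumes Torch's `th` launcher is on PATH and that `th -e <statement>` runs the statement and exits. The measured time includes interpreter startup, so compare the two modules against each other rather than reading the absolute numbers.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, extra_env=None):
    """Run cmd in a fresh process and return its wall-clock time in seconds.

    extra_env is layered on top of the current environment, so variables such
    as CUDA_VISIBLE_DEVICES are in place before CUDA initializes.
    """
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command fails, so a broken
    # install is not mistaken for a fast load; stderr is left visible.
    subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def require_cmd(module):
    """Command that starts a fresh Torch interpreter and loads one module.

    Assumes `th -e <statement>` executes the statement and then exits.
    """
    return ["th", "-e", "require '%s'" % module]


if __name__ == "__main__":
    modules = sys.argv[1:] or ["cunn", "cudnn"]
    for module in modules:
        print("%-8s %6.1fs" % (module, time_command(require_cmd(module))))
```

Running it a few times on each server gives comparable numbers, which makes the 10 to 15 second versus 40 to 45 second difference easier to pin down.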


Atcold · 2017-10-29T23:48:03Z

Hmm, the other servers here take 10 to 15 seconds, and the one above takes 40 to 45 seconds...
How can I debug this?

clement-masson · 2017-10-30T10:40:03Z

require 'cudnn' initializes some state on every visible GPU. If you're on a machine with many GPUs, that may be the cause of the long loading time.

We have a machine with 4 GPUs. Setting CUDA_VISIBLE_DEVICES=0 (for instance) reduces the loading time by almost a factor of 4. On our machine, though, it takes less than 10 seconds...
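If the cost really is per device, load time should grow with the number of visible GPUs. Here is a minimal Python sketch to check that. It assumes `nvidia-smi` and `th` are on PATH and that `th -e <statement>` runs the statement and exits. It exposes the first 1, 2, ..., N devices in turn through CUDA_VISIBLE_DEVICES and times a fresh `require` for each setting.

```python
import os
import subprocess
import sys
import time


def count_gpus(nvidia_smi_listing):
    """Count devices in `nvidia-smi -L` output, which has one 'GPU <i>: ...' line each."""
    return sum(1 for line in nvidia_smi_listing.splitlines()
               if line.startswith("GPU "))


def visible_devices(n):
    """CUDA_VISIBLE_DEVICES value exposing the first n devices, e.g. '0,1' for n=2."""
    return ",".join(str(i) for i in range(n))


def time_require(module, devices):
    """Wall-clock seconds for a fresh `th` to load module with only `devices` visible."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    start = time.perf_counter()
    # Assumes `th -e <statement>` executes the statement and exits.
    subprocess.run(["th", "-e", "require '%s'" % module], env=env, check=True,
                   stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    module = sys.argv[1] if len(sys.argv) > 1 else "cudnn"
    listing = subprocess.run(["nvidia-smi", "-L"], stdout=subprocess.PIPE,
                             universal_newlines=True, check=True).stdout
    # Roughly linear growth in n would point at per-device initialization.
    for n in range(1, count_gpus(listing) + 1):
        print("%d visible GPU(s): %.1fs" % (n, time_require(module, visible_devices(n))))
```

If a single visible GPU is still slow, per-device initialization is probably not the whole story on that machine.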

Atcold · 2017-10-30T22:27:13Z

@clement-masson, right, I just saw that. Still, I believe something must be wrong. I've contacted IT (I don't have sudo here...).

ajhool · 2019-01-26T08:37:39Z

I'm finding that require 'cudnn' takes 10 minutes on a Volta. @clement-masson, any idea how I can profile the require call to see what exactly is taking so long on the Volta architecture?

ajhool · 2019-01-31T04:23:46Z

@nagadomi, I'm using your distro with CUDA 9/10 support. Any idea why the bindings might be struggling with the Volta architecture?

nagadomi · 2019-01-31T06:35:15Z

@ajhool
If you are using Docker, it may be caused by JIT caching.
See nagadomi/waifu2x#138:
https://github.com/nagadomi/waifu2x/pull/138/files#diff-04c6e90faac2675aa89e2176d2eec7d8
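Some general CUDA background, not stated elsewhere in this thread: when a binary carries PTX but no machine code for the GPU it runs on, the driver JIT-compiles the PTX at load time. It stores the result in a compute cache, which is ~/.nv/ComputeCache by default and configurable with the CUDA_CACHE_PATH and CUDA_CACHE_MAXSIZE environment variables. Inside a container, that cache is lost when the container is removed unless it lives on a volume, so every fresh container pays the compile cost again.

The Python sketch below builds a `docker run` command that keeps the cache on the host. It is one common approach, not necessarily what waifu2x#138 does. The image name, host directory, container path and 2 GiB size are hypothetical placeholders. `--runtime=nvidia` assumes nvidia-docker2; Docker 19.03+ can use `--gpus all` instead.

```python
import os
import shlex


def docker_run_cmd(image, host_cache_dir, command,
                   container_cache_dir="/cuda-cache",
                   cache_maxsize=2 * 1024 ** 3):
    """Build a `docker run` argv that keeps the CUDA JIT cache across containers.

    host_cache_dir is bind-mounted at container_cache_dir, CUDA_CACHE_PATH points
    the driver at it, and CUDA_CACHE_MAXSIZE (bytes) is raised so compiled
    kernels fit. All paths and sizes here are illustrative assumptions.
    """
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",  # assumes nvidia-docker2; newer Docker: --gpus all
        "-v", "%s:%s" % (os.path.abspath(host_cache_dir), container_cache_dir),
        "-e", "CUDA_CACHE_PATH=%s" % container_cache_dir,
        "-e", "CUDA_CACHE_MAXSIZE=%d" % cache_maxsize,
        image,
    ] + list(command)


if __name__ == "__main__":
    # "my-torch-image" and ~/.cache/cuda-jit are hypothetical names.
    cmd = docker_run_cmd("my-torch-image", os.path.expanduser("~/.cache/cuda-jit"),
                         ["th", "-e", "require 'cudnn'"])
    print(" ".join(shlex.quote(part) for part in cmd))
```

Running the printed command twice is a quick check: if the second start is not much faster, the cache is probably not being written or reused.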

ajhool · 2019-01-31T06:39:08Z

I am using docker and I'll give that a shot, thanks!

ajhool · 2019-02-01T05:16:54Z

So far, the JIT caching fix does not appear to work, although I'm having a hard time debugging Torch/Lua without a debug environment or print statements. I believe the cache and cache path are configured correctly, but the load time is still about 10 minutes.

The fact that the code executes quickly on K80s but takes so much longer on Voltas makes me suspect there's more to it than just LuaJIT. I'll keep trying to get to the bottom of this.
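One way to confirm whether the JIT cache is actually being written is to inspect the cache directory after a run. Here is a minimal Python sketch. It assumes the cache location comes from CUDA_CACHE_PATH, falling back to ~/.nv/ComputeCache. `diagnose_cache` is a hypothetical helper, not part of any CUDA tool. The 256 MiB fallback for CUDA_CACHE_MAXSIZE and the 90% threshold are assumptions. The CUDA docs do state that binaries larger than the cache size are not cached at all.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of regular files under path (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # yields nothing for a missing path
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def diagnose_cache(path, maxsize_bytes):
    """Rough, hypothetical triage of a JIT cache directory after one load."""
    size = dir_size_bytes(path)
    if size == 0:
        return "empty: nothing was written; check CUDA_CACHE_PATH and the mount"
    if size >= 0.9 * maxsize_bytes:
        return "near limit: entries may be evicted or skipped; raise CUDA_CACHE_MAXSIZE"
    return "populated (%d bytes): a later run may be able to reuse these" % size


if __name__ == "__main__":
    path = os.environ.get("CUDA_CACHE_PATH",
                          os.path.expanduser("~/.nv/ComputeCache"))
    # Assumed default of 256 MiB when CUDA_CACHE_MAXSIZE is unset.
    maxsize = int(os.environ.get("CUDA_CACHE_MAXSIZE", 256 * 1024 ** 2))
    print(path, "->", diagnose_cache(path, maxsize))
```

Run it inside the container right after a slow `require 'cudnn'`, or against the mounted host directory. An empty directory points at a path or mount problem rather than at LuaJIT.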

ajhool mentioned this issue on Jan 26, 2019, in #394: "require cudnn takes 10 minutes on a Volta with 1 GPU (Cuda 9, cudnn 7.1)" (Open).

ajhool mentioned this issue on Mar 15, 2019, in nagadomi/waifu2x#272: "Confusion about ~/.nv/ComputeCache behavior with docker" (Closed).
