Error on querying NVIDIA devices | OverflowError: Python int too large to convert to C long #160

JensWendt · 2023-08-08T13:31:04Z

Describe the bug

Freshly installed gpustat. Upon running gpustat I get:

Error on querying NVIDIA devices. Use --debug flag to see more details.
Python int too large to convert to C long

gpustat --debug:

Error on querying NVIDIA devices. Use --debug flag to see more details.
Python int too large to convert to C long

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\gpustat\cli.py", line 58, in print_gpustat
    gpu_stats = GPUStatCollection.new_query(debug=debug, id=id)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gpustat\core.py", line 604, in new_query
    gpu_info = get_gpu_info(handle)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gpustat\core.py", line 561, in get_gpu_info
    process = get_process_info(nv_process)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gpustat\core.py", line 469, in get_process_info
    psutil.Process(pid=nv_process.pid)
  File "C:\ProgramData\Anaconda3\lib\site-packages\psutil\__init__.py", line 332, in __init__
    self._init(pid)
  File "C:\ProgramData\Anaconda3\lib\site-packages\psutil\__init__.py", line 361, in _init
    self.create_time()
  File "C:\ProgramData\Anaconda3\lib\site-packages\psutil\__init__.py", line 717, in create_time
    self._create_time = self._proc.create_time()
  File "C:\ProgramData\Anaconda3\lib\site-packages\psutil\_pswindows.py", line 688, in wrapper
    return fun(self, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\psutil\_pswindows.py", line 942, in create_time
    user, system, created = cext.proc_times(self.pid)
OverflowError: Python int too large to convert to C long

nvidia-smi:

C:\Users\MiN_Acc2>nvidia-smi
Tue Aug  8 15:25:02 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 527.27       Driver Version: 527.27       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000    WDDM* | 00000000:21:00.0 Off |                    0 |
| 33%   35C    P3    53W / 260W |  15655MiB / 46080MiB |     20%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11700    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A     64008    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     90396    C+G   ...w5n1h2txyewy\SearchUI.exe    N/A      |
+-----------------------------------------------------------------------------+

Environment information:

OS: Windows Server 2019
NVIDIA Driver version: 527.27
gpustat version: 1.2.dev7+g7c09a0f
pynvml version: 12.535.77

It seems this bug has already been seen and solved over at nvitop XuehaiPan/nvitop#76
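
The traceback ends inside psutil's `cext.proc_times(self.pid)`, which suggests the PID handed to psutil was outside the range of a C long. The following is a minimal sketch of a defensive range check, not the fix used by nvitop or gpustat. The helper names `fits_c_long` and `plausible_pids` are hypothetical, and the 32-bit width is an assumption based on C long being 32 bits wide on Windows.

```python
def fits_c_long(value: int, bits: int = 32) -> bool:
    """Return True if value fits in a signed C long of the given width.

    bits=32 is an assumption: C long is 32 bits wide on Windows.
    """
    limit = 1 << (bits - 1)
    return -limit <= value < limit


def plausible_pids(pids):
    """Keep only PIDs that are non-negative and fit in a signed C long."""
    return [pid for pid in pids if pid >= 0 and fits_c_long(pid)]


if __name__ == "__main__":
    # The first two PIDs appear in the nvidia-smi output above;
    # 2**40 stands in for a garbage value and is purely illustrative.
    print(plausible_pids([11700, 64008, 2**40]))
```

A check like this would only hide the symptom. It does not explain why an out-of-range PID was reported in the first place.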

Lunar13737 · 2023-08-25T01:53:18Z

I encountered the same problem and solved it by downgrading nvidia-ml-py to an earlier version, 11.525.112, using pip install nvidia-ml-py==11.525.112. I hope it's helpful.

PyroGenesis · 2023-08-25T19:25:48Z

+1 same error

OS: Windows 10 Enterprise (Version: 2004, OS build: 19041.264)
NVIDIA Driver version: 536.99
The name(s) of GPU card: NVIDIA GeForce RTX 4090 x 2
gpustat version: gpustat 1.1.1

Thanks for the workaround @Lunar13737, it worked for me.

mjmikulski · 2023-09-12T10:35:32Z

+1 and the workaround with downgrading nvidia-ml-py did not work for me :(

OS: Windows 11 Pro N
NVIDIA Driver Version: 535.98, CUDA Version: 12.2
GPU: NVIDIA GeForce RTX 4070
gpustat version: gpustat 1.1.1

Any hints?

wookayin · 2023-10-30T23:02:50Z

I'd like to reproduce this issue to have a correct fix. But I've never seen the issue.

What we know from #161 (comment):

nvidia-ml-py==12.535.77 (the OP's case) is buggy and only works with drivers 535.43 and 535.86:
- Does the problem go away if you install nvidia-ml-py==12.535.108? @JensWendt
It looks like nvidia-ml-py 12.535.108 should correct all process-information related bugs by reverting the breaking changes in the previous versions. But this is just my guess, I'm not sure. I would need to know the nvidia-ml-py version installed on your system.

@Lunar13737, @PyroGenesis, @mjmikulski thanks for the datapoints. Could you please try upgrading nvidia-ml-py==12.535.108 and see if the OverflowError is gone?
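
Since the diagnosis depends on which nvidia-ml-py version is installed, a short script along these lines could report it before and after upgrading. This is a sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and the package names are the distribution names mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as they appear in this thread.
    for name in ("nvidia-ml-py", "gpustat", "psutil"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running it in the same environment as gpustat matters, because a different Python environment may have a different nvidia-ml-py installed.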

PyroGenesis · 2023-10-31T22:55:53Z

> Could you please try upgrading nvidia-ml-py==12.535.108 and see if the OverflowError is gone?

@wookayin I can confirm, overflow error does not occur in nvidia-ml-py 12.535.108

wookayin · 2023-10-31T23:03:47Z

@PyroGenesis Thanks. What was the previous version of nvidia-ml-py that resulted in this bug?

PyroGenesis · 2023-10-31T23:10:23Z

@wookayin I think it was most likely 12.535.77 that caused the error, though I'm not 100% sure because I didn't keep a record of it. I downgraded to 11.525.112 which worked, and now 12.535.108 works too.

Lunar13737 · 2023-11-01T01:16:56Z

@wookayin nvidia-ml-py 12.535.108 works for me, no overflow error

wookayin · 2023-11-01T02:05:00Z

Thanks. I can conclude that the root cause of this bug is essentially the same as #161: one should use neither nvidia-ml-py==12.535.77 nor the broken NVIDIA drivers (>= 535.43, < 535.98).

gpustat will print warnings when any of these versions of nvml library or driver is detected, so we can close this issue without adding an unnecessary compatibility layer.

nvidia-ml-py==12.535.77 is a buggy version that breaks the struct for process information, and should not be used (unless the NVIDIA driver is *also* buggy: 535.43, 535.54, and 535.86). The latest version nvidia-ml-py==12.535.108 fixes the problem and is still compatible with our supported drivers (R450+). To ensure that users who install gpustat 1.2.0 get a correct version of nvidia-ml-py, we bump up the requirement. See #160 and #161 for more details.
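
The version rules stated above can be sketched as a small check. This is not gpustat's actual warning code. The names `parse_version`, `is_broken_driver`, and `check` are hypothetical, the constants are copied from this thread, and the parser assumes purely numeric, dot-separated version strings.

```python
from typing import List, Tuple

# Values taken from this thread; treat them as a snapshot, not a maintained list.
BAD_NVML_PY = (12, 535, 77)
BROKEN_DRIVER_MIN = (535, 43)  # inclusive lower bound
BROKEN_DRIVER_MAX = (535, 98)  # exclusive upper bound


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '535.98' into (535, 98). Assumes numeric, dot-separated parts."""
    return tuple(int(part) for part in text.strip().split("."))


def is_broken_driver(driver: str) -> bool:
    """True if the driver falls in the range >= 535.43, < 535.98."""
    return BROKEN_DRIVER_MIN <= parse_version(driver) < BROKEN_DRIVER_MAX


def check(nvml_py: str, driver: str) -> List[str]:
    """Return a list of warning strings for a version combination."""
    warnings = []
    if parse_version(nvml_py) == BAD_NVML_PY:
        warnings.append(f"nvidia-ml-py {nvml_py} is reported as buggy")
    if is_broken_driver(driver):
        warnings.append(f"NVIDIA driver {driver} is in the reported broken range")
    return warnings


if __name__ == "__main__":
    # Version combinations reported by users in this thread.
    print(check("12.535.77", "527.27"))
    print(check("12.535.108", "536.99"))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "535.104" sorting before "535.98".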

JensWendt added the bug label Aug 8, 2023

wookayin added this to the 1.2 milestone Oct 16, 2023

wookayin added the pynvml label Oct 30, 2023

wookayin closed this as completed Nov 1, 2023
