[BUG] (Windows) nvitop lists no processes; OverflowError: Python int too large to convert to C long #76

Closed
3 tasks done
God-damnit-all opened this issue Jun 20, 2023 · 2 comments · Fixed by #79
Labels
api Something related to the core APIs bug Something isn't working pynvml Something related to the `nvidia-ml-py` package upstream Something upstream related

God-damnit-all commented Jun 20, 2023

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

1.1.2

Operating system and version

Windows 10 Build 19045.2965

NVIDIA driver version

535.98.0

NVIDIA-SMI

Tue Jun 20 15:48:35 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070      WDDM  | 00000000:01:00.0  On |                  N/A |
|  6%   64C    P0              39W / 151W |   864MiB /  8192MiB |     10%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      5184    C+G   C:\Windows\explorer.exe                   N/A      |
|    0   N/A  N/A      6552    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A      9468    C+G   ...on\114.0.1823.51\msedgewebview2.exe    N/A      |
|    0   N/A  N/A     12416    C+G   ...m Files\Mozilla Firefox\firefox.exe    N/A      |
|    0   N/A  N/A     12988    C+G   ...on\114.0.1823.43\msedgewebview2.exe    N/A      |
|    0   N/A  N/A     17368    C+G   ...5n1h2txyewy\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     22068    C+G   ...\cef\cef.win7x64\steamwebhelper.exe    N/A      |
|    0   N/A  N/A     22768    C+G   ...al\Discord\app-1.0.9013\Discord.exe    N/A      |
|    0   N/A  N/A     24012    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     24780    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe    N/A      |
|    0   N/A  N/A     28860    C+G   ...m Files\Mozilla Firefox\firefox.exe    N/A      |
+---------------------------------------------------------------------------------------+

Python environment

Installed in a virtual environment created with python -m venv. pip downloaded cachetools-5.3.1, colorama-0.4.6, nvidia-ml-py-11.525.112, nvitop-1.1.2, psutil-5.9.5, termcolor-2.3.0, and windows-curses-2.3.1.

3.11.4 (tags/v3.11.4:d2340ef, Jun 7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] win32

Problem description

Running nvitop lists no processes and shows "Gathering process status" indefinitely. After quitting the program, OverflowError tracebacks are printed.

Steps to Reproduce

Just ran nvitop within the virtual environment.

Traceback

Exception in thread process-snapshot-daemon:
Traceback (most recent call last):
  File "C:\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "C:\Python311\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 281, in _snapshot_target
    self.take_snapshots()
  File "D:\test\venv\Lib\site-packages\cachetools\__init__.py", line 702, in wrapper
    v = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 256, in take_snapshots
    snapshots = GpuProcess.take_snapshots(self.processes, failsafe=True)
                                          ^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 305, in processes
    return list(
           ^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 306, in <genexpr>
    itertools.chain.from_iterable(device.processes().values() for device in self.devices),
                                  ^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\api\device.py", line 1661, in processes
    proc = processes[p.pid] = self.GPU_PROCESS_CLASS(
                              ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\library\process.py", line 26, in __new__
    instance = super().__new__(cls, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\api\process.py", line 474, in __new__
    instance._host = HostProcess(pid)
                     ^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\api\process.py", line 204, in __new__
    host.Process._init(instance, pid, True)
  File "D:\test\venv\Lib\site-packages\psutil\__init__.py", line 361, in _init
    self.create_time()
  File "D:\test\venv\Lib\site-packages\psutil\__init__.py", line 719, in create_time
    self._create_time = self._proc.create_time()
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\psutil\_pswindows.py", line 694, in wrapper
    return fun(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\psutil\_pswindows.py", line 948, in create_time
    user, system, created = cext.proc_times(self.pid)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python int too large to convert to C long
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\test\venv\Scripts\nvitop.exe\__main__.py", line 7, in <module>
  File "D:\test\venv\Lib\site-packages\nvitop\cli.py", line 376, in main
    ui.print()
  File "D:\test\venv\Lib\site-packages\nvitop\gui\ui.py", line 203, in print
    self.main_screen.print()
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\__init__.py", line 152, in print
    print_width = min(panel.print_width() for panel in self.container)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\__init__.py", line 152, in <genexpr>
    print_width = min(panel.print_width() for panel in self.container)
                      ^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 551, in print_width
    self.ensure_snapshots()
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 252, in ensure_snapshots
    self.snapshots = self.take_snapshots()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\cachetools\__init__.py", line 702, in wrapper
    v = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 256, in take_snapshots
    snapshots = GpuProcess.take_snapshots(self.processes, failsafe=True)
                                          ^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 305, in processes
    return list(
           ^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\screens\main\process.py", line 306, in <genexpr>
    itertools.chain.from_iterable(device.processes().values() for device in self.devices),
                                  ^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\api\device.py", line 1661, in processes
    proc = processes[p.pid] = self.GPU_PROCESS_CLASS(
                              ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\gui\library\process.py", line 26, in __new__
    instance = super().__new__(cls, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\api\process.py", line 474, in __new__
    instance._host = HostProcess(pid)
                     ^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\nvitop\api\process.py", line 204, in __new__
    host.Process._init(instance, pid, True)
  File "D:\test\venv\Lib\site-packages\psutil\__init__.py", line 361, in _init
    self.create_time()
  File "D:\test\venv\Lib\site-packages\psutil\__init__.py", line 719, in create_time
    self._create_time = self._proc.create_time()
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\psutil\_pswindows.py", line 694, in wrapper
    return fun(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\test\venv\Lib\site-packages\psutil\_pswindows.py", line 948, in create_time
    user, system, created = cext.proc_times(self.pid)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python int too large to convert to C long

Logs

Only change is the addition of this line:

[DEBUG] 2023-06-20 15:54:54,361 nvitop.api.libnvml::nvmlDeviceGetMemoryInfo: NVML memory info version 2 is available.

Expected behavior

I expected nvitop to list the processes similar to how running nvidia-smi does.

Additional context

image
(A few things in this screenshot were hidden for privacy purposes)

@God-damnit-all God-damnit-all added the bug Something isn't working label Jun 20, 2023
@XuehaiPan XuehaiPan added upstream Something upstream related pynvml Something related to the `nvidia-ml-py` package labels Jun 21, 2023
Owner

XuehaiPan commented Jun 21, 2023

Hi @ImportTaste, thanks for raising this. I have encountered the same issue before. I think this is an upstream bug in nvidia-ml-py when it is used with an incompatible NVIDIA driver: nvidia-ml-py returns invalid PIDs.

In [1]: import pynvml

In [2]: pynvml.nvmlInit()

In [3]: handle = pynvml.nvmlDeviceGetHandleByIndex(0)

In [4]: [p.pid for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle)]
Out[4]:
[1184,
 0,
 4294967295,
 4294967295,
 16040,
 0,
 4294967295,
 4294967295,
 19984,
 0,
 4294967295,
 4294967295,
 20884,
 0,
 4294967295,
 4294967295,
 26308,
 0,
 4294967295,
 4294967295,
 16336,
 0,
 4294967295,
 4294967295,
 5368,
 0,
 4294967295,
 4294967295,
 19828,
 0,
 4294967295]

I haven't found a solution for this yet. This may be due to an internal API change in the NVML library. We may need to wait for the next nvidia-ml-py release.
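
For context, 4294967295 is 0xFFFFFFFF (2**32 - 1). On 64-bit Windows a C long is 32 bits wide, so that value cannot be converted to a signed C long, which is consistent with the OverflowError in the traceback above. Below is a minimal sketch of screening out such values before they are handed to psutil. It assumes that 0 and anything above 2**31 - 1 should be treated as invalid; the helper name `plausible_pids` is hypothetical and is not part of nvitop or nvidia-ml-py.

```python
# Hypothetical helper for illustration; not part of nvitop or nvidia-ml-py.
INVALID_PID = 0xFFFFFFFF        # 4294967295, the value seen in the output above
C_LONG_MAX_32BIT = 2**31 - 1    # largest signed 32-bit C long (Windows, LLP64)


def plausible_pids(pids):
    """Keep PIDs that are positive and fit in a signed 32-bit C long.

    Assumption: PID 0 and out-of-range values are artifacts of the
    driver/library mismatch rather than real GPU processes.
    """
    return [pid for pid in pids if 0 < pid <= C_LONG_MAX_32BIT]


raw = [1184, 0, 4294967295, 4294967295, 16040, 0, 4294967295, 4294967295]
print(plausible_pids(raw))
```

Applied to the list above, only entries such as 1184 and 16040 would survive. This hides the symptom only; whether those remaining PIDs are trustworthy still depends on the upstream fix.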

As a temporary workaround, you could downgrade your NVIDIA driver version.

See also:

Owner

XuehaiPan commented Jul 7, 2023

Hi @ImportTaste, a new release of nvidia-ml-py, version 12.535.77, came out several hours ago. You can upgrade your nvidia-ml-py package with the command:

python3 -m pip install --upgrade nvidia-ml-py

This should resolve the unrecognized PIDs with CUDA 12 drivers.
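
To confirm that the upgrade took effect inside the virtual environment, something like the following sketch could be used. It reads the installed distribution version with the standard-library `importlib.metadata` and compares it against 12.535.77. The helper names `parse_version` and `has_fix` are hypothetical, and the parser assumes a purely numeric dotted version string.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (12, 535, 77)  # the nvidia-ml-py release mentioned above


def parse_version(text):
    """Turn a purely numeric dotted version such as '12.535.77' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed):
    """Return True if the given version string is at least FIXED_VERSION."""
    return parse_version(installed) >= FIXED_VERSION


def check_installed():
    """Print whether the nvidia-ml-py found in this environment is new enough."""
    try:
        installed = version("nvidia-ml-py")
    except PackageNotFoundError:
        print("nvidia-ml-py is not installed in this environment")
        return
    print(installed, "OK" if has_fix(installed) else "needs upgrade")


if __name__ == "__main__":
    check_installed()
```

With the package set from the original report (nvidia-ml-py-11.525.112), `has_fix` would return False.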

I will also make a new release of nvitop to add support for CUDA 12 drivers.
