AttributeError: 'GatewayCluster' object has no attribute 'wait_for_workers' #782

Closed
abprime opened this issue Dec 13, 2023 · 2 comments

abprime commented Dec 13, 2023

Describe the issue:
The new version 2023.9.0 raises an AttributeError for wait_for_workers. This worked in the earlier version 2023.1.1.
The distributed client's wait_for_workers method internally calls a method of the same name on the cluster.

Is there any alternative way to wait for the workers?

AttributeError                            Traceback (most recent call last)
Cell In[41], line 1
----> 1 client.wait_for_workers(8)

File ~/dde2/services/dde-analytics-core/.venv/lib/python3.10/site-packages/distributed/client.py:1469, in Client.wait_for_workers(self, n_workers, timeout)
   1466 if self.cluster is None:
   1467     return self.sync(self._wait_for_workers, n_workers, timeout=timeout)
-> 1469 return self.cluster.wait_for_workers(n_workers, timeout)

AttributeError: 'GatewayCluster' object has no attribute 'wait_for_workers'
2023-12-12 17:56:15,196 - distributed.client - ERROR - Failed to reconnect to scheduler after 30.00 seconds, closing client
Exception in callback None()
handle: <Handle cancelled>
Traceback (most recent call last):
  File "/home/abprime/dde2/services/dde-analytics-core/.venv/lib/python3.10/site-packages/tornado/iostream.py", line 1367, in _do_ssl_handshake
    self.socket.do_handshake()
  File "/home/abprime/anaconda3/envs/py310/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/abprime/anaconda3/envs/py310/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/abprime/dde2/services/dde-analytics-core/.venv/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 202, in _handle_events
    handler_func(fileobj, events)
  File "/home/abprime/dde2/services/dde-analytics-core/.venv/lib/python3.10/site-packages/tornado/iostream.py", line 691, in _handle_events
    self._handle_read()
  File "/home/abprime/dde2/services/dde-analytics-core/.venv/lib/python3.10/site-packages/tornado/iostream.py", line 1427, in _handle_read
    self._do_ssl_handshake()
  File "/home/abprime/dde2/services/dde-analytics-core/.venv/lib/python3.10/site-packages/tornado/iostream.py", line 1385, in _do_ssl_handshake
    return self.close(exc_info=err)
  File "/home/abprime/dde2/services/dde-analytics-core/.venv/lib/python3.10/site-packages/tornado/iostream.py", line 606, in close
    self._signal_closed()
  File "/home/abprime/dde2/services/dde-analytics-core/.venv/lib/python3.10/site-packages/tornado/iostream.py", line 636, in _signal_closed
    self._ssl_connect_future.exception()
asyncio.exceptions.CancelledError

Minimal Complete Verifiable Example:

from dask_gateway import BasicAuth, Gateway

# DASK_GATEWAY_URL and DASK_BASIC_AUTH_PASSWORD are defined elsewhere.
gateway = Gateway(
    address=DASK_GATEWAY_URL,
    auth=BasicAuth(
        password=DASK_BASIC_AUTH_PASSWORD,
    ),
    asynchronous=False,
)
cluster = gateway.new_cluster()
cluster.adapt(minimum=6, maximum=12)
client = cluster.get_client()

client.wait_for_workers(6)  # this line raises the error

Anything else we need to know?:

Environment:

  • Dask version: '2023.5.0'
  • Distributed version: '2023.5.0'
  • Dask Gateway version: '2023.9.0'
  • Python version: 3.10.10
  • Operating System: Windows
  • Install method (conda, pip, source): poetry
@TomAugspurger
Member

Seems like that might have been from dask/distributed#6700. That's now requiring a new Cluster.wait_for_workers method, that isn't on GatewayCluster.

We could implement that (a PR would be great). It might be worth opening an issue on dask/distributed to confirm whether that change to the cluster interface was intentional (it looks incidental to the intent of the PR, but I haven't looked closely).
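
As a possible stopgap for the "alternative way to wait" question above, here is a minimal sketch of a polling helper that does not rely on the cluster object at all. The helper name `wait_for_n_workers` and its parameters are hypothetical, not part of dask-gateway or distributed. The usage comment assumes that `client.scheduler_info()["workers"]` lists every connected worker, which should hold for the versions reported in this issue but may not for newer releases that limit how many workers that call returns by default.

```python
import time


def wait_for_n_workers(count_workers, n_workers, timeout=None, poll_interval=0.5):
    """Block until count_workers() reports at least n_workers.

    count_workers is any zero-argument callable returning the current
    number of workers (hypothetical helper, not a library API).
    Raises TimeoutError if timeout (in seconds) elapses first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        current = count_workers()
        if current >= n_workers:
            return current
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {current} of {n_workers} workers arrived within {timeout} s"
            )
        time.sleep(poll_interval)


# Assumed usage with the client from the example in this issue:
# wait_for_n_workers(lambda: len(client.scheduler_info()["workers"]), 6, timeout=120)
```

Passing the counter as a callable keeps the helper independent of any particular client or cluster class, so it avoids the missing `GatewayCluster.wait_for_workers` attribute entirely.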

@consideRatio
Collaborator

This was an upstream issue fixed in dask/distributed#8441, which is part of distributed>=2024.1.0, so upgrading to that version should resolve this, I think.
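
To check whether an environment already has that release, a small version comparison can help. This is only a sketch: the function names are hypothetical, and it assumes the calendar-style `YYYY.M.P` version strings that distributed uses, comparing only the leading numeric part of each component.

```python
import re

# First distributed release reported in this thread to contain the fix.
FIXED_IN = (2024, 1, 0)


def release_tuple(version):
    """Turn a version string such as '2023.5.0' into a tuple of three ints."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def has_wait_for_workers_fix(version):
    """Return True if the given distributed version is at least FIXED_IN."""
    return release_tuple(version) >= FIXED_IN


# Assumed usage in the affected environment:
# import distributed
# has_wait_for_workers_fix(distributed.__version__)
```

With the versions listed in the environment section, `has_wait_for_workers_fix("2023.5.0")` returns `False`, which matches the reported failure.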
