We currently warn users about large graphs and suggest that they scatter their data. We should revise this warning. Ideally, it would point to a page in the documentation that discusses the problem.
Primarily, I would like to recommend a safer and conceptually simpler approach: using `delayed` or `Client.submit` instead. The benefits are that code that does not use a dask client can also use this approach, and that it is more resilient.
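A minimal sketch of the `delayed` variant, to make the suggestion concrete. The `data` list and the `total` function are hypothetical stand-ins for a user's large object and computation; `traverse=False` is used here only so that dask does not walk the list looking for nested dask objects.

```python
import dask

# Hypothetical large input; a plain list stands in for a big array or dataframe.
data = list(range(100_000))


def total(chunk, offset):
    """Toy computation that consumes the large object."""
    return sum(chunk) + offset


# Wrap the large object once so it becomes a single key in the graph
# that all tasks reference, rather than being embedded in each task.
data_ref = dask.delayed(data, traverse=False)

# Every task depends on the same wrapped object.
tasks = [dask.delayed(total)(data_ref, i) for i in range(4)]

# Works with any scheduler, including the default local one (no Client needed).
results = dask.compute(*tasks)
```

Nothing in this snippet requires a distributed client, which is the portability benefit mentioned above. By contrast, `Client.scatter` needs a running client.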
The shortcomings of `delayed`/`submit` compared to scatter are:
- Direct-to-worker communication is not possible; the data always flows through the scheduler. Depending on the network topology, direct communication may not be possible anyway.
- A copy of the data is stored on the scheduler. This is why the approach is more resilient, but it may also push the scheduler past its memory limit.
If scheduler memory or the lack of direct communication is actually a problem for users, going the extra mile of using remote storage might even be necessary. Since this topic is not exactly trivial, it might be appropriate to let the warning point to a documentation page.