Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jupyter notebook for Parameter Sweep tutorial occasional cell timeout failures on Linux/Python 3.8 #1312

Closed
lbianchi-lbl opened this issue Feb 26, 2024 · 11 comments · Fixed by #1316
Labels
Priority:Normal Normal Priority Issue or PR

Comments

@lbianchi-lbl
Copy link
Contributor

image

@adam-a-a
Copy link
Contributor

Just noting that I am seeing a similar issue in checks for PR #1244 :
image

@lbianchi-lbl
Copy link
Contributor Author

Based on my testing in #1314, the failures occur on Linux with Python 3.8 up for (up to) about half of the runs:

image

@bknueven
Copy link
Contributor

It looks like the demo uses the "ConcurrentFutures" backend:

" \"parallel_back_end\": \"ConcurrentFutures\", # Multiprocessing, MPI, Ray available\n",
.

@bknueven
Copy link
Contributor

Given that Python 3.9 seems to be working, here are the release notes for Python 3.9 about concurrent.futures: https://docs.python.org/3/whatsnew/3.9.html#concurrent-futures

@bknueven
Copy link
Contributor

Based on our experience with #1211 it's almost certainly the case that the deadlock is occurring in the gather method:
https://github.com/watertap-org/watertap/blob/main/watertap/tools/parallel/concurrent_futures_parallel_manager.py#L104-L115

@bknueven
Copy link
Contributor

As a hack, the multiprocessing parallel back end seems to work fine: https://github.com/watertap-org/watertap/actions/runs/8080874093/job/22078190530

@bknueven
Copy link
Contributor

bknueven commented Feb 28, 2024

Given the available evidence I am leaning towards this being a bug in Python 3.8. Python 3.8 hits EOL in October 2024.

How much resources should we devote to fixing this issue, when a work-around (use MultiProcessing) is readily available? It also raises the question of the utility of having both a multiprocessing parallel back end and a concurrent.futures parallel back end. If both are included with Python, why maintain both?

@lbianchi-lbl
Copy link
Contributor Author

Thanks for running the checks @bknueven. Personally, I'd be more than happy to just consider this a Python 3.8 limitation, and not do anything until the time when we drop support for Python 3.8 beyond excluding the notebook from being tested in Python 3.8 CI jobs.

@bknueven
Copy link
Contributor

Okay, I will implement the work around for Python 3.8 so this doesn't hold up further jobs.

@lbianchi-lbl
Copy link
Contributor Author

lbianchi-lbl commented Feb 28, 2024

Just to be clear, as far as I'm concerned the "workaround" can be

if: matrix.python-version == '3.8' && matrix.os == 'linux'`
run: rm tutorials/parameter_sweep_demo.ipynb

in the GHA job where notebooks are tested, meaning I don't think we need to come up with a "proper" fix.

@bknueven
Copy link
Contributor

Just to be clear, as far as I'm concerned the "workaround" can be

if: matrix.python-version == '3.8' && matrix.os == 'linux'`
run: rm tutorials/parameter_sweep_demo.ipynb

in the GHA job where notebooks are tested, meaning I don't think we need to come up with a "proper" fix.

I already had a work-around slightly more complex than this implemented (#1316) -- which should help any users of Python 3.8 not hit this same issue.

lbianchi-lbl pushed a commit that referenced this issue Feb 28, 2024
* try switching to MultiProcessing

* updating tutorial comment

* default to multiprocessing; make interface more forgiving
@ksbeattie ksbeattie added the Priority:Normal Normal Priority Issue or PR label Feb 29, 2024
lbianchi-lbl pushed a commit to watertap-org/parameter-sweep that referenced this issue May 1, 2024
…ap#1316)

* try switching to MultiProcessing

* updating tutorial comment

* default to multiprocessing; make interface more forgiving
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority:Normal Normal Priority Issue or PR
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants