-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial implementation of subworlds in CoreNEURON #2185
Initial implementation of subworlds in CoreNEURON #2185
Conversation
…der to reflect the new semantics of myid_bbs and numprocs_bbs
…iraikov/coreneuron_subworlds
added modified test_subworld
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello Ivan!
Thanks a lot for your really nice contribution.
I took some time to skim through your PR to understand the changes and left some comments/suggestions.
I'm trying to follow the slack conversation as well but if you have any other question or need and clarification feel free to post it here as well 👍
I'm not experienced in working on a pull request originating on another repository. I followed the "Merging via command line" Step1 instructions but because this pr is currently out of date with the master, a successful pull needed a merge or rebase.
but I suppose this means getting my changes back to the pr will require more or less a manual patch. |
The definition of the nrn2core_subworld_info_ function pointer is in the CoreNeuron version of nrnmpi.cpp, but it looks like this file is not compiled when MPI is disabled. This results in the error "Could not get symbol nrn2core_subworld_info_ from CoreNEURON" when CoreNEURON is enabled and psolve is invoked. Does anybody have any advice where to place it? |
Looking into this. |
@nrnhines I have checked the option that is supposed to give you write access to the branch. You should be able to push commits to the branch, but if not I can add you manually. |
I think that since
args and return do not depend on mpi.h, it can be declared just about anywhere that is always compiled and become in nrnmpi.cpp
|
adding comment about nrnmpi_subworld_change_cnt; moving declaration of nrn2core_subworld_info_ to src/coreneuron/utils/utils.cpp; set values of myid_bbs and numprocs_bbs on all ranks
adding corenrn subworlds test to ctest config; somewhat more robust corenrn subworlds
In at least one run (perhaps many?) I see the failure in external-nrntest
This is also occurring on my desktop. Perhaps it has something to do with a recent change to nrntests
in external/tests/nrntest/nrniv/Parallel/recv
|
I see this test hangs on my computer, and I suspect it is possibly caused by the changed semantics of nhost_bbs and myid_bbs. It looks like calls to context somehow are not sent to the workers. |
That is worth exploring. I thought the issue was something else but my machine does successfully do |
@nrnhines The reason for the hang on my system is that when runworker is called, the worker processes (myid > 0) all execute BBS::subworld_worker_execute, where they invoke nrnmpi_int_broadcast and wait. At the same time, the master calls context and then take to obtain results, but somehow the call to context does not result in the workers receiving the call, so they are stuck in the nrnmpi_int_broadcast. I do not understand enough about the BBS internals to know the correct sequence that should occur. But it looks like the change in numprocs_bbs myid_bbs is breaking context in the default case, when subworlds is not called. |
Ah yes. When there are subworlds some code relies on nrnmpi_myid_bbs == -1 .
I will review in context any substantive use of nrnmpi_myid_bbs. I.e.
|
I think it will be best to return to the former '-1' values for nrnmpi_myid_bbs and nrnmpi_nhost_bbs. This can be revisited after this pr is merged. Instead, for purposes of filling the arguments in nrn/src/nrniv/nrncore_write/callbacks/nrncore_callbacks.cpp
or else copy the present calculation of nrnmpi_myid_bbs into another global variable and then for nrnmpi_myid != 0 set By the way, I realize that calling pc.subworlds(n) multiple times is useless from the point of view of the bulletin board. After calling pc.runworker you are locked in to whatever subworld organization there was. |
Ok, will do.
I thought it was possible to call pc.done and then subworlds again, but I agree this is a fairly contrived scenario. |
This reverts commit 11b9906.
…ariable for subworld_id to pass to CoreNEURON via subworld_info
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## master #2185 +/- ##
=======================================
Coverage 55.76% 55.77%
=======================================
Files 616 616
Lines 123975 124017 +42
=======================================
+ Hits 69133 69168 +35
- Misses 54842 54849 +7
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
The NEURON CI checks are now passing, but the neuronsimulator.nrn checks did not pass and I do not understand the error. |
The CI failures have nothing to do with this pull request. We have the same issue in #2199 which is a typo fix for the documentation, but I have no idea why the CI is misbehaving. |
I'm guessing something about the system has changed on one of the CI runners. Our CI scripts for the CI resources seem to need occasional maintenance as the latter upgrades to new versions of software that we use.
|
As far as I can tell, it's related to permissions. Since this PR is incoming from a fork. Will look into it Monday |
Note that the |
I've explored in dealing with permissions but with no success. So mitigation in place via #2201, this means that for fork PRs people need to manually get the Azure drop URL in case they need the build artifacts (wheels) |
Thank you for looking into this! Is there anything I can do on my end to ensure that the CI-related checks pass? |
No special consideration required for that job anymore. |
…and ordering of ranks within subworld when CoreNEURON is invoked
…ikov/nrn into iraikov/coreneuron_subworlds
This PR contains initial support for subworlds in CoreNEURON per Slack discussion on 1/23/2023.