gasnetc_ofi_init failure on Frontier/Crusher with 1 node #22

Open
elliottslaughter opened this issue Aug 1, 2023 · 0 comments
elliottslaughter commented Aug 1, 2023

This is to document a known issue with the Slingshot 11 network. Legion runs hit the following error if you use only 1 node:

*** FATAL ERROR (proc 0): in gasnetc_ofi_init() at .../gasnet_ofi.c:946: fi_domain failed: -38(Function not implemented)

I have been told that this is an issue with the SLURM integration, and is therefore not something that Legion or GASNet is in a position to address directly.

In the meantime, I'm aware of two workarounds:

  1. Use 2 or more nodes
  2. Run with srun --network=single_node_vni
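For launch scripts that run at varying node counts, the second workaround can be applied conditionally. The following is a minimal sketch, not part of Legion or GASNet: the helper name `build_srun_command` and the application path `./my_legion_app` are hypothetical, and it assumes the job is launched with `srun -N <nodes>`. It adds `--network=single_node_vni` only when exactly one node is requested.

```python
import shlex


def build_srun_command(num_nodes, app_argv, extra_srun_args=()):
    """Return an srun argument list (hypothetical helper).

    When num_nodes is 1, the --network=single_node_vni flag from the
    workaround above is added; multi-node launches are left unchanged.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    cmd = ["srun", "-N", str(num_nodes)]
    if num_nodes == 1:
        # Single-node workaround for the fi_domain failure described above.
        cmd.append("--network=single_node_vni")
    cmd.extend(extra_srun_args)
    cmd.extend(app_argv)
    return cmd


if __name__ == "__main__":
    # "./my_legion_app" is a placeholder for the actual Legion binary.
    print(shlex.join(build_srun_command(1, ["./my_legion_app"])))
```

Because the flag is added only in the single-node case, removing it later, once the workaround is no longer required, is a one-line change.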

I will update this issue when the workarounds are no longer required.

Edit: I understand that the issue is related to SLURM settings at OLCF, not necessarily to Slingshot 11 per se.

elliottslaughter changed the title from "gasnetc_ofi_init failure on Slingshot 11 networks with 1 node" to "gasnetc_ofi_init failure on Frontier/Crusher with 1 node" on Aug 24, 2023