Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip update part in pdtrord for current WINDOW if NWIN = 0 #72

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

weslleyspereira
Copy link
Collaborator

Try simple solution to solve #69.

@drhpc
Copy link

drhpc commented Jul 27, 2022

Is this a proper solution? Isn't the issue that the parallelization triggers NWIN=0 at all?

@weslleyspereira
Copy link
Collaborator Author

Is this a proper solution? Isn't the issue that the parallelization triggers NWIN=0 at all?

Thanks for asking. The answer has 2 parts:

A. This is a proper solution for the tests cases. Some other reasons for this fix:

  1. The update part does not make sense if NWIN=0.
  2. I verified that the tests result in FLOPS = 0 when NWIN=0. Accordingly to the algorithm logic, the update should be neglected when FLOPS = 0. I could have proposed a fix like the following:
               IF( FLOPS.NE.0 ) THEN
                  IF( ( FLOPS*100 ) / ( 2*NWIN*NWIN ) .GE. MMULT ) THEN
                     [...]
                  ELSE
                     [...]
                  END IF
               END IF

But this solution, in my opinion, would also be a problem if for some reason NWIN=0 and FLOPS is non zero. My solution is based on reasoning 1.

B. This is not a proper solution for the possible communication issue we have. I will give a short report for what I obtained.

  • My machine: Ubuntu 20.04, 8 cores (16 threads), using OpenMPI and GNU compilers. I can give more information if needed.
  1. NWIN is always different from zero and the tests pass when running xdhseqr with 1 or more than 8 MPI processes. I obtain NWIN=0 when using 8 MPI processes.

  2. I obtain NWIN=0 when using 2 - 8 MPI processes. I does not matter if I use bind-to, oversubscribe or map-by flags. At least I couldn't find anything else by playing with those flags.

  3. When NWIN=0, there are 2 scenarios:
    i. NMWIN2 = 2. In this case, LILO = LIHI+1 and I = LILO.
    ii. NMWIN2 = 1. In this case, LILO < LIHI and I is trash, sometimes even a negative number.

We are still investigating this problem.

@liyangzhong7
Copy link

I would like to add a few more information to this bug. Before I see this post, I used multiple compiler-mpi combination to compile and test scalapack-2.2.0. I tried gcc-8.2.0+openmpi-4.0.0, gcc-10.1.0+openmpi-4.0.5, gcc-11.2.0+openmpi-4.1.2 and intel_oneapi-2021. The gcc-8.2.0 and intel-2021 passed all test; while using the compiler gcc-10 and gcc-11 to compile will fail for Test #69 xshseqr and Test #70: xdhseqr. These two tests failed only for mpirun with np = 2-8; the tests failed no matter you are using gcc-10 or gcc-11 or gcc-8 during run-time. If you compile with gcc-8 and test with gcc-10, it goes fine.

The solution given by [weslleyspereira] is effective, but remember you need to edit both "pstrord.f" and "pdtrord.f" in order to pass those two tests, respectively. It is somewhat surprised to me that the failed test results from a program bug instead of settings from the compiler or mpi, and I have spent a lot of time checking and comparing the settings and object files in my system. So I write this post hoping people with the same situation can find this solution earlier and save them some time :)

@weslleyspereira
Copy link
Collaborator Author

In case this PR is still reasonable, I just wanted to add that Meiyue Shao, one of the main contributors to this code, suggested we look at https://dl.acm.org/doi/10.1145/2699471 to check if the current implementation is correct. I don't have time to work on that now, but, please, feel welcome to take from where I stopped here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants