Simplify RNG seed initialization in PARPACK #431

FabienPean · 2023-08-19T21:19:56Z

Following up #424, one can simplify the parpack seed initialization to fulfill the LAPACK requirements. Before, it was already a constant number which could handle "only" 1547 (0.5*(4095-1000-1)) proc id.

fghoussen · 2023-08-20T10:05:57Z

I'll have a closer look when possible.
@szhorvat: Sounds right to you?

PARPACK/SRC/MPI/pdgetv0.f

szhorvat · 2023-08-20T11:21:04Z

Yes, this change makes sense to me.

fghoussen · 2023-08-20T16:44:51Z

@FabienPean: BTW, why do you think this is better than the original way of seeding?
@pv: does it help #218?

FabienPean · 2023-08-20T17:24:38Z

@FabienPean: BTW, why do you think this is better than the original way of seeding?

It is simpler, no error block if it reaches 1547 MPI proc, and does not make some obfuscating calculation 1000+2*myid+1 is still a constant seed per MPI proc. The only difference is how the value are spread over iseed.
It was yielding

0<= iseed(1)<5
0<=iseed(2)<1000
0<=iseed(3)<100
1<iseed(4)<10

Since there is no recommendation on the spread of the initial seed from LAPACK, the new value values are as good as any. So I copied the initial part from the serial code for iseed(1-3) and used a modulo for iseed(4), which at least guarantee an initial seed uniqueness until 2048 MPI proc, then MPI processes 0 and 2048 will start to reuse the initial seed and therefore random values for their part of the problem.

In brief, this PR yields fewer LOC and reduced seed repetition for high MPI proc counts

@pv: does it help #218?

This PR will not help for this issue, it is more connected to #424 in particular that comment:

Otherwise, the seed could be provided by the user but it means bubbling up the parameter and affecting function signatures.

If someone provides a known residual and it fails for the user, then because of the constant seed, it will always fail (same sequence of numbers will be generated upon getv0 calls). The only way to dodge the constant seed is for the user to modify the initial residual (and hope for the best)

EDIT: yup indeed the last part was effectively discussed in length previously in #218 :)

fghoussen · 2023-08-20T18:01:27Z

It is simpler

I guess this is OK and will not trigger regression.

@caliarim: what do you think? Potential reg or fix related to JuliaLinearAlgebra/Arpack.jl#138?

fghoussen · 2023-08-20T18:14:23Z

related to JuliaLinearAlgebra/Arpack.jl#138

Just realizing that this was serial (arpack) so parpack will have no impact here

szhorvat · 2023-08-30T07:10:10Z

I am not an expert on RNGs and can't comment much, but here's a reason why this manner of seeding may not be ideal, depending on the type of RNG used: https://stats.stackexchange.com/a/233091/4764 I believe myid will increase sequentially from thread to thread.

fghoussen · 2023-09-02T17:30:44Z

PARPACK/SRC/MPI/pcgetv0.f

+         iseed(1) = 1
+         iseed(2) = 3
+         iseed(3) = 5
+         iseed(4) = 1+2*mod(myid,2048)


#431 (comment): Not sure what to do. Does this mean we should let seed([123]) as is (or dependent of rand?), and, initialize RNG seed to myid (with some kind of call random_seed(put=myid)?) before changing iseed(4) to iseed(4) = 1+2*mod(rand(),2048)?

fghoussen · 2023-09-02T17:30:51Z

I believe myid will increase sequentially from thread to thread.

Yes, it will.

szhorvat · 2023-09-02T18:25:20Z

Well, this may not be an issue at all here. Whether it is depends on how LAPACK's RNG works, and once again, I'm definitely not an RNG expert, and I do not have a theoretical understanding of why setting only the least significant part of the seed may be an issue.

Also, problems would only occur if iseed(4) (and not iseed(1)) is the least significant part of the 48 bit value it operates with. I am not entirely certain if this is the case.

If we want to be concerned about this, we can make all of iseed(1), iseed(2), iseed(3) and iseed(4) dependent on myid.

But at this point, I'm really hoping to see 3.9.1 released as soon as possible :-)

FabienPean · 2023-09-02T21:37:42Z

I am not an expert on RNGs and can't comment much, but here's a reason why this manner of seeding may not be ideal, depending on the type of RNG used: https://stats.stackexchange.com/a/233091/4764 I believe myid will increase sequentially from thread to thread.

I am not entirely sure what you mean here. AFAIU, the link discusses the choice of an initial seed and seed sampling per thread across several run of an application. In the case of ARPACK, the seed was already the same across runs. The risk is more about the lack of sensitivity of the RNG (small seed number variation yields a small spread in the number generated, i≈j ⇒ x≈y). A good characteristic of a RNG is to not have this problem, which I would expect form Lapack.

The documentation of Lapack does not mention advice or any restriction on the choice of seed (dlaran), diving deeper in the theory or testing it with different input might provide further insights

This routine uses a multiplicative congruential method with modulus
248 and multiplier 33952834046453 (see G.S.Fishman,
'Multiplicative congruential random number generators with modulus
2b: an exhaustive analysis for b = 32 and a partial analysis for
b = 48', Math. Comp. 189, pp 331-344, 1990).

48-bit integers are stored in 4 integer array elements with 12 bits
per element. Hence the routine is portable across machines with
integers of 32 bits or more.

This PR instead of spreading a seed across the 4 elements of the array in an arbitrary manner, it increases linearly in iseed(4). For a fully random seed at each run, I gave a try in #424 , but it made several tests to fail consistently, so I didn't pursue then because of the major change it would be and the lack of visibility on the reason for test failure.

Comparison for a few MPI ranks:

myid	Before	This PR
1	[1,0,0,3]	[1,3,5,3]
3	[1,0,0,7]	[1,3,5,7]
5	[1,0,1,1]	[1,3,5,11]
404	[1,8,0,9]	[1,3,5,809]

         igen = 1000 + 2*myid + 1
         iseed(1) = igen/1000
         igen     = mod(igen,1000)
         iseed(2) = igen/100
         igen     = mod(igen,100)
         iseed(3) = igen/10
         iseed(4) = mod(igen,10)

szhorvat · 2023-09-04T18:54:57Z

I experimented a bit with this specific RNG and it became clear that using sequential seeds is not acceptable.

A good characteristic of a RNG is to not have this problem, which I would expect form Lapack.

There is a misunderstanding here about what the goal of a PRNG is and what makes it good quality. For a given seed, a PRNG will produce a sequence of numbers, which should be uniformly distributed and should appear statistically independent. This does not mean that if we plug in one seed to get the sequence $a_1, a_2, \dots$ and a second seed to get $b_1, b_2, \dots$ then $a_i$ and $b_i$ will be independent. In fact, it is a characteristic of some RNG algorithms that they won't be independent for certain seed choices.

Let's reproduce LAPACK's RNG algorithm. The laruv docs state,

  This routine uses a multiplicative congruential method with modulus
  2**48 and multiplier 33952834046453 (see G.S.Fishman,
  'Multiplicative congruential random number generators with modulus
  2**b: an exhaustive analysis for b = 32 and a partial analysis for
  b = 48', Math. Comp. 189, pp 331-344, 1990).

So we have
$$r_{i+1} = A r_i \mod M$$
where $M=2^{48}$ and $A=33952834046453$.

Note that with laruv, the "seed" is in fact the entire RNG state. Let's take consecutive seeds 1 and 3, and see what we get if we plot $(a_i, b_i)$. Note that the state needs to be an odd number (the Fishman reference explain why). I'll include the Mathematica code I used for this, for reference.

iter[k_] := Mod[33952834046453 k, 2^48]

a = NestList[iter, 1, 1000];
b = NestList[iter, 3, 1000];

ListPlot[Transpose[{a, b}], AspectRatio -> Automatic]

It's clear that there are strong correlations between the two sequences:

Let's try 1 and 11:

How about 1 and 1001?

Now it's looking better.

Conclusion: The parallel threads should not be initialized with consecutive seeds. Instead, the effect of myid should be spread out through the four 12-bit parts of the state/seed.

To be fair, this specific visualization won't reveal the non-independence if we don't use small seeds. And I'll emphasize once more than I am not an RNG expert. But I believe the point I made in the first paragraph does stand.

Here's another experiment: Let's take seeds between 10000 and 11000, iterate each 100 times to get a number. Then plot these numbers in order:

ListPlot[Nest[iter, #, 100] & /@ Range[10000, 11000]]

I am not entirely sure what you mean here. AFAIU, the link discusses the choice of an initial seed and seed sampling per thread across several run of an application.

The SE post discusses the same situation we have here. If we have 4 threads, is it okay to init them with consecutive seeds, for example 23, 24, 25, 26. It states that with a Mersenne twister it is not, as the four sequences would not be independent.

szhorvat · 2023-09-04T19:35:02Z

I'll put this here as well: https://stackoverflow.com/q/51184553/695132

fghoussen · 2023-09-05T08:17:23Z

Interesting! So as stated here: https://stackoverflow.com/questions/51184553/do-consequtive-rng-seed-yield-independent-random-numbers, one way to go could be: "Take those sequential numbers and run them through a cryptographic hash, then use that as the seed. Completely repeatable yet uncorrelated."? Only for iseed(4)? Also for iseed([123])?

It's clear that there are strong correlations

Never used Mathematica: not sure what the code does! You plot a on the x-axis and b on the y-axis, right?

szhorvat · 2023-09-05T09:08:04Z

"Take those sequential numbers and run them through a cryptographic hash, then use that as the seed. Completely repeatable yet uncorrelated."

That would be one possible option.

Only for iseed(4)?

seed represents a 48-bit number split into four 12-bit parts. So, ideally, populate all of it with a hash function that produces at least 48 bits.

Never used Mathematica

You don't need to read the code but I like to insert it in these situations for my own reference.

You plot a on the x-axis and b on the y-axis, right?

Yes, up to the last plot.

The last plot shows the 100th random number in streams initialized with consecutive seeds, the point being again that consecutive seeds won't produce independent streams.

FabienPean · 2023-09-14T22:11:19Z

Thanks that's clearer and they are compelling arguments. The preexisting code is maybe not optimal, but this PR would theoretically make it worse.

"Take those sequential numbers and run them through a cryptographic hash, then use that as the seed. Completely repeatable yet uncorrelated."

That would be one possible option.

Another option could be to run the dlarnv routine myid times from a common starting seed across all MPI processes, convert the generated doubles to suitable integers and use that as a seed for subsequent calls in parpack.

szhorvat reviewed Aug 20, 2023

View reviewed changes

PARPACK/SRC/MPI/pdgetv0.f Outdated Show resolved Hide resolved

Simplify RNG seed in PARPACK

401e27b

FabienPean force-pushed the fix/simplify-mpi-seed branch from 052588a to 401e27b Compare August 20, 2023 17:09

fghoussen reviewed Sep 2, 2023

View reviewed changes

FabienPean closed this Sep 14, 2023

FabienPean mentioned this pull request Sep 20, 2023

Update RNG seed initialization in PARPACK #437

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify RNG seed initialization in PARPACK #431

Simplify RNG seed initialization in PARPACK #431

FabienPean commented Aug 19, 2023

fghoussen commented Aug 20, 2023

szhorvat commented Aug 20, 2023

fghoussen commented Aug 20, 2023

FabienPean commented Aug 20, 2023 •

edited

Loading

fghoussen commented Aug 20, 2023

fghoussen commented Aug 20, 2023

szhorvat commented Aug 30, 2023

fghoussen Sep 2, 2023

fghoussen commented Sep 2, 2023

szhorvat commented Sep 2, 2023 •

edited

Loading

FabienPean commented Sep 2, 2023 •

edited

Loading

szhorvat commented Sep 4, 2023 •

edited

Loading

szhorvat commented Sep 4, 2023

fghoussen commented Sep 5, 2023

szhorvat commented Sep 5, 2023

FabienPean commented Sep 14, 2023

Simplify RNG seed initialization in PARPACK #431

Simplify RNG seed initialization in PARPACK #431

Conversation

FabienPean commented Aug 19, 2023

fghoussen commented Aug 20, 2023

szhorvat commented Aug 20, 2023

fghoussen commented Aug 20, 2023

FabienPean commented Aug 20, 2023 • edited Loading

fghoussen commented Aug 20, 2023

fghoussen commented Aug 20, 2023

szhorvat commented Aug 30, 2023

fghoussen Sep 2, 2023

Choose a reason for hiding this comment

fghoussen commented Sep 2, 2023

szhorvat commented Sep 2, 2023 • edited Loading

FabienPean commented Sep 2, 2023 • edited Loading

szhorvat commented Sep 4, 2023 • edited Loading

szhorvat commented Sep 4, 2023

fghoussen commented Sep 5, 2023

szhorvat commented Sep 5, 2023

FabienPean commented Sep 14, 2023

FabienPean commented Aug 20, 2023 •

edited

Loading

szhorvat commented Sep 2, 2023 •

edited

Loading

FabienPean commented Sep 2, 2023 •

edited

Loading

szhorvat commented Sep 4, 2023 •

edited

Loading