diff --git a/src/task.c b/src/task.c index a1e8dabb89f9d..7c8770530c326 100644 --- a/src/task.c +++ b/src/task.c @@ -856,202 +856,143 @@ guaranteed to avoid collisions between the RNG streams of all tasks. The main RNG is the xoshiro256++ RNG whose state is stored in rngState[0..3]. There is also a small internal RNG used for task forking stored in rngState[4]. This state is used to iterate a linear congruential generator (LCG), which is then -put through four different variations of the strongest PCG output function, -referred to as PCG-RXS-M-XS-64 [1]. This output function is invertible: it maps -a 64-bit state to 64-bit output. This is one of the reasons it's not recommended -for general purpose RNGs unless space is at an absolute premium, but in our -usage invertibility is actually a benefit (as is explained below) and adding as -little additional memory overhead to each task object as possible is preferred. +combined with xoshiro256's state and put through four different variations of +the strongest PCG output function, referred to as PCG-RXS-M-XS-64 [1]. The goal of jl_rng_split is to perturb the state of each child task's RNG in -such a way each that for an entire tree of tasks spawned starting with a given -state in a root task, no two tasks have the same RNG state. Moreover, we want to -do this in a way that is deterministic and repeatable based on (1) the root -task's seed, (2) how many random numbers are generated, and (3) the task tree -structure. The RNG state of a parent task is allowed to affect the initial RNG -state of a child task, but the mere fact that a child was spawned should not -alter the RNG output of the parent. This second requirement rules out using the -main RNG to seed children: if we use the main RNG, we either advance it, which -affects the parent's RNG stream or, if we don't advance it, then every child -would have an identical RNG stream. Therefore some separate state must be -maintained and changed upon forking a child task while leaving the main RNG -state unchanged. - -The basic approach is that used by the DotMix [2] and SplitMix [3] RNG systems: -each task is uniquely identified by a sequence of "pedigree" numbers, indicating -where in the task tree it was spawned. This vector of pedigree coordinates is -then reduced to a single value by computing a dot product with a shared vector -of random weights. The weights are common but each pedigree of each task is -distinct, so the dot product of each task is unlikely to be the same. The DotMix -paper provides a proof that this dot product hash value (referred to as a -"compression function") is collision resistant in the sense that the pairwise -collision probability of two distinct tasks is 1/N where N is the number of -possible weight values. Both DotMix and SplitMix use a prime value of N because -the proof requires that the difference between two distinct pedigree coordinates -have a multiplicative inverse, which is guaranteed by N being prime since all -values are invertible then. We take a somewhat different approach: instead of -assigning n-ary pedigree coordinates, we assign binary tree coordinates to -tasks, which means that our pedigree vectors have only 0/1 and differences -between them can only be -1, 0 or 1. Since the only possible non-zero coordinate -differences are ±1 which are invertible regardless of the modulus, we can use a -modulus of 2^64, which is far easier and more efficient then using a prime -modulus. 
It also means that when accumulating the dot product incrementally, as -described in SplitMix, we don't need to multiply weights by anything, we simply -add the random weight for the current task tree depth to the parent's dot -product to derive the child's dot product. - -we instead limit pedigree coordinates to being binary, guaranteeing -invertibility regardless of modulus. When a task spawns a child, the parent and -child share the parent's previous pedigree prefix and the parent appends a zero -to its coordinates, which doesn't affect the task's dot product value, while the -child appends a one, which does produce a new dot product. In this manner a -binary pedigree vector uniquely identifies each task and since the coordinates -are binary, the difference between coordinates is always invertible: 1 and -1 -are their own multiplicative inverses regardless of the modulus. - -How does our assignment of pedigree coordinates to tasks differ from DotMix and -SplitMix? In DotMix and SplitMix, each task has a fixed pedigree vector that -never changes. The root tasks's pedigree is `()`, its first child's pedigree is -`(0,)`, its second child's pedigree is `(2,)` and so on. The length of a task's -pedigree tuple corresponds to how many ancestors tasks it has. Our approach -instead appends 0 to the parent's pedigree when it forks a child and appends 1 -to the child's pedigree at the same time. The root task starts with a pedigree -of `()` as before, but when it spawns a child, we update its pedigree to `(0,)` -and give its child a pedigree of `(1,)`. When the root task then spawns a second -child, we update its pedigree to `(0,0)` and give it's second child a pedigree -of `(0,1)`. If the first child spawns a grandchild, the child's pedigree is -changed from `(1,)` to `(1,0)` and the grandchild is assigned a pedigree of -`(1,1)`. In other words, DotMix and SplitMix build an n-ary tree where every -node is a task: parent nodes are higher up the tree and child tasks are children -in the pedigree tree. Our approach is to build a binary tree where only leaves -are tasks and each task spawn replaces a leaf in the tree with two leaves: the -parent moves to the left/zero leaf while the child is the right/one leaf. Since -the tree is binary, the pedigree coordinates are binary. - -It may seem odd for a task's pedigree coordinates to change, but note that we -only ever append zeros to a task's pedigree, which does not change its dot -product. So while the pedigree changes, the dot product is fixed. What purpose -does appending zeros like this serve if the task's dot product doesn't change? -Changing the pedigree length (which is also the binary tree depth) ensures that -the next child spawned by that task will have new and different dot product from -the previous child since it will have a different pseudo-random weight added to -the parent's dot product value. Whereas the pedigree length in DotMix and -SplitMix is unchanging and corresponds to how many ancestors a task has, in our -scheme the pedigree length corresponds to the number of ancestors *plus* -children a task has, which increases every time it spawns another child. - -We use the LCG in rngState[4] to generate pseudorandom weights for the dot -product. Each time a child is forked, we update the LCG in both parent and child -tasks. In the parent, that's all we have to do -- the main RNG state remains -unchanged. (Recall that spawning a child should *not* affect subsequent RNG -draws in the parent). 
The next time the parent forks a child, the dot product -weight used will be different, corresponding to being a level deeper in the -pedigree tree. In the child, we use the LCG state to generate four pseudorandom -64-bit weights (more below) and add each weight to one of the xoshiro256 state -registers, rngState[0..3]. If we assume the main RNG remains unused in all -tasks, then each register rngState[0..3] accumulates a different dot product -hash as additional child tasks are spawned. Each one is collision resistant with -a pairwise collision chance of only 1/2^64. Assuming that the four pseudorandom -64-bit weight streams are sufficiently independent, the pairwise collision -probability for distinct tasks is 1/2^256. If we somehow managed to spawn a -trillion tasks, the probability of a collision would be on the order of 1/10^54. -In other words, practically impossible. Put another way, this is the same as the -probability of two SHA256 hash values accidentally colliding, which we generally -consider so unlikely as not to be worth worrying about. - -What about the random "junk" that's in the xoshiro256 state registers from -normal use of the RNG? For a tree of tasks spawned with no intervening samples -taken from the main RNG, all tasks start with the same junk which doesn't affect -the chance of collision. The Dot/SplitMix papers even suggest adding a random -base value to the dot product, so we can consider whatever happens to be in the -xoshiro256 registers to be that. What if the main RNG gets used between task -forks? In that case, the initial state registers will be different. The DotMix -collision resistance proof doesn't apply without modification, but we can -generalize the setup by adding a different base constant to each compression -function and observe that we still have a 1/N chance of the weight value -matching that exact difference. This proves collision resistance even between -tasks whose dot product hashes are computed with arbitrary offsets. We can -conclude that this scheme provides collision resistance even in the face of -different starting states of the main RNG. Does this seem too good to be true? -Perhaps another way of thinking about it will help. Suppose we seeded each task -completely randomly. Then there would also be a 1/2^256 chance of collision, -just as the DotMix proof gives. Essentially what the proof is telling us is that -if the weights are chosen uniformly and uncorrelated with the rest of the -compression function, then the dot product construction is a good enough way to -pseudorandomly seed each task based on its parent's RNG state and where in the -task tree it lives. From that perspective, all we need to believe is that the -dot product construction is random enough (assuming the weights are), and it -becomes easier to believe that adding an arbitrary constant to each dot product -value doesn't make its randomness any worse. - -This leaves us with the question of how to generate four pseudorandom weights to -add to the rngState[0..3] registers at each depth of the task tree. The scheme -used here is that a single 64-bit LCG state is iterated in both parent and child -at each task fork, and four different variations of the PCG-RXS-M-XS-64 output -function are applied to that state to generate four different pseudorandom -weights. Another obvious way to generate four weights would be to iterate the -LCG four times per task split. There are two main reasons we've chosen to use -four output variants instead: - -1. 
Advancing four times per fork reduces the set of possible weights that each - register can be perturbed by from 2^64 to 2^60. Since collision resistance is - proportional to the number of possible weight values, that would reduce - collision resistance. While it would still be strong engough, why reduce it? - -2. It's easier to compute four PCG output variants in parallel. Iterating the - LCG is inherently sequential. PCG variants can be computed independently. All - four can even be computed at once with SIMD vector instructions. The C - compiler doesn't currently choose to do that transformation, but it could. - -A key question is whether the approach of using four variations of PCG-RXS-M-XS -is sufficiently random both within and between streams to provide the collision -resistance we expect. We obviously can't test that with 256 bits, but we have -tested it with a reduced state analogue using four PCG-RXS-M-XS-8 output -variations applied to a common 8-bit LCG. Test results do indicate sufficient -independence: a single register has collisions at 2^5 while four registers only -start having collisions at 2^20. This is actually better scaling of collision -resistance than we theoretically expect. In theory, with one byte of resistance -we have a 50% chance of some collision at 20 tasks, which matches what we see, -but four bytes should give a 50% chance of collision at 2^17 tasks and our -reduced size analogue construction remains collision free at 2^19 tasks. This -may be due to the next observation, which is that the way we generate -pseudorandom weights actually guarantees collision avoidance in many common -situations rather than merely providing collision resistance and thus is better -than true randomness. - -In the specific case where a parent task spawns a sequence of child tasks with -no intervening usage of its main RNG, the parent and child tasks are actually -_guaranteed_ to have different RNG states. This is true because the four PCG -streams each produce every possible 2^64 bit output exactly once in the full -2^64 period of the LCG generator. This is considered a weakness of PCG-RXS-M-XS -when used as a general purpose RNG, but is quite beneficial in this application. -Since each of up to 2^64 children will be perturbed by different weights, they -cannot have hash collisions. What about parent colliding with child? That can -only happen if all four main RNG registers are perturbed by exactly zero. This -seems unlikely, but could it occur? Consider the core of the output function: - - p ^= p >> ((p >> 59) + 5); - p *= m[i]; - p ^= p >> 43 - -It's easy to check that this maps zero to zero. An unchanged parent RNG can only -happen if all four `p` values are zero at the end of this, which implies that -they were all zero at the beginning. However, that is impossible since the four -`p` values differ from `x` by different additive constants, so they cannot all -be zero. Stated more generally, this non-collision property: assuming the main -RNG isn't used between task forks, sibling and parent tasks cannot have RNG -collisions. If the task tree structure is more deeply nested or if there are -intervening uses of the main RNG, we're back to relying on "merely" 256 bits of -collision resistance, but it's nice to know that in what is likely the most -common case, RNG collisions are actually impossible. This fact may also explain -better-than-theoretical collision resistance observed in our experiment with a -reduced size analogue of our hashing system. 
+such a way that for an entire tree of tasks spawned starting with a given root
+task state, no two tasks have the same RNG state. Moreover, we want to do this
+in a way that is deterministic and repeatable based on (1) the root task's
+seed, (2) how many random numbers are generated, and (3) the task tree
+structure. The RNG state of a parent task is allowed to affect the initial RNG
+state of a child task, but the mere fact that a child was spawned should not
+alter the RNG output of the parent. This second requirement rules out using the
+main RNG to seed children: if we use the main RNG, we either advance it, which
+affects the parent's RNG stream, or, if we don't advance it, every child would
+have an identical RNG stream. Therefore some separate state must be maintained
+and changed upon forking a child task while leaving the main RNG state
+unchanged.
+
+The basic approach is a generalization and simplification of that used in the
+DotMix [2] and SplitMix [3] RNG systems: each task is uniquely identified by a
+sequence of "pedigree" numbers, indicating where in the task tree it was
+spawned. This vector of pedigree coordinates is then reduced to a single value
+by computing a "dot product" with a shared vector of random weights. We write
+"dot product" in quotes because what we use is not an actual dot product. The
+linear dot product construction used in both DotMix and SplitMix was found by
+@foobar_iv2 [4] to allow easy construction of linear relationships between the
+main RNG states of tasks, which was in turn reflected in observable linear
+relationships between the outputs of their RNGs. This relationship involves a
+minimum of four tasks, so it doesn't constitute a collision per se, but it is
+clearly undesirable and highlights a hazard of the plain dot product
+construction.
+
+As in DotMix and SplitMix, each task is assigned unique "pedigree" coordinates.
+Our pedigree construction is a bit different and uses only binary coordinates
+rather than arbitrary integers. Each pedigree is an infinite sequence of ones
+and zeros with only finitely many ones. Each task has a "fork index": the root
+task has index 0; the fork index of the jth child task of a parent task with
+fork index i is i+j. The root task's coordinates are all zeros; each child
+task's coordinates are the same as its parent's except at its fork index, where
+the parent has a zero while the child has a one. A task's coordinates after its
+fork index are all zeros. The coordinates of a task's ancestors are all
+prefixes of its own coordinates, padded with zeros.
+
+Also as in DotMix and SplitMix, we generate a sequence of pseudorandom weights
+to combine with the coordinates of each task. This sequence is common across
+all tasks, and different mix values for each task derive from their coordinates
+being different. In DotMix and SplitMix, this is a literal dot product: the
+pseudorandom weights are multiplied by the corresponding task coordinates and
+added up. While this does provably make collisions as unlikely as randomly
+assigned task seeds, this linear construction can be used to create linearly
+correlated states between tasks. However, it turns out that the compression
+construction need not be linear, commutative, associative, etc., which allows
+us to avoid any linear or other obvious correlations between related sets of
+tasks.
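+
+To make the pedigree bookkeeping concrete, here is a purely illustrative,
+hypothetical C sketch. Nothing like it appears in the implementation, which
+never materializes coordinates: the LCG in rngState[4] is advanced once per
+fork, so its state implicitly tracks the fork index.
+
+    #include <stdint.h>
+
+    typedef struct {
+        uint64_t coords;     // binary pedigree coordinates, bit k = coordinate k
+        unsigned fork_index; // highest coordinate index used so far
+    } pedigree_t;            // the root task would start as {0, 0}
+
+    // Forking a child: both parent and child move on to the next fork index;
+    // the child's coordinates gain a one at that index, the parent's do not.
+    // (A 64-bit mask caps this sketch at 64 coordinates; the pedigree
+    // described above is conceptually unbounded.)
+    static pedigree_t pedigree_fork(pedigree_t *parent)
+    {
+        parent->fork_index += 1;
+        pedigree_t child = *parent;
+        child.coords |= (uint64_t)1 << child.fork_index;
+        return child;
+    }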
+
+To generalize SplitMix's optimized dot product construction, we similarly
+compute each task's compression function value incrementally by combining the
+parent's compression value with the pseudorandom weight corresponding to the
+child's fork index. Formally, if the parent's compression value is c then we
+can compute the child's compression value as c′ = f(c, wᵢ), where i is the
+child's fork index and w is the vector of pseudorandom weights. What is f? It
+can be any function that is bijective in each argument for all values of the
+other argument:
+
+ * For all c: w ↦ f(c, w) is bijective
+ * For all w: c ↦ f(c, w) is bijective
+
+The proof that these requirements are sufficient to ensure collision resistance
+is in the linked discussion [4]. DotMix/SplitMix are a special case where f is
+just addition. Instead, we use a much less simple mixing function:
+
+ 1. We use (2c+1)(2w+1)÷2 % 2^64 to mix the bits of c and w
+ 2. We then apply the PCG-RXS-M-XS-64 output function
+
+The first step thoroughly mixes the bits of the previous compression value and
+the pseudorandom weight value using multiplication, which is non-commutative
+with xoshiro's operations (xor, shift, rotate). This mixing function is a
+bijection on each argument, witnessed by these inverses:
+
+ * c′ ↦ (2c′+1)(2w+1)⁻¹÷2 % 2^64
+ * w′ ↦ (2c+1)⁻¹(2w′+1)÷2 % 2^64
+
+The second PCG output step is a bijection and is designed to be significantly
+non-linear -- non-linear enough to mask the linearity of the LCG that drives
+the PCG-RXS-M-XS-64 RNG and allow it to pass statistical RNG test suites
+despite having the same size state and output. In particular, since this mixing
+function is highly non-associative and non-linear, we (hopefully) don't have
+any discernible relationship between these values:
+
+ * c₀₀ = c
+ * c₁₀ = f(c, wᵢ)
+ * c₀₁ = f(c, wⱼ)
+ * c₁₁ = f(f(c, wᵢ), wⱼ)
+
+When f is simply `+`, these have a very obvious relationship:
+
+  c₀₀ + c₁₁ == c₁₀ + c₀₁
+
+This relationship holds regardless of what wᵢ and wⱼ are and is precisely what
+allows easy creation of correlated tasks with the DotMix/SplitMix construction
+that we previously used. Expressing any relationship between these values with
+our mixing function would require inverting the PCG output function (doable but
+non-trivial), knowing the weights wᵢ and wⱼ, and then applying the inversion
+functions for those weights appropriately. Since the weights are
+pseudo-randomly generated and not directly observable, this is infeasible.
+
+We maintain an LCG in rngState[4] to generate pseudorandom weights. An LCG by
+itself is a very bad RNG, but we combine this one with the xoshiro256 state
+registers in a non-trivial way and then apply the PCG-RXS-M-XS-64 output
+function to that. Even if the xoshiro256 states are all zeros, which they
+should never be, the output would be the same as that of PCG-RXS-M-XS-64,
+which is a solid statistical RNG.
+
+Each time a child is forked, we update the LCG in both parent and child tasks,
+corresponding to increasing the fork index. In the parent, that's all we have
+to do -- the main RNG state remains unchanged. Recall that spawning a child
+should *not* affect subsequent RNG draws in the parent. The next time the
+parent forks a child, the mixing weight used will be different. In the child,
+we use the LCG state to perturb the child's main RNG state registers,
+rngState[0..3].
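+
+To make the contrast with the linear step concrete before moving on to the
+per-register details, here is a hypothetical, standalone sketch of a simplified
+single-stream version of the construction. The only constant reused from the
+real code is the standard PCG multiplier; all other values are arbitrary.
+
+    #include <assert.h>
+    #include <stdint.h>
+
+    // DotMix/SplitMix-style linear compression step: just add the weight.
+    static uint64_t mix_linear(uint64_t c, uint64_t w)
+    {
+        return c + w;
+    }
+
+    // The step described above: (2c+1)(2w+1)÷2 % 2^64, then the
+    // PCG-RXS-M-XS-64 output function with the standard multiplier.
+    static uint64_t mix_pcg(uint64_t c, uint64_t w)
+    {
+        c += w*(2*c + 1);
+        c ^= c >> ((c >> 59) + 5);
+        c *= 0xaef17502108ef2d9;
+        c ^= c >> 43;
+        return c;
+    }
+
+    int main(void)
+    {
+        uint64_t c = 0x0123456789abcdef;  // arbitrary parent compression value
+        uint64_t wi = 0x243f6a8885a308d3; // arbitrary weights
+        uint64_t wj = 0x13198a2e03707344;
+        // With the linear step, c₀₀ + c₁₁ == c₁₀ + c₀₁ for *any* weights:
+        assert(c + mix_linear(mix_linear(c, wi), wj) ==
+               mix_linear(c, wi) + mix_linear(c, wj));
+        // With the non-linear step, no such identity is known, and it fails
+        // for these (and generic) values:
+        assert(c + mix_pcg(mix_pcg(c, wi), wj) !=
+               mix_pcg(c, wi) + mix_pcg(c, wj));
+        return 0;
+    }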
+
+Since we want these registers to behave independently, we use four different
+variations on f to mix the LCG state with each of the four main RNG registers.
+Each variation first xors the LCG state with a different random constant before
+combining that value with the old register state via multiplication; the
+PCG-RXS-M-XS-64 output function is then applied to that mixed state, with a
+different multiplier constant for each variation / register index. Xor is used
+in the first step because we multiply the result with the register state
+immediately afterwards: multiplication distributes over `+` and commutes with
+`*`, which makes both of those options suspect, whereas it neither distributes
+over nor commutes with xor. We also use a different odd multiplier in
+PCG-RXS-M-XS-64 for each RNG register. These three sources of variation
+(different xor constants, different xoshiro256 state, different PCG
+multipliers) are sufficient for each of the four outputs to behave
+statistically independently.
 
 [1]: https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf
 
 [2]: http://supertech.csail.mit.edu/papers/dprng.pdf
 
 [3]: https://gee.cs.oswego.edu/dl/papers/oopsla14.pdf
+
+[4]:
+https://discourse.julialang.org/t/linear-relationship-between-xoshiro-tasks/110454
 */
 
 void jl_rng_split(uint64_t dst[JL_RNG_SIZE], uint64_t src[JL_RNG_SIZE]) JL_NOTSAFEPOINT
 {
@@ -1060,26 +1001,30 @@ void jl_rng_split(uint64_t dst[JL_RNG_SIZE], uint64_t src[JL_RNG_SIZE]) JL_NOTSA
     src[4] = dst[4] = x * 0xd1342543de82ef95 + 1;
     // high spectrum multiplier from https://arxiv.org/abs/2001.05304
 
+    // random xor constants
     static const uint64_t a[4] = {
-        0xe5f8fa077b92a8a8, // random additive offsets...
-        0x7a0cd918958c124d,
-        0x86222f7d388588d4,
-        0xd30cbd35f2b64f52
+        0x214c146c88e47cb7,
+        0xa66d8cc21285aafa,
+        0x68c7ef2d7b1a54d4,
+        0xb053a7d7aa238c61
     };
+    // random odd multipliers
     static const uint64_t m[4] = {
         0xaef17502108ef2d9, // standard PCG multiplier
-        0xf34026eeb86766af, // random odd multipliers...
+        0xf34026eeb86766af,
         0x38fd70ad58dd9fbb,
         0x6677f9b93ab0c04d
     };
 
     // PCG-RXS-M-XS-64 output with four variants
     for (int i = 0; i < 4; i++) {
-        uint64_t p = x + a[i];
-        p ^= p >> ((p >> 59) + 5);
-        p *= m[i];
-        p ^= p >> 43;
-        dst[i] = src[i] + p; // SplitMix dot product
+        uint64_t c = src[i];
+        uint64_t w = x ^ a[i];
+        c += w*(2*c + 1); // c = (2c+1)(2w+1)÷2 % 2^64 (double bijection)
+        c ^= c >> ((c >> 59) + 5);
+        c *= m[i];
+        c ^= c >> 43;
+        dst[i] = c;
    }
 }
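
Side note (not part of the patch): the comment on the mixing line above,
`c = (2c+1)(2w+1)÷2 % 2^64`, can be sanity-checked with the following
standalone snippet. It assumes the 128-bit integer extension provided by GCC
and Clang; the multiplier constants are arbitrary odd values used only to
generate test inputs.

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        for (uint64_t i = 0; i < 1000000; i++) {
            uint64_t c = i * 0x9e3779b97f4a7c15u;  // arbitrary test values
            uint64_t w = ~i * 0xbf58476d1ce4e5b9u;
            // Full product (2c+1)(2w+1): only bits 1..64 matter for the
            // check, so the wrap-around modulo 2^128 is harmless.
            unsigned __int128 p = ((unsigned __int128)2*c + 1) *
                                  ((unsigned __int128)2*w + 1);
            uint64_t halved = (uint64_t)(p >> 1); // (2c+1)(2w+1)÷2 % 2^64
            uint64_t direct = c + w*(2*c + 1);    // the expression in the loop
            assert(halved == direct);
        }
        return 0;
    }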