Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gproc_pool): fix rand:uniform/1 range #193

Merged

Conversation

thalesmg
Copy link
Contributor

@thalesmg thalesmg commented May 3, 2023

Since rand:uniform/1's return value is a number in a closed interval (0 =< N =< Sz), we shouldn't call it with the pool size + 1, otherwise we will get a non-uniform distribution of results where one of the workers receive substantially more work than all others.

To verify this, I've started a pool with 8 workers using the random load-balancing algorithm and measured the frequencies of each PID in gproc_pool.

PoolName = my_pool.
PoolSize = 8.
ok = gproc_pool:new(PoolName, random, [{size, PoolSize}]).
lists:foreach(fun(I) ->
  gproc_pool:add_worker(PoolName, {PoolName, I}, I),
  spawn(fun() ->
    true = gproc_pool:connect_worker(PoolName, {PoolName, I}),
    receive
        stop -> ok
    end
  end),
  ok
end, lists:seq(1, PoolSize)).
MeasureFreqs = fun(Pool, NumPoints) ->
  Pids = lists:map(fun(_) -> gproc_pool:pick_worker(Pool) end, lists:seq(1, NumPoints)),
  lists:foldl(
    fun(Pid, Acc) ->
      maps:update_with(Pid, fun(N) -> N + 1 end, 1, Acc)
    end,
    #{},
    Pids
   )
end.
MeasureFreqs(PoolName, 100_000).

Results prior to the fix (note how <0.172.0> has almost double the frequency of other PIDs):

 #{<0.172.0> => 22222,<0.173.0> => 11138,<0.174.0> => 11000,
  <0.175.0> => 11046,<0.176.0> => 11141,<0.177.0> => 11165,
  <0.178.0> => 11166,<0.179.0> => 11122}

After the fix:

 #{<0.172.0> => 12460,<0.173.0> => 12367,<0.174.0> => 12541,
  <0.175.0> => 12494,<0.176.0> => 12649,<0.177.0> => 12399,
  <0.178.0> => 12439,<0.179.0> => 12651}

Since
[`rand:uniform/1`](https://www.erlang.org/doc/man/rand.html#uniform-1)'s
return value is a number in a closed interval (`0 =< N =< Sz`), we
shouldn't call it with the pool size + 1, otherwise we will get a
non-uniform distribution of results where one of the workers receive
substantially more work than all others.

To verify this, I've started a pool with 8 workers using the `random`
load-balancing algorithm and measured the frequencies of each PID in
`gproc_pool`.

```erlang
PoolName = my_pool.
PoolSize = 8.
ok = gproc_pool:new(PoolName, random, [{size, PoolSize}]).
lists:foreach(fun(I) ->
  gproc_pool:add_worker(PoolName, {PoolName, I}, I),
  spawn(fun() ->
    true = gproc_pool:connect_worker(PoolName, {PoolName, I}),
    receive
        stop -> ok
    end
  end),
  ok
end, lists:seq(1, PoolSize)).
MeasureFreqs = fun(Pool, NumPoints) ->
  Pids = lists:map(fun(_) -> gproc_pool:pick_worker(Pool) end, lists:seq(1, NumPoints)),
  lists:foldl(
    fun(Pid, Acc) ->
      maps:update_with(Pid, fun(N) -> N + 1 end, 1, Acc)
    end,
    #{},
    Pids
   )
end.
MeasureFreqs(PoolName, 100_000).
```

Results prior to the fix (note how `<0.172.0>` has almost double the
frequency of other PIDs):

```erlang
 #{<0.172.0> => 22222,<0.173.0> => 11138,<0.174.0> => 11000,
  <0.175.0> => 11046,<0.176.0> => 11141,<0.177.0> => 11165,
  <0.178.0> => 11166,<0.179.0> => 11122}
```

After the fix:

```erlang
 #{<0.172.0> => 12460,<0.173.0> => 12367,<0.174.0> => 12541,
  <0.175.0> => 12494,<0.176.0> => 12649,<0.177.0> => 12399,
  <0.178.0> => 12439,<0.179.0> => 12651}
```
@uwiger uwiger merged commit 4ca45e0 into uwiger:master May 4, 2023
@uwiger
Copy link
Owner

uwiger commented May 4, 2023

Thanks!

@thalesmg thalesmg deleted the fix-random-pool-uniform-distribution branch May 5, 2023 14:17
thalesmg added a commit to emqx/ehttpc that referenced this pull request May 8, 2023
Includes fix: uwiger/gproc#193

Prior to the fix, when using the `random` pool strategy, one of the
workers receives about double the load of other workers, which
decreases throughput of bridges like webhook.
@thalesmg
Copy link
Contributor Author

thalesmg commented May 8, 2023

Hi @uwiger , do you have plans to tag a new version that would contain this fix?

We are interested in using the upstream version if possible.

Thanks! 😸

thalesmg added a commit to thalesmg/emqx that referenced this pull request May 9, 2023
thalesmg added a commit to thalesmg/emqx that referenced this pull request May 9, 2023
Includes fix: uwiger/gproc#193

Prior to the fix, when using the `random` pool strategy, one of the
workers receives about double the load of other workers, which
decreases throughput of bridges like webhook.
thalesmg added a commit to emqx/ehttpc that referenced this pull request May 9, 2023
Includes fix: uwiger/gproc#193

Prior to the fix, when using the `random` pool strategy, one of the
workers receives about double the load of other workers, which
decreases throughput of bridges like webhook.
@uwiger
Copy link
Owner

uwiger commented May 12, 2023

Published version 0.9.1

@thalesmg
Copy link
Contributor Author

Thank you! 🍻

thalesmg added a commit to thalesmg/emqx that referenced this pull request May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants