CHANGE: Performance Optimizations for IteratorRandom::choose() and related methods #1266
Comments
I like the idea, Mark. This would be a value-breaking change (acceptable at this time). Could you run a couple more benchmarks, including for very short iterators and multiple RNGs? If they look reasonable (not too much penalty with very short iterators), then I'd like to see a PR.
I made a PR. I actually found an even faster way of doing it that uses the minimum possible number of gens (an average of 2 per iterator item). That method won't work for …
Good idea. Will submit a PR at some point.
I was going to close this now that #1268 is merged, but we should answer this first:
But is this useful enough to include in …
It’s quite common in machine learning and mathematical optimization, at least. Before …
Probably not a huge deal, but the …
Summary
`IteratorRandom::choose()` and related methods call `gen_range()` much more frequently than strictly necessary when choosing from iterators without size hints. By reducing the number of these calls, it is possible to speed up the relevant benchmarks by about 1.5-2x.
Specifically
This would not require a breaking change to the API or violate any explicit promises, but if this change is made then different items will be randomly returned from the `choose()` method, and the `Rng` will be in a different state afterwards, which may affect some users.

Details
The `IteratorRandom::choose()` and related methods would have to change. The way they work now (for iterators without size hints) is that, for every item, they generate a random number in the range `0..n`, where `n` is the number of items seen so far. If the generated number is zero (a 1/n chance), the current result changes to the new item. This means that for the first twenty items, twenty random numbers are generated.

My suggestion is to instead generate a single number in the range 0..20! (20 factorial is the largest factorial less than `u64::MAX`) and use this one random number in place of the first twenty. A number in that range has a 1 in 2 chance of being 0 mod 2; then, after division by 2, it has a 1 in 3 chance of being 0 mod 3, and so on up to 20.
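For reference, the existing one-draw-per-item scheme can be sketched as follows. This is a self-contained illustration, not rand's actual code: a SplitMix64 mixer stands in for a real `Rng`, and a (slightly biased) modulo draw stands in for `gen_range()`.

```rust
// Illustration of the current strategy: one fresh random draw per item seen.
// SplitMix64 stands in for a real `Rng`; modulo sampling stands in for
// `gen_range()` (it is slightly biased, which is fine for a sketch).
struct SplitMix(u64);
impl SplitMix {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Reservoir choice over an iterator without a size hint: item number n
/// replaces the current result with probability 1/n, which costs one
/// random draw per item -- the cost the proposal above aims to reduce.
fn choose_per_item<I: Iterator>(mut iter: I, rng: &mut SplitMix) -> Option<I::Item> {
    let mut result = iter.next()?;
    let mut n: u64 = 1;
    for item in iter {
        n += 1;
        if rng.next_u64() % n == 0 {
            // 1-in-n chance: the new item becomes the current choice.
            result = item;
        }
    }
    Some(result)
}
```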
After the first twenty items you then have to use a range of 33!/20!, which only yields thirteen more decisions, and the efficiency gradually decreases as n grows. This seems like a lot of work, but it's still much cheaper than generating fresh random numbers, especially with cryptographically secure RNGs.
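The batched scheme described above can be put together in a short, hypothetical sketch. This is not the actual rand implementation: a SplitMix64 mixer stands in for a real `Rng`, and the slightly biased modulo draw stands in for unbiased range sampling.

```rust
// Hypothetical sketch of the batched idea; not the actual rand code.
// SplitMix64 stands in for a real `Rng` so the example is self-contained.
struct SplitMix(u64);
impl SplitMix {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Reservoir choice that consumes one random draw per *batch* of items.
/// A single uniform draw in 0..(seen+1)*(seen+2)*...*hi supplies an
/// independent 1-in-k decision for every k in seen+1..=hi: the draw has
/// a 1/k chance of being 0 mod k, and dividing by k leaves the
/// remainder uniform over the smaller range.
fn choose_batched<I: Iterator>(mut iter: I, rng: &mut SplitMix) -> Option<I::Item> {
    let mut result = iter.next()?;
    let mut seen: u64 = 1;
    loop {
        // Largest product (seen+1)*(seen+2)*...*hi that still fits in u64.
        // For seen = 1 this is 2*3*...*20 = 20!; the next batch covers
        // 21..=33 (i.e. 33!/20!), matching the numbers in the text above.
        let mut product: u64 = 1;
        let mut hi = seen;
        while let Some(p) = product.checked_mul(hi + 1) {
            product = p;
            hi += 1;
        }
        // Modulo sampling is slightly biased; fine for a sketch, where
        // rand would use its unbiased range sampling instead.
        let mut r = rng.next_u64() % product;
        for k in seen + 1..=hi {
            match iter.next() {
                Some(item) => {
                    if r % k == 0 {
                        // 1-in-k chance, as in plain reservoir sampling.
                        result = item;
                    }
                    r /= k;
                }
                None => return Some(result),
            }
        }
        seen = hi;
    }
}
```

Each batch consumes the iterator lazily, so an iterator that ends mid-batch simply wastes the unused portion of the draw; correctness is unaffected because each 1-in-k decision is independent of the ones after it.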
There is no measurable performance change when using `choose()` with a fixed slice iterator (the code for that part is unchanged). When using an iterator with a size hint but not a fixed size, performance seems to have degraded very slightly. It would be possible to apply this optimization to that case as well, though.

I believe the results of the functions would be consistent between 32-bit and 64-bit architectures (I am generating `u64`, not `usize`). I don't have 32-bit hardware to test on, so I can't comment on the performance differences.

The following methods would also probably benefit from similar optimizations:
Motivation
Performance.
Alternatives
The code could be left as it is, and performance-hungry users could use my crate, which has a fast version of the choose function (I made the crate just to have a `random_max` function but got carried away optimizing it).

If this is considered worth doing, I'm happy to submit a PR. I have written most of the code already for my own crate.
Have a lovely day, Mark