-
Notifications
You must be signed in to change notification settings - Fork 7.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
random: Optimize Randomizer::getBytesFromString()
#14894
Conversation
a1cc215
to
4e92056
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why this code is more optimized than the previous one.
There could be 3-4 different reasons, but all the code changes here don't pinpoint what is the major problem.
I would prefer to keep the for loops as they are clearer IMHO.
There is no indication if the speed-up comes from using a while loop, doing the comparisons against 0, or delaying the increment of failure
Thank you for confirmation. I wanted to keep this as simple as possible, but all of these changes are related to improving performance. If you don't mind a longer explanation, I can break down the commits into smaller chunks, measure them step by step, and make it clear how performance improves. Incidentally, the most important of these changes is that the bit mask is now calculated 8 bytes at a time, rather than 1 byte at a time. However, other changes also have a measurable effect on performance. |
I would prefer having the commits split and indicate each performance benefit it brings, so we can decide on a case by case if the tradeoff is worth it :) |
Okay, I think I'll probably split it into 5. I'll split it and force push it. |
4e92056
to
f23980d
Compare
@Girgias I kept only the changes that really worked and reverted the rest. The measurement results for commits 1 and 2 are listed in the explanation. I also re-measured "before". edit: |
No worries, it happens :) Now I can definitely see why the change improved the performance even without looking at the resulting assembly! I'll wait for @TimWolla to approve the PR. |
I'm afraid I'm unable to reproduce the improvements to the degree that your initial post indicates. I'm seeing a 1% difference between df6d85a and the latest commit in this PR. I am using a Intel(R) Core(TM) i7-1365U and I am compiling with:
My test script is:
(using a fixed seed to ensure that the seeding does not have an impact) and then running the benchmark using:
to get hyperfine to make the comparison for me instead of needing to manually compare the numbers.
My results are:
Could it be that you accidentally compiled a debug build with compiler optimizations disabled or something like that? |
My CPU: 2.6 GHz 6 core Intel Core i7 (Mac book pro)
The configurations are as follows:
My gcc is a little old, so that might be the problem. I'll test it later in another environment. |
@TimWolla The results are the same as before, with a significant difference in gcc, but almost no difference in clang. And clang is considerably slower than when compiled with gcc. Could you please try it with gcc? edit:
By the way, why is there such a big speed difference between gcc and clang...? |
For:
I get the following.
for
which uses a more realistic alphabet, I get:
So I can indeed confirm that gcc is faster than clang (but not the 2× you are seeing) and that this PR is (slightly) faster than the baseline for both. |
For:
it's
So it appears that the “set-up” is much slower with clang than with gcc, but the actual loop becomes much faster. In both cases the optimized version beats the non-optimized one. |
Thank you for confirmation. Hmm. It seems quite different from my environment. I'll leave it up to you to decide whether to merge this or not :) |
getBytesFromString
getBytesFromString
getBytesFromString
Randomizer::getBytesFromString()
@SakiTakamachi I've just pushed a commit to improve the clarity of the mask expansion, because the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that the changes do not make the situation worse in any case and improve it for gcc and clang with large output sizes.
I was just having dinner, thanks Marge! |
Benchmark codes
Since the description is too long,
<?php
anduse
are omitted.$str
is all 256 charactersUsing PcgOneseq128XslRr64
0.php:
1.php:
2.php
Omit constructor arguments
n0.php:
n1.php:
n2.php:
Using PcgOneseq128XslRr64
before
after commit 1
after commit 2
Omit constructor arguments
before
after