random: Optimize Randomizer::getBytesFromString() #14894

Merged (3 commits) on Jul 20, 2024

Member

@SakiTakamachi SakiTakamachi commented Jul 10, 2024

Benchmark code

To keep the description short, <?php and the use statements are omitted.
$str is 256 characters long in every script.

Using PcgOneseq128XslRr64

0.php:

$r = new Randomizer(new PcgOneseq128XslRr64());
// 256 characters
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 10000000; $i++) {
    $r->getBytesFromString($str, 1);
}

1.php:

$r = new Randomizer(new PcgOneseq128XslRr64());
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 10000000; $i++) {
    $r->getBytesFromString($str, 16);
}

2.php:

$r = new Randomizer(new PcgOneseq128XslRr64());
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 300000; $i++) {
    $r->getBytesFromString($str, 1024);
}

Omit constructor arguments

n0.php:

$r = new Randomizer();
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 1000000; $i++) {
    $r->getBytesFromString($str, 1);
}

n1.php:

$r = new Randomizer();
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 500000; $i++) {
    $r->getBytesFromString($str, 16);
}

n2.php:

$r = new Randomizer();
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 5000; $i++) {
    $r->getBytesFromString($str, 1024);
}

Using PcgOneseq128XslRr64

before

# hyperfine "php /mount/random/fromstr/0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/0.php
  Time (mean ± σ):     334.1 ms ±   5.6 ms    [User: 329.8 ms, System: 3.4 ms]
  Range (min … max):   327.7 ms … 343.4 ms    10 runs

# hyperfine "php /mount/random/fromstr/1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/1.php
  Time (mean ± σ):     609.3 ms ±  12.1 ms    [User: 603.4 ms, System: 5.0 ms]
  Range (min … max):   596.2 ms … 632.6 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/2.php
  Time (mean ± σ):     614.3 ms ±   5.9 ms    [User: 609.0 ms, System: 4.4 ms]
  Range (min … max):   605.6 ms … 621.5 ms    10 runs

after commit 1

# hyperfine "php /mount/random/fromstr/0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/0.php
  Time (mean ± σ):     344.1 ms ±  11.6 ms    [User: 339.6 ms, System: 3.7 ms]
  Range (min … max):   335.1 ms … 374.4 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/1.php
  Time (mean ± σ):     583.9 ms ±   5.4 ms    [User: 578.8 ms, System: 4.2 ms]
  Range (min … max):   576.8 ms … 597.1 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/2.php
  Time (mean ± σ):     543.7 ms ±   3.4 ms    [User: 540.0 ms, System: 2.6 ms]
  Range (min … max):   538.6 ms … 549.4 ms    10 runs

after commit 2

# hyperfine "php /mount/random/fromstr/0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/0.php
  Time (mean ± σ):     332.8 ms ±   4.0 ms    [User: 328.4 ms, System: 3.6 ms]
  Range (min … max):   327.3 ms … 338.8 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/1.php
  Time (mean ± σ):     487.9 ms ±  11.4 ms    [User: 484.5 ms, System: 2.7 ms]
  Range (min … max):   479.6 ms … 514.9 ms    10 runs

# hyperfine "php /mount/random/fromstr/2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/2.php
  Time (mean ± σ):     349.7 ms ±   8.0 ms    [User: 345.8 ms, System: 3.2 ms]
  Range (min … max):   343.3 ms … 369.7 ms    10 runs

Omit constructor arguments

before

# hyperfine "php /mount/random/fromstr/n0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n0.php
  Time (mean ± σ):     506.9 ms ±  22.2 ms    [User: 184.3 ms, System: 321.6 ms]
  Range (min … max):   476.7 ms … 536.2 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/n1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n1.php
  Time (mean ± σ):     479.5 ms ±   5.6 ms    [User: 184.4 ms, System: 294.2 ms]
  Range (min … max):   471.8 ms … 489.1 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/n2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n2.php
  Time (mean ± σ):     301.7 ms ±   4.8 ms    [User: 104.0 ms, System: 196.9 ms]
  Range (min … max):   296.3 ms … 309.7 ms    10 runs

after

# hyperfine "php /mount/random/fromstr/n0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n0.php
  Time (mean ± σ):     484.7 ms ±   5.2 ms    [User: 181.5 ms, System: 302.4 ms]
  Range (min … max):   475.7 ms … 491.3 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/n1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n1.php
  Time (mean ± σ):     475.9 ms ±   7.7 ms    [User: 174.4 ms, System: 300.7 ms]
  Range (min … max):   470.2 ms … 495.7 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/n2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n2.php
  Time (mean ± σ):     296.1 ms ±   4.8 ms    [User: 98.2 ms, System: 197.1 ms]
  Range (min … max):   289.4 ms … 305.5 ms    10 runs

Member

@Girgias Girgias left a comment

I don't see why this code is more optimized than the previous one.

There could be 3-4 different reasons, but the code changes taken together don't pinpoint which one is the major factor.

I would prefer to keep the for loops as they are clearer IMHO.

There is no indication whether the speed-up comes from using a while loop, doing the comparisons against 0, or delaying the increment of failure.

@SakiTakamachi
Member Author

@Girgias

Thank you for checking.

I wanted to keep this as simple as possible, but all of these changes are related to improving performance.

If you don't mind a longer explanation, I can break down the commits into smaller chunks, measure them step by step, and make it clear how performance improves.

Incidentally, the most important of these changes is that the bit mask is now calculated 8 bytes at a time, rather than 1 byte at a time.

However, other changes also have a measurable effect on performance.
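
To illustrate the difference between masking one byte at a time and masking all 8 bytes at once, here is a minimal C sketch. It is not the php-src code: the function names `offsets_per_byte` and `offsets_per_word` are hypothetical, and it assumes the engine returns a full 64-bit value and that `mask` fits in a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-byte variant (hypothetical): shift and mask each byte separately. */
void offsets_per_byte(uint64_t random, uint64_t mask, uint8_t out[8])
{
    assert(mask <= 0xff);
    for (size_t j = 0; j < 8; j++) {
        out[j] = (uint8_t)((random >> (j * 8)) & mask);
    }
}

/* Whole-word variant (hypothetical): copy the one-byte mask into all
 * 8 bytes, apply it with a single AND, then peel off one byte per step. */
void offsets_per_word(uint64_t random, uint64_t mask, uint8_t out[8])
{
    assert(mask <= 0xff);
    uint64_t offsets = random & (mask * UINT64_C(0x0101010101010101));
    for (size_t j = 0; j < 8; j++) {
        out[j] = (uint8_t)(offsets & 0xff);
        offsets >>= 8;
    }
}
```

Both functions yield the same eight candidate offsets for a given input. The second one applies the mask once per 64-bit word instead of once per byte, which is the kind of saving the comment above describes.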

@Girgias
Member

Girgias commented Jul 11, 2024

I would prefer to have the commits split, with an indication of the performance benefit each one brings, so we can decide on a case-by-case basis whether the tradeoff is worth it :)

@SakiTakamachi
Member Author

Okay, I'll probably split it into 5 commits and force-push the result.

@SakiTakamachi
Member Author

SakiTakamachi commented Jul 11, 2024

@Girgias
I'm very embarrassed, but it seems I made a mistake in my measurements.
When I broke the changes down into smaller pieces, some changes had no effect, and one change actually made things slower.

I kept only the changes that really worked and reverted the rest.

The measurement results for commits 1 and 2 are listed in the description. I also re-measured "before".

edit:
It's not that my measurements were wrong; rather, I didn't measure carefully enough, so I missed that some changes had no actual effect.

@Girgias
Member

Girgias commented Jul 12, 2024

No worries, it happens :)

Now I can definitely see why the change improved the performance even without looking at the resulting assembly!

I'll wait for @TimWolla to approve the PR.

@TimWolla
Member

I'm afraid I'm unable to reproduce the improvements to the degree that your initial post indicates. I'm seeing a 1% difference between df6d85a and the latest commit in this PR.

I am using an Intel(R) Core(TM) i7-1365U and I am compiling with:

./configure --enable-zend-test --enable-option-checking=fatal --enable-phpdbg --enable-fpm --enable-werror CC=clang-16 CXX=clang++-16

My test script is:

<?php
use Random\Randomizer;
use Random\Engine\PcgOneseq128XslRr64;
$r = new Randomizer(new PcgOneseq128XslRr64(0));
// 256 characters
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 100000000; $i++) {
    $r->getBytesFromString($str, 16);
}

(using a fixed seed to ensure that the seeding does not have an impact)

and then running the benchmark using:

hyperfine 'sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' \
  '/tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' \
  '/tmp/unoptimized-commit1 -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php'

to get hyperfine to make the comparison for me instead of needing to manually compare the numbers.

/tmp/unoptimized is the version in df6d85a, /tmp/unoptimized-commit1 is the first commit of this PR and sapi/cli/php is the PR.

My results are:

Benchmark 1: sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.207 s ±  0.054 s    [User: 3.203 s, System: 0.003 s]
  Range (min … max):    3.141 s …  3.310 s    10 runs
 
Benchmark 2: /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.215 s ±  0.029 s    [User: 3.213 s, System: 0.002 s]
  Range (min … max):    3.169 s …  3.255 s    10 runs
 
Benchmark 3: /tmp/unoptimized-commit1 -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.181 s ±  0.014 s    [User: 3.178 s, System: 0.003 s]
  Range (min … max):    3.165 s …  3.206 s    10 runs
 
Summary
  /tmp/unoptimized-commit1 -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.01 ± 0.02 times faster than sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.01 ± 0.01 times faster than /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

Could it be that you accidentally compiled a debug build with compiler optimizations disabled or something like that?
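
The "1.01 ± 0.02 times faster" figures in the summary above can be reproduced by hand. The following worked equation assumes hyperfine uses standard first-order error propagation for a ratio of means; that assumption matches the printed numbers, but hyperfine's own documentation is the authority. The symbols $\bar t$ and $\sigma$ stand for the mean and standard deviation of each benchmark.

```latex
r = \frac{\bar t_{\text{PR}}}{\bar t_{\text{commit1}}}
  = \frac{3.207}{3.181} \approx 1.008,
\qquad
\sigma_r = r \sqrt{\left(\frac{\sigma_{\text{PR}}}{\bar t_{\text{PR}}}\right)^2
                 + \left(\frac{\sigma_{\text{commit1}}}{\bar t_{\text{commit1}}}\right)^2}
  = 1.008 \sqrt{\left(\frac{0.054}{3.207}\right)^2 + \left(\frac{0.014}{3.181}\right)^2}
  \approx 0.018
```

Rounded to two decimals this gives 1.01 ± 0.02, so the difference between the builds is well within the measurement noise.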

@SakiTakamachi
Member Author

SakiTakamachi commented Jul 20, 2024

@TimWolla

My CPU: 2.6 GHz 6-core Intel Core i7 (MacBook Pro)
I measured these using gcc. When I tried clang, I got roughly the same results as you.

// gcc
// before
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):      5.876 s ±  0.214 s    [User: 5.871 s, System: 0.003 s]
  Range (min … max):    5.726 s …  6.285 s    10 runs

// after
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):      4.833 s ±  0.170 s    [User: 4.829 s, System: 0.003 s]
  Range (min … max):    4.748 s …  5.291 s    10 runs



// clang
// before
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):     10.864 s ±  0.180 s    [User: 10.859 s, System: 0.004 s]
  Range (min … max):   10.697 s … 11.162 s    10 runs

// after
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):     10.595 s ±  0.239 s    [User: 10.590 s, System: 0.003 s]
  Range (min … max):   10.177 s … 11.000 s    10 runs

The configurations are as follows:

// gcc
./configure --disable-debug --disable-all --enable-bcmath --enable-tokenizer

// clang
./configure --disable-debug --disable-all --enable-bcmath --enable-tokenizer CC=clang

My gcc is a little old, so that might be the problem. I'll test it later in another environment.

@SakiTakamachi
Member Author

SakiTakamachi commented Jul 20, 2024

@TimWolla
I tried it with gcc 13.2.0 and clang 18.1.3 (Ubuntu 24.04).

The results are the same as before: a significant difference with gcc, but almost no difference with clang. Also, the clang build is considerably slower than the gcc build.

Could you please try it with gcc?

edit:
I wrote that there is a significant speed difference with gcc, but the difference is smaller than in the measurements with the older gcc. (The run took too long, so I reduced the number of loop iterations to 1/10.)

// gcc13
// before
# hyperfine "php /mount/random/fromstr/t.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):     524.1 ms ±   9.9 ms    [User: 520.3 ms, System: 2.8 ms]
  Range (min … max):   515.5 ms … 550.2 ms    10 runs

// after
# hyperfine "php /mount/random/fromstr/t.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):     489.4 ms ±   2.3 ms    [User: 486.0 ms, System: 2.7 ms]
  Range (min … max):   485.8 ms … 493.9 ms    10 runs

By the way, why is there such a big speed difference between gcc and clang...?

@TimWolla
Member

For:

<?php
use Random\Randomizer;
use Random\Engine\PcgOneseq128XslRr64;
$r = new Randomizer(new PcgOneseq128XslRr64(0));
// 256 characters
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 100000000; $i++) {
    $r->getBytesFromString($str, 16);
}

I get the following. -baseline is the commit df6d85a and -optimized is this PR. gcc is gcc (Ubuntu 13.2.0-4ubuntu3) 13.2.0 and clang is Ubuntu clang version 16.0.6 (15).

$ hyperfine -L binary clang-baseline,clang-optimized,gcc-baseline,gcc-optimized '/tmp/php/{binary} -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php'
Benchmark 1: /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.278 s ±  0.098 s    [User: 3.274 s, System: 0.003 s]
  Range (min … max):    3.183 s …  3.489 s    10 runs
 
Benchmark 2: /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.215 s ±  0.036 s    [User: 3.210 s, System: 0.004 s]
  Range (min … max):    3.174 s …  3.268 s    10 runs
 
Benchmark 3: /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      2.926 s ±  0.020 s    [User: 2.921 s, System: 0.004 s]
  Range (min … max):    2.910 s …  2.969 s    10 runs
 
Benchmark 4: /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      2.832 s ±  0.027 s    [User: 2.827 s, System: 0.005 s]
  Range (min … max):    2.810 s …  2.890 s    10 runs
 
Summary
  /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.03 ± 0.01 times faster than /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.14 ± 0.02 times faster than /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.16 ± 0.04 times faster than /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

for

<?php
use Random\Randomizer;
use Random\Engine\PcgOneseq128XslRr64;
$r = new Randomizer(new PcgOneseq128XslRr64(0));
$str = implode('', range('a', 'z')).implode('', range('A', 'Z')).implode('', range('0', '9'));

for ($i = 0; $i < 100000000; $i++) {
    $r->getBytesFromString($str, 16);
}

which uses a more realistic alphabet, I get:

Benchmark 1: /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      4.032 s ±  0.047 s    [User: 4.028 s, System: 0.003 s]
  Range (min … max):    3.987 s …  4.143 s    10 runs
 
Benchmark 2: /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.891 s ±  0.024 s    [User: 3.886 s, System: 0.004 s]
  Range (min … max):    3.867 s …  3.954 s    10 runs
 
Benchmark 3: /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.653 s ±  0.029 s    [User: 3.648 s, System: 0.004 s]
  Range (min … max):    3.629 s …  3.716 s    10 runs
 
Benchmark 4: /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.534 s ±  0.053 s    [User: 3.526 s, System: 0.007 s]
  Range (min … max):    3.480 s …  3.650 s    10 runs
 
Summary
  /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.03 ± 0.02 times faster than /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.10 ± 0.02 times faster than /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.14 ± 0.02 times faster than /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

So I can indeed confirm that gcc is faster than clang (but not the 2× you are seeing) and that this PR is (slightly) faster than the baseline for both.
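
One possible reason the 62-character alphabet behaves differently from the 256-character string is rejection: if offsets are drawn by masking a random byte and discarding values that fall past the end of the string, a length that is not a power of two wastes some random bytes. The sketch below is an assumption about that scheme, not code from the PR; `mask_stats` and `stats_for_alphabet` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical summary of a mask-and-reject scheme for one alphabet. */
typedef struct {
    unsigned mask;      /* smallest all-ones mask covering len - 1 */
    unsigned accepted;  /* masked values that map to a valid offset */
    unsigned total;     /* number of distinct masked values */
} mask_stats;

mask_stats stats_for_alphabet(unsigned len)
{
    assert(len >= 1 && len <= 256);
    unsigned mask = len - 1;
    /* Copy the top-most set bit into all lower bits (8-bit range). */
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask_stats s = { mask, len, mask + 1 };
    return s;
}
```

Under this assumption, `stats_for_alphabet(62)` reports 62 usable values out of 64, so about 3% of random bytes would be discarded, while `stats_for_alphabet(256)` reports no rejections at all. The thread does not measure how much this contributes to the timings.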

@TimWolla
Member

For:

<?php
use Random\Randomizer;
use Random\Engine\PcgOneseq128XslRr64;
$r = new Randomizer(new PcgOneseq128XslRr64(0));
$str = implode('', range('a', 'z')).implode('', range('A', 'Z')).implode('', range('0', '9'));

for ($i = 0; $i < 1000000; $i++) {
    $r->getBytesFromString($str, 1024);
}

it's

Benchmark 1: /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      1.377 s ±  0.003 s    [User: 1.375 s, System: 0.002 s]
  Range (min … max):    1.374 s …  1.383 s    10 runs
 
Benchmark 2: /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      1.128 s ±  0.036 s    [User: 1.125 s, System: 0.002 s]
  Range (min … max):    1.096 s …  1.217 s    10 runs
 
Benchmark 3: /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      1.378 s ±  0.014 s    [User: 1.374 s, System: 0.003 s]
  Range (min … max):    1.358 s …  1.390 s    10 runs
 
Benchmark 4: /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      1.241 s ±  0.032 s    [User: 1.238 s, System: 0.004 s]
  Range (min … max):    1.206 s …  1.314 s    10 runs
 
Summary
  /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.10 ± 0.05 times faster than /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.22 ± 0.04 times faster than /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.22 ± 0.04 times faster than /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

So it appears that the “set-up” is much slower with clang than with gcc, but the actual loop becomes much faster. In both cases the optimized version beats the non-optimized one.
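
The set-up versus loop observation can be made concrete with a rough two-point fit over the 16-byte and 1024-byte runs of the 62-character benchmark. The C sketch below is an illustration only: `ns_per_call`, `cost_model` and `fit_two_points` are hypothetical helpers, and the linear model ignores process start-up time and the overhead of the PHP loop itself.

```c
#include <assert.h>

/* Average wall time per getBytesFromString() call, in nanoseconds. */
double ns_per_call(double seconds, double calls)
{
    assert(calls > 0.0);
    return seconds * 1e9 / calls;
}

/* Hypothetical linear cost model: time(n) = fixed_ns + per_byte_ns * n. */
typedef struct {
    double fixed_ns;
    double per_byte_ns;
} cost_model;

cost_model fit_two_points(double ns_small, double bytes_small,
                          double ns_large, double bytes_large)
{
    assert(bytes_large > bytes_small);
    cost_model m;
    m.per_byte_ns = (ns_large - ns_small) / (bytes_large - bytes_small);
    m.fixed_ns = ns_small - m.per_byte_ns * bytes_small;
    return m;
}
```

Feeding in the optimized means above (clang: 3.891 s for 100000000 calls of 16 bytes and 1.128 s for 1000000 calls of 1024 bytes; gcc: 3.534 s and 1.241 s) gives roughly 22 ns fixed plus 1.08 ns per byte for clang, and roughly 16 ns fixed plus 1.20 ns per byte for gcc. That is consistent with a slower set-up but a faster loop under clang.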

@SakiTakamachi
Member Author

@TimWolla

Thank you for checking.

Hmm. It seems quite different from my environment.

I'll leave it up to you to decide whether to merge this or not :)

@TimWolla TimWolla changed the title ext/random: Optimized getBytesFromString random: Optimized getBytesFromString Jul 20, 2024
@TimWolla TimWolla changed the title random: Optimized getBytesFromString random: Optimize Randomizer::getBytesFromString() Jul 20, 2024
@TimWolla
Member

@SakiTakamachi I've just pushed a commit to improve the clarity of the mask expansion, because the ~0 / 0xff is an obscure way of writing 0x0101010101010101. The resulting assembly is the same.
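
For readers unfamiliar with the idiom, the short C sketch below shows why the two spellings agree. It assumes the all-ones value is an unsigned 64-bit integer; the function names are hypothetical and this is not the committed code.

```c
#include <assert.h>
#include <stdint.h>

/* Obscure spelling: 0xffffffffffffffff / 0xff == 0x0101010101010101,
 * because 0xff * 0x0101010101010101 == 0xffffffffffffffff. */
uint64_t broadcast_obscure(uint64_t byte)
{
    assert(byte <= 0xff);
    return (~UINT64_C(0) / 0xff) * byte;
}

/* Explicit spelling: multiplying by a constant with 0x01 in every byte
 * copies the low byte into all 8 bytes of the result. */
uint64_t broadcast_explicit(uint64_t byte)
{
    assert(byte <= 0xff);
    return UINT64_C(0x0101010101010101) * byte;
}

/* Caveat: with a plain int, ~0 is -1, and -1 / 0xff is 0, so the idiom
 * only works when the operand is an unsigned 64-bit value. */
```

A one-byte mask such as 0x3f becomes 0x3f3f3f3f3f3f3f3f with either function, which is the form needed to mask a whole 64-bit random value at once.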

Member

@TimWolla TimWolla left a comment

I can confirm that the changes do not make the situation worse in any case and improve it for gcc and clang with large output sizes.

@TimWolla TimWolla merged commit 1fc2ddc into php:master Jul 20, 2024
11 checks passed
@SakiTakamachi SakiTakamachi deleted the refactor_randomizer2 branch July 20, 2024 13:52
@SakiTakamachi
Member Author

@TimWolla

I was just having dinner, thanks for the merge!
