random: Optimize `Randomizer::getBytesFromString()` #14894

SakiTakamachi · 2024-07-10T06:03:44Z

Benchmark code

To keep the description short, the `<?php` tag and the `use` statements are omitted.
`$str` is 256 characters long in every script.

Using PcgOneseq128XslRr64

0.php:

$r = new Randomizer(new PcgOneseq128XslRr64());
// 256 characters
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 10000000; $i++) {
    $r->getBytesFromString($str, 1);
}

1.php:

$r = new Randomizer(new PcgOneseq128XslRr64());
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 10000000; $i++) {
    $r->getBytesFromString($str, 16);
}

2.php:

$r = new Randomizer(new PcgOneseq128XslRr64());
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 300000; $i++) {
    $r->getBytesFromString($str, 1024);
}

Omit constructor arguments

n0.php:

$r = new Randomizer();
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 1000000; $i++) {
    $r->getBytesFromString($str, 1);
}

n1.php:

$r = new Randomizer();
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 500000; $i++) {
    $r->getBytesFromString($str, 16);
}

n2.php:

$r = new Randomizer();
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 5000; $i++) {
    $r->getBytesFromString($str, 1024);
}

Using PcgOneseq128XslRr64

before

# hyperfine "php /mount/random/fromstr/0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/0.php
  Time (mean ± σ):     334.1 ms ±   5.6 ms    [User: 329.8 ms, System: 3.4 ms]
  Range (min … max):   327.7 ms … 343.4 ms    10 runs

# hyperfine "php /mount/random/fromstr/1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/1.php
  Time (mean ± σ):     609.3 ms ±  12.1 ms    [User: 603.4 ms, System: 5.0 ms]
  Range (min … max):   596.2 ms … 632.6 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/2.php
  Time (mean ± σ):     614.3 ms ±   5.9 ms    [User: 609.0 ms, System: 4.4 ms]
  Range (min … max):   605.6 ms … 621.5 ms    10 runs

after commit 1

# hyperfine "php /mount/random/fromstr/0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/0.php
  Time (mean ± σ):     344.1 ms ±  11.6 ms    [User: 339.6 ms, System: 3.7 ms]
  Range (min … max):   335.1 ms … 374.4 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/1.php
  Time (mean ± σ):     583.9 ms ±   5.4 ms    [User: 578.8 ms, System: 4.2 ms]
  Range (min … max):   576.8 ms … 597.1 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/2.php
  Time (mean ± σ):     543.7 ms ±   3.4 ms    [User: 540.0 ms, System: 2.6 ms]
  Range (min … max):   538.6 ms … 549.4 ms    10 runs

after commit 2

# hyperfine "php /mount/random/fromstr/0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/0.php
  Time (mean ± σ):     332.8 ms ±   4.0 ms    [User: 328.4 ms, System: 3.6 ms]
  Range (min … max):   327.3 ms … 338.8 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/1.php
  Time (mean ± σ):     487.9 ms ±  11.4 ms    [User: 484.5 ms, System: 2.7 ms]
  Range (min … max):   479.6 ms … 514.9 ms    10 runs

# hyperfine "php /mount/random/fromstr/2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/2.php
  Time (mean ± σ):     349.7 ms ±   8.0 ms    [User: 345.8 ms, System: 3.2 ms]
  Range (min … max):   343.3 ms … 369.7 ms    10 runs

Omit constructor arguments

before

# hyperfine "php /mount/random/fromstr/n0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n0.php
  Time (mean ± σ):     506.9 ms ±  22.2 ms    [User: 184.3 ms, System: 321.6 ms]
  Range (min … max):   476.7 ms … 536.2 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/n1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n1.php
  Time (mean ± σ):     479.5 ms ±   5.6 ms    [User: 184.4 ms, System: 294.2 ms]
  Range (min … max):   471.8 ms … 489.1 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/n2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n2.php
  Time (mean ± σ):     301.7 ms ±   4.8 ms    [User: 104.0 ms, System: 196.9 ms]
  Range (min … max):   296.3 ms … 309.7 ms    10 runs

after

# hyperfine "php /mount/random/fromstr/n0.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n0.php
  Time (mean ± σ):     484.7 ms ±   5.2 ms    [User: 181.5 ms, System: 302.4 ms]
  Range (min … max):   475.7 ms … 491.3 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/n1.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n1.php
  Time (mean ± σ):     475.9 ms ±   7.7 ms    [User: 174.4 ms, System: 300.7 ms]
  Range (min … max):   470.2 ms … 495.7 ms    10 runs
 
# hyperfine "php /mount/random/fromstr/n2.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/n2.php
  Time (mean ± σ):     296.1 ms ±   4.8 ms    [User: 98.2 ms, System: 197.1 ms]
  Range (min … max):   289.4 ms … 305.5 ms    10 runs

Girgias

I don't see why this code is more optimized than the previous one.

There could be 3-4 different reasons, but all the code changes here don't pinpoint what is the major problem.

I would prefer to keep the for loops as they are clearer IMHO.

There is no indication whether the speed-up comes from using a while loop, doing the comparisons against 0, or delaying the increment of failure.

SakiTakamachi · 2024-07-11T12:33:46Z

@Girgias

Thank you for checking.

I wanted to keep this as simple as possible, but all of these changes are related to improving performance.

If you don't mind a longer explanation, I can break down the commits into smaller chunks, measure them step by step, and make it clear how performance improves.

Incidentally, the most important of these changes is that the bit mask is now calculated 8 bytes at a time, rather than 1 byte at a time.

However, other changes also have a measurable effect on performance.
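
As a rough illustration of the 8-bytes-at-a-time idea (a sketch, not the code from this PR), the C snippet below masks one 64-bit random word either byte by byte or with a single AND against a mask replicated into all eight bytes. The function names `low_byte_mask`, `offsets_bytewise`, and `offsets_wordwise` are hypothetical, and the rejection step is simplified to dropping out-of-range bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest (2^k - 1) mask that covers max_offset (0..255):
 * copy the highest set bit into all lower bits. */
static uint64_t low_byte_mask(uint64_t max_offset)
{
    assert(max_offset <= 0xff);
    uint64_t mask = max_offset;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    return mask;
}

/* Per-byte variant: shift and mask each of the 8 bytes separately. */
static size_t offsets_bytewise(uint64_t random, uint64_t max_offset, uint8_t out[8])
{
    uint64_t mask = low_byte_mask(max_offset);
    size_t n = 0;
    for (size_t i = 0; i < 8; i++) {
        uint64_t offset = (random >> (i * 8)) & mask;
        if (offset <= max_offset) {
            out[n++] = (uint8_t) offset;
        }
    }
    return n;
}

/* Word variant: replicate the byte mask into all 8 bytes, AND once,
 * then peel the already-masked bytes off one at a time. */
static size_t offsets_wordwise(uint64_t random, uint64_t max_offset, uint8_t out[8])
{
    uint64_t wide_mask = low_byte_mask(max_offset) * UINT64_C(0x0101010101010101);
    uint64_t offsets = random & wide_mask;
    size_t n = 0;
    for (size_t i = 0; i < 8; i++) {
        uint64_t offset = offsets & 0xff;
        offsets >>= 8;
        if (offset <= max_offset) {
            out[n++] = (uint8_t) offset;
        }
    }
    return n;
}
```

Both functions return the same offsets for the same input. With a 62-character alphabet (`max_offset` of 61) the byte mask is 63, so bytes whose masked value is 62 or 63 are rejected.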

Girgias · 2024-07-11T13:22:32Z

I would prefer to have the commits split, with an indication of the performance benefit each one brings, so we can decide case by case whether the tradeoff is worth it :)

SakiTakamachi · 2024-07-11T13:26:44Z

Okay, I think I'll probably split it into five commits and then force-push.

SakiTakamachi · 2024-07-11T15:02:54Z

@Girgias
I'm very embarrassed, but it seems like I made a mistake in my measurements.
When I broke the changes down into smaller pieces, some changes didn't make sense, and one change actually slowed things down.

I kept only the changes that really worked and reverted the rest.

The measurement results for commits 1 and 2 are listed in the explanation. I also re-measured "before".

edit:
It's not so much that I made a mistake in my measurements; rather, I didn't measure carefully enough, so I missed changes that didn't actually make sense.

Girgias · 2024-07-12T14:23:57Z

No worries, it happens :)

Now I can definitely see why the change improved the performance even without looking at the resulting assembly!

I'll wait for @TimWolla to approve the PR.

TimWolla · 2024-07-19T13:11:21Z

I'm afraid I'm unable to reproduce the improvements to the degree that your initial post indicates. I'm seeing a 1% difference between df6d85a and the latest commit in this PR.

I am using an Intel(R) Core(TM) i7-1365U and I am compiling with:

./configure --enable-zend-test --enable-option-checking=fatal --enable-phpdbg --enable-fpm --enable-werror CC=clang-16 CXX=clang++-16

My test script is:

<?php
use Random\Randomizer;
use Random\Engine\PcgOneseq128XslRr64;
$r = new Randomizer(new PcgOneseq128XslRr64(0));
// 256 characters
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 100000000; $i++) {
    $r->getBytesFromString($str, 16);
}

(using a fixed seed to ensure that the seeding does not have an impact)

and then running the benchmark using:

hyperfine 'sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' \
  '/tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php' \
  '/tmp/unoptimized-commit1 -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php'

to get hyperfine to make the comparison for me instead of needing to manually compare the numbers.

/tmp/unoptimized is the version in df6d85a, /tmp/unoptimized-commit1 is the first commit of this PR and sapi/cli/php is the PR.

My results are:

Benchmark 1: sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.207 s ±  0.054 s    [User: 3.203 s, System: 0.003 s]
  Range (min … max):    3.141 s …  3.310 s    10 runs
 
Benchmark 2: /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.215 s ±  0.029 s    [User: 3.213 s, System: 0.002 s]
  Range (min … max):    3.169 s …  3.255 s    10 runs
 
Benchmark 3: /tmp/unoptimized-commit1 -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.181 s ±  0.014 s    [User: 3.178 s, System: 0.003 s]
  Range (min … max):    3.165 s …  3.206 s    10 runs
 
Summary
  /tmp/unoptimized-commit1 -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.01 ± 0.02 times faster than sapi/cli/php -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.01 ± 0.01 times faster than /tmp/unoptimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

Could it be that you accidentally compiled a debug build with compiler optimizations disabled or something like that?

SakiTakamachi · 2024-07-20T10:48:30Z

@TimWolla

My CPU: 2.6 GHz 6-core Intel Core i7 (MacBook Pro)
I measured these using gcc. I tried it with clang and got roughly the same results as yours.

// gcc
// before
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):      5.876 s ±  0.214 s    [User: 5.871 s, System: 0.003 s]
  Range (min … max):    5.726 s …  6.285 s    10 runs

// after
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):      4.833 s ±  0.170 s    [User: 4.829 s, System: 0.003 s]
  Range (min … max):    4.748 s …  5.291 s    10 runs



// clang
// before
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):     10.864 s ±  0.180 s    [User: 10.859 s, System: 0.004 s]
  Range (min … max):   10.697 s … 11.162 s    10 runs

// after
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):     10.595 s ±  0.239 s    [User: 10.590 s, System: 0.003 s]
  Range (min … max):   10.177 s … 11.000 s    10 runs

The configurations are as follows:

// gcc
./configure --disable-debug --disable-all --enable-bcmath --enable-tokenizer

// clang
./configure --disable-debug --disable-all --enable-bcmath --enable-tokenizer CC=clang

My gcc is a little old, so that might be the problem. I'll test it later in another environment.

SakiTakamachi · 2024-07-20T11:36:00Z

@TimWolla
I tried it with gcc version 13.2.0, clang version 18.1.3 (ubuntu24.04).

The results are the same as before, with a significant difference in gcc, but almost no difference in clang. And clang is considerably slower than when compiled with gcc.

Could you please try it with gcc?

edit:
I wrote that there is a significant speed difference with gcc, but the difference is smaller compared to measurements with older gcc. (It took too long, so I reduced the number of loops to 1/10.)

// gcc13
// before
# hyperfine "php /mount/random/fromstr/t.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):     524.1 ms ±   9.9 ms    [User: 520.3 ms, System: 2.8 ms]
  Range (min … max):   515.5 ms … 550.2 ms    10 runs

// after
# hyperfine "php /mount/random/fromstr/t.php" --warmup 10
Benchmark 1: php /mount/random/fromstr/t.php
  Time (mean ± σ):     489.4 ms ±   2.3 ms    [User: 486.0 ms, System: 2.7 ms]
  Range (min … max):   485.8 ms … 493.9 ms    10 runs

By the way, why is there such a big speed difference between gcc and clang...?

TimWolla · 2024-07-20T12:03:14Z

For:

<?php
use Random\Randomizer;
use Random\Engine\PcgOneseq128XslRr64;
$r = new Randomizer(new PcgOneseq128XslRr64(0));
// 256 characters
$str = '1GCOFExQwWVNBFxOpGXilQdjcCrS6CxJRbgRf214G6w0bnFdwlmylDIpAQlcHVH5co5heIrarVoodaGeTbMQwWaM16fwXfLr03tjYymp2hg0XI5MqTlEr1taXFyHD5fYNqX7nkFJKOYU3FRMeIUkXAMpqqkIdNXP2od5Wbkra5rvLV4WT1lGN0Yg0AeEXAPq8nRxdePU2M6vQOX7wvJKKrmMAWsopEEDAdOxtrOb8lp7oPI3RjhXMbTFXaPgjljO';

for ($i = 0; $i < 100000000; $i++) {
    $r->getBytesFromString($str, 16);
}

I get the following. -baseline is the commit df6d85a and -optimized is this PR. gcc is gcc (Ubuntu 13.2.0-4ubuntu3) 13.2.0 and clang is Ubuntu clang version 16.0.6 (15).

$ hyperfine -L binary clang-baseline,clang-optimized,gcc-baseline,gcc-optimized '/tmp/php/{binary} -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php'
Benchmark 1: /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.278 s ±  0.098 s    [User: 3.274 s, System: 0.003 s]
  Range (min … max):    3.183 s …  3.489 s    10 runs
 
Benchmark 2: /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.215 s ±  0.036 s    [User: 3.210 s, System: 0.004 s]
  Range (min … max):    3.174 s …  3.268 s    10 runs
 
Benchmark 3: /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      2.926 s ±  0.020 s    [User: 2.921 s, System: 0.004 s]
  Range (min … max):    2.910 s …  2.969 s    10 runs
 
Benchmark 4: /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      2.832 s ±  0.027 s    [User: 2.827 s, System: 0.005 s]
  Range (min … max):    2.810 s …  2.890 s    10 runs
 
Summary
  /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.03 ± 0.01 times faster than /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.14 ± 0.02 times faster than /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.16 ± 0.04 times faster than /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

for

<?php
use Random\Randomizer;
use Random\Engine\PcgOneseq128XslRr64;
$r = new Randomizer(new PcgOneseq128XslRr64(0));
$str = implode('', range('a', 'z')).implode('', range('A', 'Z')).implode('', range('0', '9'));

for ($i = 0; $i < 100000000; $i++) {
    $r->getBytesFromString($str, 16);
}

which uses a more realistic alphabet, I get:

Benchmark 1: /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      4.032 s ±  0.047 s    [User: 4.028 s, System: 0.003 s]
  Range (min … max):    3.987 s …  4.143 s    10 runs
 
Benchmark 2: /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.891 s ±  0.024 s    [User: 3.886 s, System: 0.004 s]
  Range (min … max):    3.867 s …  3.954 s    10 runs
 
Benchmark 3: /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.653 s ±  0.029 s    [User: 3.648 s, System: 0.004 s]
  Range (min … max):    3.629 s …  3.716 s    10 runs
 
Benchmark 4: /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      3.534 s ±  0.053 s    [User: 3.526 s, System: 0.007 s]
  Range (min … max):    3.480 s …  3.650 s    10 runs
 
Summary
  /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.03 ± 0.02 times faster than /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.10 ± 0.02 times faster than /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.14 ± 0.02 times faster than /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

So I can indeed confirm that gcc is faster than clang (but not the 2× you are seeing) and that this PR is (slightly) faster than the baseline for both.

TimWolla · 2024-07-20T12:36:18Z

For:

<?php
use Random\Randomizer;
use Random\Engine\PcgOneseq128XslRr64;
$r = new Randomizer(new PcgOneseq128XslRr64(0));
$str = implode('', range('a', 'z')).implode('', range('A', 'Z')).implode('', range('0', '9'));

for ($i = 0; $i < 1000000; $i++) {
    $r->getBytesFromString($str, 1024);
}

it's

Benchmark 1: /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      1.377 s ±  0.003 s    [User: 1.375 s, System: 0.002 s]
  Range (min … max):    1.374 s …  1.383 s    10 runs
 
Benchmark 2: /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      1.128 s ±  0.036 s    [User: 1.125 s, System: 0.002 s]
  Range (min … max):    1.096 s …  1.217 s    10 runs
 
Benchmark 3: /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      1.378 s ±  0.014 s    [User: 1.374 s, System: 0.003 s]
  Range (min … max):    1.358 s …  1.390 s    10 runs
 
Benchmark 4: /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
  Time (mean ± σ):      1.241 s ±  0.032 s    [User: 1.238 s, System: 0.004 s]
  Range (min … max):    1.206 s …  1.314 s    10 runs
 
Summary
  /tmp/php/clang-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php ran
    1.10 ± 0.05 times faster than /tmp/php/gcc-optimized -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.22 ± 0.04 times faster than /tmp/php/clang-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php
    1.22 ± 0.04 times faster than /tmp/php/gcc-baseline -d zend_extension=php-src/modules/opcache.so -d opcache.enable_cli=1 test.php

So it appears that the “set-up” is much slower with clang than with gcc, but the actual loop becomes much faster. In both cases the optimized version beats the non-optimized one.

SakiTakamachi · 2024-07-20T12:55:05Z

@TimWolla

Thank you for confirming.

Hmm. It seems quite different from my environment.

I'll leave it up to you to decide whether to merge this or not :)

TimWolla · 2024-07-20T13:01:46Z

@SakiTakamachi I've just pushed a commit to improve the clarity of the mask expansion, because the ~0 / 0xff is an obscure way of writing 0x0101010101010101. The resulting assembly is the same.
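
To see why the two spellings are interchangeable, here is a small C check. It assumes the `~0` in the original expression was a 64-bit unsigned all-ones value (a plain `int` `~0` would divide to 0), and the helper name `expand_byte` is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones divided by 0xff leaves 0x01 in every byte, because
 * 0x0101010101010101 * 0xff == 0xffffffffffffffff. */
static_assert(~UINT64_C(0) / 0xff == UINT64_C(0x0101010101010101),
              "both spellings name the same constant");

/* Hypothetical helper: copy a value 0..255 into all eight bytes
 * of a 64-bit word by multiplying with the repeated-0x01 constant. */
static uint64_t expand_byte(uint8_t b)
{
    return (uint64_t) b * UINT64_C(0x0101010101010101);
}
```

Because the comparison is a compile-time constant expression, a compiler can fold either form to the same value, which is consistent with the remark that the resulting assembly is unchanged.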

TimWolla

I can confirm that the changes do not make the situation worse in any case and improve it for gcc and clang with large output sizes.

SakiTakamachi · 2024-07-20T13:53:33Z

@TimWolla

I was just having dinner, thanks for the merge!

SakiTakamachi requested review from TimWolla and zeriyoshi as code owners July 10, 2024 06:03

github-actions bot added the Extension: random label Jul 10, 2024

SakiTakamachi force-pushed the refactor_randomizer2 branch 3 times, most recently from a1cc215 to 4e92056 Compare July 10, 2024 08:11

Girgias reviewed Jul 11, 2024

SakiTakamachi added 2 commits July 11, 2024 23:35

Changed calculate bitmask 8 bytes at a time.

65ea801

Optimized bit shifting

f23980d

SakiTakamachi force-pushed the refactor_randomizer2 branch from 4e92056 to f23980d Compare July 11, 2024 14:57

Improve clarity of the mask expansion

2fd607f

TimWolla changed the title from "ext/random: Optimized getBytesFromString" to "random: Optimized getBytesFromString" Jul 20, 2024

TimWolla changed the title from "random: Optimized getBytesFromString" to "random: Optimize Randomizer::getBytesFromString()" Jul 20, 2024

TimWolla approved these changes Jul 20, 2024

Girgias approved these changes Jul 20, 2024

TimWolla merged commit 1fc2ddc into php:master Jul 20, 2024
11 checks passed

SakiTakamachi deleted the refactor_randomizer2 branch July 20, 2024 13:52
