Special JIT support for FFI #14491

dstogov · 2024-06-06T21:46:01Z

No description provided.

dstogov · 2024-06-06T22:16:03Z

@iluuu1994 @nielsdos I would appreciate it if you could take a quick look at this when you have time. If this is interesting to you, please share your ideas.

This is still a very early PoC. It aims to generate optimized native code instead of generic calls to FFI callbacks. Many questions remain unsolved:

- JIT should generate FFI type guards (check that the FFI\CData type is the same as during trace recording and compilation)
- most FFI types are not persistent. I didn't find an efficient way to implement FFI\CData type guards yet.
- FFI array bounds checks are not implemented
- It would be great to store pointers to FFI\CData in CPU registers and "unbox" the temporary FFI\CData objects
- Guards, bounds checks and CData pointers loads should be moved out of the loops
- Access to FFI structures and unions fields is not implemented
- Access to FFI variables
- Native code to call FFI functions
- Native wrappers for FFI callbacks

Even in the current state this makes access to FFI Arrays more than 20 times faster. See the following example:

<?php
function ary3($n) {
  for ($i=0; $i<$n; $i++) {
    $X[$i] = $i + 1;
    $Y[$i] = 0;
  }
  for ($k=0; $k<1000; $k++) {
    for ($i=$n-1; $i>=0; $i--) {
      $Y[$i] += $X[$i];
    }
  }
  $last = $n-1;
  print "$Y[0] $Y[$last]\n";
}

function ary3_ffi($n) {
  $X = FFI::new("int[$n]");
  $Y = FFI::new("int[$n]");
  for ($i=0; $i<$n; $i++) {
    $X[$i] = $i + 1;
    $Y[$i] = 0;
  }
  for ($k=0; $k<1000; $k++) {
    for ($i=$n-1; $i>=0; $i--) {
      $Y[$i] += $X[$i];
    }
  }
  $last = $n-1;
  print "$Y[0] $Y[$last]\n";
}

/*****/

function gethrtime()
{
  $hrtime = hrtime();
  return (($hrtime[0]*1000000000 + $hrtime[1]) / 1000000000);
}

function start_test()
{
  ob_start();
  return gethrtime();
}

function end_test($start, $name)
{
  global $total;
  $end = gethrtime();
  ob_end_clean();
  $total += $end-$start;
  $num = number_format($end-$start,3);
  $pad = str_repeat(" ", 24-strlen($name)-strlen($num));

  echo $name.$pad.$num."\n";
  ob_start();
  return gethrtime();
}

function total()
{
  global $total;
  $pad = str_repeat("-", 24);
  echo $pad."\n";
  $num = number_format($total,3);
  $pad = str_repeat(" ", 24-strlen("Total")-strlen($num));
  echo "Total".$pad.$num."\n";
}

$t0 = $t = start_test();
ary3(200000);
$t = end_test($t, "ary3(200000)");
ary3_ffi(200000);
$t = end_test($t, "ary3_ffi(200000)");

bwoebi · 2024-06-07T11:08:10Z

I absolutely love the idea of JITting specific functions (like FFI here). I hope it will also allow JITting some function calls away completely in the future.

I just think that the JIT should expose an API for JITting specific functions, rather than the other way round, where extensions expose their internals to the JIT and the handling has to be hardcoded in the JIT. That should scale better as more extensions find something JIT-worthy.
I.e. the code doing the JITting of the FFI functions and operator overloads should live in ext/ffi.
I'm okay with not doing that right away, but I feel like JIT should become separate from opcache and have a proper public API eventually...

nielsdos · 2024-06-07T15:58:46Z

I like the idea. Extensions in PHP are often wrappers around C libraries, and adding support for JIT specializations for FFI opens the door to creating extension-like functionality within PHP with reasonable overhead.

I think that LuaJIT does something similar with their FFI, but it's been a long time since I looked at that. Perhaps there are ideas there that we could use here too. I'm not sure.

I agree with Bob's comment, but it also seems like a lot more effort and difficulty (as he already pointed out).

> most FFI types are not persistent. I didn't find an efficient way to implement FFI\CData type guards yet.

If I understand right, the problem is the following: In normal cases you'd compare the FFI type pointer in the guard, but because the types are not persistent, the pointers aren't a unique way of identifying the type (e.g. a type allocated later may reuse the same memory address). Furthermore, the type pointer cannot always be safely dereferenced, because the type may already have been freed.
Maybe this could be solved by giving each FFI type a unique ID that is never reused, and then compare against that ID in the guard. The ID could be created by a simple counter. I'm not sure.
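
A minimal sketch of the counter-based idea in C. The names `ctype`, `cdata`, `ctype_new` and `guard_type_id` are hypothetical stand-ins for illustration, not php-src structures or API. The guard compares an ID loaded through the live CData against a constant recorded at compile time, so a later type that happens to reuse a freed address still fails the guard. The sketch covers a single process only and does not address sharing IDs between workers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the FFI type and CData structures. */
typedef struct ctype {
    uint64_t id;         /* unique for the process lifetime, never reused */
    size_t   elem_size;
} ctype;

typedef struct cdata {
    ctype *type;
    void  *ptr;
} cdata;

static uint64_t next_type_id = 1;    /* 0 is reserved for "no ID" */

/* Allocate a type and give it a fresh ID from a simple counter. */
static ctype *ctype_new(size_t elem_size) {
    ctype *t = malloc(sizeof(*t));
    if (t == NULL) {
        return NULL;
    }
    assert(next_type_id != 0);       /* counter must not wrap around */
    t->id = next_type_id++;
    t->elem_size = elem_size;
    return t;
}

/* The guard as compiled code might perform it: expected_id is a constant
 * recorded when the trace was compiled. Returns 1 if the guard passes. */
static int guard_type_id(const cdata *d, uint64_t expected_id) {
    return d->type->id == expected_id;
}
```

Because the comparison uses the ID and not the address, the guard never has to dereference the type pointer that was seen at compile time.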

> Guards, bounds checks and CData pointers loads should be moved out of the loops

For guards and bounds checks, I suppose this could be solved in a general way if IR itself had range analysis or value set analysis (e.g. as part of SCCP). That would not only benefit FFI but also PHP itself. I see an open PR for SCCP so maybe this "issue" goes away in the future anyway.

dstogov · 2024-06-10T09:58:45Z

> Maybe this could be solved by giving each FFI type a unique ID that is never reused, and then compare against that ID in the guard. The ID could be created by a simple counter. I'm not sure.

LuaJIT uses this approach, but we will have to serialize IDs across several workers and probably keep the types forever

arnaud-lb · 2024-06-10T18:00:38Z

I also like the idea. This would reduce the amount of C code in use-cases such as Niels mentioned, which is a good thing.

> Maybe this could be solved by giving each FFI type a unique ID that is never reused, and then compare against that ID in the guard. The ID could be created by a simple counter. I'm not sure.

> LuaJIT uses this approach, but we will have to serialize IDs across several workers and probably keep the types forever

At a minimum this requires a mapping from type structures to IDs, so that IDs are stable across workers and subsequent requests?

The size of the associated storage may be manageable if IDs are only used by JIT and are only allocated when a type is JITed, because then the mapping has the same lifetime as the JIT buffer, and also grows at the same time as the JIT buffer.

> Guards, bounds checks and CData pointers loads should be moved out of the loops

> For guards and bounds checks, I suppose this could be solved in a general way if IR itself had range analysis or value set analysis (e.g. as part of SCCP). That would not only benefit FFI but also PHP itself. I see an open PR for SCCP so maybe this "issue" goes away in the future anyway.

Agreed. I was looking at range analysis earlier this year, and will continue working on this topic (range analysis) soon (unless someone else does it first - I don't want to block progress), so I will check if this can have an impact here.

dstogov · 2024-06-10T18:32:54Z

> At a minimum this requires a mapping from type structures to IDs, so that IDs are stable across workers and subsequent requests?

yes.

> The size of the associated storage may be manageable if IDs are only used by JIT and are only allocated when a type is JITed, because then the mapping has the same lifetime as the JIT buffer, and also grows at the same time as the JIT buffer.

I'm not sure if we can "persist" some CType during JIT-ing, because we will need to update all CData objects of this type.

> Guards, bounds checks and CData pointers loads should be moved out of the loops

> For guards and bounds checks, I suppose this could be solved in a general way if IR itself had range analysis or value set analysis (e.g. as part of SCCP). That would not only benefit FFI but also PHP itself. I see an open PR for SCCP so maybe this "issue" goes away in the future anyway.

> Agreed. I was looking at range analysis earlier this year, and will continue working on this topic (range analysis) soon (unless someone else does it first - I don't want to block progress), so I will check if this can have an impact here.

LuaJIT achieves good code through loop peeling. It duplicates the loop body and removes all redundant code from the second copy using folding rules (common subexpression elimination, load forwarding, guard elimination, etc.).
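
To make the effect of peeling concrete, here is a hand-written C sketch of what the transformation achieves for the `$Y[$i] += $X[$i]` loop. The `int_array` struct and both functions are hypothetical illustrations, not LuaJIT or php-src code. In the peeled version the first iteration keeps every guard and load; in the second copy the type guards and pointer loads are already known, so only the index-dependent bounds check is left.

```c
#include <stddef.h>

/* Hypothetical boxed array: a type tag, a data pointer and a length. */
typedef struct int_array {
    int    type_id;
    int   *ptr;
    size_t len;
} int_array;

enum { TYPE_INT_ARRAY = 1 };

/* Naive form: type guards, pointer loads and bounds checks on every
 * iteration. Returns -1 when a guard fails (a "side exit"). */
static int add_naive(int_array *y, const int_array *x, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (y->type_id != TYPE_INT_ARRAY || x->type_id != TYPE_INT_ARRAY) return -1;
        if (i >= y->len || i >= x->len) return -1;
        y->ptr[i] += x->ptr[i];
    }
    return 0;
}

/* Peeled form: the first iteration is executed separately with all
 * guards; the loop proper reuses the values established there. */
static int add_peeled(int_array *y, const int_array *x, size_t n) {
    if (n == 0) return 0;

    /* Peeled first iteration. */
    if (y->type_id != TYPE_INT_ARRAY || x->type_id != TYPE_INT_ARRAY) return -1;
    int *yp = y->ptr;
    const int *xp = x->ptr;
    size_t ylen = y->len, xlen = x->len;
    if (ylen == 0 || xlen == 0) return -1;
    yp[0] += xp[0];

    /* Second copy: type guards and pointer loads were redundant and are
     * gone; the bounds check depends on i, so it stays. */
    for (size_t i = 1; i < n; i++) {
        if (i >= ylen || i >= xlen) return -1;
        yp[i] += xp[i];
    }
    return 0;
}
```

Removing the remaining bounds check as well would need range information about `i`, which peeling alone does not provide.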

arnaud-lb · 2024-06-11T10:42:59Z

> I'm not sure if we can "persist" some CType during JIT-ing, because we will need to update all CData objects of this type.

Indeed. I was thinking about something like this:

get_id(ctype):
    if ctype.id:
        return ctype.id
    if mapping[ctype]:
        return ctype.id := mapping[ctype]
    return ctype.id := mapping[ctype] := next_id()

This handles future instances, but this doesn't account for other existing instances in the same request, or existing instances of other workers that will get to execute the JITed code.
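
The pseudocode above could look roughly like this in C. This is only a sketch under loud assumptions: the `ctype` struct, the fixed-size `mapping` table and the use of a declaration string as the lookup key are all hypothetical, chosen to keep the example short. A real implementation would need a proper hash table, shared storage, and a key that identifies the type unambiguously.

```c
#include <stdint.h>
#include <string.h>

#define MAX_TYPES 64

/* Hypothetical type descriptor; id 0 means "no ID assigned yet". */
typedef struct ctype {
    const char *decl;   /* assumed canonical declaration, e.g. "int[4]" */
    uint32_t    id;
} ctype;

/* Hypothetical mapping from declaration to ID (linear search for brevity).
 * The table keeps the decl pointer, so the string must outlive it. */
static struct { const char *decl; uint32_t id; } mapping[MAX_TYPES];
static uint32_t mapping_len = 0;

/* Return the ID of a type, assigning one on first use.
 * Returns 0 when the table is full, meaning the type cannot be guarded. */
static uint32_t get_id(ctype *t) {
    if (t->id) {
        return t->id;                        /* cached on this instance */
    }
    for (uint32_t i = 0; i < mapping_len; i++) {
        if (strcmp(mapping[i].decl, t->decl) == 0) {
            return t->id = mapping[i].id;    /* known type, new instance */
        }
    }
    if (mapping_len == MAX_TYPES) {
        return 0;
    }
    mapping[mapping_len].decl = t->decl;
    mapping[mapping_len].id = mapping_len + 1;
    return t->id = mapping[mapping_len++].id;
}
```

As in the pseudocode, only instances that pass through `get_id` receive an ID; instances that already exist elsewhere are not updated.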

Maybe we can have a special exit that fetches the id? This is starting to get complicated though.

arnaud-lb · 2024-06-14T14:19:51Z

ext/opcache/ZendAccelerator.c

+	str = accel_find_interned_string(str);
+	if (str && (str->gc.u.type_info & IS_STR_FFI_TYPE)) {


You may need to include the cdef in the lookup key in some way, as str may depend on it.

E.g.:

$cdef = FFI::cdef("typedef char test;");
$cdata = $cdef->new("test");
$cdef = FFI::cdef("typedef int test;");
$cdata = $cdef->new("test");

Another possible issue is that this could lead to a high number of cached types due to types like $cdef->new('char[' . $len . ']').

dstogov · 2024-06-17

> You may need to include the cdef in the lookup key in some way, as str may depend on it.

Yes. One of the PHPT tests already catches this problem. The work is in progress...

chopins · 2024-06-24T08:43:53Z

At present, the low performance of FFI\CData calculations and other operations is caused by conversion to PHP types and magic calls. If the value is simply assigned to CData, its performance is not inferior. So these problems can be avoided with careful coding. Another option is to avoid frequent type conversions by using operator overloads. I don't think it's a good idea to get a little bit of acceleration through JIT, and it would make the FFI API ugly.

dstogov · 2024-06-24T08:54:05Z

> At present, the low performance of FFI\CData calculations and other operations is caused by conversion to PHP types and magic calls. If the value is simply assigned to CData, its performance is not inferior.

Right. This is what JIT is going to do.

> I don't think it's a good idea to get a little bit of acceleration through JIT, and it would make the FFI API ugly

The current PoC shows a 20-fold speedup (see the example at the top).
This PR doesn't change PHP ext/ffi API at all.

chopins · 2024-06-24T09:51:30Z

Isn't it better to use the do_operation handler, similar to GMP?
As discussed above, the FFI type needs to be clarified, so it is necessary to require access to the FFI type through an instance. I don't recommend accessing the FFI API through an instance.
FFI is not enabled by default, so JIT may not be available.

dstogov · 2024-06-24T10:10:07Z

> Isn't it better to use the do_operation handler, similar to GMP?

I don't understand what you propose.
See the following PHP code:

$x = FFI::new("int[42]");
$y = $x[$i];

This PR translates the last line into 3 machine instructions:

movq 0x60(%r14), %rcx       ; load Z_OBJ_P() from $x zval
movq 0x40(%rcx), %rcx       ; load CData->ptr (start of the array)
movl (%rcx, %rax, 4), %ecx  ; load element of the array (%rax contains value of $i)

How can you make this better with do_operation?
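
In C terms, the three loads above correspond roughly to the function below. The struct names and fields are simplified, hypothetical stand-ins and do not reproduce the real zval or CData layouts (the 0x60 and 0x40 offsets in the assembly come from those real layouts). Like the generated code shown, the sketch performs no type guard and no bounds check.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for an FFI CData object. */
typedef struct cdata_sketch {
    void *ptr;                  /* start of the native array */
} cdata_sketch;

/* Hypothetical, simplified stand-in for a zval holding an object. */
typedef struct zval_sketch {
    cdata_sketch *obj;
} zval_sketch;

/* $y = $x[$i] for an int array, expressed as the same three loads. */
static int32_t fetch_dim_int32(const zval_sketch *x, int64_t i) {
    const cdata_sketch *obj = x->obj;    /* load the object pointer from the zval */
    const int32_t *base = obj->ptr;      /* load CData->ptr (start of the array) */
    return base[i];                      /* load the 4-byte element at base + i * 4 */
}
```

The point of the comparison is that a generic handler call has to dispatch through the object's handlers and box the result, while the specialized path is plain address arithmetic.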

chopins · 2024-06-25T03:33:43Z

I want to optimize the zend_ffi_cdata_do_operation() function, but there is no good way to match the CData array.

I'm not against JIT improving performance, but I'm against changing the API to be less good because of the need to optimize performance. It is still necessary to make sure that PHP can write elegant and concise code.

dstogov · 2024-06-25T05:58:32Z

> I'm not against JIT improving performance, but I'm against changing the API to be less good because of the need to optimize performance. It is still necessary to make sure that PHP can write elegant and concise code.

What kind of API changes do you mean? This PR doesn't change anything visible to PHP programmers.

chopins · 2024-06-25T06:33:52Z

Is the following PR related to the FFI JIT?
4acf008#commitcomment-143451098

dstogov · 2024-06-25T06:58:32Z

> Is the following PR related to the FFI JIT?
> 4acf008#commitcomment-143451098

Not at all. I don't like it, and I think your last RFC may be a good solution.

github-actions bot added Extension: ffi Extension: opcache labels Jun 6, 2024

dstogov force-pushed the jit_ffi branch from 39774e1 to 6f9f0e1 Compare June 10, 2024 11:21

github-actions bot added the Category: Engine label Jun 10, 2024

dstogov force-pushed the jit_ffi branch from 6f9f0e1 to dd19210 Compare June 14, 2024 08:45

arnaud-lb reviewed Jun 14, 2024

dstogov force-pushed the jit_ffi branch from 6b021e4 to b569fca Compare June 17, 2024 12:50

dstogov referenced this pull request Jun 24, 2024

Deprecate calling FFI::cast(), FFI::new(), and FFI::type() statically

4acf008

dstogov force-pushed the jit_ffi branch from 45f1f50 to e4f49d7 Compare June 24, 2024 09:03

dstogov force-pushed the jit_ffi branch 3 times, most recently from a40ccea to a75bec6 Compare July 3, 2024 22:17

dstogov force-pushed the jit_ffi branch 2 times, most recently from 34f7504 to b6b4f34 Compare July 26, 2024 10:22

github-actions bot added the Category: Build System label Jul 26, 2024

dstogov added 28 commits September 19, 2024 15:13

Support for FFI pointer assignment

0c735ee

FFI JIT for FETCH_DIM/OBJ_W

473b9c4

Fix uninitialized data access

fdc2bcc

Fix calling convention

2b369b0

FFI JIT for FETCH_DIM/OBJ_W + ASSIGN_DIM

1fdf332

FFI JIT for FETCH_DIM_RW + ASSIGN_DIM_OP

45ded20

FFI JIT for PRE/POST_INC/DEC_OBJ

6ba17bb

Remove useless checks

137c869

Move FFI related JIT code generation into zend_jit_ir_ffi.c

44de6f4

Better JIT support for FFI function calls

5c46d94

JIT/FFI support for "cdata" property of scalar ffi objects

788ee84

Avoid modification of "const" CData

d5ae4da

Add JIT/FFI tests

9c73df9

Add SKIPIF sectionis and "stdout" recovery code

d29afc8

Reciver "stdout" before calling var_dump()

819f39d

Add EXTENSIONS section

2abbe8d

Fix SKIPIF sections

d688505

JIT/FFI support for closure calls (pointers to functions)

913c0cd

JIT/FFI cleanup arguments

c8c951b

FFI/JIT better support for enums

2c4d943

JIT/FFI add tet for closure call

4f55a31

JIT/FFI print FFI types in JIT debug traces

893e834

JIT/FFI support for temporary POINTER types (e.g. created by FFI::addr)

df1befe

JIT/FFI better support for temporary POINTER types

29278f9

Fixed support for lazy objects

c1a7a72

Fix register allocation

ed8f25e

Use signed comparison

68d9d2f

Improve register allocation

f59d92d

dstogov force-pushed the jit_ffi branch from 3af10b3 to f59d92d Compare September 19, 2024 14:53

Fix tests for 32-bit systems

528eec9
