Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CFFI for the pixel access object #248

Closed
wiredfool opened this issue Jun 14, 2013 · 14 comments · Fixed by #476
Closed

Use CFFI for the pixel access object #248

wiredfool opened this issue Jun 14, 2013 · 14 comments · Fixed by #476
Milestone

Comments

@wiredfool
Copy link
Member

Hoisted from #67, post-close, with additional brain dumping.

@fijal said:

If you need to call, call it with cffi (C functions directly) instead of C API. That would be much better and should be reasonably fast on pypy. Pure python is obviously better (you avoid function calls), but cffi is definitely next good, if the image does not export any way to get to the raw data. to be perfectly clear - I doubt there is no way to access raw data one way or another. The ideal way would be to get the raw data pointer and access it via cffi.

The crux that I see is that (IIRC) the data isn't necessarily in a continuous block. If it were, the raw data pointer would be easy.

The backbrain is thinking that the solution might be related to checking the strides from #224. If the data's not continuous, we can fall back to slow access.

If a cffi approach would work, we'd need to require cffi and libffi-dev, but it's really only necessary on pypy, and anyone doing this on pypy would likely be ok an additional dependency.

Docs are here: http://cffi.readthedocs.org/en/latest/index.html
It looks like this is what we'd need:

ffi.buffer(cdata, [size]): return a buffer object that references the raw C data pointed to by the given ‘cdata’, of ‘size’ bytes. The ‘cdata’ must be a pointer or an array.

So, Assuming that I can get a pointer from something like PyLong_FromVoidPtr, this should give higher performance access to the pixel_access_object.

@aclark4life
Copy link
Member

Cool 👍

@aclark4life
Copy link
Member

Is this a 2.2.0 or some other milestone target?

@aclark4life
Copy link
Member

@wiredfool Will this happen for 2.3.0?

@wiredfool
Copy link
Member Author

Maybe. I think I know what has to happen.

@aclark4life
Copy link
Member

Last call for 2.3.0!

@wiredfool
Copy link
Member Author

Ok, made some progress on this one. I'm able to read and write one pixel of an rgb image through cffi without segfaulting or going through the _imaging c API for the get pixel/put pixel part.

wiredfool added a commit to wiredfool/Pillow that referenced this issue Jan 5, 2014
@wiredfool
Copy link
Member Author

Still some undones, but it nominally works.

@aclark4life
Copy link
Member

Time to merge?

@wiredfool
Copy link
Member Author

No. This demonstrates that it's possible. It's not complete, one of the tests is failing, documentation is iffy at best, and it hasn't proved that's it's any faster than the old version either. I don't expect that it'll be slower, it just hasn't been tested.

@wiredfool
Copy link
Member Author

The crux on this was that the ffi.buffer call returns new memory that it owns, and duplicating the memory is a non-starter for the pixel access methods. The bit that I was missing is that you can use ffi.cast('foo *', a_pointer) and use the already allocated memory at a_pointer* as a foo.That gets us everything we need, so long as we can get the pointer out of the c-api side, and that can be done with a 'n' format in Py_BuildValue.

This patch can be cut down pretty dramatically, as we probably don't need most of the parts of the ImageMemoryInstance object, we're just using xsize, ysize, and one of the image8/image32 pointers. (and possibly the image pointer). We can avoid creating an ImagingMemoryInstance object at all, and just use the one pointer that we need.

@wiredfool
Copy link
Member Author

Benchmarks:
pypy2.2

(vpypy22)erics:~/Pillow$ python Tests/bench_cffi_access.py --installed
Size: 128x128
PyAccess - get: 0.0087 s
PyAccess - set: 0.0227 s
C-api - get: 0.1372 s
C-api - set: 0.2737 s

py2.7:

(vpy27)erics:~/Pillow$ python Tests/bench_cffi_access.py --installed
Size: 128x128
PyAccess - get: 0.0285 s
PyAccess - set: 0.0475 s
C-api - get: 0.0027 s
C-api - set: 0.0056 s
ok

So, a win for pypy, not so much for c-python.

@fijal
Copy link

fijal commented Jan 6, 2014

Those times are way too small for PyPy - please run it in a loop so the total is at least a second

@wiredfool
Copy link
Member Author

Ok, longer, running 5k iterations or 10 seconds:
pypy2.2

(vpypy22)erics:~/Pillow$ python Tests/bench_cffi_access.py --installed
Size: 128x128
PyAccess - get: 0.7001 s  0.000140 per iteration
PyAccess - set: 1.3735 s  0.000275 per iteration
C-api - get: breaking at 82 iterations, 0.122830 per iteration
C-api - set: breaking at 37 iterations, 0.273431 per iteration

py2.7

(vpy27)erics:~/Pillow$ python Tests/bench_cffi_access.py --installed
Size: 128x128
PyAccess - get: breaking at 454 iterations, 0.022029 per iteration
PyAccess - set: breaking at 217 iterations, 0.046280 per iteration
C-api - get: breaking at 3600 iterations, 0.002779 per iteration
C-api - set: breaking at 1757 iterations, 0.005694 per iteration

So, pypy is 2 OOM faster when running longer? I wonder if the JIT is optimizing something important out of the loop, since the entire benchmark reduces to a bunch of noops.

@wiredfool
Copy link
Member Author

And while I've got benchmarks on the brain, I'll note that test_image_point.py is really slow on pypy (2m6.9s) and not on cpython (.6 seconds).

radarhere pushed a commit to radarhere/Pillow that referenced this issue Sep 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants