Merge branch 'simd/info' into simd/4.3.x
homm committed Oct 4, 2017
2 parents d87f2d6 + b1127d9 commit 5049c5d
Showing 6 changed files with 239 additions and 72 deletions.
101 changes: 101 additions & 0 deletions CHANGES.SIMD.rst
@@ -0,0 +1,101 @@
Changelog (Pillow-SIMD)
=======================

4.3.0.post0
-----------

- Float-based filters, single-band: 3x3 SSE4, 5x5 SSE4
- Float-based filters, multi-band: 3x3 SSE4 & AVX2, 5x5 SSE4
- Int-based filters, multi-band: 3x3 SSE4 & AVX2, 5x5 SSE4 & AVX2
- Box blur: fast path for radius < 1
- Alpha composite: fast div approximation
- Color conversion: RGB to L SSE4, fast div in RGBa to RGBA
- Resampling: optimized coefficients loading
- Split and get_channel: SSE4

3.4.1.post1
-----------

- Critical memory error for some combinations of source/destination
  sizes is fixed.

3.4.1.post0
-----------

- A lot of optimizations in resampling, including 16-bit
  intermediate color representation and heavy loop unrolling.

3.3.2.post0
-----------

- Maintenance release

3.3.0.post2
-----------

- Fixed error in RGBa -> RGBA conversion

3.3.0.post1
-----------

Alpha compositing
~~~~~~~~~~~~~~~~~

- SSE4 and AVX2 fixed-point full loading implementation.
Up to 4.6x faster.

3.3.0.post0
-----------

Resampling
~~~~~~~~~~

- SSE4 and AVX2 fixed-point full loading horizontal pass.
- SSE4 and AVX2 fixed-point full loading vertical pass.

Conversion
~~~~~~~~~~

- RGBA -> RGBa SSE4 and AVX2 fixed-point full loading implementations.
Up to 2.6x faster.
- RGBa -> RGBA AVX2 implementation using gather instructions.
Up to 5x faster.


3.2.0.post3
-----------

Resampling
~~~~~~~~~~

- SSE4 and AVX2 float full loading horizontal pass.
- SSE4 float full loading vertical pass.


3.2.0.post2
-----------

Resampling
~~~~~~~~~~

- SSE4 and AVX2 float full loading horizontal pass.
- SSE4 float per-pixel loading vertical pass.


2.9.0.post1
-----------

Resampling
~~~~~~~~~~

- SSE4 and AVX2 float per-pixel loading horizontal pass.
- SSE4 float per-pixel loading vertical pass.
- SSE4: Up to 2x for downscaling. Up to 3.5x for upscaling.
- AVX2: Up to 2.7x for downscaling. Up to 3.5x for upscaling.


Box blur
~~~~~~~~

- Simple SSE4 fixed-point implementations with per-pixel loading.
- Up to 2.1x faster.
2 changes: 1 addition & 1 deletion PIL/version.py
@@ -1,2 +1,2 @@
 # Master version for Pillow
-__version__ = '4.3.0'
+__version__ = '4.3.0.post0'
6 changes: 6 additions & 0 deletions PyPI.rst
@@ -0,0 +1,6 @@

`Pillow-SIMD repo and readme <https://github.com/uploadcare/pillow-simd>`_

`Pillow-SIMD changelog <https://github.com/uploadcare/pillow-simd/blob/simd/3.4.x/CHANGES.SIMD.rst>`_

`Pillow documentation <https://pillow.readthedocs.io/>`_
126 changes: 126 additions & 0 deletions README.md
@@ -0,0 +1,126 @@
# Pillow-SIMD

Pillow-SIMD is "following" [Pillow][original-docs].
Pillow-SIMD versions are 100% compatible
drop-in replacements for Pillow of the same version.
For example, `Pillow-SIMD 3.2.0.post3` is a drop-in replacement for
`Pillow 3.2.0`, and `Pillow-SIMD 3.3.3.post0` — for `Pillow 3.3.3`.

For more information on the original Pillow, please refer to:
[read the documentation][original-docs],
[check the changelog][original-changelog] and
[find out how to contribute][original-contribute].


## Why SIMD

There are multiple ways to tweak image processing performance.
To name a few: using better algorithms, optimizing existing implementations,
or using more processing power and resources.
One great example of a more efficient algorithm is [replacing][gaussian-blur-changes]
a convolution-based Gaussian blur with a sequential-box one.

Such examples are rather rare, though. Certain workloads can also be sped up
by running the respective routines in parallel.
But a more practical key to optimization is making things work faster
with the resources already at hand, and SIMD computing is exactly that.

SIMD stands for "single instruction, multiple data" and its essence is
in performing the same operation on multiple data points simultaneously
by using multiple processing elements.
Common CPU SIMD instruction sets are MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON.

Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default) or AVX2 support.
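
As a loose illustration of the idea (and not Pillow-SIMD's actual implementation, which lives in C and uses SSE4/AVX2 intrinsics), here is a small Python sketch contrasting a scalar per-element loop with a vectorized NumPy operation; on most builds NumPy's kernels dispatch to SIMD instructions under the hood, so the second variant handles many data points per instruction:

```python
# Loose illustration of "one operation, many data points".
# This is NOT Pillow-SIMD's code; it only shows the idea from Python.
import numpy as np

pixels = np.random.randint(0, 256, size=100_000, dtype=np.uint16)

# Scalar approach: one data point per step.
brightened_scalar = np.empty_like(pixels)
for i in range(pixels.size):
    brightened_scalar[i] = min(pixels[i] + 16, 255)

# Vectorized approach: the same operation over the whole array at once;
# NumPy's kernels typically use SIMD instructions under the hood.
brightened_vector = np.minimum(pixels + 16, 255)

assert (brightened_scalar == brightened_vector).all()
```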


## Status

The Pillow-SIMD project is production-ready.
The project is supported by Uploadcare, a SaaS for cloud-based image storage and processing.

[![Uploadcare][uploadcare.logo]][uploadcare.com]

In fact, Uploadcare has been running Pillow-SIMD for about two years now.

The following image operations are currently SIMD-accelerated:

- Resize (convolution-based resampling): SSE4, AVX2
- Gaussian and box blur: SSE4
- Alpha composition: SSE4, AVX2
- RGBA → RGBa (alpha premultiplication): SSE4, AVX2
- RGBa → RGBA (division by alpha): AVX2

See [CHANGES](CHANGES.SIMD.rst) for more information.
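
Since Pillow-SIMD is a drop-in replacement, the accelerated operations above are reached through the regular Pillow API; a minimal sketch (file names are placeholders) could look like this:

```python
# Ordinary Pillow code: with pillow-simd installed, resizing, blurring
# and alpha compositing transparently use the SIMD-accelerated paths.
from PIL import Image, ImageFilter

im = Image.open('in.jpg')                      # placeholder file name
thumb = im.resize((640, 480), Image.LANCZOS)   # convolution-based resampling
blurred = thumb.filter(ImageFilter.GaussianBlur(radius=3))

fg = Image.new('RGBA', thumb.size, (255, 0, 0, 128))
composed = Image.alpha_composite(thumb.convert('RGBA'), fg)
composed.save('out.png')                       # placeholder file name
```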


## Benchmarks

Tons of tests can be found on the [Pillow Performance][pillow-perf-page] page.
There are benchmarks against different versions of Pillow and Pillow-SIMD
as well as ImageMagick, Skia, OpenCV and IPP.

The results show that for resizing, Pillow is always faster than ImageMagick,
and Pillow-SIMD, in turn, is faster than the original Pillow by a factor of 4-6.
In general, Pillow-SIMD with AVX2 is always **16 to 40 times faster** than
ImageMagick and outperforms Skia, the high-speed graphics library used in Chromium.


## Why Pillow itself is so fast

No cheats involved. We've used identical high-quality resize and blur methods for the benchmark.
Outcomes produced by different libraries are in almost pixel-perfect agreement.
The difference in measured rates comes solely from the performance of each implementation.


## Why Pillow-SIMD is even faster

Because of SIMD computing, of course. But there's more to it:
heavy loop unrolling and specific instructions that aren't available for scalar data types.


## Why not contribute SIMD to the original Pillow

Well, it's not that simple. First of all, the original Pillow supports
a large number of architectures, not just x86.
But even for x86 platforms, Pillow is often distributed via precompiled binaries.
In order for us to integrate SIMD into the precompiled binaries,
we'd need to perform runtime CPU capability checks.
To compile the code this way, we'd need to pass the `-mavx2` option to the compiler.
But with that option enabled, the compiler will emit AVX instructions even
for SSE functions (i.e. interchange them), since every SSE instruction has an AVX equivalent.
So there is no easy way to compile such a library, especially with setuptools.
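
For illustration only, here is a hedged sketch of the kind of runtime capability check that would be needed; Pillow-SIMD does not ship anything like this, and the snippet simply reads `/proc/cpuinfo` on Linux:

```python
# Illustrative only: a crude runtime check for SSE4/AVX2 support on Linux.
# Pillow-SIMD does not do this; the point is that dispatching between
# instruction sets would have to happen at runtime in a universal binary.
def cpu_flags():
    try:
        with open('/proc/cpuinfo') as f:
            for line in f:
                if line.startswith('flags'):
                    return set(line.split(':', 1)[1].split())
    except OSError:
        pass
    return set()

flags = cpu_flags()
print('SSE4.1 supported:', 'sse4_1' in flags)
print('AVX2 supported:  ', 'avx2' in flags)
```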


## Installation

If there's a copy of the original Pillow installed, it has to be removed first
with `$ pip uninstall -y pillow`.
The installation itself is as simple as running `$ pip install pillow-simd`,
and if you're using an SSE4-capable CPU, everything should run smoothly.
If you'd like to install the AVX2-enabled version,
you need to pass an additional flag to the C compiler.
The easiest way to do so is to define the `CC` variable during compilation.

```bash
$ pip uninstall pillow
$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
```
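
Once installed, a quick sanity check can confirm that Pillow-SIMD (rather than stock Pillow) is active; this sketch relies on the `PIL/version.py` module shown in the diff above and on the `.postN` suffix that Pillow-SIMD adds to the version string:

```python
# Quick post-install sanity check: Pillow-SIMD versions carry a ".postN"
# suffix (e.g. "4.3.0.post0"), while stock Pillow versions do not.
from PIL import version

print(version.__version__)
if '.post' not in version.__version__:
    print('Warning: this looks like stock Pillow, not Pillow-SIMD')
```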


## Contributing to Pillow-SIMD

Please be aware that Pillow-SIMD and Pillow are two separate projects.
Please submit bugs and improvements not related to SIMD to the [original Pillow][original-issues].
All bugfixes to the original Pillow will then be transferred to the next Pillow-SIMD version automatically.


[original-homepage]: https://python-pillow.org/
[original-docs]: https://pillow.readthedocs.io/
[original-issues]: https://github.com/python-pillow/Pillow/issues/new
[original-changelog]: https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst
[original-contribute]: https://github.com/python-pillow/Pillow/blob/master/.github/CONTRIBUTING.md
[gaussian-blur-changes]: https://pillow.readthedocs.io/en/3.2.x/releasenotes/2.7.0.html#gaussian-blur-and-unsharp-mask
[pillow-perf-page]: https://python-pillow.org/pillow-perf/
[pillow-perf-repo]: https://github.com/python-pillow/pillow-perf
[uploadcare.com]: https://uploadcare.com/?utm_source=github&utm_medium=description&utm_campaign=pillow-simd
[uploadcare.logo]: https://ucarecdn.com/dc4b8363-e89f-402f-8ea8-ce606664069c/-/preview/
67 changes: 0 additions & 67 deletions README.rst

This file was deleted.

9 changes: 5 additions & 4 deletions setup.py
@@ -113,7 +113,7 @@ def get_version():
# pypy emits an oserror
_tkinter = None

-NAME = 'Pillow'
+NAME = 'Pillow-SIMD'
PILLOW_VERSION = get_version()
JPEG_ROOT = None
JPEG2K_ROOT = None
@@ -618,7 +618,8 @@ def build_extensions(self):
exts = [(Extension("PIL._imaging",
files,
libraries=libs,
-define_macros=defs))]
+define_macros=defs,
+extra_compile_args=['-msse4']))]

#
# additional libraries
@@ -754,10 +755,10 @@ def debug_build():
setup(name=NAME,
version=PILLOW_VERSION,
description='Python Imaging Library (Fork)',
-long_description=_read('README.rst').decode('utf-8'),
+long_description=_read('PyPI.rst').decode('utf-8'),
author='Alex Clark (Fork Author)',
author_email='aclark@aclark.net',
-url='https://python-pillow.org',
+url='https://github.com/uploadcare/pillow-simd',
classifiers=[
"Development Status :: 6 - Mature",
"Topic :: Multimedia :: Graphics",
