
[PR] Add support for native memory compression and decompression #311

Open · wants to merge 30 commits into base: master
Conversation

VladRodionov
Copy link

This PR introduces support for handling native memory buffers that are allocated using the sun.misc.Unsafe.allocateMemory API. With this update, it is now possible to compress and decompress data between two native memory buffers, as well as transfer data from a byte array to native memory and vice versa.
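For context, a minimal sketch of the kind of buffers this PR targets. It does not show the new zstd-jni methods themselves (their names are not visible in this thread); it only illustrates, with standard sun.misc.Unsafe calls, how data moves between a heap byte array and a natively allocated buffer. The reflective lookup of theUnsafe is a common idiom, shown here as an assumption, not part of this PR.

```java
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import sun.misc.Unsafe;

public class NativeBufferSketch {
    // Illustrative only: obtain the Unsafe singleton reflectively.
    static final Unsafe U;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Copy a heap byte array into a freshly allocated native buffer;
    // the caller owns the returned address and must freeMemory() it.
    static long toNative(byte[] src) {
        long addr = U.allocateMemory(src.length);
        U.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET, null, addr, src.length);
        return addr;
    }

    // Copy a native buffer back into a heap byte array.
    static byte[] fromNative(long addr, int len) {
        byte[] dst = new byte[len];
        U.copyMemory(null, addr, dst, Unsafe.ARRAY_BYTE_BASE_OFFSET, len);
        return dst;
    }

    public static void main(String[] args) {
        byte[] payload = "hello off-heap".getBytes(StandardCharsets.UTF_8);
        long addr = toNative(payload);
        byte[] round = fromNative(addr, payload.length);
        U.freeMemory(addr);
        System.out.println(new String(round, StandardCharsets.UTF_8)); // prints "hello off-heap"
    }
}
```

The PR's contribution, as described above, is letting zstd-jni compress/decompress directly between two such native addresses, or between a byte array and a native address, without an intermediate heap copy.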

@VladRodionov
Author

This feature is essential for any application that works with off-heap memory directly.

@VladRodionov
Author

Will add unit tests.

Owner

@luben luben left a comment


LGTM. Left some nits on the code. Can you add some tests?

Comment on lines +484 to +485
char *dst_buff = (char *) dst;
char *src_buff = (char *) src;
Owner


this should be void *

Comment on lines +477 to +478
if (NULL == (void *) dst) return -ZSTD_error_memory_allocation;
if (NULL == (void *) src) return -ZSTD_error_memory_allocation;
Owner


Maybe move these below, after you cast dst to dst_buff.

Comment on lines +839 to +840
char *dst_buff = (char *) dst;
char *src_buff = (char *) src;
Owner


void *

Comment on lines +832 to +833
if (NULL == (void *) dst) return -ZSTD_error_memory_allocation;
if (NULL == (void *) src) return -ZSTD_error_memory_allocation;
Owner


move below


codecov bot commented May 3, 2024

Codecov Report

Attention: Patch coverage is 0% with 54 lines in your changes missing coverage. Please review.

Project coverage is 57.88%. Comparing base (c76455c) to head (b904897).
Report is 10 commits behind head on master.

Current head b904897 differs from pull request most recent head c75f02f

Please upload reports for the commit c75f02f to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #311      +/-   ##
============================================
- Coverage     60.01%   57.88%   -2.13%     
- Complexity      308      312       +4     
============================================
  Files            26       26              
  Lines          1473     1541      +68     
  Branches        170      186      +16     
============================================
+ Hits            884      892       +8     
- Misses          434      494      +60     
  Partials        155      155              

@VladRodionov
Author

Sure, will add tests this weekend. Thank you for the review, @luben.

@@ -0,0 +1,27 @@
#!/bin/bash

Owner


Can you make this filename more descriptive and put a comment on top? Or just don't include it in the commit?

Author

@VladRodionov VladRodionov May 30, 2024


I will remove this file from the commit. I had tried to add tests for this feature, but it turned out not to be a straightforward task. I had to enable sun.misc.Unsafe access to be able to allocate native memory, and Scala (sbt) just makes this impossible (at least for me). I have zero experience with Scala and its tooling. In Java 9+ access to this class is restricted by default, so you have to specify additional command-line args:
java --add-opens jdk.unsupported/sun.misc=ALL-UNNAMED --add-opens java.base/jdk.internal.misc=ALL-UNNAMED

For some reason this does not work with sbt. Probably I did something wrong.
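As a debugging aid (my own sketch, not something from this thread): a tiny standalone probe that reports whether sun.misc.Unsafe is reachable under the current JVM settings. Running it once via plain `java` and once via `sbt run`/`sbt test` would isolate whether the flags or sbt's test JVM is the variable. Note the reflective lookup below often works even without --add-opens on JDK 9+, because the jdk.unsupported module exposes sun.misc.

```java
import java.lang.reflect.Field;

public class UnsafeAccessCheck {
    // Returns true if sun.misc.Unsafe can be obtained reflectively
    // under the current JVM's module/access settings.
    static boolean unsafeReachable() {
        try {
            Class<?> cls = Class.forName("sun.misc.Unsafe");
            Field f = cls.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return f.get(null) != null;
        } catch (Throwable t) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("sun.misc.Unsafe reachable: " + unsafeReachable());
    }
}
```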

Owner


Can you try adding these options to https://github.com/luben/zstd-jni/blob/master/build.sbt#L39? I just pushed a change to run the tests in a forked JVM, so these options will apply.

Author

@VladRodionov VladRodionov Jun 1, 2024


I recently moved the repo fork to a new owner (organization): https://github.com/carrotdata/zstd-jni/tree/master. The current PR is in some kind of zombie state after that. I am going to close this PR and open a new one, @luben. What do you think?

Author


install-jar.sh has been removed.

Owner


I recently moved the repo fork to a new owner (organization): https://github.com/carrotdata/zstd-jni/tree/master. The current PR is in some kind of zombie state after that. I am going to close this PR and open a new one, @luben. What do you think?

Sure, we can continue on a new PR

Author

@VladRodionov VladRodionov Jun 1, 2024


It looks like we can keep the old one. I removed install-jar.sh and rebased onto luben:master.

Owner

@luben luben Jun 18, 2024


I just pushed a change to run the tests in a forked JVM so these options will apply.

Hmm, it was not so easy: forking broke the builds on Windows and macOS, not sure why. So I reverted that.

@luben
Owner

luben commented Jun 5, 2024

These binary files should not be checked into git; I rebuild them on each supported platform for each release.

@VladRodionov
Author

Files have been removed.

@joakime

joakime commented Aug 5, 2024

Basing anything off sun.misc.Unsafe behavior is not a good idea anymore. It has been deprecated since 2006.
There are 2 active JEPs that are almost done with their implementations and rollout in OpenJDK.

@VladRodionov
Author

It's a long way to go until all Java code with direct sun.misc.Unsafe access is ported to JDK 21+ (Java FFM); meanwhile we need to support at least JDK 11. Performance-wise, Unsafe is still the champion, at least for direct memory access.

@joakime

joakime commented Aug 5, 2024

Performance-wise, Unsafe no longer wins.
Eclipse Jetty removed Unsafe a few years ago, and the various performance metrics have improved.

@VladRodionov
Author

Jetty? Can it handle 500K+ RPS out of the box? I really doubt it :). FFM is finally on par with JNI, or slightly better, but for direct memory access and manipulation of bits and bytes outside of the Java heap, Unsafe is the champ. And you missed my requirements: JDK 11+ support (actually Java 8+).
The Java 2024 report says almost 30% are still using Java 8; the rest are on Java 11 and Java 17, all of which lack FFM support.

@joakime

joakime commented Aug 6, 2024

500K+ requests per second is not hard to do.
You have to be mindful of network saturation with regard to request/response size and optional HTTP details.

This has been done on official releases of Eclipse Jetty 10, Jetty 11, and Jetty 12 servers (none of which have Unsafe operations anymore).

The setup is as follows ...

  • The requests themselves should be sized so as not to overload whatever network limits you have (so no testing of 1MB payloads!). I usually just pick a small one-line quote from somebody famous (like Mark Twain) as the payload of the response (or request, depending on what I'm testing).
  • Test the server on a physical machine (not a VM, Docker, or cloud).
  • Use a sufficient number of separate physical clients to generate the load (about 5 clients per server being tested to start; scale up the number of clients slowly based on network saturation at the server and switch, which in my usual testing at home with my equipment is about 320MB/s).
  • Clients MUST be on separate machines; the size, quality, and speed of the machines is mostly irrelevant.
  • Clients MUST read/write fully and follow the HTTP/1.1 spec.
  • Clients SHOULD use HTTP/1.1 with persistent connections (this turns into a MUST for TLS).
  • Clients SHOULD be configured for minimum headers (e.g. drop the Accept and User-Agent headers; if you can end up with just a method, path, and host, that's the optimal setup for raw requests per second).
  • Disable server logging (even for console).
  • The default server configuration of the http and server modules is usually sufficient to hit 400K/second. You can cross the 500K/second threshold by turning off various HTTP features (example: turn off production of the Server and Date headers).
  • On rare occasions, depending on the payloads, the max threads should be bumped up.
  • Server-side request/response exchange handling should not have a code path that depends on filesystem I/O (far too slow for this kind of testing).

This setup results in sub-50-byte requests and sub-200-byte responses (or about 120 bytes per request on the network, and 280 bytes per response on the network), which is only really useful for load testing the server for requests-per-second and latency metrics.

When I monitor with 1 client (with something like Wireshark) to confirm its setup, I'm looking at the total bytes on the network, wanting something under 400 bytes per request/response exchange and no FIN (we should be using persistent connections).

Hitting 510K requests per second is very attainable on a 10GbE network against a Jetty server with a decent network interface (some crappy 10GbE interfaces cannot get close to even 20% saturation).

Java 8 went EOSL in many contexts already (e.g. Google Cloud dropped it in Jan 2024).
Many Java 11 providers have it going EOSL at the end of this year too (e.g. Red Hat in October, Google in December).

@VladRodionov
Author

VladRodionov commented Aug 6, 2024

https://medium.com/deno-the-complete-reference/netty-vs-jetty-hello-world-performance-e9ce990a9294

Far from 510K RPS. Maybe it's attainable, maybe it's not. Not, I presume. Any Java network server that utilizes any type of thread pool executor will be handicapped by significant thread context switch overhead. You are free to share links confirming that 510K RPS is attainable for Jetty. I have not managed to find any proof of that statement; quite the contrary, I found many benchmarks with very abysmal performance and latency numbers. I am the developer of Memcarrot, a memcached-compatible caching server written in Java with a heavy dosage of sun.misc.Unsafe. All memory management is manual (malloc(), free()). The server can run in less than 100MB of Java heap while storing hundreds of millions of cached objects. Below are yesterday's test results (the standard testing tool memtier_benchmark was used):

parallels@ubuntu-linux-22-04-02-desktop:~$ memtier_benchmark -p 11211 -P memcache_text --test-time=100
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%, 100 secs]  0 threads:    53594005 ops,  535766 (avg:  535922) ops/sec, 15.35MB/sec (avg: 15.31MB/sec),  0.37 (avg:  0.37) msec latency

4         Threads
50        Connections per thread
100       Seconds


ALL STATS
============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets        48721.12          ---          ---         0.37491         0.35900         0.66300         0.93500      3325.15 
Gets       487201.32       634.00    486567.32         0.37314         0.35900         0.66300         0.94300     12355.39 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals     535922.44       634.00    486567.32         0.37331         0.35900         0.66300         0.94300     15680.54 

This is 535K RPS with p99.9 latency less than 1ms. These numbers are within 5% of native memcached. The tests were run on a Mac Studio M1 (64GB RAM).

Other benchmark results (memory consumption, surprise, surprise) are here:
https://github.com/carrotdata/membench

Memcarrot will be released next week. sun.misc.Unsafe made it possible. This is why we need direct access to off-heap memory, and I am not sure the code can be rewritten with the FFM API.
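To make the manual malloc/free style described above concrete, here is a hedged illustration (my own sketch, not Memcarrot's actual code, and all names are hypothetical): a length-prefixed record stored entirely off the Java heap via sun.misc.Unsafe, with the caller responsible for freeing it.

```java
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import sun.misc.Unsafe;

public class OffHeapRecord {
    private static final Unsafe U;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // "malloc": store a 4-byte length header followed by the payload.
    static long store(byte[] value) {
        long addr = U.allocateMemory(4L + value.length);
        U.putInt(addr, value.length);
        U.copyMemory(value, Unsafe.ARRAY_BYTE_BASE_OFFSET, null, addr + 4, value.length);
        return addr;
    }

    // Read the record back onto the heap.
    static byte[] load(long addr) {
        int len = U.getInt(addr);
        byte[] out = new byte[len];
        U.copyMemory(null, addr + 4, out, Unsafe.ARRAY_BYTE_BASE_OFFSET, len);
        return out;
    }

    // "free": every record must be released explicitly; the GC never sees it.
    static void free(long addr) {
        U.freeMemory(addr);
    }

    public static void main(String[] args) {
        long addr = store("cached-value".getBytes(StandardCharsets.UTF_8));
        String back = new String(load(addr), StandardCharsets.UTF_8);
        free(addr);
        System.out.println(back); // prints "cached-value"
    }
}
```

Because records live outside the heap, only the (small) bookkeeping structures occupy Java heap, which is consistent with the small-heap figures quoted above.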

@joakime

joakime commented Aug 6, 2024

https://medium.com/deno-the-complete-reference/netty-vs-jetty-hello-world-performance-e9ce990a9294

An unconfigured Jetty, tested on the same machine: that person just tested the performance of their localhost network stack, nothing else. That is a horrible set of tests and doesn't measure the performance of Jetty.
They used jetty-maven-plugin:run, which by its configuration is focused on developer needs, not performance.
The configuration they used also did nothing to tune the HTTP exchange.
I bet their Jetty server was barely being used; they simply couldn't generate enough load (a super common scenario when attempting to load test on the same machine).

Any Java network server which utilizes any type of thread pool executors will be handicapped due to significant thread context switch overhead.

Jetty doesn't use the native JVM thread pool executors; it has its own, plus an EatWhatYouKill execution model that minimizes thread context switching. We even see improvements in CPU caching with this model.

When we participated in the TechEmpower benchmarks years ago (back in the Jetty 10.0.0 days) we were consistently in the top 5%, and once we learned the tricks of those above us we could easily get into the top 3%, but those tricks did not represent real-world scenarios.

@joakime

joakime commented Aug 6, 2024

I am the developer of Memcarrot, a memcached-compatible caching server written in Java with a heavy dosage of sun.misc.Unsafe. All memory management is manual (malloc(), free()). The server can run in less than 100MB of Java heap while storing hundreds of millions of cached objects. Below are yesterday's test results (the standard testing tool memtier_benchmark was used):

Congrats, that's a really fantastic outcome.

Anyway, this has devolved into a totally different set of arguments.
Do what you want. It is your repo after all.

Eclipse Jetty just has to monitor how the new JVMs react to our usage of the current state of zstd-jni. (So far it looks like we have to, at a minimum, document the demands that zstd-jni puts on ByteBufferPool implementations, and the JVM command-line switches necessary to allow zstd-jni to function.)

Owner


Please don't put binaries in the source code.
