
test, demonstrate, and document using zlib-ng for faster (but backward compatible) compression/decompression #2022

Closed
edwardhartnett opened this issue Jun 26, 2021 · 17 comments

@edwardhartnett
Contributor

edwardhartnett commented Jun 26, 2021

This is part of #1545.

There is a new zlib library, zlib-ng: https://github.com/zlib-ng/zlib-ng

When correctly configured, it is a drop-in replacement for zlib, but much faster, and (so they say) fully backward compatible. That is, we can compress data with zlib-ng and the result can still be read by existing zlib releases (though reading is also faster if zlib-ng is used on that side as well).

This should therefore work transparently in netcdf-java.

The goal is to make whatever modifications are needed in the CMake and autotools build systems to support zlib-ng. It will also need to be tested.

I believe few, if any, changes will be necessary, but of course this needs to be thoroughly tested. Then it needs to be explained to netCDF users, and the results demonstrated.

@WardF
Member

WardF commented Jun 26, 2021

Interesting! I wasn’t aware of this library/replacement for zlib. I'm occupied with family stuff but will dig into this.

@edwardhartnett
Contributor Author

This plus #1548 could be a real game-changer for large data producers, speeding write times by an order of magnitude, while increasing compression dramatically.

@gsjaardema
Contributor

I've been using it for several weeks and it is definitely faster. I didn't have to change anything in HDF5 or NetCDF to use it.

@edwardhartnett changed the title from "modify build systems and add tests to take advantage of zlib-ng for faster (but backward compatible) compression/decompression" to "test, demonstrate, and document using zlib-ng for faster (but backward compatible) compression/decompression" on Jun 26, 2021
@DennisHeimbigner
Collaborator

Greg, how did you do that? Did you just rename the .so file?

@DennisHeimbigner
Collaborator

BTW thanks Ed for finding all these filter improvements. Really appreciated.

@edwardhartnett
Contributor Author

(image attached)

@gsjaardema
Contributor

gsjaardema commented Jun 28, 2021

@DennisHeimbigner The zlib-ng build process has a -DZLIB_COMPAT=YES CMake option which causes it to install its library as libz:

-rwxr-xr-x  1 gdsjaar  1049671531   157608 Jun 21 16:27 ../lib/libz.1.2.11.zlib-ng.dylib
lrwxr-xr-x  1 gdsjaar  1049671531       25 Jun 21 16:27 ../lib/libz.1.dylib -> libz.1.2.11.zlib-ng.dylib
-rw-r--r--  1 gdsjaar  1049671531   163352 Jun 21 16:27 ../lib/libz.a
lrwxr-xr-x  1 gdsjaar  1049671531       12 Jun 21 16:27 ../lib/libz.dylib -> libz.1.dylib

So all that needs to be done is to point to the correct directory, and everything works as it should (just faster).
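
For anyone who wants to try this, here is a minimal sketch of a zlib-ng build in zlib-compatible mode. The install prefix and job count are placeholders; see the zlib-ng README for the authoritative instructions:

# Build zlib-ng so it installs as a zlib-compatible libz (paths are placeholders)
git clone https://github.com/zlib-ng/zlib-ng.git
cd zlib-ng && mkdir build && cd build
cmake .. -DZLIB_COMPAT=YES -DCMAKE_INSTALL_PREFIX=/path/to/zlib-ng-install
make -j4
make install   # installs libz.* symlinks pointing at the zlib-ng library, as shown in the listing above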

@edwardhartnett
Contributor Author

@gsjaardema do you know how fast?

Also @gsjaardema, @DennisHeimbigner, @WardF: is anyone interested in co-authoring a paper about this for AGU this year? We need to get this news out to the peeps, and an AGU extended abstract and poster seem like a great way to do it.

What I would love to do is implement bit-grooming, and then demonstrate the value of both bit-grooming and zlib-ng in a poster/paper which can then be distributed to anyone who is interested...

@edwardhartnett
Contributor Author

(@gsjaardema send me your email and I will send draft abstract...)

@gsjaardema
Contributor

These are some very quick test results.

old = system zlib, new = zlib-ng; c0 = no compression, c1 and c9 = compression levels 1 and 9.

Time is seconds to write a 100,000,000-element generated mesh. Note that for zlib-ng the compressed write time is less than the uncompressed write time (repeatable). Admittedly this is an easily compressed generated mesh (very regular), but I also get speedups on realistic meshes with output variables.

Compressed read speed is also faster.

  • old c0: 28.872
  • old c1: 34.475 (5.603 slower than c0)
  • old c9: 52.212 (23.34 slower than c0)
  • new c0: 29.828
  • new c1: 24.007 (5.821 faster than c0; 10.468 faster than old c1)
  • new c9: 30.521 (0.693 slower than c0; 21.691 faster than old c9)
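
For reference, here is a minimal sketch of the kind of compressed-write timing test described above, using the standard netCDF-C API (nc_def_var_deflate). The file name, dimensions, and deflate level are illustrative only, not the exact benchmark behind the numbers above; whether libz resolves to zlib or zlib-ng is decided entirely at link/load time, with no source changes.

/* Sketch of a compressed-write timing test. Error checking is omitted for
 * brevity; a real benchmark should check every return code and use
 * wall-clock timing rather than CPU time. */
#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NX 10000
#define NY 10000

int main(void)
{
    int ncid, dimids[2], varid;
    size_t i, n = (size_t)NX * NY;
    float *data = malloc(n * sizeof(float));
    for (i = 0; i < n; i++)
        data[i] = (float)(i % 100);           /* regular, easily compressed pattern */

    clock_t start = clock();
    nc_create("perf_test.nc", NC_CLOBBER | NC_NETCDF4, &ncid);
    nc_def_dim(ncid, "x", NX, &dimids[0]);
    nc_def_dim(ncid, "y", NY, &dimids[1]);
    nc_def_var(ncid, "data", NC_FLOAT, 2, dimids, &varid);
    nc_def_var_deflate(ncid, varid, 0, 1, 1); /* shuffle off, deflate on, level 1 */
    nc_put_var_float(ncid, varid, data);
    nc_close(ncid);
    printf("compressed write (CPU time): %.3f s\n",
           (double)(clock() - start) / CLOCKS_PER_SEC);

    free(data);
    return 0;
}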

@gsjaardema
Contributor

@edwardhartnett For email, just add @gmail.com to my username...

@WardF
Member

WardF commented Jun 28, 2021

Interesting; @gsjaardema, thanks for the information. We can update the documentation re: configuring the new library so that it acts as a drop-in replacement, and it's nice we won't have to make modifications to the build systems. @edwardhartnett I'd be interested in contributing, sure. I agree this is information we need to get out there.

@edwardhartnett
Contributor Author

edwardhartnett commented Jun 30, 2021

@gsjaardema awesome results!

The speedup when writing compressed data is no doubt because the compression itself is now so much faster, and because there is less data to write, which further reduces write time. As a result, writing compressed data is faster than writing uncompressed data.

NOAA will be DELIGHTED with these results! Other large data producers, like NASA and ESA, will be similarly happy.

(image attached)

@dopplershift
Member

@gsjaardema What kind of storage were you writing to? (e.g. spinning disk, ssd, NVMe, network, etc.)

@gsjaardema
Contributor

@dopplershift It was spinning disk.

@dopplershift
Member

Thanks. That implies the numbers above, while awesome, are definitely a best-case scenario (with respect to compressed writing being faster, anyway).

@edwardhartnett
Contributor Author

edwardhartnett commented Jan 1, 2022

OK, I think this issue can be closed.

zlib-ng works well and does indeed function as a drop-in replacement for zlib. We have tested this on several HPC systems, with the UFS and other software, as well as with dedicated tests in the netcdf-c nc_perf directory.

zlib-ng produces data that can be read easily and transparently by zlib; in other words, it is fully backward compatible. Data producers can use zlib-ng even if their users are still using zlib.
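
For data producers who want to try this, here is one way to point an existing netcdf-c build at a zlib-ng install. This is a sketch: the paths are placeholders, and the exact flags depend on your build setup.

# Autotools: tell configure where the zlib-compatible zlib-ng lives
CPPFLAGS="-I/path/to/zlib-ng-install/include" \
LDFLAGS="-L/path/to/zlib-ng-install/lib" \
./configure

# CMake: recent CMake's FindZLIB honors ZLIB_ROOT as a search hint
cmake .. -DZLIB_ROOT=/path/to/zlib-ng-install

# Or swap the library in at run time (Linux), with no rebuild at all
export LD_LIBRARY_PATH=/path/to/zlib-ng-install/lib:$LD_LIBRARY_PATH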

Here's a typical performance chart comparing zlib to other compression methods. Notice zlib-ng is about twice as fast as zlib:

(performance chart: zlib-ng is roughly twice as fast as zlib, shown alongside other compression methods)

I recommend that all data producers switch to zlib-ng for better performance, with zero code changes. This will be our recommendation to the NOAA UFS team, and I suspect they will approve of this performance upgrade.

For more detail on our recent compression studies, see our AGU extended abstract: https://www.researchgate.net/publication/357001251_Quantization_and_Next-Generation_Zlib_Compression_for_Fully_Backward-Compatible_Faster_and_More_Effective_Data_Compression_in_NetCDF_Files

This paper from Nature Computational Science is also highly relevant and interesting: Milan Klöwer, Miha Razinger, Juan J. Dominguez, Peter D. Düben & Tim N. Palmer, "Compressing atmospheric data into its real information content": https://www.nature.com/articles/s43588-021-00156-2
