Details
Type: Wish
Status: Open
Priority: Minor
Resolution: Unresolved
1.22
None
None
Environment: Java 11.0.2 (OpenJDK)
Tested on both Windows 10 and Linux (Ubuntu 20.04)
Description
Hello,
I've been using the Snappy format to quickly compress/decompress JSON files, via the FramedSnappyCompressorOutputStream and FramedSnappyCompressorInputStream provided by Apache Commons Compress, since I already had several dependencies on that library.
Although compression/decompression works correctly for every file, users started reporting performance issues with large files.
Upon testing, the performance of these streams was very underwhelming.
Decompressing a 90 MB .json.sz file (1.5 GB decompressed .json) took 2 minutes, which is far from the expected performance of a Snappy stream, which "[...] does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.".
Switching to xerial/snappy-java's framed I/O streams reduced compression/decompression times by two orders of magnitude.
Running the same code from the provided Tools.java through a Maven command took 1.5 s after replacing the stream implementation with org.xerial.snappy.SnappyFramedInputStream, versus a consistent 125+ s with FramedSnappyCompressorInputStream.
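Since Tools.java is not reproduced here, the comparison above can be approximated with a minimal timing harness that uses only the JDK. This is a sketch under stated assumptions: the class name `DecompressionBenchmark` and its `drain` method are hypothetical, not part of either library. The harness reads any `InputStream` to EOF in 64 KiB chunks and reports the bytes produced, the elapsed wall-clock time, and the throughput, so the same measurement can be applied to either decompressor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: drains a (decompressing) stream and reports throughput. */
public final class DecompressionBenchmark {

    /** Result of draining one stream: bytes produced and elapsed wall-clock time. */
    public static final class Result {
        public final long bytes;
        public final long nanos;

        Result(long bytes, long nanos) {
            this.bytes = bytes;
            this.nanos = nanos;
        }

        /** Throughput in MB/s (1 MB = 1,000,000 bytes); 0 if no time was measured. */
        public double megabytesPerSecond() {
            return nanos == 0 ? 0.0 : (bytes / 1e6) / (nanos / 1e9);
        }
    }

    private DecompressionBenchmark() {}

    /** Reads {@code in} to EOF in 64 KiB chunks, closes it, and times the whole read. */
    public static Result drain(InputStream in) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        long start = System.nanoTime();
        try (InputStream s = in) {
            int n;
            while ((n = s.read(buffer)) != -1) {
                total += n;
            }
        }
        return new Result(total, System.nanoTime() - start);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input: replace with a real decompressing stream to benchmark it.
        Result r = drain(new ByteArrayInputStream(new byte[1_000_000]));
        System.out.printf("%d bytes in %.3f ms (%.1f MB/s)%n",
                r.bytes, r.nanos / 1e6, r.megabytesPerSecond());
    }
}
```

To compare the two implementations, pass `new FramedSnappyCompressorInputStream(new BufferedInputStream(Files.newInputStream(path)))` and then `new SnappyFramedInputStream(new BufferedInputStream(Files.newInputStream(path)))` to `drain`, with `path` pointing at the same `.json.sz` file (this assumes commons-compress and snappy-java are on the classpath). Reading in large chunks keeps the caller's read pattern from dominating the measurement. A single run is only indicative, so repeating it a few times after JVM warm-up should give more stable numbers.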
Since it's not a bug, I'm not flagging this ticket as one, but it makes using the Apache Commons Compress library pointless for that format, and even counter-productive.
Performance on par with other implementations, or deprecation of the decompressor, would be greatly appreciated.
I tried to upload the aforementioned file, but Jira rejects it because the direct upload limit is 60 MB. I should, however, be able to provide a file of around 40 MB if necessary.
Best Regards,
Mehdi Ennaïme