Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.26.0
None
None
Environment: Commons Compress 1.26.0 is required to reproduce the failure; any tar.gz (tgz) archive will do.
Description
Something in https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master seems to make iterating through the tar entries of multiple TarArchiveInputStreams in parallel throw `Corrupted TAR archive`:
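    // Imports assumed by the snippet (JUnit 5, Guava, Commons Compress, JDK)
    import java.io.InputStream;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.stream.IntStream;
    import java.util.zip.GZIPInputStream;

    import com.google.common.util.concurrent.Futures;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.junit.jupiter.api.Test;

    @Test
    void bla() {
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        // 200 independent tasks; each one opens its own stream chain over the same resource.
        List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
                .mapToObj(_idx -> CompletableFuture.runAsync(() -> {
                    try (InputStream inputStream = this.getClass()
                                 .getResourceAsStream("/<your favourite tar tgz>");
                         TarArchiveInputStream tarInputStream =
                                 new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
                        TarArchiveEntry tarEntry;
                        while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
                            System.out.println("Reading entry %s with size %d"
                                    .formatted(tarEntry.getName(), tarEntry.getSize()));
                        }
                    } catch (Exception ex) {
                        throw new RuntimeException(ex);
                    }
                }, executorService))
                .toList();
        // Wait for every task; a failure in any of them surfaces here.
        Futures.getUnchecked(CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])));
    }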
Although TarArchiveInputStream is documented as not thread-safe, I am not sharing any objects between threads here: each task creates its own InputStream, GZIPInputStream and TarArchiveInputStream, presumably each with its own position-tracking state.
The stack trace looks like this:
    Caused by: java.io.IOException: Corrupted TAR archive.
        at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
        at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
    Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
        at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
        at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
        at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
        at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
        ... 7 more
Running that code shows that the header is occasionally wrong (the printed tar entry names contain gibberish), which makes me think that `getNextTarEntry()` can be faulty.
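For context on the innermost cause: tar header numeric fields are stored as ASCII octal digits, and the 12-byte ones in the ustar layout are `size` and `mtime`. Byte 100 is ASCII `d`, which is not an octal digit, so a field reading `dddddddddddd` can only be rejected; that is consistent with the header bytes themselves being garbage. The sketch below uses a hypothetical `parseOctalField` written purely for illustration. It is not Commons Compress's `TarUtils.parseOctal`, although it enforces the same basic rule behind the `Invalid byte ... at offset ...` message.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified parser for one ASCII-octal tar header field.
// Illustration only; NOT Commons Compress's TarUtils.parseOctal.
final class OctalFieldDemo {

    static long parseOctalField(byte[] header, int offset, int length) {
        int start = offset;
        int end = offset + length;
        // Leading spaces are treated as padding.
        while (start < end && header[start] == ' ') {
            start++;
        }
        // Trailing NULs and spaces terminate the number.
        while (end > start && (header[end - 1] == 0 || header[end - 1] == ' ')) {
            end--;
        }
        long value = 0;
        for (int i = start; i < end; i++) {
            byte b = header[i];
            if (b < '0' || b > '7') {
                // 'd' is byte 100, so a field full of 'd' fails at its first byte.
                throw new IllegalArgumentException(
                        "Invalid byte " + b + " at offset " + (i - offset) + ", field length " + length);
            }
            value = (value << 3) + (b - '0'); // each octal digit contributes 3 bits
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] valid = "00000001750\0".getBytes(StandardCharsets.US_ASCII); // 12 bytes
        System.out.println(parseOctalField(valid, 0, valid.length)); // prints 1000
        byte[] garbage = "dddddddddddd".getBytes(StandardCharsets.US_ASCII); // 12 bytes
        try {
            parseOctalField(garbage, 0, garbage.length);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Invalid byte 100 at offset 0, field length 12
        }
    }
}
```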
Running the same code with Commons Compress 1.25.0 works as expected, so the regression was probably introduced by a change made since November. The problem is related to parallelism: with a single-threaded executor service the error does not occur. The tgz being decompressed doesn't really matter; a manually created one of a few KB is enough.
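If no suitable archive is at hand, one way to get a small one is to write a minimal POSIX ustar archive by hand and gzip it using only the JDK. The names `SmallTgz`, `header`, `tgz` and the output file `small.tgz` are hypothetical. The sketch fills only the header fields a reader needs (name, mode, uid, gid, size, mtime, checksum, typeflag, magic, version), pads each file's data to a 512-byte block and ends the archive with two zero blocks, without any extra record padding. Running `tar czf` over a few small files is an equally valid route.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: builds a tiny ustar archive by hand and gzips it.
final class SmallTgz {
    private static final int BLOCK = 512;

    // Zero-padded octal digits followed by one NUL, filling 'length' bytes.
    private static void putOctal(byte[] h, int offset, int length, long value) {
        String digits = Long.toOctalString(value);
        if (digits.length() > length - 1) {
            throw new IllegalArgumentException("value too large for field: " + value);
        }
        String padded = "0".repeat(length - 1 - digits.length()) + digits;
        System.arraycopy(padded.getBytes(StandardCharsets.US_ASCII), 0, h, offset, length - 1);
        h[offset + length - 1] = 0;
    }

    private static void putAscii(byte[] h, int offset, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(b, 0, h, offset, b.length);
    }

    static byte[] header(String name, long size) {
        if (name.length() > 100) {
            throw new IllegalArgumentException("name longer than 100 bytes");
        }
        byte[] h = new byte[BLOCK];
        putAscii(h, 0, name);          // name
        putOctal(h, 100, 8, 0644);     // mode
        putOctal(h, 108, 8, 0);        // uid
        putOctal(h, 116, 8, 0);        // gid
        putOctal(h, 124, 12, size);    // size
        putOctal(h, 136, 12, 0);       // mtime (fixed at the epoch)
        h[156] = '0';                  // typeflag: regular file
        putAscii(h, 257, "ustar\0");   // magic
        putAscii(h, 263, "00");        // version
        // The checksum is computed with its own 8-byte field filled with spaces.
        Arrays.fill(h, 148, 156, (byte) ' ');
        long sum = 0;
        for (byte b : h) {
            sum += b & 0xFF;
        }
        putOctal(h, 148, 7, sum);      // six digits + NUL; byte 155 stays a space
        return h;
    }

    static byte[] tgz(int entries, int bytesPerEntry) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            for (int e = 0; e < entries; e++) {
                byte[] data = new byte[bytesPerEntry];
                Arrays.fill(data, (byte) ('a' + e % 26));
                gz.write(header("file-" + e + ".txt", data.length));
                gz.write(data);
                gz.write(new byte[(BLOCK - data.length % BLOCK) % BLOCK]); // pad data to a full block
            }
            gz.write(new byte[2 * BLOCK]); // end of archive: two zero blocks
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Files.write(Path.of("small.tgz"), tgz(10, 300));
    }
}
```

With the values in `main`, the uncompressed archive is 10 × (512-byte header + 512 bytes of padded data) + 1024 bytes of end marker = 11264 bytes, which matches the "few KB" size mentioned above.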