Status: Resolved
Resolution: Fixed
Commons Compress 1.26.0 is needed to reproduce the failure. Any tar tgz will do.
Something seems to make iterating through the tar entries of multiple
TarArchiveInputStreams throw "Corrupted TAR archive":
    @Test
    void bla() {
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
            .mapToObj(_idx -> CompletableFuture.runAsync(
                () -> {
                    try (InputStream inputStream = this.getClass()
                            .getResourceAsStream("/<your favourite tar tgz>");
                         TarArchiveInputStream tarInputStream =
                            new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
                        TarArchiveEntry tarEntry;
                        while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
                            System.out.println("Reading entry %s with size %d"
                                .formatted(tarEntry.getName(), tarEntry.getSize()));
                        }
                    } catch (Exception ex) {
                        throw new RuntimeException(ex);
                    }
                }, executorService))
            .toList();
        Futures.getUnchecked(CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])));
    }
Although TarArchiveInputStream is marked as not thread-safe, I am not sharing objects between threads here. Each task creates its own stream objects, presumably each with its own position-tracking state.
The stack trace looks like:
    Caused by: Corrupted TAR archive.
        at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(
        at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(
        at
    Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
        at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(
        at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(
        at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(
        at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(
        ... 7 more
The code above shows that the header is occasionally wrong (the tar entry name contains gibberish), which makes me think that `getNextTarEntry()` may be faulty.
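For context on the message in the trace: byte 100 is ASCII `d`, and numeric tar header fields (size, mode, and so on) are stored as ASCII octal digits, so a 12-byte field holding `dddddddddddd` cannot be parsed. That is consistent with file content, or some other wrong block, being handed to the header parser. The sketch below illustrates the idea only. `OctalFieldSketch` and `parseOctalField` are hypothetical names, and this is not the actual `TarUtils.parseOctal` implementation.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; NOT the Commons Compress implementation. */
final class OctalFieldSketch {

    /** Parses an ASCII octal header field such as the 12-byte size field. */
    static long parseOctalField(byte[] buffer, int offset, int length) {
        int start = offset;
        int end = offset + length;
        // Skip leading spaces, which some tar writers use as padding.
        while (start < end && buffer[start] == ' ') {
            start++;
        }
        // Trim the trailing NUL or space terminators.
        while (end > start && (buffer[end - 1] == 0 || buffer[end - 1] == ' ')) {
            end--;
        }
        long result = 0;
        for (int i = start; i < end; i++) {
            byte b = buffer[i];
            if (b < '0' || b > '7') {
                // 'd' is byte 100, hence "Invalid byte 100 at offset 0".
                throw new IllegalArgumentException(
                    "Invalid byte " + b + " at offset " + (i - offset));
            }
            result = (result << 3) + (b - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        // A well-formed size field: 11 octal digits plus a NUL terminator.
        byte[] valid = "00000001750\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctalField(valid, 0, valid.length));

        // The field content reported in the stack trace.
        byte[] garbage = "dddddddddddd".getBytes(StandardCharsets.US_ASCII);
        try {
            parseOctalField(garbage, 0, garbage.length);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```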
Running the same code with Commons Compress 1.25.0 works as expected, so it's probably something added since November. Note that this is related to parallelism: using an executor service with a single thread doesn't trigger the error. The tgz to decompress doesn't really matter; a manually created one of a few KB is enough.
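If no archive is at hand, a small fixture can be generated with the JDK alone. The following is a minimal sketch that writes ustar-style headers by hand and gzips the result. `TinyTgzFixture`, its method names, the entry names, and the default `fixture.tgz` path are all hypothetical, and the output has not been verified against Commons Compress here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper that builds a tiny tgz for reproducing the issue. */
final class TinyTgzFixture {
    private static final int BLOCK = 512;

    static void putAscii(byte[] header, int offset, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, header, offset, bytes.length);
    }

    /** Writes zero-padded octal digits; the last byte of the field stays NUL. */
    static void putOctal(byte[] header, int offset, int length, long value) {
        putAscii(header, offset, String.format("%0" + (length - 1) + "o", value));
    }

    /** Builds one 512-byte ustar header for a regular file. */
    static byte[] header(String name, long size) {
        byte[] h = new byte[BLOCK];
        putAscii(h, 0, name);                    // name: offset 0, 100 bytes
        putOctal(h, 100, 8, 0644);               // mode
        putOctal(h, 108, 8, 0);                  // uid
        putOctal(h, 116, 8, 0);                  // gid
        putOctal(h, 124, 12, size);              // size
        putOctal(h, 136, 12, 0);                 // mtime
        Arrays.fill(h, 148, 156, (byte) ' ');    // checksum counts as spaces
        h[156] = '0';                            // typeflag: regular file
        putAscii(h, 257, "ustar");               // magic, NUL-terminated
        putAscii(h, 263, "00");                  // version
        int sum = 0;
        for (byte b : h) {
            sum += b & 0xFF;
        }
        // Checksum is stored as six octal digits, a NUL, then a space.
        putAscii(h, 148, String.format("%06o", sum));
        h[154] = 0;
        return h;
    }

    /** Returns a gzipped tar with the given number of fixed-size entries. */
    static byte[] tgz(int entries, int bytesPerEntry) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(out)) {
            for (int i = 0; i < entries; i++) {
                byte[] data = new byte[bytesPerEntry];
                Arrays.fill(data, (byte) ('a' + i % 26));
                gz.write(header("entry-" + i + ".txt", data.length));
                gz.write(data);
                // Pad the entry data to a multiple of the block size.
                gz.write(new byte[(BLOCK - data.length % BLOCK) % BLOCK]);
            }
            gz.write(new byte[2 * BLOCK]);       // end-of-archive marker
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path target = Path.of(args.length > 0 ? args[0] : "fixture.tgz");
        Files.write(target, tgz(8, 700));
    }
}
```

Placing the generated file on the test classpath and pointing `getResourceAsStream` at it should be enough to run the test above against both library versions.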