Uploaded image for project: 'Parquet'
  1. Parquet
  2. PARQUET-408

Shutdown hook in parquet-avro library corrupts data and disables logging

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 1.8.1
    • None
    • parquet-avro
    • None

    Description

      Parquet-avro and probably also other Parquet libraries are not well behaved. It registers a shutdown hook that bypasses application shutdown sequence, corrupts data written to currently opened Parquet file(s) and disables or reconfigures slf4j/logback logger so no further log message is visible.

      Scope

      Our application is a microservice that handles stop request in form of signal SIGTERM, resp. JVM shutdown hook. If it arrives the application will close all opened files (writers), release all other resources and gracefully shutdown. We are swiching from sequence files to Parquet at the moment and using Maven dependency org.apache.parquet:parquet-avro:1.8.1 which is current latest version. We are using Runtime.getRuntime().addShutdownHook() to handle SIGTERM.

      Example code

      See archive in attachment.

      • Optionally update version of hadoop-client in pom.xml to match your Hadoop.
      • Use mvn package to compile.
      • Copy Hadoop configuration XMLs to config directory.
      • Update configuration at the top of ParquetBrokenShutdown class.
      • Execute ParquetBrokenShutdown class.
      • Send SIGTERM to shutdown the application (kill PID).

      Initial analysis

      Parquet library tries to care about application shutdown but this introduces more issues than solves. If application is writing to a file and the library asynchronously decides to close underlying writer, data loss will occur. The handle is just closed and all remaining records can't be written.

      Writing to HDFS/Parquet failed
      java.io.IOException: can not write PageHeader(type:DICTIONARY_PAGE, uncompressed_page_size:14, compressed_page_size:34, dictionary_page_header:DictionaryPageHeader(num_values:1, encoding:PLAIN))
      	at org.apache.parquet.format.Util.write(Util.java:224)
      	at org.apache.parquet.format.Util.writePageHeader(Util.java:61)
      	at org.apache.parquet.format.converter.ParquetMetadataConverter.writeDictionaryPageHeader(ParquetMetadataConverter.java:760)
      	at org.apache.parquet.hadoop.ParquetFileWriter.writeDictionaryPage(ParquetFileWriter.java:307)
      	at org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writeToFileWriter(ColumnChunkPageWriteStore.java:179)
      	at org.apache.parquet.hadoop.ColumnChunkPageWriteStore.flushToFileWriter(ColumnChunkPageWriteStore.java:238)
      	at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:165)
      	at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
      	at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297)
      	at com.avast.bugreport.parquet.ParquetBrokenShutdown.writeParquetFile(ParquetBrokenShutdown.java:86)
      	at com.avast.bugreport.parquet.ParquetBrokenShutdown.run(ParquetBrokenShutdown.java:53)
      	at com.avast.bugreport.parquet.ParquetBrokenShutdown.main(ParquetBrokenShutdown.java:153)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:497)
      	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
      Caused by: parquet.org.apache.thrift.transport.TTransportException: java.nio.channels.ClosedChannelException
      	at parquet.org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
      	at parquet.org.apache.thrift.transport.TTransport.write(TTransport.java:105)
      	at parquet.org.apache.thrift.protocol.TCompactProtocol.writeByteDirect(TCompactProtocol.java:424)
      	at parquet.org.apache.thrift.protocol.TCompactProtocol.writeByteDirect(TCompactProtocol.java:431)
      	at parquet.org.apache.thrift.protocol.TCompactProtocol.writeFieldBeginInternal(TCompactProtocol.java:194)
      	at parquet.org.apache.thrift.protocol.TCompactProtocol.writeFieldBegin(TCompactProtocol.java:176)
      	at org.apache.parquet.format.InterningProtocol.writeFieldBegin(InterningProtocol.java:74)
      	at org.apache.parquet.format.PageHeader.write(PageHeader.java:918)
      	at org.apache.parquet.format.Util.write(Util.java:222)
      	... 16 more
      Caused by: java.nio.channels.ClosedChannelException
      	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1635)
      	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104)
      	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
      	at java.io.DataOutputStream.write(DataOutputStream.java:107)
      	at parquet.org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
      	... 24 more
      

      Stored file is also currupted in such case. It contains only four bytes "PAR1" header so actually all data are lost.

      Together with that whole application logging using slf4j/logback is reconfigured or completely disabled so no more application's log messages appear in the output. This may be related to PARQUET-369 and will be hopefully fixed in PARQUET-401. No such issue was observed when sequence files were used in the past.

      Known workaround

      Java executes threads of shutdown hooks independently and in parallel, but it is possible for the application to force its own shutdown to be executed before the Parquet's one. Parquet library uses Hadoop's ShutdownHookManager to register the hook which has a possibility to define a priority.

      if (!USE_WORKAROUND) {
      	Runtime.getRuntime().addShutdownHook(shutdownHook);
      } else {
      	int maxPriority = Integer.MAX_VALUE; // Execute application hook before hook in Hadoop library
      	ShutdownHookManager.get().addShutdownHook(shutdownHook, maxPriority);
      }
      

      This workaround would be impossible if there are two or more such nasty libraries.

      No workaround has been found for the second issue with disabled logging until now.

      Suggestions for fix

      • Never touch logging configuration in libraries.
        • Use slf4j-api interfaces and just output log messages.
        • Application is responsible for configuration of logging.
      • Do not register any shutdown hook in libraries that releases resources that are still possibly in use.
        • Instead provide an API to the application to close/free the allocated resources (already present).
        • The application is responsible to call close on all Closeables during shutdown in correct order and correct time.
        • No library has enough information for that.
      • Be well behaved library, do not try to be smarter than the application, it isn't possible.

      Attachments

        Issue Links

          Activity

            People

              turek@avast.com Michal Turek
              turek@avast.com Michal Turek
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: