Description
ORC files can be produced by any external tool. A corrupt file may contain malformed protobuf objects that crash the reading process. The attached file is an example.
In a debug build, protobuf will throw exceptions for this file:
$ build/tools/src/orc-scan maleformed_protobuf.orc
[libprotobuf FATAL /mnt/volume1/impala-orc/orc/build/c++/libs/thirdparty/protobuf_ep-install/include/google/protobuf/repeated_field.h:1522] CHECK failed: (index) < (current_size_):
Caught exception in maleformed_protobuf.orc: CHECK failed: (index) < (current_size_):
The failing check is a DCHECK, which is compiled out of a release build:
1518 template <typename TypeHandler>
1519 inline const typename TypeHandler::Type&
1520 RepeatedPtrFieldBase::Get(int index) const {
1521   GOOGLE_DCHECK_GE(index, 0);
1522   GOOGLE_DCHECK_LT(index, current_size_);
1523   return *cast<TypeHandler>(rep_->elements[index]);
1524 }
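The difference between a debug-only check and an always-on check can be sketched as follows. This is only an illustration: a std::vector stands in for the protobuf repeated field, the standard assert() macro stands in for GOOGLE_DCHECK (both are compiled out of release builds), and the helper names uncheckedGet and checkedGet are hypothetical, not part of protobuf or the ORC library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Debug-only check (analogous to a DCHECK): assert() disappears when NDEBUG
// is defined, so a release build indexes past the end of the vector, which
// is undefined behavior and typically shows up as a segmentation fault.
inline const std::string& uncheckedGet(const std::vector<std::string>& items,
                                       int index) {
  assert(index >= 0 && static_cast<std::size_t>(index) < items.size());
  return items[static_cast<std::size_t>(index)];
}

// Always-on check (hypothetical helper): a bad index taken from an untrusted
// file becomes an exception that the caller can catch and report.
inline const std::string& checkedGet(const std::vector<std::string>& items,
                                     int index) {
  if (index < 0 || static_cast<std::size_t>(index) >= items.size()) {
    throw std::out_of_range("index " + std::to_string(index) +
                            " out of range for " +
                            std::to_string(items.size()) + " elements");
  }
  return items[static_cast<std::size_t>(index)];
}
```

With a two-element vector, checkedGet(items, 2) throws std::out_of_range in every build type, whereas uncheckedGet(items, 2) aborts only in a debug build.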
In a release build, the process crashes immediately, which means any system that integrates the orc-lib will crash when it processes such a file.
$ build/tools/src/orc-scan maleformed_protobuf.orc
Segmentation fault (core dumped)
The stacktrace for this crash:
#0 0x0000000000588c1e in orc::ReaderImpl::ReaderImpl(std::shared_ptr<orc::FileContents>, orc::ReaderOptions const&, unsigned long, unsigned long) ()
#1 0x000000000058b1ee in orc::createReader(std::unique_ptr<orc::InputStream, std::default_delete<orc::InputStream> >, orc::ReaderOptions const&) ()
#2 0x00000000005847c0 in scanFile (out=..., filename=0x7ffcf03a173d "maleformed_protobuf.orc", batchSize=batchSize@entry=1024) at /mnt/volume1/impala-orc/orc/tools/src/FileScan.cc:32
#3 0x0000000000584150 in main (argc=<optimized out>, argv=<optimized out>) at /mnt/volume1/impala-orc/orc/tools/src/FileScan.cc:84
We may need to introduce checksums to avoid this.
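A minimal sketch of what such a checksum guard could look like is given below. It assumes the writer stores a CRC-32 of the serialized metadata next to it; no such field is assumed to exist in the ORC format today, and the names computeCrc32 and verifyMetadata are hypothetical. The CRC itself is the common reflected CRC-32 with polynomial 0xEDB88320.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320). Slow but dependency-free;
// a real implementation would more likely use a table-driven or library version.
inline std::uint32_t computeCrc32(const std::string& data) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // If the low bit is set, shift and XOR in the polynomial; else just shift.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical guard: reject the serialized metadata before it is parsed if
// its checksum does not match the value stored alongside it.
inline void verifyMetadata(const std::string& serialized,
                           std::uint32_t expectedCrc) {
  if (computeCrc32(serialized) != expectedCrc) {
    throw std::runtime_error("metadata checksum mismatch; file may be corrupt");
  }
}
```

A checksum of this kind only detects bytes that changed after the file was written. Metadata that was already malformed when the writer computed the checksum would still pass, so range checks on values read from the file would remain necessary.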
Attachments
Issue Links
- blocks IMPALA-6772 Enable test_scanners_fuzz for ORC format (Resolved)