Description
SearchArgument currently only provides interfaces that identify columns by name. Only columns that are fields of a STRUCT can be resolved correctly. Resolving any other column (e.g. one nested inside a LIST or MAP) causes a crash.
The following code reproduces the issue:
#include <orc/OrcFile.hh>

using namespace std;
using namespace orc;

int main() {
  ORC_UNIQUE_PTR<InputStream> inStream = readLocalFile("complextypestbl.orc");
  ReaderOptions options;
  ORC_UNIQUE_PTR<Reader> reader = createReader(move(inStream), options);

  RowReaderOptions rowReaderOptions;
  ORC_UNIQUE_PTR<SearchArgumentBuilder> sarg = SearchArgumentFactory::newBuilder();
  sarg->lessThanEquals("f", PredicateDataType::STRING, Literal("bbb", 3));
  ORC_UNIQUE_PTR<SearchArgument> final_sarg = sarg->build();
  rowReaderOptions.searchArgument(move(final_sarg));

  ORC_UNIQUE_PTR<RowReader> rowReader = reader->createRowReader(rowReaderOptions);
  ORC_UNIQUE_PTR<ColumnVectorBatch> batch = rowReader->createRowBatch(1024);
  return 0;
}
complextypestbl.orc is an ORC file of an ACID table with the following schema:
id bigint
int_array array<int>
int_array_array array<array<int>>
int_map map<string, int>
int_map_array array<map<string, int>>
nested_struct struct<a: int, b: array<int>, c: struct<d: array<array<struct<e: int, f: string>>>>, g: map<string, struct<h: struct<i: array<double>>>>>
The C++ code above pushes down a predicate on the "f" column. GDB stack trace for the crash:
Program received signal SIGSEGV, Segmentation fault.
orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
28        if (type.getFieldName(i) == colName) {
(gdb) bt
#0  orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
#1  0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
#2  0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
#3  0x000000000045a67f in orc::SargsApplier::SargsApplier (this=0x200b9f0, type=..., searchArgument=<optimized out>, rowIndexStride=<optimized out>, writerVersion=<optimized out>) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:56
#4  0x00000000004253f8 in orc::RowReaderImpl::RowReaderImpl (this=0x2009760, _contents=..., opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:244
#5  0x00000000004257ad in orc::ReaderImpl::createRowReader (this=<optimized out>, opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:765
#6  0x000000000040b688 in main ()
(gdb) l
23
24    // find column id from column name
25    uint64_t SargsApplier::findColumn(const Type& type,
26                                      const std::string& colName) {
27      for (uint64_t i = 0; i != type.getSubtypeCount(); ++i) {
28        if (type.getFieldName(i) == colName) {
29          return type.getSubtype(i)->getColumnId();
30        } else {
31          uint64_t ret = findColumn(*type.getSubtype(i), colName);
32          if (ret != INVALID_COLUMN_ID) {
(gdb) p type.getKind()
$16 = orc::LIST
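Only STRUCT types have valid field names. findColumn recurses into the subtypes of every type and calls getFieldName on each of them, so it crashes as soon as it reaches a non-STRUCT type such as the LIST above.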
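One possible guard is sketched below, only to illustrate the failure mode. It compares field names only when the current node is a STRUCT and still recurses into the children of LIST and MAP nodes. Node, Kind, findColumnGuarded, and buildExampleSchema are hypothetical stand-ins written for this sketch. They are not the ORC API and not necessarily how the issue should be fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a type tree (hypothetical; NOT orc::Type).
enum class Kind { INT, STRING, LIST, MAP, STRUCT };

struct Node {
  Kind kind;
  uint64_t columnId;
  std::vector<std::string> fieldNames;          // filled only for STRUCT nodes
  std::vector<std::unique_ptr<Node>> children;
};

constexpr uint64_t kInvalidColumnId = std::numeric_limits<uint64_t>::max();

// Depth-first lookup that touches field names only on STRUCT nodes.
uint64_t findColumnGuarded(const Node& type, const std::string& colName) {
  // A STRUCT must have exactly one field name per child.
  assert(type.kind != Kind::STRUCT ||
         type.fieldNames.size() == type.children.size());
  for (std::size_t i = 0; i != type.children.size(); ++i) {
    if (type.kind == Kind::STRUCT && type.fieldNames[i] == colName) {
      return type.children[i]->columnId;
    }
    // LIST and MAP children have no names, but may contain STRUCTs.
    uint64_t ret = findColumnGuarded(*type.children[i], colName);
    if (ret != kInvalidColumnId) {
      return ret;
    }
  }
  return kInvalidColumnId;
}

std::unique_ptr<Node> makeNode(Kind kind, uint64_t id) {
  return std::unique_ptr<Node>(new Node{kind, id, {}, {}});
}

void addField(Node& parent, const std::string& name,
              std::unique_ptr<Node> child) {
  parent.fieldNames.push_back(name);
  parent.children.push_back(std::move(child));
}

// Builds struct<id: int, d: array<struct<e: int, f: string>>>
// with column ids assigned in pre-order (root = 0).
std::unique_ptr<Node> buildExampleSchema() {
  std::unique_ptr<Node> inner = makeNode(Kind::STRUCT, 3);
  addField(*inner, "e", makeNode(Kind::INT, 4));
  addField(*inner, "f", makeNode(Kind::STRING, 5));

  std::unique_ptr<Node> list = makeNode(Kind::LIST, 2);
  list->children.push_back(std::move(inner));   // element type has no name

  std::unique_ptr<Node> root = makeNode(Kind::STRUCT, 0);
  addField(*root, "id", makeNode(Kind::INT, 1));
  addField(*root, "d", std::move(list));
  return root;
}
```

With this toy schema, looking up "f" walks through the LIST without reading a field name from it and returns the column id of the nested string field. A guard like this only avoids the crash. A name-based lookup still returns the first match when several nested fields share a name.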