Details
Bug
Status: Resolved
Critical
Resolution: Fixed
Impala 4.1.0
None
CentOS-7, Impala-4.1
Description
When querying a Hive table to which columns were added without using 'cascade', Impala fails with an error like "Unable to find SchemaNode for path 'db.table.column' in the schema of file 'hdfs://xxx/path/to/parquet_file_before_add_column'." I checked the Parquet file named in the error log and found that its schema is not compatible with the table metadata. The call stack is attached below. The path and table name are masked:
I0609 18:04:25.970052 115413 status.cc:129] c94d0ab3fdf8f943:3203006100000002] Unable to find SchemaNode for path 'xxx_db.xxx_table.xxx_column' in the schema of file 'hdfs://xxx_nn/xxx_table_path/000000_0'.
    @ 0xea543b impala::Status::Status()
    @ 0x1e3225c impala::HdfsParquetScanner::CreateColIdx2EqConjunctMap()
    @ 0x1e363ea impala::HdfsParquetScanner::Open()
    @ 0x19b40d0 impala::HdfsScanNodeBase::CreateAndOpenScannerHelper()
    @ 0x1b5cbae impala::HdfsScanNode::ProcessSplit()
    @ 0x1b5e12a impala::HdfsScanNode::ScannerThread()
    @ 0x1b5e9c6 _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
    @ 0x18eafa9 impala::Thread::SuperviseThread()
    @ 0x18ee11a boost::detail::thread_data<>::run()
    @ 0x2385510 thread_proxy
    @ 0x7fb5b0745162 start_thread
    @ 0x7fb5ad21df6c __clone
The error may be related to IMPALA-10640. The Parquet bloom filter requires that the right-hand values of equality conjuncts match the current file schema. The filter is therefore unavailable unless the column exists in every Parquet file scanned. I think we can disable the Parquet bloom filter for the affected query or scan node when this situation is detected.
How to reproduce (using impala-shell):
- create table parquet_test (id INT) stored as parquet;
- insert into parquet_test values (1),(2),(3);
- alter table parquet_test add columns (name STRING);
- insert into parquet_test values (4, "James");
- select * from parquet_test where name in ("Lily");
- The error occurs.
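
The fallback proposed in the description can be illustrated against this reproduction. The following is a minimal Python sketch, not Impala's actual C++ code: `FileSchema`, `plan_bloom_filters`, and the two file paths are hypothetical names introduced only for illustration. It assumes each Parquet file exposes its own list of column names, and it models the idea of skipping bloom filtering for a column that a file does not contain instead of returning an error for the whole scan.

```python
# Illustrative model only: this is NOT Impala's C++ implementation.
# All names here (FileSchema, plan_bloom_filters, file paths) are hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileSchema:
    """Column names physically present in one Parquet file."""
    path: str
    columns: Tuple[str, ...]


def plan_bloom_filters(schema: FileSchema,
                       eq_conjunct_columns: List[str]) -> Dict[str, int]:
    """Map each equality-conjunct column to its index in the file schema.

    Columns missing from the file (for example, added to the table after the
    file was written) are skipped, so bloom filtering is not used for them in
    this file instead of failing the scan.
    """
    col_idx: Dict[str, int] = {}
    for col in eq_conjunct_columns:
        if col in schema.columns:
            col_idx[col] = schema.columns.index(col)
        # Otherwise the file has no such column: leave it out of the map and
        # let the predicate be evaluated without a bloom filter.
    return col_idx


# File written before "alter table parquet_test add columns (name STRING)".
old_file = FileSchema("parquet_test/old_file", ("id",))
# File written after the column was added.
new_file = FileSchema("parquet_test/new_file", ("id", "name"))

# Models: select * from parquet_test where name in ("Lily");
for f in (old_file, new_file):
    print(f.path, plan_bloom_filters(f, ["name"]))
```

In this model the older file yields an empty map, so no bloom filter is consulted for it, while the newer file maps `name` to column index 1. Whether the real fix should apply per file, per scan node, or per query is the design question raised in the description, and this sketch does not settle it.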