Description
1. Create a table:
CREATE TABLE `itmp8888`(`id` INT, `name` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'serialization.format' = '1'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
'transient_lastDdlTime' = '1631174384',
'orc.encrypt' = 'AES_CTR_128:id,name',
'orc.mask' = 'sha256:id,name',
'orc.encrypt.ezk' = 'jNCeDBtNfT8wPaTpR34JHA=='
)
2. Insert some data.
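For example (the rows below are illustrative; any data should reproduce the issue as long as the encrypted columns are populated):

```sql
-- hypothetical sample rows for the repro table above
INSERT INTO itmp8888 VALUES (1, 'alice'), (2, 'bob');
```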
3. A SELECT statement without filters works fine:
select * from itmp8888
4. A SELECT statement whose filter includes an encrypted column throws an exception:
select * from itmp8888 where id = 1
5. The stack trace:
Caused by: java.lang.AssertionError: Index is not populated for 1
	at org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:995)
	at org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1083)
	at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1101)
	at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1151)
	at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1186)
	at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:248)
	at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:864)
	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:142)
	at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:211)
	at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:175)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
6. Debugging shows that the RowIndex is null for all of the encrypted columns.
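To illustrate the failure mode, the sketch below is a hypothetical, heavily simplified stand-in for the null-index check that fires in org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups (the class name RowIndexCheck and the method requireIndex are invented for this example, not ORC API): when a filter references a column whose row index was never populated, the reader cannot evaluate the search argument and fails with the AssertionError seen above.

```java
// Hypothetical simplification of the invariant behind the stack trace:
// predicate pushdown needs the row index of every filtered column, and a
// null entry (as observed for encrypted columns) triggers an AssertionError.
public class RowIndexCheck {
    static void requireIndex(Object[] rowIndexes, int column) {
        if (rowIndexes[column] == null) {
            throw new AssertionError("Index is not populated for " + column);
        }
    }

    public static void main(String[] args) {
        Object[] indexes = new Object[2];
        indexes[0] = new Object();   // column 0: index was read normally
        // indexes[1] stays null      // column 1: encrypted, index not populated
        requireIndex(indexes, 0);    // passes
        try {
            requireIndex(indexes, 1);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());  // prints "Index is not populated for 1"
        }
    }
}
```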