Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 1.0.0
- Labels: None
Description
@Override
public org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter getRawRecordWriter(Path path, Options options) throws IOException {
  final Path filename = AcidUtils.createFilename(path, options);
  final OrcFile.WriterOptions opts = OrcFile.writerOptions(options.getTableProperties(), options.getConfiguration());
  if (!options.isWritingBase()) {
    // Delta files get smaller buffers/stripes, no block padding, no
    // compression, and rowIndexStride(0), which disables row-group indexes.
    opts.bufferSize(OrcRecordUpdater.DELTA_BUFFER_SIZE)
        .stripeSize(OrcRecordUpdater.DELTA_STRIPE_SIZE)
        .blockPadding(false)
        .compress(CompressionKind.NONE)
        .rowIndexStride(0);
  }
Setting rowIndexStride(0) makes StripeStatistics.getColumnStatistics() return objects, but with meaningless values; for example, the min/max of IntegerColumnStatistics come back as Long.MIN_VALUE/Long.MAX_VALUE.
This interferes with the ability to infer the minimum ROW_ID for a split, and it also produces inefficient files. See the sketch below for how this shows up on the read side.
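A minimal sketch of how the degenerate statistics can be observed, using the standalone Apache ORC reader API rather than the Hive-embedded classes from the excerpt above; the delta file path is a placeholder, and the class name is hypothetical:

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.ColumnStatistics;
import org.apache.orc.IntegerColumnStatistics;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.StripeStatistics;

public class InspectDeltaStripeStats {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder: point this at a delta file written by the code above.
    Path delta = new Path(args[0]);
    Reader reader = OrcFile.createReader(delta, OrcFile.readerOptions(conf));
    List<StripeStatistics> stripes = reader.getStripeStatistics();
    for (int i = 0; i < stripes.size(); i++) {
      for (ColumnStatistics col : stripes.get(i).getColumnStatistics()) {
        if (col instanceof IntegerColumnStatistics) {
          IntegerColumnStatistics ints = (IntegerColumnStatistics) col;
          // Per this issue, a file written with rowIndexStride(0) reports
          // Long.MIN_VALUE / Long.MAX_VALUE here instead of real bounds.
          System.out.println("stripe " + i + ": min=" + ints.getMinimum()
              + " max=" + ints.getMaximum());
        }
      }
    }
  }
}

Since the ROW_ID struct columns are longs, bounds like these give a reader no usable information, which is why the minimum ROW_ID for a split cannot be inferred from stripe statistics alone.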
Issue Links
- is related to: HIVE-16812 VectorizedOrcAcidRowBatchReader doesn't filter delete events (Closed)