Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 1.0.0
- Labels: None
Description
@Override
public org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter getRawRecordWriter(Path path, Options options) throws IOException {
  final Path filename = AcidUtils.createFilename(path, options);
  final OrcFile.WriterOptions opts = OrcFile.writerOptions(options.getTableProperties(), options.getConfiguration());
  if (!options.isWritingBase()) {
    // Delta files get smaller buffers/stripes, no block padding, no
    // compression, and rowIndexStride(0), which disables row-group indexes.
    opts.bufferSize(OrcRecordUpdater.DELTA_BUFFER_SIZE)
        .stripeSize(OrcRecordUpdater.DELTA_STRIPE_SIZE)
        .blockPadding(false)
        .compress(CompressionKind.NONE)
        .rowIndexStride(0);
  }
Setting rowIndexStride(0) makes StripeStatistics.getColumnStatistics() return objects, but with meaningless values; for example, the min/max of IntegerColumnStatistics come back as Long.MIN_VALUE/Long.MAX_VALUE.
This interferes with the ability to infer the minimum ROW_ID for a split, and it also produces inefficient files. See the sketch below for how this shows up on the read side.
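A minimal sketch of how the degenerate statistics can be observed, using the standalone Apache ORC reader API rather than the Hive-embedded classes from the excerpt above; the delta file path is a placeholder, and the class name is hypothetical:

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.ColumnStatistics;
import org.apache.orc.IntegerColumnStatistics;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.StripeStatistics;

public class InspectDeltaStripeStats {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder: point this at a delta file written by the code above.
    Path delta = new Path(args[0]);
    Reader reader = OrcFile.createReader(delta, OrcFile.readerOptions(conf));
    List<StripeStatistics> stripes = reader.getStripeStatistics();
    for (int i = 0; i < stripes.size(); i++) {
      for (ColumnStatistics col : stripes.get(i).getColumnStatistics()) {
        if (col instanceof IntegerColumnStatistics) {
          IntegerColumnStatistics ints = (IntegerColumnStatistics) col;
          // Per this issue, a file written with rowIndexStride(0) reports
          // Long.MIN_VALUE / Long.MAX_VALUE here instead of real bounds.
          System.out.println("stripe " + i + ": min=" + ints.getMinimum()
              + " max=" + ints.getMaximum());
        }
      }
    }
  }
}

Since the ROW_ID struct columns are longs, bounds like these give a reader no usable information, which is why the minimum ROW_ID for a split cannot be inferred from stripe statistics alone.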
Issue Links
- is related to: HIVE-16812 VectorizedOrcAcidRowBatchReader doesn't filter delete events (Closed)