IMPALA / IMPALA-3347

Kudu scanner: Expensive per-row, per-column IsNull check

Details

    Description

      Kudu should annotate each column in a batch with whether it is nullable. Today, for every row and every column of a Kudu batch, the scanner checks whether the slot is null. It would be much more efficient to store a per-column bit in the KuduScanBatch indicating the column's nullability, so that the per-row check is only needed for nullable columns.

      Status KuduScanner::KuduRowToImpalaTuple(const KuduScanBatch::RowPtr& row,
          RowBatch* row_batch, Tuple* tuple) {
        for (int i = 0; i < scan_node_->tuple_desc_->slots().size(); ++i) {
          const SlotDescriptor* info = scan_node_->tuple_desc_->slots()[i];
          void* slot = tuple->GetSlot(info->tuple_offset());
      
          if (row.IsNull(i)) {
            SetSlotToNull(tuple, *info);
            continue;
          }
      
          int max_len = -1;
          switch (info->type().type) {
            case TYPE_VARCHAR:
              max_len = info->type().len;
              DCHECK_GT(max_len, 0);
      

      For a basic scan, the null check consumes 4% of the CPU cycles.
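The idea can be illustrated with a minimal, self-contained sketch. It is not the Kudu client or Impala scanner code: `MockBatch`, `MockTuple`, and `MaterializeBatch` are hypothetical stand-ins, and the per-column `col_nullable` flag is the assumed annotation that this issue asks Kudu to provide. The loop consults the per-column flag first and only performs the per-cell null check for columns that can actually contain nulls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scan batch (NOT the Kudu API): row-major
// int64 cells, a per-cell null flag, and one nullability flag per column.
struct MockBatch {
  std::size_t num_cols;
  std::vector<bool> col_nullable;   // assumed per-column annotation
  std::vector<int64_t> values;      // num_rows * num_cols cells
  std::vector<bool> cell_is_null;   // num_rows * num_cols flags

  std::size_t num_rows() const {
    return num_cols == 0 ? 0 : values.size() / num_cols;
  }
  bool IsNull(std::size_t row, std::size_t col) const {
    return cell_is_null[row * num_cols + col];
  }
};

// Hypothetical stand-in for a materialized tuple.
struct MockTuple {
  std::vector<int64_t> slots;
  std::vector<bool> nulls;
};

// Copies every row of the batch into `out` and returns how many per-cell
// null checks were performed. Non-nullable columns skip the check entirely.
std::size_t MaterializeBatch(const MockBatch& batch,
                             std::vector<MockTuple>* out) {
  assert(batch.col_nullable.size() == batch.num_cols);
  assert(batch.cell_is_null.size() == batch.values.size());

  std::size_t null_checks = 0;
  out->clear();
  out->reserve(batch.num_rows());

  for (std::size_t row = 0; row < batch.num_rows(); ++row) {
    MockTuple tuple;
    tuple.slots.assign(batch.num_cols, 0);
    tuple.nulls.assign(batch.num_cols, false);

    for (std::size_t col = 0; col < batch.num_cols; ++col) {
      // Cheap per-column test; the per-cell test runs only when needed.
      if (batch.col_nullable[col]) {
        ++null_checks;
        if (batch.IsNull(row, col)) {
          tuple.nulls[col] = true;
          continue;
        }
      }
      tuple.slots[col] = batch.values[row * batch.num_cols + col];
    }
    out->push_back(std::move(tuple));
  }
  return null_checks;
}
```

In this sketch, a batch with one nullable column out of two performs one null check per row instead of two. A real implementation could go further and select a specialized loop per column once per batch, which would also remove the per-row branch on the column flag.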

            People

              mjacobs Matthew Jacobs
              mmokhtar Mostafa Mokhtar