Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-1986

partition pruner do not take effect for non-deterministic UDF

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 0.4.1, 0.5.0, 0.6.0, 0.7.0
    • 0.11.0
    • Query Processor
    • None
    • trunk-src,hive default configure

    Description

      hive udf can be deterministic or non-deterministic,but for non-deterministic udf such as rand and unix_timestamp,ppr do not take effect.
      and for unix_timestamp with para, for example unix_timestamp('2010-01-01'),I think it is deterministic.

      case :

      hive -hiveconf hive.root.logger=DEBUG,console

      create kv_part(key int,value string) partitioned by(ds string);
      alter table kv_part add partition (ds=2010) partition (ds=2011) partition (ds=2012);

      create kv2(key int,value string) partitioned by(ds string);
      alter table kv2 add partition (ds=2013) partition (ds=2014) partition (ds=2015);

      explain select * from kv_part join kv2 on(kv_part.key=kv2.key) where kv_part.ds=2011 and rand() > 0.5

      rand() is non-deterministic ,so kv_part.ds=2011 no not filter the partition ds=2010,ds=2012
      .....
      11/02/14 12:22:32 DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[key, value] columnTypes=[int, string] separator=[[B@1ac9683] nullstring=\N lastColumnTakesRest=false
      11/02/14 12:22:32 INFO hive.log: DDL: struct kv_part

      { i32 key, string value}

      11/02/14 12:22:32 DEBUG optimizer.GenMapRedUtils: Information added for path hdfs://172.25.38.253:54310/user/hive/warehouse/kv_part/ds=2010
      11/02/14 12:22:32 DEBUG optimizer.GenMapRedUtils: Information added for path hdfs://172.25.38.253:54310/user/hive/warehouse/kv_part/ds=2011
      11/02/14 12:22:32 DEBUG optimizer.GenMapRedUtils: Information added for path hdfs://172.25.38.253:54310/user/hive/warehouse/kv_part/ds=2012
      11/02/14 12:22:32 INFO parse.SemanticAnalyzer: Completed plan generation
      .....

      explain select * from kv_part join kv2 on(kv_part.key=kv2.key) where kv_part.ds=2011 and sin(kv2.key) < 0.5;

      sin() is deterministic,so ppr work ok
      .....
      11/02/14 12:25:22 DEBUG optimizer.GenMapRedUtils: Information added for path hdfs://172.25.38.253:54310/user/hive/warehouse/kv_part/ds=2011
      ....

      And user should get the deterministic info for UDF from wiki,or we shoud add this info to describe function

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              zhaowei zhaowei
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: