Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-994

String comparison stopped working on trunk code

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Invalid
    • None
    • 0.5.0
    • None
    • None
    • Hadoop v0.20.1 and latest Hive from trunk

    Description

      Comparing string field with specified value always yield no match.

      E.g,

      Table staff

      id name
      1 Peter
      2 Mary
      3 John

      query:

      select * from staff where name="Peter";
      

      yield no match.

      Hive stable version with the same configuration works correctly.

      Please help troubleshooting it.

      Thanks,

      Attachments

        Activity

          cwsteinbach Carl Steinbach added a comment -

          String comparison works for me on 0.18.3 with the latest trunk.

          @Nhan: Can you include the output of "select * from staff" and "describe extended staff"?

          cwsteinbach Carl Steinbach added a comment - String comparison works for me on 0.18.3 with the latest trunk. @Nhan: Can you include the output of "select * from staff" and "describe extended staff"?
          nnguyen Nhan Nguyen added a comment -

          Hi Steinbach,

          Thanks for looking into this problem. Here's the complete output of your requested queries

          hive> select * from staff;                                                                                                                                     
          OK
          1       Peter
          2       Mary
          3       John
          Time taken: 0.138 seconds
          hive> select * from staff where name="Peter";
          Total MapReduce jobs = 1
          Launching Job 1 out of 1
          Number of reduce tasks is set to 0 since there's no reduce operator
          Starting Job = job_200912152159_0006, Tracking URL = http://hadoop-1:50030/jobdetails.jsp?jobid=job_200912152159_0006
          Kill Command = /data/hadoop-hive/bin/../bin/hadoop job  -Dmapred.job.tracker=hadoop-1:51444 -kill job_200912152159_0006
          2009-12-16 02:23:18,695 Stage-1 map = 0%,  reduce = 0%
          2009-12-16 02:23:33,808 Stage-1 map = 55%,  reduce = 0%
          2009-12-16 02:23:36,836 Stage-1 map = 64%,  reduce = 0%
          2009-12-16 02:23:39,859 Stage-1 map = 100%,  reduce = 0%
          2009-12-16 02:23:42,883 Stage-1 map = 100%,  reduce = 100%
          Ended Job = job_200912152159_0006
          OK
          Time taken: 26.58 seconds
          hive> describe extended staff;     
          OK
          id      bigint
          name    string
                           
          Detailed Table Information      Table(tableName:staff, dbName:default, owner:hive, createTime:1260958974, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:bigint, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://hadoop-1:51000/staff, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=9,line.delim=         
          ,field.delim=   }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{transient_lastDdlTime=1260958974})
          Time taken: 0.081 seconds
          

          I believe this has nothing to do with Hadoop as when I switch back to Hive stable release 0.4.0 with hive-default.xml copied from my current Hive, string comparison just works perfectly.

          Thanks,

          nnguyen Nhan Nguyen added a comment - Hi Steinbach, Thanks for looking into this problem. Here's the complete output of your requested queries hive> select * from staff; OK 1 Peter 2 Mary 3 John Time taken: 0.138 seconds hive> select * from staff where name= "Peter" ; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_200912152159_0006, Tracking URL = http: //hadoop-1:50030/jobdetails.jsp?jobid=job_200912152159_0006 Kill Command = /data/hadoop-hive/bin/../bin/hadoop job -Dmapred.job.tracker=hadoop-1:51444 -kill job_200912152159_0006 2009-12-16 02:23:18,695 Stage-1 map = 0%, reduce = 0% 2009-12-16 02:23:33,808 Stage-1 map = 55%, reduce = 0% 2009-12-16 02:23:36,836 Stage-1 map = 64%, reduce = 0% 2009-12-16 02:23:39,859 Stage-1 map = 100%, reduce = 0% 2009-12-16 02:23:42,883 Stage-1 map = 100%, reduce = 100% Ended Job = job_200912152159_0006 OK Time taken: 26.58 seconds hive> describe extended staff; OK id bigint name string Detailed Table Information Table(tableName:staff, dbName: default , owner:hive, createTime:1260958974, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:bigint, comment: null ), FieldSchema(name:name, type:string, comment: null )], location:hdfs: //hadoop-1:51000/staff, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed: false , numBuckets:-1, serdeInfo:SerDeInfo(name: null , serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=9,line.delim= ,field.delim= }), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{transient_lastDdlTime=1260958974}) Time taken: 0.081 seconds I believe this has nothing to do with Hadoop as when I switch back to Hive stable release 0.4.0 with hive-default.xml copied from my current Hive, string comparison just works perfectly. Thanks,
          nnguyen Nhan Nguyen added a comment -

          I found that the problem occurs only from r889672. Previous revisions don't have this problem.

          Thanks

          nnguyen Nhan Nguyen added a comment - I found that the problem occurs only from r889672. Previous revisions don't have this problem. Thanks
          zshao Zheng Shao added a comment -

          Nhan, please try:

          select *, length(name) from staff;
          

          It's likely that the name column in your table has trailing spaces.

          zshao Zheng Shao added a comment - Nhan, please try: select *, length(name) from staff; It's likely that the name column in your table has trailing spaces.
          namit Namit Jain added a comment -

          Blocker for 0.5

          namit Namit Jain added a comment - Blocker for 0.5
          nnguyen Nhan Nguyen added a comment -

          Hi Shao,

          Here's the output of your query. It shows correct length of each entry.

          hive> select name, length(name) from staff;                                                                                                                  
          Total MapReduce jobs = 1                                                                                                                                     
          Launching Job 1 out of 1                                                                                                                                     
          Number of reduce tasks is set to 0 since there's no reduce operator                                                                                          
          Starting Job = job_200912170925_0023, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_200912170925_0023                                       
          Kill Command = /usr/local/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_200912170925_0023                                     
          2009-12-18 09:21:49,217 Stage-1 map = 0%,  reduce = 0%                                                                                                       
          2009-12-18 09:21:58,284 Stage-1 map = 40%,  reduce = 0%                                                                                                      
          2009-12-18 09:22:01,337 Stage-1 map = 80%,  reduce = 0%                                                                                                      
          2009-12-18 09:22:04,357 Stage-1 map = 100%,  reduce = 0%                                                                                                     
          2009-12-18 09:22:07,373 Stage-1 map = 100%,  reduce = 100%                                                                                                   
          Ended Job = job_200912170925_0023                                                                                                                            
          OK                                                                                                                                                           
          Peter   5                                                                                                                                                    
          Mary    4                                                                                                                                                    
          John    4                                                                                                                                                    
          Time taken: 20.641 seconds
          

          Here's the interesting output of the explain query on revision r889672

          hive> explain select * from staff where name="Peter";
          OK
          ABSTRACT SYNTAX TREE:
            (TOK_QUERY (TOK_FROM (TOK_TABREF staff)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL name) "Peter"))))
          
          STAGE DEPENDENCIES:
            Stage-1 is a root stage
            Stage-0 is a root stage
          
          STAGE PLANS:
            Stage: Stage-1
              Map Reduce
                Alias -> Map Operator Tree:
                  staff
                    TableScan
                      alias: staff
                      Filter Operator
                        predicate:
                            expr: (name = '"Peter"')
                            type: boolean
                        Filter Operator
                          predicate:
                              expr: (name = '"Peter"')
                              type: boolean
                          Select Operator
                            expressions:
                                  expr: id
                                  type: bigint
                                  expr: name
                                  type: string
                            outputColumnNames: _col0, _col1
                            File Output Operator
                              compressed: false
                              GlobalTableId: 0
                              table:
                                  input format: org.apache.hadoop.mapred.TextInputFormat
                                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          
            Stage: Stage-0
              Fetch Operator
                limit: -1
          
          
          Time taken: 0.123 seconds
          

          and here's the output of r889506

          hive> explain select * from staff where name="Peter";
          OK
          ABSTRACT SYNTAX TREE:
            (TOK_QUERY (TOK_FROM (TOK_TABREF staff)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL name) "Peter"))))
          
          STAGE DEPENDENCIES:
            Stage-1 is a root stage
            Stage-0 is a root stage
          
          STAGE PLANS:
            Stage: Stage-1
              Map Reduce
                Alias -> Map Operator Tree:
                  staff
                    TableScan
                      alias: staff
                      Filter Operator
                        predicate:
                            expr: (name = 'Peter')
                            type: boolean
                        Filter Operator
                          predicate:
                              expr: (name = 'Peter')
                              type: boolean
                          Select Operator
                            expressions:
                                  expr: id
                                  type: bigint
                                  expr: name
                                  type: string
                            outputColumnNames: _col0, _col1
                            File Output Operator
                              compressed: false
                              GlobalTableId: 0
                              table:
                                  input format: org.apache.hadoop.mapred.TextInputFormat
                                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          
            Stage: Stage-0
              Fetch Operator
                limit: -1
          
          
          Time taken: 4.306 seconds
          

          So, you could see that in r889672 we have

          expr: (name = '"Peter"')
          

          while in r889506 we have

          expr: (name = 'Peter')
          

          Do I miss something? Am I the only one that have this problem ?

          Thanks

          nnguyen Nhan Nguyen added a comment - Hi Shao, Here's the output of your query. It shows correct length of each entry. hive> select name, length(name) from staff; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_200912170925_0023, Tracking URL = http: //localhost:50030/jobdetails.jsp?jobid=job_200912170925_0023 Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_200912170925_0023 2009-12-18 09:21:49,217 Stage-1 map = 0%, reduce = 0% 2009-12-18 09:21:58,284 Stage-1 map = 40%, reduce = 0% 2009-12-18 09:22:01,337 Stage-1 map = 80%, reduce = 0% 2009-12-18 09:22:04,357 Stage-1 map = 100%, reduce = 0% 2009-12-18 09:22:07,373 Stage-1 map = 100%, reduce = 100% Ended Job = job_200912170925_0023 OK Peter 5 Mary 4 John 4 Time taken: 20.641 seconds Here's the interesting output of the explain query on revision r889672 hive> explain select * from staff where name= "Peter" ; OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF staff)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL name) "Peter" )))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: staff TableScan alias: staff Filter Operator predicate: expr: (name = ' "Peter" ' ) type: boolean Filter Operator predicate: expr: (name = ' "Peter" ' ) type: boolean Select Operator expressions: expr: id type: bigint expr: name type: string outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 Time taken: 0.123 seconds and here's the output of r889506 hive> explain select * from staff where name= "Peter" ; OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF staff)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL name) "Peter" )))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: staff TableScan alias: staff Filter Operator predicate: expr: (name = 'Peter' ) type: boolean Filter Operator predicate: expr: (name = 'Peter' ) type: boolean Select Operator expressions: expr: id type: bigint expr: name type: string outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 Time taken: 4.306 seconds So, you could see that in r889672 we have expr: (name = ' "Peter" ' ) while in r889506 we have expr: (name = 'Peter' ) Do I miss something? Am I the only one that have this problem ? Thanks
          zshao Zheng Shao added a comment -

          This definitely is a bug. I will take a look.

          zshao Zheng Shao added a comment - This definitely is a bug. I will take a look.
          zshao Zheng Shao added a comment -

          Nhan, I cannot reproduce your problem with r889672.
          Can you check if you have any local modifications?

          zshao Zheng Shao added a comment - Nhan, I cannot reproduce your problem with r889672. Can you check if you have any local modifications?
          nnguyen Nhan Nguyen added a comment -

          I make no change to the source code. Just gets the code synced with trunk and rebuilds Hive. However, the problem has gone when I tried a clean svn checkout. I guess the problem occurred because I compiled new code without cleaning the old compiled code first.

          Thanks alot for your help Zheng.

          Nhan

          nnguyen Nhan Nguyen added a comment - I make no change to the source code. Just gets the code synced with trunk and rebuilds Hive. However, the problem has gone when I tried a clean svn checkout. I guess the problem occurred because I compiled new code without cleaning the old compiled code first. Thanks alot for your help Zheng. Nhan

          People

            zshao Zheng Shao
            nnguyen Nhan Nguyen
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: