IMPALA-10451: TestAvroSchemaResolution.test_avro_schema_resolution fails when bumping Hive to have HIVE-24157


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: Impala 4.5.0, Impala 4.4.1
    • Labels: ghx-label-8

    Description

      TestAvroSchemaResolution.test_avro_schema_resolution recently started failing when building against a Hive version that includes HIVE-24157.

      query_test.test_avro_schema_resolution.TestAvroSchemaResolution.test_avro_schema_resolution[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: avro/snap/block] (from pytest)
      
      query_test/test_avro_schema_resolution.py:36: in test_avro_schema_resolution
       self.run_test_case('QueryTest/avro-schema-resolution', vector, unique_database)
      common/impala_test_suite.py:690: in run_test_case
       self.__verify_results_and_errors(vector, test_section, result, use_db)
      common/impala_test_suite.py:523: in __verify_results_and_errors
       replace_filenames_with_placeholder)
      common/test_result_verifier.py:456: in verify_raw_results
       VERIFIER_MAP[verifier](expected, actual)
      common/test_result_verifier.py:278: in verify_query_result_is_equal
       assert expected_results == actual_results
      E assert Comparing QueryTestResults (expected vs actual):
      E 10 != 0 
      

      The failing query is:

      select count(*) from functional_avro_snap.avro_coldef 

      The cause is that data loading for avro_coldef failed. The DML that fails is:

      INSERT OVERWRITE TABLE avro_coldef PARTITION(year=2014, month=1)
      SELECT bool_col, tinyint_col, smallint_col, int_col, bigint_col,
      float_col, double_col, date_string_col, string_col, timestamp_col
      FROM (select * from functional.alltypes order by id limit 5) a;
      

      The failure (found in the HiveServer2 logs) is:

      2021-01-24T01:52:16,340 ERROR [9433ee64-d706-4fa4-a146-18d71bf17013 HiveServer2-Handler-Pool: Thread-4946] parse.CalcitePlanner: CBO failed, skipping CBO.
      org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting DATE/TIMESTAMP types to NUMERIC is prohibited (hive.strict.timestamp.conversion)
       at org.apache.hadoop.hive.ql.udf.TimestampCastRestrictorResolver.getEvalMethod(TimestampCastRestrictorResolver.java:62) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:168) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:260) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:292) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getFuncExprNodeDescWithUdfData(TypeCheckProcFactory.java:987) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.ParseUtils.createConversionCast(ParseUtils.java:163) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:8551) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7908) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11100) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10972) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11901) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11771) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:593) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12678) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:423) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:607) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:553) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:547) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199) ~[hive-service-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260) ~[hive-service-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274) ~[hive-service-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565) ~[hive-service-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551) ~[hive-service-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315) ~[hive-service-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567) ~[hive-service-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_144]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_144]
       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
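
      The genConversionSelectOperator / createConversionCast frames in the stack trace show that the prohibited cast is one Hive generates itself while matching the SELECT list positionally against the table's columns. A rough sketch of the conversion it ends up generating for the last select item, written out as a standalone query (an illustration only, not actual plan output), is:

      -- Illustration only: the implicit conversion Hive builds for the final select item.
      -- With hive.strict.timestamp.conversion=true this is rejected by
      -- TimestampCastRestrictorResolver with the error shown above.
      SELECT CAST(a.timestamp_col AS BIGINT)
      FROM (select * from functional.alltypes order by id limit 5) a;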
      

      This check was introduced in HIVE-24157. DESCRIBE on the table shows that timestamp_col is bigint:

      0: jdbc:hive2://localhost:11050> desc avro_coldef;
      INFO  : Compiling command(queryId=systest_20210125012100_83dadafd-8e20-4a45-8dd2-54d3a6f4b6e2): desc avro_coldef
      INFO  : Semantic Analysis Completed (retrial = false)
      INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
      INFO  : Completed compiling command(queryId=systest_20210125012100_83dadafd-8e20-4a45-8dd2-54d3a6f4b6e2); Time taken: 0.016 seconds
      INFO  : Executing command(queryId=systest_20210125012100_83dadafd-8e20-4a45-8dd2-54d3a6f4b6e2): desc avro_coldef
      INFO  : Starting task [Stage-0:DDL] in serial mode
      INFO  : Completed executing command(queryId=systest_20210125012100_83dadafd-8e20-4a45-8dd2-54d3a6f4b6e2); Time taken: 0.008 seconds
      INFO  : OK
      +--------------------------+------------+----------+
      |         col_name         | data_type  | comment  |
      +--------------------------+------------+----------+
      | bool_col                 | boolean    |          |
      | tinyint_col              | int        |          |
      | smallint_col             | int        |          |
      | int_col                  | int        |          |
      | bigint_col               | bigint     |          |
      | float_col                | float      |          |
      | double_col               | double     |          |
      | date_string_col          | string     |          |
      | string_col               | string     |          |
      | timestamp_col            | bigint     |          |
      | year                     | int        |          |
      | month                    | int        |          |
      |                          | NULL       | NULL     |
      | # Partition Information  | NULL       | NULL     |
      | # col_name               | data_type  | comment  |
      | year                     | int        |          |
      | month                    | int        |          |
      +--------------------------+------------+----------+

      The INSERT therefore has to convert the TIMESTAMP values from functional.alltypes into this BIGINT column, which hits the new restriction.
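
      For reference, a minimal sketch (not necessarily the fix that was committed for this issue) of two ways the data load could get past the restriction: relax the new check for the loading session, or produce the BIGINT value explicitly so that no TIMESTAMP-to-NUMERIC cast is generated:

      -- Option 1 (sketch): relax the HIVE-24157 check for this session only
      SET hive.strict.timestamp.conversion=false;

      -- Option 2 (sketch): compute the BIGINT explicitly instead of relying on the implicit cast
      INSERT OVERWRITE TABLE avro_coldef PARTITION(year=2014, month=1)
      SELECT bool_col, tinyint_col, smallint_col, int_col, bigint_col,
      float_col, double_col, date_string_col, string_col,
      to_unix_timestamp(timestamp_col)
      FROM (select * from functional.alltypes order by id limit 5) a;

      to_unix_timestamp() returns seconds since the epoch, which should be close to what the old implicit TIMESTAMP-to-BIGINT cast produced, though time-zone handling can differ; disabling the check for the loading session is the less intrusive change for existing data-loading scripts.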

          People

            Assignee: Joe McDonnell (joemcdonnell)
            Reporter: Quanlong Huang (stigahuang)
