Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3092

Cannot query tables after add columns for AVRO table

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 2.5.0
    • Impala 2.6.0
    • Backend

    Description

      — Copying the test case —

      Preparing(Execute this on Hive)

      DROP TABLE sandw1;
      CREATE TABLE `sandw1`(`name` string, `description` string, `cost` decimal(4,2)) STORED AS avro;
      INSERT INTO sandw1 values('tuna', 'bread tuna and mayo', 4.99);
      INSERT INTO sandw1 values('egg', 'bread egg and mayo', NULL);
      DROP TABLE sandwiches;
      CREATE TABLE `sandwiches`( `name` string, `description` string ) stored as avro;
      INSERT INTO sandwiches select name, description FROM sandw1;
      

      Test(Execute this on Hive/Impala)

      SELECT * from sandwiches;
      ALTER TABLE sandwiches ADD COLUMNS (`cost` decimal(4,2));
      SELECT * from sandwiches;
      

      Hive can deal with the added new column as followings.

      hive> SELECT * from sandwiches;
      OK
      tuna    bread tuna and mayo
      egg     bread egg and mayo
      Time taken: 0.13 seconds, Fetched: 2 row(s)
      hive> ALTER TABLE sandwiches ADD COLUMNS (`cost` decimal(4,2));
      OK
      Time taken: 0.115 seconds
      hive> SELECT * from sandwiches;
      OK
      tuna    bread tuna and mayo     NULL
      egg     bread egg and mayo      NULL
      Time taken: 0.072 seconds, Fetched: 2 row(s)
      

      Then Impala cannot select the same table alter add columns.

      [nightly55-2.vpc.cloudera.com:21000] > SELECT * from sandwiches;
      Query: select * from sandwiches
      +------+---------------------+
      | name | description         |
      +------+---------------------+
      | tuna | bread tuna and mayo |
      | egg  | bread egg and mayo  |
      +------+---------------------+
      Fetched 2 row(s) in 0.63s
      [nightly55-2.vpc.cloudera.com:21000] > ALTER TABLE sandwiches ADD COLUMNS (`cost` decimal(4,2));
      Query: alter TABLE sandwiches ADD COLUMNS (`cost` decimal(4,2))
      [nightly55-2.vpc.cloudera.com:21000] > SELECT * from sandwiches;
      Query: select * from sandwiches
      
      WARNINGS: Field cost is missing from file and does not have a default value.
      
      Fetched 0 row(s) in 0.30s
      

      From the followings, NULL itself is fine but dealing with default value may have the issue.

      [nightly55-2.vpc.cloudera.com:21000] > SELECT * from sandw1;
      Query: select * from sandw1
      +------+---------------------+------+
      | name | description         | cost |
      +------+---------------------+------+
      | egg  | bread egg and mayo  | NULL |
      | tuna | bread tuna and mayo | 4.99 |
      +------+---------------------+------+
      

      — End of copying —

      This renders impala unable to execute query after adding new column. Fix is to replicate Hive's behavior.

      Attachments

        Activity

          People

            HuaisiXu Huaisi Xu
            HuaisiXu Huaisi Xu
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: