  1. Apache Drill
  2. DRILL-2602

Throw an error on schema change during streaming aggregation

Details

    Description

      We don't recognize a schema change during streaming aggregation when a column is a mix of required and optional types.
      Hash aggregation does throw the correct error message.

      I have a table 'mix' made up of two Parquet files, one with optional columns and one with required columns:

      [Fri Mar 27 09:46:07 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] # ls -ltr
      total 753
      -rwxr-xr-x 1 root root 759879 Mar 27 09:41 optional.parquet
      -rwxr-xr-x 1 root root   9867 Mar 27 09:41 required.parquet
      
      [Fri Mar 27 09:46:09 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] # ~/parquet-tools-1.5.1-SNAPSHOT/parquet-schema optional.parquet
      message root {
        optional binary c_varchar (UTF8);
        optional int32 c_integer;
        optional int64 c_bigint;
        optional float c_float;
        optional double c_double;
        optional int32 c_date (DATE);
        optional int32 c_time (TIME);
        optional int64 c_timestamp (TIMESTAMP);
        optional boolean c_boolean;
        optional double d9;
        optional double d18;
        optional double d28;
        optional double d38;
      }
      
      [Fri Mar 27 09:46:41 root@/mapr/vmarkman.cluster.com/drill/testdata/joins/mix ] # ~/parquet-tools-1.5.1-SNAPSHOT/parquet-schema required.parquet
      message root {
        required binary c_varchar (UTF8);
        required int32 c_integer;
        required int64 c_bigint;
        required float c_float;
        required double c_double;
        required int32 c_date (DATE);
        required int32 c_time (TIME);
        required int64 c_timestamp (TIMESTAMP);
        required boolean c_boolean;
        required double d9;
        required double d18;
        required double d28;
        required double d38;
      }
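The two schemas above differ only in the repetition (required vs. optional) of each column. As a rough illustration, that difference can be found mechanically from `parquet-schema` style text. This is a minimal sketch, not part of Drill or parquet-tools: the helper names `parse_schema` and `repetition_mismatches` are hypothetical, and the sample schemas are shortened to two columns.

```python
import re

# Matches lines such as "optional int32 c_date (DATE);" (annotation is optional).
FIELD_RE = re.compile(r"^\s*(required|optional|repeated)\s+(\w+)\s+(\w+)(?:\s+\((\w+)\))?;")


def parse_schema(text):
    """Map column name -> (repetition, physical type) from parquet-schema style text."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            repetition, physical_type, name, _annotation = match.groups()
            fields[name] = (repetition, physical_type)
    return fields


def repetition_mismatches(left, right):
    """Return sorted names of columns present in both schemas whose repetition differs."""
    return sorted(
        name for name in left.keys() & right.keys()
        if left[name][0] != right[name][0]
    )


# Shortened versions of the schemas shown above (assumption: two columns are enough to illustrate).
OPTIONAL_SCHEMA = """message root {
  optional binary c_varchar (UTF8);
  optional int32 c_integer;
}"""

REQUIRED_SCHEMA = """message root {
  required binary c_varchar (UTF8);
  required int32 c_integer;
}"""

mismatched = repetition_mismatches(parse_schema(OPTIONAL_SCHEMA), parse_schema(REQUIRED_SCHEMA))
print(mismatched)
```

Every column reported by such a check is one where a reader scanning both files would see the value vector type change between batches.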
      

      Nice error message on hash aggregation:

      0: jdbc:drill:schema=dfs> select count(*) from mix group by c_integer;
      +------------+
      |   EXPR$0   |
      +------------+
      Query failed: Query stopped., Hash aggregate does not support schema changes [ 2bc255ce-c7f9-47bf-80b0-a5c87cfa67be on atsqa4-134.qa.lab:31010 ]
      java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
              at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
              at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
              at sqlline.SqlLine.print(SqlLine.java:1809)
              at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
              at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
              at sqlline.SqlLine.dispatch(SqlLine.java:889)
              at sqlline.SqlLine.begin(SqlLine.java:763)
              at sqlline.SqlLine.start(SqlLine.java:498)
              at sqlline.SqlLine.main(SqlLine.java:460)
      

      With streaming aggregation, the exception is hard for the end user to understand:

      0: jdbc:drill:schema=dfs> alter session set `planner.enable_hashagg` = false;
      +------------+------------+
      |     ok     |  summary   |
      +------------+------------+
      | true       | planner.enable_hashagg updated. |
      +------------+------------+
      1 row selected (0.067 seconds)
      
      0: jdbc:drill:schema=dfs> select count(*) from mix group by c_integer;
      +------------+
      |   EXPR$0   |
      +------------+
      Query failed: RemoteRpcException: Failure while running fragment., Failure while reading vector.  Expected vector class of org.apache.drill.exec.vector.IntVector but was holding vector class org.apache.drill.exec.vector.NullableIntVector. [ 5610e589-38e0-4dc5-a560-649516180ba4 on atsqa4-134.qa.lab:31010 ]
      [ 5610e589-38e0-4dc5-a560-649516180ba4 on atsqa4-134.qa.lab:31010 ]
      java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
              at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
              at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
              at sqlline.SqlLine.print(SqlLine.java:1809)
              at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
              at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
              at sqlline.SqlLine.dispatch(SqlLine.java:889)
              at sqlline.SqlLine.begin(SqlLine.java:763)
              at sqlline.SqlLine.start(SqlLine.java:498)
              at sqlline.SqlLine.main(SqlLine.java:460)
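
The behavior requested here is for the streaming aggregate to fail with a message like the hash aggregate's, instead of surfacing the vector class mismatch. The idea can be sketched as a check that compares each incoming batch's schema, including the required/optional mode, against the first batch's schema. This is a hedged illustration only: it is not the code in the attached patches, and `Field`, `SchemaChangeError`, and `check_same_schema` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Field:
    name: str
    minor_type: str  # e.g. "INT"
    mode: str        # "REQUIRED" or "OPTIONAL"


class SchemaChangeError(RuntimeError):
    """Raised when a batch arrives with a schema different from the first batch."""


def check_same_schema(first: Sequence[Field], incoming: Sequence[Field],
                      operator: str = "Streaming aggregate") -> None:
    """Raise a readable error if the incoming schema differs, mode included."""
    changed = [
        f"{old.name}: {old.mode} {old.minor_type} -> {new.mode} {new.minor_type}"
        for old, new in zip(first, incoming)
        if old != new
    ]
    if changed or len(first) != len(incoming):
        detail = "; ".join(changed) or "column count differs"
        raise SchemaChangeError(f"{operator} does not support schema changes ({detail})")


# The c_integer column as seen in required.parquet and optional.parquet.
required_batch = [Field("c_integer", "INT", "REQUIRED")]
optional_batch = [Field("c_integer", "INT", "OPTIONAL")]

check_same_schema(required_batch, required_batch)  # same schema: no error
try:
    check_same_schema(required_batch, optional_batch)
except SchemaChangeError as err:
    print(err)
```

Treating the mode as part of the field identity is what lets the required-to-optional switch be reported as a schema change rather than as a failed vector cast.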
      

      Attachments

        1. optional.parquet
          742 kB
          Victoria Markman
        2. required.parquet
          10 kB
          Victoria Markman
        3. DRILL-2602.1.patch.txt
          10 kB
          Abdel Hakim Deneche
        4. DRILL-2602.2.patch.txt
          10 kB
          Abdel Hakim Deneche
        5. DRILL-2602.3.patch.txt
          5 kB
          Abdel Hakim Deneche
        6. DRILL-2602.4.patch.txt
          7 kB
          Abdel Hakim Deneche

        Activity

          People

            jaltekruse Jason Altekruse
            vicky Victoria Markman