[DRILL-4255] SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.4.0
Fix Version/s: 1.12.0
Component/s: Execution - Flow
Labels: None
Environment: CentOS

Description

SELECT DISTINCT over MapR-FS generated audit logs (JSON files) fails with an UNSUPPORTED_OPERATION error. An identical query over another set of JSON data returns correct results.

MapR Drill 1.4.0, commit ID : 9627a80f
MapRBuildVersion : 5.1.0.36488.GA
OS : CentOS x86_64 GNU/Linux

0: jdbc:drill:schema=dfs.tmp> select distinct t.operation from `auditlogs` t;
Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes

Fragment 3:3

[Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 on example.com:31010] (state=,code=0)

Stack trace from drillbit.log

2016-01-08 11:35:35,093 [297060f9-1c7a-b32c-09e8-24b5ad863e73:frag:3:3] INFO  o.a.d.e.p.i.aggregate.HashAggBatch - User Error Occurred
org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes


[Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:144) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
        at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_65]
        at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_65]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1506.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]

Query plan for the above query:

00-00    Screen : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.4100499276E7 rows, 1.69455861396E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.7382234176000005E8 memory}, id = 7572
00-01      UnionExchange : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.408635556E7 rows, 1.6944171768E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.7382234176000005E8 memory}, id = 7571
01-01        Project(operation=[$0]) : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.3944918400000006E7 rows, 1.683102204E8 cpu, 0.0 io, 1.15865321472E10 network, 2.7382234176000005E8 memory}, id = 7570
01-02          HashAgg(group=[{0}]) : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.3944918400000006E7 rows, 1.683102204E8 cpu, 0.0 io, 1.15865321472E10 network, 2.7382234176000005E8 memory}, id = 7569
01-03            Project(operation=[$0]) : rowType = RecordType(ANY operation): rowcount = 1414371.6, cumulative cost = {3.2530546800000004E7 rows, 1.569952476E8 cpu, 0.0 io, 1.15865321472E10 network, 2.4892940160000002E8 memory}, id = 7568
01-04              HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {3.2530546800000004E7 rows, 1.569952476E8 cpu, 0.0 io, 1.15865321472E10 network, 2.4892940160000002E8 memory}, id = 7567
02-01                UnorderedMuxExchange : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {3.1116175200000003E7 rows, 1.34365302E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7566
03-01                  Project(operation=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {2.97018036E7 rows, 1.329509304E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7565
03-02                    HashAgg(group=[{0}]) : rowType = RecordType(ANY operation): rowcount = 1414371.6, cumulative cost = {2.8287432E7 rows, 1.27293444E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7564
03-03                      Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/auditlogs, numFiles=31, columns=[`operation`], files=[maprfs:/tmp/auditlogs/DBAudit.log-2015-12-30-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-002.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-30-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-003.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-31-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-04-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-002.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-03-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-31-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-29-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-01-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-29-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-01-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-004.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-07-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-08-001.json]]]) : rowType = RecordType(ANY operation): rowcount = 1.4143716E7, cumulative cost = {1.4143716E7 rows, 1.4143716E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 7563

Another query of exactly the same form as the failing query reported here returns correct results:

0: jdbc:drill:schema=dfs.tmp> select distinct t.key2 from `twoKeyJsn.json` t;
+-------+
| key2  |
+-------+
| d     |
| c     |
| b     |
| 1     |
| a     |
| 0     |
| k     |
| m     |
| j     |
| h     |
| e     |
| n     |
| g     |
| f     |
| l     |
| i     |
+-------+
16 rows selected (27.097 seconds)
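
The error message says the hash aggregate saw the schema of its input change, so one thing worth checking (not done in the original report) is whether the `operation` field is present with the same JSON type in all 31 audit log files. Below is a minimal diagnostic sketch in Python. It assumes each file holds one JSON object per line and that the files have been copied somewhere locally readable. The helper names `field_types` and `scan_directory` are hypothetical and are not part of Drill or MapR.

```python
import json
from collections import Counter
from pathlib import Path


def field_types(path, field):
    """Count the JSON value types seen for `field` in one file.

    Assumption: the file contains one JSON object per line.
    Records that lack the field are counted under "<missing>".
    """
    counts = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if field in record:
                counts[type(record[field]).__name__] += 1
            else:
                counts["<missing>"] += 1
    return counts


def scan_directory(directory, field):
    """Map each *.json file name in `directory` to its type counts for `field`."""
    return {
        p.name: field_types(p, field)
        for p in sorted(Path(directory).glob("*.json"))
    }
```

Calling `scan_directory("/path/to/local/copy/of/auditlogs", "operation")` and comparing the per-file counts would show whether some files (for example the DBAudit versus the FSAudit logs) lack the field or carry it with a different type. Whether such a difference is what triggers the failure here is an assumption and would still need to be confirmed against Drill itself.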

Issue Links

is related to: DRILL-5546 Schema change problems caused by empty batch (Resolved)
