Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.6.1
Fix Version/s: None
Description
I am using AWS EMR + Spark 1.6.1 + Hive 1.0.0
I have this UDAF and have added it to Spark's classpath: https://github.com/scribd/hive-udaf-maxrow/blob/master/src/com/scribd/hive/udaf/GenericUDAFMaxRow.java
I registered it in Spark with sqlContext.sql("CREATE TEMPORARY FUNCTION maxrow AS 'some.cool.package.hive.udf.GenericUDAFMaxRow'")
However, when I call it from Spark in the following CREATE VIEW query:
CREATE VIEW VIEW_1 AS
SELECT a.A, a.B,
       maxrow(a.C, a.D, a.E, a.F, a.G, a.H, a.I) as m
FROM table_1 a
JOIN table_2 b ON b.Z = a.D AND b.Y = a.C
JOIN dummy_table
GROUP BY a.A, a.B
it fails with the following error:
16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.A was overwritten in RowResolver map: _col0: string by _col0: string
16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.B was overwritten in RowResolver map: _col1: bigint by _col1: bigint
16/05/18 19:49:14 ERROR Driver: FAILED: SemanticException [Error 10002]: Line 16:32 Invalid column reference 'C'
org.apache.hadoop.hive.ql.parse.SemanticException: Line 16:32 Invalid column reference 'C'
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10643)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10591)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3656)
Running the same SELECT query without the CREATE VIEW wrapper works fine.