[SPARK-4968] [SparkSQL] java.lang.UnsupportedOperationException when hive partition doesn't exist and order by and limit are used - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.1.1
Fix Version/s: 1.2.1, 1.3.0
Component/s: SQL
Labels: None
Environment:

Spark 1.1.1
Scala 2.10.2
Hive metastore DB: PostgreSQL
OS: Linux

Description

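Create a partitioned Hive table, then run a query that filters on a partition that does not exist and uses both ORDER BY and LIMIT. The query fails with java.lang.UnsupportedOperationException instead of returning an empty result.

All queries are run through a HiveContext.

1. Create the Hive table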

create table if not exists testTable (ID1 BIGINT, ID2 BIGINT, Start_Time STRING, End_Time STRING) PARTITIONED BY (Region STRING, Market STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

2. Create the data file (/tmp/input.txt)

1,2,"2014-11-01","2014-11-02"
2,3,"2014-11-01","2014-11-02"
3,4,"2014-11-01","2014-11-02"

3. Load the data into Hive

LOAD DATA LOCAL INPATH '/tmp/input.txt' OVERWRITE INTO TABLE testTable PARTITION (Region="North", market='market1');

4. Run a query against a partition that does not exist (market2)

SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;


Error trace
java.lang.UnsupportedOperationException: empty collection
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
	at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
	at org.apache.spark.sql.execution.TakeOrdered.executeCollect(basicOperators.scala:171)
	at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
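
The trace shows the exception surfacing in RDD.reduce, called from RDD.takeOrdered. As a rough illustration only, the following is a minimal Python model (not Spark's code) of a two-phase top-n: take the top n within each partition, then merge the partial results with a reduce. It assumes that a filter on a nonexistent partition leaves zero partitions to reduce over. The function names take_ordered and take_ordered_guarded are hypothetical, and Python raises TypeError where Spark raises UnsupportedOperationException.

```python
import heapq
from functools import reduce


def take_ordered(partitions, n, key=None):
    """Hypothetical model of a two-phase top-n over a list of partitions."""
    # Phase 1: keep only the n smallest rows within each partition.
    per_partition = [heapq.nsmallest(n, part, key=key) for part in partitions]
    # Phase 2: merge partial results pairwise. With no initial value,
    # reduce() raises TypeError when per_partition is empty, which is
    # analogous to the "empty collection" error in the trace above.
    return reduce(
        lambda a, b: heapq.nsmallest(n, a + b, key=key),
        per_partition,
    )


def take_ordered_guarded(partitions, n, key=None):
    """Same model, but returns an empty result when there is nothing to reduce."""
    if not partitions:
        return []
    return take_ordered(partitions, n, key=key)


if __name__ == "__main__":
    print(take_ordered([[3, 1], [2]], 2))
    print(take_ordered_guarded([], 100))
```

In this model the guard makes the no-partition case return an empty list, which matches what a user would expect from the query in step 4. Whether the actual fix takes this form is not stated in the issue; see the linked pull request.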

Issue Links

links to

[Github] Pull Request #3830 (saucam)

People

Assignee: Unassigned

Reporter: Shekhar Bansal

Votes: 2

Watchers: 2

Dates

Created: 25/Dec/14 16:20

Updated: 29/Dec/14 21:50

Resolved: 29/Dec/14 21:50