Description
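Create a partitioned table, then run a query that filters on a partition which doesn't exist and that contains both ORDER BY and LIMIT. The query fails with "empty collection" instead of returning an empty result.
I am running the queries in a HiveContext.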
1. Create the Hive table
create table if not exists testTable (ID1 BIGINT, ID2 BIGINT,Start_Time STRING, End_Time STRING) PARTITIONED BY (Region STRING,Market STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
2. Create the data file (/tmp/input.txt)
1,2,"2014-11-01","2014-11-02"
2,3,"2014-11-01","2014-11-02"
3,4,"2014-11-01","2014-11-02"
3. Load the data into Hive
LOAD DATA LOCAL INPATH '/tmp/input.txt' OVERWRITE INTO TABLE testTable PARTITION (Region="North", market='market1');
4. Run the query (note that no partition with market = 'market2' exists)
SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100;
Error trace
java.lang.UnsupportedOperationException: empty collection
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:863)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136)
at org.apache.spark.sql.execution.TakeOrdered.executeCollect(basicOperators.scala:171)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
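The trace shows takeOrdered calling reduce, and reduce rejecting an empty input. The following is only a simplified Python model of that pattern, not Spark's actual code: it assumes that filtering on a non-existent partition leaves zero partitions to scan, and the function names take_ordered and take_ordered_safe are hypothetical. It sketches why a reduce without an initial value fails in that case, and how an emptiness check before the reduce could return an empty result instead.

```python
import heapq
from functools import reduce


def take_ordered(partitions, n):
    """Model of a top-n: take the n smallest per partition, then merge with reduce."""
    per_partition = [heapq.nsmallest(n, part) for part in partitions]
    # reduce() without an initial value raises TypeError on an empty list,
    # which plays the role of the "empty collection" error in the trace.
    return reduce(lambda a, b: heapq.nsmallest(n, a + b), per_partition)


def take_ordered_safe(partitions, n):
    """Same model, but returns an empty result when there is nothing to reduce."""
    if not partitions:  # assumption: a non-existent partition means zero partitions
        return []
    return take_ordered(partitions, n)


if __name__ == "__main__":
    print(take_ordered_safe([[3, 1], [2]], 2))  # merged top-2 across partitions
    print(take_ordered_safe([], 100))           # no partitions: empty result
    try:
        take_ordered([], 100)
    except TypeError as err:
        print("unguarded version fails:", err)
```

In this model the guarded variant behaves the way the query is expected to behave: an ORDER BY ... LIMIT over no data yields no rows rather than an exception.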