Description
When I try to read a Kudu table with Apache Spark using the following code:

import org.apache.kudu.spark.kudu._
import sqlContext.implicits._

val kuduOptions: Map[String, String] = Map(
  "kudu.table" -> "test_table",
  "kudu.master" -> "host1:7051,host2:7051,host3:7051")

val kuduDF = sqlContext.read.options(kuduOptions).kudu
kuduDF.registerTempTable("t")
sqlContext.sql("SELECT * FROM t WHERE id IN (1111, 2222)").show(50, false)
the job gets stuck for more than three days after completing 95% of its tasks. The table is partitioned by date and the partitions are unevenly sized: one partition is 12 GB, about 20 partitions are between 1 GB and 3 GB, and the remaining partitions hold only megabytes or kilobytes of data.