Description
In a %spark.pyspark paragraph a StreamingContext reading from a Kafka stream is created.
The paragraph is started, and while the job is running it produces correct output from the code.
The problem is that the job cannot be stopped from the Zeppelin web interface.
Installed Kafka version: kafka_2.11-0.8.2.2
Spark Kafka jar: spark-streaming-kafka-0-8_2.11-2.1.0.jar
Zeppelin: zeppelin-0.7.0-bin-all
Tried:
1. Paragraph Cancel (the || button) has no effect.
2. 'Stop All' in the Zeppelin Job view has no effect.
3. Another paragraph with
%spark.pyspark
ssc.stop(stopSparkContext=False, stopGraceFully=True)
is started but stays in 'Pending'.
4. Restarting the 'spark' interpreter stops the job.
The example logic:
%spark.pyspark
import sys
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
zkQuorum, topic, interval = ('localhost:2181', 'airport', 60)
ssc = StreamingContext(sc, interval)
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer",
{topic: 1})
parsed = kvs.map(lambda (k, v): json.loads(v))
summed = parsed.\
filter(lambda event: 'kind' in event and event['kind']=='gate').\
map(lambda event: ('count_all', int(event['value']['passengers']))).\
reduceByKey(lambda x, y: x + y)
summed.pprint()
ssc.start()
ssc.awaitTermination()
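
A possible workaround, only a sketch and under two assumptions: that sc is the SparkContext Zeppelin provides to %spark.pyspark, and that the stop paragraph from step 3 stays 'Pending' because the pyspark interpreter runs paragraphs one at a time while the first paragraph is blocked in ssc.awaitTermination(). Instead of blocking forever, the variant below polls ssc.awaitTerminationOrTimeout() and watches a marker file (the /tmp/stop_streaming path is made up for illustration), so the stream can be stopped without restarting the interpreter:

%spark.pyspark
import os
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Hypothetical marker file; creating it signals the loop below to stop the stream.
STOP_MARKER = '/tmp/stop_streaming'

zkQuorum, topic, interval = ('localhost:2181', 'airport', 60)
ssc = StreamingContext(sc, interval)
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
kvs.map(lambda (k, v): v).pprint()

ssc.start()
# Poll instead of blocking forever in awaitTermination(): the call returns True
# once the context has stopped, or False after `interval` seconds, which gives
# the loop a chance to check the marker file.
stopped = False
while not stopped:
    stopped = ssc.awaitTerminationOrTimeout(interval)
    if not stopped and os.path.exists(STOP_MARKER):
        ssc.stop(stopSparkContext=False, stopGraceFully=True)
        stopped = True

The marker file can then be created from a %sh paragraph (the shell interpreter is not blocked by the running pyspark paragraph) or directly on the driver host, e.g. touch /tmp/stop_streaming. This does not fix the Cancel button itself, but it avoids having to restart the 'spark' interpreter.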