Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Not able to read kafka topic data in spark job

avatar
Contributor

I am running kafka and spark in seperate cluster.

I am writing data to a particular topic in kafka and trying to run a python code for kafka work count(basic example in spark).

But I am not able to count the word.

After submitting the job I am not getting any error.. job is succesfully submitted also getting some Info logging. in between the Info logging. I am getting 

------------------------------------------------

Time: and Date

------------------------------------------------

 

after this time string, I should get the word count of the Kafka topic.

3 REPLIES 3

avatar
Master Collaborator

Hi @Nick

 

Yes, you should get a count of the words. Something like this:

 

-------------------------------------------
Time: 2018-05-11 01:05:20
-------------------------------------------
(u'', 160)
...

To start with, please let us know if you are using kerberos on either of the clusters?

 

Next, can you help confirm you can read the kafka topic data using a kafka-console-consumer command from the kafka cluster?

 

Next, can you verify (the host from where you are running spark job) that you can reach out to the zookeeper on the kafka cluster (using ping and nc on port 2181).

 

Lastly, please double check that you have the topic name listed correctly and the ZK quorum in the spark(2)-submit command line. 

 

For comparison, I am sharing the same exercise from my cluster, one running Spark and other Kafka (however note both are using SIMPLE authentication i.e non kerberized).

 

Kafka-Cluster
=========
[systest@nightly511 tmp]$ kafka-topics --create --zookeeper localhost:2181 --topic wordcounttopic --partitions 1 --replication-factor 3
....
Created topic "wordcounttopic".

[systest@nightly511-unsecure-1 tmp]$ vmstat 1 | kafka-console-producer --broker-list `hostname`:9092 --topic wordcounttopic

Spark- Cluster
===========
[user1@host-10-17-101-208 ~]$ vi kafka_wordcount.py
from __future__ import print_function import sys from pyspark import SparkContext from pyspark.streaming import StreamingContext from pyspark.streaming.kafka import KafkaUtils if __name__ == "__main__": if len(sys.argv) != 3: print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr) exit(-1) sc = SparkContext(appName="PythonStreamingKafkaWordCount") ssc = StreamingContext(sc, 10) zkQuorum, topic = sys.argv[1:] kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1}) lines = kvs.map(lambda x: x[1]) counts = lines.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b) counts.pprint() ssc.start() ssc.awaitTermination() [user1@host-10-17-101-208 ~]$ spark2-submit --master yarn --deploy-mode client --conf "spark.dynamicAllocation.enabled=false" --jars /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_*.jar kafka_wordcount.py nightly511:2181 wordcounttopic Notice the last 2 arguments are the ZK(hostname/URL) in the kafka cluster and the kafka-topic name in the kafka cluster. 18/05/11 01:04:55 INFO cluster.YarnClientSchedulerBackend: Application application_1525758910545_0024 has started running. 18/05/11 01:05:21 INFO scheduler.DAGScheduler: ResultStage 4 (runJob at PythonRDD.scala:446) finished in 0.125 s 18/05/11 01:05:21 INFO scheduler.DAGScheduler: Job 2 finished: runJob at PythonRDD.scala:446, took 1.059940 s ------------------------------------------- Time: 2018-05-11 01:05:20 ------------------------------------------- (u'', 160) (u'216', 1) (u'13', 1) (u'15665', 1) (u'28', 1) (u'17861', 1) (u'872', 6) (u'3', 5) (u'8712', 1) (u'5', 1) ... 18/05/11 01:05:21 INFO scheduler.JobScheduler: Finished job streaming job 1526025920000 ms.0 from job set of time 1526025920000 ms 18/05/11 01:05:21 INFO scheduler.JobScheduler: Total delay: 1.625 s for time 1526025920000 ms (execution: 1.128 s)

Let us know if you find any differences and manage to get it working. If it's still not working, let us know that too.

 

Good Luck!

avatar
Contributor

Hey thanx for the reply.

Objective:-

Read Kafka topic message and do some processing on that message.

I have 1 spark cluster and 1 Kafka cluster.

I am able to create a topic and also put the message on that topic.

I am also able to see the message on that topic.

Both clusters are reachable from each other.

 

But when I am deploying the spark job with this command:

 sudo bin/spark-submit  --jars /usr/lib/spark/spark-2.0.1-bin-hadoop2.6/jars/spark-streaming-kafka-0-8-assembly_2.11-2.0.1.jar /usr/lib/spark/spark-2.0.1-bin-hadoop2.6/bin/kafka_wordcount.py bluedata-33.bdlocal:2080 wordcounttopic

 

No output and error is coming only Info Log are coming.

 

Can you guide me or provide some tutorial for setting up the cluster and setting all env path.

So that I can do what I want.

 

I also try your command but ended with an error.

Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.

 

Please help me out from this.

 

 My Output:

/usr/lib/spark/spark-2.0.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/context.py:487: DeprecationWarning: HiveContext is deprecated in Spark 2.0.0. Please use SparkSession.builder.enableHiveSupport().getOrCreate() instead.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/05/10 23:34:47 INFO SparkContext: Running Spark version 2.0.1
18/05/10 23:34:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/10 23:34:48 INFO SecurityManager: Changing view acls to: root
18/05/10 23:34:48 INFO SecurityManager: Changing modify acls to: root
18/05/10 23:34:48 INFO SecurityManager: Changing view acls groups to:
18/05/10 23:34:48 INFO SecurityManager: Changing modify acls groups to:
18/05/10 23:34:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/05/10 23:34:49 INFO Utils: Successfully started service 'sparkDriver' on port 34448.
18/05/10 23:34:49 INFO SparkEnv: Registering MapOutputTracker
18/05/10 23:34:49 INFO SparkEnv: Registering BlockManagerMaster
18/05/10 23:34:49 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5013911b-cd4a-4ed1-9299-fde4bdef6b1c
18/05/10 23:34:49 INFO MemoryStore: MemoryStore started with capacity 408.9 MB
18/05/10 23:34:49 INFO SparkEnv: Registering OutputCommitCoordinator
18/05/10 23:34:49 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/05/10 23:34:49 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.18.0.21:4040
18/05/10 23:34:49 INFO SparkContext: Added JAR file:/usr/lib/spark/spark-2.0.1-bin-hadoop2.6/jars/spark-streaming-kafka-0-8-assembly_2.11-2.0.1.jar at spark://172.18.0.21:34448/jars/spark-streaming-kafka-0-8-assembly_2.11-2.0.1.jar with timestamp 1526009689849
18/05/10 23:34:50 INFO SparkContext: Added file file:/usr/lib/spark/spark-2.0.1-bin-hadoop2.6/bin/kafka_wordcount.py at spark://172.18.0.21:34448/files/kafka_wordcount.py with timestamp 1526009689999
18/05/10 23:34:50 INFO Utils: Copying /usr/lib/spark/spark-2.0.1-bin-hadoop2.6/bin/kafka_wordcount.py to /tmp/spark-afd2b1b3-20e3-4c6e-be4a-114db19605fd/userFiles-a917bd4a-3297-491a-9bf0-5b4bb4d73e7b/kafka_wordcount.py
18/05/10 23:34:50 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://bluedata-48.bdlocal:7077...
18/05/10 23:34:50 INFO TransportClientFactory: Successfully created connection to bluedata-48.bdlocal/172.18.0.21:7077 after 38 ms (0 ms spent in bootstraps)
18/05/10 23:34:50 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20180510233450-0019
18/05/10 23:34:50 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180510233450-0019/0 on worker-20180502064457-172.18.0.21-40240 (172.18.0.21:40240) with 4 cores
18/05/10 23:34:50 INFO StandaloneSchedulerBackend: Granted executor ID app-20180510233450-0019/0 on hostPort 172.18.0.21:40240 with 4 cores, 1024.0 MB RAM
18/05/10 23:34:50 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180510233450-0019/1 on worker-20180505081919-172.18.0.22-38306 (172.18.0.22:38306) with 4 cores
18/05/10 23:34:50 INFO StandaloneSchedulerBackend: Granted executor ID app-20180510233450-0019/1 on hostPort 172.18.0.22:38306 with 4 cores, 1024.0 MB RAM
18/05/10 23:34:50 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40602.
18/05/10 23:34:50 INFO NettyBlockTransferService: Server created on 172.18.0.21:40602
18/05/10 23:34:50 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.18.0.21, 40602)
18/05/10 23:34:50 INFO BlockManagerMasterEndpoint: Registering block manager 172.18.0.21:40602 with 408.9 MB RAM, BlockManagerId(driver, 172.18.0.21, 40602)
18/05/10 23:34:50 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180510233450-0019/1 is now RUNNING
18/05/10 23:34:50 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180510233450-0019/0 is now RUNNING
18/05/10 23:34:50 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.18.0.21, 40602)
18/05/10 23:34:50 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
18/05/10 23:34:51 INFO SparkContext: Starting job: start at NativeMethodAccessorImpl.java:-2
18/05/10 23:34:51 INFO DAGScheduler: Registering RDD 1 (start at NativeMethodAccessorImpl.java:-2)
18/05/10 23:34:51 INFO DAGScheduler: Got job 0 (start at NativeMethodAccessorImpl.java:-2) with 20 output partitions
18/05/10 23:34:51 INFO DAGScheduler: Final stage: ResultStage 1 (start at NativeMethodAccessorImpl.java:-2)
18/05/10 23:34:51 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/05/10 23:34:51 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/05/10 23:34:52 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[1] at start at NativeMethodAccessorImpl.java:-2), which has no missing parents
18/05/10 23:34:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.1 KB, free 408.9 MB)
18/05/10 23:34:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2001.0 B, free 408.9 MB)
18/05/10 23:34:52 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.18.0.21:40602 (size: 2001.0 B, free: 408.9 MB)
18/05/10 23:34:52 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
18/05/10 23:34:52 INFO DAGScheduler: Submitting 50 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[1] at start at NativeMethodAccessorImpl.java:-2)
18/05/10 23:34:52 INFO TaskSchedulerImpl: Adding task set 0.0 with 50 tasks
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(null) (172.18.0.22:35050) with ID 1
18/05/10 23:34:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.18.0.22, partition 0, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:54 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.18.0.22, partition 1, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:54 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 172.18.0.22, partition 2, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:54 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 172.18.0.22, partition 3, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 0 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 1 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 2 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 3 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(null) (172.18.0.21:56426) with ID 0
18/05/10 23:34:54 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 172.18.0.21, partition 4, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:54 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, 172.18.0.21, partition 5, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:54 INFO BlockManagerMasterEndpoint: Registering block manager 172.18.0.22:40802 with 408.9 MB RAM, BlockManagerId(1, 172.18.0.22, 40802)
18/05/10 23:34:54 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, 172.18.0.21, partition 6, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:54 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, 172.18.0.21, partition 7, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 4 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 5 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 6 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 7 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:54 INFO BlockManagerMasterEndpoint: Registering block manager 172.18.0.21:37754 with 408.9 MB RAM, BlockManagerId(0, 172.18.0.21, 37754)
18/05/10 23:34:54 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.18.0.22:40802 (size: 2001.0 B, free: 408.9 MB)
18/05/10 23:34:54 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.18.0.21:37754 (size: 2001.0 B, free: 408.9 MB)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, 172.18.0.22, partition 8, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 8 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, 172.18.0.22, partition 9, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 9 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, 172.18.0.22, partition 10, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 10 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1384 ms on 172.18.0.22 (1/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 1337 ms on 172.18.0.22 (2/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1360 ms on 172.18.0.22 (3/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, 172.18.0.22, partition 11, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 11 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 1372 ms on 172.18.0.22 (4/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 12.0 in stage 0.0 (TID 12, 172.18.0.22, partition 12, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 12 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 😎 in 176 ms on 172.18.0.22 (5/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 13.0 in stage 0.0 (TID 13, 172.18.0.22, partition 13, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 13 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14, 172.18.0.22, partition 14, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 14 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10) in 191 ms on 172.18.0.22 (6/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 197 ms on 172.18.0.22 (7/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 15.0 in stage 0.0 (TID 15, 172.18.0.22, partition 15, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 15 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 11.0 in stage 0.0 (TID 11) in 168 ms on 172.18.0.22 (8/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16, 172.18.0.22, partition 16, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 16 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 51 ms on 172.18.0.22 (9/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17, 172.18.0.21, partition 17, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 17 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 1524 ms on 172.18.0.21 (10/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 18.0 in stage 0.0 (TID 18, 172.18.0.21, partition 18, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 18 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 1518 ms on 172.18.0.21 (11/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, 172.18.0.21, partition 19, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 19 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20, 172.18.0.21, partition 20, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 20 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 1538 ms on 172.18.0.21 (12/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21, 172.18.0.22, partition 21, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 21 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 63 ms on 172.18.0.22 (13/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 22.0 in stage 0.0 (TID 22, 172.18.0.22, partition 22, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 22 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 23.0 in stage 0.0 (TID 23, 172.18.0.22, partition 23, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 23 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 1569 ms on 172.18.0.21 (14/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 91 ms on 172.18.0.22 (15/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 72 ms on 172.18.0.22 (16/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 24.0 in stage 0.0 (TID 24, 172.18.0.21, partition 24, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 24 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 25.0 in stage 0.0 (TID 25, 172.18.0.21, partition 25, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 25 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 17.0 in stage 0.0 (TID 17) in 74 ms on 172.18.0.21 (17/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 18.0 in stage 0.0 (TID 18) in 71 ms on 172.18.0.21 (18/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 26.0 in stage 0.0 (TID 26, 172.18.0.22, partition 26, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 26 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 27.0 in stage 0.0 (TID 27, 172.18.0.21, partition 27, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 27 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 90 ms on 172.18.0.22 (19/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 19.0 in stage 0.0 (TID 19) in 70 ms on 172.18.0.21 (20/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 28.0 in stage 0.0 (TID 28, 172.18.0.22, partition 28, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 28 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 29.0 in stage 0.0 (TID 29, 172.18.0.22, partition 29, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 29 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 21.0 in stage 0.0 (TID 21) in 67 ms on 172.18.0.22 (21/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 30.0 in stage 0.0 (TID 30, 172.18.0.22, partition 30, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 30 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 22.0 in stage 0.0 (TID 22) in 63 ms on 172.18.0.22 (22/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 23.0 in stage 0.0 (TID 23) in 59 ms on 172.18.0.22 (23/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 31.0 in stage 0.0 (TID 31, 172.18.0.21, partition 31, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 31 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 20.0 in stage 0.0 (TID 20) in 86 ms on 172.18.0.21 (24/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 32.0 in stage 0.0 (TID 32, 172.18.0.21, partition 32, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 25.0 in stage 0.0 (TID 25) in 51 ms on 172.18.0.21 (25/50)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 32 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 33.0 in stage 0.0 (TID 33, 172.18.0.22, partition 33, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 33 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 28.0 in stage 0.0 (TID 28) in 48 ms on 172.18.0.22 (26/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 34.0 in stage 0.0 (TID 34, 172.18.0.22, partition 34, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 34 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 26.0 in stage 0.0 (TID 26) in 63 ms on 172.18.0.22 (27/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 35.0 in stage 0.0 (TID 35, 172.18.0.21, partition 35, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 35 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 36.0 in stage 0.0 (TID 36, 172.18.0.22, partition 36, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 36 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 29.0 in stage 0.0 (TID 29) in 59 ms on 172.18.0.22 (28/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 27.0 in stage 0.0 (TID 27) in 68 ms on 172.18.0.21 (29/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 37.0 in stage 0.0 (TID 37, 172.18.0.21, partition 37, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 24.0 in stage 0.0 (TID 24) in 88 ms on 172.18.0.21 (30/50)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 37 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 38.0 in stage 0.0 (TID 38, 172.18.0.21, partition 38, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 38 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 31.0 in stage 0.0 (TID 31) in 59 ms on 172.18.0.21 (31/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 39.0 in stage 0.0 (TID 39, 172.18.0.22, partition 39, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 39 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 40.0 in stage 0.0 (TID 40, 172.18.0.21, partition 40, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 40 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 30.0 in stage 0.0 (TID 30) in 86 ms on 172.18.0.22 (32/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 32.0 in stage 0.0 (TID 32) in 65 ms on 172.18.0.21 (33/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 41.0 in stage 0.0 (TID 41, 172.18.0.21, partition 41, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 41 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 42.0 in stage 0.0 (TID 42, 172.18.0.22, partition 42, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 42 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 35.0 in stage 0.0 (TID 35) in 53 ms on 172.18.0.21 (34/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 34.0 in stage 0.0 (TID 34) in 60 ms on 172.18.0.22 (35/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 43.0 in stage 0.0 (TID 43, 172.18.0.22, partition 43, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 43 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 33.0 in stage 0.0 (TID 33) in 67 ms on 172.18.0.22 (36/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 44.0 in stage 0.0 (TID 44, 172.18.0.21, partition 44, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 44 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 37.0 in stage 0.0 (TID 37) in 54 ms on 172.18.0.21 (37/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 45.0 in stage 0.0 (TID 45, 172.18.0.21, partition 45, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 45 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 38.0 in stage 0.0 (TID 38) in 66 ms on 172.18.0.21 (38/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 46.0 in stage 0.0 (TID 46, 172.18.0.22, partition 46, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 46 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Starting task 47.0 in stage 0.0 (TID 47, 172.18.0.22, partition 47, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 47 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 36.0 in stage 0.0 (TID 36) in 97 ms on 172.18.0.22 (39/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 39.0 in stage 0.0 (TID 39) in 80 ms on 172.18.0.22 (40/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 48.0 in stage 0.0 (TID 48, 172.18.0.21, partition 48, PROCESS_LOCAL, 6393 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 48 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 40.0 in stage 0.0 (TID 40) in 71 ms on 172.18.0.21 (41/50)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 49.0 in stage 0.0 (TID 49, 172.18.0.22, partition 49, PROCESS_LOCAL, 6450 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 49 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO TaskSetManager: Finished task 42.0 in stage 0.0 (TID 42) in 68 ms on 172.18.0.22 (42/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 43.0 in stage 0.0 (TID 43) in 71 ms on 172.18.0.22 (43/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 41.0 in stage 0.0 (TID 41) in 85 ms on 172.18.0.21 (44/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 44.0 in stage 0.0 (TID 44) in 70 ms on 172.18.0.21 (45/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 45.0 in stage 0.0 (TID 45) in 61 ms on 172.18.0.21 (46/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 48.0 in stage 0.0 (TID 48) in 41 ms on 172.18.0.21 (47/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 46.0 in stage 0.0 (TID 46) in 61 ms on 172.18.0.22 (48/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 47.0 in stage 0.0 (TID 47) in 61 ms on 172.18.0.22 (49/50)
18/05/10 23:34:55 INFO TaskSetManager: Finished task 49.0 in stage 0.0 (TID 49) in 51 ms on 172.18.0.22 (50/50)
18/05/10 23:34:55 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/05/10 23:34:55 INFO DAGScheduler: ShuffleMapStage 0 (start at NativeMethodAccessorImpl.java:-2) finished in 3.644 s
18/05/10 23:34:55 INFO DAGScheduler: looking for newly runnable stages
18/05/10 23:34:55 INFO DAGScheduler: running: Set()
18/05/10 23:34:55 INFO DAGScheduler: waiting: Set(ResultStage 1)
18/05/10 23:34:55 INFO DAGScheduler: failed: Set()
18/05/10 23:34:55 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[2] at start at NativeMethodAccessorImpl.java:-2), which has no missing parents
18/05/10 23:34:55 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.2 KB, free 408.9 MB)
18/05/10 23:34:55 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1969.0 B, free 408.9 MB)
18/05/10 23:34:55 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.18.0.21:40602 (size: 1969.0 B, free: 408.9 MB)
18/05/10 23:34:55 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1012
18/05/10 23:34:55 INFO DAGScheduler: Submitting 20 missing tasks from ResultStage 1 (ShuffledRDD[2] at start at NativeMethodAccessorImpl.java:-2)
18/05/10 23:34:55 INFO TaskSchedulerImpl: Adding task set 1.0 with 20 tasks
18/05/10 23:34:55 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 50, 172.18.0.22, partition 0, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 51, 172.18.0.21, partition 1, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 52, 172.18.0.22, partition 2, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID 53, 172.18.0.21, partition 5, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID 54, 172.18.0.22, partition 3, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID 55, 172.18.0.21, partition 6, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID 56, 172.18.0.22, partition 4, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:55 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID 57, 172.18.0.21, partition 7, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 50 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 52 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 54 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 56 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 51 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 53 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 55 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 57 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.18.0.22:40802 (size: 1969.0 B, free: 408.9 MB)
18/05/10 23:34:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.18.0.21:37754 (size: 1969.0 B, free: 408.9 MB)
18/05/10 23:34:56 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.18.0.21:56426
18/05/10 23:34:56 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 172.18.0.22:35050
18/05/10 23:34:56 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 305 bytes
18/05/10 23:34:56 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 172.18.0.21:40602 in memory (size: 2001.0 B, free: 408.9 MB)
18/05/10 23:34:56 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 172.18.0.22:40802 in memory (size: 2001.0 B, free: 408.9 MB)
18/05/10 23:34:56 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 172.18.0.21:37754 in memory (size: 2001.0 B, free: 408.9 MB)
18/05/10 23:34:56 INFO TaskSetManager: Starting task 8.0 in stage 1.0 (TID 58, 172.18.0.22, partition 8, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 58 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:56 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID 54) in 273 ms on 172.18.0.22 (1/20)
18/05/10 23:34:56 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID 59, 172.18.0.22, partition 9, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 59 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:56 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 56) in 276 ms on 172.18.0.22 (2/20)
18/05/10 23:34:56 INFO TaskSetManager: Starting task 12.0 in stage 1.0 (TID 60, 172.18.0.21, partition 12, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 60 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:56 INFO TaskSetManager: Finished task 6.0 in stage 1.0 (TID 55) in 280 ms on 172.18.0.21 (3/20)
18/05/10 23:34:56 INFO TaskSetManager: Starting task 13.0 in stage 1.0 (TID 61, 172.18.0.21, partition 13, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 5.0 in stage 1.0 (TID 53) in 286 ms on 172.18.0.21 (4/20)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 61 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:56 INFO TaskSetManager: Starting task 16.0 in stage 1.0 (TID 62, 172.18.0.21, partition 16, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 62 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:56 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 51) in 334 ms on 172.18.0.21 (5/20)
18/05/10 23:34:56 INFO TaskSetManager: Starting task 10.0 in stage 1.0 (TID 63, 172.18.0.22, partition 10, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 63 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:56 INFO TaskSetManager: Starting task 11.0 in stage 1.0 (TID 64, 172.18.0.22, partition 11, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 64 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:56 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 52) in 349 ms on 172.18.0.22 (6/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 50) in 357 ms on 172.18.0.22 (7/20)
18/05/10 23:34:56 INFO TaskSetManager: Starting task 18.0 in stage 1.0 (TID 65, 172.18.0.21, partition 18, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 65 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:56 INFO TaskSetManager: Starting task 19.0 in stage 1.0 (TID 66, 172.18.0.21, partition 19, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 66 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:56 INFO TaskSetManager: Starting task 14.0 in stage 1.0 (TID 67, 172.18.0.22, partition 14, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 67 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:56 INFO TaskSetManager: Starting task 15.0 in stage 1.0 (TID 68, 172.18.0.22, partition 15, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 68 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:56 INFO TaskSetManager: Finished task 12.0 in stage 1.0 (TID 60) in 81 ms on 172.18.0.21 (8/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 13.0 in stage 1.0 (TID 61) in 80 ms on 172.18.0.21 (9/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 7.0 in stage 1.0 (TID 57) in 361 ms on 172.18.0.21 (10/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 8.0 in stage 1.0 (TID 58) in 104 ms on 172.18.0.22 (11/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 9.0 in stage 1.0 (TID 59) in 101 ms on 172.18.0.22 (12/20)
18/05/10 23:34:56 INFO TaskSetManager: Starting task 17.0 in stage 1.0 (TID 69, 172.18.0.22, partition 17, NODE_LOCAL, 6200 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 69 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:56 INFO TaskSetManager: Finished task 14.0 in stage 1.0 (TID 67) in 41 ms on 172.18.0.22 (13/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 18.0 in stage 1.0 (TID 65) in 50 ms on 172.18.0.21 (14/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 15.0 in stage 1.0 (TID 68) in 43 ms on 172.18.0.22 (15/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 16.0 in stage 1.0 (TID 62) in 84 ms on 172.18.0.21 (16/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 11.0 in stage 1.0 (TID 64) in 76 ms on 172.18.0.22 (17/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 19.0 in stage 1.0 (TID 66) in 70 ms on 172.18.0.21 (18/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 10.0 in stage 1.0 (TID 63) in 80 ms on 172.18.0.22 (19/20)
18/05/10 23:34:56 INFO TaskSetManager: Finished task 17.0 in stage 1.0 (TID 69) in 35 ms on 172.18.0.22 (20/20)
18/05/10 23:34:56 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/05/10 23:34:56 INFO DAGScheduler: ResultStage 1 (start at NativeMethodAccessorImpl.java:-2) finished in 0.441 s
18/05/10 23:34:56 INFO DAGScheduler: Job 0 finished: start at NativeMethodAccessorImpl.java:-2, took 4.753260 s
18/05/10 23:34:56 INFO ReceiverTracker: Starting 1 receivers
18/05/10 23:34:56 INFO ReceiverTracker: ReceiverTracker started
18/05/10 23:34:56 INFO KafkaInputDStream: Slide time = 1000 ms
18/05/10 23:34:56 INFO KafkaInputDStream: Storage level = Serialized 1x Replicated
18/05/10 23:34:56 INFO KafkaInputDStream: Checkpoint interval = null
18/05/10 23:34:56 INFO KafkaInputDStream: Remember interval = 1000 ms
18/05/10 23:34:56 INFO KafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.KafkaInputDStream@6a32f4a1
18/05/10 23:34:56 INFO PythonTransformedDStream: Slide time = 1000 ms
18/05/10 23:34:56 INFO PythonTransformedDStream: Storage level = Serialized 1x Replicated
18/05/10 23:34:56 INFO PythonTransformedDStream: Checkpoint interval = null
18/05/10 23:34:56 INFO PythonTransformedDStream: Remember interval = 1000 ms
18/05/10 23:34:56 INFO PythonTransformedDStream: Initialized and validated org.apache.spark.streaming.api.python.PythonTransformedDStream@50e32434
18/05/10 23:34:56 INFO ForEachDStream: Slide time = 1000 ms
18/05/10 23:34:56 INFO ForEachDStream: Storage level = Serialized 1x Replicated
18/05/10 23:34:56 INFO ForEachDStream: Checkpoint interval = null
18/05/10 23:34:56 INFO ForEachDStream: Remember interval = 1000 ms
18/05/10 23:34:56 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@5b45a48e
18/05/10 23:34:56 INFO RecurringTimer: Started timer for JobGenerator at time 1526009697000
18/05/10 23:34:56 INFO JobGenerator: Started JobGenerator at 1526009697000 ms
18/05/10 23:34:56 INFO JobScheduler: Started JobScheduler
18/05/10 23:34:56 INFO StreamingContext: StreamingContext started
18/05/10 23:34:56 INFO DAGScheduler: Got job 1 (start at NativeMethodAccessorImpl.java:-2) with 1 output partitions
18/05/10 23:34:56 INFO DAGScheduler: Final stage: ResultStage 2 (start at NativeMethodAccessorImpl.java:-2)
18/05/10 23:34:56 INFO DAGScheduler: Parents of final stage: List()
18/05/10 23:34:56 INFO DAGScheduler: Missing parents: List()
18/05/10 23:34:56 INFO DAGScheduler: Submitting ResultStage 2 (Receiver 0 ParallelCollectionRDD[3] at makeRDD at ReceiverTracker.scala:610), which has no missing parents
18/05/10 23:34:56 INFO ReceiverTracker: Receiver 0 started
18/05/10 23:34:56 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 61.7 KB, free 408.8 MB)
18/05/10 23:34:56 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 21.3 KB, free 408.8 MB)
18/05/10 23:34:56 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.18.0.21:40602 (size: 21.3 KB, free: 408.9 MB)
18/05/10 23:34:56 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
18/05/10 23:34:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (Receiver 0 ParallelCollectionRDD[3] at makeRDD at ReceiverTracker.scala:610)
18/05/10 23:34:56 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
18/05/10 23:34:56 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 70, 172.18.0.21, partition 0, PROCESS_LOCAL, 7307 bytes)
18/05/10 23:34:56 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 70 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:56 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 172.18.0.21:37754 (size: 21.3 KB, free: 408.9 MB)
18/05/10 23:34:56 INFO ReceiverTracker: Registered receiver for stream 0 from 172.18.0.21:56426
18/05/10 23:34:57 INFO JobScheduler: Added jobs for time 1526009697000 ms
18/05/10 23:34:57 INFO JobScheduler: Starting job streaming job 1526009697000 ms.0 from job set of time 1526009697000 ms
18/05/10 23:34:57 INFO SparkContext: Starting job: runJob at PythonRDD.scala:441
18/05/10 23:34:57 INFO DAGScheduler: Registering RDD 6 (call at /usr/lib/spark/spark-2.0.1-bin-hadoop2.6/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py:2230)
18/05/10 23:34:57 INFO DAGScheduler: Got job 2 (runJob at PythonRDD.scala:441) with 1 output partitions
18/05/10 23:34:57 INFO DAGScheduler: Final stage: ResultStage 4 (runJob at PythonRDD.scala:441)
18/05/10 23:34:57 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
18/05/10 23:34:57 INFO DAGScheduler: Missing parents: List()
18/05/10 23:34:57 INFO DAGScheduler: Submitting ResultStage 4 (PythonRDD[10] at RDD at PythonRDD.scala:48), which has no missing parents
18/05/10 23:34:57 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 7.1 KB, free 408.8 MB)
18/05/10 23:34:57 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 4.1 KB, free 408.8 MB)
18/05/10 23:34:57 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.18.0.21:40602 (size: 4.1 KB, free: 408.9 MB)
18/05/10 23:34:57 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1012
18/05/10 23:34:57 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (PythonRDD[10] at RDD at PythonRDD.scala:48)
18/05/10 23:34:57 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks
18/05/10 23:34:57 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 71, 172.18.0.21, partition 0, PROCESS_LOCAL, 5248 bytes)
18/05/10 23:34:57 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 71 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:57 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 172.18.0.21:37754 (size: 4.1 KB, free: 408.9 MB)
18/05/10 23:34:57 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 172.18.0.21:56426
18/05/10 23:34:57 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 83 bytes
18/05/10 23:34:57 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 71) in 590 ms on 172.18.0.21 (1/1)
18/05/10 23:34:57 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
18/05/10 23:34:57 INFO DAGScheduler: ResultStage 4 (runJob at PythonRDD.scala:441) finished in 0.594 s
18/05/10 23:34:57 INFO DAGScheduler: Job 2 finished: runJob at PythonRDD.scala:441, took 0.622861 s
18/05/10 23:34:57 INFO SparkContext: Starting job: runJob at PythonRDD.scala:441
18/05/10 23:34:57 INFO DAGScheduler: Got job 3 (runJob at PythonRDD.scala:441) with 1 output partitions
18/05/10 23:34:57 INFO DAGScheduler: Final stage: ResultStage 6 (runJob at PythonRDD.scala:441)
18/05/10 23:34:57 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 5)
18/05/10 23:34:57 INFO DAGScheduler: Missing parents: List()
18/05/10 23:34:57 INFO DAGScheduler: Submitting ResultStage 6 (PythonRDD[11] at RDD at PythonRDD.scala:48), which has no missing parents
18/05/10 23:34:57 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 7.1 KB, free 408.8 MB)
18/05/10 23:34:57 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 4.1 KB, free 408.8 MB)
18/05/10 23:34:57 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 172.18.0.21:40602 (size: 4.1 KB, free: 408.9 MB)
18/05/10 23:34:57 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1012
18/05/10 23:34:57 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 6 (PythonRDD[11] at RDD at PythonRDD.scala:48)
18/05/10 23:34:57 INFO TaskSchedulerImpl: Adding task set 6.0 with 1 tasks
18/05/10 23:34:57 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 72, 172.18.0.21, partition 1, PROCESS_LOCAL, 5248 bytes)
18/05/10 23:34:57 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 72 on executor id: 0 hostname: 172.18.0.21.
18/05/10 23:34:57 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 172.18.0.21:37754 (size: 4.1 KB, free: 408.9 MB)
18/05/10 23:34:57 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 72) in 140 ms on 172.18.0.21 (1/1)
18/05/10 23:34:57 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
18/05/10 23:34:57 INFO DAGScheduler: ResultStage 6 (runJob at PythonRDD.scala:441) finished in 0.143 s
18/05/10 23:34:57 INFO DAGScheduler: Job 3 finished: runJob at PythonRDD.scala:441, took 0.159626 s
-------------------------------------------
Time: 2018-05-10 23:34:57
-------------------------------------------

18/05/10 23:34:58 INFO JobScheduler: Finished job streaming job 1526009697000 ms.0 from job set of time 1526009697000 ms
18/05/10 23:34:58 INFO JobScheduler: Total delay: 1.032 s for time 1526009697000 ms (execution: 0.869 s)
18/05/10 23:34:58 INFO JobScheduler: Added jobs for time 1526009698000 ms
18/05/10 23:34:58 INFO JobScheduler: Starting job streaming job 1526009698000 ms.0 from job set of time 1526009698000 ms
18/05/10 23:34:58 INFO ReceivedBlockTracker: Deleting batches:
18/05/10 23:34:58 INFO InputInfoTracker: remove old batch metadata:
18/05/10 23:34:58 INFO SparkContext: Starting job: runJob at PythonRDD.scala:441
18/05/10 23:34:58 INFO DAGScheduler: Registering RDD 14 (call at /usr/lib/spark/spark-2.0.1-bin-hadoop2.6/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py:2230)
18/05/10 23:34:58 INFO DAGScheduler: Got job 4 (runJob at PythonRDD.scala:441) with 1 output partitions
18/05/10 23:34:58 INFO DAGScheduler: Final stage: ResultStage 8 (runJob at PythonRDD.scala:441)
18/05/10 23:34:58 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 7)
18/05/10 23:34:58 INFO DAGScheduler: Missing parents: List()
18/05/10 23:34:58 INFO DAGScheduler: Submitting ResultStage 8 (PythonRDD[18] at RDD at PythonRDD.scala:48), which has no missing parents
18/05/10 23:34:58 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 7.1 KB, free 408.8 MB)
18/05/10 23:34:58 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 4.1 KB, free 408.8 MB)
18/05/10 23:34:58 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 172.18.0.21:40602 (size: 4.1 KB, free: 408.9 MB)
18/05/10 23:34:58 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1012
18/05/10 23:34:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 8 (PythonRDD[18] at RDD at PythonRDD.scala:48)
18/05/10 23:34:58 INFO TaskSchedulerImpl: Adding task set 8.0 with 1 tasks
18/05/10 23:34:58 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 73, 172.18.0.22, partition 0, PROCESS_LOCAL, 5248 bytes)
18/05/10 23:34:58 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 73 on executor id: 1 hostname: 172.18.0.22.
18/05/10 23:34:58 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 172.18.0.22:40802 (size: 4.1 KB, free: 408.9 MB)
18/05/10 23:34:58 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 172.18.0.22:35050
18/05/10 23:34:58 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 2 is 83 bytes
18/05/10 23:34:58 INFO TaskSetManager: Finished task 0.0 in stage 8.0 (TID 73) in 674 ms on 172.18.0.22 (1/1)
18/05/10 23:34:58 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool
18/05/10 23:34:58 INFO DAGScheduler: ResultStage 8 (runJob at PythonRDD.scala:441) finished in 0.676 s
18/05/10 23:34:58 INFO DAGScheduler: Job 4 finished: runJob at PythonRDD.scala:441, took 0.698652 s
18/05/10 23:34:58 INFO SparkContext: Starting job: runJob at PythonRDD.scala:441
18/05/10 23:34:58 INFO DAGScheduler: Got job 5 (runJob at PythonRDD.scala:441) with 1 output partitions
18/05/10 23:34:58 INFO DAGScheduler: Final stage: ResultStage 10 (runJob at PythonRDD.scala:441)
18/05/10 23:34:58 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 9)
18/05/10 23:34:58 INFO DAGScheduler: Missing parents: List()
18/05/10 23:34:58 INFO DAGScheduler: Submitting ResultStage 10 (PythonRDD[19] at RDD at PythonRDD.scala:48), which has no missing parents

avatar
New Contributor

Hi Nick,

Could you solve the problem?
I have the same problem.

Thanks