Member since 06-16-2017
14 Posts
0 Kudos Received
0 Solutions
06-26-2018
05:33 PM
Hi @dbains, I checked the Kafka broker port on my HDP cluster and it was 6667. I had confused the bootstrap server with the ZooKeeper address and was passing zookeeper_ip:2181. It worked after passing ip-address:6667.
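For anyone hitting the same confusion: the Structured Streaming Kafka source takes Kafka broker addresses in its kafka.bootstrap.servers option, not the ZooKeeper quorum. Below is a minimal sketch of building those options with a sanity check on the port. The helper names are hypothetical and not part of Spark, and the check assumes ZooKeeper listens on its conventional port 2181, so treat it as a heuristic only.

```python
# Conventional ZooKeeper client port (assumption: the cluster uses the default).
ZOOKEEPER_DEFAULT_PORT = 2181


def split_host_port(address):
    """Split 'host:port' into (host, int(port))."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % address)
    return host, int(port)


def kafka_source_options(bootstrap_servers, subscribe_type, topics):
    """Build the option dict for Spark's Kafka source (hypothetical helper).

    bootstrap_servers: comma-separated broker addresses, e.g. 'ip-address:6667'
    subscribe_type: 'assign', 'subscribe' or 'subscribePattern'
    topics: topic list or pattern matching the subscribe type
    """
    for address in bootstrap_servers.split(","):
        _, port = split_host_port(address.strip())
        if port == ZOOKEEPER_DEFAULT_PORT:
            # Heuristic only: a broker could be configured on 2181, but that is unusual.
            raise ValueError(
                "%s looks like a ZooKeeper address; pass a Kafka broker instead" % address
            )
    return {"kafka.bootstrap.servers": bootstrap_servers, subscribe_type: topics}


if __name__ == "__main__":
    opts = kafka_source_options("ip-address:6667", "subscribe", "fifa2")
    print(opts)
    # With a SparkSession named `spark`, the options would be used as:
    #   lines = spark.readStream.format("kafka").options(**opts).load()
```

Passing zookeeper_ip:2181 to this helper raises a ValueError up front instead of producing repeated "Bootstrap broker ... disconnected" warnings at runtime.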
06-22-2018
07:23 AM
While running the "structured_kafka_wordcount.py" example from "https://github.com/apache/spark/blob/v2.3.1/examples/src/main/python/sql/streaming/structured_kafka_wordcount.py" I got the following error: "WARN NetworkClient: Bootstrap broker ip-10-28-3-35.ec2.internal:2181 disconnected"
I was able to read the contents of a topic from Kafka using the example at https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py
I submitted the job with the command "bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 examples/src/main/python/sql/streaming/structured_kafka_wordcount.py ip-10-28-3-35.ec2.internal:2181 subscribe fifa2"
<code>[root@centos spark2]# bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 examples/src/main/python/sql/streaming/structured_kafka_wordcount.py ip-10-28-3-35.ec2.internal:2181 subscribe fifa2
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-sql-kafka-0-10_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found org.apache.spark#spark-sql-kafka-0-10_2.11;2.3.1 in central
found org.apache.kafka#kafka-clients;0.10.0.1 in central
found net.jpountz.lz4#lz4;1.3.0 in central
found org.xerial.snappy#snappy-java;1.1.2.6 in central
found org.slf4j#slf4j-api;1.7.16 in central
found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 642ms :: artifacts dl 15ms
:: modules in use:
net.jpountz.lz4#lz4;1.3.0 from central in [default]
org.apache.kafka#kafka-clients;0.10.0.1 from central in [default]
org.apache.spark#spark-sql-kafka-0-10_2.11;2.3.1 from central in [default]
org.slf4j#slf4j-api;1.7.16 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
org.xerial.snappy#snappy-java;1.1.2.6 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 6 | 0 | 0 | 0 || 6 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 6 already retrieved (0kB/17ms)
18/06/22 06:57:26 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/06/22 06:57:26 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
18/06/22 06:57:35 WARN NetworkClient: Bootstrap broker ip-10-28-3-35.ec2.internal:2181 disconnected
18/06/22 06:57:35 WARN NetworkClient: Bootstrap broker ip-10-28-3-35.ec2.internal:2181 disconnected
18/06/22 06:57:35 WARN NetworkClient: Bootstrap broker ip-10-28-3-35.ec2.internal:2181 disconnected
18/06/22 06:57:35 WARN NetworkClient: Bootstrap broker ip-10-28-3-35.ec2.internal:2181 disconnected
^Z
[11]+ Stopped bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 examples/src/main/python/sql/streaming/structured_kafka_wordcount.py ip-10-28-3-35.ec2.internal:2181 subscribe fifa2
06-22-2018
07:20 AM
@Felix Albani I tried connecting to the HDF cluster with telnet and could not connect. The error was caused by the security group: ports 6667 and 2181 were not open for communication with the other cluster.
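The same reachability check can be scripted instead of using telnet. This is a minimal sketch using only the Python standard library; the host name is a placeholder, and the function name port_is_open is my own choice for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace with a Kafka or ZooKeeper node on the other cluster.
    host = "hdf-node.example.internal"
    for port in (6667, 2181):
        state = "open" if port_is_open(host, port) else "unreachable"
        print("%s:%d %s" % (host, port, state))
```

Run it from the Spark (HDP) node. If both ports show as unreachable while the services are up, a security group or firewall rule between the clusters is a likely cause.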
06-20-2018
08:30 AM
I'm trying to run the example program shipped in the Spark directory on the HDP cluster, "/spark2/examples/src/main/python/streaming/kafka_wordcount.py", which reads a Kafka topic, but it fails with a ZooKeeper server timeout error.
Spark is installed on the HDP cluster and Kafka is running on the HDF cluster. They are separate clusters in the same VPC on AWS.
The command used to run the Spark example on the HDP cluster is "bin/spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.3.0.jar examples/src/main/python/streaming/kafka_wordcount.py HDF-cluster-ip-address:2181 topic"
<code>-------------------------------------------
Time: 2018-06-20 07:51:56
-------------------------------------------
18/06/20 07:51:56 INFO JobScheduler: Finished job streaming job 1529481116000 ms.0 from job set of time 1529481116000 ms
18/06/20 07:51:56 INFO JobScheduler: Total delay: 0.171 s for time 1529481116000 ms (execution: 0.145 s)
18/06/20 07:51:56 INFO PythonRDD: Removing RDD 94 from persistence list
18/06/20 07:51:56 INFO BlockManager: Removing RDD 94
18/06/20 07:51:56 INFO BlockRDD: Removing RDD 89 from persistence list
18/06/20 07:51:56 INFO BlockManager: Removing RDD 89
18/06/20 07:51:56 INFO KafkaInputDStream: Removing blocks of RDD BlockRDD[89] at createStream at NativeMethodAccessorImpl.java:0 of time 1529481116000 ms
18/06/20 07:51:56 INFO ReceivedBlockTracker: Deleting batches: 1529481114000 ms
18/06/20 07:51:56 INFO InputInfoTracker: remove old batch metadata: 1529481114000 ms
18/06/20 07:51:57 INFO JobScheduler: Added jobs for time 1529481117000 ms
18/06/20 07:51:57 INFO JobScheduler: Starting job streaming job 1529481117000 ms.0 from job set of time 1529481117000 ms
18/06/20 07:51:57 INFO SparkContext: Starting job: runJob at PythonRDD.scala:141
18/06/20 07:51:57 INFO DAGScheduler: Registering RDD 107 (call at /usr/hdp/2.6.5.0-292/spark2/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py:2257)
18/06/20 07:51:57 INFO DAGScheduler: Got job 27 (runJob at PythonRDD.scala:141) with 1 output partitions
18/06/20 07:51:57 INFO DAGScheduler: Final stage: ResultStage 54 (runJob at PythonRDD.scala:141)
18/06/20 07:51:57 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 53)
18/06/20 07:51:57 INFO DAGScheduler: Missing parents: List()
18/06/20 07:51:57 INFO DAGScheduler: Submitting ResultStage 54 (PythonRDD[111] at RDD at PythonRDD.scala:48), which has no missing parents
18/06/20 07:51:57 INFO MemoryStore: Block broadcast_27 stored as values in memory (estimated size 7.0 KB, free 366.0 MB)
18/06/20 07:51:57 INFO MemoryStore: Block broadcast_27_piece0 stored as bytes in memory (estimated size 4.1 KB, free 366.0 MB)
18/06/20 07:51:57 INFO BlockManagerInfo: Added broadcast_27_piece0 in memory on ip-10-29-3-74.ec2.internal:46231 (size: 4.1 KB, free: 366.2 MB)
18/06/20 07:51:57 INFO SparkContext: Created broadcast 27 from broadcast at DAGScheduler.scala:1039
18/06/20 07:51:57 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 54 (PythonRDD[111] at RDD at PythonRDD.scala:48) (first 15 tasks are for partitions Vector(0))
18/06/20 07:51:57 INFO TaskSchedulerImpl: Adding task set 54.0 with 1 tasks
18/06/20 07:51:57 INFO TaskSetManager: Starting task 0.0 in stage 54.0 (TID 53, localhost, executor driver, partition 0, PROCESS_LOCAL, 7649 bytes)
18/06/20 07:51:57 INFO Executor: Running task 0.0 in stage 54.0 (TID 53)
18/06/20 07:51:57 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0 blocks
18/06/20 07:51:57 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/06/20 07:51:57 INFO PythonRunner: Times: total = 40, boot = -881, init = 921, finish = 0
18/06/20 07:51:57 INFO PythonRunner: Times: total = 41, boot = -881, init = 922, finish = 0
18/06/20 07:51:57 INFO Executor: Finished task 0.0 in stage 54.0 (TID 53). 1493 bytes result sent to driver
18/06/20 07:51:57 INFO TaskSetManager: Finished task 0.0 in stage 54.0 (TID 53) in 48 ms on localhost (executor driver) (1/1)
18/06/20 07:51:57 INFO TaskSchedulerImpl: Removed TaskSet 54.0, whose tasks have all completed, from pool
18/06/20 07:51:57 INFO DAGScheduler: ResultStage 54 (runJob at PythonRDD.scala:141) finished in 0.055 s
18/06/20 07:51:57 INFO DAGScheduler: Job 27 finished: runJob at PythonRDD.scala:141, took 0.058062 s
18/06/20 07:51:57 INFO ZooKeeper: Session: 0x0 closed
18/06/20 07:51:57 INFO SparkContext: Starting job: runJob at PythonRDD.scala:141
18/06/20 07:51:57 INFO DAGScheduler: Got job 28 (runJob at PythonRDD.scala:141) with 3 output partitions
18/06/20 07:51:57 INFO DAGScheduler: Final stage: ResultStage 56 (runJob at PythonRDD.scala:141)
18/06/20 07:51:57 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 55)
18/06/20 07:51:57 INFO DAGScheduler: Missing parents: List()
18/06/20 07:51:57 INFO DAGScheduler: Submitting ResultStage 56 (PythonRDD[112] at RDD at PythonRDD.scala:48), which has no missing parents
18/06/20 07:51:57 INFO ReceiverSupervisorImpl: Stopping receiver with message: Error starting receiver 0: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 10000
18/06/20 07:51:57 INFO ReceiverSupervisorImpl: Called receiver onStop
18/06/20 07:51:57 INFO ReceiverSupervisorImpl: Deregistering receiver 0
18/06/20 07:51:57 INFO MemoryStore: Block broadcast_28 stored as values in memory (estimated size 7.0 KB, free 365.9 MB)
18/06/20 07:51:57 INFO MemoryStore: Block broadcast_28_piece0 stored as bytes in memory (estimated size 4.1 KB, free 365.9 MB)
18/06/20 07:51:57 INFO ClientCnxn: EventThread shut down
18/06/20 07:51:57 INFO BlockManagerInfo: Added broadcast_28_piece0 in memory on ip-10-29-3-74.ec2.internal:46231 (size: 4.1 KB, free: 366.2 MB)
18/06/20 07:51:57 INFO SparkContext: Created broadcast 28 from broadcast at DAGScheduler.scala:1039
18/06/20 07:51:57 INFO DAGScheduler: Submitting 3 missing tasks from ResultStage 56 (PythonRDD[112] at RDD at PythonRDD.scala:48) (first 15 tasks are for partitions Vector(1, 2, 3))
18/06/20 07:51:57 INFO TaskSchedulerImpl: Adding task set 56.0 with 3 tasks
18/06/20 07:51:57 INFO TaskSetManager: Starting task 0.0 in stage 56.0 (TID 54, localhost, executor driver, partition 1, PROCESS_LOCAL, 7649 bytes)
18/06/20 07:51:57 INFO TaskSetManager: Starting task 1.0 in stage 56.0 (TID 55, localhost, executor driver, partition 2, PROCESS_LOCAL, 7649 bytes)
18/06/20 07:51:57 INFO TaskSetManager: Starting task 2.0 in stage 56.0 (TID 56, localhost, executor driver, partition 3, PROCESS_LOCAL, 7649 bytes)
18/06/20 07:51:57 INFO Executor: Running task 1.0 in stage 56.0 (TID 55)
18/06/20 07:51:57 INFO Executor: Running task 2.0 in stage 56.0 (TID 56)
18/06/20 07:51:57 INFO Executor: Running task 0.0 in stage 56.0 (TID 54)
18/06/20 07:51:57 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0 blocks
18/06/20 07:51:57 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0 blocks
18/06/20 07:51:57 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/06/20 07:51:57 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 0 blocks
18/06/20 07:51:57 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/06/20 07:51:57 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
18/06/20 07:51:57 ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 10000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:171)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:126)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:600)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:590)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/06/20 07:51:57 INFO ReceiverSupervisorImpl: Stopped receiver 0
18/06/20 07:51:57 INFO BlockGenerator: Stopping BlockGenerator
18/06/20 07:51:57 INFO PythonRunner: Times: total = 40, boot = -947, init = 987, finish = 0
18/06/20 07:51:57 INFO PythonRunner: Times: total = 40, boot = -947, init = 987, finish = 0
18/06/20 07:51:57 INFO PythonRunner: Times: total = 41, boot = -944, init = 985, finish = 0
18/06/20 07:51:57 INFO Executor: Finished task 1.0 in stage 56.0 (TID 55). 1536 bytes result sent to driver
18/06/20 07:51:57 INFO TaskSetManager: Finished task 1.0 in stage 56.0 (TID 55) in 52 ms on localhost (executor driver) (1/3)
18/06/20 07:51:57 INFO PythonRunner: Times: total = 45, boot = -944, init = 989, finish = 0
18/06/20 07:51:57 INFO PythonRunner: Times: total = 40, boot = -32, init = 72, finish = 0
18/06/20 07:51:57 INFO Executor: Finished task 0.0 in stage 56.0 (TID 54). 1536 bytes result sent to driver
18/06/20 07:51:57 INFO TaskSetManager: Finished task 0.0 in stage 56.0 (TID 54) in 56 ms on localhost (executor driver) (2/3)
18/06/20 07:51:57 INFO PythonRunner: Times: total = 40, boot = -33, init = 73, finish = 0
18/06/20 07:51:57 INFO Executor: Finished task 2.0 in stage 56.0 (TID 56). 1536 bytes result sent to driver
18/06/20 07:51:57 INFO TaskSetManager: Finished task 2.0 in stage 56.0 (TID 56) in 58 ms on localhost (executor driver) (3/3)
18/06/20 07:51:57 INFO TaskSchedulerImpl: Removed TaskSet 56.0, whose tasks have all completed, from pool
18/06/20 07:51:57 INFO DAGScheduler: ResultStage 56 (runJob at PythonRDD.scala:141) finished in 0.063 s
18/06/20 07:51:57 INFO DAGScheduler: Job 28 finished: runJob at PythonRDD.scala:141, took 0.065728 s
-------------------------------------------
Time: 2018-06-20 07:51:57
-------------------------------------------
18/06/20 07:51:57 INFO JobScheduler: Finished job streaming job 1529481117000 ms.0 from job set of time 1529481117000 ms
18/06/20 07:51:57 INFO JobScheduler: Total delay: 0.169 s for time 1529481117000 ms (execution: 0.149 s)
18/06/20 07:51:57 INFO PythonRDD: Removing RDD 102 from persistence list
18/06/20 07:51:57 INFO BlockManager: Removing RDD 102
18/06/20 07:51:57 INFO BlockRDD: Removing RDD 97 from persistence list
18/06/20 07:51:57 INFO KafkaInputDStream: Removing blocks of RDD BlockRDD[97] at createStream at NativeMethodAccessorImpl.java:0 of time 1529481117000 ms
18/06/20 07:51:57 INFO BlockManager: Removing RDD 97
18/06/20 07:51:57 INFO ReceivedBlockTracker: Deleting batches: 1529481115000 ms
18/06/20 07:51:57 INFO InputInfoTracker: remove old batch metadata: 1529481115000 ms
18/06/20 07:51:57 INFO RecurringTimer: Stopped timer for BlockGenerator after time 1529481117400
18/06/20 07:51:57 INFO BlockGenerator: Waiting for block pushing thread to terminate
18/06/20 07:51:57 INFO BlockGenerator: Pushing out the last 0 blocks
18/06/20 07:51:57 INFO BlockGenerator: Stopped block pushing thread
18/06/20 07:51:57 INFO BlockGenerator: Stopped BlockGenerator
18/06/20 07:51:57 INFO ReceiverSupervisorImpl: Waiting for receiver to be stopped
18/06/20 07:51:57 ERROR ReceiverSupervisorImpl: Stopped receiver with error: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 10000
18/06/20 07:51:57 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 10000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:171)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:126)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:600)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:590)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/06/20 07:51:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 10000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:171)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:126)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:600)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:590)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/06/20 07:51:57 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
18/06/20 07:51:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/06/20 07:51:57 INFO TaskSchedulerImpl: Cancelling stage 0
18/06/20 07:51:57 INFO DAGScheduler: ResultStage 0 (start at NativeMethodAccessorImpl.java:0) failed in 13.256 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 10000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:171)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:126)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:600)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:590)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
18/06/20 07:51:57 ERROR ReceiverTracker: Receiver has been stopped. Try to restart it.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 10000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:171)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:126)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:600)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:590)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$failJobAndIndependentStages(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGScheduler$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
at org.apache.spark.scheduler.DAGScheduler$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
at org.apache.spark.scheduler.DAGScheduler$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
at org.apache.spark.util.EventLoop$anon$1.run(EventLoop.scala:48)
Caused by: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 10000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
at kafka.consumer.ZookeeperConsumerConnector.connectZk(ZookeeperConsumerConnector.scala:171)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:126)
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:143)
at kafka.consumer.Consumer$.create(ConsumerConnector.scala:94)
at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:600)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$anonfun$9.apply(ReceiverTracker.scala:590)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.SparkContext$anonfun$34.apply(SparkContext.scala:2185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
12-21-2017
07:13 AM
Hi Robert Levas, thanks for the help. I was able to download the kerberos.csv file in the previous version, but now I'm facing an issue. On HDP 2.4 I obtained the file with the request "GET /api/v1/clusters/:cluster_name/kerberos_identities?fields=*&format=CSV", as you described, but I'm not able to download it with HDP 2.5.3 and Ambari server 2.2. Has the path of kerberos_identities changed? I have attached a screenshot showing that an empty file is downloaded, with no principals or keytabs.
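For reference, the request above can be assembled like this. It is only a sketch of building the URL from the path quoted in this post; the base URL (host and port 8080) and the cluster name are placeholders, and the function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def kerberos_identities_url(base_url, cluster_name):
    """Build the kerberos_identities CSV URL from the path quoted in the post."""
    path = "/api/v1/clusters/%s/kerberos_identities" % quote(cluster_name, safe="")
    # Keep '*' unescaped so the query matches the form used in the post.
    query = urlencode({"fields": "*", "format": "CSV"}, safe="*")
    return base_url.rstrip("/") + path + "?" + query


if __name__ == "__main__":
    # Placeholder Ambari address and cluster name (assumptions, not from the post).
    print(kerberos_identities_url("http://ambari-host:8080", "mycluster"))
```

The resulting URL still has to be fetched with valid Ambari credentials. The sketch does not explain why the file comes back empty; it only makes the exact request easy to reproduce.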
09-28-2017
08:24 AM
Hi @Jeff Arnold, I tried to start the failed NameNode on the standby NameNode with the above steps. I got an error when running the command "sudo -u hdfs hdfs namenode -bootstrapStandby -force":
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x000000008c800000, 1937768448, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1937768448 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /var/log/hadoop/hdfs/hs_err_pid5144.log
Before executing the steps that you provided, I was getting this error while restarting the NameNode on the standby NameNode via Ambari:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 408, in <module>
NameNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 103, in start
upgrade_suspended=params.upgrade_suspended, env=env)
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 118, in namenode
raise Fail("Could not bootstrap standby namenode")
resource_management.core.exceptions.Fail: Could not bootstrap standby namenode