
Spark Kafka streaming code on master yarn hangs on a stage forever with no logs or errors


I am working on Spark-Kafka streaming code that works fine with --master local but not with --master yarn. These components run on HDF 3.1 on Azure cloud with Kerberos authentication.

I am running this from spark-shell as below.

spark-shell --num-executors 4 --executor-cores 8 --queue queue_name --master yarn --executor-memory 2G --driver-memory 1G \
  --conf "spark.streaming.receiver.writeAheadLog.enable=true" \
  --conf "spark.driver.cores=2" \
  --files /home/user/keytab/user-copy-for-kafka.keytab#user-copy-for-kafka.keytab,/home/user/kafka_jaas.conf#kafka_jaas.conf \
  --jars /home/user/jars/kafka-clients-1.0.0.jar,/home/user/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar,/home/user/jars/spark-streaming-kafka-0-10_2.11-2.3.0.jar \
  --keytab /home/user/keytab/user.keytab \
  --principal user@ORG.COM \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_jaas.conf -Djava.io.tmpdir=/tmp" \
  --driver-java-options "-Djava.security.auth.login.config=./kafka_jaas.conf -Djava.io.tmpdir=/tmp -Dlog4j.configuration=file:///home/user/config/log4j.properties -Dlog_name=/home/user/logs/spark_shell_4.log"
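
For context, the kafka_jaas.conf shipped with --files follows the usual Kerberos client layout for the Kafka client. A minimal sketch (the keytab path matches the --files alias above; the principal and serviceName are placeholders, not necessarily the exact production values):

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="./user-copy-for-kafka.keytab"
  principal="user@ORG.COM"
  serviceName="kafka";
};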


import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}
import org.apache.spark.sql.{DataFrame, SparkSession}

//Read data from Kafka:

val kafkaDatademostr = spark.readStream.
  format("kafka").
  option("kafka.bootstrap.servers", "server-name:8002").
  option("subscribe", "topic-name").
  option("kafka.security.protocol", "SASL_PLAINTEXT").
  load
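
The console sink would print the raw binary key/value columns, so they can be cast to strings first with the standard DataFrame API. An optional variant (a sketch only; the column list is assumed from the usual Kafka source schema):

val kafkaStrings = kafkaDatademostr.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "topic", "partition", "offset")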

//Writing to console:

val runstream = kafkaDatademostr.writeStream.
  format("console").
  option("checkpointLocation", "/tmp/user/tmpdemostr8").
  start()
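
While the query sits in this state, it can be inspected from the same spark-shell session with the standard StreamingQuery API (debugging calls shown as a sketch, not part of the original job):

runstream.status         // whether a trigger is active or the query is waiting for data
runstream.lastProgress   // progress of the last completed micro-batch; null if none has finished
runstream.exception      // Some(exception) if the query has failed, None while it is still running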

[Stage 0:> (0 + 3) / 3]2019-06-25 13:20:35 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on
[Stage 0:===================> (1 + 2) / 3]2019-06-25 13:20:51 INFO TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 2) in 16700 ms on
[Stage 0:=======================================> (2 + 1) / 3]

The write just hangs here.

Everything works as expected with master local, but not on YARN. We suspect this could be an issue with the checkpoint parameter, but I couldn't find any useful solution or earlier post that discusses this issue. Any help is highly appreciated.