HDP 2.3.4 Spark Streaming Kafka Kerberos SASL issues


Hello,

I am currently developing a simple Spark Streaming job in Scala on a kerberized HDP-2.3.4.7 platform. I imported the Spark Kafka assembly provided by Hortonworks (spark-streaming-kafka-assembly_2.10-1.5.2.2.3.4.7-4), which includes a Kafka 0.8.2 client. The job is submitted with spark-submit in yarn-cluster mode. I implemented a Kafka producer and a Kafka consumer that use PLAINTEXTSASL (Kerberos over SASL) authentication.

I passed a "jaas.conf" file as an input to spark-submit to handle the Kerberos/SASL authentication:

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=false
  useKeyTab=true
  principal=myUserName@myRealm
  keyTab=pathToMyKeyTab
  renewTicket=true
  storeKey=true
  serviceName="kafka";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=false
  useKeyTab=true
  principal=myUserName@myRealm
  keyTab=pathToMyKeyTab
  renewTicket=true
  storeKey=true
  serviceName="zookeeper";
};
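For reference, here is a minimal Scala sketch that renders the two JAAS sections above from a principal and a keytab path and writes them to a file. The JVM locates a JAAS file through the `java.security.auth.login.config` system property; on YARN, one common approach (an assumption here, not something confirmed in this thread) is to ship the file with `--files` and point the driver and executors at it through `spark.driver.extraJavaOptions` / `spark.executor.extraJavaOptions`. The object and method names (`JaasSketch`, `section`, `jaasFile`, `write`, `javaOption`) are hypothetical, and the principal and keytab values are placeholders.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object JaasSketch {
  // Render one JAAS login section with the Krb5LoginModule options used in the post.
  // Principal and keytab are quoted here, which the JAAS file format accepts.
  def section(name: String, principal: String, keyTab: String, serviceName: String): String =
    s"""$name {
       |  com.sun.security.auth.module.Krb5LoginModule required
       |  useTicketCache=false
       |  useKeyTab=true
       |  principal="$principal"
       |  keyTab="$keyTab"
       |  renewTicket=true
       |  storeKey=true
       |  serviceName="$serviceName";
       |};""".stripMargin

  // Full file: KafkaClient is used to talk to the brokers, Client to talk to ZooKeeper.
  def jaasFile(principal: String, keyTab: String): String =
    Seq(
      section("KafkaClient", principal, keyTab, "kafka"),
      section("Client", principal, keyTab, "zookeeper")
    ).mkString("\n")

  // Write the file to a temporary location and return its path.
  def write(principal: String, keyTab: String): Path = {
    val path = Files.createTempFile("jaas", ".conf")
    Files.write(path, jaasFile(principal, keyTab).getBytes(StandardCharsets.UTF_8))
    path
  }

  // JVM option telling JAAS where the file is (e.g. the value of spark.executor.extraJavaOptions).
  def javaOption(jaasPath: String): String =
    s"-Djava.security.auth.login.config=$jaasPath"
}
```

Keeping both sections in one generated file means the broker (KafkaClient) and ZooKeeper (Client) logins always share the same principal and keytab, which is what the hand-written file above does.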

I also set the appropriate Kafka parameter (security.protocol -> PLAINTEXTSASL) on both my producer and my consumer to enable Kerberos/SASL authentication.

My executor stdout confirms that the producer is using PLAINTEXTSASL mode (the logs contain no information about the consumer configuration):

16/09/30 16:12:59 INFO producer.ProducerConfig: ProducerConfig values: 
    retry.backoff.ms = 500
    buffer.memory = 33554432
    ...
    security.protocol = PLAINTEXTSASL

However, right afterwards I also got this log:

16/09/30 16:13:00 WARN producer.ProducerConfig: The configuration security.protocol = PLAINTEXTSASL was supplied but isn't a known config.

I got the following exception in my executor stderr:

16/09/30 16:13:03 WARN consumer.ConsumerFetcherManager$LeaderFinderThread: [xxx-leader-finder-thread], Failed to find leader for Set([z_xxx,0], [z_xxx,1])
kafka.common.BrokerEndPointNotAvailableException: End point PLAINTEXT not found for broker 0
    at kafka.cluster.Broker.getBrokerEndPoint(Broker.scala:140)
    at kafka.utils.ZkUtils$$anonfun$getAllBrokerEndPointsForChannel$1.apply(ZkUtils.scala:124)
    at kafka.utils.ZkUtils$$anonfun$getAllBrokerEndPointsForChannel$1.apply(ZkUtils.scala:124)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at kafka.utils.ZkUtils$.getAllBrokerEndPointsForChannel(ZkUtils.scala:124)
    at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
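The exception says the old consumer looked up a PLAINTEXT endpoint for broker 0 and found none. As a simplified model (not Kafka's actual code), assume each broker registers listener strings of the form `PROTOCOL://host:port`; a lookup by protocol then fails exactly when the only registered listener is `PLAINTEXTSASL`, which is presumably the case on a kerberized broker. `EndpointLookup`, its methods and the broker host name are hypothetical.

```scala
object EndpointLookup {
  // Simplified broker endpoint: security protocol plus address.
  final case class EndPoint(protocol: String, host: String, port: Int)

  // Parse a "PROTOCOL://host:port" listener string (simplified format).
  def parse(s: String): EndPoint = {
    val parts = s.split("://", 2)
    require(parts.length == 2, s"not a PROTOCOL://host:port string: $s")
    val idx = parts(1).lastIndexOf(':')
    require(idx > 0, s"missing port in: $s")
    EndPoint(parts(0), parts(1).substring(0, idx), parts(1).substring(idx + 1).toInt)
  }

  // Look up the endpoint for the protocol the client asks for.
  // On failure, returns the same message as the exception in the log.
  def endpointFor(brokerId: Int, registered: Seq[String], protocol: String): Either[String, EndPoint] =
    registered.map(parse).find(_.protocol == protocol).toRight(
      s"End point $protocol not found for broker $brokerId")
}

// Hypothetical broker that only registered a Kerberos/SASL listener:
// EndpointLookup.endpointFor(0, Seq("PLAINTEXTSASL://broker0.example.com:6667"), "PLAINTEXT")      is a Left
// EndpointLookup.endpointFor(0, Seq("PLAINTEXTSASL://broker0.example.com:6667"), "PLAINTEXTSASL")  is a Right
```

In this model the fix is on the client side: the consumer has to ask for the protocol the broker actually registered, rather than the broker adding a PLAINTEXT listener.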

It seems that PLAINTEXTSASL mode is not being activated. No events show up in the Spark Streaming UI, as if nothing were being written to my Kafka topic.

I also tried setting "security.protocol" to "SASL_PLAINTEXT" (as mentioned here) and got a fatal exception:

16/09/30 17:09:06 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 2)
org.apache.kafka.common.KafkaException: Failed to construct kafka producer
...
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.kafka.common.protocol.SecurityProtocol.SASL_PLAINTEXT
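`No enum constant org.apache.kafka.common.protocol.SecurityProtocol.SASL_PLAINTEXT` means the `SecurityProtocol` enum in the client jar actually loaded on the executor has no such value. One way to see which values it does define is to list the enum constants by reflection. The helper below is a hypothetical diagnostic, not part of Kafka; it returns `None` if the class is missing from the classpath or is not an enum.

```scala
import scala.util.Try

object EnumProbe {
  // Load a class by fully qualified name (without initializing it) and,
  // if it is a Java enum, return the names of its constants.
  // Returns None when the class is not on the classpath or is not an enum.
  def constants(className: String): Option[Seq[String]] =
    Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      .filter(_.isEnum)
      .map(_.getEnumConstants.asInstanceOf[Array[AnyRef]].toSeq
        .map(_.asInstanceOf[java.lang.Enum[_]].name()))
}
```

Calling `EnumProbe.constants("org.apache.kafka.common.protocol.SecurityProtocol")` inside a task, so that it runs on an executor, and logging the result would show whether the Kafka client bundled in the assembly knows PLAINTEXTSASL, SASL_PLAINTEXT, or neither.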

I am quite stuck on this exception; any help would be appreciated.

Thanks in advance,

Jean-François


Re: HDP 2.3.4 Spark Streaming Kafka Kerberos SASL issues

@Jean-François Vandemoortele The correct value for "security.protocol" is "PLAINTEXTSASL"; please change it accordingly.

Since you are not seeing any errors in the producer log, could you please try writing to the Kafka topic with the console producer, as below:

/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list <kafka_brokers> --topic <topic> --security-protocol PLAINTEXTSASL

If the console producer succeeds, try reading the messages with the console consumer, as below:

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper <zk_quorum> --topic <topic> --security-protocol PLAINTEXTSASL --from-beginning

If both reading and writing succeed with the console tools, the issue is likely in the configuration of your custom producer and consumer. Could you please share the config details here?

Regards

Ayub Khan

Re: HDP 2.3.4 Spark Streaming Kafka Kerberos SASL issues


Hello @Ayub Pathan, and thanks for your answer.

I ran some tests. The two Kafka console commands you mentioned ran successfully. They printed the same WARN messages I encountered, but those are non-blocking.

I use the configuration below for the producer:

import java.util.Properties

val producerProps = new Properties()
producerProps.put("client.id", CLIENTIDPRODUCER)
producerProps.put("bootstrap.servers", kBrokerList)
producerProps.put("acks", "all")
producerProps.put("retries", "0")
producerProps.put("batch.size", "16384")
producerProps.put("linger.ms", "1")
producerProps.put("buffer.memory", "33554432")
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
producerProps.put("retry.backoff.ms", "500")
// Kerberos over SASL; the credentials come from the KafkaClient section of jaas.conf
producerProps.put("security.protocol", "PLAINTEXTSASL")
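The producer configuration is shown above; the consumer side needs the same protocol setting. A minimal sketch of the consumer parameter map, assuming the receiver-based `KafkaUtils.createStream` API from spark-streaming-kafka 1.x (which takes a `Map[String, String]` of old high-level consumer settings) and assuming the HDP-patched client honours `security.protocol` on the consumer just as the console consumer's `--security-protocol` option does. `ConsumerParams`, the group id and the ZooKeeper quorum are hypothetical placeholders.

```scala
object ConsumerParams {
  // Settings for a receiver-based stream built on the old (0.8.x) high-level consumer.
  // "zookeeper.connect" and "group.id" are standard consumer keys;
  // "security.protocol" -> "PLAINTEXTSASL" assumes the HDP-patched client accepts it here too.
  def kafkaParams(zkQuorum: String, groupId: String): Map[String, String] = Map(
    "zookeeper.connect" -> zkQuorum,
    "group.id" -> groupId,
    "security.protocol" -> "PLAINTEXTSASL"
  )

  // Hypothetical usage with spark-streaming-kafka 1.x (not compiled here):
  //   KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  //     ssc, ConsumerParams.kafkaParams(zkQuorum, "my-group"), Map("z_xxx" -> 1),
  //     StorageLevel.MEMORY_AND_DISK_SER_2)
}
```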

My main obstacle in investigating the problem is getting the Spark executor stdout logs. For a reason I cannot explain, the executor stdout log is empty (log length: 0) both during execution and after I kill my job. After stopping the job, I checked with "yarn application -list" and no application was still running.

Regards, Jean-François

Re: HDP 2.3.4 Spark Streaming Kafka Kerberos SASL issues


Regarding not being able to see any executor logs: it depends on which deploy mode you used to submit the job. Try the client deploy mode and you should see both the stdout and stderr streams.

Re: HDP 2.3.4 Spark Streaming Kafka Kerberos SASL issues

@Jean-François Vandemoortele

The PLAINTEXTSASL protocol is not available in older Kafka clients.

Try adding the Hortonworks repository and dependencies to your pom (as described here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_spark-guide/content/spark-streaming-kafka-kerb.html), then rebuild and redeploy the app.