Member since: 06-09-2017
Posts: 9
Kudos Received: 7
Solutions: 0
10-07-2022
08:42 PM
Hi, I have been following your instructions. If I want to do the same thing with PySpark, will the code be similar to this?
07-09-2020
02:23 PM
@sajidiqubal Can you share more info? The solution is simple: the Spark Streaming job needs to find the kafka-jaas file and the corresponding keytab. Make sure both paths are accessible on all machines, so the kafka-jaas file and the keytab need to be in a local folder and not in HDFS. If you need them in HDFS, they have to be shipped with the job through the spark --files and --keytab arguments (iirc). In newer versions of Kafka, you can pass the JAAS info as a Kafka parameter using sasl.jaas.config; see the example below. In that case only the keytab needs to be available on all machines.
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/security/keytabs/kafka_client.keytab" \
principal="kafkaclient1@EXAMPLE.COM";
You will also need the parameter sasl.kerberos.service.name=kafka.
From your error it looks like the code is not able to find one of the files, either the jaas.conf or the keytab. Please check that each file is at the right path on all YARN nodes.
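For PySpark, the same settings can be collected into an options dictionary. This is only a rough sketch: the helper names `build_kafka_sasl_options` and `keytab_is_local` are made up for illustration, the `kafka.` prefix assumes the Structured Streaming Kafka source (which forwards `kafka.`-prefixed options to the Kafka client), and `SASL_PLAINTEXT` is an assumed security protocol that may need to be `SASL_SSL` on your cluster.

```python
import os


def build_kafka_sasl_options(keytab_path, principal, service_name="kafka"):
    """Return Kafka client options carrying the JAAS config inline.

    Hypothetical helper: the keys use the "kafka." prefix expected by the
    Structured Streaming Kafka source.
    """
    jaas = (
        "com.sun.security.auth.module.Krb5LoginModule required "
        "useKeyTab=true storeKey=true "
        f'keyTab="{keytab_path}" principal="{principal}";'
    )
    return {
        # Assumption: plain SASL without TLS; adjust to your cluster.
        "kafka.security.protocol": "SASL_PLAINTEXT",
        "kafka.sasl.kerberos.service.name": service_name,
        "kafka.sasl.jaas.config": jaas,
    }


def keytab_is_local(keytab_path):
    """Check that the keytab exists on the local filesystem of this node."""
    return os.path.isfile(keytab_path)


if __name__ == "__main__":
    keytab = "/etc/security/keytabs/kafka_client.keytab"
    options = build_kafka_sasl_options(keytab, "kafkaclient1@EXAMPLE.COM")
    for key, value in options.items():
        print(f"{key}={value}")
    print("keytab present on this node:", keytab_is_local(keytab))
```

Running the script on each YARN node is a quick way to confirm the keytab really is at the same local path everywhere.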
01-07-2019
07:18 PM
@lweichberger, I also suspect a version mismatch. https://spark.apache.org/docs/2.3.0/streaming-kafka-0-8-integration.html Following the example given in the above link, when I create the streaming context with ssc = StreamingContext(sc, 5), I get the JavaAbstractMethodError from py4j. Using spark-sql-kafka, I can get the structured stream with the new API, but I would like to fix the issue with the existing version.
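For reference, the structured stream read via spark-sql-kafka looks roughly like the sketch below. The function names, broker address, and topic are placeholders and not taken from my actual job; the `spark` argument is assumed to be an existing SparkSession.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Options for the Structured Streaming Kafka source (illustrative values)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame backed by a Kafka topic.

    `spark` is an existing SparkSession; the Kafka source package must be
    on the classpath for format("kafka") to resolve.
    """
    return (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )


if __name__ == "__main__":
    # Placeholder broker and topic, printed only to show the option set.
    print(kafka_source_options("broker1:9092", "my_topic"))
```

Assuming Spark 2.3.0 built against Scala 2.11, the job would be submitted with `--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0`; the artifact version should match the Spark version on the cluster.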