
Using Spark and Kafka through Informatica Streaming

Explorer

Hi everyone,

I'm currently building my first Informatica mapping, which is designed to read XML documents from a Kafka topic and store them in an HDFS location.

Since I'm still new to both Informatica and Cloudera, I’d appreciate your guidance on a few issues I’m facing.

Setup:

  • Cloudera version: 7.2.18 (Public Cloud)

  • Authentication: I'm using my user keytab and a KDC/FreeIPA certificate. I've also created a jaas_client.conf file that allows Kafka access (a sketch of that file is shown after this list).

  • This setup works fine within the Informatica Developer tool when using the files on the Informatica server.
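
For reference, the jaas_client.conf follows what I understand to be the usual Kafka client pattern; the keytab path and principal below are placeholders rather than my real values:

  KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/path/to/user.keytab"
    principal="myuser@EXAMPLE.REALM"
    serviceName="kafka";
  };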

Issue 1:

I'm struggling to pass these authentication files (keytab, certificate, JAAS config) to the Spark execution context so that Spark can connect to Kafka and HDFS.
I manually copied the files to the /tmp directory of the master and worker nodes, but I’m unsure if this is the correct approach.

Question: Is manually copying these files to Spark nodes the recommended method, or should Informatica handle this automatically when submitting the job?
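
In case it helps frame the question: from what I've gathered, when a Spark job is submitted by hand these files are usually shipped with the job rather than pre-copied onto every node, roughly like the sketch below (all paths and the jar name are placeholders). Whether Informatica exposes equivalent settings when it generates the spark-submit is exactly what I'm unsure about.

  # --files places each listed file in the working directory of the driver and
  # every executor container, so the JAAS config (and the keytab path inside it)
  # can then be referenced by plain file name instead of an absolute /tmp path.
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --files /local/path/jaas_client.conf,/local/path/user.keytab,/local/path/truststore.jks \
    --conf spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas_client.conf \
    --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas_client.conf \
    my_informatica_mapping.jar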

Issue 2:

Occasionally, my job fails with the following error on certain nodes:

  Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via: [TOKEN, KERBEROS]

This seems to indicate an authentication failure, possibly related to the way credentials are being propagated or used.
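
From what I've read, this particular exception usually means the executor has neither an HDFS delegation token nor a usable Kerberos ticket when it talks to HDFS. For a hand-written spark-submit, the usual way to let Spark obtain and renew delegation tokens itself is to pass the principal and keytab at submit time; the values below are placeholders, and again I don't know yet how Informatica surfaces this:

  # Spark logs in with the keytab and uses it to obtain and renew
  # HDFS (and other service) delegation tokens for the job.
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --principal myuser@EXAMPLE.REALM \
    --keytab /local/path/user.keytab \
    my_informatica_mapping.jar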

Any tips, best practices, or clarifications would be greatly appreciated!
Thanks in advance for your support.


Community Manager

@LSIMS, welcome to our community! To help you get the best possible answer, I have tagged our Kafka expert @haridjh, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Explorer

As an update, this is not a Kafka-related issue.
The same situation happens with mappings that use Hive, HDFS, or other sources.

If anyone has run into a similar situation, please let me know.

Rising Star

@LSIMS You mentioned it is occasional; does that mean it is failing only on a few nodes? Can you check with the Informatica team on how to pass the Kerberos keytab credentials? I found this Informatica article on passing the keytab details for a Spark + Kafka setup:

https://docs.informatica.com/data-engineering/data-engineering-integration/10-2-2/big-data-managemen...