<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Using Spark and Kafka through Informatica Streaming in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/409076#M252806</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm currently building my first Informatica mapping, which is designed to read XML documents from a Kafka topic and store them in an HDFS location.&lt;/P&gt;&lt;P&gt;Since I'm still new to both Informatica and Cloudera, I’d appreciate your guidance on a few issues I’m facing.&lt;/P&gt;&lt;H3&gt;Setup:&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Cloudera version:&lt;/STRONG&gt; 7.2.18 (Public Cloud)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Authentication:&lt;/STRONG&gt; I'm using my user keytab and a KDC/FreeIPA certificate. I’ve also created a jaas_client.conf file that allows Kafka access.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;This setup works fine within the Informatica Developer tool when using the files on the Informatica server.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Issue 1:&lt;/H3&gt;&lt;P&gt;I'm struggling to pass these authentication files (keytab, certificate, JAAS config) to the Spark execution context so that Spark can connect to Kafka and HDFS.&lt;BR /&gt;I manually copied the files to the /tmp directory of the master and worker nodes, but I’m unsure whether this is the correct approach.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt; Is manually copying these files to the Spark nodes the recommended method, or should Informatica handle this automatically when submitting the job?&lt;/P&gt;&lt;H3&gt;Issue 2:&lt;/H3&gt;&lt;P&gt;Occasionally, my job fails with the following error on certain nodes:&lt;/P&gt;&lt;PRE&gt;Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via: [TOKEN, KERBEROS]&lt;/PRE&gt;&lt;P&gt;This seems to indicate an authentication failure, possibly related to the way credentials are being propagated or used.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Any tips, best practices, or clarifications would be greatly appreciated!&lt;/STRONG&gt;&lt;BR /&gt;Thanks in advance for your support.&lt;/P&gt;</description>
    <pubDate>Sat, 31 May 2025 12:31:56 GMT</pubDate>
    <dc:creator>LSIMS</dc:creator>
    <dc:date>2025-05-31T12:31:56Z</dc:date>
    <item>
      <title>Using Spark and Kafka through Informatica Streaming</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/409076#M252806</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm currently building my first Informatica mapping, which is designed to read XML documents from a Kafka topic and store them in an HDFS location.&lt;/P&gt;&lt;P&gt;Since I'm still new to both Informatica and Cloudera, I’d appreciate your guidance on a few issues I’m facing.&lt;/P&gt;&lt;H3&gt;Setup:&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Cloudera version:&lt;/STRONG&gt; 7.2.18 (Public Cloud)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Authentication:&lt;/STRONG&gt; I'm using my user keytab and a KDC/FreeIPA certificate. I’ve also created a jaas_client.conf file that allows Kafka access.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;This setup works fine within the Informatica Developer tool when using the files on the Informatica server.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Issue 1:&lt;/H3&gt;&lt;P&gt;I'm struggling to pass these authentication files (keytab, certificate, JAAS config) to the Spark execution context so that Spark can connect to Kafka and HDFS.&lt;BR /&gt;I manually copied the files to the /tmp directory of the master and worker nodes, but I’m unsure whether this is the correct approach.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt; Is manually copying these files to the Spark nodes the recommended method, or should Informatica handle this automatically when submitting the job?&lt;/P&gt;&lt;H3&gt;Issue 2:&lt;/H3&gt;&lt;P&gt;Occasionally, my job fails with the following error on certain nodes:&lt;/P&gt;&lt;PRE&gt;Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via: [TOKEN, KERBEROS]&lt;/PRE&gt;&lt;P&gt;This seems to indicate an authentication failure, possibly related to the way credentials are being propagated or used.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Any tips, best practices, or clarifications would be greatly appreciated!&lt;/STRONG&gt;&lt;BR /&gt;Thanks in advance for your support.&lt;/P&gt;</description>
      <pubDate>Sat, 31 May 2025 12:31:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/409076#M252806</guid>
      <dc:creator>LSIMS</dc:creator>
      <dc:date>2025-05-31T12:31:56Z</dc:date>
    </item>
    <item>
      <title>Re: Using Spark and Kafka through Informatica Streaming</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410687#M252888</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/127202"&gt;@LSIMS&lt;/a&gt;,&amp;nbsp;Welcome to our community! To help you get the best possible answer, I have tagged in our Kafka experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/86141"&gt;@haridjh&lt;/a&gt;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jun 2025 10:06:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410687#M252888</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2025-06-20T10:06:00Z</dc:date>
    </item>
    <item>
      <title>Re: Using Spark and Kafka through Informatica Streaming</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410694#M252890</link>
      <description>&lt;P&gt;As an update, this is not a Kafka-related issue.&lt;BR /&gt;The same situation happens with mappings that use Hive, HDFS, or other sources.&lt;BR /&gt;&lt;BR /&gt;If anyone has run into a similar situation, please let me know.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jun 2025 10:26:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410694#M252890</guid>
      <dc:creator>LSIMS</dc:creator>
      <dc:date>2025-06-20T10:26:26Z</dc:date>
    </item>
    <item>
      <title>Re: Using Spark and Kafka through Informatica Streaming</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410922#M252939</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/127202"&gt;@LSIMS&lt;/a&gt;&amp;nbsp;You mentioned it is occasional; does that mean it is failing only on a few nodes? Can you check with the Informatica team on how to pass the Kerberos keytab credentials? I found this Informatica article on passing the keytab details for a Spark + Kafka setup:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://docs.informatica.com/data-engineering/data-engineering-integration/10-2-2/big-data-management-administrator-guide/connections/configuring-hadoop-connection-properties/spark-advanced-properties.html" target="_blank"&gt;https://docs.informatica.com/data-engineering/data-engineering-integration/10-2-2/big-data-management-administrator-guide/connections/configuring-hadoop-connection-properties/spark-advanced-properties.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jun 2025 15:54:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410922#M252939</guid>
      <dc:creator>haridjh</dc:creator>
      <dc:date>2025-06-26T15:54:32Z</dc:date>
    </item>
  </channel>
</rss>

