<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Using Spark and Kafka through Informatica Streaming in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/409076#M252806</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm currently building my first Informatica mapping, which is designed to read XML documents from a Kafka topic and store them in an HDFS location.&lt;/P&gt;&lt;P&gt;Since I'm still new to both Informatica and Cloudera, I’d appreciate your guidance on a few issues I’m facing.&lt;/P&gt;&lt;H3&gt;Setup:&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Cloudera version:&lt;/STRONG&gt; 7.2.18 (Public Cloud)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Authentication:&lt;/STRONG&gt; I'm using my user keytab and a KDC/FreeIPA certificate. I’ve also created a jaas_client.conf file that allows Kafka access.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;This setup works fine within the Informatica Developer tool when using the files on the Informatica server.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Issue 1:&lt;/H3&gt;&lt;P&gt;I'm struggling to pass these authentication files (keytab, certificate, JAAS config) to the Spark execution context so that Spark can connect to Kafka and HDFS.&lt;BR /&gt;I manually copied the files to the /tmp directory of the master and worker nodes, but I’m unsure whether this is the correct approach.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt; Is manually copying these files to the Spark nodes the recommended method, or should Informatica handle this automatically when submitting the job?&lt;/P&gt;&lt;H3&gt;Issue 2:&lt;/H3&gt;&lt;P&gt;Occasionally, my job fails with the following error on certain nodes:&lt;/P&gt;&lt;PRE&gt;Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via: [TOKEN, KERBEROS]&lt;/PRE&gt;&lt;P&gt;This seems to indicate an authentication failure, possibly related to the way credentials are being propagated or used.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Any tips, best practices, or clarifications would be greatly appreciated!&lt;/STRONG&gt;&lt;BR /&gt;Thanks in advance for your support.&lt;/P&gt;</description>
    <pubDate>Sat, 31 May 2025 12:31:56 GMT</pubDate>
    <dc:creator>LSIMS</dc:creator>
    <dc:date>2025-05-31T12:31:56Z</dc:date>
    <item>
      <title>Using Spark and Kafka through Informatica Streaming</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/409076#M252806</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I'm currently building my first Informatica mapping, which is designed to read XML documents from a Kafka topic and store them in an HDFS location.&lt;/P&gt;&lt;P&gt;Since I'm still new to both Informatica and Cloudera, I’d appreciate your guidance on a few issues I’m facing.&lt;/P&gt;&lt;H3&gt;Setup:&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Cloudera version:&lt;/STRONG&gt; 7.2.18 (Public Cloud)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Authentication:&lt;/STRONG&gt; I'm using my user keytab and a KDC/FreeIPA certificate. I’ve also created a jaas_client.conf file that allows Kafka access.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;This setup works fine within the Informatica Developer tool when using the files on the Informatica server.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Issue 1:&lt;/H3&gt;&lt;P&gt;I'm struggling to pass these authentication files (keytab, certificate, JAAS config) to the Spark execution context so that Spark can connect to Kafka and HDFS.&lt;BR /&gt;I manually copied the files to the /tmp directory of the master and worker nodes, but I’m unsure whether this is the correct approach.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt; Is manually copying these files to the Spark nodes the recommended method, or should Informatica handle this automatically when submitting the job?&lt;/P&gt;&lt;H3&gt;Issue 2:&lt;/H3&gt;&lt;P&gt;Occasionally, my job fails with the following error on certain nodes:&lt;/P&gt;&lt;PRE&gt;Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via: [TOKEN, KERBEROS]&lt;/PRE&gt;&lt;P&gt;This seems to indicate an authentication failure, possibly related to the way credentials are being propagated or used.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Any tips, best practices, or clarifications would be greatly appreciated!&lt;/STRONG&gt;&lt;BR /&gt;Thanks in advance for your support.&lt;/P&gt;</description>
      <pubDate>Sat, 31 May 2025 12:31:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/409076#M252806</guid>
      <dc:creator>LSIMS</dc:creator>
      <dc:date>2025-05-31T12:31:56Z</dc:date>
    </item>
    <item>
      <title>Re: Using Spark and Kafka through Informatica Streaming</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410687#M252888</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/127202"&gt;@LSIMS&lt;/a&gt;,&amp;nbsp;Welcome to our community! To help you get the best possible answer, I have tagged in our Kafka experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/86141"&gt;@haridjh&lt;/a&gt;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jun 2025 10:06:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410687#M252888</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2025-06-20T10:06:00Z</dc:date>
    </item>
    <item>
      <title>Re: Using Spark and Kafka through Informatica Streaming</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410694#M252890</link>
      <description>&lt;P&gt;As an update, this is not a Kafka-related issue.&lt;BR /&gt;The same situation happens with mappings that use Hive, HDFS, or other sources.&lt;BR /&gt;&lt;BR /&gt;If anyone has run into a similar situation, please let me know.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jun 2025 10:26:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410694#M252890</guid>
      <dc:creator>LSIMS</dc:creator>
      <dc:date>2025-06-20T10:26:26Z</dc:date>
    </item>
    <item>
      <title>Re: Using Spark and Kafka through Informatica Streaming</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410922#M252939</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/127202"&gt;@LSIMS&lt;/a&gt;&amp;nbsp;You mentioned it is occasional; does that mean it is failing only on a few nodes? Can you check with the Informatica team on how to pass the Kerberos keytab credentials? I found this Informatica article on passing the keytab details for a Spark + Kafka setup:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://docs.informatica.com/data-engineering/data-engineering-integration/10-2-2/big-data-management-administrator-guide/connections/configuring-hadoop-connection-properties/spark-advanced-properties.html" target="_blank"&gt;https://docs.informatica.com/data-engineering/data-engineering-integration/10-2-2/big-data-management-administrator-guide/connections/configuring-hadoop-connection-properties/spark-advanced-properties.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jun 2025 15:54:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-Spark-and-Kafka-through-Informatica-Streaming/m-p/410922#M252939</guid>
      <dc:creator>haridjh</dc:creator>
      <dc:date>2025-06-26T15:54:32Z</dc:date>
    </item>
  </channel>
</rss>

