<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: CDP CDE Jobs UI: Providing a custom Python module for PySpark UDFs in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/CDP-CDE-Jobs-UI-Providing-a-custom-Python-module-for-PySpark/m-p/340956#M233409</link>
    <description>&lt;P&gt;Greetings&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/95797"&gt;@stephen_obrien&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for using Cloudera Community. We see that your team is working with our Support team&lt;SPAN&gt;&amp;nbsp;on this issue. We will update this post based on the outcome of the Support engagement.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Regards, Smarak&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 08 Apr 2022 08:01:35 GMT</pubDate>
    <dc:creator>smdas</dc:creator>
    <dc:date>2022-04-08T08:01:35Z</dc:date>
    <item>
      <title>CDP CDE Jobs UI: Providing a custom Python module for PySpark UDFs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CDP-CDE-Jobs-UI-Providing-a-custom-Python-module-for-PySpark/m-p/340839#M233387</link>
      <description>&lt;P&gt;Hi. I'm trying to reproduce a typical edge-node submission pattern for PySpark jobs using the CDE Jobs UI. On an edge node, a module of custom Python functions that are declared as UDFs can be provided by calling:&lt;/P&gt;&lt;P&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;spark_session.sparkContext.addPyFile("python_utils.py")&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;or by passing the &lt;FONT face="lucida sans unicode,lucida sans"&gt;--py-files&lt;/FONT&gt; argument to spark-submit:&lt;/P&gt;&lt;P&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;spark-submit --py-files python_utils.py pyspark_main.py&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Pure Python functions can then be imported and wrapped as UDFs in this way:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;import pyspark.sql.functions as F&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;from pyspark.sql.types import StringType&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;def add_udf_column(df):&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;&amp;nbsp; from python_utils import python_func&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;&amp;nbsp; python_udf = F.udf(python_func, StringType())&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;&amp;nbsp; df = df.withColumn("udf_column", python_udf(df["src_column"]))&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;&amp;nbsp; return df&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I attempt something similar in the CDE Jobs UI, Spark cannot find the custom module. My settings are:&lt;/P&gt;&lt;P&gt;Application File: &lt;FONT face="lucida sans unicode,lucida sans"&gt;pyspark_main.py&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Arguments: &lt;FONT face="lucida sans unicode,lucida sans"&gt;--py-files python_utils.py&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Advanced Options:&lt;/P&gt;&lt;P&gt;Python, Egg, Zip files: Added &lt;FONT face="lucida sans unicode,lucida sans"&gt;python_utils.py&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The error I'm getting is:&lt;/P&gt;&lt;P&gt;ModuleNotFoundError: No module named 'python_utils'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any thoughts on how I should provide this file? Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Edit: Testing spark-submit on CDH 6.2 shows that the --py-files flag must be placed before the main script. If the flag is placed after it, the job fails with the same ModuleNotFoundError as above.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From the CDE logs, it looks like the API places the flag after the reference to the main script, as a single quoted argument. From driver.stderr.log:&lt;/P&gt;&lt;P&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;+ '[' -z ']'&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="lucida sans unicode,lucida sans"&gt;+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=***.***.**.*** --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /app/mount/pyspark_main.py '--py-files python_utils.py'&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 07:57:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CDP-CDE-Jobs-UI-Providing-a-custom-Python-module-for-PySpark/m-p/340839#M233387</guid>
      <dc:creator>stephen_obrien</dc:creator>
      <dc:date>2026-04-21T07:57:51Z</dc:date>
    </item>
    <item>
      <title>Re: CDP CDE Jobs UI: Providing a custom Python module for PySpark UDFs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CDP-CDE-Jobs-UI-Providing-a-custom-Python-module-for-PySpark/m-p/340956#M233409</link>
      <description>&lt;P&gt;Greetings&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/95797"&gt;@stephen_obrien&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for using Cloudera Community. We see that your team is working with our Support team&lt;SPAN&gt;&amp;nbsp;on this issue. We will update this post based on the outcome of the Support engagement.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Regards, Smarak&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Apr 2022 08:01:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CDP-CDE-Jobs-UI-Providing-a-custom-Python-module-for-PySpark/m-p/340956#M233409</guid>
      <dc:creator>smdas</dc:creator>
      <dc:date>2022-04-08T08:01:35Z</dc:date>
    </item>
    <item>
      <title>Re: CDP CDE Jobs UI: Providing a custom Python module for PySpark UDFs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CDP-CDE-Jobs-UI-Providing-a-custom-Python-module-for-PySpark/m-p/345773#M234640</link>
      <description>&lt;P&gt;&lt;SPAN&gt;In CDE, to provide a module with custom Python functions that are declared as UDFs, the call to addPyFile must reference the file by its full mounted path:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;spark_session.sparkContext.addPyFile("&lt;STRONG&gt;/app/mount/&lt;/STRONG&gt;python_utils.py")&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The file must be included in a resource attached to the job. The /app/mount/ prefix is where the job's files are mounted, as the path of the main script in the driver log shows.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;See this post for further examples:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://blog.cloudera.com/managing-python-dependencies-for-spark-workloads-in-cloudera-data-engineering/" target="_blank"&gt;https://blog.cloudera.com/managing-python-dependencies-for-spark-workloads-in-cloudera-data-engineering/&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jun 2022 14:00:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CDP-CDE-Jobs-UI-Providing-a-custom-Python-module-for-PySpark/m-p/345773#M234640</guid>
      <dc:creator>stephen_obrien</dc:creator>
      <dc:date>2022-06-16T14:00:27Z</dc:date>
    </item>
  </channel>
</rss>

