<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question ImportError: No module named numpy (after re-deploying) in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/52788#M58354</link>
    <description>&lt;P&gt;I have an intermittent issue. I've read the other threads regarding numpy not found on this site&amp;nbsp;and other places on the web to solve my problem, but it keeps coming back after I re-deploy client configurations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am running a Spark job through HUE-&amp;gt;Oozie and using pyspark's MLlib which requires numpy.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Initially, I read the Cloudera docs and blog indicating to install numpy to each node (Anaconda isn't an option for me). I installed numpy on each node using yum as root (I didn't create a virtual environment for this). This worked. However, I later re-deployed the client configurations through CM for reasons unrelated to this issue, and I received the numpy not found error again.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At this point I went to the configuration page for Spark in CM to set the variables:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PYSPARK_PYTHON=/usr/lib64/python2.7&lt;BR /&gt;PYSPARK_DRIVER_PYTHON=/usr/lib64/python2.7&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;in &lt;STRONG&gt;&lt;SPAN class="ph uicontrol"&gt;Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Source: &lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#concept_qzp_p3s_b5__section_ark_lkn_25" target="_self"&gt;https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#concept_qzp_p3s_b5__section_ark_lkn_25&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Next, I re-deployed client configurations. It started working again. 
However, after re-deploying again later for reasons unrelated to this issue, I got the numpy not found error again.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So the error keeps coming back, and each fix only lasts until the next deployment. I also checked permissions on the Python paths and don't see any issues there, but I may be missing something.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Could this be related to running it through HUE or Oozie?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Are the environment variables I set pointing to the correct paths?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Any help is appreciated. Thanks!&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 21 Apr 2026 13:47:07 GMT</pubDate>
    <dc:creator>jpayne1</dc:creator>
    <dc:date>2026-04-21T13:47:07Z</dc:date>
    <item>
      <title>ImportError: No module named numpy (after re-deploying)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/52788#M58354</link>
      <description>&lt;P&gt;I have an intermittent issue. I've read the other threads regarding numpy not found on this site&amp;nbsp;and other places on the web to solve my problem, but it keeps coming back after I re-deploy client configurations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am running a Spark job through HUE-&amp;gt;Oozie and using pyspark's MLlib which requires numpy.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Initially, I read the Cloudera docs and blog indicating to install numpy to each node (Anaconda isn't an option for me). I installed numpy on each node using yum as root (I didn't create a virtual environment for this). This worked. However, I later re-deployed the client configurations through CM for reasons unrelated to this issue, and I received the numpy not found error again.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At this point I went to the configuration page for Spark in CM to set the variables:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PYSPARK_PYTHON=/usr/lib64/python2.7&lt;BR /&gt;PYSPARK_DRIVER_PYTHON=/usr/lib64/python2.7&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;in &lt;STRONG&gt;&lt;SPAN class="ph uicontrol"&gt;Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Source: &lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#concept_qzp_p3s_b5__section_ark_lkn_25" target="_self"&gt;https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#concept_qzp_p3s_b5__section_ark_lkn_25&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Next, I re-deployed client configurations. It started working again. 
However, after re-deploying again later for reasons unrelated to this issue, I got the numpy not found error again.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So the error keeps coming back, and each fix only lasts until the next deployment. I also checked permissions on the Python paths and don't see any issues there, but I may be missing something.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Could this be related to running it through HUE or Oozie?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Are the environment variables I set pointing to the correct paths?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Any help is appreciated. Thanks!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 13:47:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/52788#M58354</guid>
      <dc:creator>jpayne1</dc:creator>
      <dc:date>2026-04-21T13:47:07Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy (after re-deploying)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/52803#M58355</link>
      <description>&lt;P&gt;I think the recommended way to manage this without installing Anaconda yourself is to use the Anaconda-based parcel for CDH, which lays down a basic version of dependencies like numpy and should plumb in the configuration needed to use it.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Mar 2017 05:28:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/52803#M58355</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2017-03-29T05:28:34Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy (after re-deploying)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/52947#M58356</link>
      <description>&lt;P&gt;Unfortunately, Anaconda isn't an option for me.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also added "export" to my safety valve entries for the two Python variables, but numpy still cannot be found.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 18:16:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/52947#M58356</guid>
      <dc:creator>jpayne1</dc:creator>
      <dc:date>2017-03-30T18:16:22Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy (after re-deploying)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/53870#M58357</link>
      <description>&lt;P&gt;In case anyone else has this issue, the documentation for CDH 5.10 is incorrect.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#spark_python__section_ark_lkn_25" target="_blank"&gt;https://www.cloudera.com/documentation/enterprise/latest/topics/spark_python.html#spark_python__section_ark_lkn_25&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It says to set PYSPARK_PYTHON and&amp;nbsp;&lt;SPAN&gt;PYSPARK_DRIVER_PYTHON in&amp;nbsp;&lt;SPAN class="ph uicontrol"&gt;Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh&lt;/SPAN&gt;. I imagine this would be correct if you run Spark in standalone mode.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;However, if you run in yarn-client or yarn-cluster mode, the PYSPARK_PYTHON variable has to be set in YARN. The driver variable isn't needed here; it appears to be relevant only if you want to run Spark through a notebook. I also didn't have to do any of the steps the docs describe for yarn-cluster.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;PYSPARK_PYTHON="/usr/bin/python"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Change-Python-path/m-p/38333/highlight/true#M1488" target="_blank"&gt;http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Change-Python-path/m-p/38333/highlight/true#M1488&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Apr 2017 17:42:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ImportError-No-module-named-numpy-after-re-deploying/m-p/53870#M58357</guid>
      <dc:creator>jpayne1</dc:creator>
      <dc:date>2017-04-19T17:42:34Z</dc:date>
    </item>
  </channel>
</rss>

