<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question ImportError: No module named numpy in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90312#M21653</link>
    <description>&lt;P&gt;Before posting this issue, we read through every solution to the same problem that we could find.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Our cluster runs CDH 6.2. After installation, we use Hue to work with the cluster, and jobs are submitted via Hue.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When the Spark code needs to import numpy, we get the error below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Traceback (most recent call last):
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/test.py", line 79, in &amp;lt;module&amp;gt;
    from pyspark.ml.linalg import Vectors
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/__init__.py", line 22, in &amp;lt;module&amp;gt;
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/base.py", line 24, in &amp;lt;module&amp;gt;
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in &amp;lt;module&amp;gt;
ImportError: No module named numpy&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We followed the official guide to install the Anaconda parcel, and set the&amp;nbsp;&lt;SPAN&gt;Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;We set the&amp;nbsp;Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python&lt;/PRE&gt;
&lt;P&gt;We also set the&amp;nbsp;&lt;SPAN&gt;YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;But none of these solved the import issue.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks for any help.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 14:23:05 GMT</pubDate>
    <dc:creator>kernel8liang</dc:creator>
    <dc:date>2022-09-16T14:23:05Z</dc:date>
    <item>
      <title>ImportError: No module named numpy</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90312#M21653</link>
      <description>&lt;P&gt;Before posting this issue, we read through every solution to the same problem that we could find.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Our cluster runs CDH 6.2. After installation, we use Hue to work with the cluster, and jobs are submitted via Hue.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When the Spark code needs to import numpy, we get the error below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Traceback (most recent call last):
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/test.py", line 79, in &amp;lt;module&amp;gt;
    from pyspark.ml.linalg import Vectors
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/__init__.py", line 22, in &amp;lt;module&amp;gt;
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/base.py", line 24, in &amp;lt;module&amp;gt;
  File "/var/yarn/nm/usercache/admin/appcache/application_1557739482535_0001/container_1557739482535_0001_01_000001/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in &amp;lt;module&amp;gt;
ImportError: No module named numpy&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We followed the official guide to install the Anaconda parcel, and set the&amp;nbsp;&lt;SPAN&gt;Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;We set the&amp;nbsp;Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python&lt;/PRE&gt;
&lt;P&gt;We also set the&amp;nbsp;&lt;SPAN&gt;YARN (MR2 Included) Service Environment Advanced Configuration Snippet (Safety Valve):&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;But none of these solved the import issue.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks for any help.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 14:23:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90312#M21653</guid>
      <dc:creator>kernel8liang</dc:creator>
      <dc:date>2022-09-16T14:23:05Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90389#M21654</link>
      <description>&lt;P&gt;Please check whether numpy is actually installed on all of the NodeManager hosts. If it is not, install it with the command below (for Python 2.x):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;pip install numpy&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If it is already installed, let us know the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) Can you run the same job outside of Hue, i.e. using spark2-submit? Please post the full command here.&lt;/P&gt;&lt;P&gt;2) What Spark command do you use in Hue?&lt;/P&gt;</description>
      <pubDate>Tue, 14 May 2019 09:42:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90389#M21654</guid>
      <dc:creator>_Umesh</dc:creator>
      <dc:date>2019-05-14T09:42:33Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90391#M21655</link>
      <description>&lt;P&gt;With the commands below, the job runs successfully.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_CONF_DIR=/etc/alternatives/hadoop-conf
PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python spark-submit --master yarn --deploy-mode cluster test.py&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In Hue, we open a Spark snippet, select the .py file, then run it. The same code can also be executed in Hue's notebook in YARN mode.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="temp.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/5619i5A47E9AD88BBB73E/image-size/large?v=v2&amp;amp;px=999" role="button" title="temp.png" alt="temp.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 14 May 2019 10:18:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90391#M21655</guid>
      <dc:creator>kernel8liang</dc:creator>
      <dc:date>2019-05-14T10:18:59Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90394#M21656</link>
      <description>&lt;P&gt;We installed Anaconda via CDH, and the parcel is already activated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="tt.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/5620i86BA08A4244094D3/image-size/large?v=v2&amp;amp;px=999" role="button" title="tt.png" alt="tt.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the file below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;/run/cloudera-scm-agent/process/895-spark_on_yarn-SPARK_YARN_HISTORY_SERVER/spark-conf/spark-env.sh&lt;/PRE&gt;&lt;P&gt;we can see:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot from 2019-05-14 18-27-41.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/5621i3FB8AD6DCEB072B3/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot from 2019-05-14 18-27-41.png" alt="Screenshot from 2019-05-14 18-27-41.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 14 May 2019 10:31:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90394#M21656</guid>
      <dc:creator>kernel8liang</dc:creator>
      <dc:date>2019-05-14T10:31:46Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90427#M21658</link>
      <description>&lt;P&gt;Found the solution here:&amp;nbsp;&lt;A href="https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie" target="_blank"&gt;https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2019 03:28:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/90427#M21658</guid>
      <dc:creator>kernel8liang</dc:creator>
      <dc:date>2019-05-15T03:28:05Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/300665#M220341</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/33989"&gt;@kernel8liang&lt;/a&gt;&amp;nbsp;Could you please explain how to implement the solution?&lt;/P&gt;</description>
      <pubDate>Fri, 31 Jul 2020 12:48:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/300665#M220341</guid>
      <dc:creator>Marek</dc:creator>
      <dc:date>2020-07-31T12:48:06Z</dc:date>
    </item>
    <item>
      <title>Re: ImportError: No module named numpy</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/300667#M220343</link>
      <description>&lt;P&gt;In a CDH 6.3.2 cluster, I have an Anaconda parcel distributed and activated, which of course has the numpy module installed. However, the Spark nodes seem to ignore the CDH configuration and keep using the system-wide Python from /usr/bin/python.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Nevertheless, I have installed numpy in the system-wide Python across all cluster nodes. However, I still get the "&lt;SPAN&gt;ImportError: No module named numpy" error.&amp;nbsp;I would appreciate any further advice on how to solve the problem.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am not sure how to implement the solution referred to in&amp;nbsp;&lt;A href="https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie" target="_blank" rel="noopener"&gt;https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Jul 2020 14:03:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ImportError-No-module-named-numpy/m-p/300667#M220343</guid>
      <dc:creator>Marek</dc:creator>
      <dc:date>2020-07-31T14:03:18Z</dc:date>
    </item>
  </channel>
</rss>

