<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to run spark-submit in virtualenv for pyspark? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285702#M211979</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/54172"&gt;@rvillanueva&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Please refer to this article:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/t5/Customer/Unable-to-start-Pyspark-jobs-when-running-with-Python-3/ta-p/272990" target="_blank"&gt;https://community.cloudera.com/t5/Customer/Unable-to-start-Pyspark-jobs-when-running-with-Python-3/ta-p/272990&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 16 Dec 2019 15:31:40 GMT</pubDate>
    <dc:creator>kshimpi</dc:creator>
    <dc:date>2019-12-16T15:31:40Z</dc:date>
    <item>
      <title>How to run spark-submit in virtualenv for pyspark?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285519#M211872</link>
      <description>&lt;P&gt;Is there a way to run spark-submit (Spark v2.3.2 from HDP 3.1.0) from inside a virtualenv? I have a Python file that uses Python 3 (and some specific libraries) in a virtualenv, to isolate library versions from the rest of the system. I would like to run this file with /bin/spark-submit, but when I try to do so I get:&lt;/P&gt;
&lt;PRE&gt;[me@myserver tests]$ source ../venv/bin/activate; /bin/spark-submit sparksubmit.test.py
  File "/bin/hdp-select", line 255
    print "ERROR: Invalid package - " + name
                                    ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
    at org.apache.spark.launcher.Main.main(Main.java:118)

# also tried...
(venv) [me@myserver tests]$ export HADOOP_CONF_DIR=/etc/hadoop/conf; spark-submit --master yarn --deploy-mode cluster sparksubmit.test.py
19/12/12 13:50:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/12 13:50:20 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
    at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
    ....
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig&lt;/PRE&gt;
&lt;P&gt;I'm not sure what to make of this or how to proceed, and I did not fully understand the error messages even after googling them.&lt;/P&gt;
&lt;P&gt;Does anyone with more experience have debugging tips or fixes for this?&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2019 00:15:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285519#M211872</guid>
      <dc:creator>rvillanueva</dc:creator>
      <dc:date>2019-12-13T00:15:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to run spark-submit in virtualenv for pyspark?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285540#M211891</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/54172"&gt;@rvillanueva&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There seem to be a couple of issues:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Issue-1.&lt;/STRONG&gt; The first issue is related to Python 3, which does not support print statements without parentheses. That is why you are getting this error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;  File "/bin/hdp-select", line 255
    print "ERROR: Invalid package - " + name
                                    ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?&lt;/LI-CODE&gt;&lt;P&gt;Please refer to the following threads for similar discussions.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/t5/Support-Questions/Spark-submit-error-with-Python3-on-Hortonworks-sandbox-VM/td-p/230117" target="_blank"&gt;https://community.cloudera.com/t5/Support-Questions/Spark-submit-error-with-Python3-on-Hortonworks-sandbox-VM/td-p/230117&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://community.cloudera.com/t5/Support-Questions/HDP3-0-livy-server-cannot-start/td-p/231126" target="_blank"&gt;https://community.cloudera.com/t5/Support-Questions/HDP3-0-livy-server-cannot-start/td-p/231126&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Try using Python 2.7&lt;/EM&gt; (instead of Python 3), because the script "/bin/hdp-select" contains many "print" statements without parentheses, whereas Python 3 requires every 'print' call to use parentheses. You can list them with:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# grep 'print ' /bin/hdp-select&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;Issue-2.&lt;/STRONG&gt; The following line suggests that an incorrect path is being set somewhere, possibly in your code, in "../venv/bin/activate", or in the "sparksubmit.test.py" script.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ls: cannot access /usr/hdp//hadoop/lib: No such file or directory&lt;/LI-CODE&gt;&lt;P&gt;This is because the correct path should be "/usr/hdp/&lt;STRONG&gt;current&lt;/STRONG&gt;/hadoop/lib".&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;NOTICE&lt;/STRONG&gt;&lt;/EM&gt; that "current" is missing in your case.&lt;BR /&gt;(In your environment it looks like that path component is coming through blank somewhere: "/usr/hdp&lt;STRONG&gt;//&lt;/STRONG&gt;hadoop/lib".)&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Issue-3.&lt;/STRONG&gt; The "ClassNotFoundException" errors are a side effect of the point above: the correct lib directory is not found because "current" is missing from "/usr/hdp/current/hadoop/lib" in your printed path, so the required JARs are not included in the CLASSPATH.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 13 Dec 2019 06:35:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285540#M211891</guid>
      <dc:creator>jsensharma</dc:creator>
      <dc:date>2019-12-13T06:35:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to run spark-submit in virtualenv for pyspark?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285541#M211892</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/54172"&gt;@rvillanueva&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In addition to my previous comment, please also refer to:&amp;nbsp;&lt;A href="https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/running-spark-applications/content/setting_path_variables_for_python.html" target="_blank"&gt;https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/running-spark-applications/content/setting_path_variables_for_python.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2019 06:38:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285541#M211892</guid>
      <dc:creator>jsensharma</dc:creator>
      <dc:date>2019-12-13T06:38:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to run spark-submit in virtualenv for pyspark?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285588#M211925</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/50614"&gt;@jsensharma&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. I need to use Python 3 and would like to keep doing so, considering that Python 2 will stop being maintained in &lt;A href="https://pythonclock.org/" target="_self"&gt;2020&lt;/A&gt; (I would think others have a similar desire). I am currently setting&lt;/P&gt;&lt;PRE&gt;export PYSPARK_PYTHON=/path/to/my/virtualenv/bin/python; spark-submit sparksubmit.test.py&lt;/PRE&gt;&lt;P&gt;as a workaround (otherwise, this may be helpful: &lt;A href="https://stackoverflow.com/a/51508990/8236733" target="_self"&gt;https://stackoverflow.com/a/51508990/8236733&lt;/A&gt;, or using the --py-files option).&lt;/P&gt;&lt;P&gt;2. I don't know where that path reference is coming from, since "../venv/bin/activate" just activates a virtualenv and the "sparksubmit.test.py" code is just&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;from os import environ
import time
import pprint
import platform

from pyspark.sql import SparkSession  # required for SparkSession.builder below

pp = pprint.PrettyPrinter(indent=4)

sparkSession = SparkSession.builder.appName("TEST").getOrCreate()
sparkSession._jsc.sc().setLogLevel("WARN")

print(platform.python_version())

def testfunc(num: int) -&amp;gt; str:
    return "type annotations look ok"
print(testfunc(1))

print("\n\nYou are using %d nodes in this session\n\n" % sparkSession._jsc.sc().getExecutorMemoryStatus().keySet().size())

pp.pprint(sparkSession.sparkContext._conf.getAll())&lt;/LI-CODE&gt;&lt;P&gt;That blank segment in "/usr/hdp//hadoop/lib" is interesting, though, especially since I set&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;export HADOOP_CONF_DIR=/etc/hadoop/conf&lt;/LI-CODE&gt;&lt;P&gt;in the terminal before running the command. Furthermore, looking at the filesystem on my client node, I don't even see the suggested path:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[airflow@airflowetl tests]$ ls -lha /usr/hdp/current/hadoop-
hadoop-client/                  hadoop-httpfs
hadoop-hdfs-client/             hadoop-mapreduce-client/
hadoop-hdfs-datanode/           hadoop-mapreduce-historyserver/
hadoop-hdfs-journalnode/        hadoop-yarn-client/
hadoop-hdfs-namenode/           hadoop-yarn-nodemanager/
hadoop-hdfs-nfs3/               hadoop-yarn-registrydns/
hadoop-hdfs-portmap/            hadoop-yarn-resourcemanager/
hadoop-hdfs-secondarynamenode/  hadoop-yarn-timelinereader/
hadoop-hdfs-zkfc/               hadoop-yarn-timelineserver/
[airflow@airflowetl tests]$ ls -lha /usr/hdp/current/hadoop
ls: cannot access /usr/hdp/current/hadoop: No such file or directory&lt;/LI-CODE&gt;&lt;P&gt;(Note: I am using HDP v3.1.0.)&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2019 21:06:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285588#M211925</guid>
      <dc:creator>rvillanueva</dc:creator>
      <dc:date>2019-12-13T21:06:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to run spark-submit in virtualenv for pyspark?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285702#M211979</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/54172"&gt;@rvillanueva&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Please refer to this article:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/t5/Customer/Unable-to-start-Pyspark-jobs-when-running-with-Python-3/ta-p/272990" target="_blank"&gt;https://community.cloudera.com/t5/Customer/Unable-to-start-Pyspark-jobs-when-running-with-Python-3/ta-p/272990&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2019 15:31:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-run-spark-submit-in-virtualenv-for-pyspark/m-p/285702#M211979</guid>
      <dc:creator>kshimpi</dc:creator>
      <dc:date>2019-12-16T15:31:40Z</dc:date>
    </item>
  </channel>
</rss>

