<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Invoke Livy with pyFiles attribute in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196147#M158201</link>
    <description>&lt;P&gt;&lt;STRONG&gt;Platform: HDP 2.6.4&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;If I pass --py-files to pyspark (shell mode), it works fine. However, if I set the pyFiles parameter in Livy’s curl request, the import fails with “No module named splitter”.&lt;/P&gt;&lt;P&gt;I was able to reproduce this issue on the HDP sandbox as well.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Create Livy/Spark session&lt;/STRONG&gt;:&lt;/P&gt;&lt;PRE&gt;curl -X POST --data '{"kind": "pyspark", "pyFiles" : ["/some hdfs location/splitter.py"]}' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;Submit Livy/Spark statement&lt;/B&gt;: The response above returned session id 71.&lt;/P&gt;&lt;PRE&gt;curl -X POST --data '{"code": "from splitter import getWords"}' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions/71/statements&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;Check statement status&lt;/B&gt;:&lt;/P&gt;&lt;PRE&gt;curl -X GET -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions/71/statements
&lt;/PRE&gt;&lt;P&gt;Response:&lt;/P&gt;&lt;PRE&gt;{
  "id": 0,
  "code": "from splitter import getWords",
  "state": "available",
  "output": {
    "status": "error",
    "execution_count": 0,
    "ename": "ImportError",
    "evalue": "No module named splitter",
    "traceback": [
      "Traceback (most recent call last):\n",
      "ImportError: No module named splitter\n"
    ]
  },
  "progress": 1.0
}&lt;/PRE&gt;&lt;P&gt;Any ideas?  pyspark shell works fine, but Livy does not.  Please suggest.  &lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Wed, 23 May 2018 02:18:54 GMT</pubDate>
    <dc:creator>skekatpuray</dc:creator>
    <dc:date>2018-05-23T02:18:54Z</dc:date>
    <item>
      <title>Invoke Livy with pyFiles attribute</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196147#M158201</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Platform: HDP 2.6.4&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;If I pass --py-files to pyspark (shell mode), it works fine. However, if I set the pyFiles parameter in Livy’s curl request, the import fails with “No module named splitter”.&lt;/P&gt;&lt;P&gt;I was able to reproduce this issue on the HDP sandbox as well.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Create Livy/Spark session&lt;/STRONG&gt;:&lt;/P&gt;&lt;PRE&gt;curl -X POST --data '{"kind": "pyspark", "pyFiles" : ["/some hdfs location/splitter.py"]}' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;Submit Livy/Spark statement&lt;/B&gt;: The response above returned session id 71.&lt;/P&gt;&lt;PRE&gt;curl -X POST --data '{"code": "from splitter import getWords"}' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions/71/statements&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;Check statement status&lt;/B&gt;:&lt;/P&gt;&lt;PRE&gt;curl -X GET -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions/71/statements
&lt;/PRE&gt;&lt;P&gt;Response:&lt;/P&gt;&lt;PRE&gt;{
  "id": 0,
  "code": "from splitter import getWords",
  "state": "available",
  "output": {
    "status": "error",
    "execution_count": 0,
    "ename": "ImportError",
    "evalue": "No module named splitter",
    "traceback": [
      "Traceback (most recent call last):\n",
      "ImportError: No module named splitter\n"
    ]
  },
  "progress": 1.0
}&lt;/PRE&gt;&lt;P&gt;Any ideas?  pyspark shell works fine, but Livy does not.  Please suggest.  &lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Wed, 23 May 2018 02:18:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196147#M158201</guid>
      <dc:creator>skekatpuray</dc:creator>
      <dc:date>2018-05-23T02:18:54Z</dc:date>
    </item>
    <item>
      <title>Re: Invoke Livy with pyFiles attribute</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196148#M158202</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/47327/skekatpuray.html" nodeid="47327"&gt;@skekatpuray&lt;/A&gt; --py-files is for the command line only. With Livy, try spark.submit.pyFiles instead, passed as a Spark configuration in the "conf" field of the REST request. See this article for more information:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html" target="_blank"&gt;https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You should probably also upload the Python files to HDFS and reference them by their HDFS paths rather than local file system paths, since local files won't be present for Livy.&lt;/P&gt;&lt;P&gt;HTH&lt;/P&gt;</description>
      <pubDate>Wed, 23 May 2018 05:06:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196148#M158202</guid>
      <dc:creator>falbani</dc:creator>
      <dc:date>2018-05-23T05:06:17Z</dc:date>
    </item>
    <item>
      <title>Re: Invoke Livy with pyFiles attribute</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196149#M158203</link>
      <description>&lt;P&gt;Sorry, it didn't work. Here's the request to create the session:&lt;/P&gt;&lt;PRE&gt;curl -X POST --data '{"kind":"pyspark", "conf":{ "spark.submit.pyFiles" : "/user/skekatpu/pw/codebase/splitter.py"} }' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions&lt;/PRE&gt;&lt;P&gt;I retried with the fully qualified HDFS name (hdfs:///sandbox-hdp.hortonworks.com/user/skekatpu/pw/codebase/splitter.py), and it still didn't work. Response:&lt;/P&gt;&lt;PRE&gt;{ "id": 1, "code": "import splitter ", "state": "available", "output": { "status": "error", "execution_count": 1, "ename": "ImportError", "evalue": "No module named splitter", "traceback": [ "Traceback (most recent call last):\n", "ImportError: No module named splitter\n" ] }, "progress": 1.0 }&lt;/PRE&gt;</description>
      <pubDate>Wed, 23 May 2018 07:25:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196149#M158203</guid>
      <dc:creator>skekatpuray</dc:creator>
      <dc:date>2018-05-23T07:25:09Z</dc:date>
    </item>
    <item>
      <title>Re: Invoke Livy with pyFiles attribute</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196150#M158204</link>
      <description>&lt;P&gt;So splitter.py is in the HDFS directory /user/skekatpu/pw/codebase with read/write/execute permissions?&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html" target="_blank"&gt;https://community.hortonworks.com/articles/151164/how-to-submit-spark-application-through-livy-rest.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/46809200/submitting-python-file-in-batch-mode-in-livywithout-hadoop-installed" target="_blank"&gt;https://stackoverflow.com/questions/46809200/submitting-python-file-in-batch-mode-in-livywithout-hadoop-installed&lt;/A&gt;&lt;/P&gt;&lt;P&gt;From the Stack Overflow answer: if you are using the incubating release of Livy for the first time, check that the template file has been renamed by stripping &lt;CODE&gt;.template&lt;/CODE&gt; from &lt;CODE&gt;livy.conf.template&lt;/CODE&gt;. Then make sure the following settings are present in it.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;livy.spark.master = local
livy.file.local-dir-whitelist = /path/to/script/folder/&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Make sure the path ends with a &lt;CODE&gt;forward slash&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 23 May 2018 08:09:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196150#M158204</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2018-05-23T08:09:25Z</dc:date>
    </item>
    <item>
      <title>Re: Invoke Livy with pyFiles attribute</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196151#M158205</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/47327/skekatpuray.html" nodeid="47327"&gt;@skekatpuray&lt;/A&gt; I see you are using the &lt;STRONG&gt;&lt;EM&gt;session&lt;/EM&gt;&lt;/STRONG&gt; API instead of &lt;B&gt;batches&lt;/B&gt;. Try running with:&lt;/P&gt;&lt;PRE&gt;curl -X POST --data '{"kind":"pyspark", "conf":{ "pyFiles" : "/user/skekatpu/pw/codebase/splitter.py"} }' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/batches&lt;/PRE&gt;&lt;P&gt;HTH&lt;/P&gt;</description>
      <pubDate>Wed, 23 May 2018 08:21:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196151#M158205</guid>
      <dc:creator>falbani</dc:creator>
      <dc:date>2018-05-23T08:21:24Z</dc:date>
    </item>
    <item>
      <title>Re: Invoke Livy with pyFiles attribute</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196152#M158206</link>
      <description>&lt;P&gt;I finally got it working. You need to pass 'spark.yarn.dist.pyFiles' in conf. An example:&lt;/P&gt;&lt;PRE&gt;curl -X POST --data '{"kind":"pyspark", "conf":{ "spark.yarn.dist.pyFiles" : "hdfs://sandbox-hdp.hortonworks.com:8020/user/skekatpu/pw/codebase"} }' -H "Content-Type: application/json" -H "X-Requested-By: someuserid" http://localhost:8999/sessions&lt;/PRE&gt;&lt;P&gt;...where 'codebase' is an HDFS folder containing .py modules.&lt;/P&gt;&lt;P&gt;Felix:&lt;/P&gt;&lt;P&gt;Yes, we have some flows that work with batches as well, but this particular one needs interactive connectivity to Livy, so /sessions has to be used.&lt;/P&gt;</description>
      <pubDate>Wed, 23 May 2018 20:59:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196152#M158206</guid>
      <dc:creator>skekatpuray</dc:creator>
      <dc:date>2018-05-23T20:59:43Z</dc:date>
    </item>
    <item>
      <title>Re: Invoke Livy with pyFiles attribute</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196153#M158207</link>
      <description>&lt;P&gt;Thanks for sharing the solution! &lt;/P&gt;</description>
      <pubDate>Wed, 23 May 2018 21:02:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Invoke-Livy-with-pyFiles-attribute/m-p/196153#M158207</guid>
      <dc:creator>falbani</dc:creator>
      <dc:date>2018-05-23T21:02:27Z</dc:date>
    </item>
  </channel>
</rss>

