Invoke Livy with pyFiles attribute
Labels: Apache Spark
Created 05-22-2018 07:18 PM
Platform: HDP 2.6.4
If I set --py-files in pyspark (shell mode), it works fine. However, if I set the pyFiles parameter in Livy's curl request, the import fails with ImportError: "No module named splitter".
I was able to replicate this issue on the HDP sandbox as well.
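For reference, the shell invocation that works looks something like this (local path illustrative):
pyspark --master yarn --py-files /tmp/splitter.py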
Example:
Create livy/spark session:
curl -X POST --data '{"kind": "pyspark", "pyFiles" : ["/some hdfs location/splitter.py"]}' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions
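The response to this request carries the id of the new session; abridged, it looks something like this (values illustrative):
{ "id": 71, "state": "starting", "kind": "pyspark", "log": [] }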
Submit livy/spark statement. Based on the response above, I extracted the session id, which was 71.
curl -X POST --data '{"code": "from splitter import getWords"}' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions/71/statements
Check statement status:
curl -X GET -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions/71/statements
Response:
{ "id": 0, "code": "from splitter import getWords", "state": "available", "output": { "status": "error", "execution_count": 0, "ename": "ImportError", "evalue": "No module named splitter", "traceback": [ "Traceback (most recent call last):\n", "ImportError: No module named splitter\n" ] }, "progress": 1.0 }
Any ideas? pyspark shell works fine, but Livy does not. Please suggest.
Thank you
Created 05-22-2018 10:06 PM
@skekatpuray --py-files is for the command line only. With Livy, try spark.submit.pyFiles instead, added via Spark configurations in the "conf" field of the REST request.
Also, those .py files should be uploaded to HDFS and referenced from there rather than from the local file system, since they won't be present locally for Livy.
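For example, the module could be staged in HDFS first with something like (paths illustrative):
hdfs dfs -put splitter.py /user/skekatpu/pw/codebase/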
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 05-23-2018 01:21 AM
@skekatpuray I see you are using the sessions API instead of batches. Try running with:
curl -X POST --data '{"kind":"pyspark", "conf":{ "pyFiles" : "/user/skekatpu/pw/codebase/splitter.py"} }' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/batches
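A batch created this way can then be checked via the batches status endpoint, e.g. (batch id illustrative):
curl -X GET -H "Content-Type: application/json" http://localhost:8999/batches/0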
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 05-23-2018 12:25 AM
Sorry, it didn't work. Here's the request to create the session:
curl -X POST --data '{"kind":"pyspark", "conf":{ "spark.submit.pyFiles" : "/user/skekatpu/pw/codebase/splitter.py"} }' -H "Content-Type: application/json" -H "X-Requested-By: root" http://localhost:8999/sessions
I retried it using the fully qualified HDFS name (hdfs:///sandbox-hdp.hortonworks.com/user/skekatpu/pw/codebase/splitter.py); it still didn't work. Response:
{ "id": 1, "code": "import splitter ", "state": "available", "output": { "status": "error", "execution_count": 1, "ename": "ImportError", "evalue": "No module named splitter", "traceback": [ "Traceback (most recent call last):\n", "ImportError: No module named splitter\n" ] }, "progress": 1.0 }
Created 05-23-2018 01:09 AM
So splitter.py is in the HDFS directory /user/skekatpu/pw/codebase with read/write/execute permissions?
For people using the incubating version of Livy for the first time: check that livy.conf.template has been renamed to livy.conf (i.e., with the .template suffix stripped off). Then make sure the following configurations are present in it:
livy.spark.master = local
livy.file.local-dir-whitelist = /path/to/script/folder/
Make sure the trailing forward slash is present at the end of the whitelist path.
Created 05-23-2018 01:59 PM
Finally I was able to get it working. You need to pass 'spark.yarn.dist.pyFiles' to conf. An example:
curl -X POST --data '{"kind":"pyspark", "conf":{ "spark.yarn.dist.pyFiles" : "hdfs://sandbox-hdp.hortonworks.com:8020/user/skekatpu/pw/codebase"} }' -H "Content-Type: application/json" -H "X-Requested-By: someuserid" http://localhost:8999/sessions
...where 'codebase' is an HDFS folder containing the .py modules.
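With a session created this way, the import that failed earlier goes through, e.g. (session id illustrative):
curl -X POST --data '{"code": "from splitter import getWords"}' -H "Content-Type: application/json" -H "X-Requested-By: someuserid" http://localhost:8999/sessions/72/statements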
Felix:
Yes, we have some flows that work with batches as well, but this particular one needs interactive connectivity to Livy, and hence /sessions needs to be used.
Created 05-23-2018 02:02 PM
Thanks for sharing the solution!
