Created 03-15-2017 09:16 AM
Hi,
I am trying to execute a Python file stored in HDFS using livy-server, but I am getting the error "Only local python files are supported".
Host: 10.140.178.24
Port: 8999
hadoop fs -ls /hp
-rw-r--r--   3 root hdfs       1613 2017-03-15 12:44 /hp/pi.py
I executed the following curl command with the above path for the Python file:
curl -X POST --data '{"file": "/hp/pi.py"}' -H "Content-Type: application/json" 10.140.178.24:8999/batches
{"id":12,"state":"running","appId":null,"appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
However, when I look at the logs, I see that only local Python files are supported:
curl 10.140.178.24:8999/batches/12/log | python -m json.tool
{
    "from": 0,
    "id": 13,
    "log": [
        "Error: Only local python files are supported: Parsed arguments:",
        " master local",
        " deployMode client",
        " executorMemory null",
        " executorCores null",
        " totalExecutorCores null",
        " propertiesFile /usr/hdp/current/spark-thriftserver/conf/spark-defaults.conf",
        " driverMemory null",
        " driverCores null",
        " driverExtraClassPath null",
        " driverExtraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        " driverExtraJavaOptions null",
        " supervise false",
        " queue null",
        " numExecutors null",
        " files null",
        " pyFiles null",
        " archives null",
        " mainClass null",
        " primaryResource hdfs://slave0.acme.com:8020/home/ec2-user/livy/pi.py",
        " name pi.py",
        " childArgs []",
        " jars null",
        " packages null",
        " packagesExclusions null",
        " repositories null",
        " verbose false",
        "",
        "Spark properties used, including those specified through",
        " --conf and those from the properties file /usr/hdp/current/spark-thriftserver/conf/spark-defaults.conf:",
        " spark.yarn.queue -> default",
        " spark.history.kerberos.principal -> none",
        " spark.executor.extraLibraryPath -> /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        " spark.yarn.max.executor.failures -> 3",
        " spark.driver.extraLibraryPath -> /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        " spark.yarn.historyServer.address -> slave0.acme.com:18080",
        " spark.eventLog.enabled -> true",
        " spark.history.ui.port -> 18080",
        " spark.history.provider -> org.apache.spark.deploy.history.FsHistoryProvider",
        " spark.history.fs.logDirectory -> hdfs:///spark-history",
        " spark.yarn.submit.file.replication -> 3",
        " spark.yarn.scheduler.heartbeat.interval-ms -> 5000",
        " spark.yarn.executor.memoryOverhead -> 384",
        " spark.yarn.containerLauncherMaxThreads -> 25",
        " spark.yarn.driver.memoryOverhead -> 384",
        " spark.history.kerberos.keytab -> none",
        " spark.eventLog.dir -> hdfs:///spark-history",
        " spark.yarn.preserve.staging.files -> false",
        " spark.master -> local",
        "",
        " .primaryResource",
        "Run with --help for usage help or --verbose for debug output"
    ],
    "total": 52
}
I am not sure why it is pointing to local mode. I have read in other posts that I have to set the Spark master to yarn-cluster, but I am not sure where and how to set it. If someone can let me know how to resolve the above issue, it would be great. Any help would be appreciated.
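For reference, this is the kind of livy.conf change those posts seem to describe (property names taken from the Apache Livy configuration template; I'm not sure whether they apply to this HDP build):

livy.spark.master = yarn
livy.spark.deploy-mode = cluster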
Created 03-15-2017 08:53 PM
@Srinivas Santhanam Not sure if this will help, but have you tried using the --files option to pass the Python script? See the answer here for more details: https://community.hortonworks.com/comments/41935/view.html
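In Livy's batch API, that spark-submit option maps to the "files" (or, for Python modules, "pyFiles") array in the JSON body. A rough, untested sketch using the paths from your question; the primary "file" may still need to be a local path:

curl -X POST --data '{"file": "/hp/pi.py", "files": ["hdfs:///hp/pi.py"]}' -H "Content-Type: application/json" 10.140.178.24:8999/batches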
Created 11-13-2017 09:55 AM
I'm not sure it is possible to execute Python files stored in HDFS; hence the error that only local files are supported. (If you know how to make it work with HDFS, let me know!)
To get this to work for me, I had to manually upload my Python files to a directory on the Livy server itself. You also have to make sure that the directory in which you put the Python files is listed in the livy.file.local-dir-whitelist property in livy.conf on the Livy server. You might also have to restart the Livy server, but I'm not sure about that, as I wasn't the server admin.
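For example, if your scripts live in /data on the Livy server (the directory name here is just an illustration), the whitelist entry in livy.conf would look something like:

livy.file.local-dir-whitelist = /data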
After doing all this, you can invoke POST /batches, giving the path to your Python file in the 'file' argument of the request. Make sure you use the "file:" protocol in the path's value; only one forward slash is needed after the colon, for example: "file:/data/pi.py". A full request is shown below.
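Putting it together, a request along these lines should work (host and port taken from the question above, /data being the whitelisted directory from the example):

curl -X POST --data '{"file": "file:/data/pi.py"}' -H "Content-Type: application/json" 10.140.178.24:8999/batches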