Created 03-15-2017 09:16 AM
Hi,
I am trying to execute a Python file stored in HDFS using livy-server. However, I am getting the error "Only local python files are supported".
Host: 10.140.178.24
Port: 8999
hadoop fs -ls /hp
-rw-r--r-- 3 root hdfs 1613 2017-03-15 12:44 /hp/pi.py
I executed the curl command with the above path to the Python file:
curl -X POST --data '{"file": "/hp/pi.py"}' -H "Content-Type: application/json" 10.140.178.24:8999/batches
{"id":12,"state":"running","appId":null,"appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
However, when I look at the logs, I see "Only local python files are supported".
curl 10.140.178.24:8999/batches/12/log | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2486  100  2486    0     0   258k      0 --:--:-- --:--:-- --:--:--  269k
{
    "from": 0,
    "id": 13,
    "log": [
        "Error: Only local python files are supported: Parsed arguments:",
        "  master                  local",
        "  deployMode              client",
        "  executorMemory          null",
        "  executorCores           null",
        "  totalExecutorCores      null",
        "  propertiesFile          /usr/hdp/current/spark-thriftserver/conf/spark-defaults.conf",
        "  driverMemory            null",
        "  driverCores             null",
        "  driverExtraClassPath    null",
        "  driverExtraLibraryPath  /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        "  driverExtraJavaOptions  null",
        "  supervise               false",
        "  queue                   null",
        "  numExecutors            null",
        "  files                   null",
        "  pyFiles                 null",
        "  archives                null",
        "  mainClass               null",
        "  primaryResource         hdfs://slave0.acme.com:8020/home/ec2-user/livy/pi.py",
        "  name                    pi.py",
        "  childArgs               []",
        "  jars                    null",
        "  packages                null",
        "  packagesExclusions      null",
        "  repositories            null",
        "  verbose                 false",
        "",
        "Spark properties used, including those specified through",
        " --conf and those from the properties file /usr/hdp/current/spark-thriftserver/conf/spark-defaults.conf:",
        "  spark.yarn.queue -> default",
        "  spark.history.kerberos.principal -> none",
        "  spark.executor.extraLibraryPath -> /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        "  spark.yarn.max.executor.failures -> 3",
        "  spark.driver.extraLibraryPath -> /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        "  spark.yarn.historyServer.address -> slave0.acme.com:18080",
        "  spark.eventLog.enabled -> true",
        "  spark.history.ui.port -> 18080",
        "  spark.history.provider -> org.apache.spark.deploy.history.FsHistoryProvider",
        "  spark.history.fs.logDirectory -> hdfs:///spark-history",
        "  spark.yarn.submit.file.replication -> 3",
        "  spark.yarn.scheduler.heartbeat.interval-ms -> 5000",
        "  spark.yarn.executor.memoryOverhead -> 384",
        "  spark.yarn.containerLauncherMaxThreads -> 25",
        "  spark.yarn.driver.memoryOverhead -> 384",
        "  spark.history.kerberos.keytab -> none",
        "  spark.eventLog.dir -> hdfs:///spark-history",
        "  spark.yarn.preserve.staging.files -> false",
        "  spark.master -> local",
        "",
        "    .primaryResource",
        "Run with --help for usage help or --verbose for debug output"
    ],
    "total": 52
}
I am not sure why it is running in local mode. From other posts I understand that I have to set the Spark master to yarn-cluster, but I am not sure where or how to set it. If someone could let me know how to resolve this issue, that would be great. Any help would be appreciated.
Created 03-15-2017 08:53 PM
@Srinivas Santhanam Not sure if this will help, but have you tried using the --files option to pass the Python script? See the answer here for more details: https://community.hortonworks.com/comments/41935/view.html.
Created 11-13-2017 09:55 AM
I'm not sure it is possible to execute Python files in HDFS; hence the error that only local files are supported. (If you know how to make it work with HDFS let me know!)
To get this to work, I had to manually upload my Python files to a directory on the Livy server itself. You also have to make sure that the directory containing the Python files is listed in the livy.file.local-dir-whitelist property in livy.conf on the Livy server. You might also have to restart the Livy server, but I'm not sure about that as I wasn't the server admin.
After doing all this, you can invoke POST /batches with the path to your Python file in the 'file' field of the request body. Make sure you use the "file:" protocol in the path's value. Only one forward slash is needed after the colon; example value: "file:/data/pi.py"
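For anyone who would rather script this than use curl, here is a rough Python sketch of the same request. It assumes the Livy host and port from the original question and that /data/pi.py already exists in a whitelisted directory on the Livy server. The helper names (to_file_uri, build_batch_payload, submit_batch) are made up for illustration and are not part of Livy. I have not run this against every Livy version, so treat it as a starting point.

```python
import json
import urllib.request

# Assumption: host and port taken from the original question.
LIVY_URL = "http://10.140.178.24:8999"


def to_file_uri(local_path):
    """Turn an absolute path on the Livy server into the single-slash
    form described above, e.g. /data/pi.py -> file:/data/pi.py."""
    if not local_path.startswith("/"):
        raise ValueError("expected an absolute path on the Livy server")
    return "file:" + local_path


def build_batch_payload(local_path):
    """Build the JSON body for POST /batches (only the 'file' field)."""
    return {"file": to_file_uri(local_path)}


def submit_batch(local_path, livy_url=LIVY_URL):
    """POST the batch to Livy and return the decoded JSON response."""
    body = json.dumps(build_batch_payload(local_path)).encode("utf-8")
    request = urllib.request.Request(
        livy_url + "/batches",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request body; call submit_batch("/data/pi.py")
    # to actually send it to the server.
    print(json.dumps(build_batch_payload("/data/pi.py")))
```

The response from submit_batch should contain the batch "id", which you can then use with GET /batches/{id}/log as in the original question.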