Unable to submit a python file in livy-server

Contributor

Hi,

I am trying to run a Python file stored in HDFS through livy-server, but the batch fails with the error "Only local python files are supported".

Host: 10.140.178.24

Port: 8999

hadoop fs -ls /hp

-rw-r--r-- 3 root hdfs 1613 2017-03-15 12:44 /hp/pi.py

I submitted the batch with curl, using the above path to the Python file:

curl -X POST --data '{"file": "/hp/pi.py"}' -H "Content-Type: application/json" 10.140.178.24:8999/batches

{"id":12,"state":"running","appId":null,"appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}

However, when I look at the logs, I see that only local Python files are supported:

curl 10.140.178.24:8999/batches/12/log | python -m json.tool


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2486  100  2486    0     0   258k      0 --:--:-- --:--:-- --:--:--  269k
{
    "from": 0,
    "id": 13,
    "log": [
        "Error: Only local python files are supported: Parsed arguments:",
        "  master                  local",
        "  deployMode              client",
        "  executorMemory          null",
        "  executorCores           null",
        "  totalExecutorCores      null",
        "  propertiesFile          /usr/hdp/current/spark-thriftserver/conf/spark-defaults.conf",
        "  driverMemory            null",
        "  driverCores             null",
        "  driverExtraClassPath    null",
        "  driverExtraLibraryPath  /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        "  driverExtraJavaOptions  null",
        "  supervise               false",
        "  queue                   null",
        "  numExecutors            null",
        "  files                   null",
        "  pyFiles                 null",
        "  archives                null",
        "  mainClass               null",
        "  primaryResource         hdfs://slave0.acme.com:8020/home/ec2-user/livy/pi.py",
        "  name                    pi.py",
        "  childArgs               []",
        "  jars                    null",
        "  packages                null",
        "  packagesExclusions      null",
        "  repositories            null",
        "  verbose                 false",
        "",
        "Spark properties used, including those specified through",
        " --conf and those from the properties file /usr/hdp/current/spark-thriftserver/conf/spark-defaults.conf:",
        "  spark.yarn.queue -> default",
        "  spark.history.kerberos.principal -> none",
        "  spark.executor.extraLibraryPath -> /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        "  spark.yarn.max.executor.failures -> 3",
        "  spark.driver.extraLibraryPath -> /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64",
        "  spark.yarn.historyServer.address -> slave0.acme.com:18080",
        "  spark.eventLog.enabled -> true",
        "  spark.history.ui.port -> 18080",
        "  spark.history.provider -> org.apache.spark.deploy.history.FsHistoryProvider",
        "  spark.history.fs.logDirectory -> hdfs:///spark-history",
        "  spark.yarn.submit.file.replication -> 3",
        "  spark.yarn.scheduler.heartbeat.interval-ms -> 5000",
        "  spark.yarn.executor.memoryOverhead -> 384",
        "  spark.yarn.containerLauncherMaxThreads -> 25",
        "  spark.yarn.driver.memoryOverhead -> 384",
        "  spark.history.kerberos.keytab -> none",
        "  spark.eventLog.dir -> hdfs:///spark-history",
        "  spark.yarn.preserve.staging.files -> false",
        "  spark.master -> local",
        "",
        "    .primaryResource",
        "Run with --help for usage help or --verbose for debug output"
    ],
    "total": 52
}



I am not sure why it is running in local mode. From other posts I understand that I have to set the Spark master to yarn-cluster, but I am not sure where or how to set it. If someone could let me know how to resolve this issue, that would be great. Any help would be appreciated.
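One way to confirm which master and primary resource spark-submit actually received is to pull them out of the batch log. The following is a minimal sketch, assuming the log lines keep the same "Parsed arguments" layout as the output above; the helper names `parse_spark_submit_args` and `has_local_only_error` are made up for illustration and are not part of Livy.

```python
import re

# Matches lines such as "  master                  local" from the
# "Parsed arguments" dump: two leading spaces, a name, padding, a value.
# Lines like "  spark.master -> local" do not match because of the dot.
_ARG_LINE = re.compile(r"^ {2}(\w+) {2,}(\S.*)$")


def parse_spark_submit_args(log_lines):
    """Return {argument name: value} for the parsed-argument lines."""
    args = {}
    for line in log_lines:
        match = _ARG_LINE.match(line)
        if match:
            args[match.group(1)] = match.group(2).strip()
    return args


def has_local_only_error(log_lines):
    """True if any log line carries the error quoted in this thread."""
    return any("Only local python files are supported" in line
               for line in log_lines)


# A few lines copied from the log above (the "log" array of the response).
sample_log = [
    "Error: Only local python files are supported: Parsed arguments:",
    "  master                  local",
    "  deployMode              client",
    "  primaryResource         hdfs://slave0.acme.com:8020/home/ec2-user/livy/pi.py",
    "  spark.master -> local",
]

parsed = parse_spark_submit_args(sample_log)
print(parsed["master"], parsed["deployMode"], parsed["primaryResource"])
```

With the sample lines, the parsed values show the combination reported in the log: master `local`, deploy mode `client`, and an `hdfs://` primary resource.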

2 REPLIES

Expert Contributor

@Srinivas Santhanam Not sure if this will help, but have you tried using the --files option to pass the Python script? See this answer for more details: https://community.hortonworks.com/comments/41935/view.html

avatar

I'm not sure it is possible to run Python files stored in HDFS this way; hence the error that only local files are supported. (If you know how to make it work with HDFS, let me know!)

To get this to work, I had to manually upload my Python files to a directory on the Livy server itself. The directory that holds the Python files must also be listed in the livy.file.local-dir-whitelist property in livy.conf on the Livy server. You might also have to restart the Livy server, but I'm not sure about that, as I wasn't the server admin.
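Before submitting, it can help to check on the client side that the uploaded file sits under one of the whitelisted directories. This is only a rough sketch: it assumes the property value is a comma-separated list of directories, the functions `parse_whitelist` and `is_whitelisted` are hypothetical helpers, and the check is a simple prefix comparison rather than Livy's own implementation.

```python
import posixpath


def parse_whitelist(value):
    """Split a comma-separated whitelist value into normalized directories.

    Assumption: the value of livy.file.local-dir-whitelist looks like
    "/data/,/opt/jobs".
    """
    return [posixpath.normpath(d.strip()) for d in value.split(",") if d.strip()]


def is_whitelisted(path, whitelist_dirs):
    """True if path lies inside one of the whitelisted directories."""
    normalized = posixpath.normpath(path)
    for directory in whitelist_dirs:
        # Compare against "dir/" so that /data does not match /database.
        prefix = directory.rstrip("/") + "/"
        if normalized.startswith(prefix):
            return True
    return False


# Hypothetical whitelist value, for illustration only.
whitelist = parse_whitelist("/data/, /opt/jobs")
print(is_whitelisted("/data/pi.py", whitelist))
print(is_whitelisted("/database/pi.py", whitelist))
```

The path is normalized before the comparison so that something like `/data/../etc/passwd` is not treated as being inside `/data`.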

After doing all this, you can invoke POST /batches with the path to your Python file in the 'file' field of the request body. Make sure you use the "file:" protocol in the path's value. Only one forward slash is needed after the colon; example value: "file:/data/pi.py".
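The request described above can be sketched in Python as follows. It only builds the request, using the host and port from the question and the example path "/data/pi.py"; the helper names `to_file_uri` and `build_batch_request` are made up for this example, and the `http://` scheme on the Livy URL is an assumption.

```python
import json
import posixpath
import urllib.request


def to_file_uri(local_path):
    """Turn an absolute path on the Livy server into a "file:" value."""
    if not local_path.startswith("/"):
        raise ValueError("expected an absolute path on the Livy server")
    # Single slash after the colon, e.g. file:/data/pi.py
    return "file:" + posixpath.normpath(local_path)


def build_batch_request(livy_url, local_path):
    """Build (but do not send) a POST /batches request for a local file."""
    body = json.dumps({"file": to_file_uri(local_path)}).encode("utf-8")
    return urllib.request.Request(
        livy_url.rstrip("/") + "/batches",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_batch_request("http://10.140.178.24:8999", "/data/pi.py")
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually submit the batch: urllib.request.urlopen(request)
```

Sending is left as a comment so the sketch can be run without a reachable Livy server.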