
Using Spark Livy REST


I am currently trying to use Livy's REST api to submit spark-streaming jobs to my cluster.

These are the POST calls I am using right now:

curl -X POST --data '{"file": "hdfs://tmp/kafka-to-elastic.jar","className": "com.foo.bar.Zaz","args": ["foo","bar","zaz"]}'  -H "Content-Type: application/json" http://livyserver:8998/batches
curl -X POST --data '{"file": "/tmp/kafka-to-elastic.jar","className": "com.foo.bar.Zaz","args": ["foo","bar","zaz"]}'  -H "Content-Type: application/json" http://livyserver:8998/batches

The .jar file is in the HDFS and file permissions allow reading for all users.

The configuration option spark-master is set to yarn-cluster.

However no matter which file nor class I provide in this POST request, I obtain the following log output and an error:

    {
      "id": 2,
      "state": "error",
      "log": [
        "\tat java.lang.ClassLoader.loadClass(ClassLoader.java:424)",
        "\tat java.lang.ClassLoader.loadClass(ClassLoader.java:357)",
        "\tat java.lang.Class.forName0(Native Method)",
        "\tat java.lang.Class.forName(Class.java:348)",
        "\tat org.apache.spark.util.Utils$.classForName(Utils.scala:175)",
        "\tat org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)",
        "\tat org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)",
        "\tat org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)",
        "\tat org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)",
        "\tat org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)"
      ]
    }

What am I missing here? Why does it seem like Livy is unable to locate the jar?
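For anyone debugging the same error: the `log` array in the batch response is truncated, and the root-cause line (typically a `ClassNotFoundException` naming the missing class) usually sits above the frames shown here. Livy exposes the full driver output at `GET /batches/{id}/log`. A minimal sketch with Python's standard library, assuming the same `livyserver:8998` host as in the question:

```python
import json
from urllib import request

LIVY_URL = "http://livyserver:8998"  # host from the question; adjust as needed

def log_url(batch_id, offset=0, size=100, base=LIVY_URL):
    """Build the URL for Livy's GET /batches/{id}/log endpoint."""
    return "%s/batches/%d/log?from=%d&size=%d" % (base, batch_id, offset, size)

def fetch_log_lines(batch_id, base=LIVY_URL):
    """Fetch the driver log lines for a batch (requires a reachable Livy server)."""
    with request.urlopen(log_url(batch_id, base=base)) as resp:
        return json.load(resp)["log"]

# Example (needs a live server); batch id 2 comes from the error response above:
# for line in fetch_log_lines(2):
#     print(line)
```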

6 REPLIES

Re: Using Spark Livy REST

Super Guru

I can't tell exactly what the issue is from the small error output. However, I can tell you Livy was not intended to be used outside of Zeppelin on HDP 2.5. Yes, Livy has the capability to interact with Spark without Zeppelin, but that was not the intention in HDP 2.5, so my knee-jerk reaction is that it may not work based on the integration pattern set for HDP 2.5.


Re: Using Spark Livy REST

Expert Contributor

Try "-d" instead of "--data"


Re: Using Spark Livy REST

Super Guru

You will have to install a separate Livy server from the one that is for Zeppelin only.

curl -k --user "<hdinsight user>:<user password>" -v -H <content-type> -X POST -d '{ "file":"<path to application jar>", "className":"<classname in jar>" }' 'https://<spark_cluster_name>.azurehdinsight.net/livy/batches'

http://livy.io/quickstart.html
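The quickstart's `POST /batches` call can also be scripted instead of pasted into curl. A rough sketch with Python's standard library (the host and jar path below are placeholders, not values tested against a real cluster):

```python
import json
from urllib import request

def batch_payload(jar, class_name, args=()):
    """JSON body for POST /batches, using the fields from this thread."""
    return json.dumps({"file": jar, "className": class_name, "args": list(args)})

def submit_batch(payload, livy_url="http://livyserver:8998"):
    """POST a batch to Livy and return the parsed response (id, state, log)."""
    req = request.Request(
        livy_url + "/batches",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (needs a live Livy server):
# print(submit_batch(batch_payload("/tmp/kafka-to-elastic.jar", "com.foo.bar.Zaz", ["foo"])))
```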


Re: Using Spark Livy REST

Expert Contributor

Could you please explain why a separate livy server would be needed?


Re: Using Spark Livy REST

Super Guru

Livy is configured just for Zeppelin and you don't want to change any settings that could break Zeppelin.

Livy is a pretty small REST process and safest to install it separately.

That may change in HDP 2.6; we don't know yet, and Livy could also be hidden or removed. Since you want to keep using it across updates, it is safer to install your own. If HDP announces a Livy for general use in the future, you can switch then.

Re: Using Spark Livy REST

@Jose Luis Navarro Vicente

This is the working Spark Pi example for your reference.

curl -X POST -H "Content-Type: application/json" localhost:8998/batches --data '{
  "conf": {"spark.master": "yarn-cluster"},
  "file": "file:///usr/hdp/current/spark-client/lib/spark-examples-1.6.1.2.5.0.0-817-hadoop2.7.1.2.5.0.0-817.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "name": "Scala Livy Pi Example",
  "executorCores": 1, "executorMemory": "512m",
  "driverCores": 1, "driverMemory": "512m",
  "queue": "default",
  "args": ["100"]
}'
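Once a batch like the SparkPi example above is accepted, its progress can be tracked by polling `GET /batches/{id}` until the batch reaches a terminal state. A sketch, assuming the state names documented for Livy and a placeholder host:

```python
import json
import time
from urllib import request

# Terminal batch states per the Livy REST documentation
TERMINAL_STATES = {"success", "error", "dead", "killed"}

def is_terminal(state):
    """True once a batch will not change state any further."""
    return state in TERMINAL_STATES

def wait_for_batch(batch_id, livy_url="http://localhost:8998", interval=5):
    """Poll GET /batches/{id} until the batch finishes; return the final state."""
    while True:
        with request.urlopen("%s/batches/%d" % (livy_url, batch_id)) as resp:
            state = json.load(resp)["state"]
        if is_terminal(state):
            return state
        time.sleep(interval)
```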