Created 12-02-2016 01:48 PM
I am currently trying to use Livy's REST api to submit spark-streaming jobs to my cluster.
These are the POST calls I am using right now:
curl -X POST --data '{"file": "hdfs://tmp/kafka-to-elastic.jar","className": "com.foo.bar.Zaz","args": ["foo","bar","zaz"]}' -H "Content-Type: application/json" http://livyserver:8998/batches
curl -X POST --data '{"file": "/tmp/kafka-to-elastic.jar","className": "com.foo.bar.Zaz","args": ["foo","bar","zaz"]}' -H "Content-Type: application/json" http://livyserver:8998/batches
The .jar file is in the HDFS and file permissions allow reading for all users.
The configuration option spark-master is set to yarn-cluster.
However, no matter which file or class I provide in this POST request, I obtain the following log output and an error:
{
  "id": 2,
  "state": "error",
  "log": [
    "\tat java.lang.ClassLoader.loadClass(ClassLoader.java:424)",
    "\tat java.lang.ClassLoader.loadClass(ClassLoader.java:357)",
    "\tat java.lang.Class.forName0(Native Method)",
    "\tat java.lang.Class.forName(Class.java:348)",
    "\tat org.apache.spark.util.Utils$.classForName(Utils.scala:175)",
    "\tat org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)",
    "\tat org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)",
    "\tat org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)",
    "\tat org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)",
    "\tat org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)"
  ]
}
What am I missing here? Why does it seem like Livy is unable to locate the jar?
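(One thing worth double-checking here: in a URI of the form `hdfs://tmp/...`, the `tmp` component is parsed as the authority, i.e. the NameNode host, not as a directory; `hdfs:///tmp/...` keeps `/tmp` in the path. A quick check with Python's standard-library URL parser illustrates the difference:)

```python
from urllib.parse import urlparse

# The URI from the first curl call: "tmp" becomes the host, not a directory.
bad = urlparse("hdfs://tmp/kafka-to-elastic.jar")
print(bad.netloc)  # tmp
print(bad.path)    # /kafka-to-elastic.jar

# With a triple slash the authority is empty and the full path survives.
good = urlparse("hdfs:///tmp/kafka-to-elastic.jar")
print(good.netloc)  # (empty)
print(good.path)    # /tmp/kafka-to-elastic.jar
```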
Created 12-08-2016 03:26 PM
I can't tell exactly what the issue is from the small error output. However, I can tell you that Livy was not intended to be used outside of Zeppelin on HDP 2.5. Yes, Livy has the capabilities to interact with Spark without Zeppelin, but this was not the intention in HDP 2.5, so my knee-jerk reaction is that it may not work based on the integration pattern set for HDP 2.5.
Created 12-21-2016 03:26 AM
Try "-d" instead of "--data"
Created 12-21-2016 03:31 AM
You will have to install a separate Livy server from the one that is for Zeppelin only.
curl -k --user "<hdinsight user>:<user password>" -v -H <content-type> -X POST -d '{ "file":"<path to application jar>", "className":"<classname in jar>" }' 'https://<spark_cluster_name>.azurehdinsight.net/livy/batches'
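If you end up scripting the submission rather than hand-writing curl, here is a minimal sketch of building the same request body with Python's standard library (the jar path, class name, and arguments are just the examples from this thread, not anything canonical):

```python
import json

def livy_batch_payload(jar_path, class_name, args=None, conf=None):
    """Build the JSON body for a POST to Livy's /batches endpoint."""
    body = {"file": jar_path, "className": class_name}
    if args:
        body["args"] = list(args)
    if conf:
        body["conf"] = dict(conf)
    return json.dumps(body)

# Example using the values from this thread:
payload = livy_batch_payload(
    "hdfs:///tmp/kafka-to-elastic.jar",
    "com.foo.bar.Zaz",
    args=["foo", "bar", "zaz"],
    conf={"spark.master": "yarn-cluster"},
)
print(payload)
```

The resulting string is what goes after `-d`/`--data` in the curl calls above, with `Content-Type: application/json` set on the request.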
Created 12-21-2016 07:47 PM
Could you please explain why a separate Livy server would be needed?
Created 12-22-2016 04:32 PM
Livy is configured just for Zeppelin and you don't want to change any settings that could break Zeppelin.
Livy is a pretty small REST process, and it's safest to install it separately.
That may change in HDP 2.6; we don't know. Or Livy could be hidden or removed. Since you want to use it after updates, it's safer to install your own. If HDP announces a Livy for general use in the future, then you can switch.
Created 01-05-2017 08:57 PM
This is the working Spark Pi example for your reference.
curl -X POST -H "Content-Type: application/json" localhost:8998/batches --data '{ "conf": {"spark.master":"yarn-cluster"}, "file": "file:///usr/hdp/current/spark-client/lib/spark-examples-1.6.1.2.5.0.0-817-hadoop2.7.1.2.5.0.0-817.jar", "className": "org.apache.spark.examples.SparkPi", "name": "Scala Livy Pi Example", "executorCores":1, "executorMemory":"512m", "driverCores":1, "driverMemory":"512m", "queue":"default", "args":["100"]}'
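Once the batch is accepted, the same REST API lets you follow it: GET /batches/&lt;id&gt; returns the current state and GET /batches/&lt;id&gt;/log the accumulated log lines. A small sketch of building those endpoint URLs (the host and port are assumptions matching the example above):

```python
# Helpers for Livy's batch-status endpoints; base defaults to the
# localhost:8998 address used in the SparkPi example above.
def batch_url(batch_id, base="http://localhost:8998"):
    """URL for GET /batches/<id>, which reports the batch state."""
    return "{}/batches/{}".format(base, batch_id)

def batch_log_url(batch_id, base="http://localhost:8998"):
    """URL for GET /batches/<id>/log, which returns the driver log."""
    return batch_url(batch_id, base) + "/log"

print(batch_url(2))      # http://localhost:8998/batches/2
print(batch_log_url(2))  # http://localhost:8998/batches/2/log
```

For example, `curl http://localhost:8998/batches/2` against a running Livy server would show whether batch 2 is `starting`, `running`, `success`, or `error`.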