Running Livy on HDP 2.5

10696-livy1.png

Ingest Metrics REST API From Livy with Apache NiFi / HDF

10697-getlivyhttpstatus1.png

Use GetHTTP To Ingest The Status On Your Batch From Livy

10698-getlivybatchesstatus.png
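
To make the status ingestion more concrete, here is a minimal Python sketch of what a downstream step might do with the JSON that GetHTTP pulls back from Livy's /batches endpoint. It assumes the response body carries a "sessions" list whose entries have "id" and "state" fields, which is how the Livy REST API describes batch listings. The function name summarize_batches and the sample body are hypothetical and only illustrate the shape.

```python
import json


def summarize_batches(response_text):
    """Return (batch id, state) pairs from a Livy GET /batches response body."""
    body = json.loads(response_text)
    # Assumption: batches are listed under "sessions"; a missing key yields an empty list.
    return [(s.get("id"), s.get("state")) for s in body.get("sessions", [])]


# Hypothetical sample body, hand-written for illustration (not captured from a real server).
SAMPLE = (
    '{"from": 0, "total": 2, "sessions": ['
    '{"id": 0, "state": "success"}, {"id": 1, "state": "running"}]}'
)

for batch_id, state in summarize_batches(SAMPLE):
    print(f"batch {batch_id}: {state}")
```

In a NiFi flow, the same extraction could likely be done with built-in JSON processors; the sketch is only meant to show which fields matter.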

Running Livy

First, download Livy from GitHub. Installing it on HDP 2.5 is simple: I found a node that wasn't too busy and put the project there.

Running it is just as simple:

export SPARK_HOME=/usr/hdp/current/spark-client/ 
export HADOOP_CONF_DIR=/etc/hadoop/conf 
nohup ./livy-server &

That's it: you now have a basic, unprotected Livy instance running. This is important: there is no security on it. You should either put Knox in front of it or enable Livy's security.

I wanted to submit a Scala Spark batch job, so I wrote a quick one to have something to call.

Source Code for Example Spark 1.6.2 Batch Application:

Step 1: GetFile

Store a file, /opt/demo/sparkrun.js, containing the JSON that triggers the Spark job through Livy. GetFile picks it up from there.

{"file": "/apps/Links.jar","className": "com.dataflowdeveloper.links.Links"}

Step 2: PostHTTP

Make the call to the Livy REST API to submit the Spark job.

Step 3: PutHDFS

Store the results of the call in Hadoop HDFS.
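
For readers who want to see what the first two steps amount to outside NiFi, here is a rough Python sketch. It builds the same JSON body and prepares the POST to /batches without sending it. The helper names build_batch_payload and build_submit_request are hypothetical, and the host http://server:8998 is the placeholder address used elsewhere in this article; treat this as an illustration, not as what PostHTTP does internally.

```python
import json
import urllib.request


def build_batch_payload(jar_path, class_name, args=None):
    """Build the JSON body for a Livy batch submission."""
    payload = {"file": jar_path, "className": class_name}
    if args:
        payload["args"] = list(args)
    return payload


def build_submit_request(livy_url, payload):
    """Prepare (but do not send) a POST to <livy_url>/batches."""
    return urllib.request.Request(
        livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_batch_payload("/apps/Links.jar", "com.dataflowdeveloper.links.Links")
request = build_submit_request("http://server:8998", payload)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request).read()
```

Keeping the request construction separate from the network call makes it easy to inspect the body before anything reaches the cluster.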

Livy Logs

16/12/21 22:50:25 INFO LivyServer: Using spark-submit version 1.6.2
16/12/21 22:50:25 WARN RequestLogHandler: !RequestLog
16/12/21 22:50:25 INFO WebServer: Starting server on http://tspanndev11.field.hortonworks.com:8998
16/12/21 22:51:20 INFO SparkProcessBuilder: Running '/usr/hdp/current/spark-client/bin/spark-submit' '--name' 'Livy' '--class' 'com.dataflowdeveloper.links.Links' 'hdfs://hadoopserver:8020/opt/demo/links.jar' '/linkprocessor/379875e9-5d99-4f88-82b1-fda7cdd7bc98.json'
16/12/21 22:51:20 INFO SessionManager: Registering new session 0

Spark Compiled JAR File Must Be Deployed to HDFS and Be Readable

hdfs dfs -put Links.jar /apps
hdfs dfs -chmod 777 /apps/Links.jar

Checking YARN for Our Application

yarn application -list

Submitting a Scala Spark Job the Normal Way

$SPARK_HOME/bin/spark-submit --class "com.dataflowdeveloper.links.Links" --master yarn --deploy-mode cluster /opt/demo/Links.jar

Deploying a Scala Spark Application Built With SBT

scp target/scala-2.10/links.jar user@server:/opt/demo

Reference:

Livy REST API

To Submit to Livy from the Command Line

curl -X POST --data '{"file": "/opt/demo/links.jar","className": "com.dataflowdeveloper.links.Links","args": ["/linkprocessor/379875e9-5d99-4f88-82b1-fda7cdd7bc98.json"]}' \
  -H "Content-Type: application/json" http://server:8998/batches
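
After a submission like this, you usually want to know when the batch has finished. The following Python sketch polls Livy's /batches/{id}/state endpoint until the batch reaches a terminal state. It assumes the POST response includes the batch id, and that "success", "dead", "killed" and "error" are the terminal states; that set is my assumption, so check it against your Livy version. The names fetch_batch_state and wait_for_batch are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: these are treated as terminal batch states; verify for your Livy version.
TERMINAL_STATES = {"success", "dead", "killed", "error"}


def fetch_batch_state(livy_url, batch_id):
    """GET <livy_url>/batches/<id>/state and return the "state" field."""
    url = f"{livy_url.rstrip('/')}/batches/{batch_id}/state"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())["state"]


def wait_for_batch(get_state, poll_seconds=5.0, max_polls=120):
    """Call get_state() until it returns a terminal state or max_polls is reached."""
    state = None
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    return state
```

For the batch registered as session 0 in the Livy logs earlier in this article, a call might look like wait_for_batch(lambda: fetch_batch_state("http://server:8998", 0)). Passing the state lookup in as a function keeps the polling loop testable without a running Livy server.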

NiFi Template

livy.xml

Comments
Super Collaborator

Nice! BTW, HDP 2.5 has Livy built in. It can be found under the Spark service in Ambari.

Master Guru

That Livy is intended only for Zeppelin; it's not safe to use it for anything else.

In HDP 2.6, there will be a Livy available for general usage.

Expert Contributor

How do I submit a Python Spark job with a Kerberos keytab and principal?

Master Guru

Livy supports that now that it is a full citizen in HDP. I have not tried it myself, so please post a question.

Master Guru

default port is 8999

New Contributor

Hi Team,

We have Apache Livy running on EMR. As part of our POC, we need to submit Spark jobs via Livy. For this, we need to build a connection from NiFi to Livy that will submit a session.
While connecting to Livy from NiFi, we are having issues in ExecuteSparkInteractive. Here is the error:
LivySessionController[id=5c0f3fbf-f3a8-1dc8-0000-0000432c09c1] Livy Session Manager Thread run into an error, but continues to run: java.lang.NullPointerException

We used telnet from the NiFi nodes to connect directly to Livy, and that worked fine, but the connection from NiFi itself is failing.
From the node, we are able to submit the Livy session / Spark job.

Thanks,

Haider Naveed

Community Manager

@HaiderNaveed As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks!