Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1788 | 09-11-2019 10:19 AM |
| | 9426 | 11-26-2018 07:04 PM |
| | 2560 | 11-14-2018 12:10 PM |
| | 5562 | 11-14-2018 12:09 PM |
| | 3244 | 11-12-2018 01:19 PM |
07-16-2018
12:50 PM
@Mukesh Chouhan AFAIK you can't submit your jars along with the code using the Livy API. You need to place them in HDFS or on the Livy local file system in advance. If the above answers have helped, please remember to log in and mark one as Accepted.
07-16-2018
12:16 PM
I see your executor has the full path to the Phoenix client jars. Moving from local mode to yarn/client mode, the most relevant change is that the executors will run on cluster worker nodes. Please try running your code like this:

```shell
spark-submit \
  --class com.test.SmokeTest \
  --master yarn \
  --deploy-mode client \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 4 \
  --num-executors 2 \
  --conf "spark.executor.extraClassPath=phoenix-4.7.0.2.6.2.0-205-spark2.jar:phoenix-client.jar:hbase-client.jar:phoenix-spark2-4.7.0.2.6.2.0-205.jar:hbase-common.jar:hbase-protocol.jar:phoenix-core-4.7.0.2.6.2.0-205.jar" \
  --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.2.0-205-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/lib/hbase-client.jar:/usr/hdp/current/phoenix-client/lib/phoenix-spark2-4.7.0.2.6.2.0-205.jar:/usr/hdp/current/phoenix-client/lib/hbase-common.jar:/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar:/usr/hdp/current/phoenix-client/lib/phoenix-core-4.7.0.2.6.2.0-205.jar" \
  --jars /usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.2.0-205-spark2.jar,/usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/hbase-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark2-4.7.0.2.6.2.0-205.jar,/usr/hdp/current/phoenix-client/lib/hbase-common.jar,/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar,/usr/hdp/current/phoenix-client/lib/phoenix-core-4.7.0.2.6.2.0-205.jar \
  --verbose \
  /tmp/test-1.0-SNAPSHOT.jar
```

Let me know if that works. HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
07-15-2018
04:16 PM
@Mukesh Chouhan the above example points to an HDFS location for the jars. Is it possible for you to do the same, upload your jars to an HDFS location, and then point to that location as I'm doing above? HTH
07-15-2018
01:27 PM
@Mukesh Chouhan Try adding the jars using the jars option while posting to the session, as described in the Livy REST documentation: https://livy.incubator.apache.org/docs/latest/rest-api.html

```shell
curl -X POST \
  -d '{"conf": {"jars": "hdfs://localhost:8020/tmp/package.jar"}}' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions
```

HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
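For readers not using curl, here is a minimal sketch of building the same session-creation request from Python. This is only an illustration: `build_session_request` is a hypothetical helper, and the host, port, and HDFS jar path are the placeholders from the curl example above, not values you can copy as-is.

```python
import json

# Hypothetical helper: builds the Livy session-creation request shown in the
# curl example above. Host, port, and the HDFS jar path are placeholders --
# adjust them for your cluster before sending with your HTTP client of choice.
def build_session_request(jar_path, host="localhost", port=8998):
    url = "http://%s:%d/sessions" % (host, port)
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"conf": {"jars": jar_path}})
    return url, headers, body

url, headers, body = build_session_request("hdfs://localhost:8020/tmp/package.jar")
print(url)
print(body)
```

The helper only assembles the URL, headers, and JSON body; sending it is left to whatever HTTP library you already use.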
07-15-2018
01:13 PM
1 Kudo
@Melchicédec NDUWAYO I tried deleting one comment and ended up deleting the whole thing, sorry. I have a backup of the instructions I provided initially. Later we found this was related to CORS, which Livy does not support yet. The alternatives are to use an HTTP proxy or to have your web application backend relay the requests:

web page --> your web app backend --> livy

I recommend you use Postman, as it has a feature to generate jQuery code. Here is the jQuery code to start a session:

```javascript
var settings = {
  "async": true,
  "crossDomain": true,
  "url": "http://localhost:8999/sessions",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json",
    "X-Requested-By": "user",
    "Cache-Control": "no-cache"
  },
  "processData": false,
  "data": "{\"kind\": \"spark\"}"
};
$.ajax(settings).done(function (response) {
  console.log(response);
});
```
Of course, in the done callback, instead of printing to the console you need to parse the JSON and save the session id to use later as a variable.
Then you will need to wait until the session state transitions to idle. To do this, execute the following request at regular intervals until state = idle:

```javascript
var settings = {
  "async": true,
  "crossDomain": true,
  "url": "http://c24-node2:8999/sessions/<session_id>",
  "method": "GET",
  "headers": {
    "Cache-Control": "no-cache"
  }
};
$.ajax(settings).done(function (response) {
  console.log(response);
});
```
The done callback above needs to parse the JSON and check whether the state is idle. Once the state is idle, we can submit code using the following jQuery:

```javascript
var settings = {
  "async": true,
  "crossDomain": true,
  "url": "http://c24-node2:8999/sessions/<session_id>/statements",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json",
    "X-Requested-By": "user",
    "Cache-Control": "no-cache"
  },
  "processData": false,
  "data": "{\"code\": \"1 + 1\"}"
};
$.ajax(settings).done(function (response) {
  console.log(response);
});
```
The above pushes the code "1 + 1". As before with the session, we need to parse the response in the done callback and save the statement id to retrieve the results later.
With the session id and statement id we can retrieve the results of the execution by running the following jQuery:

```javascript
var settings = {
  "async": true,
  "crossDomain": true,
  "url": "http://c24-node2:8999/sessions/<session_id>/statements/<statement_id>",
  "method": "GET",
  "headers": {
    "Content-Type": "application/json",
    "X-Requested-By": "user",
    "Cache-Control": "no-cache"
  }
};
$.ajax(settings).done(function (response) {
  console.log(response);
});
```
The result of the above execution should be similar to this:

```json
{
  "id": 1,
  "code": "1 + 1",
  "state": "available",
  "output": {
    "status": "ok",
    "execution_count": 1,
    "data": {
      "text/plain": "res1: Int = 2"
    }
  },
  "progress": 1
}
```
Hopefully this will help you. Feel free to update this thread if you have any questions.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
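For anyone wiring this into a backend instead of jQuery, the whole create → poll-until-idle → submit → fetch-result flow can be sketched in Python. This is a hedged sketch, not official Livy client code: `http(method, url, body)` stands in for whatever HTTP client you use (it must send the JSON headers shown above and return the decoded JSON response), and the host/port are placeholders. The fake transport at the bottom exists only to demonstrate the control flow without a cluster.

```python
import time

# Sketch of the Livy flow above: create a session, wait until it is idle,
# submit a statement, then poll until its result is available.
# `http(method, url, body)` is a stand-in for your HTTP client; it should
# send Content-Type: application/json and X-Requested-By headers and return
# the decoded JSON response. base URL is a placeholder.
def run_statement(http, code, base="http://localhost:8998", poll=1.0):
    session = http("POST", base + "/sessions", {"kind": "spark"})
    sid = session["id"]
    # Wait for the session to transition to idle before submitting code.
    while http("GET", "%s/sessions/%d" % (base, sid), None)["state"] != "idle":
        time.sleep(poll)
    stmt = http("POST", "%s/sessions/%d/statements" % (base, sid), {"code": code})
    stmt_id = stmt["id"]
    # Poll the statement until its result is available, then return the output.
    while True:
        result = http("GET", "%s/sessions/%d/statements/%d" % (base, sid, stmt_id), None)
        if result["state"] == "available":
            return result["output"]
        time.sleep(poll)

# A tiny fake transport to demonstrate the control flow without a cluster.
def fake_http(method, url, body):
    if method == "POST" and url.endswith("/sessions"):
        return {"id": 0, "state": "starting"}
    if method == "GET" and url.endswith("/sessions/0"):
        return {"id": 0, "state": "idle"}
    if method == "POST" and url.endswith("/statements"):
        return {"id": 1, "state": "waiting"}
    return {"id": 1, "state": "available",
            "output": {"status": "ok", "data": {"text/plain": "res1: Int = 2"}}}

print(run_statement(fake_http, "1 + 1", poll=0))
```

In real use you would replace `fake_http` with a function backed by urllib or a similar library, and add a timeout so a session stuck in a non-idle state does not poll forever.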
07-11-2018
01:44 PM
1 Kudo
@Sedat Kestepe The supported Spark versions with HDP 2.6.3 are Spark 2.2.0 and 1.6.3. Other versions may or may not work, and we definitely don't recommend using other versions, especially in production environments. The Spark client does not need to be installed on all the cluster worker nodes, only on the edge nodes that submit the application to the cluster. As for jar files and whether they should be included in your application: I agree with the statement above, you should avoid bundling Hadoop/Spark library jars into your application, as good practice to avoid version-mismatch issues. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
07-11-2018
12:30 PM
@Michael Graml By default all service users belong to the public group. Service users are the users these services run as, like hdfs, yarn, hive, hbase, ranger, and so on. Hope that helps answer your question. *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
07-10-2018
01:19 PM
@Daniel Müller Zeppelin is going to reuse the same Spark application already launched, so the driver and executors will be the same ones used for the first execution. You may have noticed that, between paragraphs, if you reuse a variable/value declared or initialized earlier, Zeppelin does not recompute it, because it is already in memory. The best place to check is the Spark UI, as it will tell you exactly whether any actual job is being launched; if you don't see any jobs launched, you will know the result is coming from cache/memory. HTH
07-10-2018
12:54 PM
@Daniel Müller The first time you launch any action in the Spark interpreter, a Spark application is launched in the YARN cluster. If you are concerned about the timings you could check:

1. The Spark interpreter log under /var/log/zeppelin/zeppelin-spark*
2. The Spark UI, another great place to see the actual jobs submitted by the driver. To get there, go to the RM UI, locate the Zeppelin Spark application, and click on the Application Master link.

Between these two, hopefully you will have more information as to why it is taking this long. Perhaps adding more memory or more executors could help reduce the times, but consider those options after you have checked the above and have a better understanding of what is happening. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
07-10-2018
12:40 PM
@Team Spark I recommend you try to find a small subset of the data where the counts don't match: for example, compare monthly, then daily, then hourly, to narrow down which rows are missing on the Postgres side. This will give you more information, as you can then review the actual row data and hopefully spot a pattern. HTH
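The narrowing idea can be sketched in plain Python. The counts below are made-up stand-ins: in practice you would produce each side with a GROUP BY on the date column in Spark and in Postgres, then diff the two result sets.

```python
# Made-up daily counts from the two sides; in practice these would come from
# a GROUP BY on the date column in Spark and in Postgres respectively.
spark_counts    = {"2018-07-01": 100, "2018-07-02": 250, "2018-07-03": 90}
postgres_counts = {"2018-07-01": 100, "2018-07-02": 248, "2018-07-03": 90}

# Report only the buckets whose counts differ -- those are the ones to drill
# into at a finer granularity (hourly, then row by row).
def mismatched_buckets(a, b):
    keys = set(a) | set(b)
    return {k: (a.get(k, 0), b.get(k, 0)) for k in keys if a.get(k, 0) != b.get(k, 0)}

print(mismatched_buckets(spark_counts, postgres_counts))
```

Here only 2018-07-02 would be reported, so the next step would be to repeat the comparison by hour within that day.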