Created on 05-26-2017 12:12 PM - edited 08-18-2019 12:56 AM
Hi All,
We have set up HDP 2.6 with Zeppelin 0.7 and have Kerberos and Ranger (admin, usersync) installed in our cluster. Zeppelin and the Livy server are installed on the same host; below is a screenshot of our configuration.
The livy.superuser value is set to the same user as zeppelin.livy.principal (zeppelin-gettranshdpdev).
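For reference, here is a minimal sketch of the Kerberos-related properties this setup relies on. The principals and realm are taken from this post; the keytab paths, URL and port are assumptions and should match your actual install:

# Zeppelin livy interpreter settings (Zeppelin UI > Interpreter > livy)
zeppelin.livy.url       = http://ip-10-228-3-142.ec2.internal:8998   # adjust to livy.server.port (9889 in the curl output further down)
zeppelin.livy.principal = zeppelin-gettranshdpdev@TRANSPORTATION-HDPDEV.GE.COM
zeppelin.livy.keytab    = /etc/security/keytabs/zeppelin.server.kerberos.keytab

# livy.conf (managed under the Spark service in Ambari)
livy.superusers                       = zeppelin-gettranshdpdev
livy.impersonation.enabled            = true
livy.server.auth.type                 = kerberos
livy.server.auth.kerberos.principal   = HTTP/_HOST@TRANSPORTATION-HDPDEV.GE.COM
livy.server.auth.kerberos.keytab      = /etc/security/keytabs/spnego.service.keytab
livy.server.launch.kerberos.principal = livy/_HOST@TRANSPORTATION-HDPDEV.GE.COM
livy.server.launch.kerberos.keytab    = /etc/security/keytabs/livy.service.keytab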
Below are the different keytab values:

Keytab name: FILE:livy.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 05/26/2017 06:12:47 livy/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
   1 05/26/2017 06:12:47 livy/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
   1 05/26/2017 06:12:47 livy/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
   1 05/26/2017 06:12:47 livy/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
   1 05/26/2017 06:12:47 livy/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM

Keytab name: FILE:zeppelin.server.kerberos.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 05/22/2017 08:30:25 zeppelin-gettranshdpdev@TRANSPORTATION-HDPDEV.GE.COM
   2 05/22/2017 08:30:25 zeppelin-gettranshdpdev@TRANSPORTATION-HDPDEV.GE.COM
   2 05/22/2017 08:30:25 zeppelin-gettranshdpdev@TRANSPORTATION-HDPDEV.GE.COM
   2 05/22/2017 08:30:25 zeppelin-gettranshdpdev@TRANSPORTATION-HDPDEV.GE.COM
   2 05/22/2017 08:30:25 zeppelin-gettranshdpdev@TRANSPORTATION-HDPDEV.GE.COM

Keytab name: FILE:spnego.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 05/22/2017 08:30:25 HTTP/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
   2 05/22/2017 08:30:25 HTTP/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
   2 05/22/2017 08:30:25 HTTP/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
   2 05/22/2017 08:30:25 HTTP/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
   2 05/22/2017 08:30:25 HTTP/ip-10-228-3-142.ec2.internal@TRANSPORTATION-HDPDEV.GE.COM
I checked Ambari and below are the configs for the proxy
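For reference, the proxyuser entries in question take roughly this form in core-site.xml (user names follow the principals above; the wildcard values are the ones referred to in the replies below):

hadoop.proxyuser.zeppelin-gettranshdpdev.hosts  = *
hadoop.proxyuser.zeppelin-gettranshdpdev.groups = *
hadoop.proxyuser.livy.hosts  = *
hadoop.proxyuser.livy.groups = *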
Below is a stack trace of the exception
org.springframework.web.client.HttpClientErrorException: 400 Bad Request
    at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91)
    at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:667)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:620)
    at org.springframework.security.kerberos.client.KerberosRestTemplate.doExecuteSubject(KerberosRestTemplate.java:202)
    at org.springframework.security.kerberos.client.KerberosRestTemplate.access$100(KerberosRestTemplate.java:67)
    at org.springframework.security.kerberos.client.KerberosRestTemplate$1.run(KerberosRestTemplate.java:191)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.springframework.security.kerberos.client.KerberosRestTemplate.doExecute(KerberosRestTemplate.java:187)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:580)
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:498)
    at org.apache.zeppelin.livy.BaseLivyInterprereter.callRestAPI(BaseLivyInterprereter.java:406)
    at org.apache.zeppelin.livy.BaseLivyInterprereter.createSession(BaseLivyInterprereter.java:191)
    at org.apache.zeppelin.livy.BaseLivyInterprereter.initLivySession(BaseLivyInterprereter.java:98)
    at org.apache.zeppelin.livy.BaseLivyInterprereter.open(BaseLivyInterprereter.java:80)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:482)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
Created 06-23-2017 06:27 PM
A few things to check:
1) Just making sure: by 'hadoop.proxyuser.zeppelin-clustername.hosts = *' in your description, you mean 'hadoop.proxyuser.zeppelin-gettranshdpdev.hosts = *', correct?
2) Have you enabled Zeppelin's authentication? If so, is the user you log into the Zeppelin UI as present on your cluster as an actual Linux user?
3) That user's HDFS home directory should exist, i.e. /user/xyz in HDFS.
4) It would be helpful to post the paragraph you are trying to run, just to make sure it is not an erroneous paragraph.
5) If everything above is correct, it would be helpful to follow these steps to send some curl requests to the Livy server: http://gethue.com/how-to-use-the-livy-spark-rest-job-server-for-interactive-spark-2-2/
This will help isolate the problem, i.e. whether the Livy server is having issues or Zeppelin's Livy interpreter is; a sample curl session is sketched below.
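For example, a rough test against the Livy REST API could look like the following (host and port are placeholders; the proxyUser value mimics what Zeppelin sends when impersonation is enabled):

# is the Livy server reachable and accepting SPNEGO authentication?
curl --negotiate -u : http://<livy-host>:8998/sessions

# create a pyspark session, similar to what Zeppelin's livy interpreter does
curl --negotiate -u : -X POST \
     -H "Content-Type: application/json" -H "X-Requested-By: admin" \
     -d '{"kind": "pyspark", "proxyUser": "<end-user>"}' \
     http://<livy-host>:8998/sessions

# poll the session until it reaches state "idle" (replace 0 with the id returned above)
curl --negotiate -u : http://<livy-host>:8998/sessions/0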
Created 06-29-2017 07:26 AM
@Kshitij Badani I work with Jayadeep; we have verified all the points you mentioned. For point 5), we are able to use Livy commands from the command prompt.
output:
* upload completely sent off: 81 out of 81 bytes
< HTTP/1.1 201 Created
< Date: Thu, 29 Jun 2017 07:10:37 GMT
< WWW-Authenticate: Negotiate YGoGCSqGSIb3EgECAgIAb1swWaADAgEFoQMCAQ+iTTBLoAMCARKiRARCwjfJg+Z8lYE1nmmiIPQB0gb3flO96lTm/elABws1vT02CKl+KcHkCHUObklGVgZwebtCN73AhZSQy60+d2LnYdWG
< Set-Cookie: hadoop.auth="u=talend&p=talend@TRANSPORTATION-HDPDEV.GE.COM&t=kerberos&e=1498756237672&s=Kkj7P3Ig2g06wogRIzZQimhX1gQ="; HttpOnly
< Content-Type: application/json; charset=UTF-8
< Location: /batches/34
< Content-Length: 100
< Server: Jetty(9.2.16.v20160414)
<
* Closing connection 0
{"id":34,"state":"starting","appId":null,"appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}

[talend@ip-10-235-3-142 ~]$ curl -u: --negotiate -H "X-Requested-By: user" http://10.228.3.142:9889/batches/34
{"id":34,"state":"dead","appId":"application_1498720050151_0001","appInfo":{"driverLogUrl":"http://ip-10-235-0-154.ec2.internal:8188/applicationhistory/logs/ip-10-228-1-148.ec2.internal:45454/container_e82_1498720050151_0001_02_000001/container_e82_1498720050151_0001_02_000001/talend","sparkUiUrl":"http://ip-10-235-2-223.ec2.internal:8088/proxy/application_1498720050151_0001/"},"log":["\t ApplicationMaster RPC port: -1","\t queue: default","\t start time: 1498720241709","\t final status: UNDEFINED","\t tracking URL: http://10.228.3.142:9889/batches/34 user: talend","17/06/29 03:10:41 INFO ShutdownHookManager: Shutdown hook called","17/06/29 03:10:41 INFO ShutdownHookManager: Deleting directory /tmp/spark-c3c670df-280a-46e0-82fd-7ecc4efc5ef2","YARN Diagnostics:","User application exited with status 1"]}
The problem occurs only from Zeppelin's Livy interpreter. The command we are trying in Livy is as below:
%livy.pyspark
print ("Hello")
The log that we are getting is as below:
org.apache.zeppelin.livy.LivyException: Session 24 is finished, appId: null, log: [java.lang.Exception: No YARN application is found with tag livy-session-24-mbc0jh8y in 60 seconds. Please check your cluster status, it is may be very busy.,
    com.cloudera.livy.utils.SparkYarnApp.com$cloudera$livy$utils$SparkYarnApp$getAppIdFromTag(SparkYarnApp.scala:182)
    com.cloudera.livy.utils.SparkYarnApp$anonfun$1$anonfun$4.apply(SparkYarnApp.scala:248)
    com.cloudera.livy.utils.SparkYarnApp$anonfun$1$anonfun$4.apply(SparkYarnApp.scala:245)
    scala.Option.getOrElse(Option.scala:120)
    com.cloudera.livy.utils.SparkYarnApp$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:245)
    com.cloudera.livy.Utils$anon$1.run(Utils.scala:95)]
    at org.apache.zeppelin.livy.BaseLivyInterprereter.createSession(BaseLivyInterprereter.java:221)
    at org.apache.zeppelin.livy.BaseLivyInterprereter.initLivySession(BaseLivyInterprereter.java:110)
    at org.apache.zeppelin.livy.BaseLivyInterprereter.open(BaseLivyInterprereter.java:92)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Can you please help us get this working?
Created 06-29-2017 04:44 PM
I see that Livy is not able to launch the YARN application. Can you paste your Livy server log? Launching a Livy Spark app requires 3 YARN containers, so please also check whether your cluster is busy.
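A few YARN CLI checks that may help here (the application id below is a placeholder):

# free capacity per NodeManager
yarn node -list -all

# was a Livy application ever accepted by the ResourceManager?
yarn application -list -appStates ALL | grep livy-session

# if an application id shows up, pull its logs
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX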
Created 06-29-2017 06:19 PM
The cluster is completely free and there are no other jobs running. We have about 1 TB of memory, 256 vcores, and 8 data nodes.
The Livy server log has the contents below; the predominant error is "ERROR RSCClient: Failed to connect to context".
Please let us know your thoughts.
17/06/29 14:02:11 INFO InteractiveSession$: Creating LivyClient for sessionId: 135
17/06/29 14:02:11 WARN RSCConf: Your hostname, ip-10-228-2-223.ec2.internal, resolves to a loopback address, but we couldn't find any external IP address!
17/06/29 14:02:11 WARN RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
17/06/29 14:02:11 INFO InteractiveSessionManager: Registering new session 135
17/06/29 14:02:12 INFO ContextLauncher: 17/06/29 14:02:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/29 14:02:12 INFO ContextLauncher: Exception in thread "main" java.lang.IllegalArgumentException: For input string: "yes"
17/06/29 14:02:12 INFO ContextLauncher:     at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:238)
17/06/29 14:02:12 INFO ContextLauncher:     at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:226)
17/06/29 14:02:12 INFO ContextLauncher:     at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:31)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.SparkConf$$anonfun$getBoolean$2.apply(SparkConf.scala:337)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.SparkConf$$anonfun$getBoolean$2.apply(SparkConf.scala:337)
17/06/29 14:02:12 INFO ContextLauncher:     at scala.Option.map(Option.scala:145)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.SparkConf.getBoolean(SparkConf.scala:337)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.util.Utils$.isDynamicAllocationEnabled(Utils.scala:2283)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:56)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1185)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.yarn.Client.main(Client.scala)
17/06/29 14:02:12 INFO ContextLauncher:     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
17/06/29 14:02:12 INFO ContextLauncher:     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
17/06/29 14:02:12 INFO ContextLauncher:     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
17/06/29 14:02:12 INFO ContextLauncher:     at java.lang.reflect.Method.invoke(Method.java:498)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:161)
17/06/29 14:02:12 INFO ContextLauncher:     at java.security.AccessController.doPrivileged(Native Method)
17/06/29 14:02:12 INFO ContextLauncher:     at javax.security.auth.Subject.doAs(Subject.java:422)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:161)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
17/06/29 14:02:12 INFO ContextLauncher:     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/06/29 14:02:12 WARN ContextLauncher: Child process exited with code 1.
17/06/29 14:02:12 ERROR RSCClient: Failed to connect to context.
java.io.IOException: Child process exited with code 1.
    at com.cloudera.livy.rsc.ContextLauncher$ChildProcess$1.run(ContextLauncher.java:416)
    at com.cloudera.livy.rsc.ContextLauncher$ChildProcess$2.run(ContextLauncher.java:490)
    at java.lang.Thread.run(Thread.java:745)
17/06/29 14:02:12 INFO RSCClient: Failing pending job 2889f619-dc65-4364-9203-c1caf581ea7e due to shutdown.
17/06/29 14:02:12 INFO InteractiveSession: Failed to ping RSC driver for session 135. Killing application.
17/06/29 14:02:12 INFO InteractiveSession: Stopping InteractiveSession 135...
17/06/29 14:03:11 ERROR SparkYarnApp: Error whiling refreshing YARN state: java.lang.Exception: No YARN application is found with tag livy-session-135-nl80pq2a in 60 seconds. Please check your cluster status, it is may be very busy.
17/06/29 14:03:11 INFO InteractiveSession: Stopped InteractiveSession 135.
17/06/29 14:03:11 WARN InteractiveSession: (Fail to get rsc uri,java.util.concurrent.ExecutionException: java.io.IOException: Child process exited with code 1.)
Created 06-29-2017 06:24 PM
Not sure, but it looks like this: somewhere in either the Livy interpreter configs or the Spark configs in Ambari, a boolean value (true or false) is expected and you may have set it to "yes". Can you check whether you have used the string "yes" anywhere in your configs?
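A quick way to hunt for it is sketched below; the paths are the usual HDP 2.6 defaults and may differ on your install. Judging by the Utils.isDynamicAllocationEnabled frame in the trace, the likely culprit is a property such as spark.dynamicAllocation.enabled (or its Livy interpreter counterpart livy.spark.dynamicAllocation.enabled in the Zeppelin UI) set to "yes" instead of "true":

# look for boolean-style properties set to "yes"
grep -in "yes" /usr/hdp/current/spark-client/conf/spark-defaults.conf
grep -in "yes" /etc/livy/conf/livy.conf /etc/livy/conf/livy-client.conf 2>/dev/null

# a value like this:
#   spark.dynamicAllocation.enabled  yes
# needs to be:
#   spark.dynamicAllocation.enabled  true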
Created 06-25-2017 06:50 AM
- Make sure you also have the proxyuser entries below in core-site.xml (a sketch of refreshing the setting follows this list):
hadoop.proxyuser.livy.hosts=*
hadoop.proxyuser.livy.groups=*
- Create a new Livy interpreter and check whether this helps.
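If the proxyuser entries are added or changed by hand, they also need to be reloaded; a minimal sketch, assuming a user with HDFS/YARN admin rights (a full restart of HDFS and YARN via Ambari works too):

# push the new core-site.xml proxyuser settings to the NameNode and ResourceManager
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration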
Created 06-26-2017 05:53 AM
Also check the Livy server log.
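On a typical HDP 2.6 install the relevant logs live in the locations below, though the exact file names can differ:

# Livy server log
tail -n 200 /var/log/livy/livy-livy-server.out

# Zeppelin's livy interpreter log
tail -n 200 /var/log/zeppelin/zeppelin-interpreter-livy-*.log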