Created 06-20-2017 04:59 AM
I have set up a Livy interpreter through Zeppelin and am trying to run the simple

%livy.pyspark
sc.version

but to no avail; it only returns:

Cannot Start Spark
Running the same through the %spark interpreter, however, returns the version just fine:

%spark
sc.version
res10: String = 1.6.2
The Livy interpreter configs look like this:
livy.spark.master                     yarn-cluster
zeppelin.interpreter.localRepo        /usr/hdp/current/zeppelin-server/local-repo/....
zeppelin.livy.concurrentSQL           false
zeppelin.livy.create.session.retries  120
zeppelin.livy.keytab                  /<location_of_keytab>/zsk.keytab
zeppelin.livy.principal               <zeppelin_principal>
zeppelin.livy.url                     http://<hostname>:8998
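To take Zeppelin out of the picture, the Livy endpoint configured above can be exercised directly over its REST API. This is only a sketch reusing the placeholder hostname, keytab path, and principal from the config above; on a Kerberized cluster, kinit first so curl can do SPNEGO:

```shell
# Placeholders (<hostname>, keytab path, principal) carried over from the
# interpreter config above -- substitute your real values.
kinit -kt /<location_of_keytab>/zsk.keytab <zeppelin_principal>

# List existing Livy sessions:
curl --negotiate -u : http://<hostname>:8998/sessions

# Ask Livy for a new pyspark session directly, bypassing Zeppelin:
curl --negotiate -u : -X POST -H 'Content-Type: application/json' \
     -d '{"kind": "pyspark"}' http://<hostname>:8998/sessions
```

If the session created this way also ends up in the dead state, the problem sits between Livy and YARN rather than in Zeppelin.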
I have followed the instructions provided here https://community.hortonworks.com/articles/80059/how-to-configure-zeppelin-livy-interpreter-for-sec.... in their entirety.
The cluster is Kerberized, and Zeppelin is synced to Active Directory. Also, the ResourceManagers are in HA, and I am seeing a few errors in the Livy log about refused connections to port 8032 (the default ResourceManager IPC port). See below for the stack trace:
WARN Client: Failed to connect to server: <Hostname>/<IP>:8032: retries get failed due to exceeded maximum allowed retries number: 0
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:650)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:745)
	at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
	at org.apache.hadoop.ipc.Client.call(Client.java:1449)
	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy16.getApplicationReport(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
	at com.sun.proxy.$Proxy17.getApplicationReport(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:436)
	at com.cloudera.livy.sessions.SessionManager$$anonfun$2.apply(SessionManager.scala:108)
	at com.cloudera.livy.sessions.SessionManager$$anonfun$2.apply(SessionManager.scala:105)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at com.cloudera.livy.sessions.SessionManager.checkAppState(SessionManager.scala:105)
	at com.cloudera.livy.sessions.SessionManager$SessionAppStateMonitor.run(SessionManager.scala:142)
17/03/21 15:53:51 INFO ConfiguredRMFailoverProxyProvider: Failing over to rm2
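With the ResourceManagers in HA, a refused connection on 8032 can simply mean the client tried the standby RM first; the "Failing over to rm2" line suggests exactly that. One way to confirm which RM is currently active (rm1/rm2 are the usual HA ids; verify yours under yarn.resourcemanager.ha.rm-ids in yarn-site.xml):

```shell
# Query each ResourceManager's HA state; exactly one should report "active".
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```

If the first-listed RM is the standby, the connection-refused warnings are expected noise as long as the client then fails over successfully.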
Any help would be appreciated! Thank you very much!
Edit: Including some more of the logs from livy-livy-server.out
17/03/22 08:17:44 INFO Client:
	 client token: Token { kind: YARN_CLIENT_TOKEN, service: }
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 10188663176
	 final status: UNDEFINED
	 tracking URL: http://<hostname>:8088/proxy/application_10134091314_0007/
	 user: crodgers@DOMAIN.ORG
INFO Client: Application report for application_10134091314_0007 (state: ACCEPTED)
INFO Client: Application report for application_10134091314_0007 (state: ACCEPTED)
INFO Client: Application report for application_10134091314_0007 (state: ACCEPTED)
INFO RSCAppListener: Disconnect with app application_10134091314_0007
WARN RSCClient: Client RPC channel closed unexpectedly.
INFO RSCClient: Failing pending job 12b64fd8-62ac-4dcb-9a05-6c68b81b8420 due to shutdown.
2nd Edit: Including Resource Manager Logs:
For more detailed output, check the application tracking page: http://<hostname>:8088/cluster/app/application_1490134091314_0008 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e18_1490134091314_0008_01_000001
Exit code: 15
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Shell output:
main : command provided 1
main : run as user is rodgersc@CORPORATE.ACT.ORG
main : requested yarn user is rodgersc@CORPORATE.ACT.ORG
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /DATA1/hadoop/yarn/local/nmPrivate/application_1490134091314_0008/container_e18_1490134091314_0008_01_000001/container_e18_1490134091314_0008_01_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
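Exit code 15 from container-launch means the launched process itself exited abnormally; the underlying error usually shows up in the container's own logs rather than in the ResourceManager log. They can be pulled with the application id from the output above:

```shell
# Fetch aggregated YARN logs for the failed application
# (run as the user who submitted it).
yarn logs -applicationId application_1490134091314_0008
```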
Created 06-20-2017 04:59 AM
@Colton Rodgers, can you please confirm whether 'livy.impersonation.enabled' is set to true? There was a known Spark issue (SPARK-13478) where the Spark application hit a GSSException while starting the HiveContext when Livy impersonation was enabled. Because of this issue, Livy paragraphs were failing with "Cannot Start Spark".
Workaround:
remove hive-site.xml from /etc/spark/conf (on the Livy server host)
and set livy.impersonation.enabled to false.
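The file-move half of the workaround can be sketched as a small shell function; the conf dir is a parameter so it can be dry-run anywhere, but on an HDP Livy host it would be /etc/spark/conf. The livy.impersonation.enabled flag itself is changed in livy.conf (or via Ambari if Livy is Ambari-managed), followed by a Livy server restart:

```shell
# Move hive-site.xml out of the Spark conf dir, keeping a backup.
# $1: the Spark conf dir (on the Livy server host: /etc/spark/conf).
remove_spark_hive_site() {
  if [ -f "$1/hive-site.xml" ]; then
    mv "$1/hive-site.xml" "$1/hive-site.xml.bak"
  fi
}

# Example: remove_spark_hive_site /etc/spark/conf
```

Keeping the .bak copy makes the change easy to revert if it turns out not to be the cause.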
Created 06-20-2017 04:59 AM
The post below also discusses a similar issue:
Please see if this helps.
Created 06-20-2017 04:59 AM
I do not have Ranger KMS installed, but thank you for the article!
Created 06-20-2017 04:59 AM
The Livy logs show that the HiveContext is starting fine, but for sanity's sake I tried your configuration changes; after the changes and restarts, it is still producing the same error.
Created 06-20-2017 04:59 AM
@Colton Rodgers, can you please share the Livy interpreter logs?
Created 06-20-2017 04:59 AM
While the workaround did not work exactly as described, it pointed us in the right direction: reviewing the impersonation settings and making sure they were all set correctly. Thank you.
Created 06-20-2017 04:59 AM
@Colton Rodgers, good to know that it helped. Thanks for accepting the answer.
Created 06-20-2017 04:59 AM
Hi @Colton Rodgers, I have the same problem as you. Please let me know if you found the solution.
Thanks