Livy Zeppelin Interpreter 'Cannot Start Spark'

Rising Star

I have set up a Livy interpreter through Zeppelin and am trying to run this simple paragraph:

%livy.pyspark sc.version

but to no avail. The paragraph only returns:

Cannot Start Spark

Running the same check through the Spark interpreter, however, returns the version just fine:

%spark

sc.version

res10: String = 1.6.2

The Livy interpreter configs look like this:

livy.spark.master			yarn-cluster
zeppelin.interpreter.localRepo		/usr/hdp/current/zeppelin-server/local-repo/....
zeppelin.livy.concurrentSQL		false
zeppelin.livy.create.session.retries	120
zeppelin.livy.keytab			/<location_of_keytab>/zsk.keytab
zeppelin.livy.principal			<zeppelin_principal>
zeppelin.livy.url			http://<hostname>:8998

I have followed the instructions provided here https://community.hortonworks.com/articles/80059/how-to-configure-zeppelin-livy-interpreter-for-sec.... in their entirety.
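To take Zeppelin out of the picture, I have also tried creating a session directly against Livy's REST API; if the session dies here as well, the problem is between Livy and YARN rather than in the interpreter settings. Below is a minimal sketch of that check. The hostname is a placeholder matching zeppelin.livy.url, and I am assuming the Python requests and requests-kerberos packages for SPNEGO auth against the Kerberized endpoint (kinit with the Zeppelin keytab first):

import time
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL  # assumes requests-kerberos is installed

LIVY_URL = "http://<hostname>:8998"  # placeholder, same value as zeppelin.livy.url
AUTH = HTTPKerberosAuth(mutual_authentication=OPTIONAL)  # SPNEGO; kinit with the keytab first

# Start a PySpark session, which is what %livy.pyspark does under the hood.
resp = requests.post(f"{LIVY_URL}/sessions", json={"kind": "pyspark"}, auth=AUTH)
resp.raise_for_status()
session_id = resp.json()["id"]

# Poll until the session comes up ("idle") or fails ("dead" / "error").
state = "starting"
while state not in ("idle", "dead", "error"):
    time.sleep(5)
    state = requests.get(f"{LIVY_URL}/sessions/{session_id}", auth=AUTH).json()["state"]
    print("session state:", state)

# On failure, the Livy session log has more detail than Zeppelin surfaces.
log = requests.get(f"{LIVY_URL}/sessions/{session_id}/log", auth=AUTH).json()
print("\n".join(log["log"]))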

The cluster is Kerberized, and Zeppelin is synced to Active Directory. The Resource Managers are also in HA, and I am seeing a few errors in the Livy log about refused connections to port 8032 (the default RM client RPC port). See below for the stack trace:

WARN Client: Failed to connect to server: <Hostname>/<IP>:8032: retries get failed due to exceeded maximum allowed retries number: 0
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:650)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:745)
        at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
        at org.apache.hadoop.ipc.Client.call(Client.java:1449)
        at org.apache.hadoop.ipc.Client.call(Client.java:1396)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy16.getApplicationReport(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:191)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
        at com.sun.proxy.$Proxy17.getApplicationReport(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:436)
        at com.cloudera.livy.sessions.SessionManager$$anonfun$2.apply(SessionManager.scala:108)
        at com.cloudera.livy.sessions.SessionManager$$anonfun$2.apply(SessionManager.scala:105)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at com.cloudera.livy.sessions.SessionManager.checkAppState(SessionManager.scala:105)
        at com.cloudera.livy.sessions.SessionManager$SessionAppStateMonitor.run(SessionManager.scala:142)
17/03/21 15:53:51 INFO ConfiguredRMFailoverProxyProvider: Failing over to rm2
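Because the Resource Managers are in HA, I wanted to rule out the possibility that the refused connection is just the client trying the standby RM before failing over to the active one (the last log line above suggests exactly that). Here is a minimal sketch for confirming which RM is currently active via the RM REST API, assuming the standard web UI port 8088 and placeholder hostnames; on a Kerberized cluster the request may also need the SPNEGO auth shown in the earlier sketch:

import requests

# Both hostnames are placeholders for the two HA Resource Managers.
for rm in ("http://<rm1-hostname>:8088", "http://<rm2-hostname>:8088"):
    try:
        info = requests.get(f"{rm}/ws/v1/cluster/info", timeout=5).json()["clusterInfo"]
        print(rm, "->", info.get("haState"))  # expect one ACTIVE and one STANDBY
    except requests.RequestException as err:
        print(rm, "-> unreachable:", err)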

Any help would be appreciated! Thank you very much!

Edit: Including some more of the logs from livy-livy-server.out

INFO: 17/03/22 08:17:44 INFO Client:
INFO:    client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
INFO:    diagnostics: AM container is launched, waiting for AM container to Register with RM
INFO:    ApplicationMaster host: N/A
INFO:    ApplicationMaster RPC port: -1
INFO:    queue: default
INFO:    start time: 10188663176
INFO:    final status: UNDEFINED
INFO:    tracking URL: http://<hostname>:8088/proxy/application_10134091314_0007/
INFO:    user: crodgers@DOMAIN.ORG
INFO  Client: Application report for application_10134091314_0007 (state: ACCEPTED)
INFO  Client: Application report for application_10134091314_0007 (state: ACCEPTED)
INFO  Client: Application report for application_10134091314_0007 (state: ACCEPTED)
INFO  RSCAppListener: Disconnect with app application_10134091314_0007
WARN  RSCClient: Client RPC channel closed unexpectedly.
INFO  RSCClient: Failing pending job 12b64fd8-62ac-4dcb-9a05-6c68b81b8420 due to shutdown.

2nd Edit: Including Resource Manager Logs:

For more detailed output, check the application tracking page: http://<hostname>:8088/cluster/app/application_1490134091314_0008 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e18_1490134091314_0008_01_000001
Exit code: 15
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : run as user is rodgersc@CORPORATE.ACT.ORG
main : requested yarn user is rodgersc@CORPORATE.ACT.ORG
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /DATA1/hadoop/yarn/local/nmPrivate/application_1490134091314_0008/container_e18_1490134091314_0008_01_000001/container_e18_1490134091314_0008_01_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
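Exit code 15 here is the exit status of the launched ApplicationMaster process itself, so the aggregated container logs should hold the actual failure. I pulled them with the standard yarn logs CLI; a small Python sketch wrapping that command (the application ID is the one from the report above, and log aggregation must have finished):

import subprocess

# Application ID taken from the Resource Manager report above.
app_id = "application_1490134091314_0008"

# Equivalent to running: yarn logs -applicationId application_1490134091314_0008
subprocess.run(["yarn", "logs", "-applicationId", app_id], check=True)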
1 ACCEPTED SOLUTION

Guru

@Colton Rodgers, can you please confirm whether 'livy.impersonation.enabled' is set to true? There was a known Spark issue (SPARK-13478) where the Spark application hit a GSSException while starting the HiveContext when Livy impersonation was enabled. Because of this issue, Livy paragraphs were failing with "Cannot Start Spark".

Workaround:

1. Remove hive-site.xml from /etc/spark/conf (on the Livy server host).
2. Set livy.impersonation.enabled to false.
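If you want to confirm impersonation is the trigger before changing anything, you can create one Livy session without a proxyUser and one with it, directly against the REST API: if only the proxied session dies, the impersonation path is at fault. A minimal sketch, with a placeholder host and user, assuming the same requests/requests-kerberos setup as the sketch in the question:

import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL  # assumes requests-kerberos

LIVY_URL = "http://<hostname>:8998"  # placeholder
AUTH = HTTPKerberosAuth(mutual_authentication=OPTIONAL)

# Session without impersonation: runs as the Livy server's own principal.
plain = requests.post(f"{LIVY_URL}/sessions", json={"kind": "pyspark"}, auth=AUTH).json()

# Session with impersonation: Livy proxies the end user ("<end_user>" is a placeholder).
proxied = requests.post(f"{LIVY_URL}/sessions",
                        json={"kind": "pyspark", "proxyUser": "<end_user>"},
                        auth=AUTH).json()

print("plain   :", plain["id"], plain["state"])
print("proxied :", proxied["id"], proxied["state"])
# Poll GET /sessions/<id> for each; if only the proxied session goes "dead",
# the impersonation path (the SPARK-13478 scenario) is the culprit.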


8 REPLIES


The post below also discusses a similar issue:

https://community.hortonworks.com/questions/75377/zeppelin-livy-interpreter-not-working-with-kerbero...

Please see if this helps.

Rising Star

I do not have Ranger KMS installed, but thank you for the article!


Rising Star

The Livy logs show that the HiveContext is starting fine, but for sanity's sake I tried the workaround from the accepted solution; after the changes and restarts, it still outputs the same error.

Guru

@Colton Rodgers, can you please share the Livy interpreter logs?

Rising Star

While the workaround did not work exactly as described, it pointed us in the right direction: reviewing the impersonation settings and making sure they were all configured correctly. Thank you.

Guru

@Colton Rodgers, good to know that it helped. Thanks for accepting the answer.

Rising Star

Hi @Colton Rodgers, I have the same problem as you. Please let me know if you found the solution.

Thanks