Created 11-30-2018 03:58 PM
For quite some time I've been facing an issue with Zeppelin which seem to be unable to launch IPython. I followed this guide and this one. Pyspark interpreter is correctly set with the right python path and IPython activated by default. However, when I try to run any of the examples in the guides such as:
%ipyspark import pandas as pd df = pd.DataFrame({'name':['a','b','c'], 'count':[12,24,18]}) z.show(df)
I get the following error from the logs which doesn't tell much:
INFO ({pool-3-thread-2} IPythonInterpreter.java[setAdditionalPythonPath]:103) - setAdditionalPythonPath: /usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:135) - Python Exec: python3 INFO ({pool-3-thread-2} IPythonInterpreter.java[checkIPythonPrerequisite]:195) - IPython prerequisite is meet INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:146) - Launching IPython Kernel at port: 39753 INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:147) - Launching JVM Gateway at port: 36511 INFO ({pool-3-thread-2} IPythonInterpreter.java[setupIPythonEnv]:315) - PYTHONPATH:/usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python:/usr/hdp/current/spark2-client//python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client//python/:/usr/hdp/current/spark2-client//python:/usr/hdp/current/spark2-client//python/lib/py4j-0.8.2.1-src.zip INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started WARN ({Exec Default Executor} IPythonInterpreter.java[onProcessFailed]:394) - Exception happens in Python Process org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404) at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48) at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200) at java.lang.Thread.run(Thread.java:745) INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started WARN ({pool-3-thread-2} PySparkInterpreter.java[open]:134) - Fail to open IPySparkInterpreter java.lang.RuntimeException: Fail to open IPythonInterpreter at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:157) at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66) at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:129) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617) at org.apache.zeppelin.scheduler.Job.run(Job.java:188) at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Fail to launch IPython Kernel in 30 seconds at org.apache.zeppelin.python.IPythonInterpreter.launchIPythonKernel(IPythonInterpreter.java:297) at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:154) at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617) at org.apache.zeppelin.scheduler.Job.run(Job.java:188) at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) INFO ({pool-3-thread-2} PySparkInterpreter.java[open]:140) - IPython is not available, use the native PySparkInterpreter
I'm using HDP3.0.1 which comes with Zeppelin 0.8.0. All the nodes have python 3.7.1 installed with the latest version of jupyter and grpcio. From Zeppelin notebook I checked ipython and python version:
%pyspark import sys import IPython print(IPython.__version__) print(sys.version)
7.2.0
3.7.1 (default, Nov 29 2018, 17:37:37)
I can start IPython from any node without a problem and Zeppelin can correctly get IPython's version. I tried to find if there are other logs than Zeppelin reporting the error but couldn't find anything.
Any idea of what could be preventing the launch of IPython kernel from Zeppelin?
Created 01-22-2019 04:22 PM
Any idea on this particular issue? A jupyter server is now running smoothly on this cluster but Zeppelin still refuses to cooperate.