Member since 10-08-2015
108 Posts
62 Kudos Received
7 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4131 | 06-03-2017 12:11 AM |
| | 5366 | 01-24-2017 01:02 PM |
| | 5880 | 12-27-2016 11:38 AM |
| | 3172 | 12-20-2016 09:52 AM |
| | 2316 | 12-07-2016 02:15 AM |
12-19-2016
03:37 AM
1 Kudo
Right, the Spark shipped with HDP fixes this issue, since we backported the fix to HDP 2.5.
12-14-2016
08:24 AM
The YARN app has started. Could you check the YARN app log? App ID: application_1481021618182_0004
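If you are not sure how to pull that log, here is a minimal sketch using the standard `yarn logs` command. It assumes the `yarn` client is on your PATH and that log aggregation is enabled on the cluster; the helper name `is_app_id` is made up for illustration.

```shell
# Hypothetical helper: check that an ID has the usual
# application_<clusterTimestamp>_<sequence> shape before calling YARN.
is_app_id() {
  printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'
}

app_id="application_1481021618182_0004"

# Only call the CLI if the ID looks valid and `yarn` is installed.
if is_app_id "$app_id" && command -v yarn >/dev/null 2>&1; then
  # Fetch the aggregated container logs and show the last 200 lines.
  yarn logs -applicationId "$app_id" | tail -n 200
fi
```

The stack trace of a failed interpreter usually sits near the end of the driver container's log, which is why the sketch only keeps the tail.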
12-11-2016
04:46 AM
Zeppelin 0.6.1 has several critical bugs in the Spark interpreter. Please try Zeppelin 0.6.2.
12-09-2016
01:46 PM
8 Kudos
Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more. At the time of writing, the latest version of Zeppelin is 0.6.2. Although the community has put a lot of effort into improving it, you may still run into odd issues caused by the environment, a wrong configuration, or bugs in Zeppelin itself. This article shows how to diagnose Zeppelin when you hit an issue and can't figure out what's wrong.

Zeppelin Architecture

Before going into details, I'd like to outline Zeppelin's architecture so that we know where to look. The above is a diagram of Zeppelin's architecture. Overall it has 3 layers:
- Frontend
- Zeppelin Server
- Interpreter Process

I won't go into the details; I just want you to have an overall picture of Zeppelin's components. Most issues occur in the Zeppelin server and the interpreter process, so I will cover them one by one.

Diagnose Zeppelin Server

The most efficient tool for diagnosing any software or library is the log, the log, and the log again. Usually you can figure out what's wrong from the log. The Zeppelin server's log is in the folder $ZEPPELIN_LOG_DIR: that is /var/log/zeppelin for HDP, and $ZEPPELIN_HOME/logs if you use the Apache Zeppelin distribution and haven't set ZEPPELIN_LOG_DIR. The log file name is zeppelin-<user>-<host>.log. There are other files in the log directory, which I will cover in the next section.

Zeppelin uses log4j, and its default log level is INFO. log4j.properties is located in /etc/zeppelin/conf for HDP, and in $ZEPPELIN_HOME/conf for the Apache Zeppelin distribution if you didn't specify ZEPPELIN_CONF_DIR. You can edit log4j.properties to change the log level: first change log4j.appender.dailyfile.Threshold to DEBUG, then add package-level log settings. Here's my log4j.properties for your reference:

log4j.rootLogger = INFO, dailyfile
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n
log4j.appender.dailyfile.DatePattern=.yyyy-MM-dd
log4j.appender.dailyfile.Threshold = DEBUG
log4j.appender.dailyfile = org.apache.log4j.DailyRollingFileAppender
log4j.appender.dailyfile.File = ${zeppelin.log.file}
log4j.appender.dailyfile.layout = org.apache.log4j.PatternLayout
log4j.appender.dailyfile.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n
log4j.logger.org.apache.zeppelin.interpreter.InterpreterFactory=DEBUG
log4j.logger.org.apache.zeppelin.notebook.Paragraph=DEBUG
log4j.logger.org.apache.zeppelin.scheduler=DEBUG
log4j.logger.org.apache.zeppelin.livy=DEBUG
log4j.logger.org.apache.zeppelin.flink=DEBUG
log4j.logger.org.apache.zeppelin.spark=DEBUG
log4j.logger.org.apache.zeppelin.interpreter.util=DEBUG
log4j.logger.org.apache.zeppelin.interpreter.remote=DEBUG
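Once the log level is raised, the next step is to find the log file and skim it for problems. The following is a minimal sketch of that step, not part of Zeppelin itself: the function names `zeppelin_log_dir` and `show_problems` are my own, and the fallback to the current directory when ZEPPELIN_HOME is unset is an assumption made for illustration.

```shell
# Resolve the Zeppelin log directory as described above:
# use $ZEPPELIN_LOG_DIR if set, otherwise fall back to $ZEPPELIN_HOME/logs.
# (Falling back to "." when ZEPPELIN_HOME is unset is an illustrative assumption.)
zeppelin_log_dir() {
  if [ -n "${ZEPPELIN_LOG_DIR:-}" ]; then
    printf '%s\n' "$ZEPPELIN_LOG_DIR"
  else
    printf '%s\n' "${ZEPPELIN_HOME:-.}/logs"
  fi
}

# Print only WARN and ERROR lines from a log file.
# The pattern follows the "%5p [%d]" layout from the log4j.properties above,
# where the level is left-padded to 5 characters.
show_problems() {
  grep -E '^ *(WARN|ERROR) \[' "$1"
}

# Example usage (server log name pattern: zeppelin-<user>-<host>.log):
#   show_problems "$(zeppelin_log_dir)/zeppelin-$(whoami)-$(hostname).log"
```

On HDP you would typically export ZEPPELIN_LOG_DIR=/var/log/zeppelin first, or simply pass the full path to `show_problems`.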
Diagnose Interpreter Process

In my experience, most problems happen on the interpreter process side. There are 2 kinds of scenarios:

- The interpreter process fails to launch.
- The interpreter process launches but fails to run a paragraph.

Zeppelin launches the interpreter process by calling interpreter.sh, which is located in $ZEPPELIN_HOME/bin. Each interpreter process has its own log file in the $ZEPPELIN_LOG_DIR mentioned earlier. The log file name pattern is zeppelin-interpreter-<interpreter_name>-<user>-<host>.log. The interpreter process shares the same log4j.properties with zeppelin-server, so you can change the log configuration as described above.

Usually you can check the interpreter log file to figure out what's wrong. But sometimes there's no such log file, usually because interpreter.sh failed to launch the interpreter process. In this case, modify log4j.properties as above (change log4j.appender.dailyfile.Threshold to DEBUG and change the log level of log4j.logger.org.apache.zeppelin.interpreter.remote to DEBUG). It is also very useful to add the following line to zeppelin-env.sh so that you can see the spark-submit command in the log:

export SPARK_PRINT_LAUNCH_COMMAND=true

The following is the output on my machine. You can see the spark-submit command, which configuration is used, and which classpath is used. Usually this gives you all the context you need to figure out what's wrong.

INFO [2016-12-09 11:50:31,640] ({pool-2-thread-2} RemoteInterpreterManagedProcess.java[start]:120) - Run interpreter process [/Users/jzhang/github/zeppelin/bin/interpreter.sh, -d, /Users/jzhang/github/zeppelin/interpreter/spark, -p, 56009, -l, /Users/jzhang/github/zeppelin/local-repo/2C4XVCNK1]
DEBUG [2016-12-09 11:50:31,642] ({pool-2-thread-2} RemoteInterpreterUtils.java[checkIfRemoteEndpointAccessible]:53) - Remote endpoint 'localhost:56009' is not accessible (might be initializing): Connection refused
DEBUG [2016-12-09 11:50:31,853] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:189) - Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin/java -cp /Users/jzhang/github/zeppelin/interpreter/spark/*:/Users/jzhang/github/zeppelin/zeppelin-interpreter/target/lib/*:/Users/jzhang/github/zeppelin/zeppelin-interpreter/target/classes/:/Users/jzhang/github/zeppelin/zeppelin-interpreter/target/test-classes/:/Users/jzhang/github/zeppelin/zeppelin-zengine/target/test-classes/:/Users/jzhang/github/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.7.0-SNAPSHOT.jar:/Users/jzhang/Java/lib/spark-2.0.2/conf/:/Users/jzhang/Java/lib/spark-2.0.2/assembly/target/scala-2.11/jars/*:/Users/jzhang/Java/lib/hadoop-2.7.2/etc/hadoop/ -Xmx1g -Dlog4j.debug=true -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///Users/jzhang/github/zeppelin/conf/log4j.properties -Dzeppelin.log.file=/Users/jzhang/github/zeppelin/logs/zeppelin-interpreter-spark-jzhang-jzhangMBPr.local.log org.apache.spark.deploy.SparkSubmit --conf spark.driver.extraClassPath=::/Users/jzhang/github/zeppelin/interpreter/spark/*:/Users/jzhang/github/zeppelin/zeppelin-interpreter/target/lib/*::/Users/jzhang/github/zeppelin/zeppelin-interpreter/target/classes:/Users/jzhang/github/zeppelin/zeppelin-interpreter/target/test-classes:/Users/jzhang/github/zeppelin/zeppelin-zengine/target/test-classes:/Users/jzhang/github/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.7.0-SNAPSHOT.jar --conf spark.driver.extraJavaOptions=-Dlog4j.debug=true -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///Users/jzhang/github/zeppelin/conf/log4j.properties -Dzeppelin.log.file=/Users/jzhang/github/zeppelin/logs/zeppelin-interpreter-spark-jzhang-jzhangMBPr.local.log --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer /Users/jzhang/github/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.7.0-SNAPSHOT.jar 56009
DEBUG [2016-12-09 11:50:31,853] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:189) - ========================================
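The "Spark Command:" line is long, and classpath problems are easier to spot with one entry per line. Here is a rough sketch that does this; the function name `spark_classpath` is my own, and it assumes the command contains a single `-cp` argument without spaces, as in the output above.

```shell
# Print each entry of the -cp argument from the first "Spark Command:" line
# in the given log file, one entry per line.
# Assumes a single "-cp <classpath>" argument with no spaces inside it.
spark_classpath() {
  sed -n -E '/Spark Command:/{s/.* -cp ([^ ]+) .*/\1/p;q;}' "$1" \
    | tr ':' '\n' \
    | grep -v '^$'
}

# Example usage: pass the log file that contains the "Spark Command:" line.
#   spark_classpath /path/to/that.log
```

With the entries split out, it is easy to check, for example, whether the Spark jars directory and the Hadoop conf directory are the ones you expect.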
....

Advanced Diagnose Approach

Sometimes logs are still not sufficient, and you need to debug the Zeppelin server process and the interpreter process. In that case, configure the following environment variables in zeppelin-env.sh:

export ZEPPELIN_JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
export ZEPPELIN_INTP_JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=6006"

You can then remote debug the zeppelin-server and interpreter process like any other Java process (port 5005 for the Zeppelin server process, port 6006 for the interpreter process). Note that with suspend=y each JVM waits at startup until a debugger attaches.
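If you switch these settings on and off often, a small helper can build the option string for you. This is only a sketch for zeppelin-env.sh: the function name `jdwp_opts` is hypothetical, and choosing suspend=n for the server is my own suggestion rather than something Zeppelin requires.

```shell
# Hypothetical helper: build a JDWP agent option string.
# usage: jdwp_opts <port> [suspend: y|n]   (suspend defaults to y)
jdwp_opts() {
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${2:-y},address=$1"
}

# Server: do not block startup, attach whenever needed.
export ZEPPELIN_JAVA_OPTS="$(jdwp_opts 5005 n)"
# Interpreter: wait for the debugger so that startup code can be stepped through.
export ZEPPELIN_INTP_JAVA_OPTS="$(jdwp_opts 6006 y)"

# Then attach from the same host with the JDK's command-line debugger, e.g.:
#   jdb -attach localhost:6006
```

Any IDE with a "remote JVM debug" configuration pointed at the same host and port works just as well as `jdb`.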
12-07-2016
02:22 AM
Livy currently doesn't support sharing a Spark context between interpreters. You can use Zeppelin's built-in Spark interpreter to achieve this (%spark, %spark.sql).
12-07-2016
02:15 AM
For now, %livy.sql can only access tables registered by %livy.spark, not those registered by %livy.pyspark or %livy.sparkr.
12-06-2016
12:59 PM
Are you running it in yarn-cluster mode? Set livy.spark.master to yarn-cluster on the interpreter setting page.
12-05-2016
04:11 AM
1 Kudo
Please specify com.databricks:spark-csv_2.10:1.4.0 in the interpreter setting page
12-03-2016
11:35 PM
The session might have expired. Can you restart the Livy interpreter? If you still get the error, check the RM UI to find the YARN app log.
12-03-2016
11:20 AM
Did you kerberize your cluster using Ambari? There are several configurations you need to set to make it work.