Created 08-16-2017 01:27 PM
Hi,
I finished a fresh HDP 2.6 installation last week, and since then I cannot use any of the Spark interpreters (%spark and %spark2).
I tried running the Spark samples directly on the server and they worked well (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/run-spark2...)
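For reference, this is roughly what I ran on the server, following that guide (from the Spark2 client directory; the exact jar name may differ on your build):

    cd /usr/hdp/current/spark2-client
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn-client --num-executors 1 \
        --driver-memory 512m --executor-memory 512m \
        examples/jars/spark-examples*.jar 10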
But when I try to use it from Zeppelin, I get this error:
Verify Spark Version (should be 2.x)

%spark2.spark
spark.version

org.apache.zeppelin.interpreter.InterpreterException: Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
    at org.apache.spark.util.Utils$.getDefaultPropertiesFile(Utils.scala:2086)
    at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
    at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:118)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:149)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:260)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:425)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:111)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
I get exactly the same error when I use %spark and %spark2.
I did some research about this error, and it seemed to be a compatibility problem between Spark versions (or between the Scala versions they are built against), but I saw nothing that helped me with Zeppelin on HDP.
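(For anyone wanting to reproduce the check: each HDP Spark client can print the Spark and Scala versions it was built with. The paths below are the standard HDP ones; adjust them if your layout differs.)

    /usr/hdp/current/spark-client/bin/spark-submit --version    # Spark 1.6.x, built for Scala 2.10 on HDP 2.6
    /usr/hdp/current/spark2-client/bin/spark-submit --version   # Spark 2.x, built for Scala 2.11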
I tried to follow this tutorial and everything worked except the Zeppelin part:
https://fr.hortonworks.com/blog/try-apache-spark-2-1-zeppelin-hortonworks-data-cloud/
More information about my cluster:
3 nodes on CentOS 7
Zeppelin, Spark, Spark2, and the Livy server are all on the same host
My zeppelin-env.sh file:
# export JAVA_HOME=
export JAVA_HOME={{java64_home}}
# export MASTER=                 # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
export MASTER=yarn-client
export SPARK_YARN_JAR={{spark_jar}}
# export ZEPPELIN_JAVA_OPTS      # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
export ZEPPELIN_JAVA_OPTS="-Dhdp.version={{full_stack_version}} -Dspark.executor.memory={{executor_mem}} -Dspark.executor.instances={{executor_instances}} -Dspark.yarn.queue={{spark_queue}}"
# export ZEPPELIN_MEM            # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_MEM       # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
# export ZEPPELIN_INTP_JAVA_OPTS # zeppelin interpreter process jvm options.
# export ZEPPELIN_SSL_PORT       # ssl port (used when ssl environment variable is set to true)
# export ZEPPELIN_LOG_DIR        # Where log files are stored. PWD by default.
export ZEPPELIN_LOG_DIR={{zeppelin_log_dir}}
# export ZEPPELIN_PID_DIR        # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
export ZEPPELIN_PID_DIR={{zeppelin_pid_dir}}
# export ZEPPELIN_WAR_TEMPDIR    # The location of jetty temporary directory.
# export ZEPPELIN_NOTEBOOK_DIR   # Where notebook saved
# export ZEPPELIN_NOTEBOOK_HOMESCREEN       # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
# export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE  # hide homescreen notebook from list when this value set to "true". default "false"
# export ZEPPELIN_NOTEBOOK_S3_BUCKET        # Bucket where notebook saved
# export ZEPPELIN_NOTEBOOK_S3_ENDPOINT      # Endpoint of the bucket
# export ZEPPELIN_NOTEBOOK_S3_USER          # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_IDENT_STRING   # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS       # The scheduling priority for daemons. Defaults to 0.
# export ZEPPELIN_INTERPRETER_LOCALREPO     # Local repository for interpreter's additional dependency loading
# export ZEPPELIN_NOTEBOOK_STORAGE          # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
# export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC     # If there are multiple notebook storages, should we treat the first one as the only source of truth?
# export ZEPPELIN_NOTEBOOK_PUBLIC           # Make notebook public by default when created, private otherwise
export ZEPPELIN_INTP_CLASSPATH_OVERRIDES="{{external_dependency_conf}}"

#### Spark interpreter configuration ####

## Use provided spark installation ##
## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
##
#export SPARK_HOME=              # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
#export SPARK_HOME={{spark_home}}
# export SPARK_SUBMIT_OPTIONS    # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
export SPARK_APP_NAME=Zeppelin-Spark # (optional) The name of spark application.

## Use embedded spark binaries ##
## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
## however, it is not encouraged when you can define SPARK_HOME
##
# Options read in YARN client mode
# export HADOOP_CONF_DIR         # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
# export PYSPARK_PYTHON          # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYTHONPATH
export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"

## Spark interpreter options ##
##
# export ZEPPELIN_SPARK_USEHIVECONTEXT # Use HiveContext instead of SQLContext if set true. true by default.
# export ZEPPELIN_SPARK_CONCURRENTSQL  # Execute multiple SQL concurrently if set true. false by default.
# export ZEPPELIN_SPARK_IMPORTIMPLICIT # Import implicits, UDF collection, and sql if set true. true by default.
# export ZEPPELIN_SPARK_MAXRESULT      # Max number of Spark SQL result to display. 1000 by default.
# export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000

#### HBase interpreter configuration ####

## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set
# export HBASE_HOME=             # (require) Under which HBase scripts and configuration should be
# export HBASE_CONF_DIR=         # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml
# export ZEPPELIN_IMPERSONATE_CMD # Optional, when user want to run interpreter as end web user. eg) 'sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '
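For completeness: as far as I understand the comments in that file, pointing the interpreter at a specific Spark install would mean uncommenting SPARK_HOME, along these lines (illustrative values only, I have not changed this):

    export SPARK_HOME=/usr/hdp/current/spark2-client
    # export SPARK_SUBMIT_OPTIONS="--driver-memory 512M --executor-memory 1G"   # optional, per the comments above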
If someone has already had this problem or has an idea how to correct it, I would be really thankful!
Thanks in advance,
Félicien
Created on 08-16-2017 05:11 PM - edited 08-17-2019 07:18 PM
Did you modify the zeppelin-env.sh at all? That command you are running should work fine, so I'm wondering if something was modified within your configs as part of the install.
I am running HDP 2.6 and was able to run a simple test (as you did above). It worked as expected. I've included the commands that I ran as well as the parameter settings within my Zeppelin interpreter. Hopefully this is helpful as a comparison for you.
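In plain text, the smoke test was just the version check from your post, one paragraph per interpreter (interpreter names as in a stock HDP 2.6 Zeppelin):

    %spark
    sc.version

    %spark2
    spark.version

Both returned the expected versions (1.6.x and 2.x respectively) on my cluster.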
Created 08-18-2017 11:49 AM
Thanks again @Dan Zaratsian,
But can you also show me your spark2 environment (visible from the Spark2 History Server)? Maybe it can help me find a problem in other configurations or in the jar versions.
Created 08-18-2017 08:07 AM
Hi @Dan Zaratsian,
Thanks for your answer. I did not change anything in the file, and my spark2 properties are exactly the same as yours...
I also tried uninstalling the services and adding them back again, and it didn't change anything.
Created 08-21-2017 08:14 PM
@Félicien Catherin have you set the CLASSPATH variable in zeppelin-env.sh? If yes, please comment it out and try again.
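In other words, if zeppelin-env.sh contains anything like the line below, comment it out and restart the Zeppelin service (the path here is just a placeholder):

    # export CLASSPATH=/some/extra/jars/*    # <- disable any line like this

A stray CLASSPATH can put jars from the wrong Spark/Scala build ahead of the ones spark-submit expects, which would produce exactly the kind of NoSuchMethodError you are seeing.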