Member since: 08-31-2018
Posts: 15
Kudos Received: 4
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1580 | 10-22-2018 10:08 AM
01-22-2019
04:22 PM
1 Kudo
Any idea on this particular issue? A Jupyter server is now running smoothly on this cluster, but Zeppelin still refuses to cooperate.
01-22-2019
12:23 PM
Thank you for your fast answer! Indeed, it works after tweaking Zeppelin's Spark interpreter parameters: changing master: yarn-cluster to master: yarn and setting spark.submit.deployMode: cluster.
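For reference, a sketch of the resulting interpreter properties (as edited on the Zeppelin interpreter settings page for spark; the values are the ones reported above):
master                   yarn
spark.submit.deployMode  cluster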
01-22-2019
10:24 AM
By default with an Ambari installation, Zeppelin's Spark interpreter is set to yarn-client mode, which means the driver runs on the same host as the Zeppelin server. This incurs high memory pressure on the Zeppelin server host, especially when the Spark interpreter runs in isolated mode. I'm trying to switch to yarn-cluster mode, which lets YARN decide where the Spark driver should run depending on the available resources in the cluster. This mode has been supported by Zeppelin since version 0.8.0, but I'm facing the following issue: https://issues.apache.org/jira/browse/ZEPPELIN-3633. Basically, the node where YARN decides to run the Spark driver doesn't have Zeppelin installed, so the interpreter is unable to start. There is a fix on Zeppelin's GitHub (https://github.com/apache/zeppelin/pull/3181), but I can't find the files that I need to change. Any chance that this can be fixed easily, or should I just install Zeppelin on every node?
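To illustrate the difference between the two modes outside Zeppelin (a generic spark-submit sketch; app.py is a placeholder application):
# client mode: the driver runs on the host that submits the job (here, the Zeppelin server)
spark-submit --master yarn --deploy-mode client app.py
# cluster mode: YARN places the driver in a container on whichever node has resources
spark-submit --master yarn --deploy-mode cluster app.py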
Labels:
- Apache Spark
- Apache YARN
- Apache Zeppelin
12-04-2018
02:03 PM
1 Kudo
I had the exact same problem and solved it this way: first get a Kerberos ticket on the machine running HiveServer2, then start Beeline:
sudo kinit -kt /etc/security/keytabs/hive.service.keytab hive/host1.xxx.local@EXAMPLE.COM
sudo beeline -u "jdbc:hive2://host1.xxx.local:2181,host2.xxx.local:2181,host3.xxx.local:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
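If authentication still fails, it may help to confirm the ticket was actually obtained before starting Beeline (klist is the standard MIT Kerberos tool for listing cached tickets; run it as the same user that ran kinit):
# the hive/host1.xxx.local@EXAMPLE.COM principal should appear in the ticket cache
sudo klist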
11-30-2018
03:58 PM
1 Kudo
For quite some time I've been facing an issue with Zeppelin, which seems unable to launch IPython. I followed this guide and this one. The PySpark interpreter is correctly set with the right Python path, and IPython is activated by default. However, when I try to run any of the examples from the guides, such as:
%ipyspark
import pandas as pd
df = pd.DataFrame({'name':['a','b','c'], 'count':[12,24,18]})
z.show(df)
I get the following error from the logs, which doesn't tell much:
INFO ({pool-3-thread-2} IPythonInterpreter.java[setAdditionalPythonPath]:103) - setAdditionalPythonPath: /usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:135) - Python Exec: python3
INFO ({pool-3-thread-2} IPythonInterpreter.java[checkIPythonPrerequisite]:195) - IPython prerequisite is meet
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:146) - Launching IPython Kernel at port: 39753
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:147) - Launching JVM Gateway at port: 36511
INFO ({pool-3-thread-2} IPythonInterpreter.java[setupIPythonEnv]:315) - PYTHONPATH:/usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python:/usr/hdp/current/spark2-client//python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client//python/:/usr/hdp/current/spark2-client//python:/usr/hdp/current/spark2-client//python/lib/py4j-0.8.2.1-src.zip
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
WARN ({Exec Default Executor} IPythonInterpreter.java[onProcessFailed]:394) - Exception happens in Python Process
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
at java.lang.Thread.run(Thread.java:745)
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
WARN ({pool-3-thread-2} PySparkInterpreter.java[open]:134) - Fail to open IPySparkInterpreter
java.lang.RuntimeException: Fail to open IPythonInterpreter
at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:157)
at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66)
at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:129)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Fail to launch IPython Kernel in 30 seconds
at org.apache.zeppelin.python.IPythonInterpreter.launchIPythonKernel(IPythonInterpreter.java:297)
at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:154)
at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO ({pool-3-thread-2} PySparkInterpreter.java[open]:140) - IPython is not available, use the native PySparkInterpreter
I'm using HDP 3.0.1, which comes with Zeppelin 0.8.0. All the nodes have Python 3.7.1 installed with the latest versions of jupyter and grpcio. From a Zeppelin notebook I checked the IPython and Python versions:
%pyspark
import sys
import IPython
print(IPython.__version__)
print(sys.version)
7.2.0
3.7.1 (default, Nov 29 2018, 17:37:37)
I can start IPython from any node without a problem, and Zeppelin correctly reports IPython's version. I tried to find other logs besides Zeppelin's that report the error, but couldn't find anything. Any idea what could be preventing Zeppelin from launching the IPython kernel?
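One quick sanity check (a sketch, assuming the interpreter runs python3 on the Zeppelin host as the logs show) is that every package the IPython interpreter needs imports cleanly, since a broken grpcio build can kill the kernel at startup without a useful message:
# grpc is the import name of the grpcio package; jupyter_client ships with jupyter
python3 -c "import IPython, jupyter_client, grpc; print('ok')"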
Labels:
- Apache Zeppelin
10-22-2018
10:08 AM
1 Kudo
Found it. In Ambari > Hive > Config > Database I had:
Database URL: jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
You need to replace localhost with the host you are actually trying to connect to, in my case:
Database URL: jdbc:mysql://hadoopslave01.*****.*****/metastore?createDatabaseIfNotExist=true
You will probably need to change the bind-address in your MySQL configuration file as well: on Ubuntu, edit /etc/mysql/mysql.conf.d/mysqld.cnf and change bind-address: 127.0.0.1 to bind-address: 0.0.0.0. Then restart MySQL (on Ubuntu, service mysql restart) and test your connection to the metastore in Ambari > Hive > Config > Database.
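Roughly, the MySQL side of the fix looks like this (Ubuntu paths, as described above):
# /etc/mysql/mysql.conf.d/mysqld.cnf — listen on all interfaces instead of loopback only
bind-address = 0.0.0.0
# then restart MySQL so the change takes effect
sudo service mysql restart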
10-22-2018
08:46 AM
I found this document explaining how to connect Spark to the Hive metastore. I added the following lines in Ambari under Custom spark2-defaults, as well as in /usr/hdp/2.6.5.0-292/spark2/conf/spark-defaults.conf and /usr/hdp/3.0.1.0-187/spark2/conf/spark-defaults.conf:
spark.sql.hive.hiveserver2.jdbc.url: jdbc:hive2://hadoopslave01.*****.*****:2181,hadoopmaster.*****.*****:2181,hadoopslave02.*****.*****:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
spark.datasource.hive.warehouse.metastoreUri: thrift://hadoopslave01.*****.*****:9083
spark.datasource.hive.warehouse.load.staging.dir: /tmp/hive
spark.hadoop.hive.llap.daemon.service.hosts: @llap0
spark.hadoop.hive.zookeeper.quorum: hadoopslave01.*****.*****:2181,hadoopmaster.*****.*****:2181,hadoopslave02.*****.*****:2181
Still the same error: the upgrade tries to find the metastore on hadoopmaster instead of hadoopslave01.
10-19-2018
02:42 PM
I'm upgrading Ambari and HDP from 2.6.5 to 3.0.1 and am currently struggling with the Spark2 History Server: the upgrade points toward the wrong node for the Hive metastore.
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/job_history_server.py", line 102, in <module>
JobHistoryServer().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
method(env)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 993, in restart
self.start(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/job_history_server.py", line 55, in start
spark_service('jobhistoryserver', upgrade_type=upgrade_type, action='start')
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/spark_service.py", line 106, in spark_service
user = params.hive_user)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/hdp/current/hive-client/bin/schematool -dbType mysql -createCatalog spark -catalogDescription 'Default catalog, for Spark' -ifNotExists -catalogLocation hdfs://hadoopmaster.****.****:8020/apps/spark/warehouse' returned 1. /usr/hdp/3.0.1.0-187/hive/conf/hive-env.sh: line 48: [: !=: unary operator expected
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Create catalog spark at location hdfs://hadoopmaster.****.****:8020/apps/spark/warehouse
Metastore connection URL: jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Fri Oct 19 14:17:44 CEST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : Access denied for user 'hive'@'localhost' (using password: YES)
SQL Error code: 1045
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
The Hive metastore is on the node hadoopslave01, and I have no problem connecting to it using the user 'hive'@'localhost'. It seems that Spark has the wrong address and goes for the node hadoopmaster. To test this hypothesis I created a MySQL user 'hive'@'localhost' on hadoopmaster and tried to resume the upgrade, which this time created a new (empty) Hive metastore on hadoopmaster (and then crashed because of missing tables). In both /usr/hdp/2.6.5.0-292/spark2/conf and /usr/hdp/3.0.1.0-187/spark2/conf, hive-site.xml reads:
<name>hive.metastore.uris</name>
<value>thrift://hadoopslave01.*****.*****:9083</value>
I'd say the problem comes from the following line in the script, but I can't find where to change it:
Metastore connection URL: jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
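To check which JDBC URL schematool will pick up (assuming it reads the standard metastore property javax.jdo.option.ConnectionURL from the Hive configuration; /etc/hive/conf is the usual HDP location):
# the <value> line printed after the match is the JDBC URL schematool connects with
grep -A1 'javax.jdo.option.ConnectionURL' /etc/hive/conf/hive-site.xml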
Labels:
- Apache Spark
10-10-2018
03:03 PM
Thank you for your answer. I just tried, and indeed I can invoke Beeline as the hive user. But what about the scripts that point 7 mentions? Should I run the following in Beeline:
$JAVA_HOME/bin/java -Djavax.security.auth.useSubjectCredsOnly=false -cp /usr/hdp/current/hive-server2-hive2/lib/derby-10.10.2.0.jar:/usr/hdp/current/hive-server2-hive2/lib/*:/usr/hdp/current/hadoop/*:/usr/hdp/current/hadoop/lib/*:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*:/usr/hdp/current/hadoop-hdfs/*:/usr/hdp/current/hadoop-hdfs/lib/*:/usr/hdp/current/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-3.1.0.3.0.0.0-1634.jar:/usr/hdp/current/hive-client/conf/conf.server:/usr/hdp/current/hive-metastore/lib/hive-metastore.jar:/usr/hdp/current/hive-metastore/lib/libthrift-0.9.3.jar:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hive-client/lib/hive-common.jar:/usr/hdp/current/hive-client/lib/commons-cli-1.2.jar:/usr/hdp/current/hadoop-client/lib/* org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool -execute