Member since: 08-31-2018
Posts: 15
Kudos Received: 4
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1613 | 10-22-2018 10:08 AM |
01-22-2019
04:22 PM
1 Kudo
Any idea on this particular issue? A Jupyter server is now running smoothly on this cluster, but Zeppelin still refuses to cooperate.
01-22-2019
12:23 PM
Thank you for your fast answer! Indeed, it works after tweaking Zeppelin's Spark interpreter parameters and changing
master: yarn-cluster
to
master: yarn
spark.submit.deployMode: cluster
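For context, Spark has deprecated the combined `yarn-client` / `yarn-cluster` master values since 2.0 in favor of `master=yarn` plus a separate deploy mode, which matches the change above. A minimal sketch of that mapping, assuming the interpreter properties are held in a plain dict (`normalize_master` is a hypothetical helper, not part of Zeppelin or Spark):

```python
def normalize_master(props):
    """Return a copy of props with legacy YARN master values split in two.

    'yarn-client' and 'yarn-cluster' become master='yarn' plus an explicit
    spark.submit.deployMode. Any other master value is left untouched.
    """
    legacy = {"yarn-client": "client", "yarn-cluster": "cluster"}
    result = dict(props)
    master = result.get("master")
    if master in legacy:
        result["master"] = "yarn"
        # Keep an explicitly configured deploy mode if one is already present.
        result.setdefault("spark.submit.deployMode", legacy[master])
    return result


if __name__ == "__main__":
    print(normalize_master({"master": "yarn-cluster"}))
```

The helper only illustrates the two-property form; the actual values still have to be entered in the Spark interpreter settings.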
01-22-2019
10:24 AM
By default with an Ambari installation, Zeppelin uses yarn client mode for the Spark interpreter, which means the driver runs on the same host as the Zeppelin server. This puts high memory pressure on the Zeppelin server host, especially when the Spark interpreter runs in isolated mode. I'm trying to switch to yarn-cluster mode, which would let YARN decide where the Spark driver should run depending on the resources available in the cluster. Zeppelin has supported this mode since version 0.8.0, but I'm facing the following issue: https://issues.apache.org/jira/browse/ZEPPELIN-3633. Basically, the node where YARN decided to run the Spark driver doesn't have Zeppelin installed, so it is unable to start. There is a fix on Zeppelin's GitHub, https://github.com/apache/zeppelin/pull/3181, but I can't find the files that I need to change. Is there any chance this can be fixed easily, or should I just install Zeppelin on every node?
Labels: Apache Spark, Apache YARN, Apache Zeppelin
12-04-2018
02:03 PM
1 Kudo
I had the exact same problem and solved it this way: first get a Kerberos ticket on the machine that hosts HiveServer, then start Hive through Beeline:
sudo kinit -kt /etc/security/keytabs/hive.service.keytab hive/host1.xxx.local@EXAMPLE.COM
sudo beeline -u "jdbc:hive2://host1.xxx.local:2181,host2.xxx.local:2181,host3.xxx.local:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
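The Beeline URL above lists every ZooKeeper node and lets service discovery pick the HiveServer2 instance. A small sketch of how such a URL can be assembled from a host list; the function name and its defaults are my own and only mirror the values used in the command above:

```python
def hive_zk_jdbc_url(zk_hosts, port=2181, namespace="hiveserver2"):
    """Build a HiveServer2 JDBC URL that uses ZooKeeper service discovery."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    # Each ZooKeeper node is listed as host:port, separated by commas.
    quorum = ",".join(f"{host}:{port}" for host in zk_hosts)
    return (
        f"jdbc:hive2://{quorum}/;"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


if __name__ == "__main__":
    hosts = ["host1.xxx.local", "host2.xxx.local", "host3.xxx.local"]
    print(hive_zk_jdbc_url(hosts))
```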
11-30-2018
03:58 PM
1 Kudo
For quite some time I've been facing an issue with Zeppelin, which seems to be unable to launch IPython. I followed this guide and this one. The PySpark interpreter is correctly set with the right Python path, and IPython is activated by default. However, when I try to run any of the examples in the guides, such as:
%ipyspark
import pandas as pd
df = pd.DataFrame({'name':['a','b','c'], 'count':[12,24,18]})
z.show(df)
I get the following error in the logs, which doesn't tell much:
INFO ({pool-3-thread-2} IPythonInterpreter.java[setAdditionalPythonPath]:103) - setAdditionalPythonPath: /usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:135) - Python Exec: python3
INFO ({pool-3-thread-2} IPythonInterpreter.java[checkIPythonPrerequisite]:195) - IPython prerequisite is meet
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:146) - Launching IPython Kernel at port: 39753
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:147) - Launching JVM Gateway at port: 36511
INFO ({pool-3-thread-2} IPythonInterpreter.java[setupIPythonEnv]:315) - PYTHONPATH:/usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python:/usr/hdp/current/spark2-client//python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client//python/:/usr/hdp/current/spark2-client//python:/usr/hdp/current/spark2-client//python/lib/py4j-0.8.2.1-src.zip
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
WARN ({Exec Default Executor} IPythonInterpreter.java[onProcessFailed]:394) - Exception happens in Python Process
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
at java.lang.Thread.run(Thread.java:745)
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
WARN ({pool-3-thread-2} PySparkInterpreter.java[open]:134) - Fail to open IPySparkInterpreter
java.lang.RuntimeException: Fail to open IPythonInterpreter
at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:157)
at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66)
at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:129)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Fail to launch IPython Kernel in 30 seconds
at org.apache.zeppelin.python.IPythonInterpreter.launchIPythonKernel(IPythonInterpreter.java:297)
at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:154)
at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
INFO ({pool-3-thread-2} PySparkInterpreter.java[open]:140) - IPython is not available, use the native PySparkInterpreter
I'm using HDP 3.0.1, which comes with Zeppelin 0.8.0. All the nodes have Python 3.7.1 installed with the latest versions of jupyter and grpcio. From a Zeppelin notebook I checked the IPython and Python versions:
%pyspark
import sys
import IPython
print(IPython.__version__)
print(sys.version)
which prints:
7.2.0
3.7.1 (default, Nov 29 2018, 17:37:37)
I can start IPython from any node without a problem, and Zeppelin correctly reports IPython's version. I looked for logs other than Zeppelin's that report the error but couldn't find anything. Any idea what could be preventing Zeppelin from launching the IPython kernel?
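One way to narrow this down is to check, with the same interpreter the log reports (`python3`), that the packages the IPython kernel relies on can actually be found. A minimal sketch; the module list is an assumption derived from the packages mentioned in this post (jupyter, grpcio), not an authoritative list of Zeppelin's requirements, and `missing_modules` is a hypothetical helper:

```python
import importlib.util
import sys

# Assumed import names: grpcio installs the top-level module `grpc`,
# and jupyter pulls in `ipykernel` and `jupyter_client`.
ASSUMED_MODULES = ["IPython", "ipykernel", "jupyter_client", "grpc"]


def missing_modules(names):
    """Return the top-level module names this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(ASSUMED_MODULES))
```

Running it as the user that owns the Zeppelin process can also reveal a different interpreter path or environment than the one used in an interactive shell.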
Labels: Apache Zeppelin
10-22-2018
10:08 AM
1 Kudo
Found it. In Ambari > Hive > Config > Database I had:
Database URL: jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
You need to replace localhost with the host you are trying to connect to, in my case:
Database URL: jdbc:mysql://hadoopslave01.*****.*****/metastore?createDatabaseIfNotExist=true
You will probably need to change the bind-address in your MySQL configuration file as well. On Ubuntu, open /etc/mysql/mysql.conf.d/mysqld.cnf and change bind-address = 127.0.0.1 to bind-address = 0.0.0.0. Then restart MySQL (on Ubuntu: service mysql restart) and test your connection to the metastore in Ambari > Hive > Config > Database.
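The URL change amounts to swapping the host part while keeping the database name and query string. A sketch of that substitution, assuming a simple `jdbc:<driver>://host[:port]/db?options` URL; `replace_jdbc_host` and the hostname in the usage line are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def replace_jdbc_host(jdbc_url, new_host):
    """Return jdbc_url with its host replaced, keeping port, path and query."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with 'jdbc:'")
    # Strip the 'jdbc:' prefix so the rest parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return prefix + urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    old = "jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true"
    print(replace_jdbc_host(old, "metastore-db.example.internal"))
```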
10-22-2018
08:46 AM
I found this document explaining how to connect Spark to the Hive metastore. I added the following lines in Ambari in custom spark2-default.conf as well as in /usr/hdp/2.6.5.0-292/spark2/conf/spark-default.conf and /usr/hdp/3.0.1.0-187/spark2/conf/spark-default.conf:
spark.sql.hive.hiveserver2.jdbc.url: jdbc:hive2://hadoopslave01.*****.*****:2181,hadoopmaster.*****.*****:2181,hadoopslave02.*****.*****:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
spark.datasource.hive.warehouse.metastoreUri: thrift://hadoopslave01.*****.*****:9083
spark.datasource.hive.warehouse.load.staging.dir: /tmp/hive
spark.hadoop.hive.llap.daemon.service.hosts: @llap0
spark.hadoop.hive.zookeeper.quorum: hadoopslave01.*****.*****:2181,hadoopmaster.*****.*****:2181,hadoopslave02.*****.*****:2181
Still the same error. The upgrade tries to find the metastore on hadoopmaster instead of hadoopslave01.
10-19-2018
02:42 PM
I'm upgrading Ambari and HDP from 2.6.5 to 3.0.1 and am currently struggling with the Spark2 history server, as the upgrade script points to the wrong node for the Hive metastore.
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/job_history_server.py", line 102, in <module>
JobHistoryServer().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
method(env)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 993, in restart
self.start(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/job_history_server.py", line 55, in start
spark_service('jobhistoryserver', upgrade_type=upgrade_type, action='start')
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/spark_service.py", line 106, in spark_service
user = params.hive_user)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/hdp/current/hive-client/bin/schematool -dbType mysql -createCatalog spark -catalogDescription 'Default catalog, for Spark' -ifNotExists -catalogLocation hdfs://hadoopmaster.****.****:8020/apps/spark/warehouse' returned 1. /usr/hdp/3.0.1.0-187/hive/conf/hive-env.sh: line 48: [: !=: unary operator expected
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Create catalog spark at location hdfs://hadoopmaster.****.****:8020/apps/spark/warehouse
Metastore connection URL: jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Fri Oct 19 14:17:44 CEST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : Access denied for user 'hive'@'localhost' (using password: YES)
SQL Error code: 1045
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
The Hive metastore is on the node hadoopslave01, and I have no problem connecting to it as the user 'hive'@'localhost'. It seems that Spark has the wrong address and goes to the node hadoopmaster. To test this hypothesis, I created a MySQL user 'hive'@'localhost' on hadoopmaster and tried to resume the upgrade, which this time created a new (empty) Hive metastore on hadoopmaster (and proceeded to crash because of missing tables). In both /usr/hdp/2.6.5.0-292/spark2/conf and /usr/hdp/3.0.1.0-187/spark2/conf/, hive-site.xml reads:
<name>hive.metastore.uris</name>
<value>thrift://hadoopslave01.*****.*****:9083</value>
I'd say that the problem comes from the following line in the script, but I can't find where to change it:
Metastore connection URL: jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
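As far as I can tell, the "Metastore connection URL" that schematool prints comes from the `javax.jdo.option.ConnectionURL` property of the Hive configuration, which is separate from `hive.metastore.uris`. A sketch for reading both out of a Hadoop-style configuration file; the sample XML and its hostname are made up, and `read_properties` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text, wanted):
    """Return {name: value} for the requested <property> entries."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in wanted:
            found[name] = prop.findtext("value")
    return found


# Made-up sample in the usual <configuration>/<property> layout.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host.example.internal:9083</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    keys = {"hive.metastore.uris", "javax.jdo.option.ConnectionURL"}
    for key, value in read_properties(SAMPLE, keys).items():
        print(key, "=", value)
```

Pointing it at the contents of the Hive client's hive-site.xml rather than the Spark copy would show which JDBC URL the schema tool actually picks up.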
Labels: Apache Spark
10-10-2018
03:03 PM
Thank you for your answer. I just tried, and indeed I can invoke Beeline as the hive user. But what about the scripts that point 7 mentions? Should I run the following in Beeline?
$JAVA_HOME/bin/java -Djavax.security.auth.useSubjectCredsOnly=false -cp /usr/hdp/current/hive-server2-hive2/lib/derby-10.10.2.0.jar:/usr/hdp/current/hive-server2-hive2/lib/*:/usr/hdp/current/hadoop/*:/usr/hdp/current/hadoop/lib/*:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*:/usr/hdp/current/hadoop-hdfs/*:/usr/hdp/current/hadoop-hdfs/lib/*:/usr/hdp/current/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-3.1.0.3.0.0.0-1634.jar:/usr/hdp/current/hive-client/conf/conf.server:/usr/hdp/current/hive-metastore/lib/hive-metastore.jar:/usr/hdp/current/hive-metastore/lib/libthrift-0.9.3.jar:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hive-client/lib/hive-common.jar:/usr/hdp/current/hive-client/lib/commons-cli-1.2.jar:/usr/hdp/current/hadoop-client/lib/* org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool -execute