Member since 08-31-2018

- 15 Posts
- 4 Kudos Received
- 1 Solution

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 2045 | 10-22-2018 10:08 AM |

01-22-2019 04:22 PM (1 Kudo)

Any idea on this particular issue? A Jupyter server now runs smoothly on this cluster, but Zeppelin still refuses to cooperate.

01-22-2019 12:23 PM

Thank you for your fast answer! It does work after tweaking Zeppelin's Spark interpreter parameters, changing `master: yarn-cluster` to `master: yarn` plus `spark.submit.deployMode: cluster`.
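
The fix above replaces the legacy `yarn-cluster` master value, which Spark 2.x deprecated, with `master: yarn` plus an explicit `spark.submit.deployMode`. A minimal Python sketch of that mapping, assuming the interpreter settings are held in a plain dict (`normalize_master` is a hypothetical helper, not a Zeppelin or Spark API):

```python
# Sketch: map the legacy YARN master values that Spark 2.x deprecated
# ("yarn-client", "yarn-cluster") onto master=yarn plus an explicit
# spark.submit.deployMode. normalize_master is a hypothetical helper,
# not a Zeppelin or Spark API.

LEGACY_YARN_MASTERS = {
    "yarn-client": "client",
    "yarn-cluster": "cluster",
}


def normalize_master(props):
    """Return a copy of interpreter properties with a legacy YARN master rewritten."""
    result = dict(props)
    deploy_mode = LEGACY_YARN_MASTERS.get(result.get("master", ""))
    if deploy_mode is not None:
        result["master"] = "yarn"
        # The legacy value encodes the deploy mode, so it wins over any other setting.
        result["spark.submit.deployMode"] = deploy_mode
    return result


if __name__ == "__main__":
    fixed = normalize_master({"master": "yarn-cluster"})
    for key in sorted(fixed):
        print(f"{key}: {fixed[key]}")
```

Running it prints `master: yarn` and `spark.submit.deployMode: cluster`, the same pair of settings described in the post.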

01-22-2019 10:24 AM

					
By default with an Ambari installation, Zeppelin's Spark interpreter uses yarn-client mode, which means the driver runs on the same host as the Zeppelin Server. This puts high memory pressure on the Zeppelin Server host, especially when the Spark interpreter runs in isolated mode.

I'm trying to switch to yarn-cluster mode, which lets YARN decide where the Spark driver should run depending on the available resources in the cluster. Zeppelin has supported this mode since version 0.8.0, but I'm facing the following issue: https://issues.apache.org/jira/browse/ZEPPELIN-3633. Basically, the node where YARN decides to run the Spark driver doesn't have Zeppelin installed, so the driver is unable to start.

There is a fix on Zeppelin's GitHub (https://github.com/apache/zeppelin/pull/3181), but I can't find the files I need to change. Can this be fixed easily, or should I just install Zeppelin on every node?

Labels:
- Apache Spark
- Apache YARN
- Apache Zeppelin

12-04-2018 02:03 PM (1 Kudo)

I had the exact same problem and solved it this way: first get a Kerberos ticket on the machine running HiveServer2, then start Beeline:

```
sudo kinit -kt /etc/security/keytabs/hive.service.keytab hive/host1.xxx.local@EXAMPLE.COM
sudo beeline -u "jdbc:hive2://host1.xxx.local:2181,host2.xxx.local:2181,host3.xxx.local:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
```
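
The Beeline URL above lists every ZooKeeper node and lets the JDBC driver look up a registered HiveServer2 instance under the `hiveserver2` ZooKeeper namespace. A small sketch that assembles such a URL from a host list; `build_hs2_zk_url` is a hypothetical helper, and on a Kerberized cluster the `kinit` step is still needed separately:

```python
# Sketch: build a HiveServer2 JDBC URL that uses ZooKeeper service discovery,
# in the same shape as the beeline -u argument in the post.
# build_hs2_zk_url is a hypothetical helper; host names are placeholders.


def build_hs2_zk_url(zk_hosts, port=2181, namespace="hiveserver2"):
    """Join ZooKeeper host:port pairs and append the discovery parameters."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(f"{host}:{port}" for host in zk_hosts)
    return (
        f"jdbc:hive2://{quorum}/"
        f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


if __name__ == "__main__":
    hosts = ["host1.xxx.local", "host2.xxx.local", "host3.xxx.local"]
    print(build_hs2_zk_url(hosts))
```

With the three hosts from the post, it prints exactly the URL passed to `beeline -u` above.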

11-30-2018 03:58 PM (1 Kudo)

For quite some time I've been facing an issue with Zeppelin, which seems unable to launch IPython. I followed this guide and this one. The PySpark interpreter is correctly set with the right Python path, and IPython is activated by default. However, when I try to run any of the examples in the guides, such as:

```
%ipyspark
import pandas as pd
df = pd.DataFrame({'name':['a','b','c'], 'count':[12,24,18]})
z.show(df)
```

I get the following error in the logs, which doesn't tell much:

```
INFO ({pool-3-thread-2} IPythonInterpreter.java[setAdditionalPythonPath]:103) - setAdditionalPythonPath: /usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:135) - Python Exec: python3
INFO ({pool-3-thread-2} IPythonInterpreter.java[checkIPythonPrerequisite]:195) - IPython prerequisite is meet
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:146) - Launching IPython Kernel at port: 39753
INFO ({pool-3-thread-2} IPythonInterpreter.java[open]:147) - Launching JVM Gateway at port: 36511
INFO ({pool-3-thread-2} IPythonInterpreter.java[setupIPythonEnv]:315) - PYTHONPATH:/usr/hdp/current/spark2-client/python/lib/pyspark.zip:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/zeppelin-server/interpreter/lib/python:/usr/hdp/current/spark2-client//python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client//python/:/usr/hdp/current/spark2-client//python:/usr/hdp/current/spark2-client//python/lib/py4j-0.8.2.1-src.zip
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
WARN ({Exec Default Executor} IPythonInterpreter.java[onProcessFailed]:394) - Exception happens in Python Process
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
	at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
	at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
	at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
	at java.lang.Thread.run(Thread.java:745)
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
INFO ({pool-3-thread-2} IPythonInterpreter.java[launchIPythonKernel]:293) - Wait for IPython Kernel to be started
WARN ({pool-3-thread-2} PySparkInterpreter.java[open]:134) - Fail to open IPySparkInterpreter
java.lang.RuntimeException: Fail to open IPythonInterpreter
	at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:157)
	at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66)
	at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:129)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Fail to launch IPython Kernel in 30 seconds
	at org.apache.zeppelin.python.IPythonInterpreter.launchIPythonKernel(IPythonInterpreter.java:297)
	at org.apache.zeppelin.python.IPythonInterpreter.open(IPythonInterpreter.java:154)
	at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:66)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
INFO ({pool-3-thread-2} PySparkInterpreter.java[open]:140) - IPython is not available, use the native PySparkInterpreter
```

I'm using HDP 3.0.1, which comes with Zeppelin 0.8.0. All the nodes have Python 3.7.1 installed with the latest versions of jupyter and grpcio. From a Zeppelin notebook I checked the IPython and Python versions:

```
%pyspark
import sys
import IPython
print(IPython.__version__)
print(sys.version)
```

```
7.2.0
3.7.1 (default, Nov 29 2018, 17:37:37)
```

I can start IPython from any node without a problem, and Zeppelin correctly gets IPython's version. I looked for logs other than Zeppelin's that might report the error but couldn't find any.

Any idea what could be preventing Zeppelin from launching the IPython kernel?
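
The log shows Zeppelin launching the kernel with a bare `python3` (`Python Exec: python3`), which is resolved through the PATH of the interpreter process and may not be the same Python that the `%pyspark` check used. A minimal diagnostic sketch, meant to be run with that `python3` and as the OS user that runs the Zeppelin interpreter; the module list (IPython, ipykernel, jupyter_client, grpc) is an assumption based on the jupyter and grpcio prerequisites mentioned in the post, not Zeppelin's actual prerequisite check:

```python
# Sketch: report which interpreter is running and which kernel-related
# packages it can import. KERNEL_MODULES is an assumed list derived from the
# jupyter/grpcio prerequisites; it is not Zeppelin's own check.
import importlib.util
import shutil
import sys

KERNEL_MODULES = ["IPython", "ipykernel", "jupyter_client", "grpc"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("python3 on PATH:", shutil.which("python3"))
    missing = missing_modules(KERNEL_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

If `interpreter` and `python3 on PATH` differ, or `missing` is not `none`, that would be a lead worth following before digging further into Zeppelin itself.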

Labels:
- Apache Zeppelin

10-22-2018 10:08 AM (1 Kudo)

Found it. In Ambari > Hive > Config > Database I had:

Database URL: `jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true`

You need to replace localhost with the host you are trying to connect to, in my case:

Database URL: `jdbc:mysql://hadoopslave01.*****.*****/metastore?createDatabaseIfNotExist=true`

You will probably need to change the bind-address in your MySQL configuration file as well. On Ubuntu, edit /etc/mysql/mysql.conf.d/mysqld.cnf and change `bind-address = 127.0.0.1` to `bind-address = 0.0.0.0`. Then restart MySQL (on Ubuntu, `service mysql restart`) and, in Ambari > Hive > Config > Database, test your connection to the metastore.
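
Changing the Database URL amounts to swapping the host part of the JDBC URL while keeping the database name and query string. A small Python sketch of that substitution; `replace_jdbc_host` is a hypothetical helper, and `hadoopslave01.example.org` stands in for the masked host name:

```python
# Sketch: replace the host in a JDBC URL such as
# jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
# while keeping the port, database path and query string.
# Assumes the authority part carries no user:password credentials.
from urllib.parse import urlsplit, urlunsplit

JDBC_PREFIX = "jdbc:"


def replace_jdbc_host(jdbc_url, new_host):
    if not jdbc_url.startswith(JDBC_PREFIX):
        raise ValueError("expected a URL starting with 'jdbc:'")
    parts = urlsplit(jdbc_url[len(JDBC_PREFIX):])
    port = parts.port  # None when the URL has no explicit port
    netloc = new_host if port is None else f"{new_host}:{port}"
    return JDBC_PREFIX + urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    old = "jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true"
    print(replace_jdbc_host(old, "hadoopslave01.example.org"))
```

The printed value has the same shape as the corrected Database URL in the post.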

10-22-2018 08:46 AM

I found this document explaining how to connect Spark to the Hive metastore. I added the following lines in Ambari under custom spark2-default.conf, as well as in /usr/hdp/2.6.5.0-292/spark2/conf/spark-default.conf and /usr/hdp/3.0.1.0-187/spark2/conf/spark-default.conf:

```
spark.sql.hive.hiveserver2.jdbc.url: jdbc:hive2://hadoopslave01.*****.*****:2181,hadoopmaster.*****.*****:2181,hadoopslave02.*****.*****:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
spark.datasource.hive.warehouse.metastoreUri: thrift://hadoopslave01.*****.*****:9083
spark.datasource.hive.warehouse.load.staging.dir: /tmp/hive
spark.hadoop.hive.llap.daemon.service.hosts: @llap0
spark.hadoop.hive.zookeeper.quorum: hadoopslave01.*****.*****:2181,hadoopmaster.*****.*****:2181,hadoopslave02.*****.*****:2181
```

Still the same error: the upgrade tries to find the metastore on hadoopmaster instead of hadoopslave01.
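
To confirm which metastore host the edited file really points at, the lines above can be parsed and the host pulled out of `spark.datasource.hive.warehouse.metastoreUri`. A simplified sketch; `parse_properties` and `metastore_host` are hypothetical helpers that accept `key: value`, `key=value`, and `key value` lines but ignore escapes and line continuations, and `hadoopslave01.example.org` stands in for the masked host:

```python
# Sketch: parse spark-defaults style lines and report the metastore host.
# parse_properties is a simplified, hypothetical parser: it skips blank and
# comment lines and does not handle escapes or line continuations.
import re
import sys
from urllib.parse import urlsplit

LINE_RE = re.compile(r"^\s*([^\s:=]+)\s*[:=\s]\s*(.*?)\s*$")


def parse_properties(text):
    props = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue
        match = LINE_RE.match(line)
        if match:
            props[match.group(1)] = match.group(2)
    return props


def metastore_host(props, key="spark.datasource.hive.warehouse.metastoreUri"):
    """Return the host of the thrift:// URI under `key`, or None if unset."""
    uri = props.get(key)
    return urlsplit(uri).hostname if uri else None


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_spark_defaults.py /path/to/spark-defaults-file")
    else:
        with open(sys.argv[1], encoding="utf-8") as handle:
            print("metastore host:", metastore_host(parse_properties(handle.read())))
```

If this reports hadoopslave01, the Spark-side settings are not what sends the upgrade to hadoopmaster, and the metastore database URL is the next place to look.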

10-19-2018 02:42 PM

I'm upgrading Ambari and HDP from 2.6.5 to 3.0.1 and am currently struggling with the Spark2 History Server, because the upgrade script points to the wrong node for the Hive metastore:

```
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/job_history_server.py", line 102, in <module>
    JobHistoryServer().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
    method(env)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 993, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/job_history_server.py", line 55, in start
    spark_service('jobhistoryserver', upgrade_type=upgrade_type, action='start')
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/SPARK2/package/scripts/spark_service.py", line 106, in spark_service
    user = params.hive_user)
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
    returns=self.resource.returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/hdp/current/hive-client/bin/schematool -dbType mysql -createCatalog spark -catalogDescription 'Default catalog, for Spark' -ifNotExists -catalogLocation hdfs://hadoopmaster.****.****:8020/apps/spark/warehouse' returned 1. /usr/hdp/3.0.1.0-187/hive/conf/hive-env.sh: line 48: [: !=: unary operator expected
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Create catalog spark at location hdfs://hadoopmaster.****.****:8020/apps/spark/warehouse
Metastore connection URL:	 jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 hive
Fri Oct 19 14:17:44 CEST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : Access denied for user 'hive'@'localhost' (using password: YES)
SQL Error code: 1045
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
```

The Hive metastore is on the node hadoopslave01, and I have no problem connecting to it with the user 'hive'@'localhost'. It seems that Spark has the wrong address and goes to the node hadoopmaster. To test this hypothesis, I created a MySQL user 'hive'@'localhost' on hadoopmaster and tried to resume the upgrade, which this time created a new (empty) Hive metastore on hadoopmaster (and then crashed because of missing tables).

In both /usr/hdp/2.6.5.0-292/spark2/conf and /usr/hdp/3.0.1.0-187/spark2/conf/, hive-site.xml reads:

```xml
<name>hive.metastore.uris</name>
<value>thrift://hadoopslave01.*****.*****:9083</value>
```

I'd say the problem comes from the following line in the script output, but I can't find where to change it:

```
Metastore connection URL:	 jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true
```
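
The `Metastore connection URL` printed by schematool is the metastore database URL, which Hive takes from the `javax.jdo.option.ConnectionURL` property (shown in Ambari as "Database URL" under Hive > Config > Database) rather than from `hive.metastore.uris`. So the Thrift URI above can be correct while the database URL still says `localhost`, which presumably resolves to whichever node runs schematool. A minimal sketch for printing both values from a hive-site.xml, assuming you pass the file the Hive client actually loads (which file that is on this cluster is not established here; `check_hive_site.py` is a hypothetical script name):

```python
# Sketch: print the two hive-site.xml settings that are easy to confuse.
# javax.jdo.option.ConnectionURL is the JDBC URL of the metastore database
# (what schematool reports as "Metastore connection URL"), while
# hive.metastore.uris is the Thrift address of the metastore service.
import sys
import xml.etree.ElementTree as ET

KEYS = ("javax.jdo.option.ConnectionURL", "hive.metastore.uris")


def hive_properties(root):
    """Return {name: value} for every <property> element under `root`."""
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_hive_site.py /path/to/hive-site.xml")
    else:
        props = hive_properties(ET.parse(sys.argv[1]).getroot())
        for key in KEYS:
            print(f"{key} = {props.get(key, '<not set>')}")
```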

Labels:
- Apache Spark

10-10-2018 03:03 PM

Thank you for your answer. I just tried, and indeed I can invoke Beeline as the hive user. But what about the scripts that point 7 mentions? Should I run the following in Beeline?

```
$JAVA_HOME/bin/java -Djavax.security.auth.useSubjectCredsOnly=false -cp /usr/hdp/current/hive-server2-hive2/lib/derby-10.10.2.0.jar:/usr/hdp/current/hive-server2-hive2/lib/*:/usr/hdp/current/hadoop/*:/usr/hdp/current/hadoop/lib/*:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/current/hadoop-mapreduce-client/lib/*:/usr/hdp/current/hadoop-hdfs/*:/usr/hdp/current/hadoop-hdfs/lib/*:/usr/hdp/current/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-3.1.0.3.0.0.0-1634.jar:/usr/hdp/current/hive-client/conf/conf.server:/usr/hdp/current/hive-metastore/lib/hive-metastore.jar:/usr/hdp/current/hive-metastore/lib/libthrift-0.9.3.jar:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hive-client/lib/hive-common.jar:/usr/hdp/current/hive-client/lib/commons-cli-1.2.jar:/usr/hdp/current/hadoop-client/lib/* org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool -execute
```
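
The command is a plain `java` invocation, so it would normally be run from a shell rather than typed into Beeline. Building it as an argument list keeps `-Djavax.security.auth.useSubjectCredsOnly=false` and `-cp` as separate tokens, so a missing space between them cannot slip in. A hypothetical sketch (`build_preupgrade_cmd` is not part of HDP); `CLASSPATH` holds only a subset of the entries from the post, so the rest would need to be appended in the same way, and `shlex.join` requires Python 3.8 or later:

```python
# Sketch: assemble the PreUpgradeTool command as an argument list and print a
# shell-quoted version of it. build_preupgrade_cmd is hypothetical; CLASSPATH
# is an incomplete subset of the entries listed in the post.
import os
import shlex

CLASSPATH = [
    "/usr/hdp/current/hive-server2-hive2/lib/derby-10.10.2.0.jar",
    "/usr/hdp/current/hive-server2-hive2/lib/*",
    "/tmp/hive-pre-upgrade-3.1.0.3.0.0.0-1634.jar",
    "/usr/hdp/current/hive-client/conf/conf.server",
]


def build_preupgrade_cmd(java_home, classpath):
    """Return the java command line as a list of separate arguments."""
    return [
        f"{java_home}/bin/java",
        "-Djavax.security.auth.useSubjectCredsOnly=false",
        "-cp",
        ":".join(classpath),  # java expands the trailing /* entries itself
        "org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool",
        "-execute",
    ]


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        print(shlex.join(build_preupgrade_cmd(java_home, CLASSPATH)))
    else:
        print("JAVA_HOME is not set")
```

Passing the list to `subprocess.run` instead of printing it would run the tool without going through a shell at all.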