Member since 07-10-2017

68 Posts | 30 Kudos Received | 5 Solutions

        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| | 4988 | 02-20-2018 11:18 AM |
| | 4065 | 09-20-2017 02:59 PM |
| | 19245 | 09-19-2017 02:22 PM |
| | 4267 | 08-03-2017 10:34 AM |
| | 2984 | 07-28-2017 10:01 AM |

11-29-2018 01:57 PM

By "Ambari platform" you mean what, exactly? You can use either Zeppelin or Superset, which you were already using. Zeppelin has a lot of interpreters and can connect to Hive/Spark/MySQL: https://zeppelin.apache.org/supported_interpreters.html

Visualization is easier in Superset: you can create a Hive table from that CSV, or load it into MySQL, then add that database in Superset and add the existing tables. https://superset.incubator.apache.org/tutorial.html#connecting-to-a-new-database
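For instance, a minimal sketch of putting the CSV behind a Hive table that Zeppelin or Superset can then query (the table name, columns, and HDFS directory are placeholders):

```bash
# Assumes the CSV file(s) have already been uploaded to an HDFS directory such as /data/report_csv;
# the table name and columns below are illustrative only.
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE EXTERNAL TABLE report_csv (id INT, metric STRING, value DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/report_csv';"
```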
						
					
11-29-2018 06:50 AM

@Ftoon Kedwan I think you've got the concept wrong. In Superset you add datasources/databases and tables on the assumption that they already exist in your environment; it doesn't create them for you.

For example, you'll have a MySQL database somewhere and you provide its SQLAlchemy URL to add it. You can then go to "add tables" and add tables that already exist in that database.

While adding a database/datasource there is no check that the physical entity is present (unless you do an explicit test connection while adding), so you were able to add it, not create it.
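As an illustration (host, port, database name, and credentials are placeholders), the database has to exist and be reachable before you register its SQLAlchemy URL in Superset:

```bash
# Verify the MySQL database is actually up and reachable first:
mysql -h mysql-host -P 3306 -u superset_user -p -e "SHOW TABLES IN sales_db;"

# Then, under Sources > Databases in Superset, register it with an SQLAlchemy URI such as:
#   mysql://superset_user:<password>@mysql-host:3306/sales_db
```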
						
					
08-24-2018 10:20 AM
1 Kudo

You can set the -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps flags in YARN_NODEMANAGER_OPTS and then view the NodeManager GC logs in a GC visualizer such as gceasy.io.

This error occurs when almost all objects are still referenced/live, so successive GC cycles reclaim less than 2% of the heap while the JVM spends most of its time in GC: https://plumbr.io/outofmemoryerror/gc-overhead-limit-exceeded
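A minimal sketch of those flags, assuming you edit yarn-env.sh (the yarn-env template in Ambari); the GC log path is a placeholder:

```bash
# Append GC logging flags to the NodeManager JVM options, then feed the resulting
# GC log into a visualizer such as gceasy.io.
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-yarn/nodemanager-gc.log"
```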
						
					
02-20-2018 11:18 AM
1 Kudo

Hive views are a logical construct with no associated storage; only the view definition is stored in the metastore, and no data is materialized. I don't think you'll be able to see a directory in the HDFS Hive warehouse corresponding to the view.

See the links below for reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView
https://community.hortonworks.com/content/supportkb/48761/what-is-a-hive-view.html
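A quick way to check this yourself (the view and table names are placeholders; the warehouse path assumes a default HDP layout):

```bash
# Create a view over an existing table, then look for it in the warehouse directory.
beeline -u jdbc:hive2://localhost:10000 -e "CREATE VIEW sales_v AS SELECT * FROM sales WHERE amount > 0;"

# The base table has a directory here; the view does not, since only its definition lives in the metastore.
hdfs dfs -ls /apps/hive/warehouse/
```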
						
					
11-30-2017 09:12 AM

Check whether SPARK_HOME in the interpreter settings points to the correct pyspark. Is it set to the value below?

SPARK_HOME = /usr/hdp/current/spark2-client/

Where are you setting the Spark properties, in spark-env.sh or via Zeppelin? Check this thread: https://issues.apache.org/jira/browse/ZEPPELIN-295

Set spark.driver.memory=4G and spark.driver.cores=2. Also check spark.memory.fraction; if it's set to 0.75, reduce it to 0.6: https://issues.apache.org/jira/browse/SPARK-15796

Check the logs: run tail -f /var/log/zeppelin/zeppelin-interpreter-spark2-spark-zeppelin-{HOSTNAME}.log on the Zeppelin host.
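For reference, a sketch of where those settings would go (the paths assume an HDP spark2 client install; adjust to your environment):

```bash
# zeppelin-env.sh on the Zeppelin host -- alternatively set SPARK_HOME in the spark2 interpreter settings:
export SPARK_HOME=/usr/hdp/current/spark2-client/

# In the spark2 interpreter settings the properties discussed above would look like:
#   spark.driver.memory   4G
#   spark.driver.cores    2
#   spark.memory.fraction 0.6
```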
						
					
11-19-2017 12:08 PM

I also tried that once and it didn't seem to work for some reason. Please try the 'screen' utility: https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/

Use Ctrl+a c to create a new window, run your script there without nohup (that is, ./run_beeline_hql.sh), and detach from that session with Ctrl+a d. The process will keep running in the background, which you can check with ps.
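A sketch of that workflow (the session name is arbitrary):

```bash
screen -S beeline_run                # start a named screen session
./run_beeline_hql.sh                 # run the script inside it, no nohup needed
# press Ctrl+a then d to detach; the script keeps running in the background
ps -ef | grep run_beeline_hql.sh     # confirm it is still alive
screen -r beeline_run                # reattach to the session later
```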
						
					
11-19-2017 12:04 PM

Try including the option --driver com.mysql.jdbc.Driver in the import command.
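For example (the connection string, credentials, table, and target directory are placeholders):

```bash
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/sourcedb \
  --driver com.mysql.jdbc.Driver \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/sqoop/customers
```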
						
					
11-17-2017 12:33 PM
1 Kudo

Good to hear that, anobi. I could not find a way to restrict the number of sessions to a particular value. However, if you set spark.sql.hive.thriftServer.singleSession to true, only one session can run; this doesn't scale very well. Please run spark.conf.getAll(), you may find other properties related to the number of sessions.

Also, please accept/upvote any answers that helped you. Thank you.
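A minimal sketch of starting the Thrift Server with that property (the HDP path is an assumption):

```bash
# Restart the Spark Thrift Server in single-session mode; all JDBC clients then share one
# session, which limits concurrency rather than capping it at a chosen number.
/usr/hdp/current/spark2-client/sbin/start-thriftserver.sh \
  --conf spark.sql.hive.thriftServer.singleSession=true
```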
						
					
11-16-2017 01:14 PM
1 Kudo

@anobi Did you try setting spark.sql.thriftServer.incrementalCollect to true? I am not running multiple queries at a time, so maybe that's why I'm not seeing this. Try decreasing the number of simultaneous sessions after setting incrementalCollect to true.
						
					
11-16-2017 05:59 AM
2 Kudos

You may also be hitting a bug; check the links below against your Spark version.

https://issues.apache.org/jira/browse/SPARK-18857
https://forums.databricks.com/questions/344/how-does-the-jdbc-odbc-thrift-server-stream-query.html
https://stackoverflow.com/questions/35046692/spark-incremental-collect-to-a-partition-causes-outofmemory-in-heap

Regardless, please try spark.sql.thriftServer.incrementalCollect=true in the Thrift Server conf, or start the Thrift Server with it. It is false by default, it's an important thing to check, and it has a direct impact on driver heap usage (if that is in fact what you're running out of). Read the link below:
http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/
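A sketch of starting the Thrift Server with that flag (the HDP path and driver memory are assumptions):

```bash
# With incrementalCollect enabled, the driver streams results partition by partition
# instead of materializing the whole result set in its heap.
/usr/hdp/current/spark2-client/sbin/start-thriftserver.sh \
  --conf spark.sql.thriftServer.incrementalCollect=true \
  --driver-memory 4g
```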
						
					