Member since 09-17-2015

70 Posts · 79 Kudos Received · 20 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 2897 | 02-27-2018 08:03 AM |
|  | 2556 | 02-27-2018 08:00 AM |
|  | 3472 | 10-09-2016 07:59 PM |
|  | 1306 | 10-03-2016 07:27 AM |
|  | 1291 | 06-17-2016 03:30 PM |
			
    
	
		
		
Posted 04-29-2016 08:39 AM
Hello Pedro, Spark Core is a general-purpose in-memory analytics engine. Adding components like SparkSQL or SparkML on top of Spark Core lets you do many interesting analytics or data-science modelling tasks, in a programmatic or SQL fashion. These tutorials can help with your first steps:

http://hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/
http://hortonworks.com/blog/data-science-hadoop-spark-scala-part-2/
			
    
	
		
		
Posted 04-14-2016 08:03 AM
Hello Nelson, I don't think you need the Hive configuration explicitly set anymore, that is, this part:

"-Djavax.jdo.option.ConnectionURL=jdbc:mysql://testip/hive?createDatabaseIfNotExist=true -Dhive.metastore.uris=thrift://testip:9083"
			
    
	
		
		
Posted 04-13-2016 09:10 AM · 1 Kudo
Hello Nelson, Instead of putting the Hive info in separate properties, could you try adding the hive-site.xml (--files=/etc/hive/conf/hive-site.xml), just to make sure everything is consistent? Without it, Spark can launch an embedded metastore, which causes the out-of-memory condition. Could you also share a bit more about the app: what type of data (ORC, CSV, etc.) and the size of the table? Let's see if this helps.
			
    
	
		
		
Posted 04-13-2016 08:18 AM · 3 Kudos
Hello Sumit, Increasing the ZooKeeper session timeout is often a quick first fix for GC pauses "killing" region servers in HBase. In the longer run, if you have GC pauses it is because your process is struggling to find memory.

There can be architectural approaches to this problem: for example, does this happen during heavy write loads? In that case, consider doing bulk loads when possible.

You can also look at your HBase configuration: what is the overall memory allocated to HBase, and how is it distributed between writes and reads? Do you flush your memstore often, and does this lead to many compactions?

Lastly, you can look at GC tuning. I won't dive into that here, but Lars has written a nice introductory blog post on it: http://hadoop-hbase.blogspot.ie/2014/03/hbase-gc-tuning-observations.html

Hope some of this helps.
			
    
	
		
		
Posted 04-12-2016 01:31 PM · 4 Kudos
Hello Sunile, Zeppelin in 2.4 has a bug that has since been fixed. If you issue any query containing "()", the new prefix parser gets lost. In your log, if your query is "%hive select count(*) from table", you will see the query being sent looks like "elect count(*) from table". This is because the parser looks for a "(prefix_name)" and here mistakes "(*)" for a prefix.

The workaround is to use %hive(default), or wait for the next release.
			
    
	
		
		
Posted 04-11-2016 11:12 AM · 3 Kudos
Hello Kiran, Hive on Spark is not yet a GA feature; it is still very much in the development phase. You can, however, use SparkSQL with a Hive context to issue queries against Hive tables.
			
    
	
		
		
Posted 04-06-2016 07:05 AM · 4 Kudos
Hello Arunkumar, As a general rule it comes back to what you are trying to achieve and how you want to serve the data. Remember that HBase's performance is directly derived from the rowkey, and hence from how you access data. HBase splits data into regions served by region servers, and at a lower level data is split by column family; a single entry, however, is always served by the same region.

At a high level, the difference between tall-narrow and flat-wide comes back to scans vs. gets. HBase stores rows ordered by rowkey, and full scans are costly. A tall-narrow approach uses a more composite rowkey, placing similar elements adjacent to each other and allowing focused scans over a logical group of entries. A flat-wide approach puts much more information in the entry itself: you "get" the entry through its rowkey, and the entry carries enough information to do your compute or answer your query.

Hope this helps.
			
    
	
		
		
Posted 04-04-2016 12:32 PM
As you can see, when you increased the user-limit-factor it allocated more containers and you got 200% of the queue's capacity; if you were to set it to 2.5, you would get the full queue. For the second part: if you want the ituser queue to release the extra containers to service the price queue, you can either wait for it to happen naturally as the job rolls out or, better, enable YARN's preemption mechanism: http://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/
			
    
	
		
		
Posted 04-04-2016 09:10 AM
Hello Alena, On top of the queue distribution and elasticity, there are other settings that can be configured to help share the resources. For example, you have .root.it.user-limit-factor=1, which means a single user cannot use more than 100% of the queue's allocated capacity; this can limit or even negate the elasticity given to a queue. Try setting it to 2 and then 3 to see the result.

Regards
			
    
	
		
		
Posted 03-29-2016 08:58 AM · 6 Kudos
Hello Santosh, When you create a Phoenix table through the DSL, as you are doing, Phoenix handles all the magic before pushing the data down to HBase as a store. In this scenario you get a composite rowkey in the order you declared it, Market_Key-Product_Key-Period_Key, so the order in which the columns appear in your statement is important, as it becomes the order of your composite rowkey. Furthermore, to separate the values, Phoenix uses a 0 byte between the elements of the key, or size-encoding information where applicable. So, for example, if you have a (varchar, unsigned_long, varchar) primary key, the rowkey for values like 'X', 1, 'Y' will be: ('X', 0x00) (0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01) ('Y').

Lastly, since the primary-key elements become the rowkey, they will not be HBase columns of your table; keep that in mind while designing it, if that matters to you.
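The byte layout from that example can be reproduced in a few lines; this mimics only the layout described above, not Phoenix's full type codecs:

```python
import struct

def encode_rowkey(market_key: str, product_key: int, period_key: str) -> bytes:
    """Composite rowkey for a (varchar, unsigned_long, varchar) primary key."""
    return (
        market_key.encode("utf-8") + b"\x00"  # varchar: bytes + 0x00 separator
        + struct.pack(">Q", product_key)      # unsigned long: fixed 8 bytes, big-endian
        + period_key.encode("utf-8")          # trailing varchar needs no separator
    )

print(encode_rowkey("X", 1, "Y"))
# -> b'X\x00\x00\x00\x00\x00\x00\x00\x00\x01Y'
#    ('X', 0x00) (0x00 * 7, 0x01) ('Y')
```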