Member since 01-09-2019

401 Posts · 163 Kudos Received · 80 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2595 | 06-21-2017 03:53 PM |
| | 4290 | 03-14-2017 01:24 PM |
| | 2388 | 01-25-2017 03:36 PM |
| | 3838 | 12-20-2016 06:19 PM |
| | 2101 | 12-14-2016 05:24 PM |

04-26-2016 06:02 PM

@Kevin Sievers Hi Kevin, your commands look good to me; somehow it does not take the number of reduce tasks, though. You are right, Hadoop should be MUCH faster, but the single reduce task, and even weirder, the single mapper, seem to be the problem. And I assure you it runs with a lot of mappers and 40 reducers, loading and transforming around 300 GB of data in 20 minutes on a 7-datanode cluster. So basically I have NO idea why it uses only one mapper, no idea why it has the second reducer AT ALL, and no idea why it ignores the mapred.reduce.tasks parameter. I think a support ticket might be in order.

set hive.tez.java.opts=-Xmx3600m;
set hive.tez.container.size=4096;
set mapred.reduce.tasks=120;
CREATE EXTERNAL TABLE STAGING ...
...
INSERT INTO TABLE TARGET PARTITION (day = 20150811) SELECT * FROM STAGING DISTRIBUTE BY DT;
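On Tez, Hive typically derives the reducer count from the input size rather than from mapred.reduce.tasks alone, so these knobs are worth checking too; a minimal sketch, assuming a Hive-on-Tez setup like the one above:

set hive.exec.reducers.bytes.per.reducer=67108864;  -- smaller value makes Hive plan more reducers
set hive.exec.reducers.max=120;                     -- upper bound on the planned reducer count
set hive.tez.auto.reducer.parallelism=true;         -- let Tez adjust reducer parallelism at runtime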

06-06-2017 04:26 PM

							 "--create-hcatalog-table " This tells hive to create table. 

12-05-2015 08:43 PM

As @bsaini explained, this property determines the number of handlers open at a given time on a DataNode. Two factors to look at before changing this property are: 1. the use cases, and 2. the HDP services being used. For example, if you are using HBase extensively, then increasing this property (to match the cores or spindles in the DataNode) may help you get better throughput, especially for bulk writes/reads. However, increasing it beyond a point will not help and may even affect performance negatively.
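Assuming the property under discussion is dfs.datanode.max.transfer.threads (the successor to dfs.datanode.max.xcievers), a minimal hdfs-site.xml sketch; the value is illustrative rather than a recommendation:

<!-- hdfs-site.xml: server threads for DataNode data transfer -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <!-- illustrative value; benchmark before and after raising it -->
  <value>8192</value>
</property>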

12-02-2015 07:37 PM · 1 Kudo

The HBase master reads the list of files of the regions of tables in a couple of cases:

(1) CatalogJanitor process. This runs every hbase.catalogjanitor.interval (5 mins by default). This is for garbage collecting regions that have been split or merged. The catalog janitor checks whether the daughter regions (after a split) still have references to the parent region. Once the references are compacted, the parent can be deleted. Notice that this process should only access recently split or merged regions.

(2) HFile/WAL cleaner. This runs every hbase.master.cleaner.interval (1 min by default). This is for garbage collecting data files (HFiles) and WAL files. Data files in HBase can be referenced by more than one region or table, and shared across snapshots and live tables, and there is also a minimum time (TTL) that the HFile/WAL will be kept around. That is why the master is responsible for doing reference counting and garbage collecting the data files. This is possibly the most expensive NN operation among the ones in this list.

(3) Region balancer. The balancer takes locality into account for balancing decisions. That is why the balancer will do file listings to find the locality of the blocks of the files in the regions. The locality of files is kept in a local cache for (hard-coded, unfortunately) 240 minutes.
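Both intervals mentioned above are ordinary hbase-site.xml properties; a sketch using the defaults quoted in the post (values are in milliseconds):

<property>
  <name>hbase.catalogjanitor.interval</name>
  <value>300000</value> <!-- 5 minutes -->
</property>
<property>
  <name>hbase.master.cleaner.interval</name>
  <value>60000</value> <!-- 1 minute -->
</property>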

11-18-2015 06:27 PM · 1 Kudo

							 ipc.server.tcpnodelay controls use of Nagle's algorithm on any server component that makes use of Hadoop's common RPC framework.  That means that full deployment of a change in this setting would require a restart of any component that uses that common RPC framework.  That's a broad set of components, including all HDFS, YARN and MapReduce daemons.  It probably also includes other components in the wider ecosystem. 
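For reference, the setting lives in core-site.xml; a minimal sketch (the default value varies by Hadoop version):

<property>
  <name>ipc.server.tcpnodelay</name>
  <value>true</value> <!-- true disables Nagle's algorithm on Hadoop RPC server sockets -->
</property>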

11-30-2015 07:56 PM

Thanks Steve. In our case, we are looking to set it at the RM level, not necessarily even at the app/AM level: if an AM fails for any reason, just don't retry the AM on the same host; pick something else. Depending on the error, it might be a good option to blacklist the node at the RM level so that no further AMs are sent there.
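Later Hadoop releases added RM-level AM blacklisting along these lines (YARN-2005); assuming a release that includes it, a yarn-site.xml sketch with assumed property names:

<property>
  <name>yarn.resourcemanager.am-scheduling.node-blacklisting-enabled</name>
  <value>true</value> <!-- assumed name; RM avoids placing new AMs on nodes where AMs failed -->
</property>
<property>
  <name>yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold</name>
  <value>0.2</value> <!-- assumed; stop blacklisting once this fraction of nodes is blacklisted -->
</property>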

12-14-2016 05:53 PM

How can I access the JIRA?

12-19-2017 10:43 AM

This method of using the yarn command does not cover the use case of running an HDInsight cluster on demand, where the cluster is created to run the pipeline and then deleted. One approach is to use https://github.com/shanyu/hadooplogparser. Is there any option to configure the YARN logger to produce text rather than the TFile binary format?
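For context, the yarn-command method referred to above fetches the aggregated logs while the cluster still exists; the application ID below is a placeholder:

yarn logs -applicationId application_1513671234567_0042 > app_logs.txt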

03-13-2017 01:54 AM

HDP service logs are available in Ambari Log Search. The back end is Solr, so you can pull all of the logs or only the relevant information, based on your requirements. Also, for service-level metrics, Ambari now surfaces these in Grafana.
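Since the back end is Solr, the indexed logs can also be queried directly; a sketch that assumes the default Ambari Infra Solr port (8886) and the hadoop_logs collection used by Log Search:

curl "http://infra-solr-host.example.com:8886/solr/hadoop_logs/select?q=level:ERROR&rows=10"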

11-02-2015 11:58 AM

							 Thanks @ravi@hortonworks.com 