Member since 09-25-2015
			
      
109 Posts
36 Kudos Received
8 Solutions
            
        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| | 3458 | 04-03-2018 09:08 PM |
| | 5364 | 03-14-2018 04:01 PM |
| | 12689 | 03-14-2018 03:22 PM |
| | 4237 | 10-30-2017 04:29 PM |
| | 2194 | 10-17-2017 04:49 PM |
			
    
	
		
		
09-24-2019 02:18 AM

Hi @hadoopguy Yes, there is an impact: you will have longer processing times, and the operations will be queued. You have to handle the timeouts in your jobs carefully.

Best,
@helmi_khalifa
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-30-2017 07:07 PM

Hi @deepak rathod Yes, you are using HDP-2.3.2.0. You need to upgrade to HDP-2.6.2.0. Here are the docs:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/index.html
https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.2.0/bk_ambari-upgrade/content/ambari_upgrade_guide.html
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-17-2017 04:54 PM

@Neha G In a kerberized cluster there are two types of keytabs/principals: headless and service principals.

Headless principals are not bound to a specific host or node, and have the form <name>@SRV.COM.

Service principals are bound to a specific service and host/node, and have the form <service>/<host>@SRV.COM.

So when you initialize with the hdfs headless keytab, it acts as DoAs, and the user takes hdfs permissions.
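The syntax distinction above can be sketched as a small shell helper; the function name and sample principals are illustrative, not from the original post:

```shell
# Classify a Kerberos principal by its syntax (illustrative helper):
# service principals look like <service>/<host>@REALM,
# headless principals look like <name>@REALM.
classify_principal() {
  case "$1" in
    */*@*) echo "service" ;;
    *@*)   echo "headless" ;;
    *)     echo "unknown" ;;
  esac
}

classify_principal "hdfs@SRV.COM"                    # headless
classify_principal "hive/node1.example.com@SRV.COM"  # service
```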
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
03-18-2017 11:33 PM
2 Kudos
For users running a Hive insert query with dynamic partitioning over many partitions (> 10) on a column, you may notice that your query generates too many small files per partition.

INSERT OVERWRITE TABLE dB.Test partition(column5)
select
   column1
  ,column2
  ,column3
  ,column4
  ,column5
from
  Test2;
For example, if your table has 2000 partitions and your query generates 1009 reducers (hive.exec.reducers.max), you might end up with roughly 2 million small files.

To understand how Tez determines the number of reducers, refer to: https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html

This could result in issues with:

1. HDFS NameNode performance. Refer to: https://community.hortonworks.com/articles/15104/small-files-in-hadoop.html

2. The File Merge operation failing due to java.lang.OutOfMemoryError: GC overhead limit exceeded:

» File Merge, java.lang.OutOfMemoryError: GC overhead limit exceeded 
  at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149) 
  at java.lang.StringCoding.decode(StringCoding.java:193) 
  at java.lang.String.<init>(String.java:414) 
  at com.google.protobuf.LiteralByteString.toString(LiteralByteString.java:148) 
  at com.google.protobuf.ByteString.toStringUtf8(ByteString.java:572) 
  at org.apache.hadoop.security.proto.SecurityProtos$TokenProto.getService(SecurityProtos.java:274) 
  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:848) 
  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:833) 
  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1285) 
  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1435) 
  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1546) 
  at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1555) 
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:621) 
  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) 
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
  at java.lang.reflect.Method.invoke(Method.java:497) 
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278) 
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194) 
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176) 
  at com.sun.proxy.$Proxy13.getListing(Unknown Source) 
  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2136) 
  at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:1100) 
  at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:1075) 
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:304) 
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265) 
  at org.apache.hadoop.hive.shims.Hadoop23Shims$1.listStatus(Hadoop23Shims.java:148) 
  at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217) 
  at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:75) 
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:309) 
  at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.processPaths(CombineHiveInputFormat.java:596) 
  at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:473) 
  at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571) 
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 
To avoid this issue, set the following property:

set hive.optimize.sort.dynamic.partition=true;

When enabled, the dynamic partition column is globally sorted. This way we can keep only one record writer open for each partition value in the reducer, thereby reducing the memory pressure on reducers.
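The earlier estimate of ~2 million files follows because, in the worst case, every reducer writes one file for each partition it touches; quick arithmetic with the numbers from the example:

```shell
# Worst case: every reducer writes one file for every partition value.
partitions=2000
reducers=1009
echo $(( partitions * reducers ))   # 2018000, roughly 2 million small files
```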
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
01-03-2017 07:34 PM

Great tip. For people new to the Tez lexicon:
AM = application master
DAG = directed acyclic graph
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-22-2016 08:53 PM
1 Kudo

Steps: Oozie server timezone

1. For Ambari users, log in and navigate to Custom oozie-site: Ambari > Oozie > Configs > Custom oozie-site.

2. Add the property "oozie.processing.timezone=GMT-0500".

oozie.processing.timezone (default value: UTC) sets the Oozie server timezone. Valid values are UTC and GMT(+/-)####; for example, 'GMT+0530' would be the India timezone. All dates parsed and generated by the Oozie Coordinator/Bundle will be in the specified timezone. The default value of 'UTC' should not be changed under normal circumstances. If it is changed for any reason, note that GMT(+/-)#### timezones do not observe DST changes.

3. Save and restart the Oozie service.

Oozie Web Console

To view a job in the EST timezone on the Oozie Web Console, follow the steps below:

1. Open the Oozie Web Console at http://<oozieUrl>:11000/oozie/
2. Navigate to the "Settings" tab.
3. Select Timezone: EST from the dropdown menu.
4. Navigate to the "Jobs" tab and refresh to see the jobs in the EST timezone.

In the Oozie Coordinator properties, use the time in EST and append "-0500" to it:

start="2016-12-22T15:46-0500"
end="2016-12-22T18:00-0500"
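The GMT(+/-)#### format described above can be checked with a minimal shell helper (the helper name is an assumption, not part of Oozie):

```shell
# Validate an oozie.processing.timezone value: "UTC" or GMT(+/-)HHMM.
valid_oozie_tz() {
  case "$1" in
    UTC) echo "valid" ;;
    GMT+[0-9][0-9][0-9][0-9]|GMT-[0-9][0-9][0-9][0-9]) echo "valid" ;;
    *) echo "invalid" ;;
  esac
}

valid_oozie_tz "GMT-0500"   # valid
valid_oozie_tz "EST"        # invalid: named zones other than UTC are not accepted
```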
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
12-11-2017 03:50 PM

What should the URL/command be when we need to access Hadoop jobs for a specified time duration?
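One possible approach, offered here as an assumption rather than taken from the thread: the YARN ResourceManager REST API supports startedTimeBegin/startedTimeEnd filters in epoch milliseconds. A sketch that builds such a query URL (host, port, and time window are placeholders):

```shell
# Build a ResourceManager REST query for apps started in a given window.
# Host/port and the epoch-millisecond bounds below are placeholders.
rm_host="rm.example.com:8088"
begin=1512950400000   # 2017-12-11 00:00 UTC in epoch ms
end=1513036800000     # 2017-12-12 00:00 UTC in epoch ms
url="http://${rm_host}/ws/v1/cluster/apps?startedTimeBegin=${begin}&startedTimeEnd=${end}"
echo "$url"
# fetch with: curl "$url"
```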
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-04-2016 04:09 PM

@Matt Burgess Two things resolved the issue:

1. Start with the "jdbc:hive2" prefix:

jdbc:hive2://host.name.net:10000/;principal=hive/_HOST@EXAMPLE.COM

2. Add the following property to the hive-site.xml that is passed under the HiveConnectionPool "Hive Configuration Resources" property:

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
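As a quick sanity check of the URL half of the fix, here is an illustrative shell pattern test (the helper is an assumption, not from the original answer):

```shell
# Check that a Hive JDBC URL has the jdbc:hive2 prefix and a principal,
# the two pieces a kerberized connection needs per the answer above.
check_hive_url() {
  case "$1" in
    "jdbc:hive2://"*) : ;;
    *) echo "missing jdbc:hive2 prefix"; return 1 ;;
  esac
  case "$1" in
    *"principal="*) echo "ok" ;;
    *) echo "missing principal"; return 1 ;;
  esac
}

check_hive_url "jdbc:hive2://host.name.net:10000/;principal=hive/_HOST@EXAMPLE.COM"   # ok
```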
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
05-24-2016 09:32 PM

@Saumil Mayani The NodeManager stores app/container info under /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery, but I am not aware of any tool that can read these files. You could parse the RM and NM logs to get a rough idea of the container count. I would also recommend increasing the NM heap size from 1 GB to 3 GB and restarting the NM service.
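Parsing the NM logs for a rough container count might look like the sketch below; the log message text and the sample lines are assumptions and may differ between Hadoop versions:

```shell
# Count container-start events in a NodeManager log (message text assumed).
count_containers() {
  grep -c "Start request for container" "$1"
}

# Fabricated sample log standing in for a real NM log file:
cat > /tmp/nm-sample.log <<'EOF'
2016-05-24 10:01:02 INFO ContainerManagerImpl: Start request for container_e01_0001_01_000001
2016-05-24 10:01:05 INFO ContainerManagerImpl: Start request for container_e01_0001_01_000002
2016-05-24 10:02:11 INFO ContainerManagerImpl: Stopping container container_e01_0001_01_000001
EOF

count_containers /tmp/nm-sample.log   # 2
```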
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
05-17-2016 08:12 PM
4 Kudos

@Saumil Mayani Please try setting the parameters below and see if that fixes the issue:

export HADOOP_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=/full-jar-path/xyz.jar