Member since 09-02-2016
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
523 Posts | 89 Kudos Received | 42 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2724 | 08-28-2018 02:00 AM |
| | 2696 | 07-31-2018 06:55 AM |
| | 5684 | 07-26-2018 03:02 AM |
| | 2981 | 07-19-2018 02:30 AM |
| | 6463 | 05-21-2018 03:42 AM |
			
    
	
		
		
11-15-2017 07:38 AM (1 Kudo)
		
	
				
		
	
		
					
@hparteaga The correct way is:

1. In Cloudera Manager, add the Sentry service and make sure Hue is configured to use it.
2. Log in to Hue and go to the Security menu. It will have a submenu called either "Sentry Tables" or "Hive Tables"; the link below explains why it is one or the other. Use this option to set database-, table-, and column-level authorization.

http://community.cloudera.com/t5/Security-Apache-Sentry/Hive-Tables-instead-Sentry-Tables/m-p/48740#M190
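For reference, the same kind of rules the Hue Security UI manages can be expressed as Sentry GRANT statements through Beeline. This is a minimal sketch only, assuming a Sentry-enabled HiveServer2; the role, group, database, table, column, and JDBC URL are all hypothetical placeholders.

```python
import subprocess

# Minimal sketch: db-, table-, and column-level Sentry grants via Beeline.
# Role/group/table names and the JDBC URL are hypothetical placeholders.
statements = "; ".join([
    "CREATE ROLE analyst",
    "GRANT ROLE analyst TO GROUP analysts",
    "GRANT SELECT ON DATABASE demo_db TO ROLE analyst",                # db level
    "GRANT SELECT ON TABLE demo_db.events TO ROLE analyst",            # table level
    "GRANT SELECT(event_id) ON TABLE demo_db.events TO ROLE analyst",  # column level
])
subprocess.run(
    ["beeline", "-u", "jdbc:hive2://hs2.example.com:10000/default", "-e", statements],
    check=True,
)
```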
						
					
    
	
		
		
11-13-2017 07:58 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@cdhhadoop Try the below; it may help you.

CM -> YARN -> Configuration -> "Java Heap Size of NodeManager in Bytes": note the current value (1 GB, 2 GB, etc.) and increase it by one GB; for example, if it is 1 GB, raise it to 2 GB.

(or)

CM -> YARN -> Configuration -> "Garbage Collection Duration Monitoring Period": increase it from 5 minutes to 10 minutes.

Restart YARN as needed.
						
					
    
	
		
		
11-07-2017 08:01 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@epowell The issue might be related to the JIRA below, which was opened long ago and is still in Open status:

https://issues.apache.org/jira/browse/HDFS-3447

As an alternative way to connect to HDFS, open hdfs-site.xml, get the value of dfs.nameservices, and try connecting through the nameservice as follows; it may help you:

hdfs://<ClusterName>-ns/<hdfs_path>

Note: I didn't get a chance to explore this, and I am not sure how it behaves on older CDH versions.
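As a rough illustration of going through the logical nameservice rather than a single NameNode host, here is a minimal PySpark sketch. The nameservice name "mycluster-ns" and the file path are hypothetical; substitute whatever dfs.nameservices returns in your own hdfs-site.xml.

```python
from pyspark.sql import SparkSession

# Minimal sketch: read from HDFS through the logical nameservice, so the
# HDFS client resolves the active NameNode itself instead of being pinned
# to one host. "mycluster-ns" and the path are made-up placeholders.
spark = SparkSession.builder.appName("nameservice-check").getOrCreate()

df = spark.read.text("hdfs://mycluster-ns/tmp/sample.txt")
df.show(5, truncate=False)
```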
						
					
    
	
		
		
11-06-2017 07:38 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@gaurav796 The difference is:

insertInto: to overwrite any existing data.

mode comes with additional options, like:

mode("append"): append the contents of this DataFrame to the existing data.
mode("overwrite"): overwrite the existing data.

Note: I didn't get a chance to explore this before replying.
						
					
    
	
		
		
11-03-2017 08:16 AM (2 Kudos)
		
	
				
		
	
		
					
@dubislv Please follow these steps:

1. Ex: Impala -> Instances -> Role Groups -> Create (as needed, based on the existing group).
2. Ex: Impala -> Instances -> Role Groups -> click the already existing group (in your case, Impala Daemon Default Group) -> select the host -> Actions for Selected -> Move to Different Role Group -> select the newly created group.
						
					
    
	
		
		
11-02-2017 08:52 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@ganeshkumarj

a. mapred.map.tasks: the default number of map tasks per job is 2. Ignored when mapred.job.tracker is "local". You can modify it with set mapred.map.tasks=<value>.

b. mapred.reduce.tasks: the default number of reduce tasks per job is 1. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local". You can modify it with set mapred.reduce.tasks=<value>.

https://hadoop.apache.org/docs/r1.0.4/mapred-default.html
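For example, those per-session overrides can also be passed to a Hive query from the command line. This is a minimal sketch; the task counts and the table name are purely illustrative.

```python
import subprocess

# Minimal sketch: override the default map/reduce task counts for a single
# Hive session via --hiveconf. Counts and table name are illustrative only.
subprocess.run([
    "hive",
    "--hiveconf", "mapred.map.tasks=4",
    "--hiveconf", "mapred.reduce.tasks=8",
    "-e", "SELECT COUNT(*) FROM demo_db.demo_table;",
], check=True)
```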
						
					
    
	
		
		
10-18-2017 07:24 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@cdhhadoop As mentioned, you will get the warning if b > a.
						
					
    
	
		
		
10-17-2017 01:01 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@cdhhadoop Get the values of:

a. CM -> HDFS -> Configuration -> DataNode Block Count Thresholds
b. CM -> HDFS -> Web UI -> NameNode Web UI -> click the Datanodes menu -> get the block count of your node

If b > a, you will get the block count warning (see the sketch below). Cloudera's guidance also says the "presence of many small files" can trigger this warning.

Actions:
1. If it is not disturbing anything, you can ignore the warning, but keep an eye on the block pool usage percentage from 'b'.
2. You can increase the block count threshold in 'a'.
3. You can clean up unwanted data, but if your trash folder retains old data (for example, for 24 hours), you will only see the result after 24 hours.
4. You can add more DataNodes and rebalance.
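If you want to script the b > a comparison instead of reading it off the web UI, here is a rough sketch against the NameNode's JMX endpoint. It assumes an unsecured NameNode web UI (port 50070 on older CDH, 9870 on newer Hadoop) and that the NameNodeInfo bean's LiveNodes JSON carries a per-node numBlocks field, which can vary by version; the host and threshold are hypothetical.

```python
import json
import urllib.request

# Value of "DataNode Block Count Thresholds" from CM (the 'a' above);
# illustrative only.
THRESHOLD = 500_000

# Unsecured NameNode web UI assumed; host and port are placeholders.
URL = ("http://namenode.example.com:50070/jmx"
       "?qry=Hadoop:service=NameNode,name=NameNodeInfo")

with urllib.request.urlopen(URL) as resp:
    bean = json.load(resp)["beans"][0]

# LiveNodes is a JSON string keyed by DataNode, with per-node stats.
live_nodes = json.loads(bean["LiveNodes"])
for node, info in live_nodes.items():
    blocks = info.get("numBlocks", 0)   # the 'b' above
    if blocks > THRESHOLD:              # the warning condition: b > a
        print(f"{node}: {blocks} blocks exceeds threshold {THRESHOLD}")
```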
						
					
    
	
		
		
10-06-2017 09:11 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@desind To add to your point, a cluster-level setting applies to every MapReduce job, so it may also impact jobs that do not need the higher value. In fact, I am not against setting a higher value at the cluster level, but base that decision on how many jobs actually require the higher values, their performance needs, and so on.
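One sketch of the per-job alternative: pass the higher values only to the job that needs them, assuming the driver class parses -D options via ToolRunner. The jar, class, paths, and values are all hypothetical.

```python
import subprocess

# Minimal sketch: raise memory for one job only instead of changing the
# cluster-wide default. Jar/class/paths/values are placeholders, and the
# driver is assumed to pick up -D options via ToolRunner.
subprocess.run([
    "hadoop", "jar", "my-job.jar", "com.example.MyJob",
    "-Dmapreduce.map.memory.mb=4096",
    "-Dmapreduce.reduce.memory.mb=8192",
    "/input/path", "/output/path",
], check=True)
```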
						
					
			
    
	
		
		
10-04-2017 01:29 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@wchagas One common reason to disable the firewall: as we know, HDFS maintains replicas across different nodes/racks, and that replication shouldn't take any extra time. A firewall (or SELinux) may disturb it or lead to performance issues, so the general recommendation is to disable the firewall. That said, I believe some users still run Hadoop with a firewall for security reasons, if the business really demands it.

Regarding your question about security, you can follow the other recommended security measures, such as Kerberos and Sentry, depending on your needs.
						
					