Member since 09-02-2016
      
- 523 Posts
- 89 Kudos Received
- 42 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2719 | 08-28-2018 02:00 AM |
|  | 2693 | 07-31-2018 06:55 AM |
|  | 5673 | 07-26-2018 03:02 AM |
|  | 2975 | 07-19-2018 02:30 AM |
|  | 6457 | 05-21-2018 03:42 AM |
			
    
	
		
		
07-23-2018 10:51 AM (1 Kudo)

@martinbo Regarding creating users on multiple nodes, you should use a configuration-management tool such as Puppet, Chef, or Ansible. You asked only about creating a new user on each node, but in practice the requirement usually extends as follows:

1. Create/modify the user on each node
2. Set a temporary password if you don't have SSO
3. Create/modify the user groups on each node (admin group, developer group, tester group, analyst group, etc.)
4. Assign each user to the corresponding groups
5. Create a home directory for each user, and set up quotas if needed
6. Set the permissions and owner on each home directory (other users should not have access)
7. etc.

These tools can handle many other activities as well; I've listed a few based on your requirement. Hope it helps.
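The steps above can be sketched as a dry-run shell loop. The hostnames, username, and group below (`node1..node3`, `analyst1`, `analysts`) are made up for illustration; a real rollout would use Ansible/Puppet/Chef rather than raw SSH:

```shell
#!/bin/sh
# Dry-run sketch: print the per-node command instead of executing it.
NODES="node1 node2 node3"   # hypothetical hostnames
NEWUSER="analyst1"          # hypothetical username
GROUP="analysts"            # hypothetical group

# The remote command: create the user with a home directory, add it to
# the group, and lock down the home directory so others cannot read it.
CMD="useradd -m -G $GROUP $NEWUSER && chmod 700 /home/$NEWUSER"

for node in $NODES; do
  # Printed only; to actually run it, replace echo with: ssh "root@$node" "$CMD"
  echo "would run on $node: $CMD"
done
```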
						
					
07-19-2018 02:30 AM

@scratch28 You can use Cloudera Navigator to generate this report. Log in to Cloudera Manager as a full administrator, then go to Cloudera Management Service -> Navigator Metadata Server -> Cloudera Navigator (menu), search for 'impala', and choose from the options on the left side. You can select up to the last 365 days, or a custom period.
						
					
07-03-2018 04:05 AM

@Rod CDH 6.0 will ship with Spark 2.2.0, and Spark 2.2.0 supports Spark SQL, as described here: https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#sql

I'm not sure whether this answers your question; if not, please give some more details.
						
					
06-14-2018 05:06 AM

@KeepCalmNCode You mentioned that there are a lot of small files, and that you set block.size to 445644800 (approximately 445 MB). If block.size > file size, you will not see any difference.

For example, for a 1 MB file, all of the following behave the same: 445 MB > 1 MB, 400 MB > 1 MB, 300 MB > 1 MB, 200 MB > 1 MB, 100 MB > 1 MB, 10 MB > 1 MB, 2 MB > 1 MB.

You may only see a difference when you set block.size < file size.
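The comparison above reduces to simple arithmetic: a file occupies ceil(filesize / blocksize) HDFS blocks, so a 1 MB file is exactly one block under every block size listed. A quick sketch:

```shell
# Ceiling division: blocks = ceil(FILE_MB / BLOCK_MB).
# For a 1 MB file, every block size from the answer yields 1 block,
# which is why changing 445 MB to any larger-than-file value changes nothing.
FILE_MB=1
for BLOCK_MB in 445 400 300 200 100 10 2; do
  BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))   # ceiling division
  echo "blocksize ${BLOCK_MB} MB -> ${BLOCKS} block(s)"
done
```

Only once block.size drops below the file size (e.g. a 600 MB file with a 445 MB block size) does the block count change.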
						
					
06-13-2018 05:31 AM

@Sumit The link you provided says "If you have selected IAM authentication, no additional steps are needed", and that includes editing core-site.xml. You have confirmed that you added "AWS Credentials" under "External Accounts" in the "Administration" menu, which is IAM role-based authentication. So you don't need to do both.
						
					
06-12-2018 05:46 AM

@Sumit
1. What is your Cloudera version? If you are using Cloudera 5.10 or above, you can follow the instructions from the link I gave above, especially "Adding AWS Credentials". You didn't mention whether you have already tried it.
2. I'm not sure whether you restarted your cluster after the configuration change.
3. If you follow this option, it will apply to all users.
4. I don't know which blog you are following or how old it is; if you are using Cloudera, use the Cloudera documentation.
						
					
06-09-2018 01:51 AM

@Sumit Please refer to the link below; it explains:

- S3 as storage for Impala tables
- S3 as a source or destination for HDFS and Hive/Impala replication, and for cluster storage
- Where to update the credentials, etc.

https://www.cloudera.com/documentation/enterprise/5-12-x/topics/cm_auth_aws.html#concept_tmd_nsh_2y
						
					
06-09-2018 01:46 AM

@AcharkiMed Basically this is a recommendation; whether it is treated as mandatory or optional depends on the environment you are running.

For example:
- For a production environment it is mandatory; otherwise you may see a performance difference whenever there is a failover between the active and standby NameNode.
- For a test/POC environment it is optional if you don't have a better choice.
						
					
05-21-2018 03:42 AM (1 Kudo)

@sim6 I hope you have more than 3 datanodes. Generally, two types of "data missing" issues are possible, for many reasons:

a. ReplicaNotFoundException
b. BlockMissingException

If your issue is a BlockMissingException and you have backup data in your DR environment, then you are fine; otherwise it might be a problem. For a ReplicaNotFoundException, make sure all your datanodes are healthy and in the commissioned state. In fact, the namenode is supposed to handle this automatically whenever that data is accessed; if it doesn't, an HDFS rebalance or a NameNode restart may fix the issue, but you don't need to try those options unless a user reports a problem with the particular data. In your case no one has reported anything yet and you found it yourself, so you can ignore it for now.
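Before deciding whether such a report needs action, the block health can be checked with standard HDFS commands. A dry-run sketch (the file path is a placeholder, printed only rather than executed):

```shell
# Commands to assess missing/corrupt blocks; /path/to/suspect_file is hypothetical.
REPORT_CMD="hdfs dfsadmin -report"               # datanode health, live/dead counts
FSCK_CMD="hdfs fsck / -list-corruptfileblocks"   # files with corrupt/missing blocks
DETAIL_CMD="hdfs fsck /path/to/suspect_file -files -blocks -locations"

for c in "$REPORT_CMD" "$FSCK_CMD" "$DETAIL_CMD"; do
  echo "would run: $c"   # printed only; run these on the cluster itself
done
```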
						
					
05-09-2018 10:13 AM

@hendry There are two possibilities for this scenario:

1. The Hive and Impala tables may be referring to two different files. The chances of this are low unless there is a minor mistake in the table definitions, or some other internal error. You can confirm it by running

   describe formatted db.tablename

   from both Hive and Impala, then comparing the locations.

2. Your file has duplicate records: some key values are the same, but the other columns have different values, so it may return different results when you filter. Check your data in detail.
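Possibility 1 above can be sketched as a dry run from the shell. The host names are placeholders, and `db.tablename` stands in for the real table, as in the answer:

```shell
# Dry-run sketch: fetch the table metadata from both engines and
# compare the Location: row. "hiveserver" and "impalad-host" are
# hypothetical host names; commands are printed, not executed.
TABLE="db.tablename"
HIVE_CMD="beeline -u jdbc:hive2://hiveserver:10000 -e \"describe formatted $TABLE\""
IMPALA_CMD="impala-shell -i impalad-host -q \"describe formatted $TABLE\""

echo "would run: $HIVE_CMD"
echo "would run: $IMPALA_CMD"
# If the Location: values differ between the two outputs,
# Hive and Impala are reading different files.
```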
						
					