Member since 09-02-2016

523 Posts | 89 Kudos Received | 42 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2723 | 08-28-2018 02:00 AM |
| | 2695 | 07-31-2018 06:55 AM |
| | 5674 | 07-26-2018 03:02 AM |
| | 2977 | 07-19-2018 02:30 AM |
| | 6459 | 05-21-2018 03:42 AM |
			
    
	
		
		
08-28-2018 10:46 AM

@Matt_ This can happen if your JAVA_HOME is not pointing to the right path. Please check the Java path on node13, set it correctly (export JAVA_HOME=<the right path, usually under /usr/java>), and try again. It may help you.
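A quick way to check and fix this on node13 might look like the following sketch; /usr/java/default is only a common location on Cloudera nodes and a placeholder here, so use the node's real JDK path:

```shell
# Show where the current java binary actually resolves to (if any)
command -v java >/dev/null && readlink -f "$(command -v java)"

# Point JAVA_HOME at the JDK install; /usr/java/default is a
# placeholder -- replace it with node13's actual JDK directory
export JAVA_HOME=/usr/java/default
export PATH="$JAVA_HOME/bin:$PATH"
echo "JAVA_HOME=$JAVA_HOME"
```

To make the change stick across logins, the two export lines would go into the service user's profile (e.g. ~/.bashrc) on that node.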
						
					

08-28-2018 02:00 AM (1 Kudo)

@AWT If your data is in HDFS and your CM version is the same across your clusters/environments (if you are using different CM logins), then the easiest way is:

ClouderaManager -> Backup (menu) -> Peers -> Add Peer
ClouderaManager -> Backup (menu) -> Replication Schedules -> Create Schedule

Alternatively, you can use distcp.
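If you go the distcp route instead, a minimal sketch looks like this; the NameNode hostnames and paths are placeholders for your own clusters:

```shell
# Source and destination HDFS paths -- hosts and paths are placeholders
SRC="hdfs://source-nn:8020/user/data"
DST="hdfs://dest-nn:8020/user/data"

# -update copies only files that are missing or changed on the
# destination; guarded so this is a no-op where hadoop is not installed
command -v hadoop >/dev/null && hadoop distcp -update "$SRC" "$DST"
echo "distcp: $SRC -> $DST"
```

Run it as a user with access to both clusters; for Kerberized clusters, cross-realm trust has to be in place first.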
						
					

08-21-2018 04:32 AM

@rupertlssmith How you initialize sc depends on how you are executing your code. If you are using the spark-shell command line, you don't need to initialize sc; it is initialized by default when the shell starts. But if you are developing the code in a third-party tool and executing it there, you have to initialize it yourself. You can add the lines below before you call rddFromParquetHdfsFile:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

val conf = new SparkConf().setAppName("your topic").setMaster("yarn-client")
val sc = new SparkContext(conf)
```
						
					

08-20-2018 05:12 AM

@DanielWhite In general, there could be many reasons for this kind of issue:

1. Both the source and destination clusters must have a Cloudera Enterprise license:
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_bdr_replication_intro.html#concept_exl_dwt_bx

2. Please refer to the link below to understand the supported/unsupported scenarios (an example of an unsupported one: the clusters use different Kerberos realms):
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_bdr_replication_intro.html#concept_rt2_1wt_bx
						
					

08-10-2018 04:51 AM

@kratka There are different methods; below are the default settings. If you go to CM -> Hosts -> select a host -> Resources (menu), it will show you how many resources (CPU, memory, etc.) are allocated to YARN, Impala, HDFS, etc. per node.

You can control them as follows:
a. CM -> YARN -> Configuration -> click on "NodeManager" (left) and "Resource Management" (left) -> adjust CPU or memory as needed
b. CM -> Impala -> Configuration -> click on "Impala Daemon" (left) and "Resource Management" (left) -> adjust cpu.shares and mem_limit as needed

If that doesn't help, you can use dynamic resource pooling and create separate job queues for MapReduce and Impala.
						
					

08-02-2018 11:42 AM (1 Kudo)

@vratmuri Oh, then you can use the Cloudera API.

Link for the Cloudera API reference:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cm_intro_api.html

Link for service-specific properties (you may need to explore a little for Impala queries); it may help you:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cm_intro_api.html#xd_583c10bfdbd326ba--7f25092b-13fba2465e5--7f20__example_txn_qcw_yr
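As a sketch, pulling Impala query history over the CM REST API with curl might look like the following; the host, credentials, cluster/service names, and API version are all placeholders to adapt for your deployment:

```shell
# Cloudera Manager endpoint -- host, port, API version are placeholders
CM_HOST="cm-host.example.com"
API_URL="http://${CM_HOST}:7180/api/v13"

# Recent Impala queries for a service named 'impala' in 'Cluster1';
# guarded and time-limited so this is a no-op where CM is unreachable
command -v curl >/dev/null && \
  curl -s -m 5 -u admin:admin \
  "${API_URL}/clusters/Cluster1/services/impala/impalaQueries" || true
echo "GET ${API_URL}/clusters/Cluster1/services/impala/impalaQueries"
```

The exact endpoint path and supported filters vary by CM version, so check the API reference linked above before relying on it.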
						
					

08-02-2018 05:15 AM

@vratmuri You can get it from Cloudera Manager.

Go to CM -> Impala -> Queries (tab) -> choose the time frame and click the 'Export' button.
						
					

07-31-2018 06:55 AM (1 Kudo)

@chriswalton007 There are different types of latencies:

- NameNode RPC latency
- JournalNode fsync latency
- network latency
- etc.

A few points here:
1. Your network latency will vary based on the traffic in your cluster, which may cause trouble during peak hours.
2. Latency issues can lead to the following: the master daemons expect an update from the child daemons every few seconds, and a master will consider a child unavailable/dead if that update is delayed and will look for an alternative. That is unnecessary unless there is a real problem with the child.
3. As far as NameNode RPC is concerned, in an HA cluster the active and standby NameNodes have to talk to each other and stay in sync within a few seconds. If they are not in sync and something goes wrong on the active NameNode, the standby becomes active, but it may not be up to date, which leads to confusion.

At the end of the day, every second matters in a distributed cluster.

But in your use case, I am not sure whether you are going to use Cloudera Director; if so, the link you shared says it will not allow you to create a mixed cloud/on-premise cluster. If you are going to use a different tool that does allow configuring a mixed cloud/on-prem cluster, then you can go ahead, provided that:
1. You try this in a non-prod environment first
2. You have light workloads
						
					

07-26-2018 03:02 AM (1 Kudo)

@HJ I'm not sure which Impala/CDH version you are using.

The NANVL built-in function is available in Oracle, but Impala does not support it (as of CDH 5.14).

Below is the link to the CDH 5.14 documentation showing all the available conditional functions, which include NVL and NVL2 but not NANVL. You may need to write your own custom function to cover this part:
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/impala_conditional_functions.html
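As a workaround sketch, NANVL(x, y) can usually be emulated in Impala SQL with the is_nan() math function (or the classic x != x NaN test, which is true only for NaN) wrapped in if(); the table and column names below are made up for illustration:

```shell
# Emulate Oracle's NANVL(x, 0): return 0 where x is NaN, else x.
# 't' and 'x' are hypothetical table/column names.
QUERY="SELECT if(is_nan(x), 0, x) AS x_clean FROM t"

# Guarded so this is a no-op on hosts without impala-shell
command -v impala-shell >/dev/null && impala-shell -q "$QUERY"
echo "$QUERY"
```

If is_nan() is not available in your Impala version, replacing the condition with `x != x` gives the same NaN check.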
						
					

07-24-2018 02:09 AM

@yongie Switch to the hdfs user and try again.
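For example, using sudo for a single command rather than a full su; the /user path here is only illustrative:

```shell
# Run one HDFS command as the hdfs superuser instead of switching
# shells; /user is just an example path
RUN_AS="hdfs"
command -v hdfs >/dev/null && sudo -u "$RUN_AS" hdfs dfs -ls /user
echo "ran as: $RUN_AS"
```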
						
					