Member since 07-17-2017

- 143 Posts
- 16 Kudos Received
- 17 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 1976 | 07-03-2019 02:49 AM |
| | 2186 | 04-22-2019 03:13 PM |
| | 1712 | 01-30-2019 10:21 AM |
| | 9474 | 07-25-2018 09:45 AM |
| | 8560 | 05-31-2018 10:21 AM |
02-27-2018 11:20 AM

Hi all,

I have read in the Mahout installation docs that it has been deprecated since CDH 5.5, and I see in the list of deprecated items that it will be removed in CDH 6.0. Any idea why, and what the alternatives are?

Thanks in advance.
						
					
02-27-2018 10:51 AM

1 Kudo

Hi @kundansonuj,

I think it's almost the same issue reported in this Apache JIRA ticket: https://issues.apache.org/jira/browse/IMPALA-5399

Anyway, check my answer there (https://issues.apache.org/jira/browse/IMPALA-5399#comment-16166044); I have already used it and it works.

Good luck.
						
					
02-27-2018 08:36 AM

Hi @lizard,

By default a DataNode writes new block replicas to disk volumes solely on a round-robin basis. You can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.
Source: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_dn_storage_balancing.html

NB: Did you remove all the HDFS trash files in paths like /user/impala/Trash/* and /user/hdfs/Trash/*?

Good luck, man.
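As a sketch, the available-space policy mentioned above is switched on in hdfs-site.xml; the property names below are the standard Hadoop ones, but the threshold and fraction values are illustrative, not recommendations:

```xml
<!-- hdfs-site.xml: choose volumes by available space instead of round-robin -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<!-- volumes whose free space differs by less than 10 GB count as balanced (illustrative) -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<!-- fraction of new block allocations steered to the emptier volumes (illustrative) -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```

The DataNodes need a restart for the policy change to take effect.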
						
					
02-27-2018 07:53 AM

You are welcome @PedroGaVal. Yes, you are absolutely right, man.
						
					
02-27-2018 04:42 AM

Hi @gimp077,

I think there are two ways to do it:

1. You can put the output of the Impala query into HDFS after saving it to a local file, using the HDFS put command:

       sudo -u hdfs hdfs dfs -put "${3}" hdfs_path

2. You can do a direct insert into a result table (stored in HDFS) by putting an INSERT just before your select statement:

       INSERT INTO result_table YOUR_QUERY
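A minimal sketch of the second approach; the table and column names here are illustrative, not from the thread:

```sql
-- Create an HDFS-backed Parquet table directly from the query (CTAS)
CREATE TABLE result_table STORED AS PARQUET AS
SELECT id, name
FROM events
WHERE active = 1;

-- Or, if result_table already exists, just prepend INSERT to the query
INSERT INTO result_table
SELECT id, name
FROM events
WHERE active = 2;
```

Either way the result lands in the table's HDFS directory, with no intermediate local file to copy.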
						
					
02-27-2018 04:26 AM

Hi @PedroGaVal,

In effect, Impala is a query engine: you pass queries through it to interrogate the data stored in HDFS or Kudu. And when you use Kudu you don't need UDFs for this, because Impala on Kudu supports the UPDATE/DELETE statements.
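For example, against a hypothetical Kudu table `users` (Impala accepts these statements only for Kudu tables):

```sql
-- UPDATE and DELETE work directly on Kudu tables, no UDF needed (table name illustrative)
UPDATE users SET email = 'new@example.com' WHERE id = 42;
DELETE FROM users WHERE active = 0;
```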
						
					
01-30-2018 03:09 PM

Hmm, I understand. Thank you @Todd Lipcon for the answers.

So, for now there is no way to do a query like this in a mixed cluster?! Otherwise I'll try to do the join in an intermediate table before doing the update query, to avoid the nested join.
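A sketch of that intermediate-table idea, reusing the table names from my original query (the intermediate table name is illustrative, and this may still be subject to the same SSE4.2 requirement on the coordinator):

```sql
-- Stage the join result from the HDFS/Parquet side into an intermediate table
CREATE TABLE db1.tmp_join STORED AS PARQUET AS
SELECT t1.name, t2.id
FROM db1.table1 t1
JOIN db2.table2 t2 ON t1.name = t2.name
WHERE t1.active IN (1, 2);

-- Then run the Kudu UPDATE joined only against the intermediate table
UPDATE t1 SET t1.num = tmp.id
FROM db1.table1 t1
JOIN db1.tmp_join tmp ON t1.name = tmp.name;
```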
						
					
01-30-2018 12:53 PM

Hi,

All 10 Kudu tablet servers, and also the Kudu master server in my cluster, support SSE4.2 (e.g. Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, Intel(R) Xeon(R) CPU E5506 @ 2.13GHz, Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz...).

I'm already working with Kudu, and most queries run fine; the UPDATE without a JOIN (with an HDFS table) also works fine.

The Impala daemon where I execute the UPDATE query in question also supports the SSE4.2 feature.
						
					
01-30-2018 08:59 AM

Hi,

I have a small cluster with 20 nodes (10 of them have SSE4.2 support in the CPU), so I have 20 HDFS DataNodes and 10 Kudu tablet servers (the same 10 nodes). When I try to execute the query below:

    UPDATE t1 SET t1.num = t2.id
    FROM db1.table1 t1
    JOIN db2.table2 t2
    WHERE t1.name = t2.name
    AND t1.active IN (1,2);

knowing that table1 is a Kudu table and table2 is an HDFS/Parquet table, I get this error message:

    WARNINGS: Unable to create Kudu client: Not implemented: The CPU on this system (Intel(R) Xeon(R) CPU E5405 @ 2.00GHz) does not support the SSE4.2 instruction set which is required for running Kudu. If you are running inside a VM, you may need to enable SSE4.2 pass-through.

NB: I use CDH v5.12, Impala v2.9 and Kudu v1.4.

Why do I get this issue, and is there another way to run the same query without this problem? Thanks in advance.
						
					
Labels:
- Apache Impala
- Apache Kudu
- HDFS
11-01-2017 10:54 AM

Hi @EricL,

Here are the query profile and the ODBC log files:

NN - ODBC logs - query 200k - 10s - 1.73s without log
Remote Server - ODBC logs - query 200k - 48s - 41s without log

Thanks in advance.
						
					