Member since 05-02-2017
360 Posts
65 Kudos Received
22 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 15721 | 02-20-2018 12:33 PM |
| | 2051 | 02-19-2018 05:12 AM |
| | 2385 | 12-28-2017 06:13 AM |
| | 7925 | 09-28-2017 09:25 AM |
| | 13516 | 09-25-2017 11:19 AM |
			
    
	
		
		
11-06-2017 07:57 AM

@Team Spark  It seems the target table has 2 columns, whereas your insert query selects 3. Check the number of columns in the SELECT clause and it should work fine. This is not a problem with dynamic partitioning.
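A minimal sketch of the fix, assuming a hypothetical target table with two data columns plus one partition column (table and column names are illustrative, not from the original thread):

-- Target has exactly two non-partition columns (col1, col2) and one partition column (dt).
-- The SELECT supplies col1, col2 and then dt for the dynamic partition - no extra columns.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE target_db.target_tbl PARTITION (dt)
SELECT col1, col2, dt
FROM source_db.source_tbl;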
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-30-2017 12:13 PM

@Saurabh  This sometimes happens because of the limit on the number of lines the CLI displays. Try this: hive -e "show create table sample_db.i0001_ivo_hdr;" > ddl.txt
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-25-2017 10:06 AM

@Biswajit Chakraborty  One way of doing this is to reduce the number of records flowing from Spark to Hive. Use a filter condition to cut down the number of records, and split the load into multiple inserts, as sketched below. That should work and reduce the records held in memory. Also, when you are inserting from Spark to Hive, I'm not sure why, but there is a high chance that the volume of data moving through shuffles is very large. If possible, attach the complete logs. Hope it helps!!
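A rough sketch of the multiple-insert idea in HiveQL, assuming a hypothetical date column event_dt to filter on (table and column names are illustrative, not from the original thread):

-- Instead of one huge insert, load the target in smaller filtered batches.
INSERT INTO TABLE target_db.target_tbl
SELECT * FROM staging_db.staging_tbl
WHERE event_dt >= '2017-01-01' AND event_dt < '2017-04-01';

INSERT INTO TABLE target_db.target_tbl
SELECT * FROM staging_db.staging_tbl
WHERE event_dt >= '2017-04-01' AND event_dt < '2017-07-01';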
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-17-2017 12:30 PM

@Rush  You have to correct the Sqoop syntax: the mapper parameter has to be specified before the target directory. Fix the syntax and re-run the Sqoop command; it should work fine. Hope it helps!!
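A sketch of the argument ordering the answer describes, with placeholder connection details (the JDBC URL, credentials and table name are illustrative, not from the original thread):

# -m (number of mappers) placed before --target-dir, as suggested above
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop_user \
  --password-file /user/sqoop_user/db.password \
  --table orders \
  -m 4 \
  --target-dir /data/raw/orders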
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-16-2017 01:08 PM

@Rashi Jain  To increase the performance of a Sqoop import, increase the number of mappers depending on the load the source can handle and the number of records being ingested into HDFS. Also, in --split-by, try to use the primary key (or another column that uniquely identifies records), so the records are split evenly across the mappers and the ingestion runs faster. Hope it helps!!
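A sketch of those two knobs on a Sqoop import, again with placeholder connection details and a hypothetical primary-key column order_id:

# More mappers plus a primary-key split column for even, parallel ingestion
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop_user \
  --password-file /user/sqoop_user/db.password \
  --table orders \
  --split-by order_id \
  --num-mappers 8 \
  --target-dir /data/raw/orders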
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-14-2017 10:10 AM

@viswanath  When you run the insert statement for a static partition, the lock is obtained over the folder that is created for that static partition. Meanwhile, your select query runs over the entire table, i.e. the entire folder created for the table, which includes all partitions, including the one you are overwriting. At that point it comes down to simple file handling: when a file is being written, it is locked, and the same applies here. Hope it helps!!
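A sketch of the situation being described, assuming a hypothetical table sales_db.orders partitioned by dt (all names illustrative): the INSERT locks the target partition's folder while the SELECT scans every partition of the same table, including the one being overwritten.

-- Static-partition overwrite that reads from the same table it writes to
INSERT OVERWRITE TABLE sales_db.orders PARTITION (dt='2017-10-14')
SELECT order_id, amount          -- the SELECT scans all partitions of orders,
FROM sales_db.orders             -- including dt='2017-10-14', which is locked for writing
WHERE amount > 0;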
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-11-2017 09:19 AM

Hi @Saravanan Ramaraj  Technically you can't compare an RDBMS with Hive, at least for now. One way of improving things is to capture the table statistics in the Hive properties, through which you can see an improvement in performance. If you want to run analysis on a column, you may have to run:

ANALYZE TABLE dbname.tblname PARTITION (partition_spec) COMPUTE STATISTICS FOR COLUMNS column_name;

By computing column statistics you can get better performance. Hive works well for large-scale computation, which is what it is specifically designed for, but comparing it with an RDBMS is like comparing apples with oranges. Hope it helps!
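A minimal sketch of the two statistics commands, assuming a hypothetical table sales_db.orders partitioned by dt (names are illustrative, not from the original thread):

-- Basic table/partition statistics (row counts, file sizes, etc.)
ANALYZE TABLE sales_db.orders PARTITION (dt='2017-10-11') COMPUTE STATISTICS;

-- Column-level statistics for the same partition, used by the optimizer
ANALYZE TABLE sales_db.orders PARTITION (dt='2017-10-11') COMPUTE STATISTICS FOR COLUMNS;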
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
09-28-2017 09:25 AM
1 Kudo

Hi @Gayathri Devi  Here is a sample nested CASE that can be used in Hive:

select case
when hour(split(split(hbid,"#")[1],"_")[1]) == 0 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","0-2")
when hour(split(split(hbid,"#")[1],"_")[1]) == 1 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","0-2")
when hour(split(split(hbid,"#")[1],"_")[1]) == 2 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","2-4")
when hour(split(split(hbid,"#")[1],"_")[1]) == 3 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","2-4")
else 'NA'
end as column1 from table_name;

If it helps then please accept it as the best answer! Happy Hadooping!!
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
09-27-2017 12:53 PM
1 Kudo

@Gayathri Devi  You have missed the END in the statement. This should work:

case when hour(split(split(hbid,"#")[1],"_")[1]) == 0
then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","0-2") else 'NA' end

Hope it helps!!
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
09-26-2017 11:04 AM
1 Kudo

@Gayathri Devi  I can't think of any built-in function in Hive that handles this scenario directly. The other way of doing it is something like the below:

select from_unixtime(unix_timestamp(current_timestamp) + 7200);

7200 is the number of seconds in the 2 hours mentioned in your question; you can alter it based on your need. Instead of the hardcoded value you can pass a variable, or row_number() over () * 3600 can be used to generate the offsets sequentially. Hope it helps!!
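A sketch of the sequential-offset idea mentioned above, assuming a hypothetical table events with an id column to order by (row_number needs some ordering to be deterministic; names are illustrative):

-- Each row gets a timestamp one hour later than the previous row's
select id,
       from_unixtime(unix_timestamp(current_timestamp) + row_number() over (order by id) * 3600) as ts
from events;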
				
			
			
			
			
			
			
			
			
			
		 
        













