Member since: 05-02-2017

Posts: 360
Kudos Received: 65
Solutions: 22

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 15719 | 02-20-2018 12:33 PM |
|  | 2049 | 02-19-2018 05:12 AM |
|  | 2382 | 12-28-2017 06:13 AM |
|  | 7925 | 09-28-2017 09:25 AM |
|  | 13514 | 09-25-2017 11:19 AM |

12-12-2018 04:29 PM
It's a good approach, but the one disadvantage I see is the multiple hops needed to achieve the desired result. Instead of performing joins, you can apply a windowing function and get the same result in a single hop, assuming you have a unique key column and a last-modified date in your scenario. A sketch is below.

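As a minimal HiveQL sketch of the single-hop idea, assuming a hypothetical table `source_table` with a unique key `id`, some payload columns, and a `last_modified` timestamp (all names are illustrative, not from the original thread):

```sql
-- Keep only the latest row per id in a single pass, using a window function
-- instead of a self-join. Table and column names are assumed for illustration.
SELECT id, col_a, col_b, last_modified
FROM (
  SELECT id, col_a, col_b, last_modified,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_modified DESC) AS rn
  FROM source_table
) latest
WHERE rn = 1;
```

Compared with joining the base table against a separate max(last_modified) aggregate, this reads the data once.
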
12-11-2018 07:26 AM
Hi @harsha vardhan, could you explain that a bit more? Yes, you can override the queue whenever you want, but it also depends on the user's and group's access. If the user is assigned to specific groups and those groups have not been given privileges on any other queue, then it will not be possible until the proper access is granted to the user's groups. But if you do have access to multiple queues, you can pass the queue name as a parameter to the Sqoop job (for example via the mapreduce.job.queuename property), and when the queue name has to change you can handle that with a combination of shell and Sqoop.

04-23-2018 06:35 AM
Hi @Swaapnika Guntaka When you delete data from HDFS, it is moved to the Trash. The Trash is flushed on a regular schedule (the fs.trash.interval setting), and once it has been flushed there is no way to recover the data unless you have DR in place, which is usually only the case in a production environment. Hope it helps!!

04-12-2018 07:11 AM
Hi @johny gate The query below works, but it's kind of dirty. Hope it helps!!

select * from a
left join
  (select *, lag(col3) over (partition by col1 order by col2) as lag_val from a) tblb
  on tblb.col1 = a.col1 and a.col2 = tblb.lag_val

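For context, a minimal assumed schema the query above could run against; the table name `a` and the columns `col1`, `col2`, `col3` come from the query itself, while the types are illustrative guesses rather than details from the original thread:

```sql
-- Illustrative schema only: column types are assumed, not taken from the thread.
CREATE TABLE IF NOT EXISTS a (
  col1 STRING,  -- grouping key used in PARTITION BY and the join condition
  col2 INT,     -- ordering column used in ORDER BY
  col3 INT      -- value carried forward from the previous row via LAG
);
```
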
04-06-2018 12:48 PM
Hi @Subramaniam Ramasubramanian You would have to start by looking into the executor failures. You said this job was working fine earlier and the issue appeared only recently; in that case I believe the maximum number of executor failures is set to 10 (the spark.yarn.max.executor.failures setting) and the job ran fine while it stayed under that limit, but the number of executor failures has now started exceeding 10. Executor failures may also be caused by resource unavailability, so you may need to look at the cluster's resource and memory availability at the time your job runs as well. Hope it helps!

03-16-2018 06:16 AM (1 Kudo)
@Timothy Spann If open source is the priority, then I would go with Hive using MERGE. Although I haven't tried MERGE with a huge data volume, I believe it would perform reasonably well. A sketch is below.

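A minimal HiveQL sketch of the MERGE-based upsert, assuming hypothetical `target_table` and `staging_table` tables with columns `(id, value, last_modified)`; the names are illustrative, and Hive's MERGE requires the target to be a transactional (ACID) table:

```sql
-- Upsert staged changes into a transactional target table.
-- Table and column names are assumed for illustration only.
MERGE INTO target_table AS t
USING staging_table AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET value = s.value, last_modified = s.last_modified
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.value, s.last_modified);
```
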
03-15-2018 05:36 AM
@Timothy Spann I would go with either Attunity or some utility/framework that can be adapted to the use case. These kinds of frameworks reduce time and effort, and multiple tables can be processed in parallel with little extra work.

02-21-2018 05:20 AM
Hi @hippagun It won't work. Even though it is ORC, Hive will differentiate the columns based on the delimiter you specified during table creation, so simply re-creating the table won't help. There are two options you can take now:

1) Create another external table with the additional columns, then write a simple query that loads the records from the old table into the new one, supplying NULL for the newly added columns (a sketch is below). Once that is done, drop the old table and use the new one going forward. This works fine with ORC.

2) Alternatively, if the schema of the table changes frequently, it is better to go with an Avro table, since schema changes can be handled easily there. You have to follow the step above only for the first migration; whenever the schema changes in the future you just alter the schema file and nothing else is needed.

You can refer to this link to understand how schema changes are handled with Avro files. Hope it helps!!

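A minimal HiveQL sketch of option 1, assuming a hypothetical old table `old_table(col_a, col_b)` stored as ORC and one newly added column `col_c`; the table names, column names, and location are illustrative only:

```sql
-- Option 1 sketch: create a new ORC table with the extra column, copy the old
-- data across with NULL for the new column, then drop the old table.
-- All names and the location below are assumed for illustration.
CREATE EXTERNAL TABLE new_table (
  col_a STRING,
  col_b STRING,
  col_c STRING   -- newly added column
)
STORED AS ORC
LOCATION '/path/to/new_table';

INSERT INTO TABLE new_table
SELECT col_a, col_b, CAST(NULL AS STRING) AS col_c
FROM old_table;

DROP TABLE old_table;
```
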
02-20-2018 12:33 PM
Hi @Ravikiran Dasari If this is for learning purposes, then what I'm going to say adds nothing beyond the previous answers. But if you are looking for something work-related, this answer might help a bit. Have a file watcher that looks for a file with the particular pattern that has to be FTP'ed to the desired location. Once the file arrives, you can move it into HDFS. This can be accomplished with a simple shell script that requires only basic shell knowledge. It can also be done as either a push or a pull. If you have any downstream jobs that must run once the file lands in HDFS, I would recommend the pull approach, so that you can kick off any other Hadoop/Hive/Pig/Spark jobs from the HDFS side. Hope it helps!!

02-19-2018 05:12 AM
Hi @Lanic When you submit a job, it is YARN that provides the information about resources. The driver gets the HDFS data locations needed to execute the job from the NameNode, and then the available resources closest to the data are taken into consideration when deciding where the tasks will run. It is the NameNode that gives YARN the information about the HDFS data locations. Once all the tasks have completed, the status of all the jobs is updated and the corresponding metastore is brought back in sync. Hope it helps!!