Member since 06-17-2015

Posts: 61
Kudos Received: 20
Solutions: 4

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2607 | 01-21-2017 06:18 PM |
|  | 3101 | 08-19-2016 06:24 AM |
|  | 2032 | 06-09-2016 03:23 AM |
|  | 3777 | 05-27-2016 08:27 AM |

02-02-2017 02:37 AM

Yes, similar to this: https://community.hortonworks.com/questions/79103/what-is-the-best-way-to-store-small-files-in-hadoo.html#comment-80387

01-29-2017 06:26 PM

@mqureshi Thanks a lot for your help and guidance 🙂. Thanks for explaining in detail.

01-29-2017 03:46 PM

Thanks so much for answering; I think I am closer to an answer. Please elaborate on the solutions you advised below, as I am not very familiar with Hive:

1. Put the data into Hive. There are ways to put XML data into Hive: at a quick-and-dirty level there is an xpath UDF to work on XML data in Hive, or you can package it more cleanly by converting the XML to Avro and then using a SerDe to map the fields to column names. (Let me know if you want to go over this in more detail and I can help you there.)
2. Combine a bunch of files, zip them up, and upload them to HDFS. This option is good if your access is very cold (once in a while) and you are going to access the files physically (e.g., hadoop fs -get).

FYI below: please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har file in HDFS? Our application generates small XML files, which are stored on NAS, with the associated XML metadata in a DB. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive of, say, 10 GB; suppose each archive is 10 GB and the data is 3 months old. I want to know the best solution for storing and accessing this archived data in Hadoop (HDFS/Hive/HBase). Please advise which approach you think is better for reading this archived data. If I store the archived data in Hive, how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.

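A minimal sketch of the xpath-UDF route described above, assuming each XML document sits on a single line in HDFS. The table name, HDFS path, and XPath expressions are hypothetical; xpath_string is one of Hive's built-in XML UDFs.

```sh
# Expose the raw XML files as a one-column Hive table, one document
# per row (the path and table name are hypothetical).
hive -e "
CREATE EXTERNAL TABLE xml_raw (doc STRING)
LOCATION '/archive/xml/';

-- Extract fields from each document with the built-in xpath_string
-- UDF (the XPath expressions are illustrative).
SELECT xpath_string(doc, '/record/id')      AS id,
       xpath_string(doc, '/record/created') AS created
FROM xml_raw
LIMIT 10;
"
```

The Avro route trades this per-query parsing for a one-time conversion, after which the fields are ordinary columns.
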
01-29-2017 03:42 PM

Hi, thanks for your reply; I appreciate your inputs. Please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har file in HDFS? Our application generates small XML files, which are stored on NAS, with the associated XML metadata in a DB. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive of, say, 10 GB; suppose each archive is 10 GB and the data is 3 months old. I want to know the best solution for storing and accessing this archived data in Hadoop (HDFS/Hive/HBase). Please advise which approach you think is better for reading this archived data. If I store the archived data in Hive, how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.

01-29-2017 03:42 PM

@hduraiswamy I appreciate your inputs. Please advise on how to store and read archived data from Hive. While storing data in Hive, should I save it as a .har file in HDFS? Our application generates small XML files, which are stored on NAS, with the associated XML metadata in a DB. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive of, say, 10 GB; suppose each archive is 10 GB and the data is 3 months old. I want to know the best solution for storing and accessing this archived data in Hadoop (HDFS/Hive/HBase). Please advise which approach you think is better for reading this archived data. If I store the archived data in Hive, how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.

01-29-2017 03:40 PM

@mqureshi Hi, our application generates small XML files, which are stored on NAS, with the associated XML metadata in a DB. The plan is to extract the metadata from the DB into one file and compress the XML into one huge archive of, say, 10 GB; suppose each archive is 10 GB and the data is 3 months old. I want to know the best solution for storing and accessing this archived data in Hadoop (HDFS/Hive/HBase). Please advise which approach you think is better for reading this archived data. If I store the archived data in Hive, how do I retrieve it? Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.

01-22-2017 06:55 PM

Hi Team, consider a Hadoop cluster with a default block size of 64 MB. We have a case where we would like to use Hadoop to store historical data and retrieve it as needed. The historical data would be in the form of archives containing many small files (millions of them), which is why we would like to reduce the default block size to 32 MB. I also understand that changing the default to 32 MB may adversely affect applications on the same cluster that store very large files, so can anyone advise what to do in such situations?

Labels: Apache Hadoop

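One alternative worth noting here, as a sketch with hypothetical paths: dfs.blocksize is a client-side setting read at write time, so it can be overridden per command for the archive data alone, leaving the cluster default (and other applications) untouched.

```sh
# Write just the archive data with a 32 MB block size
# (33554432 bytes); the paths are hypothetical.
hdfs dfs -D dfs.blocksize=33554432 -put archive-2016Q4.tar /data/archive/
```
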
01-22-2017 06:23 PM

We have a situation where we have lots of small XML files residing on a Unix NAS, with the associated metadata in an Oracle DB. We want to combine the 3-month-old XML and its associated metadata into one archive file (10 GB) and store it in Hadoop. What is the best way to implement this? Note that after creating one big archive we will have many small files (each perhaps 1 MB or less) inside it, so I might reduce the block size to 32 MB, for example. I have read about Hadoop archive (.har) files and about storing the data in HBase; I would like to know the pros and cons from the Hadoop community's experience, and the recommended practice for such situations. Can you please advise? Also, how does reducing the HDFS block size to 32 MB to cater to this requirement look? I want to read this data from Hadoop whenever needed without affecting performance. Thanks in advance.

Labels: Apache Hadoop

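Since the question mentions .har files, here is a minimal sketch of creating and reading a Hadoop archive. The directory and archive names are hypothetical; the hadoop archive tool and the har:// scheme are standard.

```sh
# Pack everything under /staging/xml into one archive in /archive
# (names and paths are hypothetical; this launches a MapReduce job).
hadoop archive -archiveName xml-3mo.har -p /staging/xml /archive

# Archived files can be listed and read back through the har:// scheme.
hdfs dfs -ls har:///archive/xml-3mo.har
hdfs dfs -cat har:///archive/xml-3mo.har/example.xml
```

Note that a HAR stores its members uncompressed, so it addresses the small-files problem (NameNode memory, block count) rather than saving space.
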
01-21-2017 06:18 PM

Thanks for confirming. So what I wrote is correct, that is, changing dfs.blocksize; the restart will happen anyway.

01-19-2017 07:12 AM

Can you please advise whether we need to change any parameter other than dfs.blocksize in the HDFS config? Any other suggestions are also welcome, as long as they help save block space rather than waste it. Also, do we need any change on the DataNode side?

Labels: Apache Hadoop

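As a quick check after the change, a small sketch using the standard getconf utility:

```sh
# Print the effective client-side dfs.blocksize in bytes. Block size
# is chosen by the writing client and recorded per file, so DataNodes
# need no configuration change for this.
hdfs getconf -confKey dfs.blocksize
```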