Member since: 10-21-2015

Posts: 59
Kudos Received: 31
Solutions: 16

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 4193 | 03-09-2018 06:33 PM |
|  | 3417 | 02-05-2018 06:52 PM |
|  | 15764 | 02-05-2018 06:41 PM |
|  | 5152 | 11-30-2017 06:46 PM |
|  | 2080 | 11-22-2017 06:20 PM |

08-31-2018 06:44 PM

Thanks for the update. Glad you were able to make it work, and thanks for the comments and for sharing them with the community.

04-26-2018 06:22 PM

@Sriram Hadoop

> How can I change the block size for the existing files in HDFS? I want to increase the block size.

May I ask what you are trying to achieve? We might be able to make better suggestions if we know what problem you are trying to solve.

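For background on the question itself: the block size is fixed per file at write time, so an existing file only picks up a new block size when it is rewritten. A minimal sketch, assuming hypothetical paths and a 256 MB target:

```bash
# Block size is a per-file, write-time property; rewriting a file is what changes it.
# Copy an existing file into a new one with a 256 MB block size (268435456 bytes).
hdfs dfs -D dfs.blocksize=268435456 -cp /data/old/file.dat /data/new/file.dat

# Confirm the block size and block count of the rewritten file.
hdfs fsck /data/new/file.dat -files -blocks
```
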
04-26-2018 06:19 PM
1 Kudo

@Michael Bronson

> 1. Is it safe to run e2fsck -y /dev/sdf in order to repair the /dev/sdf file system?

Datanodes need to be able to read and write to the underlying file system, so if there is an error in the file system, we have no choice but to fix it. That said, HDFS keeps replicas of the same blocks on other machines, so you can put this node into maintenance mode in Ambari and fix the file system errors. There is a possibility of losing some data blocks, so if you have this error on more than one datanode, please fix them one by one, with some time in between. I would run fsck and then reboot the datanode machine to make sure everything is okay before starting work on the next node.

> 2. Is it necessary to do some other steps after running e2fsck -y /dev/sdf?

Not from the HDFS point of view. As I said, I would make sure I am doing this datanode by datanode and not in parallel.

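For reference, a rough outline of the one-node-at-a-time procedure described above (a sketch only; the device name comes from the question, while the mount point and the reboot step are assumptions that vary by environment):

```bash
# With the datanode in maintenance mode in Ambari and the DataNode process stopped:
umount /grid/0          # unmount the affected data disk (mount point is an example)
e2fsck -y /dev/sdf      # repair the ext file system, answering yes to all fixes
mount /grid/0           # remount the disk
reboot                  # optional: confirm the node comes back cleanly

# Afterwards, restart the DataNode, wait for it to re-register and for any
# under-replicated blocks to recover, then move on to the next datanode.
```
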
03-09-2018 06:34 PM

@smdas Sorry, forgot to tag you.

03-09-2018 06:33 PM

A quick search in the code base tells me that we have the following policies:

- AvailableSpaceBlockPlacementPolicy
- BlockPlacementPolicyDefault
- BlockPlacementPolicyRackFaultTolerant
- BlockPlacementPolicyWithNodeGroup
- BlockPlacementPolicyWithUpgradeDomain

> Yet I didn't find any documentation listing the available choices.

You are absolutely right; we can certainly do better on documenting this. Thanks for pointing it out. I will address this in an Apache JIRA.

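For anyone wondering where this is configured: the NameNode reads the policy class from hdfs-site.xml; in recent Hadoop releases the key is dfs.block.replicator.classname. A quick way to check, sketched below with one of the classes above as an example value:

```bash
# Print the block placement policy class configured on this cluster (if set in hdfs-site.xml).
hdfs getconf -confKey dfs.block.replicator.classname

# To change it, set the property in hdfs-site.xml and restart the NameNode, e.g.:
#   dfs.block.replicator.classname =
#     org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy
```
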
02-14-2018 05:07 PM

@PJ I am guessing that it could be related to the "dfs.namenode.startup.delay.block.deletion" value, since you mention that you restarted the cluster.

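For readers searching for the exact setting: the full key in hdfs-default.xml appears to be dfs.namenode.startup.delay.block.deletion.sec, which delays block deletions for a configurable number of seconds after a NameNode restart. A quick check, as a sketch:

```bash
# How long (in seconds) the NameNode waits after startup before issuing block deletions.
# A non-zero value means blocks marked for deletion linger for a while after a restart.
hdfs getconf -confKey dfs.namenode.startup.delay.block.deletion.sec
```
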
02-05-2018 06:52 PM

@Malay Sharma It really depends on your workload. If you have really large files, set your block size to 256 MB (this reduces the number of blocks in the file system, since you are storing more data per block); otherwise use the default 128 MB. If you have a running cluster, SmartSense can give you a view of the file size distribution on your current cluster, which will tell you whether you need to tune the block size for performance.

From experience, I can tell you that the performance of your cluster is not going to depend on block size unless you have a large number of files in the system.

Also, unlike a physical file system, setting the block size to 128 MB does not mean that each block write uses up 128 MB. HDFS only uses the number of bytes actually written, so there is no wasted space because of the block size.

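As a concrete follow-up, a few commands that help check what a cluster is actually doing before tuning anything (paths below are examples):

```bash
# Default block size for new files, in bytes (134217728 = 128 MB, 268435456 = 256 MB).
hdfs getconf -confKey dfs.blocksize

# Sizes under a directory tree, useful for eyeballing whether most files are much
# smaller than the block size.
hdfs dfs -du -h /data

# Block layout of one large file, to confirm how many blocks it actually occupies.
hdfs fsck /data/big/part-00000 -files -blocks
```
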
02-05-2018 06:41 PM
2 Kudos

@Malay Sharma HDFS caches all file names and block addresses in memory at the Namenode. This makes HDFS incredibly fast: modifications to the file system and lookups of file locations can be served with no disk I/O.

This design choice of keeping all metadata in memory at all times has trade-offs. One of them is that we need to spend a couple of hundred bytes per file and per block. This becomes an issue when you have a file system with 500 to 700 million files: the amount of RAM the Namenode needs to reserve becomes large, typically 256 GB or more. At that size the JVM is hard at work too, since it has to do things like garbage collection. There is another dimension as well: with 700 million files, it is quite possible that your cluster is serving 30-40K or more requests per second, which also creates a lot of memory churn.

So a large number of files, combined with a large number of file system requests, makes the Namenode a bottleneck in HDFS; in other words, the metadata that we need to keep in memory creates the bottleneck.

There are several solutions and works in progress to address this problem:

- HDFS federation, which is being shipped as part of HDP 3.0, allows many Namenodes to work against a set of Datanodes.
- HDFS-7240 tries to separate the block space from the namespace, which would immediately double or quadruple the effective size of the cluster.
- A good document that tracks various issues and different approaches to scaling the Namenode is "Uber scaling namenode".
- There is another approach where we send read workloads to the standby Namenode, freeing up the active Namenode and scaling it better. That work is tracked in "Consistent Reads from Standby Node".

Please let me know if you have any other questions.

Thanks
Anu

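To make the "hundreds of bytes per object" point concrete, here is a back-of-the-envelope estimate; the ~150 bytes per file and per block figure is a commonly quoted rule of thumb, not an exact number:

```bash
# Rough NameNode heap estimate: ~150 bytes per file (inode) plus ~150 bytes per block.
FILES=700000000           # 700 million files
BLOCKS=700000000          # assume roughly one block per file
BYTES=$(( (FILES + BLOCKS) * 150 ))
echo "$(( BYTES / 1024 / 1024 / 1024 )) GiB of metadata alone"   # ~195 GiB before JVM/GC overhead
```
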
12-05-2017 06:18 PM

From the error message, it looks like some of the services might not be running. Can you please make sure that ZooKeeper and the journal nodes are indeed running before starting the NameNode?

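A quick way to verify that on the relevant hosts, sketched below (the zkServer.sh path is an example for an HDP layout and will differ on other installs):

```bash
# On each ZooKeeper host: the quorum peer JVM should be up and report a mode (leader/follower).
jps | grep QuorumPeerMain
/usr/hdp/current/zookeeper-server/bin/zkServer.sh status

# On each JournalNode host: the JournalNode JVM should be running (default RPC port 8485).
jps | grep JournalNode
```
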
12-01-2017 06:23 PM
1 Kudo

@Sedat Kestepe Since you don't care about the data, from an HDFS perspective it is easier to reinstall your cluster. If you insist, I can lead you through the recovery steps, but if I were you I would just reinstall at this point.