Member since 10-21-2015
59 Posts
31 Kudos Received
16 Solutions
My Accepted Solutions
Views | Posted
---|---
3011 | 03-09-2018 06:33 PM
2734 | 02-05-2018 06:52 PM
13721 | 02-05-2018 06:41 PM
4324 | 11-30-2017 06:46 PM
1637 | 11-22-2017 06:20 PM
12-01-2017
06:19 PM
@Michael Bronson The steps you described look good. If you have Ambari running against this cluster, you should find an option called "Maintenance Mode" in the menus. Here is the documentation on that: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-operations/content/setting_maintenance_mode.html It is not needed to replace your disks, but it will avoid spurious alerts in your system.
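If the UI is inconvenient, maintenance mode can also be toggled through the Ambari REST API. A minimal sketch, with hypothetical host, cluster, and credential values; the script only prints the request, so drop the leading `echo` to actually send it:

```shell
# Hypothetical values: replace with your Ambari host, cluster name, and host FQDN.
AMBARI=http://ambari-host:8080
CLUSTER=mycluster
HOST=worker23.example.com
# maintenance_state accepts ON or OFF
PAYLOAD='{"RequestInfo":{"context":"Maintenance mode for disk swap"},"Body":{"Hosts":{"maintenance_state":"ON"}}}'
# Print the request instead of sending it; remove `echo` to call Ambari for real.
echo curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d "$PAYLOAD" \
  "$AMBARI/api/v1/clusters/$CLUSTER/hosts/$HOST"
```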
11-30-2017
06:54 PM
@kskp This is a strange problem. I would check the following things. 1. Has anyone run a massive delete operation? 2. Did you mount a disk that already had data on it -- that is, a set of data blocks pulled out of some other machine or cluster? 3. Keep the cluster in safe mode until you puzzle out what happened.
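The safe-mode and block checks above map to standard `hdfs` CLI commands (run as the HDFS superuser on a cluster node):

```shell
# Put HDFS into safe mode (read-only) while investigating
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode get          # should report: Safe mode is ON
# List files that currently have missing or corrupt blocks
hdfs fsck / -list-corruptfileblocks
```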
11-30-2017
06:46 PM
@Michael Bronson Assuming you are talking about a DataNode: if replacing the disk is trivial -- that is, you can simply pull it out of a JBOD -- then you can shut down the DataNode, replace the disk, format and mount it, and bring the node back. HDFS will detect that it has lost a set of blocks (probably it already has, since the disk is faulty and no I/O is happening to it) and re-replicate them correctly. You can check whether you have any under-replicated blocks in your cluster; once you replace the disk, things will return to normal.

There is, however, a small hitch: the new disk will not hold the same amount of data as the other disks. If you are running Hadoop 3.0 -- it is still in beta and not production ready -- you can run the diskBalancer tool, which moves data from the other disks onto the new one. Generally, this will not be an issue.

If disk replacement on your machines is not as straightforward as described, you can ask Ambari to put the machine into a maintenance state. That tells HDFS not to re-replicate all of its blocks after the 10-minute window (by default) when a machine would be declared dead; you can do that and then perform the operation.

Just so you are aware, HDFS supports a notion of failed volumes. If a DataNode has a large number of disks, say 8, you can set the failed-volume tolerance to something like 2. That node will then keep working even in the face of two disks failing. If you do that, you can replace disks during a scheduled maintenance window with downtime.

Please let me know if you have any more questions or need more help on this.
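The failed-volume tolerance mentioned above is a DataNode setting in hdfs-site.xml; a minimal fragment, using the value 2 from the 8-disk example:

```xml
<!-- hdfs-site.xml on each DataNode: tolerate up to 2 failed data volumes
     before the DataNode shuts itself down (the default is 0). -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>
```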
11-22-2017
06:20 PM
@ARUN It depends heavily on the type and nature of your data. It is also influenced by the applications that consume the data.
11-20-2017
11:40 PM
1 Kudo
@Michael Bronson Just to add to what @Xiaoyu Yao said: if you want to clean only one DataNode (worker23), you will need to ssh to that node. Then you can either format the data disks or delete the data. You can see in Ambari which paths map to the data store, but make sure those are actual disks before you run format. A safer approach is to run "rm -rf /data/disk1" and "rm -rf /data/disk2", assuming the DataNode stores its data under paths called /data/disk1 and /data/disk2. As @Xiaoyu Yao mentioned, please do NOT format the NameNode, or the whole cluster will be lost.
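The "safer approach" above can be sketched as a small loop. The /data/disk* paths are hypothetical (check dfs.datanode.data.dir in Ambari for the real ones); this demo runs against throwaway temp directories so it is safe to execute as-is:

```shell
# Stand-ins for the real data dirs, e.g. /data/disk1 /data/disk2
DATA_DIRS="$(mktemp -d)/disk1 $(mktemp -d)/disk2"
for d in $DATA_DIRS; do
  mkdir -p "$d/current"      # simulate existing DataNode block storage
  rm -rf "${d:?}/current"    # wipe the block data; keep the mount point itself
done
```

The `${d:?}` guard makes the script abort rather than run `rm -rf /current` if the variable is ever empty.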
09-08-2017
05:28 PM
Since you said only one node, I am not sure whether you mean there is only one physical machine in the cluster. If that is the case, you want to enable something called pseudo-distributed mode, where all processes -- that is, NameNode, DataNode, etc. -- run on the same machine. You can find instructions here: https://hortonworks.com/hadoop-tutorial/introducing-apache-ambari-deploying-managing-apache-hadoop/ On the other hand, if you want to run the NameNode on a single machine but have a set of physical nodes, you can use Ambari normally: select HDFS, and the standard installation does not assume you have HA. If you want to turn on HA, use the HA Wizard.
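For a hand-rolled pseudo-distributed setup (without Ambari), the core of the configuration is just two properties; a sketch along the lines of the Hadoop single-node setup guide, with the port number being the common default:

```xml
<!-- core-site.xml: point the filesystem at the local NameNode -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- hdfs-site.xml: 3x replication makes no sense on one machine -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```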
01-12-2017
06:19 PM
@Tomomichi Hirano Yes, it does.
01-12-2017
02:28 AM
@Tomomichi Hirano Could it be that the standby NameNode has too much garbage collection going on? You might want to look for GCs; if you see a GC happening each time your checkpoint runs, that might explain why the standby NameNode cannot keep up. If that is the case, then tuning the NameNode's memory settings is the right approach, rather than changing the block report frequency. When we do a checkpoint we have to decode a lot of small objects, and that can create memory pressure on the standby NameNode. Can you please check whether there is a correlation between GC and checkpointing?
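One way to eyeball that correlation is to pull checkpoint times from the NameNode log and full-GC pause windows from the GC log, then count overlaps. A rough sketch with a hypothetical helper (not an HDFS tool); the timestamps here are made-up example data:

```python
def checkpoints_during_gc(checkpoints, gc_pauses, slack=5.0):
    """Count checkpoints that fall inside (or within `slack` seconds of) a GC pause.

    checkpoints: list of epoch-second floats
    gc_pauses:   list of (start_epoch_seconds, duration_seconds) tuples
    """
    hits = 0
    for t in checkpoints:
        if any(start - slack <= t <= start + dur + slack
               for start, dur in gc_pauses):
            hits += 1
    return hits

# Example: 3 checkpoints, two of which coincide with GC pauses
print(checkpoints_during_gc([100.0, 200.0, 300.0],
                            [(98.0, 10.0), (295.0, 4.0)]))  # 2
```

If most checkpoints land inside GC windows, that points at heap pressure rather than block report volume.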
01-10-2017
06:15 PM
@Tomomichi Hirano Without understanding why you are seeing a "block report storm", it is hard to say whether increasing this parameter will help. Typically most clusters -- even very large ones -- seem to work fine with this parameter. Would you be able to share how many DataNodes are in this cluster and how many data blocks it has? If you have too many blocks, then block reports might slow down the active NameNode; I am surprised that you are impacted by performance issues on the standby NameNode. Could it be that you have GC issues and are seeing some kind of alerts from the standby?
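The node/block counts matter because each DataNode's full block report lists every replica it holds. A back-of-envelope sketch with assumed (not measured) numbers:

```python
def replicas_per_datanode(total_blocks, replication, datanodes):
    """Rough average number of block replicas each DataNode reports
    in one full block report, assuming even data distribution."""
    return total_blocks * replication // datanodes

# e.g. 50M blocks at 3x replication across 200 DataNodes
print(replicas_per_datanode(50_000_000, 3, 200))  # 750000 replicas per report
```

Reports in the hundreds of thousands of replicas per node are where NameNode processing cost starts to be felt.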
01-10-2017
03:16 AM
1 Kudo
@Tomomichi Hirano No, you cannot. Block reports serve an essential function: they allow the NameNode to reconcile the state of the cluster. These reports tell the NameNode which blocks are needed, which blocks are to be deleted, whether a block is under-replicated, and so on. Full block reports are expensive for the NameNode process (there are both incremental and full reports), so the interval is set to a longer value. However, if you set it to a really long value like one month, your NameNode might not work correctly. Typically the only reason to change this value is a NameNode under severe load, so if you are not experiencing such a problem, I would suggest that you don't change this parameter.
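For reference, the parameter in question lives in hdfs-site.xml; the fragment below shows it at its default value:

```xml
<!-- hdfs-site.xml: interval between full block reports, in milliseconds
     (21600000 ms = 6 hours, the default) -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value>
</property>
```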