Member since: 03-22-2017
Posts: 63
Kudos Received: 18
Solutions: 12

My Accepted Solutions
Title | Views | Posted
---|---|---
| 3347 | 07-08-2023 03:09 AM
| 6052 | 10-21-2021 12:49 AM
| 2848 | 04-01-2021 05:31 AM
| 3772 | 03-30-2021 04:23 AM
| 6573 | 03-23-2021 04:30 AM
03-01-2021
08:28 PM
Thanks @PabitraDas. Done as you suggested; the cluster is looking good now, without the alert. Thanks for the link too.
02-01-2021
10:35 AM
Hello @vvk,

Please note that while adding or removing JournalNodes in a running cluster, you need to keep a quorum of JournalNodes available for the NameNodes. (As cited in the shared document: NameNode high availability requires that you maintain at least three active JournalNodes in your cluster.) In other words, the NameNode must be able to write its edit log to at least a quorum of JournalNodes (2 of 3) at any given point. If it fails to write edits to a quorum, the NameNode is expected to crash (shut itself down). I believe this could be the scenario in your case.

So add the new JournalNodes to the cluster first, then remove the old JournalNodes one by one, ensuring a quorum of JournalNodes remains available throughout (a quick check of the current JournalNode set is sketched below). If the NN still crashes even though the edit-log write succeeded on a quorum of JNs, then we need to check the NN log for any other issues.

Thank you
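As a quick sanity check, a minimal sketch (not part of the original reply; your nameservice and JournalNode hostnames will differ) to list the JournalNodes the NameNode writes edits to and count them:

# Print the qjournal:// URI naming the JournalNodes used for shared edits,
# then count them so you can confirm a majority stays reachable while you
# add or remove nodes one at a time.
hdfs getconf -confKey dfs.namenode.shared.edits.dir
hdfs getconf -confKey dfs.namenode.shared.edits.dir | tr ';' '\n' | wc -l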
12-08-2020
06:26 AM
Hello @tuk, if the post by Pabitra assisted you, kindly mark it as the solution. If you used any other approach, kindly share the details in the post as well. Thanks, Smarak
11-11-2020
02:06 AM
Hello @Amn_468,

Since you reported the DN pause time, I was referring to the DN heap only. The block count on most of the DNs appears to be over 6 million, so I would suggest increasing the DN heap to 8 GB (from the current value of 6 GB) and performing a rolling restart to bring the new heap size into effect.

There is no straightforward way to say you have hit the small-files problem, but if your average block size is a few MB, or less than a MB, it is an indication that you are storing/accumulating small files in HDFS. The simplest way to check for small files in the cluster is to run fsck, which reports the average block size. If it is too low (e.g. ~1 MB), you might be hitting small-file problems and it would be worth looking into; otherwise, there is no need to review the number of blocks. A one-line filter for this is sketched at the end of this reply.

[..]
$ hdfs fsck /
...
Total blocks (validated): 2899 (avg. block size 11475601 B) <<<<<
[..]

You may refer to the below links for help with dealing with small files.
- https://blog.cloudera.com/small-files-big-foils-addressing-the-associated-metadata-and-application-challenges/
- https://community.cloudera.com/t5/Community-Articles/Identify-where-most-of-the-small-file-are-located-in-a-large/ta-p/247253
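If it helps, a minimal one-liner (not from the original reply; /data is just a placeholder path) to pull only the average block size out of the fsck output:

# Print only the average block size reported by fsck for the path of interest
# (replace /data with your own directory).
hdfs fsck /data | grep 'avg. block size'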
11-09-2020
10:20 PM
Thanks, I'm able to access the Hadoop CLI after commenting out the line.
11-09-2020
09:42 AM
Hello @Masood,

I believe you are asking for the commands to run in order to determine the active NN, apart from the CM UI (CM > HDFS > Instance > NameNode). From the CLI you have to run a couple of commands to determine the active/standby NN.

List the NameNode hostnames:
# hdfs getconf -namenodes
c2301-node2.coelab.cloudera.com c2301-node3.coelab.cloudera.com

Get the nameservice name:
# hdfs getconf -confKey dfs.nameservices
nameservice1

Get the active and standby NameNodes:
# hdfs getconf -confKey dfs.ha.namenodes.nameservice1
namenode11,namenode20
# su - hdfs
$ hdfs haadmin -getServiceState namenode11
active
$ hdfs haadmin -getServiceState namenode20
standby

Get the active and standby NameNode hostnames:
$ hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.namenode11
c2301-node2.coelab.cloudera.com:8020
$ hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.namenode20
c2301-node3.coelab.cloudera.com:8020

If you want to get the active NameNode hostname from the hdfs-site.xml file, you can go through the following Python script on GitHub: https://github.com/grakala/getActiveNN. For convenience, the same steps are also wrapped into a small loop sketched below.

Thank you
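A small wrap-up sketch (assuming the nameservice is nameservice1 and that the commands run as a user allowed to query HA state, e.g. hdfs) that loops over the NameNode IDs and prints each one's RPC address and state:

# Assumption: nameservice1; the NameNode IDs are taken from getconf at runtime.
NS=nameservice1
for nn in $(hdfs getconf -confKey dfs.ha.namenodes.${NS} | tr ',' ' '); do
  addr=$(hdfs getconf -confKey dfs.namenode.rpc-address.${NS}.${nn})
  state=$(hdfs haadmin -getServiceState "${nn}")
  echo "${nn}  ${addr}  ${state}"
done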
11-09-2020
09:06 AM
Hello @AlexP,

Ref: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep

Referring to the HDFS documentation, answers to your questions are inline.

[Q1.] How to estimate how much time would this command take for a single directory (without -w)?
[A1.] It depends upon the number of files in the directory. If you run setrep against a path that is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at that path. The time varies depending on the file count under the path/directory.

[Q2.] Will it trigger a replication job even if I don't use the '-w' flag?
[A2.] Yes, replication will be triggered without the -w flag. However, it is good practice to use -w to ensure all files have the required replication factor before the command exits. Please note, the -w flag requests that the command wait for the replication to complete. Although using -w can make the command take a long time to finish, it guarantees the replication factor has been changed to the specified value.

[Q3.] If yes, does it mean that the NameNode will actually start deleting 'over-replicated' blocks of all existing files under a particular directory?
[A3.] Yes, your understanding is correct. The additional replica of each block marks the block as over-replicated, and it will be deleted from the cluster. This is done for each file under the directory path, keeping only 2 replicas of the file blocks.

A brief usage example is sketched below. Hope this helps.
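For reference, a hedged usage example (the path and replication factor are only illustrative, not taken from the original question):

# Recursively set replication to 2 for everything under /data/archive and wait
# (-w) until all blocks reach the target factor before the command returns.
hdfs dfs -setrep -w 2 /data/archive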
10-10-2020
11:35 AM
1 Kudo
@mike_bronson7 Always stick to the Cloudera documentation. Yes, there is no risk in running that command; I can understand your reservation.
09-30-2020
10:14 PM
Hello @vincentD,

Please review the stdout and stderr of the DN that is going down frequently. You can navigate to CM > HDFS > Instance > select the DN that went down > Processes > click on stdout/stderr at the bottom of the page. I am asking you to verify stdout/stderr because I suspect an OOM error (the Java heap running out of memory) is causing the DN to exit/shut down abruptly.

If the DN exit is due to an OOM error, please increase the DN heap size to an adequate value to resolve the issue. The DN heap sizing rule of thumb is: 1 GB of heap memory per 1 million blocks (a rough calculation is sketched below). You can verify the block count on each DN by navigating to CM > HDFS > NN Web UI > Active NN > DataNodes; that page shows the DN stats, including block counts and disk usage.
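As a rough illustration of that rule of thumb (the block count below is only an example, not your actual figure):

# 1 GB of DN heap per 1 million blocks, rounded up; treat the result as a
# floor and add headroom on top of it.
BLOCKS=6200000
echo "suggested minimum DN heap (GB): $(( (BLOCKS + 999999) / 1000000 ))"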
09-30-2020
09:02 AM
Thank you for verifying!