Member since: 03-22-2017
Posts: 63
Kudos Received: 18
Solutions: 12

My Accepted Solutions
Title | Views | Posted
---|---|---
| 3347 | 07-08-2023 03:09 AM
| 6052 | 10-21-2021 12:49 AM
| 2848 | 04-01-2021 05:31 AM
| 3772 | 03-30-2021 04:23 AM
| 6573 | 03-23-2021 04:30 AM
03-01-2021
08:28 PM
Thanks @PabitraDas. Done as you suggested; the cluster is looking good now, without the alert. Thanks for the link too.
02-01-2021
10:35 AM
Hello @vvk,

Please note that while adding or removing JournalNodes in a running cluster, you need to keep a quorum of JournalNodes available for the NameNodes. (As cited in the shared document: NameNode high availability requires that you maintain at least three active JournalNodes in your cluster.) In other words, the NameNode must be able to write its edit log to at least a quorum of JournalNodes (2 of 3) at any given point. If it fails to write edits to a quorum, the NameNode is expected to crash (shut itself down). I believe this could be the scenario in your case.

So add the new JournalNodes to the cluster first, then remove the old JournalNodes one by one, ensuring a quorum of JournalNodes remains available throughout (a quick check of the current JournalNode set is sketched below). If the NN still crashes even though the edit-log write succeeded on a quorum of JNs, then we need to check the NN log for any other issues.

Thank you
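As a quick sanity check, a minimal sketch (not part of the original reply; your nameservice and JournalNode hostnames will differ) to list the JournalNodes the NameNode writes edits to and count them:

# Print the qjournal:// URI naming the JournalNodes used for shared edits,
# then count them so you can confirm a majority stays reachable while you
# add or remove nodes one at a time.
hdfs getconf -confKey dfs.namenode.shared.edits.dir
hdfs getconf -confKey dfs.namenode.shared.edits.dir | tr ';' '\n' | wc -l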
12-08-2020
06:26 AM
Hello @tuk, if the post by Pabitra assisted you, kindly mark it as the solution. If you used any other approach, kindly share the details in the post as well. Thanks, Smarak
11-11-2020
02:06 AM
Hello @Amn_468,

Since you reported the DN pause time, I was referring to the DN heap only. The block count on most of the DNs appears to be over 6 million, so I would suggest increasing the DN heap to 8 GB (from the current value of 6 GB) and performing a rolling restart to bring the new heap size into effect.

There is no straightforward way to say you have hit the small-files problem, but if your average block size is a few MB, or less than a MB, it is an indication that you are storing/accumulating small files in HDFS. The simplest way to check for small files in the cluster is to run fsck, which reports the average block size. If it is too low (e.g. ~1 MB), you might be hitting small-file problems and it would be worth looking into; otherwise, there is no need to review the number of blocks. A one-line filter for this is sketched at the end of this reply.

[..]
$ hdfs fsck /
...
Total blocks (validated): 2899 (avg. block size 11475601 B) <<<<<
[..]

You may refer to the below links for help with dealing with small files.
- https://blog.cloudera.com/small-files-big-foils-addressing-the-associated-metadata-and-application-challenges/
- https://community.cloudera.com/t5/Community-Articles/Identify-where-most-of-the-small-file-are-located-in-a-large/ta-p/247253
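If it helps, a minimal one-liner (not from the original reply; /data is just a placeholder path) to pull only the average block size out of the fsck output:

# Print only the average block size reported by fsck for the path of interest
# (replace /data with your own directory).
hdfs fsck /data | grep 'avg. block size'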
11-09-2020
10:20 PM
Thanks, I'm able to access the Hadoop CLI after commenting out the line.
11-09-2020
09:42 AM
Hello @Masood,

I believe you are asking for the commands to run in order to determine the active NN, apart from the CM UI (CM > HDFS > Instance > NameNode). From the CLI you have to run a couple of commands to determine the active/standby NN.

List the NameNode hostnames:
# hdfs getconf -namenodes
c2301-node2.coelab.cloudera.com c2301-node3.coelab.cloudera.com

Get the nameservice name:
# hdfs getconf -confKey dfs.nameservices
nameservice1

Get the active and standby NameNodes:
# hdfs getconf -confKey dfs.ha.namenodes.nameservice1
namenode11,namenode20
# su - hdfs
$ hdfs haadmin -getServiceState namenode11
active
$ hdfs haadmin -getServiceState namenode20
standby

Get the active and standby NameNode hostnames:
$ hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.namenode11
c2301-node2.coelab.cloudera.com:8020
$ hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.namenode20
c2301-node3.coelab.cloudera.com:8020

If you want to get the active NameNode hostname from the hdfs-site.xml file, you can go through the following Python script on GitHub: https://github.com/grakala/getActiveNN. For convenience, the same steps are also wrapped into a small loop sketched below.

Thank you
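A small wrap-up sketch (assuming the nameservice is nameservice1 and that the commands run as a user allowed to query HA state, e.g. hdfs) that loops over the NameNode IDs and prints each one's RPC address and state:

# Assumption: nameservice1; the NameNode IDs are taken from getconf at runtime.
NS=nameservice1
for nn in $(hdfs getconf -confKey dfs.ha.namenodes.${NS} | tr ',' ' '); do
  addr=$(hdfs getconf -confKey dfs.namenode.rpc-address.${NS}.${nn})
  state=$(hdfs haadmin -getServiceState "${nn}")
  echo "${nn}  ${addr}  ${state}"
done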
11-09-2020
09:06 AM
Hello @AlexP,

Ref: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep

Referring to the HDFS documentation, answers to your questions are inline.

[Q1.] How to estimate how much time would this command take for a single directory (without -w)?
[A1.] It depends upon the number of files in the directory. If you run setrep against a path that is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at that path. The time varies depending on the file count under the path/directory.

[Q2.] Will it trigger a replication job even if I don't use the '-w' flag?
[A2.] Yes, replication will be triggered without the -w flag. However, it is good practice to use -w to ensure all files have the required replication factor before the command exits. Please note, the -w flag requests that the command wait for the replication to complete. Although using -w can make the command take a long time to finish, it guarantees the replication factor has been changed to the specified value.

[Q3.] If yes, does it mean that the NameNode will actually start deleting 'over-replicated' blocks of all existing files under a particular directory?
[A3.] Yes, your understanding is correct. The additional replica of each block marks the block as over-replicated, and it will be deleted from the cluster. This is done for each file under the directory path, keeping only 2 replicas of the file blocks.

A brief usage example is sketched below. Hope this helps.
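For reference, a hedged usage example (the path and replication factor are only illustrative, not taken from the original question):

# Recursively set replication to 2 for everything under /data/archive and wait
# (-w) until all blocks reach the target factor before the command returns.
hdfs dfs -setrep -w 2 /data/archive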
10-10-2020
11:35 AM
1 Kudo
@mike_bronson7 Always stick to the Cloudera documentation. Yes, there is no risk in running that command; I can understand your reservation.
09-30-2020
10:14 PM
Hello @vincentD,

Please review the stdout and stderr of the DN that is going down frequently. You can navigate to CM > HDFS > Instance > select the DN that went down > Processes > click on stdout/stderr at the bottom of the page. I am asking you to verify stdout/stderr because I suspect an OOM error (the Java heap running out of memory) is causing the DN to exit/shut down abruptly.

If the DN exit is due to an OOM error, please increase the DN heap size to an adequate value to resolve the issue. The DN heap sizing rule of thumb is: 1 GB of heap memory per 1 million blocks (a rough calculation is sketched below). You can verify the block count on each DN by navigating to CM > HDFS > NN Web UI > Active NN > DataNodes; that page shows the DN stats, including block counts and disk usage.
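As a rough illustration of that rule of thumb (the block count below is only an example, not your actual figure):

# 1 GB of DN heap per 1 million blocks, rounded up; treat the result as a
# floor and add headroom on top of it.
BLOCKS=6200000
echo "suggested minimum DN heap (GB): $(( (BLOCKS + 999999) / 1000000 ))"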
09-30-2020
09:02 AM
Thank you for verifying!