About Majeti

Majeti · ‎06-06-2024

Hi @rizalt, you may be hitting this known issue: https://docs.cloudera.com/runtime/7.1.2/release-notes/topics/rt-known-issues-ambari.html . You can try the workaround described there for now.

Majeti · ‎06-05-2024

Hi @rizalt, can you verify that the principal exists in the KDC admin database? For example, at the kadmin prompt:
kadmin: listprincs hdfs*

Majeti · ‎06-04-2024

Hi @rizalt, have you tried running "kinit admin/admin@HADOOP.COM" from one of your cluster nodes or from the Ambari server? That would confirm that krb5.conf is fine and that this principal can be found in the KDC server with the given password.

Majeti · ‎05-24-2024

Hi @hiralal, here is another link if you would like to check it out: https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/hadoop_tokens.html

Majeti · ‎05-24-2024

Hi @NaveenBlaze, you can get more info from https://github.com/c9n/hadoop/blob/master/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java#L196 . Notice these lines in the doTailEdits method:

    FSImage image = namesystem.getFSImage();
    streams = editLog.selectInputStreams(lastTxnId + 1, 0, null, false);
    editsLoaded = image.loadEdits(streams, namesystem);
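
To illustrate what those lines amount to, here is a minimal self-contained sketch of the tailing idea: select every edit after the last applied transaction ID and apply it in order. This is not Hadoop's implementation. The class TailEditsSketch, its nested Edit type, and the list standing in for the journal are hypothetical names made up for illustration, and the sketch assumes the journal is already sorted by transaction ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the real EditLogTailer.
class TailEditsSketch {

    /** Hypothetical stand-in for one edit-log transaction. */
    static final class Edit {
        final long txId;
        final String op;

        Edit(long txId, String op) {
            this.txId = txId;
            this.op = op;
        }
    }

    private long lastAppliedTxId;
    private final List<String> appliedOps = new ArrayList<>();

    TailEditsSketch(long lastAppliedTxId) {
        this.lastAppliedTxId = lastAppliedTxId;
    }

    /**
     * Applies every edit with a txId greater than lastAppliedTxId,
     * mirroring the "lastTxnId + 1" starting point in doTailEdits.
     * Assumes the journal is sorted by ascending txId.
     * Returns the number of edits loaded.
     */
    int tailEdits(List<Edit> journal) {
        int editsLoaded = 0;
        for (Edit edit : journal) {
            if (edit.txId <= lastAppliedTxId) {
                continue; // already reflected in the in-memory namespace
            }
            appliedOps.add(edit.op);
            lastAppliedTxId = edit.txId;
            editsLoaded++;
        }
        return editsLoaded;
    }

    long getLastAppliedTxId() {
        return lastAppliedTxId;
    }

    List<String> getAppliedOps() {
        return appliedOps;
    }

    public static void main(String[] args) {
        TailEditsSketch standby = new TailEditsSketch(2L);
        List<Edit> journal = Arrays.asList(
                new Edit(1L, "mkdir /a"),
                new Edit(2L, "mkdir /b"),
                new Edit(3L, "create /a/f1"),
                new Edit(4L, "delete /b"));
        int loaded = standby.tailEdits(journal);
        System.out.println(loaded + " edits loaded, last txid " + standby.getLastAppliedTxId());
    }
}
```

Calling tailEdits again with the same journal loads nothing, because the starting point has already moved past the last transaction.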

Majeti · ‎05-14-2024

Hi @NaveenBlaze, here is a summarized view of checkpointing. If it does not help, let me know which details you want.

1. During startup, the Standby NameNode loads the latest filesystem image from disk into memory. It also retrieves and applies any remaining edit log files from the Journal nodes to update its namespace.
2. This merging process happens in memory each time, ensuring that the namespace stays up to date with the latest changes.
3. After startup, the Standby NameNode checks for new edit log files every dfs.ha.tail-edits.period (default: 60 seconds). It streams any new edits directly from the Journal nodes into memory to keep the namespace updated.
4. Additionally, the Standby NameNode checks every dfs.namenode.checkpoint.check.period (default: 60 seconds) whether the number of un-checkpointed transactions has reached the dfs.namenode.checkpoint.txns threshold (default: 1,000,000). If it has, the Standby NameNode performs a checkpoint.
5. Even if that threshold has not been reached, once dfs.namenode.checkpoint.period (default: 3600 seconds, i.e. 1 hour) has elapsed since the last checkpoint, the Standby NameNode performs a mandatory checkpoint by saving all accumulated namespace changes from memory to disk (saveNamespace).
6. After the checkpoint, the Standby NameNode asks the Active NameNode to fetch the newly built filesystem image. The Active NameNode streams it and saves it to disk for future restarts.
7. It's important to note that the edit logs stored in the Journal nodes serve as the primary source of truth during startup for both the Active and Standby NameNodes.
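
The checkpoint decision described above can be sketched as a small predicate: a checkpoint is due when either the un-checkpointed transaction count reaches its threshold or the checkpoint period has elapsed. This is a simplified sketch and not the code Hadoop runs. The class and method names are hypothetical, and the constants simply repeat the defaults quoted in the post (1,000,000 transactions and 3600 seconds).

```java
// Hypothetical illustration of the Standby NameNode checkpoint decision.
class CheckpointTrigger {

    // Defaults as quoted in the post above.
    static final long DEFAULT_TXN_THRESHOLD = 1_000_000L; // dfs.namenode.checkpoint.txns
    static final long DEFAULT_PERIOD_SECS = 3_600L;       // dfs.namenode.checkpoint.period

    /**
     * Returns true when a checkpoint is due: either enough un-checkpointed
     * transactions have accumulated, or enough time has passed since the
     * last checkpoint.
     */
    static boolean shouldCheckpoint(long uncheckpointedTxns,
                                    long secsSinceLastCheckpoint,
                                    long txnThreshold,
                                    long periodSecs) {
        boolean txnTrigger = uncheckpointedTxns >= txnThreshold;
        boolean timeTrigger = secsSinceLastCheckpoint >= periodSecs;
        return txnTrigger || timeTrigger;
    }

    public static void main(String[] args) {
        // Busy cluster: the transaction threshold is hit long before the period ends.
        System.out.println(shouldCheckpoint(1_200_000L, 300L,
                DEFAULT_TXN_THRESHOLD, DEFAULT_PERIOD_SECS));
        // Quiet cluster: few transactions, but the period has elapsed.
        System.out.println(shouldCheckpoint(42L, 3_600L,
                DEFAULT_TXN_THRESHOLD, DEFAULT_PERIOD_SECS));
        // Neither condition is met yet.
        System.out.println(shouldCheckpoint(42L, 300L,
                DEFAULT_TXN_THRESHOLD, DEFAULT_PERIOD_SECS));
    }
}
```

In a real cluster this evaluation would run once per dfs.namenode.checkpoint.check.period, so the effective delay before a checkpoint starts can exceed either threshold by up to one check interval.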

Majeti · ‎05-14-2024

Hi @hiralal, your Java code looks correct, and I verified that it works fine in my lab. I hope you have already tested with the kinit command before running this code. Please attach your /etc/krb5.conf. Also try running "java -Dsun.security.krb5.debug=true -cp `hadoop classpath`:. HdfsExample" to get more info on why it is failing. It is failing to get a TGT.

Majeti · ‎04-03-2024

Hi @s198, you do not need a Hadoop file system or a DataNode role on the remote server. You just need to set up an HDFS gateway on the remote server and pull the data using distcp. If you are using HDP or CDP, you can add the remote server as a gateway and run distcp on the remote server. Another option is to share one of the directories on the remote server, mount it on a Hadoop cluster node, and run distcp to that mounted directory.

Majeti · ‎03-22-2024

Introduction:
In large Hadoop clusters, efficiently managing block replication and decommissioning of DataNodes is crucial for maintaining system performance and reliability. However, updating Namenode configuration parameters to optimize these processes often requires a Namenode restart, causing downtime and potential disruptions to cluster operations. In this article, we'll explore a procedure to expedite block replication and DataNode decommissioning in HDFS without the need for a Namenode restart.

Procedure:

1. Identify Namenode Process Directory: Locate the Namenode process directory for the current active Namenode. This directory typically resides in /var/run/cloudera-scm-agent/process/ followed by a folder that looks like "###-hdfs-NAMENODE".

2. Modify Configuration Parameters: Edit the hdfs-site.xml file in the Namenode process directory. Adjust the following parameters to the recommended values:
   dfs.namenode.replication.max-streams: Increase to a recommended value (e.g., 100).
   dfs.namenode.replication.max-streams-hard-limit: Increase to a recommended value (e.g., 200).
   dfs.namenode.replication.work.multiplier.per.iteration: Increase to a recommended value (e.g., 100).

3. Apply Configuration Changes: Execute the command below to initiate the reconfiguration process:
   # hdfs dfsadmin -reconfig namenode <namenode_address> start
   <namenode_address> can be found from the value of "dfs.namenode.rpc-address" in hdfs-site.xml.

4. Verify Configuration Changes: Monitor the reconfiguration status using the command:
   # hdfs dfsadmin -reconfig namenode <namenode_address> status
   Upon completion, verify that the configuration changes have been successfully applied. The output should look something like this:
   # hdfs dfsadmin -reconfig namenode namenode_hostname:8020 status
   Reconfiguring status for node [namenode_hostname:8020]: started at Fri Mar 22 08:15:12 UTC 2024 and finished at Fri Mar 22 08:15:12 UTC 2024.
   SUCCESS: Changed property dfs.namenode.replication.max-streams-hard-limit From: "40" To: "200"
   SUCCESS: Changed property dfs.namenode.replication.work.multiplier.per.iteration From: "10" To: "100"
   SUCCESS: Changed property dfs.namenode.replication.max-streams From: "20" To: "100"

5. Revert Configuration Changes (Optional): If needed, revert to the original configuration values by repeating the above steps with the original parameter values.

Conclusion:
By following the outlined procedure, administrators can expedite block replication and DataNode decommissioning in HDFS without the need for a Namenode restart. This approach minimizes downtime and ensures efficient cluster management, even in environments where Namenode High Availability is not yet implemented or desired.

Note: It's recommended to test configuration changes in a non-production environment before applying them to a live cluster to avoid potential disruptions. Additionally, consult the Hadoop documentation and consider any specific requirements or constraints of your cluster environment before making configuration modifications.
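
When the verification step is scripted, the status output can be checked mechanically rather than by eye. The sketch below parses the "SUCCESS: Changed property" lines and collects the new value of each property. It assumes the output has exactly the shape of the sample shown in the article; the class name ReconfigStatusParser and its method are hypothetical, and real output may differ between Hadoop versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is based only on the sample output in the article.
class ReconfigStatusParser {

    // Matches: SUCCESS: Changed property <name> From: "<old>" To: "<new>"
    private static final Pattern CHANGED = Pattern.compile(
            "SUCCESS: Changed property (\\S+)\\s+From: \"([^\"]*)\"\\s+To: \"([^\"]*)\"");

    /** Returns property name -> new value for every successfully changed property. */
    static Map<String, String> appliedValues(String statusOutput) {
        Map<String, String> applied = new LinkedHashMap<>();
        Matcher matcher = CHANGED.matcher(statusOutput);
        while (matcher.find()) {
            applied.put(matcher.group(1), matcher.group(3));
        }
        return applied;
    }

    public static void main(String[] args) {
        // Sample text copied from the article's example output.
        String sample =
                "SUCCESS: Changed property dfs.namenode.replication.max-streams-hard-limit From: \"40\" To: \"200\"\n"
              + "SUCCESS: Changed property dfs.namenode.replication.work.multiplier.per.iteration From: \"10\" To: \"100\"\n"
              + "SUCCESS: Changed property dfs.namenode.replication.max-streams From: \"20\" To: \"100\"\n";
        Map<String, String> applied = appliedValues(sample);
        for (Map.Entry<String, String> entry : applied.entrySet()) {
            System.out.println(entry.getKey() + " is now " + entry.getValue());
        }
    }
}
```

A wrapper script could compare the returned map against the values written to hdfs-site.xml and flag any property that is missing from the output.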

Majeti · ‎01-08-2024

Hi, that's your Standby Namenode (SBNN). Please verify whether it is performing checkpointing. Please perform one checkpoint from Cloudera Manager to clear the health test.

Last Visited	‎01-12-2025 01:13 AM

Member Since	‎03-06-2019 08:56 PM
Posts	113
Kudos received	5

Cloudera Community

Re: HDFS cluster in HA enabled, during check point...

Re: org.apache.hadoop.security.AccessControlExcept...

Re: Data recovery in Erasure coding.

Re: hadoop client kerberos error

Re: [KERBEROS] Failed to kinit as the KDC administ...

Re: Can we use sqoop to Transfer file from a HDFS ...

Accelerating Replication and Decommissioning in HD...

Re: Encountered exception loading fsimage, NameNod...