Member since 09-03-2020 | Posts: 126 | Kudos Received: 7 | Solutions: 0
01-09-2026
09:51 PM
@allen_chu FYI

This issue, characterized by high CPU usage, a large number of threads stuck in DataXceiver, and a high load average, is a classic symptom of TCP socket leakage or hung connections in the HDFS Data Transfer Protocol. Based on your top output and jstack, here is a breakdown of what is happening and how to resolve it.

➤ Analysis of the Symptoms

1. CPU saturation (99% per thread): Your top output shows dozens of DataXceiver threads each consuming nearly 100% CPU. This usually indicates the threads are in a busy-wait or spinning state within the NIO epollWait call.

2. Stuck in epollWait: The jstack shows threads sitting in sun.nio.ch.EPollArrayWrapper.epollWait. While this is a normal state for a thread waiting on I/O, in your case these threads are likely waiting for a packet from a client that has already disconnected or is "half-closed", and the DataNode has not timed out the connection.

3. Thread exhaustion: With 792 threads, your DataNode is approaching its dfs.datanode.max.transfer.threads limit (default 4096, and often further constrained by the OS ulimit). As these threads accumulate, the DataNode loses the ability to accept new I/O requests and becomes unresponsive.

➤ Recommended Solutions

1. Tighten socket timeouts (immediate fix)

The most common cause is that the DataNode waits too long for a slow or dead client. Tighten the transfer timeouts in hdfs-site.xml so these "zombie" threads are forced to close:

- dfs.datanode.socket.write.timeout: the default is 480000 ms (8 minutes), and a value of 0 disables the timeout entirely. Lowering it to 300000 (5 minutes) drops stalled writers sooner.
- dfs.datanode.socket.reuse.keepalive: this is a keep-alive window in milliseconds (default 4000), not a boolean; it controls how long an idle connection is held open for reuse.
- dfs.datanode.transfer.socket.send.buffer.size and dfs.datanode.transfer.socket.recv.buffer.size: set these to 131072 (128 KB) to optimize throughput and prevent stalls.

2. Increase the max receiver threads

If your cluster handles high-concurrency workloads (such as Spark or HBase), the default thread count may be too low:

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>16384</value>
</property>

3. Check for "half-closed" network connections

Since the threads are stuck in read, the OS may be keeping sockets in CLOSE_WAIT or FIN_WAIT2 states.

a. Check socket status: netstat -anp | grep 9866 | awk '{print $6}' | sort | uniq -c
b. OS tuning: make the Linux kernel close dead connections more aggressively by adding the following to /etc/sysctl.conf:

net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20

4. Address HDFS-14569 (software bug)

Hadoop 3.1.1 is susceptible to a known issue where DataXceiver threads can leak during block moves or heavy Balancer activity: a DataXceiver fails to exit if a client stops sending data mid-packet but keeps the TCP connection open. If possible, upgrade to Hadoop 3.2.1+ or 3.3.x, which contain significantly improved NIO handling and better logic for terminating idle Xceivers.

➤ Diagnostic Step: Finding the "Bad" Clients

To identify which clients are causing this, run the following on the DataNode. (Note that netstat reports the process name, e.g. java, rather than thread names, so filter on the data transfer port instead of grepping for "DataXceiver".)

netstat -atnp | grep :9866 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr

This shows which IP addresses hold the most connections to the DataNode. If one specific IP (such as a single Spark executor or a particular user's edge node) has hundreds of connections, that client's code is likely not closing DFSClient instances correctly.
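As a quick way to watch the thread count against that ceiling, here is a minimal sketch (assumes you run it as the DataNode's own user with jps/jstack on the PATH; the PID lookup and grep pattern are assumptions, not something from your logs):

# Count live DataXceiver threads in the DataNode JVM and show the configured ceiling.
DN_PID=$(jps | awk '/DataNode/ {print $1}')
echo "DataXceiver threads: $(jstack "$DN_PID" | grep -c DataXceiver)"
hdfs getconf -confKey dfs.datanode.max.transfer.threads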
01-09-2026
07:51 AM
Thanks for the suggestion, I will go with distcp because we have hundreds of thousands of files and "only" several thousand of them must be restored.
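For reference, a minimal distcp sketch for that kind of selective restore; the list file, cluster address, and target path below are placeholders, not paths from this thread:

# Hypothetical example: restore only the files named (one URI per line) in a list file.
hdfs dfs -put restore-list.txt /tmp/restore-list.txt
hadoop distcp -f /tmp/restore-list.txt hdfs://namenode:8020/restore/target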
11-13-2025
03:14 AM
Hello, please try using the hdfs mover command. Refer: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool
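A minimal sketch of how it is typically used, assuming /data/cold is a placeholder path and COLD is the policy you want:

# Set the desired storage policy on the path, then run the mover to migrate its blocks.
hdfs storagepolicies -setStoragePolicy -path /data/cold -policy COLD
hdfs mover -p /data/cold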
07-02-2025
01:34 PM
Hi @rizalt, from your report you probably have snapshots enabled on this directory, so deletes in it will not actually free space unless the snapshots referencing that data are also deleted. Keep in mind that deleting a snapshot makes it impossible to recover the data later if needed. On the NameNode web UI, check the "Snapshot" tab to see which snapshots exist.
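You can check the same thing from the command line; a sketch, where /data/mydir and snapshot_name are placeholders for your directory and snapshot:

# List snapshottable directories, then the snapshots under the directory in question.
hdfs lsSnapshottableDir
hdfs dfs -ls /data/mydir/.snapshot
# Only if you are certain the data is no longer needed:
hdfs dfs -deleteSnapshot /data/mydir snapshot_name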
12-17-2024
12:41 PM
1 Kudo
@JSSSS The key part of the error is: "java.io.IOException: File /user/JS/input/DIC.txt._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation."

According to the log, all 3 DataNodes were excluded (excludeNodes=[192.168.1.81:9866, 192.168.1.125:9866, 192.168.1.8>). The write only needs to reach the 1 minReplication node, but with every DataNode excluded there is nowhere to place the block, so the write fails. HDFS cannot use these nodes, possibly due to:

- Disk space issues.
- Write errors or disk failures.
- Network connectivity problems between the NameNode and DataNodes.

1. Verify that the DataNodes are live and connected to the NameNode:
hdfs dfsadmin -report
Look at the "Live datanodes" and "Dead datanodes" sections. If all 3 DataNodes are excluded, they may show up as dead or decommissioned.

2. Ensure the DataNodes have sufficient disk space for the write operation:
df -h
Look at the HDFS data directories (e.g. /hadoop/hdfs/data). If disk space is full, clear unnecessary files or increase disk capacity:
hdfs dfs -rm -r /path/to/old/unused/files

3. View the list of excluded nodes:
cat $HADOOP_HOME/etc/hadoop/datanodes.exclude
If nodes are wrongly excluded, remove their entries from datanodes.exclude and refresh the NameNode to apply the change:
hdfs dfsadmin -refreshNodes

4. Block placement policy: if the cluster restricts placement (e.g. rack awareness), verify the block placement policy:
grep dfs.block.replicator.classname $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Default: org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault

Happy hadooping
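A quick way to run the first two checks together; a sketch, where the data directory path is a placeholder for your dfs.datanode.data.dir:

# Summarize DataNode liveness and remaining capacity as seen by the NameNode.
hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes|DFS Remaining'
# On each DataNode, check local disk usage for the data directory.
df -h /hadoop/hdfs/data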
12-16-2024
02:10 PM
1 Kudo
@divyank Have you resolved this issue? If not: what you're encountering is common when Kerberos is enabled for HDFS, because it introduces authentication requirements that must be properly configured. Here's how to diagnose and resolve the problem.

1. Root Cause Analysis
When Kerberos is enabled:
- Authentication: every interaction with HDFS now requires a Kerberos ticket.
- Misconfiguration: the HDFS service or client-side configuration may not be aligned with the Kerberos requirements.
- Keytabs: keytab files for the HDFS service or for users accessing the service may be missing or misconfigured.
- Browser access: the HDFS Web UI may not support unauthenticated access unless explicitly configured.

2. Steps to Resolve

Step 1: Verify the Kerberos configuration
Check the Kerberos principal and keytab file paths for HDFS in Cloudera Manager (HDFS Service > Configuration):
- hadoop.security.authentication should be set to kerberos.
- dfs.namenode.kerberos.principal should match the principal defined in the KDC.
- dfs.namenode.keytab.file should point to a file that exists on the NameNode with correct permissions.

Step 2: Validate the Kerberos ticket
Check that the HDFS service has a valid keytab:
klist -kte /path/to/hdfs.keytab
If needed, obtain a ticket:
kinit -kt /path/to/hdfs.keytab hdfs/<hostname>@<REALM>
Test HDFS access from the command line:
hdfs dfs -ls /
If you get authentication errors, the Kerberos ticket might be invalid.

Step 3: Validate HDFS Web UI access
After enabling Kerberos, accessing the HDFS Web UI (e.g. http://namenode-host:50070) often requires authentication. By default unauthenticated access may be blocked, so ensure your browser is configured for Kerberos authentication or the UI is set to allow unauthenticated users. To enable unauthenticated access in Cloudera Manager (if needed), go to HDFS Service > Configuration, search for hadoop.http.authentication.type and set it to simple.

Step 4: Review logs for errors
Check the NameNode logs for Kerberos-related errors:
less /var/log/hadoop/hdfs/hadoop-hdfs-namenode.log
Look for errors like "GSSException: No valid credentials provided" or "Principal not found in the keytab".

Step 5: Synchronize clocks
Kerberos is sensitive to time discrepancies. Ensure all nodes in the cluster have synchronized clocks:
ntpdate <NTP-server>

Step 6: Restart services
Restart the affected HDFS services (NameNode, DataNodes) via Cloudera Manager after making changes, then check the status of HDFS:
hdfs dfsadmin -report

3. Confirm Resolution
Verify HDFS functionality from the CLI:
hdfs dfs -ls /
Then access the Web UI to confirm it works: http://<namenode-host>:50070
If HDFS works via the CLI but not in the Web UI, revisit the Web UI settings in Cloudera Manager to allow browser access or configure browser Kerberos support.

4. Troubleshooting Tips
If the issue persists, check the Kerberos ticket validity with klist, and test basic operations:
hdfs dfs -mkdir /test
hdfs dfs -put <local-file> /test

Let me know how it goes or if further guidance is needed!
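If you want to test the Kerberized web endpoint from a shell rather than a browser, here is a sketch; it assumes WebHDFS is enabled, curl was built with GSS/SPNEGO support, and the keytab path, principal, host, port, and realm shown are placeholders:

# Obtain a ticket, then let curl negotiate SPNEGO against the NameNode web endpoint.
kinit -kt /path/to/hdfs.keytab hdfs/namenode-host@EXAMPLE.COM
curl --negotiate -u : "http://namenode-host:50070/webhdfs/v1/?op=LISTSTATUS"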
12-12-2024
10:08 AM
1 Kudo
@irshan When you add the Balancer as a role in the HDFS cluster, it will indeed show as not started; that is expected. As for your main query: when you run the balancer, the utilization spread may be within the default threshold of 10 percent, in which case it will not move any blocks. Reduce the balancing threshold and try again (see the sketch below).
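For example, to move blocks once the utilization spread exceeds 5 percent instead of the default 10:

# Run the balancer with a tighter threshold (percentage of cluster utilization).
hdfs balancer -threshold 5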
12-12-2024
10:02 AM
1 Kudo
@Remme Though the procedure you followed may have helped you, on a larger cluster with TBs of data it is not a viable option. In that case I would advise working with Cloudera Support.
12-12-2024
09:48 AM
1 Kudo
@cc_yang It is possible that an HDFS space quota has been set on the directory and the directory has reached its hard limit, which would cause the file upload to fail with an insufficient-space message. You can read more about HDFS quotas here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html
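A quick way to confirm; a sketch, where the path and quota value are placeholders:

# Show the name/space quotas on the directory and how much of each remains.
hdfs dfs -count -q -h /path/to/directory
# If appropriate, an administrator can raise or clear the space quota:
hdfs dfsadmin -setSpaceQuota 10t /path/to/directory
hdfs dfsadmin -clrSpaceQuota /path/to/directory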
12-12-2024
09:40 AM
Though one can intervene manually to fix under-replicated blocks, HDFS has matured a lot and the NameNode will take care of fixing under-replicated blocks on its own. The drawback of doing it manually is that it adds extra load to NameNode operations and may degrade the performance of existing jobs. So if you do plan to do it manually, do it outside business hours or over the weekend.
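If you do want to check or nudge it manually, here is a sketch; the file path is a placeholder:

# Count under-replicated blocks as reported by fsck.
hdfs fsck / | grep -i 'Under-replicated'
# Re-setting the replication factor on specific files also triggers re-replication.
hdfs dfs -setrep -w 3 /path/to/file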