Member since: 07-30-2019
Posts: 111
Kudos Received: 186
Solutions: 35
My Accepted Solutions
Views | Posted
---|---
3448 | 02-07-2018 07:12 PM
2662 | 10-27-2017 06:16 PM
2895 | 10-13-2017 10:30 PM
5244 | 10-12-2017 10:09 PM
1357 | 06-29-2017 10:19 PM
08-17-2016
06:11 PM
Hi @William Bolton, are these applications accessing HDFS directly? What is the mode of access, e.g. the WebHDFS REST API, the Java APIs, or something else?
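For context, here is roughly what those two access modes look like (hostnames and ports below are examples, assuming Hadoop 2.x defaults):

```
# WebHDFS REST API: list a directory over HTTP (50070 is the default
# NameNode HTTP port in Hadoop 2.x; adjust for your cluster).
curl -i "http://nn-host.example.com:50070/webhdfs/v1/user/test?op=LISTSTATUS"

# HDFS client / Java API path: goes over the NameNode RPC port (8020 default).
hdfs dfs -ls hdfs://nn-host.example.com:8020/user/test
```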
08-17-2016
06:01 PM
Hi @jovan karamacoski, are you able to share what your overall goal is? The NameNode detects DataNode failures in roughly 10 minutes and queues re-replication work. Disk failures can take longer to detect, and we are planning improvements in this area soon. The re-replication logic is complex; if you think your changes will be broadly useful, please consider filing an issue in the Apache HDFS Jira and submitting the changes as a patch. Best, Arpit.
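For reference, the ~10 minute figure comes from the standard dead-node detection formula; a quick way to check the effective values on your cluster:

```
# A DataNode is marked dead after roughly:
#   2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
# With the defaults (300000 ms and 3 s): 2*300s + 10*3s = 630s, about 10.5 min.
hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval
hdfs getconf -confKey dfs.heartbeat.interval
```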
08-16-2016
11:58 PM
3 Kudos
Commenting to clarify that some of the advice above, while not wrong, can be dangerous. Starting with HDP 2.2, the DataNode is stricter about where it expects block files to be, so I do not recommend manually moving block files or folders around on DataNodes unless you really know what you are doing. @jovan karamacoski, to answer your original question: the NameNode drives re-replication (specifically the BlockManager class within the NameNode). The ReplicationMonitor thread wakes up periodically and computes re-replication work for DataNodes. The re-replication logic has multiple triggers, such as block reports, heartbeat timeouts, and decommissioning.
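If you want to watch the re-replication work the NameNode has queued, the standard tools report it directly:

```
# fsck summarizes replication health, including under-replicated
# and mis-replicated block counts.
hdfs fsck / | grep -i 'replicated'

# dfsadmin -report shows the cluster-wide "Under replicated blocks" total.
hdfs dfsadmin -report
```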
08-08-2016
07:43 PM
Thanks for the heads up @Kuldeep Kulkarni. It could be a couple of things: either the ephemeral port was in use by another process that has since exited, or a process is still using the port but running under different user credentials. @vijay kadel, were you running the ps/netstat/lsof commands as the root user?
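To rule out the credentials case, re-run the checks as root, since non-root users generally cannot see other users' processes behind a socket (port 1019 below is just an example):

```
# Run as root; show which process owns the port, if any.
netstat -anp | grep ':1019'
lsof -i :1019
```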
08-04-2016
09:53 PM
1 Kudo
The question is unclear to me, but I recommend reading the following three blog posts carefully; they go into great detail about balancer basics, configuration, and best practices:
https://community.hortonworks.com/articles/43615/hdfs-balancer-1-100x-performance-improvement.html
https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html
https://community.hortonworks.com/articles/44148/hdfs-balancer-3-cluster-balancing-algorithm.html
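For a concrete starting point, a typical balancer invocation looks like this (the threshold shown is the default; tune it per the articles above):

```
# Balance until every DataNode is within 10% of the average utilization.
hdfs balancer -threshold 10

# Optionally cap per-DataNode balancing bandwidth (bytes/sec; example value).
hdfs dfsadmin -setBalancerBandwidth 10485760
```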
08-04-2016
09:47 PM
2 Kudos
Hi @ripunjay godhani, we no longer recommend setting up NameNode HA with NFS. Instead, please use the Quorum Journal Manager (QJM) setup. The Apache HA with QJM documentation is a good starting point: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html In this setup, NameNode image files are stored on two nodes (the active and standby NameNodes). The latest edit logs live on the active NameNode and on at least two JournalNodes (usually all three, unless one JournalNode has extended downtime). The NameNodes can optionally be configured to also write their edit logs to separate NFS shares, but it is not necessary. You don't need RAID 10; HDFS HA with QJM provides good durability and availability on commodity hardware.
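To illustrate, with QJM the shared edits directory points at the JournalNode quorum rather than an NFS path (hostnames below are hypothetical):

```
hdfs getconf -confKey dfs.namenode.shared.edits.dir
# Example output (8485 is the default JournalNode port):
#   qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster
```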
08-04-2016
09:33 PM
1 Kudo
@sgowda, thanks for confirming you just want to mount the volumes at a new location. If you are only remounting, your existing HDFS metadata and data files will still be present, just under new Linux paths, so decommissioning is not necessary. You only need to update the NameNode and DataNode configuration settings (such as dfs.namenode.name.dir and dfs.datanode.data.dir) to point to the new locations; see the hdfs-default.xml reference for the full list of settings, though not all may apply to you. Do not reformat the NameNode, or you will lose all your data. The simplest approach is:
1. Take a full cluster downtime and bring down all HDFS services.
2. Remount the volumes at the new location on all affected nodes.
3. Update the NameNode and DataNode configurations via Ambari to point to the new storage roots.
4. Restart services.
A command-level sketch of these steps is below. If you are not familiar with these settings, I recommend learning more about HDFS first, since it's easy to lose data through administrative mistakes.
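A minimal sketch of those steps, with illustrative device and mount paths (yours will differ; use Ambari for the service stop/start and config edits):

```
# 1. With all HDFS services stopped, remount on each affected node.
umount /hadoop/hdfs/data              # old mount point (example)
mount /dev/sdb1 /data01/hdfs/data     # new mount point (example)

# 2. In Ambari, point the settings at the new storage roots, e.g.:
#      dfs.datanode.data.dir = /data01/hdfs/data
#      dfs.namenode.name.dir = /data01/hdfs/namenode
# 3. Restart HDFS, then verify all volumes and capacity came back.
hdfs dfsadmin -report
```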
08-03-2016
11:13 PM
Are you just mounting volumes at a new location?
08-01-2016
11:33 PM
3 Kudos
Hi @Facundo Bianco, you are using a privileged port number (1004) for data transfer, so you cannot enable SASL. Please check your hdfs-site.xml to ensure SASL is not enabled via dfs.data.transfer.protection. The Secure DataNode section of the Apache HDFS documentation describes this: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Secure_DataNode Since you are using HDP with Ambari, I recommend using the Ambari Kerberos Wizard, especially if you are setting up Kerberos for the first time; at the very least it will give you a working reference configuration. The Ambari Kerberos Wizard is documented here: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Ambari_Security_Guide/content/_running_the_kerberos_wizard.html
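A quick way to confirm the setting (the property name is the standard one from hdfs-site.xml):

```
# Prints authentication, integrity, or privacy if SASL protection is set;
# reports the key as missing if unset, which is what you want with port 1004.
hdfs getconf -confKey dfs.data.transfer.protection
```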
08-01-2016
08:16 PM
4 Kudos
If you have set up automatic failover with ZooKeeper Failover Controllers, the ZKFC processes will automatically transition the standby NameNode to active if the current active is unresponsive. The decision about which NameNode becomes active is made by the ZKFC instances (coordinating via ZooKeeper); Ambari does not decide which NameNode should be active. If you wish to perform a manual failover, you can use the hdfs haadmin command as @Sagar Shimpi suggested. Both alternatives are described in the HDP documentation:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_hadoop-ha/content/ha-nn-deploy-nn-cluster.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_hadoop-ha/content/nn-ha-auto-failover.html
If you want to better understand the internals of automatic NameNode failover (recommended if you administer a Hadoop cluster with HA), read the Apache docs, specifically the section on Automatic Failover: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
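For the manual path, the relevant commands look like this (nn1/nn2 are the logical NameNode IDs from dfs.ha.namenodes.<nameservice>; yours may differ):

```
# Check which NameNode is currently active/standby.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Graceful manual failover from nn1 to nn2.
hdfs haadmin -failover nn1 nn2
```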