Member since: 07-30-2019
Posts: 111
Kudos Received: 186
Solutions: 35
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3237 | 02-07-2018 07:12 PM |
| | 2447 | 10-27-2017 06:16 PM |
| | 2718 | 10-13-2017 10:30 PM |
| | 4984 | 10-12-2017 10:09 PM |
| | 1261 | 06-29-2017 10:19 PM |
06-17-2016
06:17 PM
Hi @Greenhorn Techie, yes, I agree the ideal placement policy would factor in both available space and I/O load. However, no current implementation does that. The property "dfs.datanode.fsdataset.volume.choosing.policy" is defined in hdfs-default.xml:

<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value></value>
<description>
The class name of the policy for choosing volumes in the list of
directories. Defaults to
org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.
If you would like to take into account available disk space, set the
value to
"org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy".
</description>
</property>
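For completeness, a minimal sketch of what opting in to the space-aware policy would look like in hdfs-site.xml. The two tuning properties are the standard ones from hdfs-default.xml, and the values shown are just the shipped defaults, not recommendations:

```xml
<!-- hdfs-site.xml (sketch): opt in to space-aware volume selection. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <!-- Volumes whose free space differs by less than this many bytes
       are considered balanced and are picked round-robin. -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <!-- Fraction of new block allocations steered toward volumes with more free space. -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```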
06-17-2016
03:58 AM
4 Kudos
Hi Artem, we do not recommend using AvailableSpaceVolumeChoosingPolicy. It can cause a subset of disk drives to become a bottleneck for writes; see HDFS-8538 for more discussion. A new HDFS tool called the DiskBalancer is under active development (HDFS-1312). It will let administrators recover from a skewed block distribution caused by replacing failed disks or adding new ones.
06-09-2016
06:31 PM
2 Kudos
Good writeup @Mingliang Liu. In addition to what @Chris Nauroth said, I also add -Dmaven.site.skip=true:

mvn clean package -Pdist,native -Dtar -DskipTests=true -Dmaven.site.skip=true -Dmaven.javadoc.skip=true
05-03-2016
01:27 AM
I ran into the same issue. The Ambari server log at `/var/log/ambari-server/ambari-server.log` showed:

Failed to execute kadmin:
Command: /usr/bin/kadmin -s c6401.ambari.apache.org -p kadmin/admin@EXAMPLE.COM -w ******** -r EXAMPLE.COM -q "get_principal kadmin/admin@EXAMPLE.COM"
ExitCode: 1
STDOUT: Authenticating as principal kadmin/admin@EXAMPLE.COM with password.
STDERR: kadmin: Communication failure with server while initializing kadmin interface

Sure enough, I had forgotten to start the kadmin service on the KDC host. After running `/etc/init.d/kadmin start` the error went away. HTH.
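If you hit the same symptom, a quick sanity check on the KDC host looks roughly like this (SysV init scripts as on RHEL/CentOS 6 assumed; the realm and principal are the ones from my logs above):

```sh
# On the KDC host: check whether kadmind is running, and start it if not.
# (On systemd-based hosts the unit is typically kadmin.service.)
service kadmin status
/etc/init.d/kadmin start

# Re-run the same query Ambari issues to confirm the fix
# (kadmin will prompt for the admin principal's password):
/usr/bin/kadmin -p kadmin/admin@EXAMPLE.COM -r EXAMPLE.COM \
    -q "get_principal kadmin/admin@EXAMPLE.COM"
```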
02-03-2016
11:28 PM
3 Kudos
Hi @Avinash C, the HDFS Architecture guide has a good description of the write pipeline (section 8.3.1).
02-03-2016
11:15 PM
2 Kudos
Hi @S Roy, using HDFS mounted via NFS would be a bad idea: an HDFS service writing its own logs to HDFS could deadlock on itself. As @Neeraj Sabharwal suggested, a local disk is best to make sure the logging store does not become a performance bottleneck. You can change the log4j settings to limit the size and number of log files, capping the total space they use (see the sketch below). You can also write a separate daemon that periodically copies log files into HDFS for long-term archival.
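As a rough sketch, the stock Hadoop log4j.properties already routes daemon logs through a RollingFileAppender (RFA), so capping disk usage is just a matter of bounding the file size and backup count. The values below are illustrative, not recommendations:

```properties
# log4j.properties (snippet): caps each daemon's log usage at roughly
# (MaxBackupIndex + 1) * MaxFileSize, i.e. ~5.25 GB with these values.
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```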
01-29-2016
08:29 PM
2 Kudos
Hi @AR, the '/logs' servlet is admin-only. There is no way to expose it to non-privileged users. HDFS administrators are configured via dfs.cluster.administrators (sketch below), although you obviously don't want to add arbitrary users to that list just to grant logs servlet access.
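For reference, a minimal sketch of the setting in hdfs-site.xml. The value is an ACL: a comma-separated user list, then a space, then a comma-separated group list; the names here are hypothetical:

```xml
<property>
  <name>dfs.cluster.administrators</name>
  <!-- Hypothetical names: user "hdfsadmin" plus everyone in group "ops". -->
  <value>hdfsadmin ops</value>
</property>
```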
01-28-2016
09:26 PM
Hi @luc tiber, the HDFS metadata layouts on NameNodes and DataNodes are quite different. If you are using HDP, I recommend doing an Ambari-based install.
01-27-2016
07:54 PM
5 Kudos
Hi @luc tiber, the BlockpoolID should be the same across the DataNodes and the NameNodes (the configuration is a little different for federated clusters, so let's ignore federation for now); you can compare the IDs on disk as shown below. The fact that the BPID is different indicates that the NameNode was either re-formatted or belongs to a different cluster, so just changing the BPID will not help, as the NameNode metadata may have been lost. If this is a dev/test cluster and you can afford to lose the data, I recommend redeploying the cluster from scratch as the quickest way to move forward. Otherwise, recovery will be more complicated.
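For anyone diagnosing this, the IDs live in the VERSION files on disk. A quick way to compare them, assuming HDP-style default storage directories (adjust the paths to your dfs.namenode.name.dir and dfs.datanode.data.dir):

```sh
# On the NameNode: blockpoolID and clusterID appear in the VERSION file.
cat /hadoop/hdfs/namenode/current/VERSION

# On a DataNode: clusterID is in the top-level VERSION file, and the
# block pool ID is the name of the BP-* directory beneath it.
cat /hadoop/hdfs/data/current/VERSION
ls -d /hadoop/hdfs/data/current/BP-*
```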
01-20-2016
11:53 PM
1 Kudo
That sounds wrong. The CAP theorem describes tradeoffs inherent to all distributed systems, and it applies equally to HDFS. Within HDFS we do make tradeoffs that prioritize consistency.
... View more