Member since: 07-30-2019
Posts: 111
Kudos Received: 186
Solutions: 35
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3237 | 02-07-2018 07:12 PM |
| | 2447 | 10-27-2017 06:16 PM |
| | 2718 | 10-13-2017 10:30 PM |
| | 4984 | 10-12-2017 10:09 PM |
| | 1261 | 06-29-2017 10:19 PM |
06-17-2016
06:17 PM
Hi @Greenhorn Techie, yes, I agree the ideal placement policy would factor in both available space and I/O load. However, no current implementation does that. The property "dfs.datanode.fsdataset.volume.choosing.policy" is defined in hdfs-default.xml:

<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value></value>
<description>
The class name of the policy for choosing volumes in the list of
directories. Defaults to
org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.
If you would like to take into account available disk space, set the
value to
"org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy".
</description>
</property>
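For completeness, a minimal sketch of what opting in to the space-aware policy would look like in hdfs-site.xml. The two tuning properties are the standard ones from hdfs-default.xml, and the values shown are just the shipped defaults, not recommendations:

```xml
<!-- hdfs-site.xml (sketch): opt in to space-aware volume selection. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <!-- Volumes whose free space differs by less than this many bytes
       are considered balanced and are picked round-robin. -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<property>
  <!-- Fraction of new block allocations steered toward volumes with more free space. -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```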
06-17-2016
03:58 AM
4 Kudos
Hi Artem, we do not recommend using AvailableSpaceVolumeChoosingPolicy. It can cause a subset of disk drives to become a bottleneck for writes; see HDFS-8538 for more discussion. A new HDFS tool called the DiskBalancer is under active development (HDFS-1312). It will let administrators recover from a skewed block distribution caused by replacing failed disks or adding new ones.
06-09-2016
06:31 PM
2 Kudos
Good writeup @Mingliang Liu. In addition to what @Chris Nauroth said, I also add -Dmaven.site.skip=true:

mvn clean package -Pdist,native -Dtar -DskipTests=true -Dmaven.site.skip=true -Dmaven.javadoc.skip=true
05-03-2016
01:27 AM
I ran into the same issue. The Ambari server log at `/var/log/ambari-server/ambari-server.log` showed:

Failed to execute kadmin:
Command: /usr/bin/kadmin -s c6401.ambari.apache.org -p kadmin/admin@EXAMPLE.COM -w ******** -r EXAMPLE.COM -q "get_principal kadmin/admin@EXAMPLE.COM"
ExitCode: 1
STDOUT: Authenticating as principal kadmin/admin@EXAMPLE.COM with password.
STDERR: kadmin: Communication failure with server while initializing kadmin interface

Sure enough, I had forgotten to start the kadmin service on the KDC host. After running `/etc/init.d/kadmin start` the error went away. HTH.
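If you hit the same symptom, a quick sanity check on the KDC host looks roughly like this (SysV init scripts as on RHEL/CentOS 6 assumed; the realm and principal are the ones from my logs above):

```sh
# On the KDC host: check whether kadmind is running, and start it if not.
# (On systemd-based hosts the unit is typically kadmin.service.)
service kadmin status
/etc/init.d/kadmin start

# Re-run the same query Ambari issues to confirm the fix
# (kadmin will prompt for the admin principal's password):
/usr/bin/kadmin -p kadmin/admin@EXAMPLE.COM -r EXAMPLE.COM \
    -q "get_principal kadmin/admin@EXAMPLE.COM"
```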
02-03-2016
11:28 PM
3 Kudos
Hi @Avinash C, the HDFS Architecture guide has a good description of the write pipeline (section 8.3.1).
02-03-2016
11:15 PM
2 Kudos
Hi @S Roy, using HDFS mounted via NFS would be a bad idea: an HDFS service writing its own logs to HDFS could deadlock on itself. As @Neeraj Sabharwal suggested, a local disk is best to make sure the logging store does not become a performance bottleneck. You can change the log4j settings to limit the size and number of log files, capping the total space they use (see the sketch below). You can also write a separate daemon that periodically copies log files into HDFS for long-term archival.
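As a rough sketch, the stock Hadoop log4j.properties already routes daemon logs through a RollingFileAppender (RFA), so capping disk usage is just a matter of bounding the file size and backup count. The values below are illustrative, not recommendations:

```properties
# log4j.properties (snippet): caps each daemon's log usage at roughly
# (MaxBackupIndex + 1) * MaxFileSize, i.e. ~5.25 GB with these values.
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```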
01-29-2016
08:29 PM
2 Kudos
Hi @AR, the '/logs' servlet is admin-only. There is no way to expose it to non-privileged users. HDFS administrators are configured via dfs.cluster.administrators (sketch below), although you obviously don't want to add arbitrary users to that list just to grant logs servlet access.
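For reference, a minimal sketch of the setting in hdfs-site.xml. The value is an ACL: a comma-separated user list, then a space, then a comma-separated group list; the names here are hypothetical:

```xml
<property>
  <name>dfs.cluster.administrators</name>
  <!-- Hypothetical names: user "hdfsadmin" plus everyone in group "ops". -->
  <value>hdfsadmin ops</value>
</property>
```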
01-28-2016
09:26 PM
Hi @luc tiber, the HDFS metadata layouts on NameNodes and DataNodes are quite different. If you are using HDP, I recommend doing an Ambari-based install.
01-27-2016
07:54 PM
5 Kudos
Hi @luc tiber, the BlockpoolID should be the same across the DataNodes and the NameNodes (the configuration is a little different for federated clusters, so let's ignore federation for now); you can compare the IDs on disk as shown below. The fact that the BPID is different indicates that the NameNode was either re-formatted or belongs to a different cluster, so just changing the BPID will not help, as the NameNode metadata may have been lost. If this is a dev/test cluster and you can afford to lose the data, I recommend redeploying the cluster from scratch as the quickest way to move forward. Otherwise, recovery will be more complicated.
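For anyone diagnosing this, the IDs live in the VERSION files on disk. A quick way to compare them, assuming HDP-style default storage directories (adjust the paths to your dfs.namenode.name.dir and dfs.datanode.data.dir):

```sh
# On the NameNode: blockpoolID and clusterID appear in the VERSION file.
cat /hadoop/hdfs/namenode/current/VERSION

# On a DataNode: clusterID is in the top-level VERSION file, and the
# block pool ID is the name of the BP-* directory beneath it.
cat /hadoop/hdfs/data/current/VERSION
ls -d /hadoop/hdfs/data/current/BP-*
```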
01-20-2016
11:53 PM
1 Kudo
That sounds wrong. The CAP theorem describes tradeoffs inherent to all distributed systems, and it applies equally to HDFS. Within HDFS we do make tradeoffs that prioritize consistency.
... View more