Member since: 09-30-2014
Posts: 31
Kudos Received: 13
Solutions: 3
My Accepted Solutions
Views | Posted
---|---
3942 | 10-25-2016 07:02 AM
1131 | 10-17-2016 11:34 AM
2194 | 01-07-2016 12:46 PM
07-31-2017
08:43 AM
I'm also seeing this with Ambari 2.5.1.0 and HDP-2.4.3.0.
06-09-2017
11:01 AM
@Vani This solution works, but the side effect is that users can now override which queue their jobs are assigned to. Do you agree? If so, do you know of any way around this?
10-25-2016
07:03 AM
Thanks for your reply, Anu. We never got around to trying your suggestion, so unfortunately I can't accept your answer, even though it may well be valid.
10-25-2016
07:02 AM
1 Kudo
We got it to work by lowering "dfs.datanode.balance.max.concurrent.moves" from 500 to 20, which is more in line with the guide at https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html. We might also have fixed it by raising the dispatcher threads setting (dfs.balancer.dispatcherThreads) that aengineer suggested below, but we didn't try that once this worked.
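One possible reading of why 500 misbehaved, as a back-of-the-envelope model only (an assumption, not the balancer's actual code): if each target DataNode reserves up to dfs.datanode.balance.max.concurrent.moves threads from the shared dfs.balancer.moverThreads pool (1000 in the log), then 500 leaves room for only about two targets at a time, which would fit the "No mover threads available" warnings in the balancer log. The function name is hypothetical.

```python
def concurrent_targets(mover_threads: int, max_concurrent_moves: int) -> int:
    """Simplified model (assumption): each target DataNode reserves a block of
    `max_concurrent_moves` mover threads out of a shared pool of `mover_threads`,
    so this many targets can be served at the same time."""
    if max_concurrent_moves <= 0:
        raise ValueError("max_concurrent_moves must be positive")
    return mover_threads // max_concurrent_moves

# dfs.balancer.moverThreads = 1000, as printed in the balancer log
for moves in (500, 20):
    print(f"max.concurrent.moves={moves}: about {concurrent_targets(1000, moves)} targets at a time")
```

Under this model, going from 500 to 20 raises the number of targets that can receive blocks in parallel from 2 to 50; treat it as a hypothesis to check against the Dispatcher source for your HDP version.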
10-20-2016
07:36 AM
Hello, I'm trying to rebalance HDFS in our HDP 2.4.3 cluster (which runs NameNode HA), and the balancer only does actual work for a short time and then just sits idle. If I kill the process and restart it, it immediately does some balancing and then goes idle again. I have repeated this many times. I enabled debug logging for the balancer, but I can't see anything in it that explains why it stops balancing. Here is the beginning of the log, since it shows some parameters that might be relevant:
16/10/19 16:34:10 INFO balancer.Balancer: namenodes = [hdfs://PROD1]
16/10/19 16:34:10 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
16/10/19 16:34:10 INFO balancer.Balancer: included nodes = []
16/10/19 16:34:10 INFO balancer.Balancer: excluded nodes = []
16/10/19 16:34:10 INFO balancer.Balancer: source nodes = []
16/10/19 16:34:11 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
16/10/19 16:34:11 INFO block.BlockTokenSecretManager: Setting block keys
16/10/19 16:34:11 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 500 (default=5)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
16/10/19 16:34:11 INFO block.BlockTokenSecretManager: Setting block keys
16/10/19 16:34:11 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
16/10/19 16:34:11 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
16/10/19 16:34:11 INFO net.NetworkTopology: Adding a new node: /default-rack/X.X.X.X:1019
....
16/10/19 16:34:11 INFO balancer.Balancer: Need to move 11.83 TB to make the cluster balanced.
...
16/10/19 16:34:11 INFO balancer.Balancer: Will move 120 GB in this iteration
16/10/19 16:34:11 INFO balancer.Dispatcher: Start moving blk_1661084121_587506756 with size=72776669 from X.X.X.X:1019:DISK to X.X.X.X:1019:DISK through X.X.X.X:1019
...
16/10/19 16:34:12 WARN balancer.Dispatcher: No mover threads available: skip moving blk_1457593679_384005217 with size=104909643 from X.X.X.X:1019:DISK to X.X.X.X:1019:DISK through X.X.X.X:1019
...
Here is the part of the log just after the last block was successfully moved:
...
16/10/19 16:36:00 INFO balancer.Dispatcher: Successfully moved blk_1693419961_619844350 with size=134217728 from X.X.X.X:1019:DISK to X.X.X.X:1019:DISK through X.X.X.X:1019
16/10/19 16:36:00 INFO balancer.Dispatcher: Successfully moved blk_1693366190_619790579 with size=134217728 from X.X.X.X:1019:DISK to X.X.X.X:1019:DISK through X.X.X.X:1019
16/10/19 19:04:11 INFO block.BlockTokenSecretManager: Setting block keys
16/10/19 21:34:11 INFO block.BlockTokenSecretManager: Setting block keys
16/10/20 00:04:11 INFO block.BlockTokenSecretManager: Setting block keys
...
I've left the debug output out of the log sections above since it is quite verbose, and as far as I can see the only thing it mentions is a periodic re-authentication of the ipc.Client. I'm launching the balancer from the command line with the following command:
$ hdfs --loglevel DEBUG balancer -D dfs.datanode.balance.bandwidthPerSec=200000000
I have tried other values for the bandwidth setting, but that doesn't change the behaviour. Can anyone see whether I'm doing something wrong and point me towards a solution?
Best Regards
/Thomas
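To confirm from a longer log that the balancer really stops moving blocks (rather than just moving them slowly), one option is to measure the gap between the last "Successfully moved" entry and the last timestamped line. A minimal sketch, assuming the yy/MM/dd HH:mm:ss prefix seen in the excerpts above; the function name is hypothetical.

```python
import re
from datetime import datetime

# Balancer log lines start with e.g. "16/10/19 16:36:00 " (yy/MM/dd HH:mm:ss)
TS_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")

def last_move_and_idle(lines):
    """Return (time of the last 'Successfully moved' line, seconds between it and
    the last timestamped line), or (None, None) if no move was logged."""
    last_move = None
    last_seen = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip elisions ("...") and lines without a timestamp
        ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
        last_seen = ts
        if "Successfully moved" in line:
            last_move = ts
    if last_move is None:
        return None, None
    return last_move, (last_seen - last_move).total_seconds()

sample = [
    "16/10/19 16:36:00 INFO balancer.Dispatcher: Successfully moved blk_1 with size=134217728",
    "16/10/19 19:04:11 INFO block.BlockTokenSecretManager: Setting block keys",
    "...",
    "16/10/20 00:04:11 INFO block.BlockTokenSecretManager: Setting block keys",
]
print(last_move_and_idle(sample))
```

On the excerpt above, this reports roughly seven and a half hours with no block moves, during which only the periodic "Setting block keys" lines appear.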
Labels: Apache Hadoop
10-17-2016
11:34 AM
I just found that something like this was added somewhat recently: https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/AvailableSpaceBlockPlacementPolicy.java This seems to be what I was looking for.
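As far as I understand it, the idea behind AvailableSpaceBlockPlacementPolicy is to pick two random candidate nodes and, with some probability, prefer the one with lower disk usage. Below is a simplified simulation of that idea, not the Hadoop implementation (it ignores racks and other details). The 0.6 preference mirrors what I believe is the default of dfs.namenode.available-space-block-placement-policy.balanced-space-preference-fraction; verify it against your Hadoop version. The cluster and function names are hypothetical.

```python
import random

def choose_target(used, capacity, preference, rng):
    """Pick two random nodes; with probability `preference` take the one with the
    lower used/capacity ratio, otherwise take the other one (racks ignored)."""
    a, b = rng.sample(range(len(used)), 2)
    if used[a] / capacity[a] <= used[b] / capacity[b]:
        emptier, fuller = a, b
    else:
        emptier, fuller = b, a
    return emptier if rng.random() < preference else fuller

def simulate(capacity, blocks, preference=0.6, seed=42):
    """Place `blocks` equally sized blocks and return the block count per node."""
    rng = random.Random(seed)
    used = [0] * len(capacity)
    for _ in range(blocks):
        used[choose_target(used, capacity, preference, rng)] += 1
    return used

# Hypothetical cluster: 5 small nodes and 5 nodes with 4x the capacity.
# Capacities are relative units; only the used/capacity ratio matters here.
capacity = [10] * 5 + [40] * 5
used = simulate(capacity, blocks=10_000)
print("small nodes:", sum(used[:5]), "large nodes:", sum(used[5:]))
```

With preference=0.5 the choice is effectively random; raising it toward 1.0 steers more blocks to the emptier nodes at the cost of less even write load.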
10-17-2016
11:25 AM
Hello, is there a BlockPlacementPolicy that, in addition to storing replicas safely on different racks as the default one does, can also take into account how much disk space is available on each node? If a cluster consists of two sets of machines with very different amounts of disk space, the default policy will cause the disks in the smaller set to run out of space long before the total HDFS capacity is reached. Is there any such policy ready to use?
Best Regards
Thomas
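To put rough numbers on the problem, here is a simplified model, assuming blocks are spread evenly over all nodes regardless of free space; the node sizes and function name are hypothetical. It computes how much data the cluster holds at the point where the smaller machines are full.

```python
def data_when_smallest_full(capacities_tb):
    """Simplified model (assumption): with blocks spread evenly over all nodes,
    the smallest nodes are full once every node holds min(capacities) of data."""
    if not capacities_tb:
        raise ValueError("need at least one node")
    return min(capacities_tb) * len(capacities_tb)

# Hypothetical cluster: 10 nodes with 10 TB and 10 nodes with 40 TB of HDFS space
capacities = [10] * 10 + [40] * 10
print(sum(capacities), data_when_smallest_full(capacities))  # 500 200
```

In this example the small nodes are full at 200 TB out of 500 TB of raw capacity, i.e. at 40%.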
Labels: Apache Hadoop
09-02-2016
01:23 PM
2 Kudos
Hello. I would like to monitor the actual memory usage of the YARN containers in our cluster. We set defaults such as mapreduce.map.memory.mb=X and mapreduce.reduce.memory.mb=Y, but if I have understood this correctly, these values only determine the maximum limit for the processes running inside the containers. Is it possible to get metrics from YARN about the actual memory usage of the process that ran in a container? It looks like something like this was implemented in https://issues.apache.org/jira/browse/YARN-2984, but I'm not sure how to access that data. Can you give me any tips?
Best Regards
/Thomas
Added: I can see what I'm looking for in the NodeManager logs, so I guess those logs could be harvested and analyzed. Any other tips?
Example of a NodeManager log line:
2016-09-02 13:31:58,563 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 50811 for container-id container_e21_1472110676349_75100_01_006278: 668.7 MB of 2.5 GB physical memory used; 2.9 GB of 5.3 GB virtual memory used
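Following the idea of harvesting the NodeManager logs, a minimal sketch that parses lines like the example above and converts the sizes to MB, so that actual usage can be compared with the configured container size. It assumes every line has exactly this format (other Hadoop versions may differ); the regex and function name are hypothetical, not part of YARN.

```python
import re

# Shape of the ContainersMonitorImpl example line (assumed stable across lines)
MEM_RE = re.compile(
    r"for container-id (?P<container>container_\w+): "
    r"(?P<pmem_used>[\d.]+) (?P<pmem_used_unit>[KMGTP]?B) of "
    r"(?P<pmem_limit>[\d.]+) (?P<pmem_limit_unit>[KMGTP]?B) physical memory used; "
    r"(?P<vmem_used>[\d.]+) (?P<vmem_used_unit>[KMGTP]?B) of "
    r"(?P<vmem_limit>[\d.]+) (?P<vmem_limit_unit>[KMGTP]?B) virtual memory used"
)
# Binary prefixes expressed in MB
TO_MB = {"B": 1 / 1024**2, "KB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024**2, "PB": 1024**3}

def parse_container_memory(line):
    """Return the container id and its memory figures (in MB), or None if the
    line is not a container memory-usage line."""
    m = MEM_RE.search(line)
    if m is None:
        return None
    result = {"container": m.group("container")}
    for key in ("pmem_used", "pmem_limit", "vmem_used", "vmem_limit"):
        result[key + "_mb"] = float(m.group(key)) * TO_MB[m.group(key + "_unit")]
    return result

line = ("2016-09-02 13:31:58,563 INFO monitor.ContainersMonitorImpl "
        "(ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 50811 "
        "for container-id container_e21_1472110676349_75100_01_006278: "
        "668.7 MB of 2.5 GB physical memory used; 2.9 GB of 5.3 GB virtual memory used")
usage = parse_container_memory(line)
print(usage["container"], round(usage["pmem_used_mb"] / usage["pmem_limit_mb"], 3))
```

Since the monitor logs each container repeatedly, keeping the maximum pmem_used_mb per container id would give its peak usage, which can then be compared with mapreduce.map.memory.mb or mapreduce.reduce.memory.mb.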
Labels: Apache YARN
08-11-2016
12:52 PM
Hi @Arpit Agarwal,
That is my understanding as well. Thanks for a short, to-the-point answer.
07-04-2016
06:43 AM
Hi Artem. I agree that /tmp is just plain wrong for this. I think Ambari chose these directories for us during cluster installation and we hadn't noticed. We will remove /tmp from this configuration.