Member since: 07-05-2016
Posts: 25
Kudos Received: 44
Solutions: 3

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 824 | 12-01-2017 01:25 AM
 | 8188 | 11-27-2017 10:51 PM
 | 516 | 03-13-2017 08:45 PM
07-18-2018
07:27 PM
2 Kudos
It is perfectly fine to mix drive types on a node. Just tag the storage type ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for each directory specified in dfs.datanode.data.dir; see also https://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml (search "dfs.datanode.data.dir")
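For example (just a sketch; the mount points are made up), a node mixing SSDs, spinning disks and archival drives could use: dfs.datanode.data.dir = [SSD]/grid/0/hdfs/data,[DISK]/grid/1/hdfs/data,[ARCHIVE]/grid/2/hdfs/data. Directories without a tag are treated as DISK by default.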
07-18-2018
07:23 PM
2 Kudos
For each block, only a single copy of the block is cached. The number of block replicas to cache refers to replicas of different blocks, not to multiple copies of the same block.
07-18-2018
07:07 PM
All the file data written so far will be visible to readers once the writer has invoked hflush/hsync successfully; see also https://hadoop.apache.org/docs/r2.8.2/api/org/apache/hadoop/fs/Syncable.html
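A minimal Java sketch (the path and data are made up) of the visibility guarantee:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
try (FSDataOutputStream out = fs.create(new Path("/tmp/visibility-demo.txt"))) {
  out.writeBytes("first batch\n");
  out.hflush();                        // everything written so far is now visible to new readers
  out.writeBytes("second batch\n");    // not guaranteed visible until the next hflush/hsync/close
}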
07-18-2018
07:01 PM
> ... IllegalArgumentException: Wrong FS: hdfs://HDP05/data/IFRS9/prod, expected: hdfs://HDP39 ...
This is a known bug, already fixed by https://issues.apache.org/jira/browse/HDFS-11432. Sorry that you were hitting it. Which Hadoop/HDP version are you running?
07-18-2018
06:43 PM
@rama, thanks for reporting the issue. It is a bug, since the FileSystem client should not surface a NullPointerException to users. Would you like to file a JIRA with Apache ( https://issues.apache.org/jira/projects/HDFS )? I am happy to help.
12-01-2017
01:25 AM
2 Kudos
What is the conf dfs.datanode.data.dir set to? I suspect it is set to different directories under the same physical disk.
11-28-2017
12:25 AM
1 Kudo
For recoverLease using CLI, see https://community.hortonworks.com/questions/146012/force-closing-a-hdfs-file-still-open-because-uncor.html?childToView=146021#answer-146021
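If your Hadoop release ships the hdfs debug command (2.7 and later), the CLI form looks like this (the path is made up): hdfs debug recoverLease -path /data/stuck-file -retries 3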
11-27-2017
10:51 PM
2 Kudos
"Sleep and retry" is a good way to handle the "not have enough number of replicas" problem. For the "already the current lease holder" problem, you may call DistributedFileSystem.recoverLease(Path) to force lease recovery. Hope it helps.
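A minimal Java sketch (the cluster URI and file path are made up) of forcing lease recovery:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

Configuration conf = new Configuration();
Path file = new Path("hdfs://mycluster/data/stuck-file");
DistributedFileSystem dfs = (DistributedFileSystem) file.getFileSystem(conf);
// recoverLease(..) returns true if the file is already closed; if it returns false,
// lease recovery has been started and the call can be retried (with a short sleep)
// until it returns true.
boolean closed = dfs.recoverLease(file);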
11-27-2017
10:40 PM
2 Kudos
This message seems to be printed by DFSInputStream.ByteBufferStrategy.doRead(..). It means that the read returned zero-length data. Is it possible that a zero-size buffer was passed in, so that it could not read anything?
11-27-2017
10:22 PM
Suppose my user name is "nicholas" and "nicholas" is already configured as a proxy user. Now, is there a way for "nicholas" to run a dfs command (say mkdir) as another user "foo"?
- Tags:
- cli
- Hadoop Core
- HDFS
Labels:
- Apache Hadoop
05-09-2017
11:20 PM
> Is there any maximum configurable value of ipc.maximum.data.length?
Hadoop does not enforce a maximum.
> Can we change this value above 128MB?
Yes, you may change it to 192MB or 256MB to get around the current issue.
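For example (a sketch; 268435456 bytes = 256MB), you could set ipc.maximum.data.length to 268435456 in core-site.xml on the NameNode; a NameNode restart is typically needed for the new value to take effect.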
05-04-2017
09:47 PM
This is actually an Ambari question. You should get a faster response if you tag this with "Ambari".
05-04-2017
09:41 PM
Are you sure that the datanodes are running fine? How many datanodes are shown in the namenode web UI?
05-04-2017
09:35 PM
I assume you were using the default 10% threshold. Then, the cluster is considered balanced if the average utilization is 40%, since both the old and the new nodes are within the 10% threshold. You may consider setting the threshold to a smaller value using the -threshold option. For more details on the configurations and options, please see https://community.hortonworks.com/articles/43849/hdfs-balancer-2-configurations-cli-options.html Hope it helps.
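For example (a sketch): hdfs balancer -threshold 5 treats a storage group as over/under-utilized once it is more than 5% away from the cluster average, so more data will be moved than with the default 10%.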
03-13-2017
08:49 PM
@hardik desai, you may turn on the debug log in the namenode for org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy so that it will print more information in the namenode log about why the datanodes are excluded.
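For example (a sketch; adapt it to your logging setup), adding log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG to the namenode's log4j.properties enables it, and hadoop daemonlog -setlevel <nn-host>:<nn-http-port> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG can change the level at runtime without a restart.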
03-13-2017
08:45 PM
1 Kudo
No, it is a non-disruptive procedure, provided that the cluster is healthy and is not under a heavy load. One of the reasons to do so is to upgrade the namenode, either software or hardware. During a namenode failover, jobs and client applications will be redirected from the old active namenode to the new active namenode. Of course, they have to wait until the new active namenode becomes ready, so they are slowed down. In this sense, it is better to perform the failover when the cluster is idle or under a light load. Hope it helps.
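For example (a sketch; the namenode IDs nn1 and nn2 are made up), a manual graceful failover can be triggered with: hdfs haadmin -failover nn1 nn2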
03-13-2017
06:55 PM
What is your version of Hadoop? Could you post the output of "hadoop version"?
03-13-2017
06:49 PM
Yes, the audit log will serve the purpose. Note that, in some cases, it is not straightforward to search the log for a deletion, since a directory (or a file) may not be deleted directly -- it may be deleted as part of the deletion of its parent/ancestor directory. So we should first search for the full path in the log. If it is not found, search for the parent directory path, and so on. It becomes more complicated if deletion and re-creation occurred repeatedly. For example:
1) user A: create /foo
2) user A: create /foo/bar
3) user A: del /foo
4) user B: create /foo
5) user B: del /foo
Who has deleted /foo/bar? It is easy to mistakenly take user B as the answer. B is the last user who deleted /foo, but B is not the user who deleted /foo/bar. In such a case, we should first determine when the target directory/file was created and then search for what happened to it starting from the creation time. You can imagine that it is even harder to find the correct answer if the path or its parent/ancestor paths were moved/renamed. We need to pay extra attention when the rename operation is involved.
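For example (a sketch; the log location and path are made up), something like grep 'cmd=delete' hdfs-audit.log | grep 'src=/foo/bar' finds direct deletions of /foo/bar; if nothing turns up, repeat the search with src=/foo, and also search cmd=rename to catch moves of the path or its ancestors.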
12-18-2016
07:42 AM
1 Kudo
Hi @jbarnett,
In order to run the HDFS Balancer, the new conf dfs.internal.nameservices, which distinguishes the internal cluster from remote clusters, needs to be set so that the Balancer can use it to locate the local file system.
Alternatively, the Balancer and distcp need not share the same conf, since distcp may be used for multiple remote clusters. When adding a new remote cluster, we need to add it to the distcp conf; however, it does not make sense to change the Balancer conf. If we are going to use a separate conf for the Balancer, we may put only one file system (i.e. the local fs but not the remote fs) in dfs.nameservices.
In summary, there are two ways to fix the conf:
1. Set all the local and remote file systems in dfs.nameservices and then set the local file system in dfs.internal.nameservices. This conf will work for both distcp and the Balancer.
2. Set only the local file system in dfs.nameservices in the Balancer conf, and use a different conf for distcp.
Hope it helps.
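For example (just a sketch; the nameservice IDs are made up), the first approach would be dfs.nameservices = ns-local,ns-remote1 together with dfs.internal.nameservices = ns-local, so that distcp can still resolve ns-remote1 while the Balancer only balances ns-local.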
07-08-2016
11:08 AM
9 Kudos
The Balancer runs in iterations for balancing a cluster. An iteration consists of four steps. We discuss each step in detail below.

Step 1, Storage Group Classification

The Balancer first invokes the getLiveDatanodeStorageReport() rpc to the Namenode in order to get the storage report for all the storage devices in all Datanodes. The storage report contains storage utilization information such as capacity, DFS used space, remaining space, etc. for each storage device in each Datanode. A Datanode may contain multiple storage devices and the storage devices may have different storage types. A storage group G(i,T) is defined to be the group of all the storage devices with the same storage type T in Datanode i. For example, G(i,DISK) is the storage group of all the disk storage devices in Datanode i. For each storage type T in each Datanode i, the Balancer computes

Storage Group Utilization (%): U(i,T) = 100% * (storage group used space) / (storage group capacity), and
Average Utilization (%): U(avg,T) = 100% * (sum of all used spaces) / (sum of all capacities).

Let Δ be the threshold parameter (default is 10%) and G(i,T) be the storage group with storage type T in Datanode i. Then, the utilization classes are defined as follows.

Over-Utilized: { G(i,T) : U(avg,T) + Δ < U(i,T) }
Average + Threshold ----------------------------------------------------------------------------
Above-Average: { G(i,T) : U(avg,T) < U(i,T) <= U(avg,T) + Δ }
Average ----------------------------------------------------------------------------------------
Below-Average: { G(i,T) : U(avg,T) - Δ <= U(i,T) <= U(avg,T) }
Average - Threshold ----------------------------------------------------------------------------
Under-Utilized: { G(i,T) : U(i,T) < U(avg,T) - Δ }

Roughly speaking, a storage group is over-utilized (under-utilized) if its utilization is larger (smaller) than the average plus (minus) threshold. A storage group is above-average (below-average) if its utilization is larger (smaller) than average but within the threshold. If there are no over-utilized storage groups and no under-utilized storage groups, the cluster is said to be balanced and the Balancer terminates with a SUCCESS state. Otherwise, it continues with the following steps.

Step 2, Storage Group Pairing

The Balancer selects over-utilized or above-average storage groups as sources, and under-utilized or below-average storage groups as targets. It pairs a source storage group with a target storage group (source -> target) in the following priority order.

Same-Rack (the source and the target must reside in the same rack):
1. Over-Utilized -> Under-Utilized
2. Over-Utilized -> Below-Average
3. Above-Average -> Under-Utilized

Any (the source and the target may reside in different racks):
4. Over-Utilized -> Under-Utilized
5. Over-Utilized -> Below-Average
6. Above-Average -> Under-Utilized

Step 3, Block Move Scheduling

For each source-target pair, the Balancer chooses block replicas from the source storage groups and then schedules block moves. A block replica in a source storage group is a good candidate if it satisfies all the conditions below.
- Its storage type is the same as the target storage type.
- It is not already scheduled.
- The target does not already have the same block replica.
- The number of racks of the block is not reduced after the move.

Note that, logically, the Balancer schedules a block replica to be “moved” from a source storage group to a target storage group. In practice, since a block usually has multiple replicas, the block move can be done by first copying the replica from a proxy, which can be any storage group containing one of the replicas of the block, to the target storage group, and then deleting the replica from the source storage group. After a candidate block in the source datanode is chosen, the Balancer selects a storage group containing the same replica as the proxy. The Balancer selects the closest such storage group as the proxy in order to minimize the network traffic. When it is impossible to schedule any move, the Balancer terminates with a NO_MOVE_BLOCK state.

Step 4, Block Move Execution

The Balancer dispatches a scheduled block move by invoking the DataTransferProtocol.replaceBlock(..) method on the target datanode. It specifies the proxy, and the source as the delete-hint, in the method call. The target datanode then copies the replica directly from the proxy to its local storage. Once the copying has completed, the target datanode reports the new replica to the Namenode with the delete-hint. The Namenode uses the delete-hint to delete the extra replica, i.e. the replica stored in the source. After all block moves are dispatched, the Balancer waits until all the moves are completed. Then, the Balancer runs a new iteration and repeats all these steps. If all the scheduled moves fail for 5 consecutive iterations, the Balancer terminates with a NO_MOVE_PROGRESS state.

We end this article by listing the exit states in the following table.

Exit States

State | Code | Description
---|---|---
SUCCESS | 0 | The cluster is balanced (i.e. no over/under-utilized storage groups) with respect to the given threshold.
ALREADY_RUNNING | -1 | Another Balancer is running.
NO_MOVE_BLOCK | -2 | The Balancer is not able to schedule any move.
NO_MOVE_PROGRESS | -3 | All the scheduled moves failed for 5 consecutive iterations.
IO_EXCEPTION | -4 | An IOException was thrown.
ILLEGAL_ARGUMENTS | -5 | An illegal argument was found in the command or in the configuration.
INTERRUPTED | -6 | The Balancer process was interrupted.
UNFINALIZED_UPGRADE | -7 | The cluster is being upgraded.
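A small worked example of the Step 1 classification (numbers made up): if U(avg,DISK) = 50% and the threshold Δ is the default 10%, then a DISK storage group at 65% utilization is over-utilized, one at 55% is above-average, one at 45% is below-average, and one at 35% is under-utilized. Since over- and under-utilized groups exist, the cluster is not balanced and the Balancer proceeds with Steps 2 to 4.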
- Tags:
- Balancer
- Hadoop Core
- HDFS
- Issue Resolution
- operations
07-07-2016
04:14 AM
14 Kudos
The original configurations and the original CLI options are discussed in Sections 1 and 2, respectively. The new configurations and the new CLI options are discussed in Sections 3 and 4, respectively.

1. Original Configurations

Concurrent Moves

dfs.datanode.balance.max.concurrent.moves, default is 5

This configuration limits the maximum number of concurrent block moves that a Datanode is allowed to perform for balancing the cluster. If this configuration is set in a Datanode, the Datanode will throw an exception if the limit is exceeded. If it is set in the Balancer, the Balancer will schedule concurrent block movements within the limit. Note that the Datanode setting and the Balancer setting could be different. Since both settings impose a restriction, the effective setting is the minimum of them.

Before HDFS-9214, changing this configuration in a Datanode required restarting the Datanode. Therefore, it is recommended to set it to the highest possible value in the Datanodes and adjust the runtime value in the Balancer in order to gain flexibility. HDFS-9214 allows this conf to be reconfigured without a Datanode restart; below are the steps for reconfiguring a Datanode:
1. Change the value of dfs.datanode.balance.max.concurrent.moves in the configuration xml file stored on the Datanode machine.
2. Start a reconfiguration task with the following command: hdfs dfsadmin -reconfig datanode <dn_addr>:<ipc_port> start

For example, suppose a Datanode has 12 disks. Then, this configuration can be set to 24, a small multiple of the number of disks, in the Datanodes. A higher value may not be useful and only increases disk contention. If the Balancer is running in a maintenance window, the setting in the Balancer can be the same, i.e. 24, in order to utilize all the bandwidth. However, if the Balancer is running at the same time as other jobs, it should be set to a smaller value, say 5, in the Balancer so that there is bandwidth available for the jobs.

Bandwidth

dfs.datanode.balance.bandwidthPerSec, default is 1048576 (=1MB/s)

This configuration limits the bandwidth in each Datanode used for balancing the cluster. Unlike dfs.datanode.balance.max.concurrent.moves, changing this configuration does not require restarting Datanodes. It can be done with the following command: dfsadmin -setBalancerBandwidth <bandwidth in bytes per second>

Mover Threads

dfs.balancer.moverThreads, default is 1000

This is the number of threads in the Balancer for moving blocks. Each block move requires a thread, so this configuration limits the total number of concurrent moves for balancing in the entire cluster.

2. Original CLI Options

Balancing Policy, Threshold & Block Pools

[-policy <policy>]
Describes how to determine whether a cluster is balanced. The two supported policies are:
- blockpool: the cluster is balanced if each pool in each node is balanced.
- datanode: the cluster is balanced if each datanode is balanced.
Note that the blockpool policy is stricter than the datanode policy in the sense that the blockpool policy requirement implies the datanode policy requirement. The default policy is datanode.

[-threshold <threshold>]
Specifies a number in the closed interval [1.0, 100.0] representing the acceptable threshold, as a percentage of storage capacity, so that storage utilization outside the average +/- the threshold is considered over/under-utilized. The default threshold is 10.0, i.e. 10%.

[-blockpools <comma-separated list of blockpool ids>]
The Balancer will run only on the block pools specified in this list. When the list is empty, the Balancer runs on all the existing block pools. The default value is an empty list.

Include & Exclude Lists

[-include [-f <hosts-file> | <comma-separated list of hosts>]]
When the include list is nonempty, only the datanodes specified in the list will be balanced by the Balancer. An empty include list means including all the datanodes in the cluster. The default value is an empty list.

[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
The datanodes specified in the exclude list will be excluded so that the Balancer will not balance them. An empty exclude list means that no datanodes are excluded. When a datanode is specified in both the include list and the exclude list, the datanode will be excluded. The default value is an empty list.

Idle-Iterations & Run-During-Upgrade

[-idleiterations <idleiterations>]
Specifies the number of consecutive iterations (-1 for infinite) in which no blocks have been moved before the Balancer terminates with the NO_MOVE_PROGRESS exit code. The default is 5.

[-runDuringUpgrade]
If specified, the Balancer will run even if there is an ongoing HDFS upgrade; otherwise, the Balancer terminates with the UNFINALIZED_UPGRADE exit code. When there is no ongoing upgrade, this option has no effect. It is usually undesirable to run the Balancer during an upgrade since, in order to support rollback, blocks being deleted from HDFS are moved to the internal trash directory on the datanodes and not actually deleted. Therefore, running the Balancer during an upgrade cannot reduce the usage of any datanode storage.

3. New Configurations

HDFS-8818: Allow Balancer to run faster

dfs.balancer.max-size-to-move, default is 10737418240 (=10GB)

In each iteration, the Balancer chooses datanodes in pairs and then moves data between the datanode pairs. This configuration limits the maximum size of data that the Balancer will move between a chosen datanode pair. When the network and disks are not saturated, increasing this configuration can increase the data transferred between a datanode pair in each iteration while the duration of an iteration remains about the same.

HDFS-8824: Do not use small blocks for balancing the cluster

dfs.balancer.getBlocks.size, default is 2147483648 (=2GB)
dfs.balancer.getBlocks.min-block-size, default is 10485760 (=10MB)

After the Balancer has decided to move a certain amount of data between two datanodes (source and destination), it repeatedly invokes the getBlocks(..) rpc to the Namenode in order to get lists of blocks from the source datanode until the required amount of data is scheduled. dfs.balancer.getBlocks.size is the total data size of the block list returned by a getBlocks(..) rpc. dfs.balancer.getBlocks.min-block-size is the minimum size of the blocks that will be used for balancing the cluster.

HDFS-6133: Block Pinning

dfs.datanode.block-pinning.enabled, default is false

When creating a file, a user application may specify a list of favorable datanodes via the file creation API in DistributedFileSystem. The Namenode makes a best effort to allocate blocks to the favorable datanodes. When dfs.datanode.block-pinning.enabled is set to true, if a block replica is written to a favorable datanode, it will be “pinned” to that datanode. The pinned replicas will not be moved for cluster balancing, in order to keep them stored on the specified favorable datanodes. This feature is useful for block-distribution-aware user applications such as HBase.

4. New CLI Option

HDFS-8826: Source Datanodes

[-source [-f <hosts-file> | <comma-separated list of hosts>]]
The new -source option allows specifying a source datanode list so that the Balancer selects blocks to move only from those datanodes. When the list is empty, all the datanodes can be chosen as a source. The default value is an empty list. The option can be used to free up the space of particular datanodes in the cluster. Without the -source option, the Balancer can be inefficient in some cases. Below is an example.

Datanodes (with the same capacity) | Utilization | Rack
---|---|---
D1 | 95% | A
D2 | 30% | B
D3, D4, D5 | 0% | B

In the table above, the average utilization is 25%, so D2 is within the 10% threshold. It is unnecessary to move any blocks from or to D2. Without specifying the source nodes, the Balancer first moves blocks from D2 to D3, D4 and D5, since they are in the same rack, and then moves blocks from D1 to D3, D4 and D5. By specifying D1 as the source node, the Balancer directly moves blocks from D1 to D3, D4 and D5.

This is the second article of the HDFS Balancer series. We will explain the algorithm used by the Balancer for balancing the cluster in the next article.
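As a concrete CLI sketch for the example above (the hostname is made up), freeing up D1 directly would look like: hdfs balancer -source d1.example.com -threshold 10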
- Tags:
- Balancer
- FAQ
- Hadoop Core
- HDFS
- How-To/Tutorial
- operations
07-06-2016
01:12 PM
10 Kudos
HDFS Balancer is a tool for balancing the data across the storage devices of an HDFS cluster. The Balancer was originally designed to run slowly so that the balancing activities do not affect normal cluster activities and the running jobs. We have received feedback from the HDFS community that it is also desirable if the Balancer can be configured to run faster. The use cases are listed below.
- Free up space on some nearly full datanodes.
- Move data to some newly added datanodes in order to utilize the new machines.
- Run the Balancer when the cluster load is low or in a maintenance window, instead of running it as a background daemon.
We have changed the Balancer to address these new use cases. After the changes, the Balancer is able to run 100x faster, while it can still be configured to run slowly as before. In one of our tests, we were able to bump the performance from a few gigabytes per minute to a terabyte per minute. In addition, we added two new features -- source datanodes and block pinning. Users can specify the source datanodes so that they can free up space in particular datanodes using the Balancer. A block-distribution-aware user application can pin its block replicas to particular datanodes so that the pinned replicas will not be moved for cluster balancing.

Why is the data stored in HDFS imbalanced? There are three major reasons.

1. Adding Datanodes
When new datanodes are added to a cluster, newly created blocks will be written to these datanodes from time to time. However, the existing blocks will not be moved to them without using the Balancer.

2. Client Behavior
In some cases, a client application may not write data uniformly across the datanode machines. A client application may be skewed in writing data: it may always write to some particular machines but not the others. HBase is an example of such an application. In some other cases, the client application is not skewed by design, such as MapReduce/YARN jobs; however, the data is skewed so that some of the job tasks write significantly more data than the others. When a Datanode receives data directly from the client, it stores a copy in its local storage to preserve data locality. The datanodes receiving more data usually have a higher storage utilization.

3. Block Allocation in HDFS
HDFS uses a constraint satisfaction algorithm to allocate file blocks. Once the constraints are satisfied, HDFS allocates a block by selecting a storage device from the candidate set uniformly at random. For large clusters, the blocks are essentially allocated randomly in a uniform distribution, provided that the client applications write data to HDFS uniformly across the datanode machines. Note that uniform random allocation may not result in a uniform data distribution because of the randomness. This is usually not a problem when the cluster has sufficient space, but the problem becomes serious when the cluster is nearly full.

In the next article, we will explain the usage of the original configurations/CLI options of the Balancer, as well as the new configurations/CLI options added by the recent enhancements.
- Tags:
- Balancer
- Hadoop Core
- HDFS
- How-To/Tutorial
- operations