07-08-2016 11:08 AM · 9 Kudos
The Balancer runs in iterations for balancing a cluster. An iteration consists of four steps, which we discuss in detail below.

Step 1, Storage Group Classification

The Balancer first invokes the getLiveDatanodeStorageReport() rpc to the Namenode in order to get the storage report for all the storage devices in all Datanodes. The storage report contains storage utilization information such as capacity, DFS used space and remaining space for each storage device in each Datanode. A Datanode may contain multiple storage devices, and the storage devices may have different storage types. A storage group G(i,T) is defined to be the group of all the storage devices with the same storage type T in Datanode i. For example, G(i,DISK) is the storage group of all the disk storage devices in Datanode i.

For each storage type T in each Datanode i, the Balancer computes

Storage Group Utilization (%): U(i,T) = 100% * (storage group used space) / (storage group capacity)
Average Utilization (%): U(avg,T) = 100% * (sum of all used spaces) / (sum of all capacities)

Let Δ be the threshold parameter (default is 10%) and G(i,T) be the storage group with storage type T in Datanode i. The utilization classes are then defined as follows.

Over-Utilized: { G(i,T) : U(avg,T) + Δ < U(i,T) }
Above-Average: { G(i,T) : U(avg,T) < U(i,T) <= U(avg,T) + Δ }
Below-Average: { G(i,T) : U(avg,T) - Δ <= U(i,T) <= U(avg,T) }
Under-Utilized: { G(i,T) : U(i,T) < U(avg,T) - Δ }

Roughly speaking, a storage group is over-utilized (under-utilized) if its utilization is larger (smaller) than the average plus (minus) the threshold. A storage group is above-average (below-average) if its utilization is larger (smaller) than the average but within the threshold.

If there are no over-utilized and no under-utilized storage groups, the cluster is said to be balanced, and the Balancer terminates with the SUCCESS state. Otherwise, it continues with the following steps.

Step 2, Storage Group Pairing

The Balancer selects over-utilized or above-average storage groups as sources, and under-utilized or below-average storage groups as targets. It pairs a source storage group with a target storage group (source -> target) in the following priority order.
Same-Rack (the source and the target must reside in the same rack):
Over-Utilized -> Under-Utilized
Over-Utilized -> Below-Average
Above-Average -> Under-Utilized

Any (regardless of rack):
Over-Utilized -> Under-Utilized
Over-Utilized -> Below-Average
Above-Average -> Under-Utilized

Step 3, Block Move Scheduling

For each source-target pair, the Balancer chooses block replicas from the source storage group and then schedules block moves. A block replica in a source storage group is a good candidate if it satisfies all the conditions below.
Its storage type is the same as the target storage type.
It is not already scheduled.
The target does not already have the same block replica.
The number of racks of the block is not reduced after the move.

Note that, logically, the Balancer schedules a block replica to be "moved" from a source storage group to a target storage group. In practice, since a block usually has multiple replicas, the block move can be done by first copying the replica from a proxy, which can be any storage group containing one of the replicas of the block, to the target storage group, and then deleting the replica from the source storage group. After a candidate block in the source datanode is chosen, the Balancer selects a storage group containing the same replica as the proxy, choosing the closest one in order to minimize the network traffic. When it is impossible to schedule any move, the Balancer terminates with the NO_MOVE_BLOCK state.

Step 4, Block Move Execution

The Balancer dispatches a scheduled block move by invoking the DataTransferProtocol.replaceBlock(..) method on the target datanode, specifying the proxy and the source as the delete-hint in the method call. The target datanode then copies the replica directly from the proxy to its local storage. Once the copy has completed, the target datanode reports the new replica to the Namenode with the delete-hint, and the Namenode uses the delete-hint to delete the extra replica, i.e. the replica stored in the source.

After all block moves are dispatched, the Balancer waits until all the moves are completed. Then, it continues with a new iteration and repeats all these steps. If all the scheduled moves fail for 5 consecutive iterations, the Balancer terminates with the NO_MOVE_PROGRESS state.

We end this article by listing the exit states in the following table.

Exit States

State                Code  Description
SUCCESS               0    The cluster is balanced (i.e. no over/under-utilized storage groups) with respect to the given threshold.
ALREADY_RUNNING      -1    Another Balancer is running.
NO_MOVE_BLOCK        -2    The Balancer is not able to schedule any move.
NO_MOVE_PROGRESS     -3    All the scheduled moves failed for 5 consecutive iterations.
IO_EXCEPTION         -4    An IOException was thrown.
ILLEGAL_ARGUMENTS    -5    An illegal argument was found in the command or in the configuration.
INTERRUPTED          -6    The Balancer process was interrupted.
UNFINALIZED_UPGRADE  -7    The cluster is being upgraded.
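The per-iteration logic of Steps 1 to 3 can be sketched in a few functions. This is a simplified illustration, not the Balancer's actual code: the names are invented, the sketch handles a single storage type, ignores the same-rack-first pairing preference, and pairs each source with at most one target.

```python
def classify(groups, threshold=10.0):
    """Step 1: split storage groups of one storage type into the four
    utilization classes. `groups` maps group name -> (used, capacity)."""
    avg = 100.0 * sum(u for u, _ in groups.values()) / sum(c for _, c in groups.values())
    over, above, below, under = [], [], [], []
    for name, (used, capacity) in groups.items():
        u = 100.0 * used / capacity
        if u > avg + threshold:
            over.append(name)
        elif u > avg:
            above.append(name)
        elif u >= avg - threshold:
            below.append(name)
        else:
            under.append(name)
    return over, above, below, under


def pair(over, above, below, under):
    """Step 2: pair sources with targets in priority order
    (over -> under, then over -> below-average, then above-average -> under).
    The real Balancer applies this order same-rack first, then any rack;
    this sketch ignores rack locality."""
    pairs = []
    for sources, targets in [(over, under), (over, below), (above, under)]:
        for source in list(sources):
            if targets:
                pairs.append((source, targets.pop(0)))
                sources.remove(source)
    return pairs


def is_good_candidate(block, source, target, scheduled, rack_of):
    """Step 3: the four conditions a replica must satisfy to be moved
    from `source` to `target`."""
    if source["storage_type"] != target["storage_type"]:
        return False  # 1. storage types must match
    if block["id"] in scheduled:
        return False  # 2. not already scheduled
    if target["datanode"] in block["locations"]:
        return False  # 3. target must not already hold a replica
    # 4. the move must not reduce the number of racks holding the block
    racks_before = {rack_of[d] for d in block["locations"]}
    locations_after = (block["locations"] - {source["datanode"]}) | {target["datanode"]}
    return len({rack_of[d] for d in locations_after}) >= len(racks_before)
```

If classify() returns empty over and under lists, the cluster is balanced and the Balancer exits with SUCCESS; otherwise the pairs drive the candidate selection and the block moves of Step 4.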
07-07-2016 04:14 AM · 14 Kudos
The original configurations and CLI options are discussed in Sections 1 and 2. The new configurations and CLI options are discussed in Sections 3 and 4.

1. Original Configurations

Concurrent Moves

dfs.datanode.balance.max.concurrent.moves, default is 5

This configuration limits the maximum number of concurrent block moves that a Datanode is allowed to perform for balancing the cluster. If it is set in a Datanode, the Datanode will throw an exception if the limit is exceeded. If it is set in the Balancer, the Balancer will schedule concurrent block movements within the limit. Note that the Datanode setting and the Balancer setting could be different. Since both settings impose a restriction, the effective setting is the minimum of them.

Before HDFS-9214, changing this configuration in a Datanode required restarting the Datanode. It was therefore recommended to set it to the highest possible value in the Datanodes and adjust the runtime value in the Balancer in order to gain flexibility. HDFS-9214 allows this configuration to be changed without a Datanode restart; below are the steps for reconfiguring a Datanode.
1. Change the value of dfs.datanode.balance.max.concurrent.moves in the configuration xml file stored on the Datanode machine.
2. Start a reconfiguration task with the following command:
hdfs dfsadmin -reconfig datanode <dn_addr>:<ipc_port> start

For example, suppose a Datanode has 12 disks. Then, this configuration can be set to 24, a small multiple of the number of disks, in the Datanodes. A higher value may not be useful and only increases disk contention. If the Balancer runs in a maintenance window, the setting in the Balancer can be the same, i.e. 24, in order to use all the available bandwidth. However, if the Balancer runs at the same time as other jobs, it should be set to a smaller value, say 5, in the Balancer so that bandwidth remains available for those jobs.

Bandwidth

dfs.datanode.balance.bandwidthPerSec, default is 1048576 (= 1 MB/s)

This configuration limits the bandwidth each Datanode uses for balancing the cluster. Unlike dfs.datanode.balance.max.concurrent.moves, changing it does not require restarting the Datanodes. It can be done with the following command:
hdfs dfsadmin -setBalancerBandwidth <bandwidth in bytes per second>

Mover Threads

dfs.balancer.moverThreads, default is 1000

This is the number of threads in the Balancer for moving blocks. Each block move requires a thread, so this configuration limits the total number of concurrent moves for balancing in the entire cluster.

2. Original CLI Options

Balancing Policy, Threshold & Block Pools

[-policy <policy>]
Describes how to determine whether a cluster is balanced. The two supported policies are:
blockpool: the cluster is balanced if each pool in each node is balanced.
datanode: the cluster is balanced if each datanode is balanced.
Note that the blockpool policy is stricter than the datanode policy in the sense that the blockpool policy requirement implies the datanode policy requirement. The default policy is datanode.
[-threshold <threshold>]
Specifies a number in the closed interval [1.0, 100.0] representing the acceptable threshold, as a percentage of storage capacity, so that storage utilization outside the average +/- the threshold is considered over/under-utilized. The default threshold is 10.0, i.e. 10%.

[-blockpools <comma-separated list of blockpool ids>]
The Balancer will run only on the block pools specified in this list. When the list is empty, the Balancer runs on all the existing block pools. The default value is an empty list.

Include & Exclude Lists

[-include [-f <hosts-file> | <comma-separated list of hosts>]]
When the include list is nonempty, only the datanodes specified in the list are balanced by the Balancer. An empty include list means including all the datanodes in the cluster. The default value is an empty list.

[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
The datanodes specified in the exclude list are excluded, so the Balancer will not balance them. An empty exclude list means that no datanodes are excluded. When a datanode is specified in both the include list and the exclude list, it is excluded. The default value is an empty list.

Idle-Iterations & Run-During-Upgrade

[-idleiterations <idleiterations>]
Specifies the number of consecutive iterations (-1 for infinite) in which no blocks have been moved before the Balancer terminates with the NO_MOVE_PROGRESS exit code. The default is 5.

[-runDuringUpgrade]
If specified, the Balancer will run even if there is an ongoing HDFS upgrade; otherwise, it terminates with the UNFINALIZED_UPGRADE exit code. When there is no ongoing upgrade, this option has no effect. It is usually undesirable to run the Balancer during an upgrade since, in order to support rollback, blocks deleted from HDFS are moved to an internal trash directory in the datanodes rather than actually deleted.
Therefore, running the Balancer during an upgrade cannot reduce the storage usage of any datanode.

3. New Configurations

HDFS-8818: Allow Balancer to run faster

dfs.balancer.max-size-to-move, default is 10737418240 (= 10 GB)

In each iteration, the Balancer chooses datanodes in pairs and then moves data between the datanode pairs. This configuration limits the maximum size of data that the Balancer will move between a chosen datanode pair. When the network and disks are not saturated, increasing it can increase the data transferred between each datanode pair per iteration, while the duration of an iteration remains about the same.

HDFS-8824: Do not use small blocks for balancing the cluster

dfs.balancer.getBlocks.size, default is 2147483648 (= 2 GB)
dfs.balancer.getBlocks.min-block-size, default is 10485760 (= 10 MB)

After the Balancer has decided to move a certain amount of data between two datanodes (source and destination), it repeatedly invokes the getBlocks(..) rpc to the Namenode in order to get lists of blocks from the source datanode until the required amount of data is scheduled. dfs.balancer.getBlocks.size is the total data size of the block list returned by a single getBlocks(..) rpc. dfs.balancer.getBlocks.min-block-size is the minimum size a block must have in order to be used for balancing the cluster.

HDFS-6133: Block Pinning

dfs.datanode.block-pinning.enabled, default is false

When creating a file, a user application may specify a list of favorable datanodes via the file creation API in DistributedFileSystem. The Namenode makes a best effort to allocate blocks to the favorable datanodes. When dfs.datanode.block-pinning.enabled is set to true, a block replica written to a favorable datanode is "pinned" to that datanode. Pinned replicas are not moved for cluster balancing, in order to keep them stored on the specified favorable datanodes. This feature is useful for block-distribution-aware user applications such as HBase.

4. New CLI Options

HDFS-8826: Source Datanodes

[-source [-f <hosts-file> | <comma-separated list of hosts>]]

The new -source option allows specifying a source datanode list so that the Balancer selects blocks to move only from those datanodes. When the list is empty, all the datanodes can be chosen as sources. The default value is an empty list. The option can be used to free up the space of particular datanodes in the cluster. Without the -source option, the Balancer can be inefficient in some cases. Below is an example.
Datanodes (with the same capacity)   Utilization   Rack
D1                                   95%           A
D2                                   30%           B
D3, D4, D5                           0%            B
In the table above, the average utilization is 25%, so D2 is within the 10% threshold and it is unnecessary to move any blocks from or to D2. Without specifying the source nodes, the Balancer first moves blocks from D2 to D3, D4 and D5, since they are in the same rack, and then moves blocks from D1 to D3, D4 and D5. By specifying D1 as the source node, the Balancer moves blocks directly from D1 to D3, D4 and D5.

This is the second article of the HDFS Balancer series. We will explain the algorithm used by the Balancer for balancing the cluster in the next article.
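The arithmetic behind this example can be checked in a few lines (illustrative only; since the datanodes have equal capacities, the capacity-weighted average reduces to the plain mean):

```python
utilization = {"D1": 95, "D2": 30, "D3": 0, "D4": 0, "D5": 0}
threshold = 10.0

# With equal capacities, the average utilization is the plain mean.
avg = sum(utilization.values()) / len(utilization)  # (95 + 30 + 0 + 0 + 0) / 5 = 25.0

# Nodes outside the [avg - threshold, avg + threshold] band need balancing.
imbalanced = sorted(d for d, u in utilization.items()
                    if u > avg + threshold or u < avg - threshold)
# D1 (95% > 35%) is over-utilized; D3, D4, D5 (0% < 15%) are under-utilized.
# D2 (30%) lies inside the [15%, 35%] band, so no blocks need to move to or from it.
```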
07-06-2016 01:12 PM · 10 Kudos
The HDFS Balancer is a tool for balancing the data across the storage devices of an HDFS cluster. The Balancer was originally designed to run slowly so that the balancing activities do not affect normal cluster activities and running jobs. We have received feedback from the HDFS community that it is also desirable for the Balancer to be configurable to run faster. The use cases are listed below.

Free up space on some nearly full datanodes.
Move data to newly added datanodes in order to utilize the new machines.
Run the Balancer when the cluster load is low or in a maintenance window, instead of running it as a background daemon.

We have changed the Balancer to address these new use cases. After the changes, the Balancer is able to run 100x faster, while it still can be configured to run slowly as before. In one of our tests, we were able to improve the performance from a few gigabytes per minute to a terabyte per minute. In addition, we added two new features: source datanodes and block pinning. Users can specify the source datanodes so that they can free up space on particular datanodes using the Balancer. A block-distribution-aware user application can pin its block replicas to particular datanodes so that the pinned replicas are not moved for cluster balancing.

Why is the data stored in HDFS imbalanced? There are three major reasons.

1. Adding Datanodes

When new datanodes are added to a cluster, newly created blocks are written to them from time to time. However, existing blocks are not moved to them without using the Balancer.

2. Client Behavior

In some cases, a client application may not write data uniformly across the datanode machines: it may always write to some particular machines but not the others. HBase is an example of such an application. In other cases, the client application is not skewed by design, such as MapReduce/YARN jobs; however, the data is skewed so that some of the job tasks write significantly more data than the others. When a Datanode receives data directly from a client, it stores a copy on its local storage to preserve data locality, so the datanodes receiving more data usually have higher storage utilization.

3. Block Allocation in HDFS

HDFS uses a constraint satisfaction algorithm to allocate file blocks.
Once the constraints are satisfied, HDFS allocates a block by selecting a storage device from the candidate set uniformly at random. For large clusters, the blocks are essentially allocated randomly in a uniform distribution, provided that the client applications write data to HDFS uniformly across the datanode machines. Note that uniform random allocation may not result in a uniform data distribution because of the randomness. This is usually not a problem when the cluster has sufficient space, but it becomes serious when the cluster is nearly full.

In the next article, we will explain the usage of the original configurations/CLI options of the Balancer, as well as the new configurations/CLI options added by the recent enhancements.
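The last point can be illustrated with a small simulation (purely illustrative, not HDFS's actual placement code): even when every block is placed uniformly at random, the per-node counts are never exactly equal.

```python
import random

def simulate_allocation(num_datanodes=10, num_blocks=10_000, seed=42):
    """Place blocks on datanodes uniformly at random and return the
    per-node block counts."""
    rng = random.Random(seed)
    counts = [0] * num_datanodes
    for _ in range(num_blocks):
        counts[rng.randrange(num_datanodes)] += 1
    return counts

counts = simulate_allocation()
spread = max(counts) - min(counts)
# The spread is typically a few percent of the per-node mean (1,000 here):
# harmless while the cluster has free space, but significant when nearly full.
```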