
How to speed up "hdfs dfs -mv" for more than 30 000 files ?

Hello,

I've got 30 thousand files to move to another HDFS directory.

Is there a faster way to do this than "hdfs dfs -mv /mydirectory/* /targetdirectory"?

Average file size: 10 KB.

I can't merge the files into a bigger one beforehand.

Thanks for your feedback

7 REPLIES

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

dfs -mv is faster than -cp or distcp, because it only renames entries in the NameNode metadata and does not copy any data.
If possible, move mydirectory itself into /targetdirectory instead of mydirectory/*.

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

Thanks

That's not possible, because the result is /targetdirectory/mydirectory, and I need all the files to end up directly under /targetdirectory/.

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ? (accepted solution)

@Thierry Vernhet,

If there are fewer files in /targetdirectory than in /mydirectory, you can do the following:

hdfs dfs -mv /targetdirectory /x
hdfs dfs -mv /mydirectory /targetdirectory
hdfs dfs -mv /x/* /targetdirectory

Thanks,

Aditya
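
For reference, the three commands above can be wrapped in a small shell function with a dry-run mode. This is only a sketch: the names swap_move, run, HDFS_CMD and DRY_RUN are made up for illustration, and /x is assumed to be a free temporary path on the same HDFS. Keep in mind that after the swap, /targetdirectory is the old /mydirectory, so it carries that directory's owner and permissions.

```shell
# Sketch of the three-step swap. All names here are illustrative.
HDFS_CMD=${HDFS_CMD:-"hdfs dfs"}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$HDFS_CMD $*"
  else
    $HDFS_CMD "$@"
  fi
}

swap_move() {
  src=$1; dst=$2; tmp=$3

  # Outside dry-run mode, refuse to start if the temporary path is taken.
  if [ "${DRY_RUN:-0}" != "1" ] && $HDFS_CMD -test -e "$tmp"; then
    echo "refusing to run: $tmp already exists" >&2
    return 1
  fi

  run -mv "$dst" "$tmp" || return 1     # 1. park the smaller directory
  run -mv "$src" "$dst" || return 1     # 2. plain rename: dst no longer exists
  run -mv "$tmp/*" "$dst" || return 1   # 3. glob is quoted so HDFS expands it
  run -rmdir "$tmp"                     # 4. drop the now-empty temporary directory
}
```

Calling DRY_RUN=1 swap_move /mydirectory /targetdirectory /x only prints the commands, which is a cheap way to check the paths before running it for real.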

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

Thanks, but it doesn't work, for the same reason.

When you run "mv /mydirectory /targetdirectory", the result is always /targetdirectory/mydirectory.

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

@Thierry Vernhet,

After running the first command, targetdirectory will have been renamed to x.

So "mv /mydirectory /targetdirectory" does not produce /targetdirectory/mydirectory; instead, it simply renames mydirectory to targetdirectory, since the destination directory no longer exists.

So, if targetdirectory has fewer files, this is an option. Instead of moving 30k files, you only move the smaller set.

Thanks,

Aditya
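
The difference between "move into" and "rename" can be tried out safely on a local filesystem, where plain mv follows the same rule as the one described above. This is a local analogy under that assumption, not an HDFS run; the case1/case2 directories and the files a and b are made up for the demo.

```shell
# Local analogy with plain mv (not HDFS): destination exists vs. parked first.
work=$(mktemp -d)

# Case 1: the destination exists, so the source directory lands inside it.
mkdir -p "$work/case1/mydirectory" "$work/case1/targetdirectory"
touch "$work/case1/mydirectory/a" "$work/case1/targetdirectory/b"
mv "$work/case1/mydirectory" "$work/case1/targetdirectory"
case1=$(ls "$work/case1/targetdirectory")

# Case 2: park the destination first, so the same mv becomes a rename.
mkdir -p "$work/case2/mydirectory" "$work/case2/targetdirectory"
touch "$work/case2/mydirectory/a" "$work/case2/targetdirectory/b"
mv "$work/case2/targetdirectory" "$work/case2/x"
mv "$work/case2/mydirectory" "$work/case2/targetdirectory"
mv "$work/case2/x"/* "$work/case2/targetdirectory"
case2=$(ls "$work/case2/targetdirectory")

echo "case 1 contents:"; echo "$case1"
echo "case 2 contents:"; echo "$case2"
rm -rf "$work"
```

In case 1 the target ends up containing b and a nested mydirectory; in case 2 it contains a and b side by side, which is the layout asked for in the question.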

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

@Aditya Sirna,

Of course... I'm going to try this.

Thanks

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

@Thierry Vernhet

If you have more than 10 GB, I'd recommend using distcp instead of copy or move.
