How to speed up "hdfs dfs -mv" for more than 30 000 files ?

New Contributor

Hello,

I have 30 thousand files to move to another HDFS directory.

Is there a faster way to do this than "hdfs dfs -mv /mydirectory/* /targetdirectory"?

Average file size: 10 KB.

I can't merge the files into a bigger one beforehand.

Thanks for your feedback

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

@Thierry Vernhet,

If there are fewer files in /targetdirectory than in /mydirectory, you can do the following:

hdfs dfs -mv /targetdirectory /x
hdfs dfs -mv /mydirectory /targetdirectory
hdfs dfs -mv /x/* /targetdirectory
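
The three commands above could be wrapped in a small shell function that stops at the first failure. This is only a sketch: swap_move and its three arguments are illustrative names, and the checks (hdfs dfs -test -e, hdfs dfs -rmdir) are additions that are not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch of the three-step swap. swap_move, src, dst and tmp are
# hypothetical names chosen for this example.
swap_move() {
  local src="$1" dst="$2" tmp="$3"

  # Refuse to run if the temporary path already exists; otherwise step 1
  # would move dst inside it instead of renaming it.
  if hdfs dfs -test -e "$tmp"; then
    echo "temporary path $tmp already exists" >&2
    return 1
  fi

  # 1. Park the (smaller) target directory under a temporary name.
  hdfs dfs -mv "$dst" "$tmp" || return 1
  # 2. The target no longer exists, so this renames src to dst.
  hdfs dfs -mv "$src" "$dst" || return 1
  # 3. Move the parked files back. The glob is quoted so that the HDFS
  #    client expands it, not the local shell.
  hdfs dfs -mv "$tmp/*" "$dst" || return 1
  # Remove the temporary directory; this fails if anything is left in it.
  hdfs dfs -rmdir "$tmp"
}
```

It would be called as swap_move /mydirectory /targetdirectory /x. Step 3 assumes that no file name exists in both directories; a name collision would make that move fail.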

Thanks,

Aditya

7 REPLIES

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

hdfs dfs -mv is the fastest option compared with -cp or distcp, because a move within HDFS is a metadata rename and does not copy any data.
If possible, move mydirectory itself rather than mydirectory/* into /targetdirectory.

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

New Contributor

Thanks

That's not possible, because the result is /targetdirectory/mydirectory, and I need all the files to end up directly under /targetdirectory.

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

New Contributor

Thanks but it doesn't work for the same reason.

When you "mv /mydirectory /targetdirectory" the result is always /targetdirectory/mydirectory.

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

@Thierry Vernhet,

After running the first command, targetdirectory will have been renamed to x.

So mv /mydirectory /targetdirectory does not produce /targetdirectory/mydirectory; instead it just renames mydirectory to targetdirectory, since the destination directory no longer exists.

So if targetdirectory has fewer files, this is an option: instead of moving 30k files, you move only the smaller set.
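
The two behaviours discussed in this thread (nesting when the destination exists, plain rename when it does not) can be tried safely on a local filesystem, where mv follows the same rule. The sketch below is only a local analogy in a scratch directory; demo_mv_semantics is an illustrative name and nothing here touches HDFS.

```shell
#!/usr/bin/env bash
# Local-filesystem analogy of the two mv outcomes described above.
demo_mv_semantics() {
  local root
  root="$(mktemp -d)" || return 1

  # Case 1: the destination exists, so the source is nested inside it.
  mkdir -p "$root/case1/mydirectory" "$root/case1/targetdirectory"
  touch "$root/case1/mydirectory/file1"
  mv "$root/case1/mydirectory" "$root/case1/targetdirectory"
  (cd "$root/case1" && find . -type f)

  # Case 2: the destination does not exist, so the source is renamed.
  mkdir -p "$root/case2/mydirectory"
  touch "$root/case2/mydirectory/file1"
  mv "$root/case2/mydirectory" "$root/case2/targetdirectory"
  (cd "$root/case2" && find . -type f)

  # Clean up the scratch directory.
  rm -rf "$root"
}
```

Calling demo_mv_semantics prints the path of file1 after each move: under targetdirectory/mydirectory in the first case, and directly under targetdirectory in the second.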

Thanks,

Aditya

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

New Contributor

@Aditya Sirna,

Of course... I'm going to try this.

Thanks

Re: How to speed up "hdfs dfs -mv" for more than 30 000 files ?

@Thierry Vernhet

If you have more than 10 GB of data, I'd recommend using distcp instead of copy or move.
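
A distcp-based variant might look like the sketch below. It is an assumption about how the suggestion could be applied, not something spelled out in the reply: copy_then_delete is an illustrative name, and because distcp copies data rather than renaming it, the source has to be removed in a separate step.

```shell
#!/usr/bin/env bash
# Sketch only: copy_then_delete is a hypothetical helper, not a Hadoop command.
copy_then_delete() {
  local src="$1" dst="$2"

  # With -update, distcp copies the contents of src into dst (not src
  # itself) and only transfers files that are missing or different there.
  hadoop distcp -update "$src" "$dst" || return 1

  # distcp copies rather than moves, so the source is deleted only after
  # the copy has succeeded.
  hdfs dfs -rm -r "$src"
}
```

It would be called as copy_then_delete /mydirectory /targetdirectory. Unlike hdfs dfs -mv, this launches a MapReduce job and rewrites the data, so it is worth timing against a plain move on a small sample first.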
