How to speed up "hdfs dfs -mv" for more than 30,000 files?

Rising Star

Hello,

I have 30 thousand files to move to another HDFS directory.

Is there a faster way than "hdfs dfs -mv /mydirectory/* /targetdirectory"?

Average file size: 10 KB.

I can't merge the files into a bigger one beforehand.

Thanks for your feedback

1. dfs -mv is the fastest option compared to -cp or distcp.
If possible, move mydirectory itself into /targetdirectory instead of mydirectory/*.

Rising Star

Thanks

That's not possible, because the result is /targetdirectory/mydirectory and I need all the files to end up directly under /targetdirectory/*.

Super Guru

@Thierry Vernhet,

If there are fewer files in /targetdirectory than in /mydirectory, you can do the following:

hdfs dfs -mv /targetdirectory /x
hdfs dfs -mv /mydirectory /targetdirectory
hdfs dfs -mv /x/* /targetdirectory

Thanks,

Aditya
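
The three renames above can be wrapped in a small shell function. This is only a sketch under stated assumptions: the function name swap_move and its temporary-path argument are hypothetical, and the "-test -e" guard is an added precaution that is not part of the original answer. The glob is quoted so that the HDFS shell, not the local shell, expands it.

```shell
# Sketch: get every file from SRC into DST by renaming directories, so that
# only DST's original (fewer) entries are moved individually.
# swap_move and its arguments are illustrative names, not an HDFS feature.
swap_move() {
  local src="$1" dst="$2" tmp="$3"

  # Refuse to run if the temporary path already exists; otherwise the first
  # mv would nest DST inside it instead of renaming it.
  if hdfs dfs -test -e "$tmp"; then
    echo "temporary path $tmp already exists" >&2
    return 1
  fi

  hdfs dfs -mv "$dst" "$tmp" || return 1     # 1. park the target under a new name
  hdfs dfs -mv "$src" "$dst" || return 1     # 2. rename the source to the target name
  hdfs dfs -mv "$tmp/*" "$dst" || return 1   # 3. move the parked files back in
}
```

With the paths from this thread, the call would be: swap_move /mydirectory /targetdirectory /x. As in the original commands, the emptied temporary directory is left behind, and step 3 reports an error if the target was empty to begin with.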

Rising Star

Thanks, but it doesn't work, for the same reason.

When you run "mv /mydirectory /targetdirectory", the result is always /targetdirectory/mydirectory.

Super Guru

@Thierry Vernhet,

After the first command runs, targetdirectory is renamed to x.

So mv /mydirectory /targetdirectory does not produce /targetdirectory/mydirectory; it simply renames mydirectory to targetdirectory, since the destination directory no longer exists.

So if targetdirectory has fewer files, this is an option: instead of moving 30k files, you move only the smaller set.

Thanks,

Aditya
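
The behaviour described above depends on whether the destination exists at the moment "-mv" runs. A minimal sketch, assuming a hypothetical helper named mv_result, that uses "hdfs dfs -test -d" to predict where the source will end up:

```shell
# Sketch: predict where `hdfs dfs -mv SRC DST` will place SRC.
# mv_result is a hypothetical helper name used only for illustration.
# Assumes DST is either an existing directory or does not exist at all.
mv_result() {
  local src="$1" dst="$2"
  if hdfs dfs -test -d "$dst"; then
    # DST is an existing directory: SRC is moved inside it.
    echo "${dst%/}/$(basename "$src")"
  else
    # DST does not exist: SRC is simply renamed to DST.
    echo "$dst"
  fi
}
```

Calling mv_result /mydirectory /targetdirectory prints /targetdirectory/mydirectory while the target still exists, and /targetdirectory once the target has been renamed away.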

Rising Star

@Aditya Sirna,

Of course... I'm going to try this.

Thanks
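
Before trying the swap, it may help to confirm which directory actually holds fewer files. The sketch below relies on "hdfs dfs -count", whose output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME; the helper names file_count and fewer_files are hypothetical. Note that the count is recursive, so files in subdirectories are included.

```shell
# Sketch: report which of two HDFS directories holds fewer files.
# file_count and fewer_files are hypothetical helper names.
file_count() {
  # Second column of `hdfs dfs -count` is FILE_COUNT.
  hdfs dfs -count "$1" | awk '{print $2}'
}

fewer_files() {
  local a="$1" b="$2"
  if [ "$(file_count "$a")" -le "$(file_count "$b")" ]; then
    echo "$a"
  else
    echo "$b"
  fi
}
```

If fewer_files /mydirectory /targetdirectory prints /targetdirectory, the rename swap moves fewer entries than the plain glob move would.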

@Thierry Vernhet

If you have more than 10 GB, I'd recommend using distcp instead of copy or move.
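
For reference, a DistCp invocation might look like the sketch below. DistCp copies data rather than renaming it, so the source directory is still there afterwards and would have to be removed separately to complete a move. The "-update" flag is used here so that the contents of the source directory land directly in the target instead of in a nested subdirectory. The wrapper name copy_contents is hypothetical.

```shell
# Sketch: copy the contents of SRC into DST with DistCp.
# copy_contents is a hypothetical wrapper name; the source is left in place.
copy_contents() {
  local src="$1" dst="$2"
  hadoop distcp -update "$src" "$dst"
}
```

With the paths from this thread, the call would be: copy_contents /mydirectory /targetdirectory.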