Distcp with wildcard

New Contributor

I want to copy only the CSV files from one directory into another. I use the command:

hadoop distcp /a/b/*.csv /e/f/

The issue is that this only behaves as intended when there are multiple CSV files under /a/b/: the command then copies all of them into the directory /e/f/. If there is only one CSV file, it is copied to /e/f, where f ends up as a file rather than a directory. Is there a way to resolve this? Thanks in advance!

Accepted Solution

Master Collaborator

I haven't been able to try this with distcp, but the same thing happens with hdfs dfs commands. What I found is that if you create your target folder first (e.g. hdfs dfs -mkdir /e/f/), then copying into that folder gives you all of your CSVs as separate files inside it. If /e/f/ doesn't exist ahead of time, Hadoop creates it for you, and with a single source CSV it does that by copying the CSV to a new file named "f". Hope that makes sense and helps.
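A minimal sketch of that workaround applied to the distcp command from the question, assuming distcp treats an already-existing target directory the same way the hdfs dfs commands do (the reply notes this wasn't tried with distcp). The function name copy_csvs_into_dir is hypothetical; hdfs dfs -mkdir -p and hadoop distcp are the standard Hadoop CLI commands. The glob is kept quoted so that Hadoop, not the local shell, expands it against HDFS.

```shell
#!/usr/bin/env bash
# Sketch (not tested against a live cluster): create the destination
# directory first, so that a glob matching only one file is still copied
# *into* the directory instead of being written to a file named after it.
# copy_csvs_into_dir is a hypothetical helper name.
copy_csvs_into_dir() {
  local src_glob="$1"   # e.g. '/a/b/*.csv'; quoted so the local shell does not expand it
  local dest_dir="$2"   # e.g. /e/f

  # -p: create parent directories as needed; no error if the directory already exists.
  # Stop here if the directory could not be created.
  hdfs dfs -mkdir -p "$dest_dir" || return 1

  # dest_dir now exists as a directory, so each matching CSV should land inside it.
  hadoop distcp "$src_glob" "$dest_dir/"
}
```

Usage: copy_csvs_into_dir '/a/b/*.csv' /e/f. Running the mkdir unconditionally is harmless when the directory already exists, which keeps the behaviour the same whether the glob matches one file or many.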
