Distcp with wildcard

New Contributor

I want to copy only the CSV files from one directory into another. I use the command:

hadoop distcp /a/b/*.csv /e/f/

 

The issue is this: if there are multiple CSV files under /a/b/, the command works and copies all of them into the directory /e/f/. However, if there is only one CSV file, it is copied to /e/f, where f is a file, not a directory. Is there a way to resolve this? Thanks in advance!

 

1 ACCEPTED SOLUTION

Super Collaborator

I haven't been able to try this with distcp, but a similar thing happens with hdfs dfs commands. What I found is that if the target folder already exists (e.g. created with hdfs dfs -mkdir /e/f/), then copying into it gives you all of your CSVs as separate files inside that folder. If /e/f/ does not exist ahead of time, Hadoop creates the path for you and writes your single source CSV as a file named "f". Hope that makes sense and helps.
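As a rough sketch of that suggestion, the two steps could be wrapped in a small shell function. The name copy_csvs is made up for illustration and is not part of Hadoop, and the sketch assumes distcp treats an existing target directory the same way hdfs dfs does, which has not been verified in this thread.

```shell
#!/usr/bin/env bash
# copy_csvs is a hypothetical helper, not a Hadoop command.
# Assumption: distcp copies into an existing directory the way hdfs dfs does.
copy_csvs() {
  local src_dir="$1" dst_dir="$2"
  # Create the target directory first; -p means no error if it already exists.
  hdfs dfs -mkdir -p "$dst_dir" || return 1
  # Quote the glob so Hadoop, not the local shell, expands *.csv.
  hadoop distcp "${src_dir}/*.csv" "$dst_dir"
}
```

With the paths from the question this would be called as copy_csvs /a/b /e/f. Because the directory exists before the copy starts, a single matching CSV should land inside /e/f/ under its own name instead of becoming a file called f.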


