I need to merge n part files in HDFS, but I don't have enough space on the local FS to generate the merged file with getmerge. Is there another way to do this?
@eric valoschin The solution in the above link does not store the output on the local FS. It streams the output from HDFS to HDFS:
A command line scriptlet to do this could be as follows:
hadoop fs -text *_fileName.txt | hadoop fs -put - targetFilename.txt
This writes every file that matches the glob to standard output, pipes that stream to the put command, and stores it in an HDFS file named targetFilename.txt.
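The pipe above can be wrapped in a small function. This is only a sketch: the function name merge_hdfs_files is made up for illustration, and it assumes the hadoop client is on the PATH. It quotes the glob so that the local shell does not expand it against local files, and it checks for an existing target first, because put does not overwrite by default.

```shell
# Hypothetical helper (name is an assumption, not part of Hadoop):
# merge every HDFS file matching a glob into one HDFS file,
# without staging anything on the local file system.
merge_hdfs_files() {
    src_glob=$1   # e.g. '*_fileName.txt' (quoted so HDFS expands it)
    target=$2     # e.g. targetFilename.txt

    # put refuses to overwrite by default, so stop early if the target exists.
    if hadoop fs -test -e "$target"; then
        echo "target already exists: $target" >&2
        return 1
    fi

    # -text writes the matching files to stdout (decoding known codecs);
    # '-' makes put read from stdin. The function returns put's exit status.
    hadoop fs -text "$src_glob" | hadoop fs -put - "$target"
}

# Usage:
#   merge_hdfs_files '*_fileName.txt' targetFilename.txt
```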
It is a bz2-compressed file, and I get an error about the codec when trying to get the new file.
INFO compress.CodecPool: Got brand-new decompressor [.bz2]
text: Unable to write to output stream.
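The log lines above do not say why the write failed. One possible cause, offered only as a guess, is that the put side of the pipe exited early (for example because the target already existed), which leaves -text with nowhere to write. Separately, -text decompresses bz2 input, so the merged file contains plain text even if its name ends in .bz2. A sketch that recompresses the stream on the way through, assuming a local bzip2 binary is available (the function name is hypothetical):

```shell
# Hypothetical helper: merge bz2 part files into one bz2 file in HDFS.
# Data is decompressed and recompressed in the pipe; nothing is written
# to local disk.
merge_bz2_parts() {
    src_glob=$1   # e.g. '/some/dir/part-*.bz2' (quoted so HDFS expands it)
    target=$2     # e.g. /some/dir/merged.bz2 (must not exist yet)

    # -text decodes each bz2 part to plain text, bzip2 -c recompresses the
    # combined stream to stdout, and put stores it under the target name.
    hadoop fs -text "$src_glob" | bzip2 -c | hadoop fs -put - "$target"
}

# Usage:
#   merge_bz2_parts '/some/dir/part-*.bz2' /some/dir/merged.bz2
```

The recompression costs local CPU but no local disk space, which was the original constraint.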
Can you try the following command?
hadoop jar /usr/hdp/18.104.22.168-37/hadoop-mapreduce/hadoop-streaming-22.214.171.124.5.3.0-37.jar \
  -Dmapred.reduce.tasks=1 \
  -input "<path-to-input-directory>" \
  -output "<path-to-output-directory>" \
  -mapper cat \
  -reducer cat
Check which version of the Hadoop streaming jar is installed on your cluster and use that jar path in the command.
Then give the input path, and make sure the output directory does not already exist: the job merges the files and creates the output directory for you.
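The two checks described above (the jar is present, the output directory is absent) can be scripted before the job is submitted. This is a sketch under assumptions: the function name merge_with_streaming is invented for illustration, and the jar path has to be supplied by you since it depends on the installed version. The property name mapred.reduce.tasks is kept from the command above; newer Hadoop releases call it mapreduce.job.reduces.

```shell
# Hypothetical wrapper around the streaming merge job.
merge_with_streaming() {
    jar=$1          # local path to the hadoop-streaming jar on this node
    input_dir=$2    # HDFS directory holding the part files
    output_dir=$3   # HDFS directory the job will create

    # The jar name embeds the version, so confirm the path is right.
    if [ ! -f "$jar" ]; then
        echo "streaming jar not found: $jar" >&2
        return 1
    fi

    # The job fails if the output directory already exists.
    if hadoop fs -test -e "$output_dir"; then
        echo "output directory already exists: $output_dir" >&2
        return 1
    fi

    # A single reducer means a single output file (part-00000).
    hadoop jar "$jar" \
        -Dmapred.reduce.tasks=1 \
        -input "$input_dir" \
        -output "$output_dir" \
        -mapper cat \
        -reducer cat
}

# Usage (the jar path is a placeholder to fill in for your cluster):
#   merge_with_streaming "<path-to-streaming-jar>" /user/yashu/folder2/ /user/yashu/folder1/
```

Because the lines pass through a reducer, they arrive sorted by key, so the merged file generally does not keep the original line order.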
Here is what I tried:
# hdfs dfs -ls /user/yashu/folder2/
Found 2 items
-rw-r--r--   3 hdfs hdfs        150 2017-09-26 17:55 /user/yashu/folder2/part1.txt
-rw-r--r--   3 hdfs hdfs         20 2017-09-27 09:07 /user/yashu/folder2/part1_sed.txt
# hadoop jar /usr/hdp/126.96.36.199-37/hadoop-mapreduce/hadoop-streaming-188.8.131.52.5.3.0-37.jar \
> -Dmapred.reduce.tasks=1 \
> -input "/user/yashu/folder2/" \
> -output "/user/yashu/folder1/" \
> -mapper cat \
> -reducer cat
folder2 contains 2 files. The above command stores the merged output in the folder1 directory, and the 2 files were merged into 1 file, as you can see below.
# hdfs dfs -ls /user/yashu/folder1/
Found 2 items
-rw-r--r--   3 hdfs hdfs          0 2017-10-09 16:00 /user/yashu/folder1/_SUCCESS
-rw-r--r--   3 hdfs hdfs        174 2017-10-09 16:00 /user/yashu/folder1/part-00000
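In the listings above, the two inputs total 170 bytes (150 + 20) while part-00000 is 174 bytes. A likely explanation, though nothing in this thread confirms it, is that Hadoop streaming treats each line as a key/value pair and writes a tab separator after a key whose value is empty. Since byte sizes may differ for that reason, comparing line counts is a more forgiving sanity check. A sketch, with hypothetical function names:

```shell
# Hypothetical helpers: compare line counts of the inputs and the merged output.
count_hdfs_lines() {
    # Stream the matching files and count lines; tr strips the padding
    # that some wc implementations add.
    hadoop fs -cat "$1" | wc -l | tr -d ' '
}

verify_merge() {
    in_lines=$(count_hdfs_lines "$1")
    out_lines=$(count_hdfs_lines "$2")
    if [ "$in_lines" -eq "$out_lines" ]; then
        echo "OK: $out_lines lines"
    else
        echo "line count mismatch: input=$in_lines output=$out_lines" >&2
        return 1
    fi
}

# Usage (quote the globs so HDFS expands them, not the local shell):
#   verify_merge '/user/yashu/folder2/*' '/user/yashu/folder1/part-*'
```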