Created on 10-09-2017 07:06 PM - edited 09-16-2022 05:22 AM
I need to merge n part files in HDFS, but I don't have enough space in the local FS to generate the merged file with getmerge. Is there another way to do this?
Created 10-09-2017 07:20 PM
Created 10-09-2017 07:53 PM
Created 10-09-2017 07:54 PM
i don't have enough space in the local FS
Created 10-09-2017 07:58 PM
@eric valoschin The solution in the above link does not store the output on the local FS; it streams the output from HDFS to HDFS:
============================
A command line scriptlet to do this could be as follows:
hadoop fs -text *_fileName.txt | hadoop fs -put - targetFilename.txt
This will cat all files that match the glob to standard output; you then pipe that stream to the put command, which writes the stream to an HDFS file named targetFilename.txt.
=============================
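For example, to merge a whole directory of part files entirely within HDFS (the input and output paths here are hypothetical, substitute your own):

hadoop fs -text /tmp/input/part-* | hadoop fs -put - /tmp/merged/output.txt

Neither command stages anything on the local FS; the bytes stream straight from HDFS back into HDFS.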
Created 10-09-2017 08:13 PM
It is a compressed bz2 file, and I get an error about the codec when trying to get the new file.
INFO compress.CodecPool: Got brand-new decompressor [.bz2]
text: Unable to write to output stream.
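One workaround worth trying for bz2 specifically: the bzip2 format allows multiple compressed streams to be concatenated into a single valid .bz2 file, so you can skip -text (and the codec entirely) and concatenate the raw compressed bytes instead (paths here are hypothetical):

hadoop fs -cat /tmp/input/part-*.bz2 | hadoop fs -put - /tmp/merged/output.bz2

The resulting file stays bz2-compressed, and readers that handle concatenated bzip2 streams will decompress it as one file.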
Created 10-09-2017 08:07 PM
Can you try the following command?
hadoop jar /usr/hdp/2.5.3.0-37/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.3.0-37.jar \
  -Dmapred.reduce.tasks=1 \
  -input "<path-to-input-directory>" \
  -output "<path-to-output-directory>" \
  -mapper cat \
  -reducer cat
Make sure which version of the Hadoop streaming jar you are using by looking under /usr/hdp, then give the input path, and make sure the output directory does not already exist, as this job will merge the files and create the output directory for you. Note that -Dmapred.reduce.tasks=1 forces a single reducer, which is what collapses all the input into one output file.
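To locate the streaming jar, something like this should work on an HDP node (the exact version directory will differ per install):

ls /usr/hdp/*/hadoop-mapreduce/hadoop-streaming-*.jar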
Here is what I tried:
#hdfs dfs -ls /user/yashu/folder2/
Found 2 items
-rw-r--r--   3 hdfs hdfs        150 2017-09-26 17:55 /user/yashu/folder2/part1.txt
-rw-r--r--   3 hdfs hdfs         20 2017-09-27 09:07 /user/yashu/folder2/part1_sed.txt
#hadoop jar /usr/hdp/2.5.3.0-37/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.3.0-37.jar \
> -Dmapred.reduce.tasks=1 \
> -input "/user/yashu/folder2/" \
> -output "/user/yashu/folder1/" \
> -mapper cat \
> -reducer cat
folder2 has 2 files. After running the above command, which stores the merged output in the folder1 directory, the 2 files got merged into 1 file, as you can see below.
#hdfs dfs -ls /user/yashu/folder1/
Found 2 items
-rw-r--r--   3 hdfs hdfs          0 2017-10-09 16:00 /user/yashu/folder1/_SUCCESS
-rw-r--r--   3 hdfs hdfs        174 2017-10-09 16:00 /user/yashu/folder1/part-00000
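To sanity-check the result, you could then print the merged file (assuming plain-text content):

#hdfs dfs -cat /user/yashu/folder1/part-00000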