Created 10-05-2016 11:09 AM
I have 3 files in Hadoop but want to see the output in a single file.
Created on 10-05-2016 12:11 PM - edited 08-19-2019 04:06 AM
Command Line
If these three files are in the same directory, run the following from the command line of a server in the cluster. It will merge the files into one file and store it locally:
hdfs dfs -getmerge <hdfsDir> <localFile>
where <hdfsDir> is the directory holding the files in HDFS and <localFile> is the name of the merged file that will be stored locally.
If you are talking about a directory structure that looks like this in HDFS:
myFile.txt/_SUCCESS
myFile.txt/part-m-00000
myFile.txt/part-m-00001
This is the result of a map-reduce job, and <hdfsDir> in this case would be myFile.txt. Note that _SUCCESS is a 0-byte file: it has no contents and is just a flag indicating that the m-r job succeeded.
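To make the expected result concrete, here is a minimal local sketch that does not touch HDFS at all. It builds an ordinary directory with the same layout and concatenates the files with cat, on the assumption that the merge joins the files in name order. The temporary directory, the sample contents and the name merged.txt are made up for illustration.

```shell
# Local stand-in for the HDFS output directory (hypothetical sample data).
workdir="$(mktemp -d)"
mkdir "$workdir/myFile.txt"
: > "$workdir/myFile.txt/_SUCCESS"                      # zero-byte flag file
printf 'a\nb\n' > "$workdir/myFile.txt/part-m-00000"
printf 'c\n'    > "$workdir/myFile.txt/part-m-00001"

# Concatenate everything in the directory in name order.
# The empty _SUCCESS file contributes nothing to the result.
cat "$workdir"/myFile.txt/* > "$workdir/merged.txt"
cat "$workdir/merged.txt"

# On the cluster, the equivalent step from this answer would be:
#   hdfs dfs -getmerge myFile.txt merged.txt
```

The merged file holds the lines of part-m-00000 followed by those of part-m-00001, which is the shape to expect from the real command.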
Ambari
Alternatively, you can do this from the File View in Ambari. Open the directory holding the files you want to merge into one, then check the files you want to merge.
Then click concatenate in the far-right dropdown.
This will download the merged (concatenated) file through your browser.
Note for both approaches:
The above works for multiple files in the same directory even if the files are not the result of a map-reduce job (but is typically used for map-reduce results).
(If this is what you were looking for, please let me know by accepting the answer. Else, let me know the gaps in the answer).
Created 10-06-2016 02:00 AM
Thank you, Greg, for your answer. It is really helpful.