
How can I see the output in a single file when 3 files are processed in Hadoop?

Contributor

I have 3 files in Hadoop but want to see the output in a single file.

1 ACCEPTED SOLUTION

Guru

Command Line

If the three files are in the same HDFS directory, run the following command on a server in the cluster. It merges the files into one file and stores it on the local file system:

hdfs dfs -getmerge <hdfsDir> <localFile>

where <hdfsDir> is the HDFS directory holding the files and <localFile> is the name of the merged file to create on the local file system.
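To picture what the merged file will contain, the sketch below reproduces the effect of -getmerge on the local file system with plain cat: the bytes of each file are written one after another into a single output file. The directory and file names are hypothetical, and the sketch assumes -getmerge visits the files in name order, as cat does with a shell glob (worth confirming on your Hadoop version).

```shell
# Local stand-in for an HDFS directory holding three files (hypothetical names).
workdir="$(mktemp -d)"
mkdir -p "$workdir/input"
printf 'alpha\n' > "$workdir/input/file1.txt"
printf 'beta\n'  > "$workdir/input/file2.txt"
printf 'gamma\n' > "$workdir/input/file3.txt"

# Write every file in the directory, in name order, into one merged file.
# This roughly mirrors the result of: hdfs dfs -getmerge <hdfsDir> <localFile>
cat "$workdir"/input/* > "$workdir/merged.txt"

# prints alpha, beta and gamma on three separate lines
cat "$workdir/merged.txt"
```

If some of your files do not end with a newline, plain concatenation joins the last line of one file to the first line of the next; -getmerge accepts an -nl option that appends a newline after each file to avoid this.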

If you are talking about a directory structure that looks like this in HDFS:

myFile.txt/_SUCCESS
myFile.txt/part-m-00000
myFile.txt/part-m-00001

This is the output of a MapReduce job, and in this case <hdfsDir> would be myFile.txt. Note that _SUCCESS is a 0-byte file with no contents; it is only a flag indicating that the MapReduce job succeeded.
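For MapReduce output, one cautious pattern is to merge only after the job has written its _SUCCESS marker. The sketch below wraps that check in a small shell function; merge_job_output is a hypothetical name, and the function relies only on the standard hdfs dfs -test -e and -getmerge commands.

```shell
# Hypothetical helper: merge a MapReduce output directory into one local file,
# but only if the job left its _SUCCESS marker (i.e. it completed successfully).
merge_job_output() {
  local hdfs_dir="$1" local_file="$2"
  # hdfs dfs -test -e exits with status 0 when the path exists
  if ! hdfs dfs -test -e "${hdfs_dir}/_SUCCESS"; then
    echo "no _SUCCESS marker in ${hdfs_dir}; the job may have failed or still be running" >&2
    return 1
  fi
  # _SUCCESS is empty, so including it in the merge adds nothing to the output
  hdfs dfs -getmerge "${hdfs_dir}" "${local_file}"
}
```

For the directory above, you would call it as merge_job_output myFile.txt myFile-merged.txt.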

Ambari

Alternatively, you can do this from the Files View in Ambari. Open the directory holding the files you want to merge, then select the files to merge.

[Screenshot: Ambari Files View with the files to merge selected]

Then choose Concatenate from the drop-down menu on the far right.

[Screenshot: the Concatenate option in the drop-down menu]

Your browser then downloads the merged (concatenated) file.

Note for both approaches:

They work for any files in the same directory, even if the files are not the output of a MapReduce job, although they are most often used to merge MapReduce results.

(If this is what you were looking for, please let me know by accepting the answer. Otherwise, let me know what is missing.)

Contributor

Thank you, Greg, for your answer. It is really helpful.