Created on 01-07-2018 12:04 AM - edited 09-16-2022 05:42 AM
How can I join all the files in one folder into a single CSV file?
I have a folder called Folder1 and I want to combine all of its files into one file called "output.csv".
I tried:
hadoop fs -getmerge Folder1 /user/maria_dev/output.csv
But I get the error:
getmerge: Mkdirs failed to create file:/user/maria_dev (exists=false, cwd=file:/home/maria_dev)
I also tried:
hadoop fs -cat Folder1 /output.csv
But receive error: No such file or directory.
Thanks
Created 01-07-2018 02:14 AM
Are you looking at the correct directory?
Can you please share the complete path of the directory, along with one screenshot showing all the commands you are running from that directory?
Created 01-07-2018 12:28 AM
The "getmerge" command assumes that "Folder1" is the HDFS source directory and that the second argument, "/user/maria_dev/output.csv", is a destination on the local filesystem, hence the error you see.
Here is a complete example which will help in understanding "getmerge":
Syntax:
[-getmerge [-nl] <src> <localdst>]
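The optional -nl flag in the syntax above adds a newline at the end of each merged file, which matters when the source files lack trailing newlines. A local sketch of the difference (the file names here are made up, and plain cat stands in for getmerge):

```shell
# Simulate getmerge vs. getmerge -nl on files without trailing newlines.
d=$(mktemp -d)
printf 'aa' > "$d/aa.txt"    # note: no trailing newline
printf 'bb' > "$d/bb.txt"
# Like getmerge: plain concatenation -> "aabb" on one line
cat "$d/aa.txt" "$d/bb.txt" > "$d/plain.out"
# Like getmerge -nl: a newline is appended after each file -> "aa" and "bb" on separate lines
for f in "$d/aa.txt" "$d/bb.txt"; do cat "$f"; printf '\n'; done > "$d/nl.out"
```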
1. I have 3 files in the sandbox, as follows:
[maria_dev@sandbox ~]$ cat /tmp/aa.txt
aa
[maria_dev@sandbox ~]$ cat /tmp/bb.txt
bb
[maria_dev@sandbox ~]$ cat /tmp/cc.txt
cc
2. I have placed those files into the HDFS "/user/maria_dev/test" directory as follows:
[maria_dev@sandbox ~]$ hdfs dfs -mkdir /user/maria_dev/test
[maria_dev@sandbox ~]$ hdfs dfs -put /tmp/aa.txt /user/maria_dev/test
[maria_dev@sandbox ~]$ hdfs dfs -put /tmp/bb.txt /user/maria_dev/test
[maria_dev@sandbox ~]$ hdfs dfs -put /tmp/cc.txt /user/maria_dev/test
3. The following files are now present on HDFS:
[maria_dev@sandbox ~]$ hdfs dfs -ls /user/maria_dev/test
Found 3 items
-rw-r--r-- 1 maria_dev hadoop 3 2018-01-05 23:39 /user/maria_dev/test/aa.txt
-rw-r--r-- 1 maria_dev hadoop 3 2018-01-05 23:39 /user/maria_dev/test/bb.txt
-rw-r--r-- 1 maria_dev hadoop 3 2018-01-05 23:39 /user/maria_dev/test/cc.txt
4. Now run "getmerge". The following command merges the contents of all files in the HDFS directory "/user/maria_dev/test/" into the local-filesystem file "/tmp/test.txt":
[maria_dev@sandbox ~]$ hdfs dfs -getmerge /user/maria_dev/test/* /tmp/test.txt
[maria_dev@sandbox ~]$ cat /tmp/test.txt
aa
bb
cc
5. Now put the merged file back to HDFS:
[maria_dev@sandbox ~]$ hdfs dfs -put /tmp/test.txt /user/maria_dev/test/
[maria_dev@sandbox ~]$ hdfs dfs -ls /user/maria_dev/test
Found 4 items
-rw-r--r-- 1 maria_dev hadoop 3 2018-01-05 23:39 /user/maria_dev/test/aa.txt
-rw-r--r-- 1 maria_dev hadoop 3 2018-01-05 23:39 /user/maria_dev/test/bb.txt
-rw-r--r-- 1 maria_dev hadoop 3 2018-01-05 23:39 /user/maria_dev/test/cc.txt
-rw-r--r-- 1 maria_dev hadoop 9 2018-01-05 23:55 /user/maria_dev/test/test.txt
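As a side note, the local round trip in step 5 can often be avoided by streaming the merge within HDFS, e.g. hdfs dfs -cat /user/maria_dev/test/* | hdfs dfs -put - /user/maria_dev/test/merged.txt, where "-put -" reads from stdin (verify that your Hadoop version supports stdin input for -put). At the shell level, getmerge is essentially an ordered concatenation; a minimal local simulation of steps 1-4:

```shell
# Local simulation of steps 1-4 above: getmerge concatenates the
# source files (in name order) into a single destination file.
workdir=$(mktemp -d)
printf 'aa\n' > "$workdir/aa.txt"
printf 'bb\n' > "$workdir/bb.txt"
printf 'cc\n' > "$workdir/cc.txt"
# Local stand-in for: hdfs dfs -getmerge <src>/* <localdst>
cat "$workdir"/*.txt > "$workdir/test.merged"
cat "$workdir/test.merged"   # prints aa, bb, cc (one per line)
```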
Created 01-07-2018 12:43 AM
@Jay Kumar SenSharma
thank you. Two questions:
1. Is there a way to merge the files directly from HDFS, or do you need to merge them to local file system and then back to HDFS?
2. I was following your instructions, but on point 4 with getmerge, I used this:
hdfs dfs -getmerge /user/maria_dev/Folder1/* /maria_dev/Folder1/output.csv
I have a folder called Folder1 (it also exists on the local filesystem, under the maria_dev folder, as Folder1), but I get the same error:
getmerge: Mkdirs failed to create file:/maria_dev/Folder1 (exists=false, cwd=file:/home/maria_dev)
Have I missed a step or written this incorrectly?
Thanks
Created 01-07-2018 12:53 AM
@Jay Kumar SenSharma
thank you. Two questions:
1. Is there a way to merge the files directly on HDFS, or do you need to merge to local file system then put back on HDFS?
2. I followed your instructions but on point no. 4 I used:
hdfs dfs -getmerge /user/maria_dev/Folder1/* /Folder1/output.csv
I have a folder called Folder1 on HDFS, and the same folder also exists on the local system, but I got the same error:
getmerge: Mkdirs failed to create file:/Folder1 (exists=false, cwd=file:/home/maria_dev)
Not sure why this occurred. Have I missed a step or typed incorrectly?
Thanks
Created 01-07-2018 01:05 AM
When you run the command as follows:
hdfs dfs -getmerge /user/maria_dev/Folder1/* /Folder1/output.csv
then it expects "/Folder1/", the directory portion of the second argument, to be a valid directory on your local filesystem.
Hence you will need to create the "/Folder1" directory on your local machine first:
# mkdir "/Folder1/"
Then you should be able to run:
# hdfs dfs -getmerge /user/maria_dev/Folder1/* /Folder1/output.csv
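The "Mkdirs failed" message is the client hitting the same wall as any local write into a directory that does not exist yet. A quick local sketch of the same failure and its fix (the paths are throwaway temporaries):

```shell
# Writing into a directory that does not exist fails, which is the
# local-filesystem analogue of the "Mkdirs failed" error above.
base=$(mktemp -d)
if printf 'aa' > "$base/Folder1/output.csv" 2>/dev/null; then
  echo "unexpected: write succeeded"
else
  echo "write failed: $base/Folder1 does not exist"
fi
mkdir -p "$base/Folder1"                    # create the directory first
printf 'aa' > "$base/Folder1/output.csv"    # now the write succeeds
```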
Created 01-07-2018 01:10 AM
Similarly, if you want to run the following command:
# hdfs dfs -getmerge /user/maria_dev/Folder1/* /maria_dev/Folder1/output.csv
then you will need to make sure that the PATH "/maria_dev/Folder1/" exists on your local machine (sandbox):
# mkdir -p /maria_dev/Folder1/
# hdfs dfs -getmerge /user/maria_dev/Folder1/* /maria_dev/Folder1/output.csv
Another Example:
# mkdir -p /tmp/aa/bb/cc/dd/Folder1/
# hdfs dfs -getmerge /user/maria_dev/Folder1/* /tmp/aa/bb/cc/dd/Folder1/output.csv
Created 01-07-2018 01:14 AM
The user running the "hdfs" command must have WRITE permission on the local filesystem in order to create the directories.
Otherwise you will see the following error:
[maria_dev@sandbox ~]$ hdfs dfs -getmerge /user/maria_dev/Folder1/* /maria_dev/Folder1/output.csv
getmerge: Mkdirs failed to create file:/Folder1 (exists=false, cwd=file:/home/maria_dev)
This error indicates that the sandbox user "maria_dev" does not have privileges to create that directory on the local filesystem.
[maria_dev@sandbox ~]$ mkdir /Folder1
mkdir: cannot create directory `/Folder1': Permission denied
So you will need to make sure of two things:
Thumb Rule:
1. The PATH mentioned in the second argument of the "getmerge" command exists.
2. The operating system user (such as "maria_dev") running the "getmerge" command has read and write permission on that PATH.
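The two checks above can be scripted as a pre-flight test before calling getmerge; this is only a sketch, and the destination path is a placeholder:

```shell
# Check that the getmerge destination directory exists and is writable
# for the current user before attempting the merge.
dest=/tmp/Folder1/output.csv        # placeholder destination path
destdir=$(dirname "$dest")
mkdir -p "$destdir"                 # /tmp is used here so no root access is needed
if [ -d "$destdir" ] && [ -w "$destdir" ]; then
  echo "ok: $destdir exists and is writable"
  # hdfs dfs -getmerge /user/maria_dev/Folder1/* "$dest"
else
  echo "error: $destdir is missing or not writable" >&2
fi
```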
Created 01-07-2018 01:26 AM
Example:
[maria_dev@sandbox ~]$ ls -l /Folder1/
ls: cannot access /Folder1/: No such file or directory
As the above directory does not exist, we see the following error:
[maria_dev@sandbox ~]$ hdfs dfs -getmerge /user/maria_dev/test/* /Folder1/merged_files.txt
getmerge: Mkdirs failed to create file:/Folder1 (exists=false, cwd=file:/home/maria_dev)
Since the directory does not exist, we need to create that PATH first:
[maria_dev@sandbox ~]$ mkdir /Folder1
mkdir: cannot create directory `/Folder1': Permission denied
Running the directory creation command as "root":
[maria_dev@sandbox ~]$ exit
[root@sandbox ~]# mkdir -p /Folder1
[root@sandbox ~]# chmod 777 -R /Folder1/
Now that the "maria_dev" user has read-write permission on the PATH "/Folder1", we can run the command:
[root@sandbox ~]# su - maria_dev
[maria_dev@sandbox ~]$ hdfs dfs -getmerge /user/maria_dev/test/* /Folder1/merged_files.txt
Created 01-07-2018 01:25 AM
@Jay Kumar SenSharma
thank you. The Folder1 folder does indeed exist. This worked for me:
hadoop fs -getmerge /user/maria_dev/Folder1/* output.csv
I cannot seem to use any of the "hdfs dfs" commands above?
The above gave me an output file, but it contained only the first file, i.e. it did not join the second file to it?
Created 01-07-2018 01:28 AM