Created 02-11-2017 08:10 PM
I am practicing the steps from the tutorial "A Lap Around Apache Spark". One of the steps is "Perform WordCount with Spark": copy an input file for the Spark WordCount example by uploading it to HDFS. Any text file can be used as input; the tutorial uses log4j.properties as an example. As user spark:
hadoop fs -copyFromLocal /etc/hadoop/conf/log4j.properties /tmp/data
After this step I am not able to see the log4j.properties file under the /tmp/data path, but if I run the copyFromLocal command above again, it says the file exists.
I am using the command below to view the file:
hdfs dfs -ls /tmp/data
I also tried:
hdfs dfs -cat /tmp/data/log4j.properties
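Is there a way to confirm whether /tmp/data is a file or a directory? If I read the FsShell docs correctly, the -d flag of -ls should list the path itself rather than its contents:
hdfs dfs -ls -d /tmp/data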
Created 02-11-2017 08:35 PM
@Ram Ghase what if you run the command like this?
sudo -u spark hdfs dfs -ls /tmp/data/
Also make sure /tmp/data exists:
hdfs dfs -ls /tmp/
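If it does exist, it's also worth checking that it is really a directory; if I recall correctly, -test -d succeeds only for directories:
hdfs dfs -test -d /tmp/data && echo "is a directory" || echo "not a directory"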
Created 02-11-2017 11:05 PM
Please see the set of commands I am trying to execute:
[root@sandbox tmp]# pwd
/tmp
[root@sandbox tmp]# ls -ltr | grep word
-rw-r--r-- 1 root root 2 Oct 25 08:13 words.txt
-rw-r--r-- 1 root root 128 Feb 11 19:39 wordFile.txt
[root@sandbox tmp]# hdfs dfs -ls /tmp/data/
-rwxrwxrwx 1 root hdfs 10411 2017-02-10 03:55 /tmp/data
[root@sandbox tmp]# su spark
[spark@sandbox tmp]$ hdfs dfs -put wordFile.txt /tmp/data/
put: `/tmp/data': File exists
[spark@sandbox tmp]$ hdfs dfs -copyFromLocal wordFile.txt /tmp/data/
copyFromLocal: `/tmp/data': File exists
[spark@sandbox tmp]$ sudo -u spark hdfs dfs -ls /tmp/data/
spark is not in the sudoers file. This incident will be reported.
[spark@sandbox tmp]$ exit
exit
[root@sandbox tmp]# sudo -u spark hdfs dfs -ls /tmp/data/
-rwxrwxrwx 1 root hdfs 10411 2017-02-10 03:55 /tmp/data
[root@sandbox tmp]# su -u spark hdfs dfs -ls /tmp/data/
su: invalid option -- 'u'
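(Looking at the last error, I guess su has no -u option; if I'm not mistaken, running a single command as spark would instead be:
su - spark -c "hdfs dfs -ls /tmp/data"
though sudo -u spark from root works as well.)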
Created 02-11-2017 11:16 PM
@Artem Ervits Thanks for the response. I tried the same but no luck.
Created 02-11-2017 11:25 PM
Your /tmp/data on HDFS is a file, not a directory. So, when you did the copy for the first time
tmp]$ hdfs dfs -copyFromLocal wordFile.txt /tmp/data/
wordFile.txt was copied to /tmp and renamed to "data". That's why the second time the command complains that the file exists: by default "-put" and "-copyFromLocal" don't overwrite target files. You can force overwrite by adding "-f":
tmp]$ hdfs dfs -copyFromLocal -f wordFile.txt /tmp/data/
If you copy to a directory, then the original file name will be preserved:
tmp]$ hdfs dfs -copyFromLocal wordFile.txt /tmp
will create /tmp/wordFile.txt on HDFS.
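If you want /tmp/data to be a directory, as the tutorial assumes, one way to fix it (note: the -rm below deletes the current /tmp/data file, so make sure you don't need it) would be:
tmp]$ hdfs dfs -rm /tmp/data
tmp]$ hdfs dfs -mkdir /tmp/data
tmp]$ hdfs dfs -copyFromLocal wordFile.txt /tmp/data/
tmp]$ hdfs dfs -ls /tmp/data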
Created 02-12-2017 12:25 AM
@Predrag Minovic Thank you sir. It worked.