
Unable to see the file

Contributor

I am practicing the steps from the tutorial "A Lap Around Apache Spark". One of the steps is:

"Perform WordCount with Spark. Copy the input file for the Spark WordCount example: upload the input file you want to use in WordCount to HDFS. You can use any text file as input; in the following example, log4j.properties is used. As user spark:

hadoop fs -copyFromLocal /etc/hadoop/conf/log4j.properties /tmp/data"

After this step I am not able to see the log4j.properties file in the /tmp/data path, but if I try to run the copyFromLocal command above again, it says the file exists.

I am using the command below to view the file:

hdfs dfs -ls /tmp/data

I also tried:

hdfs dfs -cat /tmp/data/log4j.properties

1 ACCEPTED SOLUTION

Master Guru

Your /tmp/data on HDFS is a file, not a directory. So when you did the copy for the first time

tmp]$ hdfs dfs -copyFromLocal wordFile.txt /tmp/data/

wordFile.txt was copied to /tmp and renamed to "data". That's why the command complains the second time that the file exists: by default, "-put" and "-copyFromLocal" don't overwrite target files. You can force an overwrite by adding "-f":

tmp]$ hdfs dfs -copyFromLocal -f wordFile.txt /tmp/data/

If you copy to a directory, then the original file name will be preserved:

tmp]$ hdfs dfs -copyFromLocal wordFile.txt /tmp

will create /tmp/wordFile.txt on HDFS.
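
For completeness, a minimal sketch of how to get back to the layout the tutorial expects, assuming the stray /tmp/data file can simply be discarded (wordFile.txt is just the example file from this thread):

hdfs dfs -rm /tmp/data             # remove the file occupying the /tmp/data path
hdfs dfs -mkdir /tmp/data          # recreate /tmp/data as a directory
hdfs dfs -copyFromLocal wordFile.txt /tmp/data/
hdfs dfs -ls /tmp/data             # should now list /tmp/data/wordFile.txt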


5 REPLIES

Master Mentor

@Ram Ghase What if you run the command like this?

sudo -u spark hdfs dfs -ls /tmp/data/

Also make sure /tmp/data exists:

hdfs dfs -ls /tmp/
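
It is also worth checking whether /tmp/data is a file or a directory; a quick way to do that (a sketch, assuming a reasonably recent Hadoop CLI) is:

hdfs dfs -ls -d /tmp/data
# a leading 'd' in the permissions column means a directory, a leading '-' a plain file
hdfs dfs -test -d /tmp/data && echo directory || echo "not a directory"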

Contributor
Please see the set of commands I am trying to execute.

[root@sandbox tmp]# pwd
/tmp
[root@sandbox tmp]# ls -ltr | grep word
-rw-r--r--  1 root       root           2 Oct 25 08:13 words.txt
-rw-r--r--  1 root       root         128 Feb 11 19:39 wordFile.txt
[root@sandbox tmp]# hdfs dfs -ls /tmp/data/
-rwxrwxrwx   1 root hdfs      10411 2017-02-10 03:55 /tmp/data
[root@sandbox tmp]# su spark
[spark@sandbox tmp]$ hdfs dfs -put wordFile.txt /tmp/data/
put: `/tmp/data': File exists
[spark@sandbox tmp]$ hdfs dfs -copyFromLocal wordFile.txt /tmp/data/
copyFromLocal: `/tmp/data': File exists
[spark@sandbox tmp]$ sudo -u spark hdfs dfs -ls /tmp/data/
spark is not in the sudoers file.  This incident will be reported.
[spark@sandbox tmp]$ exit
exit
[root@sandbox tmp]# sudo -u spark hdfs dfs -ls /tmp/data/
-rwxrwxrwx   1 root hdfs      10411 2017-02-10 03:55 /tmp/data
[root@sandbox tmp]# su -u spark hdfs dfs -ls /tmp/data/
su: invalid option -- 'u'
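
As an aside, su does not take a -u option; a working su equivalent of the sudo form above would be something like (a sketch, not from the original session):

su - spark -c 'hdfs dfs -ls /tmp/data/'

Here the target user is a positional argument, and -c passes the command to run as that user.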

Contributor

@Artem Ervits Thanks for the response. I tried the same but no luck.


Contributor

@Predrag Minovic Thank you sir. It worked.