04-01-2018 05:15 PM
@Aishwarya Sudhakar Could you clarify which username you are running the Spark job under? Because HDFS is distributed, dataset.csv must be copied to an HDFS directory that is accessible to the user running the Spark job. According to your output above, the file is in the HDFS directory /demo/demo/dataset.csv, so your load should look like this:

load "hdfs:///demo/demo/dataset.csv"

You said: "The demo is the directory that is inside hadoop. And dataset.csv is the file that contains data." Did you mean in HDFS? Does this command print anything?

$ hdfs dfs -cat /demo/demo/dataset.csv

Please revert!
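For reference, here is a quick way to confirm where the file actually lives before pointing Spark at it. This is a minimal sketch: the /demo/demo path is taken from this thread, so adjust it to your own layout.

$ hdfs dfs -ls /demo/demo
$ hdfs dfs -cat /demo/demo/dataset.csv | head -n 5

// Then in spark-shell (Scala), once the path is confirmed:
val data = sc.textFile("hdfs:///demo/demo/dataset.csv")
data.take(5).foreach(println)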
04-01-2018 04:05 PM
@Aishwarya Sudhakar You need to understand the HDFS directory structure; that is what is causing your issue. Here is some explanation. Let's say the username for these example commands is ash.

When ash creates a directory in HDFS with the following command:

hadoop fs -mkdir demo
// This creates a directory inside the user's HDFS home directory
// The complete path is /user/ash/demo

that is different from the command below:

hadoop fs -mkdir /demo
// This creates a directory under the root directory
// The complete path is /demo

So a suggestion here: whenever you access directories, use absolute paths to avoid confusion. In this case, when you create a directory using

hadoop fs -mkdir demo

and load the file into HDFS using

hadoop fs -copyFromLocal dataset.csv demo

your file exists at /user/ash/demo/dataset.csv, not at /demo. So the reference to this file in your Spark code should be:

sc.textFile("hdfs:///user/ash/demo/dataset.csv")

Hope this helps!
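To make the distinction concrete, here is the full sequence as a sketch, assuming a default HDFS configuration where relative paths resolve to /user/<username> (the username ash is just the example name from this post):

$ hadoop fs -mkdir -p demo                      # creates /user/ash/demo (relative path)
$ hadoop fs -copyFromLocal dataset.csv demo     # file lands at /user/ash/demo/dataset.csv
$ hadoop fs -ls /user/ash/demo                  # verify using the absolute path

// Then in spark-shell (Scala), load it with the absolute HDFS URI:
val data = sc.textFile("hdfs:///user/ash/demo/dataset.csv")
println(data.count())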
03-30-2018 01:04 PM
Is this the correct method to specify the path of an HDFS file? Is it the same for everyone?