I am writing some Scala code in which I want to load a .csv file that is stored in HDFS. I put the file into HDFS using the copyFromLocal command, and I can see that it was uploaded by running hadoop fs -ls demo/dataset.csv. Here demo is a directory I created in HDFS and dataset.csv is the file that contains the actual data. Now I want to load this file in my Scala program. My problem is that when I specify the path as demo/dataset.csv and run the program, I get a "file not found" error. I don't know how to specify the path of the file. Can anyone tell me how I should give the path so that it accesses the file inside HDFS?
Use the whole absolute path. You can find the HDFS URI in core-site.xml, under the fs.defaultFS property.
Also make sure your file is in the root path, because you mentioned "demo/dataset.csv" and not "/demo/dataset.csv". If it is not, it will be in your user home directory, i.e. "/user/yourusername/demo/dataset.csv".
nn is the namenode; you will find it in the core-site.xml properties under fs.defaultFS. But as I mentioned earlier, I think your issue is that you saved your file without a '/' at the beginning of the demo directory, so it got saved into your user home. Look at the output of "hdfs dfs -ls demo/dataset.csv" and it will display the user home the file is in. Use either that path, or mv the file to the root like this:
hdfs dfs -mv demo/dataset.csv /demo/dataset.csv
Make sure the /demo directory exists first; if it does not, create it:
hdfs dfs -mkdir /demo
Hope this helps.
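To make the advice above concrete, here is a minimal sketch of how the fully qualified path is assembled. The fs.defaultFS value hdfs://namenode:8020 is a placeholder; substitute whatever your core-site.xml actually contains.

```scala
// fs.defaultFS from core-site.xml -- placeholder value, yours will differ
val defaultFS = "hdfs://namenode:8020"

// Absolute path of the file inside HDFS (after the mv above)
val filePath = "/demo/dataset.csv"

// The fully qualified URI you can pass to Spark's .load(...)
val fullPath = defaultFS + filePath

println(fullPath) // hdfs://namenode:8020/demo/dataset.csv
```

Either the fully qualified URI or the bare absolute path "/demo/dataset.csv" works in .load(...); the latter is resolved against fs.defaultFS for you.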
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("file path?????")
You need to understand the HDFS directory structure; that is what is causing your issues. Some explanation follows.
Let's say the username for these example commands is ash.
So when ash tries to create a directory in HDFS with the following command
hadoop fs -mkdir demo //This creates a directory inside the user's HDFS home directory //The complete directory path shall be /user/ash/demo
it is different than the command given below
hadoop fs -mkdir /demo //This creates a directory in the root directory. //The complete directory path shall be /demo
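The resolution rule behind these two commands can be sketched in plain Scala (resolve is a hypothetical helper for illustration, not an actual HDFS API):

```scala
// Mimics how HDFS resolves a path: a path without a leading '/'
// is taken relative to the user's home directory, /user/<username>.
def resolve(path: String, user: String): String =
  if (path.startsWith("/")) path
  else s"/user/$user/$path"

val relative = resolve("demo", "ash")  // what "hadoop fs -mkdir demo" creates
val absolute = resolve("/demo", "ash") // what "hadoop fs -mkdir /demo" creates

println(relative) // /user/ash/demo
println(absolute) // /demo
```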
So a suggestion here is: whenever you access directories, use absolute paths to avoid confusion. In this case, when you create a directory using
hadoop fs -mkdir demo
and load the file into HDFS using
hadoop fs -copyFromLocal dataset.csv demo
your file exists at
/user/ash/demo/dataset.csv //Not at /demo
So the reference to this file in your Spark code should be
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/user/ash/demo/dataset.csv")
Hope this helps!