
How do I specify the path to access my hadoop file in Hortonworks?

Explorer

I am writing Scala code in which I want to load a .csv file that is stored in HDFS. I put the file into HDFS using the copyFromLocal command, and I can see that it was uploaded by running hadoop fs -ls demo/dataset.csv. Here demo is a directory I created in HDFS and dataset.csv is the file that contains the actual data. Now I want to load this file in my Scala program. My problem is that when I specify the path as demo/dataset.csv and run the code, I get a "file not found" error. I don't know how to specify the path of the file. Can anyone tell me how I should give the path so that it will access the file that is inside HDFS? Please.

9 REPLIES

Expert Contributor

@Aishwarya Sudhakar Can you please share the command with us?

Explorer

Yes, I have posted it. @Sandeep Kumar

Explorer

@Sandeep Kumar

Expert Contributor

@Aishwarya Sudhakar

Use the full absolute path and try:

sc.textFile("hdfs://nn:8020/demo/dataset.csv")

You can find the namenode address in core-site.xml; look for the fs.defaultFS property.

Also, check whether your file is really at the root path: you mentioned "demo/dataset.csv" and not "/demo/dataset.csv", so if it is not at the root, it should be in the user home directory, like "/user/yourusername/demo/dataset.csv".
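For example, here is a minimal sketch of the different ways of referring to the file from spark-shell (assuming the namenode is nn:8020 and yourusername is your HDFS user; substitute your own values):

// Fully qualified URI: host and port come from fs.defaultFS in core-site.xml
val fromRoot = sc.textFile("hdfs://nn:8020/demo/dataset.csv")

// Absolute path on the default filesystem: a leading "/" starts at the HDFS root
val fromHome = sc.textFile("hdfs:///user/yourusername/demo/dataset.csv")

// A path without a leading "/" is resolved relative to /user/<yourusername>
val relative = sc.textFile("demo/dataset.csv")

relative.take(5).foreach(println) // print a few lines to verify the load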

Explorer

What is the meaning of nn?

Expert Contributor

@Aishwarya Sudhakar

nn is the namenode; you will find it in core-site.xml under the fs.defaultFS property. But I think your issue, as I mentioned earlier, is that you saved your file without a '/' at the beginning of the "demo" directory, so it got saved into your user home. Look at the output of "hdfs dfs -ls demo/dataset.csv" and it will display the user home it is in. Use either that path, or mv the file to the root like this:

hdfs dfs -mv demo/dataset.csv /demo/dataset.csv

Make sure the /demo directory exists; if not, create it:

hdfs dfs -mkdir /demo

Hope this helps.
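If you want to check where the file ended up from Scala instead of the shell, here is a small sketch using the Hadoop FileSystem API (assuming you run it in spark-shell, where sc already carries the Hadoop configuration):

import org.apache.hadoop.fs.{FileSystem, Path}

// Reuse the Hadoop configuration Spark loaded from core-site.xml
val fs = FileSystem.get(sc.hadoopConfiguration)

// Relative paths resolve under /user/<yourusername>; absolute ones start at the root
println(fs.exists(new Path("demo/dataset.csv")))  // copy in the user home?
println(fs.exists(new Path("/demo/dataset.csv"))) // copy at the root?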

Explorer

@Sandeep Kumar

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("file path?????")

@Sandeep Kumar

Explorer

@Sandeep Kumar

@Aishwarya Sudhakar

You need to understand the HDFS directory structure; that is what is causing you issues. Here is some explanation.

Let's say the username for these example commands is ash.

So when ash tries to create a directory in HDFS with the following command

hadoop fs -mkdir demo

//This creates a directory inside the user's HDFS home directory.
//The complete directory path shall be /user/ash/demo

It is different from the command given below:

hadoop fs -mkdir /demo

//This creates a directory in the root directory.
//The complete directory path shall be /demo

So a suggestion here: whenever you access directories, use absolute paths to avoid confusion. In this case, when you create a directory using

hadoop fs -mkdir demo

and load the file into HDFS using

hadoop fs -copyFromLocal dataset.csv demo

your file exists at

/user/ash/demo/dataset.csv

//Not at /demo

So the reference to this file in your Spark code should be

sc.textFile("hdfs:///user/ash/demo/dataset.csv")

Hope this helps!
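To tie this back to the earlier DataFrame question, here is a minimal sketch of the spark-csv load with the absolute path (assuming the username ash from the example above; replace it with yours):

// spark-csv (Spark 1.x): load the CSV from the user's HDFS home directory
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("hdfs:///user/ash/demo/dataset.csv")

df.printSchema() // verify that header and inferSchema produced the expected columns
df.show(5)       // peek at the first rows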