How do I specify the path to access my Hadoop file in Hortonworks?
Created on 03-30-2018 01:01 PM - edited 09-16-2022 06:02 AM
I am writing Scala code in which I want to load a .csv file that is stored in HDFS. I put the file into HDFS using the copyFromLocal command, and I can see that it was uploaded by running hadoop fs -ls demo/dataset.csv. Here demo is a directory I created in Hadoop and dataset.csv is the file that contains the actual data. Now I want to load this file in my Scala program. My problem is that when I specify the path as demo/dataset.csv and run the code, I get a file-not-found error. I don't know how to specify the path of the file. Can anyone tell me how I should give the path so that it accesses the file inside Hadoop? Please help.
Created 03-30-2018 02:55 PM
@Aishwarya Sudhakar Can you please share the command with us?
Created 03-30-2018 03:42 PM
Yes, I have posted it, @Sandeep Kumar.
Created 03-30-2018 04:30 PM
@Sandeep Kumar
Created 03-30-2018 03:11 PM
Use the whole absolute path and try:
sc.textFile("hdfs://nn:8020/demo/dataset.csv")
You can find the absolute path in core-site.xml; look for fs.defaultFS.
Also make sure your file is in the root path, because you mentioned "demo/dataset.csv" and not "/demo/dataset.csv". If it is not, it should be in the user home directory, like "/user/yourusername/demo/dataset.csv".
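To check where a relative path actually resolves, you can ask HDFS directly. A minimal sketch, assuming a SparkContext named sc is already available (as in spark-shell):

import org.apache.hadoop.fs.{FileSystem, Path}
val fs = FileSystem.get(sc.hadoopConfiguration)
val p = new Path("demo/dataset.csv")
println(fs.makeQualified(p)) // expands the relative path against your HDFS home, e.g. hdfs://nn:8020/user/<you>/demo/dataset.csv
println(fs.exists(p)) // true if the file really is there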
Created 03-30-2018 03:43 PM
What is the meaning of nn?
Created 04-01-2018 02:26 PM
nn is the namenode; you will find it in the core-site.xml properties under fs.defaultFS. But I think your issue, as I mentioned earlier, is that you saved your file without a '/' at the beginning of the "demo" directory, so it got saved into your user home. Look at the output of
hdfs dfs -ls demo/dataset.csv
and it will display the user home directory it is in. Use either that path, or mv the file to the root like this:
hdfs dfs -mv demo/dataset.csv /demo/dataset.csv
Make sure your /demo directory exists; if not, create it:
hdfs dfs -mkdir /demo
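If you don't want to dig through core-site.xml by hand, here is a quick sketch that prints the value from the running configuration (again assuming a SparkContext named sc):

val defaultFs = sc.hadoopConfiguration.get("fs.defaultFS")
println(defaultFs) // e.g. hdfs://nn:8020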
Hope this helps.
Created 03-30-2018 03:29 PM
@Sandeep Kumar
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("file path?????")
@Sandeep Kumar
Created 03-31-2018 09:29 AM
@Sandeep Kumar
Created 04-01-2018 04:05 PM
You need to understand the HDFS directory structure; that is what is causing your issue. Here is some explanation.
Let's say the username for these example commands is ash.
So when ash tries to create a directory in HDFS with the following command
hadoop fs -mkdir demo    # creates a directory inside ash's HDFS home; the complete path is /user/ash/demo
it is different from the command given below
hadoop fs -mkdir /demo   # creates a directory under the HDFS root; the complete path is /demo
So a suggestion here: whenever you access directories, use absolute paths to avoid confusion. In this case, when you create a directory using
hadoop fs -mkdir demo
and load the file into HDFS using
hadoop fs -copyFromLocal dataset.csv demo
your file exists at
/user/ash/demo/dataset.csv   # not at /demo/dataset.csv
So the reference to this file in your Spark code should be
sc.textFile("hdfs:///user/ash/demo/dataset.csv")
(Note the triple slash: with "hdfs://user/ash/...", "user" would be parsed as the namenode host. "hdfs:///" takes the host and port from fs.defaultFS.)
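Putting it together, a minimal sketch that combines the spark-csv reader from the earlier post with the resolved absolute path (this assumes the file really is at /user/ash/demo/dataset.csv and that sqlContext exists, as it does in spark-shell):

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("hdfs:///user/ash/demo/dataset.csv") // host and port come from fs.defaultFS
df.show(5) // quick check that the CSV parsed with a header and inferred schema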
Hope this helps!
