HDFS path does not exist with SparkSession object when spark master is set as LOCAL

Contributor

I am trying to load a dataset into a Hive table using Spark.

But when I try to read the file from HDFS into Spark, I get the following exception:

 

org.apache.spark.sql.AnalysisException: Path does not exist: file:/home/cloudera/partfile;

These are the steps I perform before loading the file:

 

val wareHouseLocation = "file:${system:user.dir}/spark-warehouse"
val sparkSession = SparkSession.builder.master("spark://localhost:7077")
    .appName("SparkHive")
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
    .config("spark.sql.warehouse.dir", wareHouseLocation)
    .getOrCreate()
import sparkSession.implicits._
val partf = sparkSession.read.textFile("partfile")

The statement that throws the exception:

val partf = sparkSession.read.textFile("partfile")

org.apache.spark.sql.AnalysisException: Path does not exist: file:/home/cloudera/partfile;

But the file is present in my HDFS home directory:

hadoop fs -ls
Found 1 items
-rw-r--r--   1 cloudera cloudera         58 2017-06-30 02:23 partfile

My Spark version is 2.0.2.

Could anyone tell me how to fix it?
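For context on the `file:/` prefix in the message: Spark qualifies a relative path like `"partfile"` against the default file system, and if the driver does not pick up the cluster's `fs.defaultFS` (e.g. the Hadoop configuration is not on its environment), that default falls back to the local file system and the working directory. The sketch below is plain Scala with `java.net.URI` (no Spark needed) and only illustrates the resolution; the NameNode host/port and the `/user/cloudera` home directory are assumptions for a QuickStart-style VM:

```scala
import java.net.URI

// Sketch: how a relative file name is qualified against a default file
// system plus a working directory. This mimics, in plain Scala, why
// "partfile" can end up as file:/home/cloudera/partfile.
def qualify(defaultFs: String, workingDir: String, path: String): String =
  new URI(defaultFs).resolve(workingDir + "/").resolve(path).toString

// Local fallback: the relative name lands under the Linux home directory.
val local = qualify("file:///", "/home/cloudera", "partfile")

// With HDFS as the default FS (host/port assumed), the same relative name
// would land in the HDFS home directory instead.
val onHdfs = qualify("hdfs://quickstart.cloudera:8020", "/user/cloudera", "partfile")
```

If that is the cause here, either pointing the driver at the Hadoop configuration or passing a fully qualified `hdfs://` URI avoids the local fallback.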


Re: HDFS path does not exist with SparkSession object when spark master is set as LOCAL

Champion

Which user are you running Spark as?

Is the path you are referring to in HDFS or on the local file system?

/home/cloudera/partfile

Run this and let me know whether the file gets listed:

hadoop fs -ls /home/cloudera/
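If that listing comes back empty, the file most likely sits in the HDFS home directory (typically `/user/<username>`, so `/user/cloudera` here) rather than under `/home/cloudera`, which is a local Linux path. A hedged sketch of one possible fix: pass a fully qualified URI so the file system scheme is unambiguous. The NameNode host and port below are assumptions; check `fs.defaultFS` in `core-site.xml` for the real value.

```scala
// Assumed NameNode address for illustration; verify against fs.defaultFS.
val nameNode = "hdfs://quickstart.cloudera:8020"

// Fully qualified HDFS path: scheme + authority + HDFS home dir + file name.
val partfilePath = s"$nameNode/user/cloudera/partfile"

// With the session from the question, the read would then be:
//   val partf = sparkSession.read.textFile(partfilePath)
```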