
Please suggest the HDFS path to load a file into Spark


I am unable to load the file, even when I try an HDFS path. I am using Cloudera 5.13.0. Session transcript below.
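For context, `/home/cloudera/Documents/ard.txt` is a path on the local Linux filesystem, not in HDFS, so Spark cannot find it under any `hdfs://` URL until the file has been copied into HDFS. A minimal sketch of that copy step, assuming the QuickStart VM defaults (user `cloudera`, whose HDFS home directory is `/user/cloudera`; the `documents` directory name is just an example):

```shell
# Create a target directory in HDFS. Note that the HDFS home for the
# 'cloudera' user is /user/cloudera, not /home/cloudera.
hdfs dfs -mkdir -p /user/cloudera/documents

# Copy the local file into HDFS.
hdfs dfs -put /home/cloudera/Documents/ard.txt /user/cloudera/documents/

# Verify the file now exists at the HDFS path Spark will read.
hdfs dfs -ls /user/cloudera/documents/ard.txt
```

These commands must be run on a host with the Hadoop client configured; the paths above are illustrative, not taken from the transcript.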

 

18/06/03 13:19:45 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as sc (master = local[*], app id = local-1528057185702).
18/06/03 13:19:52 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
SQL context available as sqlContext.

scala> val textFile = sc.textFile("hdfs:/home/cloudera/Documents/ard.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs:/home/cloudera/Documents/ard.txt MapPartitionsRDD[1] at textFile at <console>:27

scala> count.reduceByKey(_+_).saveAsTextFile("hdfs:/home/cloudera/Documents/1t.txt");
<console>:26: error: ambiguous reference to overloaded definition,
both method count in object functions of type (columnName: String)org.apache.spark.sql.TypedColumn[Any,Long]
and  method count in object functions of type (e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
match expected type ?
              count.reduceByKey(_+_).saveAsTextFile("hdfs:/home/cloudera/Documents/1t.txt");
              ^

scala> val textFile = sc.textFile("hdfs:/home/cloudera/Documents/ard.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs:/home/cloudera/Documents/ard.txt MapPartitionsRDD[3] at textFile at <console>:27

scala> val textFile = sc.textFile("hdfs://quickstart.cloudera:8020/home/cloudera/Documents/ard.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://quickstart.cloudera:8020/home/cloudera/Documents/ard.txt MapPartitionsRDD[5] at textFile at <console>:27

scala> val count=textFile.flatMap(line=>line.split(" ").map(word=>(word,1)))
count: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[6] at flatMap at <console>:29

scala> count.reduceByKey(_+_).saveAsTextFile("hdfs://quickstart.cloudera.8020/home/cloudera/Documents/2");
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Documents/ard.txt
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
	

