
Please suggest the HDFS path to use to load a file into Spark


I am unable to load the file. I tried to use an HDFS path. I am using Cloudera 5.13.0.

 

18/06/03 13:19:45 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as sc (master = local[*], app id = local-1528057185702).
18/06/03 13:19:52 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
SQL context available as sqlContext.

scala> val textFile = sc.textFile("hdfs:/home/cloudera/Documents/ard.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs:/home/cloudera/Documents/ard.txt MapPartitionsRDD[1] at textFile at <console>:27

scala> count.reduceByKey(_+_).saveAsTextFile("hdfs:/home/cloudera/Documents/1t.txt");
<console>:26: error: ambiguous reference to overloaded definition,
both method count in object functions of type (columnName: String)org.apache.spark.sql.TypedColumn[Any,Long]
and  method count in object functions of type (e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
match expected type ?
              count.reduceByKey(_+_).saveAsTextFile("hdfs:/home/cloudera/Documents/1t.txt");
              ^

scala> val textFile = sc.textFile("hdfs:/home/cloudera/Documents/ard.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs:/home/cloudera/Documents/ard.txt MapPartitionsRDD[3] at textFile at <console>:27

scala> val textFile = sc.textFile("hdfs://quickstart.cloudera:8020/home/cloudera/Documents/ard.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://quickstart.cloudera:8020/home/cloudera/Documents/ard.txt MapPartitionsRDD[5] at textFile at <console>:27

scala> val count=textFile.flatMap(line=>line.split(" ").map(word=>(word,1)))
count: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[6] at flatMap at <console>:29

scala> count.reduceByKey(_+_).saveAsTextFile("hdfs://quickstart.cloudera.8020/home/cloudera/Documents/2");
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Documents/ard.txt
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
	
