Created 03-23-2017 10:01 AM
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/
Spark context is available as sc, SQL context is available as sqlContext

> Sys.setenv(SPARK_HOME="/usr/hdp/2.3.4.0.-3485/spark/bin/sparkR/")
> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"),"R","Lib"),.libPaths()))
> Sys.setenv(JAVA_HOME="/usr/lib/jvm/java-8-oracle/")
> library(SparkR)
> lines<-SparkR:::textFile(sc,"hdfs: /user/midhun/f.txt")
17/03/23 14:37:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 306.2 KB, free 306.2 KB)
17/03/23 14:37:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.1 KB, free 332.3 KB)
17/03/23 14:37:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:39935 (size: 26.1 KB, free: 511.1 MB)
17/03/23 14:37:58 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
> words<-SparkR:::flatMap(lines,function(line){strsplit(line," ")[[1]]})
> wordcount<-SparkR:::lapply(words,function(word){list(word,1)})
> counts<-SparkR:::reduceByKey(wordcount,"+",numPartition=2)
> output<-collect(counts)
17/03/23 14:40:03 INFO SparkContext: Starting job: collect at NativeMethodAccessorImpl.java:-2
17/03/23 14:40:03 WARN DAGScheduler: Creating new stage failed due to exception - job: 0
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs:%20/user/midhun/f.txt
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:244)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.api.r.BaseRRDD.getPartitions(RRDD.scala:47)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.api.r.BaseRRDD.getPartitions(RRDD.scala:47)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
    at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
    at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:226)
    at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:224)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.dependencies(RDD.scala:224)
    at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:386)
    at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:398)
    at org.apache.spark.scheduler.DAGScheduler.getParentStagesAndId(DAGScheduler.scala:299)
    at org.apache.spark.scheduler.DAGScheduler.newResultStage(DAGScheduler.scala:334)
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:837)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1607)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs:%20/user/midhun/f.txt
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:202)
    ... 44 more
17/03/23 14:40:03 INFO DAGScheduler: Job 0 failed: collect at NativeMethodAccessorImpl.java:-2, took 0.011744 s
17/03/23 14:40:03 ERROR RBackendHandler: collect on 17 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
  java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs:%20/user/midhun/f.txt
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:244)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(R
Created 03-23-2017 04:09 PM
You should have posted the code you are trying to run and explained how you are running it (how you submit the job to Spark).
Without that, it is harder to give a precise answer.
From the error I can see you are trying to run wordcount on this file: hdfs:%20/user/midhun/f.txt. The %20 is a URL-encoded space, which means the path was passed with a stray space after "hdfs:" instead of a valid hdfs:// URI.
Have you tried something like this?
hdfs dfs -put /user/midhun/f.txt

spark-submit --class com.cloudera.sparkwordcount.SparkWordCount \
  --master local --deploy-mode client --executor-memory 1g \
  --name wordcount --conf "spark.app.id=wordcount" \
  sparkwordcount-1.0-SNAPSHOT-jar-with-dependencies.jar \
  hdfs://namenode_host:8020/user/midhun/f.txt 2
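Alternatively, since you are already working in the sparkR shell, the fix may simply be to remove the stray space and pass a proper hdfs:// URI to textFile. Here is a minimal sketch of the same word count with only the path corrected (namenode_host and port 8020 are placeholders for your cluster; the "+" is written as an explicit function, and this still uses the private SparkR::: RDD API from your session):

# Same pipeline as in the question; only the input path changes:
# "hdfs: /user/midhun/f.txt" -> "hdfs://namenode_host:8020/user/midhun/f.txt"
lines <- SparkR:::textFile(sc, "hdfs://namenode_host:8020/user/midhun/f.txt")
words <- SparkR:::flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
wordcount <- SparkR:::lapply(words, function(word) { list(word, 1) })
counts <- SparkR:::reduceByKey(wordcount, function(x, y) { x + y }, 2L)
output <- collect(counts)

If the file lives on the cluster's default filesystem, passing just "/user/midhun/f.txt" without any scheme should also work.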
Created 03-24-2017 04:22 AM
Yeah, I got it. Thank you for the response.