Member since: 03-22-2017
Posts: 14
Kudos Received: 1
Solutions: 0
04-01-2017
04:35 AM
1 Kudo
We have an HDP 2.4 cluster and the Spark version is 1.6.0. I have to convert a large CSV file (about 1 GB) into a DataFrame. I was not able to do it when the master is set to local. Could you tell me how to launch SparkR with the master set to yarn-client, and also explain how to convert a large CSV file into a DataFrame in SparkR?
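For concreteness, here is a rough sketch of the kind of invocation being asked about, assuming the com.databricks spark-csv package; the launcher path, package version, and HDFS path below are placeholders, not a confirmed setup:

/usr/hdp/current/spark-client/bin/sparkR --master yarn-client --packages com.databricks:spark-csv_2.10:1.5.0

# inside the SparkR shell: read the CSV as a distributed DataFrame instead of pulling 1 GB into local R memory
df <- read.df(sqlContext, "hdfs:///path/to/large.csv",
              source = "com.databricks.spark.csv",
              header = "true", inferSchema = "true")
head(df)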
Labels:
- Apache Spark
03-24-2017
04:33 AM
data <- fread("/usr/bin/hadoop fs -text /path/to/the/file.csv", fill=TRUE)
I used this command and it works perfectly. What is the difference between the two approaches?
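Roughly, and not authoritatively: fread() pipes the file through the hadoop CLI into a single in-memory data.table on the local machine, while read.df() builds a distributed Spark DataFrame whose partitions are read by the executors. A sketch of the two, assuming the data.table and spark-csv packages:

local_dt <- data.table::fread("/usr/bin/hadoop fs -text /path/to/the/file.csv", fill = TRUE)   # one local table, limited by this machine's RAM
spark_df <- read.df(sqlContext, "/path/to/the/file.csv", source = "com.databricks.spark.csv")  # distributed DataFrame, processed by executors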
03-24-2017
04:22 AM
Yeah, I got it. Thank you for the response.
03-24-2017
03:49 AM
source = "com.databricks.spark.csv" is not available.
(Spark version is 1.6, HDP 2.4.0.0-169, jvm/java-8-oracle.)
Is it possible to read the CSV without the Databricks package?
How do I add this package to the cluster?
Where will I get the JAR files, or do they have to be included separately?
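For reference, a sketch of how the package is usually supplied at launch time; the launcher path and version number are illustrative and assume the node can reach a Maven repository:

/usr/hdp/current/spark-client/bin/sparkR --master yarn-client --packages com.databricks:spark-csv_2.10:1.5.0
# or, with no repository access, download the spark-csv jar (and its dependency jars) and pass them explicitly:
/usr/hdp/current/spark-client/bin/sparkR --master yarn-client --jars /path/to/spark-csv_2.10-1.5.0.jar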
03-23-2017
10:01 AM
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Spark context is available as sc, SQL context is available as sqlContext
> Sys.setenv(SPARK_HOME="/usr/hdp/2.3.4.0.-3485/spark/bin/sparkR/")
> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"),"R","Lib"),.libPaths()))
> Sys.setenv(JAVA_HOME="/usr/lib/jvm/java-8-oracle/")
> library(SparkR)
> lines<-SparkR:::textFile(sc,"hdfs: /user/midhun/f.txt")
17/03/23 14:37:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 306.2 KB, free 306.2 KB)
17/03/23 14:37:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.1 KB, free 332.3 KB)
17/03/23 14:37:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:39935 (size: 26.1 KB, free: 511.1 MB)
17/03/23 14:37:58 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
> words<-SparkR:::flatMap(lines,function(line){strsplit(line," ")[[1]]})
> wordcount<-SparkR:::lapply(words,function(word){list(word,1)})
> counts<-SparkR:::reduceByKey(wordcount,"+",numPartition=2)
> output<-collect(counts)
17/03/23 14:40:03 INFO SparkContext: Starting job: collect at NativeMethodAccessorImpl.java:-2
17/03/23 14:40:03 WARN DAGScheduler: Creating new stage failed due to exception - job: 0
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs:%20/user/midhun/f.txt
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:244)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.api.r.BaseRRDD.getPartitions(RRDD.scala:47)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.api.r.BaseRRDD.getPartitions(RRDD.scala:47)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:226)
at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:224)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.dependencies(RDD.scala:224)
at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:386)
at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:398)
at org.apache.spark.scheduler.DAGScheduler.getParentStagesAndId(DAGScheduler.scala:299)
at org.apache.spark.scheduler.DAGScheduler.newResultStage(DAGScheduler.scala:334)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:837)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1607)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs:%20/user/midhun/f.txt
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 44 more
17/03/23 14:40:03 INFO DAGScheduler: Job 0 failed: collect at NativeMethodAccessorImpl.java:-2, took 0.011744 s
17/03/23 14:40:03 ERROR RBackendHandler: collect on 17 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs:%20/user/midhun/f.txt
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:244)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(R
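For what it's worth, the URISyntaxException above ("hdfs:%20/user/midhun/f.txt") points at the space after "hdfs:" in the path passed to textFile; a sketch of the corrected call, assuming the cluster's default namenode is configured, would be:

> lines <- SparkR:::textFile(sc, "hdfs:///user/midhun/f.txt")   # no space after the scheme; a plain "/user/midhun/f.txt" would also resolve against the default FS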
Labels:
- Apache Spark