Posted 05-15-2015 05:27 AM
Hi, I'm a newbie with Spark. Please help me. I'm trying to run a simple script in spark-shell:

import org.apache.spark.SparkFiles;
val inFile = sc.textFile(SparkFiles.get("test.data"));
inFile.first();

On inFile.first() I get this exception:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://hdp-7:8020/tmp/spark-60b9bde7-d198-4a90-8f90-02e9cf77fa04/test.data

There is no such directory in HDFS, but I do have the directory /tmp/spark-60b9bde7-d198-4a90-8f90-02e9cf77fa04 on the local filesystem, with 0 files inside. I suspect the trouble is in the spark-shell startup, because I see this line in the startup log:

15/05/15 16:08:24 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d0ea3c3a-db92-43de-bc3d-6e6a6fd415f2

It seems the work directory is created locally, but when I try to access the RDD, Spark tries to read it from HDFS:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://hdp-7:8020/tmp/spark-60b9bde7-d198-4a90-8f90-02e9cf77fa04/test.data

Environment: Cloudera Express 5.3.2; Spark was installed as a YARN application through the Cloudera Manager console.

Full log below:

[root@hdp-16 ~]# spark-shell
15/05/15 16:08:19 INFO SecurityManager: Changing view acls to: root
15/05/15 16:08:19 INFO SecurityManager: Changing modify acls to: root
15/05/15 16:08:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/15 16:08:19 INFO HttpServer: Starting HTTP Server
15/05/15 16:08:19 INFO Utils: Successfully started service 'HTTP class server' on port 39187.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0-SNAPSHOT
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
15/05/15 16:08:24 INFO SecurityManager: Changing view acls to: root
15/05/15 16:08:24 INFO SecurityManager: Changing modify acls to: root
15/05/15 16:08:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/15 16:08:24 INFO Slf4jLogger: Slf4jLogger started
15/05/15 16:08:24 INFO Remoting: Starting remoting
15/05/15 16:08:24 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@hdp-16:51885]
15/05/15 16:08:24 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@hdp-16:51885]
15/05/15 16:08:24 INFO Utils: Successfully started service 'sparkDriver' on port 51885.
15/05/15 16:08:24 INFO SparkEnv: Registering MapOutputTracker
15/05/15 16:08:24 INFO SparkEnv: Registering BlockManagerMaster
15/05/15 16:08:24 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150515160824-7963
15/05/15 16:08:24 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/05/15 16:08:24 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d0ea3c3a-db92-43de-bc3d-6e6a6fd415f2
15/05/15 16:08:24 INFO HttpServer: Starting HTTP Server
15/05/15 16:08:24 INFO Utils: Successfully started service 'HTTP file server' on port 33870.
15/05/15 16:08:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/15 16:08:25 INFO SparkUI: Started SparkUI at http://hdp-16:4040
15/05/15 16:08:25 INFO Executor: Using REPL class URI: http://192.168.91.142:39187
15/05/15 16:08:25 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@hdp-16:51885/user/HeartbeatReceiver
15/05/15 16:08:25 INFO NettyBlockTransferService: Server created on 40784
15/05/15 16:08:25 INFO BlockManagerMaster: Trying to register BlockManager
15/05/15 16:08:25 INFO BlockManagerMasterActor: Registering block manager localhost:40784 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 40784)
15/05/15 16:08:25 INFO BlockManagerMaster: Registered BlockManager
15/05/15 16:08:26 INFO EventLoggingListener: Logging events to hdfs://hdp-7:8020/user/spark/applicationHistory/local-1431691705159
15/05/15 16:08:26 INFO SparkILoop: Created spark context..
Spark context available as sc.

scala> import org.apache.spark.SparkFiles;
import org.apache.spark.SparkFiles

scala> val inFile = sc.textFile(SparkFiles.get("test.data"));
15/05/15 16:08:33 INFO MemoryStore: ensureFreeSpace(258986) called with curMem=0, maxMem=278302556
15/05/15 16:08:33 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 252.9 KB, free 265.2 MB)
15/05/15 16:08:33 INFO MemoryStore: ensureFreeSpace(21113) called with curMem=258986, maxMem=278302556
15/05/15 16:08:33 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.6 KB, free 265.1 MB)
15/05/15 16:08:33 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40784 (size: 20.6 KB, free: 265.4 MB)
15/05/15 16:08:33 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/05/15 16:08:33 INFO SparkContext: Created broadcast 0 from textFile at <console>:13
inFile: org.apache.spark.rdd.RDD[String] = /tmp/spark-60b9bde7-d198-4a90-8f90-02e9cf77fa04/test.data MappedRDD[1] at textFile at <console>:13

scala> inFile.first();
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
hdfs://hdp-7:8020/tmp/spark-60b9bde7-d198-4a90-8f90-02e9cf77fa04/test.data
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1060)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1093)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:16)
    at $iwC$$iwC$$iwC.<init>(<console>:21)
    at $iwC$$iwC.<init>(<console>:23)
    at $iwC.<init>(<console>:25)
    at <init>(<console>:27)
    at .<init>(<console>:31)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Do you have any ideas?
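One thing I noticed while poking at this: SparkFiles.get returns a bare local path with no scheme, and I suspect Hadoop resolves scheme-less paths against fs.defaultFS (which is hdfs://hdp-7:8020 on my cluster), which would explain why the read goes to HDFS. A small check with plain java.net.URI illustrates what I mean (the file:// prefix idea is just my guess at a workaround, and assumes the file actually exists at that local path, e.g. after shipping it with sc.addFile):

```scala
import java.net.URI

// The path SparkFiles.get returns looks like this: no scheme at all,
// so Hadoop would fall back to the default filesystem (HDFS here).
val bare = new URI("/tmp/spark-60b9bde7-d198-4a90-8f90-02e9cf77fa04/test.data")
println(bare.getScheme)   // prints: null

// Prefixing file:// gives the URI an explicit scheme, which should pin
// the read to the driver's local filesystem instead of HDFS.
val local = new URI("file://" + bare.getPath)
println(local.getScheme)  // prints: file
```

So would sc.textFile("file://" + SparkFiles.get("test.data")) be the right way to read it, assuming I first ship the file with sc.addFile("test.data")?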