Explorer
Posts: 25
Registered: ‎12-10-2014

I am trying to load a file from HDFS using spark-shell, with errors

I am trying to load a file from HDFS using spark-shell. I loaded the file into HDFS using the following command:

 

[cloudera@quickstart labfiles]$ hadoop fs -copyFromLocal README.md /user/cloudera/README.md
[cloudera@quickstart labfiles]$ hadoop fs -ls /user/cloudera
Found 3 items
-rw-r--r--   1 cloudera cloudera       4811 2015-08-05 08:05 /user/cloudera/README.md
-rw-r--r--   1 cloudera cloudera    6538420 2015-08-04 09:34 /user/cloudera/downloadFiles.mp4
-rw-r--r--   1 cloudera cloudera       1770 2015-08-04 10:02 /user/cloudera/readme.txt

But when I try to load the file from HDFS in Scala...

 

scala> val README = sc.textFile("/user/cloudera/README.md")
2015-08-05 08:43:13,897 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(246950) called with curMem=3500967, maxMem=280248975
2015-08-05 08:43:13,897 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_28 stored as values in memory (estimated size 241.2 KB, free 263.7 MB)
2015-08-05 08:43:13,934 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(19465) called with curMem=3747917, maxMem=280248975
2015-08-05 08:43:13,935 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_28_piece0 stored as bytes in memory (estimated size 19.0 KB, free 263.7 MB)
2015-08-05 08:43:13,935 INFO  [sparkDriver-akka.actor.default-dispatcher-3] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_28_piece0 in memory on localhost:46000 (size: 19.0 KB, free: 267.0 MB)
2015-08-05 08:43:13,935 INFO  [main] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_28_piece0
2015-08-05 08:43:13,937 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 28 from textFile at <console>:12
README: org.apache.spark.rdd.RDD[String] = /user/cloudera/README.md MappedRDD[35] at textFile at <console>:12

scala> README.count()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/cloudera/README.md
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
	at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
	at org.apache.spark.rdd.RDD.count(RDD.scala:910)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
	at $iwC$$iwC$$iwC.<init>(<console>:20)
	at $iwC$$iwC.<init>(<console>:22)
	at $iwC.<init>(<console>:24)
	at <init>(<console>:26)
	at .<init>(<console>:30)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
	at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


scala> 

I can see the file in Hue, and I can access it with hadoop fs -ls /user/cloudera, so what is going on?

 

I am using cloudera-quickstart-vm-5.3.0.0-virtualbox on a MacBook Pro.

 

Thank you

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: I am trying to load a file from HDFS using spark-shell, with errors

Write "hdfs:///user/..." instead, to avoid ambiguity. It's referring to a local path right now.

Explorer
Posts: 25
Registered: ‎12-10-2014

Re: I am trying to load a file from HDFS using spark-shell, with errors

Yes, it works! Thank you, sir.

scala> val README = sc.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/README.md")

scala> README.count()
2015-08-05 08:51:09,639 INFO [main] mapred.FileInputFormat (FileInputFormat.java:listStatus(247)) - Total input paths to process : 1
2015-08-05 08:51:09,643 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Starting job: count at <console>:15
2015-08-05 08:51:09,643 INFO [sparkDriver-akka.actor.default-dispatcher-3] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Got job 15 (count at <console>:15) with 1 output partitions (allowLocal=false)
2015-08-05 08:51:09,643 INFO [sparkDriver-akka.actor.default-dispatcher-3] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Final stage: Stage 15(count at <console>:15)
2015-08-05 08:51:09,643 INFO [sparkDriver-akka.actor.default-dispatcher-3] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Parents of final stage: List()
2015-08-05 08:51:09,644 INFO [sparkDriver-akka.actor.default-dispatcher-3] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Missing parents: List()
2015-08-05 08:51:09,644 INFO [sparkDriver-akka.actor.default-dispatcher-3] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Submitting Stage 15 (hdfs://quickstart.cloudera:8020/user/cloudera/README.md MappedRDD[39] at textFile at <console>:12), which has no missing parents
2015-08-05 08:51:09,646 INFO [sparkDriver-akka.actor.default-dispatcher-3] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(2560) called with curMem=4304386, maxMem=280248975
2015-08-05 08:51:09,646 INFO [sparkDriver-akka.actor.default-dispatcher-3] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_32 stored as values in memory (estimated size 2.5 KB, free 263.2 MB)
2015-08-05 08:51:09,647 INFO [sparkDriver-akka.actor.default-dispatcher-3] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(1614) called with curMem=4306946, maxMem=280248975
2015-08-05 08:51:09,647 INFO [sparkDriver-akka.actor.default-dispatcher-3] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_32_piece0 stored as bytes in memory (estimated size 1614.0 B, free 263.2 MB)
2015-08-05 08:51:09,647 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_32_piece0 in memory on localhost:46000 (size: 1614.0 B, free: 267.0 MB)
2015-08-05 08:51:09,647 INFO [sparkDriver-akka.actor.default-dispatcher-3] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_32_piece0
2015-08-05 08:51:09,648 INFO [sparkDriver-akka.actor.default-dispatcher-3] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 32 from broadcast at DAGScheduler.scala:838
2015-08-05 08:51:09,649 INFO [sparkDriver-akka.actor.default-dispatcher-3] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Submitting 1 missing tasks from Stage 15 (hdfs://quickstart.cloudera:8020/user/cloudera/README.md MappedRDD[39] at textFile at <console>:12)
2015-08-05 08:51:09,649 INFO [sparkDriver-akka.actor.default-dispatcher-3] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Adding task set 15.0 with 1 tasks
2015-08-05 08:51:09,649 INFO [sparkDriver-akka.actor.default-dispatcher-14] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Starting task 0.0 in stage 15.0 (TID 15, localhost, ANY, 1319 bytes)
2015-08-05 08:51:09,650 INFO [Executor task launch worker-7] executor.Executor (Logging.scala:logInfo(59)) - Running task 0.0 in stage 15.0 (TID 15)
2015-08-05 08:51:09,657 INFO [Executor task launch worker-7] rdd.HadoopRDD (Logging.scala:logInfo(59)) - Input split: hdfs://quickstart.cloudera:8020/user/cloudera/README.md:0+4811
2015-08-05 08:51:09,665 INFO [Executor task launch worker-7] executor.Executor (Logging.scala:logInfo(59)) - Finished task 0.0 in stage 15.0 (TID 15). 1757 bytes result sent to driver
2015-08-05 08:51:09,670 INFO [sparkDriver-akka.actor.default-dispatcher-14] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Stage 15 (count at <console>:15) finished in 0.019 s
2015-08-05 08:51:09,670 INFO [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Job 15 finished: count at <console>:15, took 0.027246 s
res30: Long = 141

scala> 2015-08-05 08:51:09,670 INFO [task-result-getter-3] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 0.0 in stage 15.0 (TID 15) in 20 ms on localhost (1/1)
2015-08-05 08:51:09,671 INFO [task-result-getter-3] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Removed TaskSet 15.0, whose tasks have all completed, from pool




Explorer
Posts: 25
Registered: ‎12-10-2014

Re: I am trying to load a file from HDFS using spark-shell, with errors

scala> val README = sc.textFile("hdfs:///user/cloudera/README.md")
2015-08-05 08:53:26,278 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(246950) called with curMem=4845564, maxMem=280248975
2015-08-05 08:53:26,278 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_36 stored as values in memory (estimated size 241.2 KB, free 262.4 MB)
2015-08-05 08:53:26,296 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(19465) called with curMem=5092514, maxMem=280248975
2015-08-05 08:53:26,297 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_36_piece0 stored as bytes in memory (estimated size 19.0 KB, free 262.4 MB)
2015-08-05 08:53:26,297 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_36_piece0 in memory on localhost:46000 (size: 19.0 KB, free: 266.9 MB)
2015-08-05 08:53:26,297 INFO [main] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_36_piece0
2015-08-05 08:53:26,298 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 36 from textFile at <console>:12
README: org.apache.spark.rdd.RDD[String] = hdfs:///user/cloudera/README.md MappedRDD[45] at textFile at <console>:12

scala> README.count()
2015-08-05 08:53:32,262 INFO [sparkDriver-akka.actor.default-dispatcher-6] storage.BlockManager (Logging.scala:logInfo(59)) - Removing broadcast 30
2015-08-05 08:53:32,262 INFO [sparkDriver-akka.actor.default-dispatcher-6] storage.BlockManager (Logging.scala:logInfo(59)) - Removing block broadcast_30_piece0
2015-08-05 08:53:32,262 INFO [sparkDriver-akka.actor.default-dispatcher-6] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_30_piece0 of size 1614 dropped from memory (free 275138610)
2015-08-05 08:53:32,263 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Removed broadcast_30_piece0 on localhost:46000 in memory (size: 1614.0 B, free: 266.9 MB)
2015-08-05 08:53:32,263 INFO [sparkDriver-akka.actor.default-dispatcher-6] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_30_piece0
2015-08-05 08:53:32,263 INFO [sparkDriver-akka.actor.default-dispatcher-6] storage.BlockManager (Logging.scala:logInfo(59)) - Removing block broadcast_30
2015-08-05 08:53:32,263 INFO [sparkDriver-akka.actor.default-dispatcher-6] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_30 of size 2560 dropped from memory (free 275141170)
2015-08-05 08:53:32,263 INFO [Spark Context Cleaner] spark.ContextCleaner (Logging.scala:logInfo(59)) - Cleaned broadcast 30
2015-08-05 08:53:32,263 INFO [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManager (Logging.scala:logInfo(59)) - Removing broadcast 32
2015-08-05 08:53:32,263 INFO [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManager (Logging.scala:logInfo(59)) - Removing block broadcast_32
2015-08-05 08:53:32,264 INFO [sparkDriver-akka.actor.default-dispatcher-15] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_32 of size 2560 dropped from memory (free 275143730)
2015-08-05 08:53:32,264 INFO [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManager (Logging.scala:logInfo(59)) - Removing block broadcast_32_piece0
2015-08-05 08:53:32,264 INFO [sparkDriver-akka.actor.default-dispatcher-15] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_32_piece0 of size 1614 dropped from memory (free 275145344)
2015-08-05 08:53:32,264 INFO [sparkDriver-akka.actor.default-dispatcher-3] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Removed broadcast_32_piece0 on localhost:46000 in memory (size: 1614.0 B, free: 266.9 MB)
2015-08-05 08:53:32,264 INFO [sparkDriver-akka.actor.default-dispatcher-15] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_32_piece0
2015-08-05 08:53:32,264 INFO [Spark Context Cleaner] spark.ContextCleaner (Logging.scala:logInfo(59)) - Cleaned broadcast 32
2015-08-05 08:53:32,264 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.BlockManager (Logging.scala:logInfo(59)) - Removing broadcast 35
2015-08-05 08:53:32,265 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.BlockManager (Logging.scala:logInfo(59)) - Removing block broadcast_35
2015-08-05 08:53:32,265 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_35 of size 2560 dropped from memory (free 275147904)
2015-08-05 08:53:32,265 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.BlockManager (Logging.scala:logInfo(59)) - Removing block broadcast_35_piece0
2015-08-05 08:53:32,265 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_35_piece0 of size 1614 dropped from memory (free 275149518)
2015-08-05 08:53:32,265 INFO [sparkDriver-akka.actor.default-dispatcher-16] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Removed broadcast_35_piece0 on localhost:46000 in memory (size: 1614.0 B, free: 266.9 MB)
2015-08-05 08:53:32,265 INFO [sparkDriver-akka.actor.default-dispatcher-14] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_35_piece0
2015-08-05 08:53:32,266 INFO [Spark Context Cleaner] spark.ContextCleaner (Logging.scala:logInfo(59)) - Cleaned broadcast 35
java.io.IOException: Incomplete HDFS URI, no host: hdfs:/user/cloudera/README.md
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:141)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
at org.apache.spark.rdd.RDD.count(RDD.scala:910)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
at $iwC$$iwC$$iwC.<init>(<console>:20)
at $iwC$$iwC.<init>(<console>:22)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


scala>

It is necessary to use the full HDFS URI, including the host and port:

scala> val README = sc.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/README.md")
2015-08-05 08:54:22,840 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(246950) called with curMem=5099457, maxMem=280248975
2015-08-05 08:54:22,841 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_37 stored as values in memory (estimated size 241.2 KB, free 262.2 MB)
2015-08-05 08:54:22,856 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(19465) called with curMem=5346407, maxMem=280248975
2015-08-05 08:54:22,858 INFO [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_37_piece0 stored as bytes in memory (estimated size 19.0 KB, free 262.1 MB)
2015-08-05 08:54:22,858 INFO [sparkDriver-akka.actor.default-dispatcher-5] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_37_piece0 in memory on localhost:46000 (size: 19.0 KB, free: 266.9 MB)
2015-08-05 08:54:22,858 INFO [main] storage.BlockManagerMaster (Logging.scala:logInfo(59)) - Updated info of block broadcast_37_piece0
2015-08-05 08:54:22,859 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 37 from textFile at <console>:12
README: org.apache.spark.rdd.RDD[String] = hdfs://quickstart.cloudera:8020/user/cloudera/README.md MappedRDD[47] at textFile at <console>:12

Thanks again
Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: I am trying to load a file from HDFS using spark-shell, with errors

This indicates to me that it's not seeing your Hadoop cluster config.
Are you not running in your cluster / on an edge node?
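A quick way to check this from spark-shell is to print the default filesystem the driver sees; a minimal sketch:

// With the cluster config on the classpath this prints something like
// hdfs://quickstart.cloudera:8020; if it prints file:///, Spark resolves
// schemeless paths against the local filesystem, which matches the error above.
println(sc.hadoopConfiguration.get("fs.defaultFS", "file:///"))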

Explorer
Posts: 25
Registered: ‎12-10-2014

Re: I am trying to load a file from HDFS using spark-shell, with errors

Hi srowen,

I am using the cloudera-quickstart-vm-5.3.0-virtualbox image with no additional settings on my part; I just unzipped it and it was ready to go. It is a single node running on my laptop.

Thank you for the assistance
Contributor
Posts: 55
Registered: ‎09-17-2013

Re: I am trying to load a file from HDFS using spark-shell, with errors

Please check the fs.default.name configuration; setting it properly might help, in case it is not pointing to the NameNode host and port.
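(fs.default.name is the deprecated name for fs.defaultFS in Hadoop 2.) If editing core-site.xml is not an option, the value can also be overridden for the current spark-shell session; a sketch, assuming the QuickStart VM's default NameNode address:

// Sketch only: hdfs://quickstart.cloudera:8020 is the QuickStart VM default and is an assumption here.
sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")
val README = sc.textFile("/user/cloudera/README.md")  // the schemeless path now resolves against HDFS
README.count()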

New Contributor
Posts: 1
Registered: ‎03-30-2018

Re: I am trying to load a file from HDFS using spark-shell, with errors

I have the same problem, but I am using Hortonworks rather than Cloudera, and I get the same error when I try to load the file. Can anyone tell me the procedure for loading a file in Hortonworks? I have a CSV file, which I put into Hadoop using -copyFromLocal; it is at demo/dataset.csv. How do I specify the path in my case? Please help.
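For reference, the same fix discussed above should apply on the Hortonworks sandbox: spell out the full HDFS URI. A minimal sketch, assuming the sandbox's default NameNode address and that the file sits under your HDFS home directory (both are assumptions):

// Sketch only: replace the host, port, and user with the values for your sandbox.
val data = sc.textFile("hdfs://sandbox.hortonworks.com:8020/user/<your-user>/demo/dataset.csv")
data.count()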