In Zeppelin, when loading a simple text file with Spark's textFile, where do I put the file so that Zeppelin can see it?

New Contributor

Sandbox 2.5 on VirtualBox 5.1.12 on a Windows 10 machine.

I am trying to load a text file using Spark in Scala, and I am not sure where to place the files so that Zeppelin can see them. Is there a good tutorial that explains how Zeppelin accesses files? I have an SSH window open and can access the file system on the VirtualBox VM, but I am not sure where Zeppelin looks when it reads a file. I am not very savvy with Linux, so I am working my way through.

The error I get is:

markFIle: org.apache.spark.rdd.RDD[String] = cdrs.txt MapPartitionsRDD[37] at textFile at <console>:31

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:// 8020/user/zeppelin/cdrs.txt

I have gone through some of the tutorials, but I have not seen anything about how Zeppelin uses HDFS to read files, versus how I locate files when I log in to the VirtualBox VM as root over SSH.


Rising Star

In Zeppelin you can use:

hdfs dfs -ls /user/zeppelin

uid=503(zeppelin) gid=501(hadoop) groups=501(hadoop)

That is the user Zeppelin runs as, so you can either use a local file or store the file in HDFS in that user's home directory: /user/zeppelin
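A minimal sketch of the same check from a Zeppelin %spark paragraph, assuming the Spark interpreter's Hadoop configuration points at the sandbox's HDFS (its default filesystem). It lists the home directory, the Scala counterpart of hdfs dfs -ls /user/zeppelin, and shows that a relative name such as cdrs.txt is qualified against that home directory, which is why the error in the question mentions /user/zeppelin/cdrs.txt. The variable names are illustrative only.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

// In Zeppelin a SparkContext (`sc`) already exists; getOrCreate() returns it.
val ctx = SparkContext.getOrCreate()

// FileSystem for the default filesystem (fs.defaultFS), i.e. HDFS on the sandbox.
val fs = FileSystem.get(ctx.hadoopConfiguration)

// Home directory of the user the interpreter runs as (expected: /user/zeppelin).
val home = fs.getHomeDirectory
println(s"Home directory: $home")

// Roughly equivalent to `hdfs dfs -ls /user/zeppelin`.
fs.listStatus(home).foreach(status => println(status.getPath))

// A relative path is qualified against the working directory,
// which defaults to the home directory.
val resolved = fs.makeQualified(new Path("cdrs.txt"))
println(s"cdrs.txt resolves to $resolved (exists: ${fs.exists(resolved)})")
```

If the last line prints exists: false, the file has not been uploaded to HDFS yet, which matches the InvalidInputException above.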

New Contributor

How do I get a file into that directory? Forgive my inexperience.

Rising Star

I suggest following this tutorial; it shows how to load data and copy files...
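The tutorial link is not reproduced here. As a minimal sketch, assuming cdrs.txt has already been copied onto the sandbox VM's local disk (for example with scp from the Windows host) at the hypothetical path /tmp/cdrs.txt, readable by the zeppelin user, a %spark paragraph can push it into HDFS with Hadoop's FileSystem.copyFromLocalFile. From the SSH session, hdfs dfs -put does the same job.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

val ctx = SparkContext.getOrCreate()
val fs = FileSystem.get(ctx.hadoopConfiguration)

// Hypothetical location of the file on the sandbox VM's local disk.
val localFile = new Path("file:///tmp/cdrs.txt")

// Target in HDFS: the zeppelin user's home directory.
val hdfsFile = new Path("/user/zeppelin/cdrs.txt")

// Similar to `hdfs dfs -put -f /tmp/cdrs.txt /user/zeppelin/`:
// delSrc = false keeps the local copy, overwrite = true replaces an existing file.
fs.copyFromLocalFile(false, true, localFile, hdfsFile)

println(s"Now in HDFS: ${fs.exists(hdfsFile)}")
```

This only works because the local path is on the machine where the Zeppelin interpreter runs; a file that exists only on the Windows host must be copied to the VM first.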

Super Collaborator

In general, Zeppelin runs on the Zeppelin server machine in the cluster, so it cannot access local files on the user's host machine (here, the Windows 10 host).

The typical approach is to upload the file into HDFS and then use its HDFS path in a %spark notebook paragraph to read it with Spark.
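A minimal sketch of that last step, assuming the file has already been uploaded to /user/zeppelin/cdrs.txt. An absolute path without a scheme is resolved against the cluster's default filesystem (HDFS on the sandbox); the commented-out file:// line shows the alternative for a file on the Zeppelin server's own disk, and /tmp/cdrs.txt there is a hypothetical path.

```scala
import org.apache.spark.SparkContext

val ctx = SparkContext.getOrCreate()

// Absolute HDFS path; the fully qualified form would be
// hdfs://<namenode-host>:8020/user/zeppelin/cdrs.txt.
val lines = ctx.textFile("/user/zeppelin/cdrs.txt")

// Actions such as count() and take() are what actually read the file,
// so a wrong path fails here rather than at textFile().
println(s"Line count: ${lines.count()}")
lines.take(5).foreach(println)

// Alternative for a file on the Zeppelin server's local disk (explicit scheme).
// On a multi-node cluster the file must exist at this path on every worker node.
// val localLines = ctx.textFile("file:///tmp/cdrs.txt")
```

Spark evaluates textFile lazily, which is why the question's output first shows the RDD being created and only then reports the InvalidInputException.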