Support Questions

mark_waldal · ‎12-28-2016

Sandbox 2.5 on VirtualBox 5.1.12 on a Windows 10 machine.

I am trying to load a text file using Spark in Scala, and I am not sure where to place the file so that Zeppelin can see it. Is there a good tutorial on how Zeppelin accesses files? I have an SSH window open at 127.0.0.1:4200 and can access the file system on the VirtualBox VM, but I am not sure where Zeppelin will look when reading a file. I am not very savvy with Linux, so I am working my way through.

The error I get is:

markFIle: org.apache.spark.rdd.RDD[String] = cdrs.txt MapPartitionsRDD[37] at textFile at <console>:31

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/user/zeppelin/cdrs.txt

I have gone through some of the tutorials but have not seen anything that explains how Zeppelin uses HDFS to read files, versus how I locate files over SSH as root on the VirtualBox VM.

Eran · ‎12-28-2016

In Zeppelin you can use:

%sh 
id 
pwd
hdfs dfs -ls /user/zeppelin

uid=503(zeppelin) gid=501(hadoop) groups=501(hadoop)
/home/zeppelin

So Zeppelin runs as the zeppelin user. You can either read the file from the local file system as this user, or store it in HDFS in this user's home directory: /user/zeppelin

mark_waldal · ‎12-28-2016

How do I get a file into that directory? Forgive my inexperience.

Eran · ‎12-28-2016

I suggest following this tutorial. It shows how to load data and copy files:

http://hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/

bikas · ‎12-28-2016

In general, Zeppelin runs on the Zeppelin server machine in the cluster, so it cannot access local files on the user's host machine.

The typical thing to do is to upload the file into HDFS and use the HDFS path in %spark notebook code to read the file using Spark.
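
As a rough sketch of that upload step, something like the following could be run in a %sh paragraph in Zeppelin (so that it executes as the zeppelin user). It assumes the file has already been copied onto the sandbox VM at /tmp/cdrs.txt, which is a hypothetical location; adjust it to wherever the file actually is. The -f flag overwrites an existing copy in HDFS.

```shell
# Assumed location of the file on the sandbox's local disk (hypothetical path).
LOCAL_FILE=/tmp/cdrs.txt
# HDFS home directory of the zeppelin user, as shown earlier in this thread.
HDFS_DIR=/user/zeppelin

if command -v hdfs >/dev/null 2>&1; then
  # Copy the local file into HDFS, overwriting any existing copy.
  hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR/"
  # List the directory to confirm the file is there.
  hdfs dfs -ls "$HDFS_DIR"
else
  echo "hdfs client not found on this machine"
fi
```

Once the file is listed under /user/zeppelin, the path in the error message above (hdfs://sandbox.hortonworks.com:8020/user/zeppelin/cdrs.txt) should exist, so the original textFile read of cdrs.txt in a %spark paragraph should find it. If the same commands are run over SSH as root instead, the upload may fail with a permission error, because /user/zeppelin belongs to the zeppelin user.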

Cloudera Community

In Zeppelin loading a simple TextFile where do I put a file so Zeppelin will see it using a Spark TextFile read?