Support Questions
Find answers, ask questions, and share your expertise

RDD take action doesn't work in Zeppelin

Explorer

I am trying to load a CSV file into an RDD using the textFile function in Zeppelin and then do a take(10). But the take does not produce any result in Zeppelin, while the same commands output rows over SSH (shell).

I have attached the file, my Zeppelin notebook, and some screenshots. Can you please suggest how to resolve this error? (Hortonworks Sandbox HDP 2.5 on Microsoft Azure)
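As a sanity check outside Spark, the same "first 10 lines" that rdd.take(10) returns can be read with plain Python on the sandbox, to confirm the file itself is readable (the path below is hypothetical; point it at wherever dodgers.csv actually lives):

```python
from itertools import islice

def head(path, n=10):
    # Read the first n lines of the file, mirroring what rdd.take(10)
    # would hand back to the driver as a Python list.
    with open(path, "r") as f:
        return [line.rstrip("\n") for line in islice(f, n)]

# Example (hypothetical path on the sandbox):
# print(head("/root/dodgers.csv"))
```

If this prints rows in a plain shell but the Zeppelin paragraph still shows nothing, the problem is in the notebook/interpreter side rather than in the file.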

data : dodgers.zip

Zeppelin pras-playground.zip

13380-dodgers-screenshot.png

13381-dodgers-zeppelin-output.png



Explorer

One more piece of info: I can do the same thing with a txt file, just not with CSVs.

Any pointers?

This could be due to lack of sufficient memory.

How did you launch the spark-shell? Is it in YARN mode or in standalone?

Also how is Zeppelin's Spark interpreter configured? YARN or Standalone?

Explorer

@vshukla

This is the sandbox from Hortonworks, so I suppose it is standalone mode; but how do I verify that?

Spark shell - I just launch PuTTY, ssh in as root, issue pyspark to get to Spark, and run the command (rdd = sc.textFile(csv)) - works like a charm.

Zeppelin - exactly the same, using the %pyspark interpreter - doesn't work.
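(For reference, one way to verify the mode in both environments is to print sc.master - in pyspark and in a %pyspark Zeppelin paragraph the SparkContext sc already exists. A small helper sketch, with the classification labels being my own naming, not Spark's:)

```python
def describe_master(master):
    # Classify a Spark master URL string. "local[*]" and friends mean
    # local/standalone mode; "yarn-client"/"yarn-cluster" mean YARN.
    if master.startswith("local"):
        return "local/standalone mode"
    if master.startswith("yarn"):
        return "YARN mode"
    return "other: " + master

# Inside pyspark or a %pyspark paragraph, run:
# print(describe_master(sc.master))
```

If the shell reports local mode but Zeppelin's interpreter is configured for YARN, the two environments are not actually running the same way, which would explain the different behavior.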

My Azure VM (D12 v2) config is 4 cores, 28 GB RAM, 200 GB HDD; my local VMware sandbox has 16 GB RAM, 8 cores, and 1 TB HDD space. Will this not be enough for Zeppelin?

Explorer

If it helps: screenshot from Ambari of the Zeppelin config

13433-zeppelin.png

Explorer

@vshukla, I did some more digging and found that whenever I run the commands, YARN memory in the Ambari dashboard goes up to 90%. What could be wrong?

Explorer

Hi.. a gentle bump to the thread..

It is likely due to insufficient memory. You can try bumping up the memory allocated to the Sandbox, and also shut down unneeded services in the sandbox.

Another option is to try out with Spark 2.1 in HDC https://hortonworks.com/blog/try-apache-spark-2-1-zeppelin-hortonworks-data-cloud/

Explorer

@vshukla here are the configs for my machines

Azure VM (D12 v2 config) is 4 cores, 28 GB RAM, 200 GB HDD;

VMware sandbox has 16 GB RAM, 8 cores, and 1 TB HDD space.

Will this not be enough for the Sandbox?

The size of the file I am trying to load is 1 MB. I am able to load a 5 MB txt file just fine; it only fails for CSVs.
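(Since the txt file works and the CSV doesn't, it may be worth checking whether the CSV differs in encoding or line endings - exported CSVs often carry a UTF-8 BOM or Windows CRLF endings. A quick plain-Python check, no Spark needed; the path is hypothetical:)

```python
def sniff(path):
    # Inspect the first bytes of the file and flag things that commonly
    # differ between a hand-made txt file and an exported CSV.
    with open(path, "rb") as f:
        first_bytes = f.read(64)
    return {
        "has_utf8_bom": first_bytes.startswith(b"\xef\xbb\xbf"),
        "has_crlf": b"\r\n" in first_bytes,
    }

# Example (hypothetical path on the sandbox):
# print(sniff("/root/dodgers.csv"))
```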

I will try out the HDC. Thanks for the link!