
Failed to run Zeppelin notebook demo Spark Streaming

New Contributor

I tried to run the notebook demo available in Zeppelin on the Hortonworks Sandbox 2.4 (the notebook named "twitter") to learn Spark Streaming. Following the instructions at the top of the notebook (/* BEFORE START....), I logged into Ambari to modify the configuration of the YARN service.

- CPU => Container: Minimum Container Size (VCores): 4; Maximum Container Size (VCores): 8

- Memory

+ Node: 2250MB

+ Container: Minimum Container Size: 768MB; Maximum Container Size: 2250MB

All services were restarted after the changes, but when I came back to Zeppelin to run the notebook, the second paragraph (/* UPDATE YOUR TWITTER CREDENTIALS */....) stayed in the "running" state and never reached "finished". All Twitter credentials were already updated.

P.S.: Without modifying the YARN configuration, I could run the second paragraph, but when running the third one it was likewise always "running" and never "finished".

Thanks for any suggestions

8 REPLIES

Re: Failed to run Zeppelin notebook demo Spark Streaming

@Minh-Hieu PHAM

How much RAM are you running this sandbox with?

Re: Failed to run Zeppelin notebook demo Spark Streaming

New Contributor

It's the Sandbox default: 8192 MB of RAM and 4 processors.

Re: Failed to run Zeppelin notebook demo Spark Streaming

Expert Contributor

Are you sure you have enough Spark executors?

See the Spark Streaming documentation, "Points to remember".

Re: Failed to run Zeppelin notebook demo Spark Streaming

New Contributor

Which executors do you mean?

Re: Failed to run Zeppelin notebook demo Spark Streaming

Expert Contributor

@Minh-Hieu PHAM

The doc said:

When running a Spark Streaming program locally, do not use “local” or “local[1]” as the master URL. Either of these means that only one thread will be used for running tasks locally. If you are using an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use “local[n]” as the master URL, where n > number of receivers to run (see Spark Properties for information on how to set the master).
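To make that concrete, here is a minimal Scala sketch of a receiver-based streaming job run with a local master (an illustrative standalone example, not the sandbox notebook itself; the socket host and port are made up):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// "local[2]" leaves one thread for the socket receiver and one for processing;
// "local" or "local[1]" would leave no thread to process the received batches.
val conf = new SparkConf().setAppName("StreamingThreadDemo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))

val lines = ssc.socketTextStream("localhost", 9999) // one receiver, occupying one thread
lines.count().print()                               // processed on the remaining thread

ssc.start()
ssc.awaitTermination()

On the sandbox, Zeppelin's Spark interpreter normally runs against YARN rather than a local master, so the analogous requirement there is that the executors together provide more cores than there are receivers.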

I have only played around with YARN for a bit, but say, for example, you only have 2 GB on a machine and set Minimum Container Size (Memory) to 1.5 GB; then you are likely to get only one container, and it will be used by your driver, hence no executor to consume the data.
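A rough sketch of that arithmetic, using the hypothetical 2 GB / 1.5 GB numbers from the example above rather than the sandbox's actual settings:

// How many containers YARN can hand out on one NodeManager in that example.
val nodeMemoryMb = 2048        // memory the NodeManager can allocate
val minContainerMb = 1536      // scheduler minimum allocation per container
val containersAvailable = nodeMemoryMb / minContainerMb // integer division => 1
// That single container is taken by the driver / Application Master side of the
// job, so YARN has nothing left to grant for an executor and the receiver never runs.
println(s"Containers available: $containersAvailable")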

Re: Failed to run Zeppelin notebook demo Spark Streaming

New Contributor

Thank you for your response. Could you please tell me which configuration changes you made in Ambari? (I run the Hortonworks Sandbox on a 16 GB Mac Pro using VirtualBox, and I gave the Sandbox 8 GB.) Thanks so much!


Re: Failed to run Zeppelin notebook demo Spark Streaming

Expert Contributor

I guess the first thing is to find out whether YARN has enough resources to start executors. Perhaps check the Spark History Server at http://sandbox.hortonworks.com:18080/ (if you are running the sandbox) to see whether the executors are making progress.
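Alternatively, assuming the notebook's Spark interpreter exposes the usual sc SparkContext variable (as Zeppelin does by default), a quick check can be run straight from a paragraph; the map below includes the driver, so a streaming job needs at least two entries to make progress:

// Lists the block managers Spark knows about (the driver plus any executors)
// and their memory. If only the driver shows up, YARN never granted an
// executor container and the streaming paragraphs will sit in "running".
val status = sc.getExecutorMemoryStatus
println(s"Registered block managers (driver + executors): ${status.size}")
status.foreach { case (hostPort, (maxMem, remaining)) =>
  println(f"$hostPort%-30s max=${maxMem / 1024 / 1024}%d MB, free=${remaining / 1024 / 1024}%d MB")
}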

Re: Failed to run Zeppelin notebook demo Spark Streaming

New Contributor

I had a look at that address and saw that there is an executor in progress. Could you please share the configuration you use for your sandbox?