
Failed to run Zeppelin notebook demo Spark Streaming

I tried to run the demo notebook available in Zeppelin on the Hortonworks Sandbox 2.4 (the notebook named "Twitter") to learn Spark Streaming. Following the instructions at the top of the notebook (/* BEFORE START....), I logged on to Ambari to modify the configuration of the YARN service:

- CPU => Container: Minimum Container Size (VCores): 4; Maximum Container Size (VCores): 8

- Memory

+ Node: 2250 MB

+ Container: Minimum Container Size: 768 MB; Maximum Container Size: 2250 MB

All services were restarted after these modifications, but when I came back to Zeppelin to run the notebook, the second paragraph (/* UPDATE YOUR TWITTER CREDENTIALS */....) stayed in the "running" state and never reached "finished". All Twitter credentials were already updated.

P/S: without modifying the YARN configuration, I could run the second paragraph, but when running the third, it also stayed "running" and never "finished".

Thanks for any suggestions.


@Minh-Hieu PHAM

How much RAM are you running this sandbox with?

It's the Sandbox default: 8192 MB of RAM and 4 processors.

Expert Contributor

Are you sure you have enough spark executors?

See the Spark Streaming documentation, "Points to remember".

What kind of executor are you talking about?

Expert Contributor

@Minh-Hieu PHAM

The doc said:

When running a Spark Streaming program locally, do not use "local" or "local[1]" as the master URL. Either of these means that only one thread will be used for running tasks locally. If you are using an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use "local[n]" as the master URL, where n > number of receivers to run (see Spark Properties for information on how to set the master).
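Concretely, for the Twitter notebook that means the master must allow at least two threads when running locally: one for the Twitter receiver, one for processing. A sketch of the two usual places to set it (the exact property location depends on how your Zeppelin/Spark is wired up):

```
# spark-submit: "local[2]" = one thread for the receiver, one for processing
spark-submit --master local[2] ...

# or in Zeppelin's Spark interpreter settings:
master = local[2]
```

Note that on the sandbox the Zeppelin Spark interpreter typically runs on YARN rather than in local mode; in that case the equivalent requirement is that YARN can hand out at least one executor container besides the driver.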

I have only played around with YARN a bit, but say, for example, you have only 2 GB on a machine and set the Minimum Container Size (Memory) to 1.5 GB: then you are likely to get only one container, which will be used by your driver, leaving no executor to consume the data.
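The arithmetic behind that can be sketched in a few lines of Scala. This is a simplification I'm adding for illustration (real YARN also rounds each request up to a multiple of the minimum allocation, and subtracts other overheads), using the memory numbers from this thread:

```scala
// Rough YARN capacity check: how many containers of the configured
// minimum size fit in the node's available memory.
def maxContainers(nodeMemoryMb: Int, minContainerMb: Int): Int =
  nodeMemoryMb / minContainerMb

// 2 GB node with a 1.5 GB minimum container: one container total,
// which the driver takes, so no executor is left for the receiver.
println(maxContainers(2048, 1536)) // prints 1

// The sandbox settings from the question: 2250 MB node, 768 MB minimum.
println(maxContainers(2250, 768))  // prints 2 -> driver + one executor
```

With only one container, the streaming job can accept credentials but never process data, which matches the "running but never finished" symptom.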

Thank you for your response. Could you please tell me which configuration changes you made in Ambari? (I run the Hortonworks Sandbox on a Mac Pro with 16 GB using VirtualBox, and I allocated 8 GB to the Sandbox.) Thanks so much!

Expert Contributor

I guess the first thing is to find out whether YARN has enough resources to start executors. Perhaps check the Spark History Server while the sandbox is running to see whether the executors are making progress.

I had a look at that address and saw that there is one executor in progress. Could you please share your configuration for the sandbox?