I tried to run the notebook demo that ships with Zeppelin in the Hortonworks Sandbox 2.4 (the notebook named twitter) to learn Spark Streaming. Following the instructions at the top of the notebook (/* BEFORE START....), I logged in to Ambari to modify the configuration of the YARN service.
- CPU => Container: Minimum Container Size (VCores) 4; Maximum Container Size (Vcores): 8
+ Node: 2250MB
+ Container: Minimum Container Size: 768MB; Maximum Container Size: 2250MB
All services were restarted after the change, but when I went back to Zeppelin to run the notebook, the second paragraph (/* UPDATE YOUR TWITTER CREDENTIALS */....) stayed in the "running" state and never reached "finished". The Twitter credentials had already been updated.
P.S.: without modifying the YARN configuration, I could run the second paragraph, but the third one then stayed "running" and never reached "finished".
Thanks for any suggestions
The doc said:
When running a Spark Streaming program locally, do not use “local” or “local[1]” as the master URL. Either of these means that only one thread will be used for running tasks locally. If you are using an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use “local[n]” as the master URL, where n > number of receivers to run (see Spark Properties for information on how to set the master).
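To make the rule in that quote concrete, here is a small sketch in plain Python (no Spark needed). The helper names `local_thread_count` and `leaves_thread_for_processing` are made up for illustration and are not part of any Spark API; the sketch only handles the `local`, `local[n]` and `local[*]` forms of the master URL.

```python
import os
import re


def local_thread_count(master):
    """Return the number of local threads a master URL implies.

    Returns None when the URL is not one of the local forms handled here
    (for example "yarn-client").
    """
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        return None
    if match.group(1) == "*":
        # local[*] means one thread per logical core on the machine.
        return os.cpu_count() or 1
    return int(match.group(1))


def leaves_thread_for_processing(master, num_receivers):
    """True if at least one thread remains after each receiver takes one."""
    threads = local_thread_count(master)
    if threads is None:
        raise ValueError("not a local master URL: " + master)
    return threads > num_receivers


if __name__ == "__main__":
    for url in ("local", "local[1]", "local[2]"):
        print(url, leaves_thread_for_processing(url, num_receivers=1))
```

With one receiver (as in a single Twitter stream), `local` and `local[1]` fail the check and `local[2]` passes it.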
I have only played around with YARN a bit, but, for example, if you only have 2GB on a machine and set Minimum Container Size (Memory) to, say, 1.5GB, you are likely to get only one container, which will be used by your driver, leaving no executor to consume the data.
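A rough back-of-the-envelope version of that argument, as a Python sketch. It assumes the scheduler rounds every container request up to a multiple of the minimum container size and grants requests in order while memory remains; the request sizes (896MB for the application master, 1408MB for one executor) are illustrative guesses, not values read from the sandbox.

```python
import math


def normalize_request(request_mb, min_alloc_mb):
    """Round a request up to the next multiple of the minimum container size."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def containers_granted(node_mb, min_alloc_mb, requests_mb):
    """Return the sizes of the containers that fit on one node, in request order."""
    remaining = node_mb
    granted = []
    for request in requests_mb:
        size = normalize_request(request, min_alloc_mb)
        if size <= remaining:
            granted.append(size)
            remaining -= size
    return granted


if __name__ == "__main__":
    # Assumed requests: application master first, then one executor.
    requests = [896, 1408]
    # Settings from the question: 2250MB node, 768MB minimum container.
    print(containers_granted(2250, 768, requests))
    # Example from this answer: 2GB node, 1.5GB minimum container.
    print(containers_granted(2048, 1536, requests))
    # A roomier node for comparison.
    print(containers_granted(8192, 768, requests))
```

Under these assumptions both of the small configurations have room for the first container only, so the executor request never gets memory, while the 8GB case fits both.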
Thank you for your response. Could you please tell me which configuration changes you made in Ambari? (I run the Hortonworks Sandbox on a Mac Pro with 16GB using VirtualBox, and I allocated 8GB to the Sandbox.) Thanks so much!
I guess the first thing is to find out whether YARN has enough resources to start an executor. If you are running the sandbox, perhaps check the Spark History Server at http://sandbox.hortonworks.com:18080/ to see whether the executors are progressing?
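If clicking through the UI is awkward, the same check can probably be scripted. The sketch below assumes the Spark version on the sandbox exposes the monitoring REST API under `/api/v1` on the history server, that the application shows up there (event logging enabled), and that the driver entry is reported with id `driver` or `<driver>`. The application id is a placeholder you would copy from the UI, and the function names are invented for this example.

```python
import json
from urllib.request import urlopen

# Host and port taken from the URL mentioned above.
HISTORY_SERVER = "http://sandbox.hortonworks.com:18080"


def fetch_executors(app_id, base_url=HISTORY_SERVER):
    """Fetch the executor list for one application (assumed REST endpoint)."""
    url = "{}/api/v1/applications/{}/executors".format(base_url, app_id)
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def worker_executors(executors):
    """Drop the driver entry so that only real executors remain."""
    return [e for e in executors if e.get("id") not in ("driver", "<driver>")]


if __name__ == "__main__":
    # Placeholder id: replace with the id shown for your application.
    app_id = "application_XXXXXXXXXXXXX_XXXX"
    workers = worker_executors(fetch_executors(app_id))
    if not workers:
        print("Only the driver is registered; YARN may not have granted an executor.")
    else:
        for executor in workers:
            print(executor.get("id"), executor.get("totalTasks"))
```

If the list contains nothing but the driver, that would point to the container sizing rather than the Twitter credentials.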