Created on 03-24-2014 06:48 AM - edited 09-16-2022 01:55 AM
Hi,
I am a newbie to Apache Spark.
I have installed CDH-5 (Beta 2 version) using parcels, and installed Spark as well.
The Spark installation documentation, http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/c..., says:
" Note:
So, if YARN in CDH-5 does not support Spark, how do we run Spark in CDH-5?
Please let me know, and also provide any documentation if available.
Thanks!
Created 03-24-2014 06:51 AM
At the moment, CDH5b2 deploys Spark in "standalone" mode: https://spark.apache.org/docs/0.9.0/spark-standalone.html
This simply means Spark tries to manage resources itself, rather than participating in a cluster manager like YARN or Mesos. As an end user, it shouldn't make much difference to you at all. Just fire up the shell and go.
As I understand it, Spark's YARN integration will be used once a few kinks are worked out.
Created 03-25-2014 04:29 AM
Are you on CDH5 beta 2? It already includes Spark. I wonder if its setup of Spark is interfering with whatever you have installed separately, or vice versa. Can you simply use the built-in deployment? It would be easier.
Created 11-18-2014 07:22 PM
Thanks for the help,
I followed the instructions and got this error:
Error: Cannot load main class from JAR: file:/var/lib/hadoop-hdfs/class
Can you give any advice?
Thanks!
Created 11-19-2014 12:03 AM
That sounds like a bad command line. I don't see that path in the instructions either. Check that you are following the instructions for 5.2 in the previous link.
Created 11-19-2014 12:18 AM
Thanks for your reply sowen,
I just tried another link, https://spark.apache.org/docs/1.1.0/running-on-yarn.html, and it works.
I got the result:
14:52:41 INFO Client: Application report from ResourceManager:
application identifier: application_1416365742014_0003
appId: 3
clientToAMToken: null
appDiagnostics:
appMasterHost: 01slave.mabu.com
appQueue: root.root
appMasterRpcPort: 0
appStartTime: 1416383498088
yarnAppState: FINISHED
distributedFinalState: SUCCEEDED
appTrackingUrl: http://00master.mabu.com:8088/proxy/application_1416365742014_0003/history/spark-pi-1416383528301
appUser: root
The problem is that I can't find the Pi result. When we run the Pi example on Hadoop, it prints the result (3.14333...). Where can I find it here?
Thanks!
Created 11-19-2014 01:08 AM
Yes, in that example you are clearly running on YARN. So you see it in the history, right?
It looks like the example uses yarn-cluster mode, which means the driver was launched on YARN, not locally. The output will be on the YARN container that had the driver.
Try yarn-client instead to make your local process the driver and it should print the result on your console.
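A rough sketch of both options, in case it helps. It only builds and prints the two command lines so you can review them before running anything. The jar path is the one used in the Spark 1.1.0 docs and is an assumption here (the location in a CDH install may differ), and `yarn logs` only returns output if log aggregation is enabled on the cluster, which I am also assuming. The application ID is the one from the report above.

```shell
#!/bin/sh
# Assumed values: adjust both for your own cluster.
APP_ID="application_1416365742014_0003"   # ID taken from the application report
EXAMPLES_JAR="lib/spark-examples*.jar"    # path from the Spark 1.1.0 docs; may differ on CDH

# Option 1: rerun SparkPi in yarn-client mode, so the driver runs locally
# and "Pi is roughly ..." is printed on your console.
client_cmd="spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client $EXAMPLES_JAR 10"

# Option 2: keep yarn-cluster mode and pull the aggregated container logs,
# which include the driver's stdout (assumes log aggregation is enabled).
logs_cmd="yarn logs -applicationId $APP_ID"

# Print the commands for review instead of executing them.
echo "$client_cmd"
echo "$logs_cmd"
```

Once the commands look right for your setup, run them directly. For the second one, piping the output through `grep "Pi is roughly"` should narrow it down to the result line.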
Created 11-19-2014 01:46 AM
Thanks again sowen,
The example went well and I can see the Pi result now, but I still got some errors:
WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(03slave.mabu.com,42930) not found
WARN ConnectionManager: All connections not cleaned up.
Don't know if it's because of the poor connection or the amount of RAM on my cluster, but this is still a good start for me anyway.
By the way, do you know where I can find more information about Spark (how it works, how it operates, when to use yarn-cluster vs. yarn-client, ...)?
Thanks a lot!
Created 11-19-2014 01:53 AM