Spark on YARN in CDH-5

Re: Spark on YARN in CDH-5

Contributor

Thanks for the help.

I followed the instructions and got this error:

Error: Cannot load main class from JAR: file:/var/lib/hadoop-hdfs/class

Can you give any advice?

Thanks!

Re: Spark on YARN in CDH-5

Master Collaborator

That sounds like a bad command line. I don't see that path in the instructions either. Check that you are following the instructions for 5.2 in the previous link.

Re: Spark on YARN in CDH-5

Contributor

Thanks for your reply, sowen.

I just tried the instructions at another link, https://spark.apache.org/docs/1.1.0/running-on-yarn.html, and it works.

I got this result:

14:52:41 INFO Client: Application report from ResourceManager: 
           application identifier: application_1416365742014_0003
           appId: 3
           clientToAMToken: null
           appDiagnostics: 
           appMasterHost: 01slave.mabu.com
           appQueue: root.root
           appMasterRpcPort: 0
           appStartTime: 1416383498088
           yarnAppState: FINISHED
           distributedFinalState: SUCCEEDED
           appTrackingUrl: http://00master.mabu.com:8088/proxy/application_1416365742014_0003/history/spark-pi-1416383528301
           appUser: root

The problem is that I can't find the value of Pi. When we run the Pi example on Hadoop, it prints the result (3.14333...). Where can I find it here?

Thanks!

Re: Spark on YARN in CDH-5

Master Collaborator

Yes, in that example you are clearly running on YARN. So you see it in the history, right?

It looks like the example uses yarn-cluster mode, which means the driver was launched on YARN, not locally. The output will be in the log of the YARN container that ran the driver.

Try yarn-client instead to make your local process the driver, and it should print the result on your console.
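
To make the difference between the two modes concrete, here is a minimal Python sketch that only builds and prints the command lines; it does not run anything. The class name and the `--master` values follow the Spark 1.1.0 "Running on YARN" page linked earlier in the thread. The jar path and the helper names (`spark_pi_command`, `driver_log_command`) are assumptions for illustration. Fetching the driver output with `yarn logs` should work only if log aggregation is enabled on the cluster and the application has finished.

```python
# Placeholder path: substitute the examples jar shipped with your install.
# The glob is expanded by the shell when the printed command is pasted there.
EXAMPLES_JAR = "lib/spark-examples*.jar"  # hypothetical location


def spark_pi_command(master, slices=10):
    """Build a spark-submit argument list for the SparkPi example."""
    if master not in ("yarn-client", "yarn-cluster"):
        raise ValueError("master must be yarn-client or yarn-cluster")
    return [
        "spark-submit",
        "--class", "org.apache.spark.examples.SparkPi",
        "--master", master,
        EXAMPLES_JAR,
        str(slices),
    ]


def driver_log_command(application_id):
    """Build the command that fetches the aggregated container logs."""
    return ["yarn", "logs", "-applicationId", application_id]


if __name__ == "__main__":
    # yarn-client: the driver runs locally, so the Pi line appears on the console.
    print(" ".join(spark_pi_command("yarn-client")))
    # yarn-cluster: the driver runs in a YARN container, so read its log instead.
    print(" ".join(spark_pi_command("yarn-cluster")))
    print(" ".join(driver_log_command("application_1416365742014_0003")))
```

The application ID passed to `driver_log_command` is the one shown in the application report above.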

Re: Spark on YARN in CDH-5

Contributor

Thanks again, sowen.

The example went well and I can see the Pi result now, but I still got some errors:

WARN YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(03slave.mabu.com,42930) not found

WARN ConnectionManager: All connections not cleaned up.

I don't know whether it's because of a poor connection or the amount of RAM on my cluster, but this is still a good start for me anyway.

By the way, do you know where I can find more information about Spark (how it works, its operation, when to use yarn-cluster vs. yarn-client, ...)?

Thanks a lot!

Re: Spark on YARN in CDH-5

Master Collaborator
It looks like you asked for more resources than you configured YARN to offer, so check how much you can allocate in YARN and how much Spark asked for. I don't know about the ERROR; it may be a red herring. Please have a look at http://spark.apache.org/docs/latest/ for pretty good Spark docs.
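
A rough way to do that comparison is sketched below in Python. It is a simplification under stated assumptions, not how Spark or YARN actually perform the check. It assumes each executor container needs its `--executor-memory` plus a fixed overhead (384 MB is used here as the assumed default of `spark.yarn.executor.memoryOverhead` for Spark 1.1), and it ignores YARN's rounding to its minimum allocation size. The function name and the sample numbers are hypothetical; read the real limits from `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb` in your YARN configuration.

```python
def check_executor_request(executor_memory_mb, max_allocation_mb,
                           node_memory_mb, overhead_mb=384):
    """Return a list of reasons why YARN could not grant one executor container.

    overhead_mb=384 is an assumed default; pass your own value if it differs.
    """
    requested = executor_memory_mb + overhead_mb  # heap plus off-heap overhead
    problems = []
    if requested > max_allocation_mb:
        problems.append(
            "request of %d MB exceeds yarn.scheduler.maximum-allocation-mb (%d MB)"
            % (requested, max_allocation_mb)
        )
    if requested > node_memory_mb:
        problems.append(
            "request of %d MB exceeds yarn.nodemanager.resource.memory-mb (%d MB)"
            % (requested, node_memory_mb)
        )
    return problems


if __name__ == "__main__":
    # Hypothetical numbers: 2 GB executors on nodes that offer 2 GB to YARN.
    for line in check_executor_request(2048, 8192, 2048):
        print(line)
```

An empty list means a single executor should fit on a node. A non-empty list suggests lowering `--executor-memory` or raising the corresponding YARN limit.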