upgrade spark from 1.2 to 1.3

I am using the latest version of CDH, 5.3.2, with Spark 1.2. Apache released Spark 1.3 today, and I would like to upgrade. Does anybody have any suggestions on how I could do it?

 

Thank You

13 REPLIES

Re: upgrade spark from 1.2 to 1.3

Master Collaborator

If you're using Spark on YARN, you don't really need to upgrade anything, since Spark is just another app that runs on YARN. You can run Spark 1.3.0 regardless of what is installed on the cluster. Just get a distribution onto one of the machines and invoke scripts/binaries from the 1.3.0 distribution.
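As a rough illustration of "invoke scripts/binaries from the 1.3.0 distribution", here is a minimal Python sketch that assembles a spark-submit command for YARN. The unpack directory /opt/spark-1.3.0, the jar path, and the class name are hypothetical placeholders, and /etc/hadoop/conf is only an assumed location for the cluster's Hadoop client config. The sketch builds the command and environment without running anything.

```python
import os


def build_submit_command(spark_home, app_jar, main_class, deploy_mode="client"):
    """Return (argv, env) for submitting an app to YARN from an unpacked Spark distribution."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    argv = [
        # Use the launcher from the separately unpacked distribution,
        # not the one installed by the cluster's packages.
        os.path.join(spark_home, "bin", "spark-submit"),
        # Spark 1.x master strings for YARN: yarn-client or yarn-cluster.
        "--master", "yarn-" + deploy_mode,
        "--class", main_class,
        app_jar,
    ]
    env = dict(os.environ)
    env["SPARK_HOME"] = spark_home
    # Spark on YARN locates the cluster through the Hadoop client config.
    # The default path here is an assumption; adjust it for your hosts.
    env.setdefault("HADOOP_CONF_DIR", "/etc/hadoop/conf")
    return argv, env


# Hypothetical paths and class name, for illustration only.
argv, env = build_submit_command(
    "/opt/spark-1.3.0", "/home/me/my-app.jar", "com.example.MyApp"
)
print(" ".join(argv))
```

To actually launch the job, the result could be passed to subprocess.call(argv, env=env) on a machine that has the cluster's client configuration deployed.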

 

CDH 5.4 is coming pretty soon and has 1.3.0, too.

Re: upgrade spark from 1.2 to 1.3

New Contributor

Hi Sowen,

 

I am still doing a POC on a product, and I am trying to benchmark execution time using a custom build of Spark. I can run it as a YARN app, as you mentioned.

 

Can you tell me what performance benefits I may be giving up by not attempting to properly install this custom Spark build on the cluster and simply using YARN instead?

 

Thanks!

Stunos

Re: upgrade spark from 1.2 to 1.3

Rising Star

Just to clarify!

Installing Spark 1.3 will be another 'service' presented to CM.

In other words when I launch CM GUI there will be 2 Spark 'services', one for ver 1.2 and another for ver 1.3

As for the command line, I have to source the proper environment in order to run the respective Spark shell.

Is that correct?

 

Thank you.

Re: upgrade spark from 1.2 to 1.3

Master Collaborator

I don't think there's any performance difference. The main difference is simply that the custom build isn't supported or necessarily using the config established by CM. No, it does not become another service in CM. The CM service "Spark" is for standalone mode (not YARN), and the History server.

Re: upgrade spark from 1.2 to 1.3

Rising Star

Thank you for your reply!

1) If Spark 1.3 is not presented as a new service into CM, then the 'Spark Service' within CM will be which version??

2) Should BOTH Spark 1.2 & Spark 1.3 co-exist at the OS-level? Then you source the appropriate env??

3) Since you mentioned YARN's JobHistory Server.

    A Spark job writes its logs to the HDFS directory /tmp/logs/<user-id>/logs/ but it does NOT write to the /user/history/done & /user/history/done_intermediate folders!!!

    Is this because the Spark job runs in Standalone Mode and not in YARN??

 

Thank you!

 

Re: upgrade spark from 1.2 to 1.3

Master Collaborator

I'm not sure what you mean. You are talking about running some custom build of Spark 1.3. You're on your own there. It has no relation to CM, of course.

CDH ships Spark 1.3. You should use that unless you have a reason not to. CDH itself has only one version of Spark "installed", and you should not modify it. This is the version presented in CM.

I'm not sure what you mean about the logs. You can run Spark in standalone or YARN mode. It doesn't make you run one or the other.

Re: upgrade spark from 1.2 to 1.3

New Contributor
I'm the one with the custom build and I am satisfied already.
I think TS can get fully supported Spark 1.3 just by upgrading CDH.

Re: upgrade spark from 1.2 to 1.3

Master Collaborator

Oh, didn't even notice that. TS you should start a new thread to avoid confusion.

Re: upgrade spark from 1.2 to 1.3

Rising Star

Thank you again!

Maybe I didn't make myself clear!

I currently have Spark 1.2 installed; it came with CDH 5.3.1.

There is also a Spark service within CM (this service relates back to Spark 1.2).

Now I need to install Spark 1.3.

Do I remove/uninstall Spark 1.2 and install Spark 1.3???

---

As for the JobHistory logs:

When I launch a Spark job, logs are created in /tmp/logs/<user-id>/logs but NOT in the /user/history/ folders!

Then, when I open the JobHistory portal (http://<YARN-JobHistory-Server>:19888/jobhistory), it shows no jobs!!!

Is there a daemon that copies the logs from the /tmp/logs/<user-id>/logs folder to the /user/history/done & /user/history/done_intermediate ones?