Learnings on running Spark 1.6 & 2.0 code base on same cluster

Solved

New Contributor

Hi,

With HDP 2.5 supporting both Spark versions, i.e. 1.6.2 and 2.0 (TP), has anyone tried running both versions of Spark code (not necessarily in the same Spark application) on the same cluster? Are there any lessons learnt? I have heard there are problems when both versions of code are executed on the same cluster, but I don't have further details. Just wondering if anyone has run tests or evaluated 2.0 for building future workloads, without any immediate urge to migrate all current workloads from the old version to 2.0.

Let me know your views.

Cheers,

KK.

1 ACCEPTED SOLUTION

Re: Learnings on running Spark 1.6 & 2.0 code base on same cluster

New Contributor

I'm just about to try this myself, so I will be watching for answers. I'm guessing we may need to run two history servers. I don't foresee any other problems.

REPLIES

Re: Learnings on running Spark 1.6 & 2.0 code base on same cluster

New Contributor

Do you mean a standalone Spark cluster or a YARN cluster?

Re: Learnings on running Spark 1.6 & 2.0 code base on same cluster

New Contributor

A YARN cluster.

Re: Learnings on running Spark 1.6 & 2.0 code base on same cluster

New Contributor

I just installed Spark 2.0.1 on HDP 2.4 alongside Spark 1.6.0, and it works just fine.

You need a second history server (and a separate HDFS event-log directory).
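
For reference, a minimal sketch of what the second (Spark 2.x) history server's configuration might look like, using Spark's standard spark.eventLog.* and spark.history.* properties. The HDFS path and port below are assumptions, chosen only so they don't collide with the Spark 1.6 history server's defaults:

    # Hypothetical spark-defaults.conf for the second (Spark 2.x) history server.
    # The HDFS path and port are made up; pick values that don't clash with the
    # Spark 1.6 history server (whose UI defaults to port 18080).
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///spark2-history
    spark.history.fs.logDirectory    hdfs:///spark2-history
    spark.history.ui.port            18081

Applications submitted with the Spark 2.x client then write their event logs to the second directory, and each history server reads only its own logs.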

Re: Learnings on running Spark 1.6 & 2.0 code base on same cluster

New Contributor

Hi Jonathan,

Please let me know your findings, and I will share mine as I progress. I have a test workload to run on my test cluster as well, but I need to upgrade to HDP 2.5 before I can run the tests. I'm currently snowed under at work, so I'm not finding the time; it's on my list of things to do.

Cheers,

KK

Re: Learnings on running Spark 1.6 & 2.0 code base on same cluster

New Contributor

I tried to run KMeans clustering code built for Spark 1.6 on Spark 2.0 and ran into problems with the Vector data types. This could be because Spark 2.0 uses vectors from "ml" rather than "mllib", while the KMeans fit method seems to call some mllib functions under the hood, and those functions don't understand the "ml" vectors. I'll post here once I find a workaround for this issue.
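
For reference, a minimal sketch (with made-up data and column names, assuming an existing SparkSession named spark) of the kind of 1.6-style code that hits this; the mllib-typed "features" column is rejected by the new ml KMeans:

    // Hypothetical 1.6-era pattern: features built as mllib vectors.
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.ml.clustering.KMeans

    val data = spark.createDataFrame(Seq(
      (0, Vectors.dense(0.0, 0.0)),
      (1, Vectors.dense(9.0, 9.0))
    )).toDF("id", "features")

    // Fails in Spark 2.0: fit() requires the "features" column to hold
    // org.apache.spark.ml.linalg.Vector, not org.apache.spark.mllib.linalg.Vector.
    val model = new KMeans().setK(2).fit(data)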

Re: Learnings on running Spark 1.6 & 2.0 code base on same cluster

New Contributor

Clustering algorithms in Spark 2.0 use the ML version, so the feature vectors need to be of the ML type instead of the MLlib type. I had to convert an MLlib vector to an ML vector to make it work in Spark 2.0.
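
A minimal sketch of the conversion, assuming Spark 2.0's built-in helpers (mllib vectors gained an asML method in 2.0, and MLUtils can convert whole DataFrame columns); the df and column name are placeholders:

    // Converting MLlib vectors to ML vectors in Spark 2.0.
    import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
    import org.apache.spark.mllib.util.MLUtils

    // A single vector: mllib vectors carry an asML method in 2.0.
    val oldVec = OldVectors.dense(1.0, 2.0, 3.0)
    val newVec = oldVec.asML // org.apache.spark.ml.linalg.Vector

    // A whole DataFrame column: `df` is assumed to have an mllib-typed
    // "features" column; the result holds ml vectors the new API accepts.
    val converted = MLUtils.convertVectorColumnsToML(df, "features")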
