
Learnings from running Spark 1.6 & 2.0 code bases on the same cluster

Contributor

Hi,

With HDP 2.5 supporting both Spark versions, i.e. 1.6.2 and 2.0 (Technical Preview), has anyone tried running code for both versions (not necessarily within the same Spark application) on the same cluster? Are there any lessons learnt? I have heard there are problems when both versions run on the same cluster, but I don't have further details. I'm wondering if anyone has run tests or is evaluating 2.0 for building future workloads, without any immediate need to migrate current workloads from the old version to 2.0.

Let me know your views.

Cheers,

KK.

1 ACCEPTED SOLUTION

New Contributor

I'm just about to try this myself, so I will be watching for answers. I'm guessing we may need to run two history servers. I don't foresee any other problems.


8 REPLIES

Contributor

Do you mean a standalone Spark cluster or a YARN cluster?

Contributor

A YARN cluster.

New Contributor

Just installed Spark 2.0.1 on HDP 2.4 alongside Spark 1.6.0, and it works just fine.

You need a second history server (and its own HDFS event-log directory).
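
For what it's worth, here is a minimal sketch of what the second history server's settings might look like, assuming a separate HDFS event-log directory for Spark 2.x jobs. The property names are standard Spark; the paths and the port are placeholders, not taken from the posts above.

    # spark-defaults.conf for the Spark 2.x install (example values only)
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs:///spark2-history
    spark.history.fs.logDirectory    hdfs:///spark2-history
    # the default history UI port is 18080; pick a different one so both
    # history servers can run side by side
    spark.history.ui.port            18081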

Contributor

Hi Jonathan,

Please let me know your findings; I will share mine as I make progress. I have a test workload to run on my test cluster, but I need to upgrade to HDP 2.5 before I can run the tests. I'm currently snowed under at work, so I haven't found the time. It is on my list of things to do.

Cheers,

KK

Contributor

I tried running KMeans clustering code built for Spark 1.6 on Spark 2.0 and ran into problems with the Vector data types. This could be because Spark 2.0 uses vectors from the "ml" package rather than "mllib", but the KMeans fit method seems to call some mllib functions under the hood, and those functions don't understand the "ml" vectors. I'll post here once I find a workaround for this issue.

Contributor

Clustering algorithms in Spark 2.0 use the ML package, so the feature vectors need to be of the ML Vector type instead of the MLlib one. I had to convert an MLlib vector to an ML vector to make it work in Spark 2.0.
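
In case it helps anyone hitting the same type mismatch, here is a minimal sketch of the conversion, assuming a DataFrame df with an MLlib-vector column named "features" (both names are placeholders, not from the posts above). MLUtils.convertVectorColumnsToML and the per-vector asML method are available from Spark 2.0 onwards.

    // Spark 2.0, Scala. Assumes `df` has a "features" column that still
    // holds old org.apache.spark.mllib.linalg vectors.
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.ml.clustering.KMeans

    // Rewrite the column to the new ml.linalg vector type.
    val mlDf = MLUtils.convertVectorColumnsToML(df, "features")

    // The ml-package KMeans now accepts the converted features.
    val model = new KMeans().setK(3).setFeaturesCol("features").fit(mlDf)

    // A single vector can also be converted directly:
    // val newVec = oldVec.asML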
