Created 10-26-2016 05:01 PM
Hi,
With HDP 2.5 supporting both Spark versions - i.e. 1.6.2 and 2.0 (Technical Preview) - has anyone tried running both versions of Spark code (not necessarily in the same Spark application) on the same cluster? Are there any lessons learned? I have heard there are problems when both versions of code are executed on the same cluster, but I don't have further details. Just wondering if anyone has run tests or is evaluating 2.0 for building future workloads, without any immediate need to migrate all current workloads from the old version to 2.0.
Let me know your views.
Cheers,
KK.
Created 10-27-2016 08:13 AM
Do you mean a standalone Spark cluster or a YARN cluster?
Created 10-29-2016 09:45 PM
YARN cluster
Created 11-06-2016 06:21 PM
I'm just about to try this myself, so I'll be watching for answers. I'm guessing we may need to run two history servers; I don't foresee any other problems.
Created 11-09-2016 10:53 AM
Just installed Spark 2.0.1 on HDP 2.4 alongside Spark 1.6.0, and it works just fine.
You need a second history server (and a separate HDFS directory for its event logs).
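For reference, here is a minimal sketch of what the second history server's spark-defaults.conf might contain. The HDFS paths and the port are assumptions for illustration, not values taken from this thread:

```
# spark-defaults.conf for the Spark 2.x installation (hypothetical values)

# Point Spark 2.x apps and their history server at their own HDFS
# directory, separate from the one the Spark 1.6 history server reads.
spark.eventLog.enabled          true
spark.eventLog.dir              hdfs:///spark2-history/
spark.history.fs.logDirectory   hdfs:///spark2-history/

# The Spark 1.6 history server keeps the default port 18080, so the
# second one has to listen on a different port to avoid a conflict.
spark.history.ui.port           18081
```

On HDP 2.5 you can also pick the client version per session with the SPARK_MAJOR_VERSION environment variable (set it to 1 or 2 before running spark-submit or spark-shell).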
Created 11-07-2016 01:25 PM
Hi Jonathan,
Please let me know your findings. I will share mine if I make progress. I have a test workload to run on my test cluster as well, but I need to upgrade to HDP 2.5 before I can run the tests. Currently snowed under at work, so I'm not finding the time; it's on my list of things to do.
Cheers,
KK
Created 02-06-2017 05:10 PM
I tried running KMeans clustering code built for Spark 1.6 on Spark 2.0 and ran into problems with the Vector data types. This could be due to Spark 2.0 using vectors from "ml" rather than "mllib"; the KMeans fit method seems to call some mllib functions under the hood, and those functions don't understand the "ml" vectors. I'll post here once I find a workaround for this issue.
Created 02-07-2017 03:52 PM
Clustering algorithms in Spark 2.0 use the ML version, so the feature vectors need to be of type ML instead of MLLIB. I had to convert an MLLIB vector to an ML vector to make it work in Spark 2.0.
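For anyone hitting the same type error, here is a minimal sketch of the conversion, assuming the features sit in a DataFrame column named "features" as old-style mllib vectors; the data and column name are placeholders:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("KMeansMigration").getOrCreate()
import spark.implicits._

// Placeholder data: a "features" column holding mllib vectors, as code
// written against Spark 1.6 would have produced.
val df = Seq(
  (0, MLlibVectors.dense(0.0, 0.1)),
  (1, MLlibVectors.dense(9.0, 8.9))
).toDF("id", "features")

// Since Spark 2.0, MLUtils can rewrite mllib vector columns into the
// new ml vector type that ml.clustering.KMeans expects.
val converted = MLUtils.convertVectorColumnsToML(df, "features")

// A single vector can also be converted with .asML, e.g.:
//   MLlibVectors.dense(1.0, 2.0).asML

val model = new KMeans().setK(2).fit(converted)
model.clusterCenters.foreach(println)
```

The same conversion exists in the other direction (MLUtils.convertVectorColumnsFromML) for code that still needs the old types.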