Learnings from running Spark 1.6 & 2.0 code bases on the same cluster
Created 10-26-2016 05:01 PM
Hi,
With HDP 2.5 supporting both Spark versions - i.e. 1.6.2 and 2.0 (Technical Preview) - has anyone tried running code for both versions of Spark (not necessarily in the same Spark application) on the same cluster? Are there any lessons learned? I have heard there are problems when code for both versions is executed on the same cluster, but I don't have further details. I'm just wondering if anyone has run tests evaluating Spark 2.0 for building future workloads, without any immediate urge to migrate all current workloads from the old version to 2.0.
Let me know your views.
Cheers,
KK.
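A minimal first-test sketch: the SparkContext API used below is common to Spark 1.6 and 2.0, so the same code can be submitted through each version's client to confirm which runtime a YARN application actually picked up.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Compiles against both Spark 1.6 and 2.0: prints the version of the
// runtime the application was actually submitted to.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("version-check"))
    println(s"Running on Spark ${sc.version}")
    sc.stop()
  }
}
```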
Created 10-27-2016 08:13 AM
Do you mean a standalone Spark cluster or a YARN cluster?
Created 10-29-2016 09:45 PM
YARN cluster.
Created 11-06-2016 06:21 PM
I'm just about to try this myself, so I will be watching for answers. I'm guessing we may need to run two History Servers; I don't foresee any other problems.
Created 11-09-2016 10:53 AM
I just installed Spark 2.0.1 on HDP 2.4 alongside Spark 1.6.0, and it works just fine.
You need a second History Server (and its own HDFS event-log directory).
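For reference, a minimal sketch of the application-side half of that setup, with a hypothetical HDFS path: each Spark version writes its event logs to its own directory, and the matching History Server reads that directory through its spark.history.fs.logDirectory setting.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Point a Spark 2.x job at its own event-log directory so the second
// History Server (with spark.history.fs.logDirectory set to the same
// path) only sees Spark 2 applications.
// "hdfs:///spark2-history" is a hypothetical example path.
object EventLogDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("spark2-event-log-demo")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark2-history")

    val sc = new SparkContext(conf)
    println(s"Event logs written to ${sc.getConf.get("spark.eventLog.dir")}")
    sc.stop()
  }
}
```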
Created 11-07-2016 01:25 PM
Hi Jonathan,
Please let me know your findings; I will share mine as I make progress. I have a test workload to run on my test cluster, but I need to upgrade to HDP 2.5 before I can run the tests. I'm currently snowed under at work, so I haven't found the time, but it is on my list of things to do.
Cheers,
KK
Created 02-06-2017 05:10 PM
I tried to run KMeans clustering code built for Spark 1.6 on Spark 2.0 and ran into problems with the Vector data types. This could be due to Spark 2.0 using vectors from "ml" rather than "mllib": the KMeans fit method seems to call some mllib functions under the hood, and those functions don't understand the "ml" vectors. I'll post here once I have a workaround for this issue.
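A minimal sketch of how this type mismatch can surface on Spark 2.0, assuming a DataFrame whose features column was built with the 1.6-era mllib vector type:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.sql.SparkSession

object KMeansMismatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kmeans-mismatch").getOrCreate()
    import spark.implicits._

    // Features column built from the old mllib Vector type.
    val df = Seq(
      (1, MLlibVectors.dense(0.0, 0.0)),
      (2, MLlibVectors.dense(9.0, 9.0))
    ).toDF("id", "features")

    // Spark 2.0's ml KMeans validates that "features" holds
    // org.apache.spark.ml.linalg.Vector, so this fit fails with an
    // IllegalArgumentException about the column's (mllib) vector type.
    new KMeans().setK(2).fit(df)
  }
}
```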
Created 02-07-2017 03:52 PM
The clustering algorithms in Spark 2.0 use the ML API, so the feature vectors need to be of the ml type instead of the mllib type. I had to convert the mllib vectors to ml vectors to make it work in Spark 2.0.
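A minimal sketch of that conversion on Spark 2.0, using the converters it ships with: MLUtils.convertVectorColumnsToML for a whole DataFrame column, or .asML on an individual mllib vector.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.SparkSession

object KMeansConverted {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kmeans-converted").getOrCreate()
    import spark.implicits._

    // DataFrame whose features column still holds old mllib vectors.
    val oldDf = Seq(
      (1, MLlibVectors.dense(0.0, 0.0)),
      (2, MLlibVectors.dense(9.0, 9.0))
    ).toDF("id", "features")

    // Rewrite the mllib vector column as ml vectors in one call;
    // individual vectors can also be converted with mllibVector.asML.
    val newDf = MLUtils.convertVectorColumnsToML(oldDf, "features")

    val model = new KMeans().setK(2).setSeed(1L).fit(newDf)
    model.clusterCenters.foreach(println)
  }
}
```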