Created on 11-17-2017 03:22 AM - edited 09-16-2022 05:32 AM
Hi All
I have been asked to install Spark 2 on a CDH 5.8 cluster that was set up via RPM packages. How can I install Spark 2 on the existing cluster? Has anyone had the same question or a similar experience? Thanks a lot.
Note: I checked the CDH documentation and found that I can install Spark 2 via a parcel, but parcels seem to conflict with an RPM package install.
Created 11-22-2017 09:24 PM
@MarkusH. Thank you. I found a workaround, tried it, and it has worked well so far.
First, we have to migrate CDH from packages to parcels.
Second, we install Spark 2 as an add-on service:
https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_mc_addon_services.html
Created on 11-20-2017 12:57 AM - edited 11-20-2017 01:18 AM
I have this problem too. There is no word from Cloudera on whether or when they will ship Spark 2 RPM packages for CDH 5.
I think you could install Spark 2 from Apache Bigtop (or build your own RPM) on an edge node and submit Spark 2 jobs to YARN. With YARN you would not need Spark worker packages on the worker nodes.
Edit:
I just tried this with Apache Zeppelin and it seems to work. I took the tar.gz from spark.apache.org and extracted it on an edge node, then configured zeppelin-env.sh with the following variables:
export HADOOP_USER_NAME=spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export MASTER=yarn-client
export SPARK_HOME=/opt/spark-2.2.0-bin/hadoop2.6
When I run Spark code in Zeppelin, I can see that the jobs are executed on YARN, and they can access HDFS files.
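If you want to check the same setup without Zeppelin, a rough sketch like the one below might help. It assumes the tarball layout and paths from the variables above, and that the distribution ships the bundled SparkPi example under examples/jars. The `find_examples_jar` helper and the `RUN_SMOKE_TEST` switch are names made up for this illustration; they are not part of Spark or CDH. Note that Spark 2 prefers `--master yarn --deploy-mode client` over the older `yarn-client` master string.

```shell
#!/usr/bin/env bash
# Sketch: smoke-test a tarball Spark 2 install against YARN from an edge node.
# Defaults mirror the paths in the post above; adjust them to your own layout.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
export SPARK_HOME="${SPARK_HOME:-/opt/spark-2.2.0-bin/hadoop2.6}"

# Hypothetical helper: print the path of the bundled examples jar.
# The jar name depends on the Spark and Scala versions, so match it by glob.
find_examples_jar() {
  local candidate
  for candidate in "$SPARK_HOME"/examples/jars/spark-examples_*.jar; do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no examples jar found under $SPARK_HOME/examples/jars" >&2
  return 1
}

# RUN_SMOKE_TEST is an illustrative switch so that sourcing this file
# does not submit a job by accident.
if [ "${RUN_SMOKE_TEST:-0}" = "1" ]; then
  jar=$(find_examples_jar) || exit 1
  # Submit the bundled SparkPi example to YARN in client mode.
  "$SPARK_HOME/bin/spark-submit" \
    --master yarn \
    --deploy-mode client \
    --class org.apache.spark.examples.SparkPi \
    "$jar" 10
fi
```

Run it with `RUN_SMOKE_TEST=1` as a user that is allowed to submit to YARN. If the job shows up in the ResourceManager UI and finishes, the edge-node install can reach the cluster.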