I have been asked to install Spark 2 on a CDH 5.8 cluster that was set up via RPM packages. How can I install Spark 2 on this existing cluster? Has anyone had the same question or experience? Thanks a lot.
Note: I checked the CDH documentation and found that Spark 2 can be installed via a parcel, but parcels seem to conflict with an RPM-based installation.
I have this problem too. There is no word from Cloudera on whether or when they will ship Spark 2 RPM packages for CDH 5.
I think you could install Spark 2 from Apache Bigtop (or build your own RPM) on an edge node and submit Spark 2 jobs to YARN. With YARN you do not need Spark worker packages on the worker nodes.
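A minimal sketch of what submitting a Spark 2 job to YARN from such an edge node could look like. It assumes the Apache Spark 2.2.0 binary tarball (Scala 2.11 build) is unpacked under SPARK_HOME and that HADOOP_CONF_DIR points at the cluster's client configs; the default paths, the function name submit_spark_pi and the DRY_RUN switch are hypothetical. SparkPi is the example class bundled with Spark. When no Spark jars are preinstalled on the cluster, spark-submit uploads them from the edge node to YARN, which is why the worker nodes need no Spark packages.

```shell
# Hypothetical defaults; override them in the environment before sourcing.
SPARK_HOME="${SPARK_HOME:-/opt/spark-2.2.0-bin/hadoop2.6}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
export SPARK_HOME HADOOP_CONF_DIR

# Submit the bundled SparkPi example to YARN in client mode.
# $1 = number of slices (partitions), defaults to 10.
submit_spark_pi() {
  # Words are expanded before `set --` runs, so "${1:-10}" is still the caller's $1.
  set -- "$SPARK_HOME/bin/spark-submit" \
    --master yarn \
    --deploy-mode client \
    --class org.apache.spark.examples.SparkPi \
    "$SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar" \
    "${1:-10}"
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the command (useful when no cluster is reachable).
    echo "$*"
  else
    "$@"
  fi
}
```

On the edge node, `submit_spark_pi 100` would run the job for real and it should show up in the YARN ResourceManager UI; `DRY_RUN=1 submit_spark_pi` just prints the command line.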
I just tried this with Apache Zeppelin and it seems to work. I downloaded the tar.gz from spark.apache.org, extracted it on an edge node, and then configured zeppelin-env.sh with the following variables:
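export HADOOP_USER_NAME=spark                       # user for HDFS/YARN (only honored without Kerberos)
export HADOOP_CONF_DIR=/etc/hadoop/conf             # existing CDH client configuration
export MASTER=yarn-client                           # driver runs in Zeppelin, executors on YARN
export SPARK_HOME=/opt/spark-2.2.0-bin/hadoop2.6    # extracted Spark 2 tarball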
When I run Spark code in Zeppelin, I can see the jobs being executed on YARN, and they can access files in HDFS.
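Because this setup depends on the edge node having both the unpacked tarball and the CDH client configuration, a small preflight check can catch a wrong path before Zeppelin is started. This is a hypothetical helper (check_spark2_edge_node is not part of Spark or Zeppelin): it only checks that spark-submit exists under SPARK_HOME and that HADOOP_CONF_DIR contains the core-site.xml, hdfs-site.xml and yarn-site.xml client files that Spark relies on to find HDFS and the YARN ResourceManager.

```shell
# Hypothetical preflight check for a Spark 2 edge node.
# $1 = SPARK_HOME, $2 = HADOOP_CONF_DIR; returns non-zero if anything is missing.
check_spark2_edge_node() {
  spark_home=$1
  conf_dir=$2
  status=0
  # The unpacked Apache tarball should provide an executable spark-submit.
  if [ ! -x "$spark_home/bin/spark-submit" ]; then
    echo "missing: $spark_home/bin/spark-submit" >&2
    status=1
  fi
  # Client-side Hadoop configs used to locate HDFS and the ResourceManager.
  for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$f" ]; then
      echo "missing: $conf_dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_spark2_edge_node "$SPARK_HOME" "$HADOOP_CONF_DIR" && echo "edge node looks ready"` could be run with the same values that go into zeppelin-env.sh.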