Created 06-24-2020 02:31 AM
Hi,
I need to set up a 5-node cluster with Hadoop 3.1.0 and Spark 2.4.5. Someone recommended using Ambari for this. I checked Ambari, but it seems it can only be used to install HDP, and the latest HDP does not support Spark 2.4.5.
Please suggest the best way to set up the required big data cluster.
Created on 06-25-2020 01:33 AM - edited 06-25-2020 01:34 AM
Hi,
Thanks for creating a new thread. It seems you need help setting up Spark 2.4.5 with Ambari. As you already mentioned, HDP does not support Spark 2.4.5.
I agree: the latest release at this moment is HDP 3.1.5, which comes with Apache Spark 2.3.2. For more information, refer to link[1].
Also, installing upstream component versions is not supported, so our options here are very limited. Please follow the support matrix under link[2] and use a supported configuration.
link[1]: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/release-notes/content/comp_versions.html
link[2]: https://supportmatrix.hortonworks.com/
Thanks and Regards,
Vikas Dadhich
Created 06-25-2020 01:57 AM
Thanks for the reply. Can we install Hadoop and Spark 2.4.5 packages on a multi-node cluster without using HDP, Ambari, or Cloudera? We already have Spark applications running on Spark 2.4.5 and do not want to go back to older versions. In fact, we are planning to upgrade them soon to Spark 3 for better Delta Lake compatibility.
If we install Hadoop and Spark packages manually on each node of the cluster, could there be maintenance issues at a later stage in production?
Created 06-25-2020 02:03 AM
Hi,
Thanks for the reply. The approach you described below is not supported with an HDP cluster:
"Can we install hadoop & spark 2.4.5 packages on multi node cluster without using hdp, ambari & cloudera"
So the option here is to use the Spark 2.3.2 version that comes with HDP 3.1.5. If you hit any issues with Spark 2.3.2, we can also involve the Cloudera technical support team.
Thanks and Regards,
Vikas Dadhich
Created 06-25-2020 02:48 AM
I will check whether our Spark 2.4.5 application code is compatible with Spark 2.3.2. Are Ambari and HDP going to be discontinued in the near future as part of the ongoing Cloudera and Hortonworks merger? We need to plan our choice of software accordingly.
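One low-effort way to start that compatibility check is to rebuild the existing application against the Spark 2.3.2 artifacts and see what no longer compiles. The sketch below assumes an sbt-built Scala application. The project and the `sparkVersion` value name are hypothetical, and only the Spark modules shown are assumed to be in use:

```scala
// build.sbt (hypothetical project): recompile the existing application against
// the Spark version bundled with HDP 3.1.5 to surface API incompatibilities.

// Spark 2.3.x is published for Scala 2.11 only (Scala 2.12 builds start with 2.4.0).
ThisBuild / scalaVersion := "2.11.12"

// Previously "2.4.5"; switch back after the check.
val sparkVersion = "2.3.2"

// "provided": the cluster supplies Spark at runtime, so it is not bundled in the app jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)
```

A clean `sbt compile` only rules out missing or changed APIs. Behavioral differences between 2.3.2 and 2.4.5 still need the application's own tests against the older version. If the application uses Delta Lake, expect this check to fail, since the 0.x Delta Lake releases required Spark 2.4.2 or later.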