Created 05-27-2016 03:24 PM
I'm setting up an HDP 2.4 cluster, and when I attempt to install Spark without Hive I get a dependency error and cannot continue. I don't wish to use Hive with Spark. If I use a blueprint for the install, can I force Spark to install without Hive, and if so, would there be any consequences?
Created 05-27-2016 03:27 PM
I don't think there is a workaround available right now to install Spark without Hive. Hive is needed by Spark to create a SQLContext (HiveContext) for querying Hive tables from Spark.
Created 05-27-2016 03:30 PM
Yes, but then when I install Hive I also have to install MySQL (for the metastore), Pig, and Tez, which are even more dependencies I don't need. When Hive isn't being used with Spark, there should be an option to leave it out of the install.
Created 05-27-2016 04:04 PM
@Sean Glover The Apache Spark download will allow you to build Spark in multiple ways, using various build flags to include or exclude components:
http://spark.apache.org/docs/latest/building-spark.html
Without Hive, you can still create a SQLContext, but it will be native to Spark and will not provide a HiveContext. Without a HiveContext, you cannot reference the Hive metastore, use Hive UDFs, etc. Other tools like the Zeppelin data science notebook also default to creating a HiveContext (configurable), so they will need the Hive dependencies.
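For illustration, here is a minimal PySpark sketch of the difference (Spark 1.6, as shipped with HDP 2.4); the app name and input path are just placeholders:

# A plain SQLContext works without any Hive libraries installed.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(conf=SparkConf().setAppName("no-hive-example"))
sqlContext = SQLContext(sc)                     # native Spark SQL, no Hive needed

df = sqlContext.read.json("/tmp/people.json")   # hypothetical input file
df.registerTempTable("people")
sqlContext.sql("SELECT name FROM people").show()

# Only with the Hive dependencies present could you instead create a
# HiveContext, which is what gives access to the Hive metastore and Hive UDFs:
# from pyspark.sql import HiveContext
# hiveContext = HiveContext(sc)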
Created 05-27-2016 06:51 PM
Sorry, maybe the question wasn't clear. I know Hive isn't a requirement of Spark, but Ambari makes it a requirement when installing the HDP platform. I'm asking how I can work around Ambari forcing me to install Hive, without having to install Spark manually on my HDP platform.
Created 05-28-2016 04:43 AM
You can install Spark manually and run it like that, without telling Ambari, or you can register the Spark History Server (SHS) and the Spark clients using the Ambari REST API. That's the easiest way. Otherwise, removing the Hive dependency would require changing some Ambari files.
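As a rough sketch of the REST API route (not an exact recipe): the cluster name, host names and credentials below are placeholders, and the component names are the ones defined in the HDP 2.x stack (SPARK_JOBHISTORYSERVER, SPARK_CLIENT). Something along these lines registers the service and components on an already-running cluster and then asks Ambari to install and start them:

import requests

BASE = "http://ambari-server:8080/api/v1/clusters/mycluster"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}

def post(path):
    # register a service, component, or host component
    requests.post(BASE + path, auth=AUTH, headers=HEADERS).raise_for_status()

post("/services/SPARK")                                     # the SPARK service itself
post("/services/SPARK/components/SPARK_JOBHISTORYSERVER")   # Spark History Server
post("/services/SPARK/components/SPARK_CLIENT")             # Spark client
post("/hosts/master1.example.com/host_components/SPARK_JOBHISTORYSERVER")
post("/hosts/worker1.example.com/host_components/SPARK_CLIENT")

# Ask Ambari to install, then start, the newly registered components.
for state in ("INSTALLED", "STARTED"):
    requests.put(BASE + "/services/SPARK", auth=AUTH, headers=HEADERS,
                 json={"ServiceInfo": {"state": state}}).raise_for_status()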
Created 05-30-2016 07:32 PM
I've discovered that if I omit the dependencies the Ambari Web UI was forcing me to install from the Ambari blueprint, and then submit the blueprint manually to the Ambari REST API, I can install Spark without Hive (or its dependencies) with no problem.
I've created an unattended install using the Ambari REST API, submitting a blueprint and a cluster host-mapping file.
https://cwiki.apache.org/confluence/display/AMBARI/Blueprints#Blueprints-Step1:CreateBlueprint
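To give a rough idea of what that looks like, here is a hedged sketch of the two REST calls involved: register a blueprint whose host groups simply leave out the HIVE_*, PIG and TEZ components, then create the cluster from it with a host-mapping document. The blueprint name, hosts and the trimmed-down component lists are illustrative only; a real blueprint for a full cluster would also carry the rest of the HDFS/YARN/ZooKeeper components and any configuration overrides.

import requests

AMBARI = "http://ambari-server:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}

blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.4"},
    "host_groups": [
        {"name": "master", "cardinality": "1", "components": [
            {"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"},
            {"name": "ZOOKEEPER_SERVER"},
            {"name": "SPARK_JOBHISTORYSERVER"},   # Spark, but no HIVE_* components
        ]},
        {"name": "workers", "cardinality": "1+", "components": [
            {"name": "DATANODE"}, {"name": "NODEMANAGER"},
            {"name": "SPARK_CLIENT"},
        ]},
    ],
}

hostmapping = {
    "blueprint": "spark-no-hive",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master1.example.com"}]},
        {"name": "workers", "hosts": [{"fqdn": "worker1.example.com"}]},
    ],
}

# Register the blueprint, then create the cluster from it; Ambari runs the
# whole install unattended from here.
requests.post(AMBARI + "/blueprints/spark-no-hive",
              auth=AUTH, headers=HEADERS, json=blueprint).raise_for_status()
requests.post(AMBARI + "/clusters/spark-cluster",
              auth=AUTH, headers=HEADERS, json=hostmapping).raise_for_status()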
Created 10-10-2017 03:59 PM
Could you share how you generated a blueprint to install Spark on top of a running cluster with other existing services? It's not obvious how to do it.