Why does Ambari force me to install Hive when I install Spark?

Explorer

I'm setting up an HDP 2.4 cluster, and when I attempt to install Spark without Hive I get a dependency error and cannot continue. I don't wish to use Hive with Spark. If I use a blueprint for the install, can I force Spark to install without Hive, and if so, would there be any consequences?

1 ACCEPTED SOLUTION

Explorer

I've discovered that if I omit from an Ambari blueprint the dependencies the Ambari Web UI was forcing me to install, and then submit the blueprint manually to the Ambari REST API, I can install Spark without Hive (and its dependencies) with no problem.

I've created an unattended install using the Ambari REST API, submitting a blueprint and a cluster host-mapping file.

https://cwiki.apache.org/confluence/display/AMBARI/Blueprints#Blueprints-Step1:CreateBlueprint
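
For reference, below is a minimal sketch of what such an unattended install can look like, driving the Ambari REST API from Python with the requests library. The Ambari URL, credentials, blueprint and cluster names, host FQDN, and the component list are placeholders based on my setup; a real blueprint will usually also need configuration sections, so treat this as an outline rather than a complete template.

```python
import json
import requests

AMBARI = "http://ambari-host:8080/api/v1"   # placeholder Ambari server URL
AUTH = ("admin", "admin")                   # placeholder credentials
HEADERS = {"X-Requested-By": "ambari"}      # required by the Ambari REST API

# Blueprint containing only the services we actually want -- no HIVE, PIG, TEZ or MYSQL_SERVER.
# The component list below is illustrative; adjust it to your own topology.
blueprint = {
    "Blueprints": {"blueprint_name": "spark-no-hive",
                   "stack_name": "HDP", "stack_version": "2.4"},
    "host_groups": [{
        "name": "host_group_1",
        "cardinality": "1",
        "components": [
            {"name": "NAMENODE"}, {"name": "SECONDARY_NAMENODE"}, {"name": "DATANODE"},
            {"name": "HDFS_CLIENT"}, {"name": "RESOURCEMANAGER"}, {"name": "NODEMANAGER"},
            {"name": "YARN_CLIENT"}, {"name": "APP_TIMELINE_SERVER"}, {"name": "HISTORYSERVER"},
            {"name": "ZOOKEEPER_SERVER"}, {"name": "ZOOKEEPER_CLIENT"},
            {"name": "SPARK_JOBHISTORYSERVER"}, {"name": "SPARK_CLIENT"},
        ],
    }],
}

# Cluster creation template (the "host-mapping" file) assigning real hosts to the host group.
hostmapping = {
    "blueprint": "spark-no-hive",
    "default_password": "changeme",
    "host_groups": [{"name": "host_group_1",
                     "hosts": [{"fqdn": "node1.example.com"}]}],
}

# 1) Register the blueprint, 2) create the cluster from it (this kicks off install and start).
requests.post(f"{AMBARI}/blueprints/spark-no-hive", auth=AUTH, headers=HEADERS,
              data=json.dumps(blueprint)).raise_for_status()
requests.post(f"{AMBARI}/clusters/spark_cluster", auth=AUTH, headers=HEADERS,
              data=json.dumps(hostmapping)).raise_for_status()
```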


7 REPLIES

Super Guru

I don't think there is a workaround available right now to install Spark without Hive. Hive is needed by Spark to create the SQLContext (HiveContext) used to query Hive tables from Spark.

Explorer

Yes, but then when I install Hive I also need to install MySQL (for the metastore), Pig, and Tez, which are more dependencies I don't need. When Hive isn't being used with Spark, there needs to be an option to leave it out of the install.

@Sean Glover The Apache Spark download allows you to build Spark in multiple ways, using various build flags to include or exclude components:

http://spark.apache.org/docs/latest/building-spark.html

Without Hive you can still create a SQLContext, but it will be native to Spark and not leverage a HiveContext. Without a HiveContext, you cannot reference the Hive metastore, use Hive UDFs, etc. Other tools, like the Zeppelin data science notebook, also default to creating a HiveContext (this is configurable), so they will need the Hive dependencies as well.
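
To make that concrete, here is a small PySpark sketch against Spark 1.6 (the version bundled with HDP 2.4); the app and table names are made up for illustration:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext

sc = SparkContext(appName="context-demo")

# Plain SQLContext: native to Spark, no Hive installation required. It only sees
# DataFrames and temp tables registered in this session.
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.registerTempTable("demo")
sqlContext.sql("SELECT COUNT(*) FROM demo").show()

# HiveContext: needs the Hive dependencies on the classpath and a metastore to be
# useful; this is what gives you persistent Hive tables, HiveQL, and Hive UDFs.
hiveContext = HiveContext(sc)
hiveContext.sql("SHOW TABLES").show()

sc.stop()
```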

Explorer

Sorry, maybe the question wasn't clear. I know Hive isn't a requirement of Spark itself, but Ambari makes it a requirement when Spark is installed as part of the HDP platform. I'm asking how I can work around Ambari forcing me to install Hive, without having to install Spark manually on my HDP platform.

Master Guru

You can install Spark manually and run it like that, without telling Ambari, or you can register the Spark History Server (SHS) and Spark clients using the Ambari REST API; that's the easiest way. Otherwise, removing the Hive dependency would require changing some Ambari files.
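
For the REST API route, something along these lines should work. It is only a sketch: the cluster name, host, and credentials are placeholders, the component names are assumed from a typical HDP 2.4 install, and it skips the configuration (e.g. spark-defaults) you would normally also have to POST.

```python
import json
import requests

AMBARI = "http://ambari-host:8080/api/v1"   # placeholder Ambari server URL
AUTH = ("admin", "admin")                   # placeholder credentials
HEADERS = {"X-Requested-By": "ambari"}
CLUSTER = "mycluster"                       # name of the existing cluster (placeholder)
HOST = "node1.example.com"                  # host to place the Spark components on

def post(path, body=None):
    r = requests.post(f"{AMBARI}{path}", auth=AUTH, headers=HEADERS,
                      data=json.dumps(body) if body else None)
    r.raise_for_status()

def put_state(state):
    r = requests.put(f"{AMBARI}/clusters/{CLUSTER}/services/SPARK",
                     auth=AUTH, headers=HEADERS,
                     data=json.dumps({"ServiceInfo": {"state": state}}))
    r.raise_for_status()

# 1) Add the SPARK service to the existing cluster.
post(f"/clusters/{CLUSTER}/services", {"ServiceInfo": {"service_name": "SPARK"}})

# 2) Declare its components, then 3) map them onto a host.
for comp in ("SPARK_JOBHISTORYSERVER", "SPARK_CLIENT"):
    post(f"/clusters/{CLUSTER}/services/SPARK/components/{comp}")
    post(f"/clusters/{CLUSTER}/hosts/{HOST}/host_components/{comp}")

# 4) Install, then start, the new service (each PUT returns an async request to poll).
put_state("INSTALLED")
put_state("STARTED")
```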


Explorer

Could you share how you generated a blueprint to install Spark on top of a running cluster with other existing services? It's not obvious how to do it.