I am new to Hadoop and have some questions regarding Hive installation:
1) On which Nodes exactly (Master, Slave or Edge) should Hive and its components be installed?
2) Should Hive and its components only be installed on Edge Nodes?
3) Should some Hive components be installed on Edge Nodes and some on Master Nodes?
4) Should any part of Hive be installed on the Secondary Name Node?
I would really appreciate it if anyone could answer this for me. This has been driving me crazy for quite some time now.
@Leenurs Quadras Hive installation is independent of the NameNode/Secondary NameNode location. In the Hive configuration you just need to specify where Hadoop is installed so that Hive can reach the JobTracker (or the YARN ResourceManager on Hadoop 2 and later) to submit MapReduce jobs.
In theory, you can set up HiveServer, the Metastore server, the Hive clients, etc. all on the Master Node. In production, however, placing them on a Master or Slave node is not a good idea. Run HiveServer on a dedicated node so that it doesn't compete for resources with the NameNode processes (it gets quite busy in a multi-user scenario).
The Metastore is a separate service that can either run embedded in HiveServer (in which case it uses Derby, which isn't ideal for production use) or be set up separately, backed by a dedicated database service (the recommended way).
Beeline (the Hive client) can run in embedded mode as well as remotely. Remote HiveServer2 mode is recommended for production, as it is more secure and doesn't require granting users direct HDFS/metastore access.
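To make the remote mode a bit more concrete, here is a rough Python sketch that only builds the Beeline command line for connecting to a remote HiveServer2 from an edge node. The host name `hiveserver.example.com`, the user `analyst`, and the helper `beeline_command` are made up for illustration; port 10000 and the `default` database are the usual HiveServer2 defaults, but your cluster may differ.

```python
import shlex


def beeline_command(host, port=10000, database="default", user=None):
    """Build the argument list for a remote Beeline connection.

    host, port, database and user are placeholders for your own cluster;
    this helper is hypothetical and not part of Hive.
    """
    # Remote mode: Beeline talks to HiveServer2 over JDBC, so the client
    # machine does not need direct HDFS or metastore access.
    url = f"jdbc:hive2://{host}:{port}/{database}"
    cmd = ["beeline", "-u", url]
    if user is not None:
        # -n passes the user name to connect as
        cmd += ["-n", user]
    return cmd


if __name__ == "__main__":
    # Hypothetical host and user, shown only to illustrate the shape of the call.
    print(shlex.join(beeline_command("hiveserver.example.com", user="analyst")))
```

You would run the printed command on whichever node has the Hive client installed (typically an edge node); nothing Hive-related needs to be on the NameNode for this to work.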
Hope this answers all your questions.