05-22-2014 09:23 PM - edited 05-22-2014 10:03 PM
We have a requirement to analyze big data for one of our telecom customers. After analyzing the requirements, we found that Hadoop with Hive is a suitable solution. I have found some use cases and best practices online, but I couldn't find precise information from a configuration perspective. Could you please advise on the questions below?
1. We are planning to set up a Hadoop cluster with 40 to 50 nodes. Once a 50-node cluster is configured and available, we want to install Hive on it along with HCatalog, the metastore, etc. What is the best approach to installing Hive on this cluster? Should it be installed and configured on all the nodes, or is it enough to install it on the master node/NameNode? What is the recommended approach for a production, high-availability (HA) setup?
2. Alternatively, is it possible to configure Hive on a separate server, and is that recommended for a production setup?
Could you please provide any detailed use cases or best-practice information for implementing Hive in a Hadoop environment? Your advice and suggestions will help us a lot.
Thanks in advance.
06-26-2014 12:53 PM
No matter how many nodes you install Hive on, any query you run through Hive is converted into a MapReduce (MR) job and submitted via the JobTracker (JT), and the data is stored in HDFS. Installing Hive on all the nodes is fine, but it is not required. In my opinion, a good layout is one dedicated node for the metastore, one for HiveServer2, and a set of nodes acting as gateways (client nodes).
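To make that layout concrete, here is a minimal sketch (not an official tool) that generates the client-side `hive-site.xml` a gateway node might carry. It assumes a remote metastore reached over Thrift via the `hive.metastore.uris` property on the common port 9083, and HiveServer2 on its default port 10000; the host names are hypothetical placeholders. Listing several metastore URIs is how clients can fail over between metastore instances in an HA setup.

```python
import xml.etree.ElementTree as ET

# Hypothetical host names; replace them with the nodes in your own cluster.
METASTORE_HOSTS = ["metastore1.example.com"]
HIVESERVER2_HOST = "hs2.example.com"
METASTORE_PORT = 9083      # assumed: common port for the metastore Thrift service
HIVESERVER2_PORT = 10000   # assumed: default value of hive.server2.thrift.port


def hive_site(properties):
    """Render a dict of Hive properties as a hive-site.xml <configuration> string."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(configuration, encoding="unicode")


def gateway_hive_site(metastore_hosts, port=METASTORE_PORT):
    """Client config for a gateway node: point at the remote metastore(s)
    instead of letting each client open the metastore database directly."""
    # A comma-separated URI list lets clients try another metastore if one is down.
    uris = ",".join(f"thrift://{host}:{port}" for host in metastore_hosts)
    return hive_site({"hive.metastore.uris": uris})


def beeline_jdbc_url(host=HIVESERVER2_HOST, port=HIVESERVER2_PORT, db="default"):
    """JDBC URL that Beeline or other JDBC clients use to reach HiveServer2."""
    return f"jdbc:hive2://{host}:{port}/{db}"


if __name__ == "__main__":
    print(gateway_hive_site(METASTORE_HOSTS))
    print(beeline_jdbc_url())
```

With this split, only the metastore node needs credentials for the metastore database, and gateway nodes can be added or removed without touching it.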