
Inquiry about architectural design

Expert Contributor

Hi,

After some trials on a limited number of VMs, I need to establish a Hadoop ecosystem on 12 physical servers.

However, after all that reading and the installations on VMs, I feel a little bit confused.

I am going to install HDP (most probably with Ambari and ZooKeeper), Solr, Elasticsearch, and Hue. I am planning to dedicate 3 of these servers to NameNode high availability and set up the others as DataNodes / slaves. In terms of application placement, that is all I have clear in my mind.

What I am not clear about is the placement of Solr and Elasticsearch. Is it bad practice to use the DataNode servers for these applications as well? Should I dedicate separate servers to them? Can I also use the high-availability nodes for other applications? In high-availability mode, will the MapReduce, YARN, and Spark server processes run on the high-availability nodes? Are there any applications that need to be on the same host (I am asking these questions especially for Solr, ES, ZooKeeper, and Hue)?

Or you can skip all these questions if you have a reference that gives a broad-perspective view of how these key pieces relate to one another.

I'll be glad if you can share your experience with me.

Any comments appreciated 🙂

1 ACCEPTED SOLUTION

Master Guru

Yes, you can combine services on the same master nodes, including the HA instances and the search applications. Just try to distribute them so that all 3 masters are about equally busy. Here is one possible layout:

M1: Ambari, Ambari Metrics (AMS) Collector, Hue, Oozie

M2: NameNode 1 (NN1), ResourceManager 1 (RM1), App Timeline Server, Spark History Server, Spark Thrift Server, Solr

M3: NameNode 2 (NN2), ResourceManager 2 (RM2), MapReduce2 Job History Server, HiveServer2 + Hive Metastore + Metastore DB, Elasticsearch

with ZooKeeper and JournalNodes on all 3 masters. If you want to test Solr and Elasticsearch extensively, you may wish to place them on a 4th master. If you want to add more services (HBase, Falcon, Atlas), just distribute their master components across the available masters.
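
If it helps, a layout like this can also be captured as an Ambari Blueprint so the build is reproducible. The sketch below is only illustrative: the blueprint name, host-group names, hostnames, and stack version are assumptions; Hue and Elasticsearch are not Ambari-managed HDP services (and Solr normally comes via a separate management pack), so only core Ambari-managed components are listed; and a real blueprint would also need client components and configuration sections (including the NameNode HA settings).

# Sketch of an Ambari Blueprint for the 3-master / 9-worker layout above.
# Hostnames, blueprint name, and stack version are illustrative; the Ambari
# server itself (on M1) is not declared as a blueprint component.
import json
import requests

AMBARI = "http://ambari-server:8080/api/v1"   # hypothetical Ambari address
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}        # required by the Ambari REST API

blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.6"},
    "host_groups": [
        {"name": "master1", "cardinality": "1", "components": [
            {"name": "ZOOKEEPER_SERVER"}, {"name": "JOURNALNODE"},
            {"name": "METRICS_COLLECTOR"}, {"name": "OOZIE_SERVER"}]},
        {"name": "master2", "cardinality": "1", "components": [
            {"name": "ZOOKEEPER_SERVER"}, {"name": "JOURNALNODE"},
            {"name": "NAMENODE"}, {"name": "ZKFC"}, {"name": "RESOURCEMANAGER"},
            {"name": "APP_TIMELINE_SERVER"}, {"name": "SPARK_JOBHISTORYSERVER"},
            {"name": "SPARK_THRIFTSERVER"}]},
        {"name": "master3", "cardinality": "1", "components": [
            {"name": "ZOOKEEPER_SERVER"}, {"name": "JOURNALNODE"},
            {"name": "NAMENODE"}, {"name": "ZKFC"}, {"name": "RESOURCEMANAGER"},
            {"name": "HISTORYSERVER"}, {"name": "HIVE_SERVER"},
            {"name": "HIVE_METASTORE"}, {"name": "MYSQL_SERVER"}]},
        {"name": "workers", "cardinality": "9", "components": [
            {"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
}

# Register the blueprint, then map the host groups to the 12 physical hosts.
requests.post(f"{AMBARI}/blueprints/hdp-12-node", auth=AUTH,
              headers=HEADERS, data=json.dumps(blueprint))

cluster_template = {
    "blueprint": "hdp-12-node",
    "host_groups": [
        {"name": "master1", "hosts": [{"fqdn": "m1.example.com"}]},
        {"name": "master2", "hosts": [{"fqdn": "m2.example.com"}]},
        {"name": "master3", "hosts": [{"fqdn": "m3.example.com"}]},
        {"name": "workers",
         "hosts": [{"fqdn": f"w{i}.example.com"} for i in range(1, 10)]},
    ],
}
requests.post(f"{AMBARI}/clusters/cluster1", auth=AUTH,
              headers=HEADERS, data=json.dumps(cluster_template))

This is just the host-group skeleton; validate it against the Ambari Blueprint documentation for your exact HDP version before using it.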


4 REPLIES


Expert Contributor

Hi @Predrag Minovic,

So, as far as I understand from your answer, I should not install any other applications (server or client) on the DataNodes. Is that correct?

I will not just be testing Solr and ES; they are a definite part of the plan for this environment. Would you have any suggestions about which servers to install them on?

Also, would you recommend using HDFS Federation?

Master Guru

Yes, correct. You have already dedicated a quarter of your nodes to masters, so pack all the master components there and leave the rest to do the processing. Regarding Solr and ES, you can install them as in my proposal above. And there is no need for HDFS Federation, because the cluster is not that large.
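
To make the Federation point concrete: plain NameNode HA keeps a single nameservice with two NameNodes behind it, while Federation adds multiple independent nameservices/namespaces, which a 12-node cluster does not need. As a rough sketch, these are the standard hdfs-site.xml HA properties for a single nameservice; the nameservice name, hostnames, and ports are illustrative, and Ambari's Enable NameNode HA wizard generates the real values for you.

# Standard hdfs-site.xml settings for NameNode HA with a single nameservice
# (i.e. no Federation). "mycluster" and the m1/m2/m3 hostnames are made up.
hdfs_site_ha = {
    "dfs.nameservices": "mycluster",  # a single nameservice means no federation
    "dfs.ha.namenodes.mycluster": "nn1,nn2",
    "dfs.namenode.rpc-address.mycluster.nn1": "m2.example.com:8020",
    "dfs.namenode.rpc-address.mycluster.nn2": "m3.example.com:8020",
    "dfs.namenode.http-address.mycluster.nn1": "m2.example.com:50070",
    "dfs.namenode.http-address.mycluster.nn2": "m3.example.com:50070",
    # Edits are shared through the JournalNodes running on the 3 masters.
    "dfs.namenode.shared.edits.dir":
        "qjournal://m1.example.com:8485;m2.example.com:8485;m3.example.com:8485/mycluster",
    "dfs.client.failover.proxy.provider.mycluster":
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    "dfs.ha.automatic-failover.enabled": "true",  # ZKFC + ZooKeeper drive failover
    "dfs.ha.fencing.methods": "shell(/bin/true)",
}
# With Federation you would instead list several nameservices in
# dfs.nameservices, each with its own pair of NameNodes and its own namespace.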

Expert Contributor

By saying "4th master", do you suggest turning one of the slaves into a master for Solr and ES?