Multifunctional cluster vs Specialized Clusters

Dear all,


We are working on a project in which we intend to use Hadoop to process and store a huge amount of data, and Spark for real-time analytics. We are trying to decide which of the following two architectures would suit our case best:


A) A single multifunctional cluster running HBase, HDFS (standard configuration) and Spark, with 3 masters and tens of slaves.


B) Three specialized clusters optimized for:

- Cluster 1, specialized for storage: This cluster will process very small files and will store every file in HBase. HDFS will be configured to handle these small files.

- Cluster 2, specialized for batch processing: This cluster will process huge files of structured data. HDFS will be configured to handle these huge files.

- Cluster 3, specialized for real-time processing: This cluster will run Spark on streaming data.


What are the most important criteria for choosing between these options in our case?


Best Regards!