04-28-2015 07:12 AM
We are working on a project where we intend to use Hadoop to process and store a very large amount of data, and Spark for real-time analytics. We are now trying to decide which of these two architectures would suit our case best:
A) A single multifunctional cluster running HBase, HDFS (standard configuration) and Spark, with 3 masters and tens of slave nodes.
B) Three specialized clusters optimized for:
- Cluster 1, for storage: HDFS configured to handle very small files. This cluster will process very small files and store each one in HBase.
- Cluster 2, for batch processing: it will process very large files of structured data, with HDFS configured for those large files.
- Cluster 3, running Spark for real-time processing of streaming data.
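One concrete input to the small-files vs. large-files question is NameNode memory, since every file and block is tracked in the NameNode heap. The sketch below is a back-of-envelope estimate only. It assumes the commonly quoted rule of thumb of roughly 150 bytes of heap per namespace object and the Hadoop 2.x default `dfs.blocksize` of 128 MB; the function name `namenode_heap_bytes` and the file counts in the example are hypothetical.

```python
import math

# Assumption: rule-of-thumb cost of one file or block object in NameNode heap.
# Real overhead varies by Hadoop version; use this as an order of magnitude only.
BYTES_PER_NAMESPACE_OBJECT = 150

# Assumption: Hadoop 2.x default HDFS block size (dfs.blocksize), 128 MB.
DEFAULT_BLOCK_SIZE = 128 * 1024**2


def namenode_heap_bytes(num_files: int, avg_file_bytes: int,
                        block_size_bytes: int = DEFAULT_BLOCK_SIZE) -> int:
    """Hypothetical helper: estimate NameNode heap used by file and block objects.

    Directories are ignored, and avg_file_bytes is assumed to be > 0.
    """
    # Each non-empty file occupies ceil(size / block_size) blocks, at least one.
    blocks_per_file = max(1, math.ceil(avg_file_bytes / block_size_bytes))
    # One object for the file itself plus one per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_NAMESPACE_OBJECT


# Roughly 1 TB of data stored two different ways (hypothetical volumes).
small = namenode_heap_bytes(10_000_000, 100 * 1024)  # 10 million 100 KB files
large = namenode_heap_bytes(1_000, 1024**3)          # 1,000 files of 1 GB

print(f"small files: {small / 1e9:.1f} GB of NameNode heap")
print(f"large files: {large / 1e6:.2f} MB of NameNode heap")
```

Under these assumptions the small-file layout needs about 3.0 GB of NameNode heap versus about 1.35 MB for the large-file layout, which is one reason storing small files as rows in HBase (which packs them into larger files on HDFS) is often preferred over writing them to HDFS directly.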
What are the most important criteria for deciding which option is best in our case?