New Contributor
Posts: 1
Registered: ‎04-28-2015

Multifunctional Cluster vs Specialized Clusters

Dear all,

 

We are currently working on a project in which we intend to use Hadoop to process and store a huge amount of data, and Spark for real-time analytics. We are wondering which of these two architectures would suit our case best:

 

A) A single multifunctional cluster with HBase, HDFS (standard configuration) and Spark, with 3 masters and tens of slaves.

 

B) Three specialized clusters optimized for:

- Cluster 1, specialized for storage: This cluster will process very small files and store every file in HBase. HDFS will be configured to handle these small files.

- Cluster 2, specialized for batch processing: Will process huge files of structured data. HDFS will be configured to deal with these huge files.

- Cluster 3, with Spark oriented to real-time processing on streaming data.

 

What would be the most important criteria to determine the best option in our case?

 

Best Regards!

 
