We are working on containerizing our solution, which uses Hadoop and the following ecosystem components for the individual use cases. The expectation is to run these components, or their alternatives, as containers orchestrated in any Kubernetes cluster, without any hardware dependency.
We would like to understand Cloudera's strategy on containerization and any alternatives that Cloudera may propose for our needs.
HDFS - Event data store, with files in CSV, Avro, and Parquet formats
NFS - Gives data sources access to place data in HDFS
HBase - NoSQL database for serving customer care requests within SLA
YARN - Resource management for MRv2 and Spark workloads
MRv2 - ETL batch processing for reporting data
Spark - Processing engine for real-time aggregation and querying
Spark Thrift Server - Exposes reporting data to the BI tool through a JDBC interface