Member since
07-03-2020
09-23-2020
02:41 AM
Thank you. I started thinking about it from exactly this resource 🙂 Not everything is clear to me yet, but I now have a rough idea about this question. I understand that it all depends on the individual case. Thank you. This question is closed.
08-19-2020
01:53 AM
Maybe we will exclude Impala from this list to save resources. Hive is enough for now.
08-18-2020
07:13 AM
Hello!
Please help me with the minimal hardware requirements for our small cluster.
We decided to build a very small production cluster with high availability for archiving purposes, based on Cloudera CDH 6.3.3 (community version).
Planned storage size: about 10-20 TiB.
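To make the storage target concrete, here is a rough sketch of the raw disk each host would need. It assumes the HDFS default replication factor of 3 and a hypothetical 25% free-space headroom; only the 10-20 TiB target and the 3 hosts come from this post, and the function name is made up for illustration.

```python
# Rough HDFS raw-capacity estimate (a sketch, not a sizing rule).
REPLICATION = 3   # assumption: HDFS default dfs.replication
HOSTS = 3         # from the post
HEADROOM = 0.25   # assumption: fraction of disk kept free for temp data and growth


def raw_tib_per_host(logical_tib, replication=REPLICATION,
                     hosts=HOSTS, headroom=HEADROOM):
    """Return the raw disk (TiB) each host needs to store `logical_tib` of data."""
    total_raw = logical_tib * replication / (1.0 - headroom)
    return total_raw / hosts


for logical in (10, 20):
    print(f"{logical} TiB logical -> {raw_tib_per_host(logical):.1f} TiB raw per host")
```

With replication 3 on exactly 3 hosts, every host ends up holding a full copy of the data, so under these assumptions the per-host figure is simply the logical size divided by the usable fraction of the disk.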
Planned workload:
- every 2 minutes, ETL from an external Oracle database to local Parquet, about 500-1000 rows of data
- periodic (very rare) analytic queries in Hive that search through all of the Parquet files
- periodic (very rare) ad-hoc Spark jobs with the same goals as above
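For scale, the ETL schedule above can be turned into daily numbers with a small sketch. It assumes the job runs around the clock and that each run writes one Parquet file with no later compaction; both are assumptions for illustration, not something decided yet, and the function name is hypothetical.

```python
# Daily ingest estimate for the ETL schedule (a sketch under stated assumptions).
MINUTES_PER_DAY = 24 * 60


def daily_ingest(interval_min=2, rows_min=500, rows_max=1000):
    """Return (runs per day, minimum rows per day, maximum rows per day)."""
    runs = MINUTES_PER_DAY // interval_min
    return runs, runs * rows_min, runs * rows_max


runs, low, high = daily_ingest()
# Assumption: one Parquet file per run, never merged into larger files.
files_per_year = runs * 365

print(f"{runs} runs/day, {low}-{high} rows/day, about {files_per_year} files/year")
```

The row volume is small, but the file count is worth watching, because many small files add to the metadata the HDFS NameNode has to keep in memory.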
Components:
- Cloudera Manager
- HDFS
- Hive
- Hue
- Impala
- Spark
- YARN with MR2
- ZooKeeper
- StreamSets parcel (as a part of Cloudera)
We want to use only 3 hosts (no more), and the failure of any one host must not bring down the whole system.
So we plan to place all of the above components on all of the hosts.
In other words, each component will run on every host.
Is this a normal and workable setup, or can someone advise a different layout?
We also want to know whether we can place the HDFS NameNode and Cloudera Manager on only 2 hosts, or whether these components are better placed on all three hosts.
And finally, what are the minimal requirements for RAM, CPU and disk storage for each of these three hosts?
Many thanks in advance!