03-28-2019 12:15 AM
Good day guys, I'm a newbie in Cloudera and wanted to ask 2 questions.
1) I have 20TB of data that I need to migrate to 10 servers. Do I need 20TB of disk on each server?
2) How do I organize the right HDFS layout (NameNode, DataNode, SecondaryNameNode) on those 10 servers?
Thanks, I hope to receive an answer very soon )
03-28-2019 03:40 AM
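1) If you only want to migrate the data, you can compress it and spread it across the other nodes/servers, so you do not need 20TB of disk on each one.
However, if you need the data to stay available, it depends on the replication factor. You have these scenarios:
- Replication factor 10: every server holds a full copy, so you need 20TB per server.
- Replication factor 1: you only need 20TB in total, distributed across the 10 servers (about 2TB per server), but there is no redundancy.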
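- Best: something in between, for example replication factor 5, which needs 100TB in total, so about 10TB per server. (The HDFS default replication factor is 3, which would be about 6TB per server.)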
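As a rough illustration of how the replication factor drives disk sizing, here is a minimal sketch. It assumes the data is spread evenly, that every server counted stores HDFS blocks, and it ignores compression and any headroom for temporary or non-HDFS data. The function name `disk_per_server` is just a hypothetical helper, not part of any Cloudera tool.

```python
def disk_per_server(data_tb, replication, servers):
    """Approximate raw HDFS capacity needed on each server, in TB.

    Assumes blocks are spread evenly and ignores compression and
    free-space headroom (both are simplifying assumptions).
    """
    total_tb = data_tb * replication  # every block is stored `replication` times
    return total_tb / servers


# 20 TB of data on 10 servers, for a few replication factors
for rf in (1, 3, 5, 10):
    print(f"replication {rf}: {disk_per_server(20, rf, 10)} TB per server")
```

If only some of the servers act as DataNodes (for example 8 of the 10), pass that number as `servers` instead.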
2) It depends. You need one NameNode, one SecondaryNameNode and, for example, 8 DataNodes. You also need to pay attention to the resources of your hosts.
03-28-2019 04:07 AM
Thanks for your reply. So if I understand it correctly, the disk size on each server depends on the replication factor I set. Is there a table relating replication factor to disk sizing?
I also wanted to ask about the resources on each node. In short, I need some documentation about replication factor, sizing and RAM usage.
03-28-2019 04:14 AM
You are right. There is no such table; you have to study your own scenario (HA, security, number of accesses ...).
- Number of users?
- Volume of data?
All documentation is available here, according to your version: