Support Questions

Find answers, ask questions, and share your expertise

Linux OS build - disk space requirements for CDH

Explorer

Hi, I can find specific information for the Linux OS for Cloudera Manager's disk space allocation...

 

http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cm_ig_cm_requirements.h...

 

...but not for CDH.

 

I understand that CDH hosts can take on many roles, but I need to develop a generic Linux OS build for hosts that may run one or more CDH roles (Flume, DataNodes, etc.) so we can deploy quickly and easily. What I am after is a broad example of mount-point disk space requirements, like the ones outlined for CM.

 

Can anyone point me in the right direction, please?

 

Thanks in advance

 

Neil

 

 

1 ACCEPTED SOLUTION

Rising Star

Hi Neil,

 

As you said in your message, it all depends on what you want to run; however, here are some guidelines. I'm not talking about the "data" drives, since you can have as many of those as you want, sized as you need.

 

1) The CM requirements also apply to CDH, e.g. for the /var folder.

2) CM will start to alert you when JournalNode, NameNode, and other process directories drop below 10GB of free space. Therefore, allowing at least 20GB per service for the "meta" data (logs, configs, binaries, etc.) is a good idea. So if you have YARN + DataNode + Spark on a node, give those services at least 60GB of disk space.

3) Master processes will use space based on the size of the cluster: the bigger the cluster, the more data and blocks, and the more space is used in the NN and JN directories. So for clusters bigger than 30 nodes you might want to give them a bit more.
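The 10GB alert threshold in guideline 2 can be sanity-checked on an existing host with a small script. This is only a sketch; the mount points listed are assumptions, so adjust them to your own layout:

```shell
#!/bin/sh
# Warn when a service mount point falls below the ~10GB free-space
# threshold that CM alerts on. Mount points below are examples only.
WARN_GB=10

for mp in /var /var/log /opt; do
  [ -d "$mp" ] || continue
  # Available space in GB on the filesystem backing $mp
  free_gb=$(df -P -BG "$mp" | awk 'NR==2 {gsub("G","",$4); print $4}')
  if [ "$free_gb" -lt "$WARN_GB" ]; then
    echo "WARN: $mp has ${free_gb}GB free (< ${WARN_GB}GB)"
  else
    echo "OK: $mp has ${free_gb}GB free"
  fi
done
```

Running it periodically (e.g. from cron) gives an early warning before CM itself starts alerting.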

 

Also, it is not recommended to run any service on the OS disk (not just the OS partition). And since disks keep getting bigger, you might end up with something like 1TB available on your partition for the CM agent + CDH services (on worker nodes). If that's the case, I don't think you should really worry about the available space; just share it between the different mount points (if split into partitions).
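Putting the guidelines together, a worker-node mount layout might look something like this. The sizes are illustrative assumptions for a node running YARN + DataNode + Spark, not official Cloudera figures:

```
/           50 GB    # OS only; no CDH services on the OS disk
/var        60 GB    # logs, configs, agent state (~20 GB per service x 3)
/opt        30 GB    # parcels/binaries
/data1..N   rest     # HDFS data directories, one mount per physical disk
```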

 

Let me know if I can provide any more details or information, or if this doesn't answer your question.

 

JM
