Linux OS build - disk space requirements for CDH
Created on 06-03-2015 06:39 AM - edited 09-16-2022 02:30 AM
Hi, I can find specific information on the Linux OS disk space allocation for Cloudera Manager, but not for CDH.
I understand that CDH hosts can take on many roles, but I need to develop a generic Linux OS build for hosts that may run one or more CDH roles (Flume, DataNodes, etc.) so we can deploy quickly and easily. I guess what I am after is a broad example of mount point disk space requirements, like the ones outlined for CM.
Can anyone point me in the right direction please?
Thanks in advance
Neil
Created 06-18-2015 04:33 PM
Hi Neil,
As you said in your message, it all depends on what you want to run, but here are some guidelines. I'm not talking about the "data" drives, since you can have as many of those as you want, at whatever size you need.
1) The CM requirements also apply to CDH, e.g. for the /var folder.
2) CM will start to alert you when JournalNode, NameNode, and other process directories have less than 10GB free. Therefore, allowing at least 20GB per service for the "meta" (logs, configs, binaries, etc.) is a good idea. So if you have YARN + a DataNode + Spark on a node, give them at least 60GB of disk space.
3) Master processes use space based on the size of the cluster: the bigger the cluster, the more data, the more blocks, and the more space used in the NN and JN directories. So for clusters larger than 30 nodes you might want to give them a bit more.
Also, it is not recommended to run any service on the OS disk (and not just the OS partition). And since disks keep getting bigger, you might end up with something like 1TB available on the partition for the CM agent + CDH services (on worker nodes). If that's the case, I don't think you should really worry about the available space; just share it between the different mount points (if split into partitions).
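To put point 2 into numbers, here's a quick back-of-the-envelope sketch. The 20GB-per-service figure is the guideline above; the service list is just the YARN + DataNode + Spark example, so adjust both to your own host layout:

```shell
#!/bin/sh
# Rough sizing helper for the "meta" partitions (logs, configs, binaries).
# 20GB per service is the guideline from point 2; three services is the
# example host (YARN + DataNode + Spark).
services=3
per_service_gb=20
required_gb=$((services * per_service_gb))
echo "Minimum recommended meta space: ${required_gb} GB"
# prints: Minimum recommended meta space: 60 GB
```

On an existing host you could compare that figure against `df -h /var` (or wherever your service directories live) to see whether the partition is sized comfortably above the 10GB alert threshold.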
Let me know if I can provide any more details or information, or if this doesn't answer your question.
JM
