Created on 09-25-2017 07:41 PM - edited 09-16-2022 05:17 AM
I have to calculate how much space new applications will use on the existing cluster and how this will affect the cluster's overall utilized capacity. The reasoning is that the intake process should address the impact each new application will have on disk utilization within the cluster.
In Ambari I have the following details:
DFS Used: 2.3 PB (72.19%)
Non DFS Used: 2.6 TB (0.08%)
Remaining: 918.4 TB (27.73%)
Let's say I have an application A with 100 GB of data that is going to run on HDP.
How much space will it consume on HDFS?
Created 09-25-2017 08:05 PM
What is your cluster's replication factor? If your replication factor is 3, then 100 GB x 3 = 300 GB, so 300 GB is the amount of HDFS storage that will be used.
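The replication arithmetic above can be written as a small helper. This is a minimal sketch assuming a single, uniform replication factor (HDFS `dfs.replication`, which defaults to 3) and no compression or erasure coding; the function name `hdfs_footprint_gb` is hypothetical.

```python
def hdfs_footprint_gb(logical_gb: float, replication_factor: int = 3) -> float:
    """Raw HDFS space consumed: each block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_gb * replication_factor


print(hdfs_footprint_gb(100))  # → 300
```

If some directories use a different replication factor, the footprint would need to be summed per directory with its own factor.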
Created 09-25-2017 08:12 PM
Thanks for the reply.
I know that 300 GB will be used on HDFS, but my question is: how will those 300 GB affect the existing setup?
In Ambari I can see the details above. How will they change, and as more and more applications come in, how do I assess the impact?
Created 09-26-2017 06:22 PM
DFS Used will go up by 300 GB (0.3 TB), which is too small to change the rounded 2.3 PB figure.
Remaining will go down by the same 0.3 TB, from 918.4 TB to 918.1 TB.
Non DFS Used should not change.
Let me know if you need more info.
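The before/after numbers above can be projected with a short sketch. It assumes decimal units (1 PB = 1000 TB, 100 GB = 0.1 TB), a replication factor of 3, and that configured capacity is approximately DFS Used + Non DFS Used + Remaining. Because Ambari rounds DFS Used to 2.3 PB, the percentages computed here will not exactly match the 72.19% shown. The `ClusterUsage` class is hypothetical, not part of any Ambari or Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterUsage:
    """Snapshot of the Ambari HDFS summary; all values in TB (decimal units assumed)."""
    dfs_used_tb: float
    non_dfs_used_tb: float
    remaining_tb: float

    @property
    def capacity_tb(self) -> float:
        # Approximation: configured capacity ~= DFS Used + Non DFS Used + Remaining
        return self.dfs_used_tb + self.non_dfs_used_tb + self.remaining_tb

    def pct(self, value_tb: float) -> float:
        return 100.0 * value_tb / self.capacity_tb

    def after_adding(self, logical_tb: float, replication_factor: int = 3) -> "ClusterUsage":
        """New data moves space from Remaining to DFS Used; Non DFS Used is untouched."""
        added_tb = logical_tb * replication_factor
        if added_tb > self.remaining_tb:
            raise ValueError("not enough remaining capacity")
        return ClusterUsage(self.dfs_used_tb + added_tb,
                            self.non_dfs_used_tb,
                            self.remaining_tb - added_tb)


before = ClusterUsage(dfs_used_tb=2300.0, non_dfs_used_tb=2.6, remaining_tb=918.4)
after = before.after_adding(logical_tb=0.1)  # application A: 100 GB = 0.1 TB
for label, b, a in [("DFS Used", before.dfs_used_tb, after.dfs_used_tb),
                    ("Non DFS Used", before.non_dfs_used_tb, after.non_dfs_used_tb),
                    ("Remaining", before.remaining_tb, after.remaining_tb)]:
    print(f"{label}: {b:.1f} TB ({before.pct(b):.3f}%) -> {a:.1f} TB ({after.pct(a):.3f}%)")
```

Total capacity stays the same; the added footprint only shifts space from Remaining to DFS Used.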
Created 09-26-2017 06:29 PM
Thanks for the info. Can you let me know how that is calculated? I have been reading the forums but couldn't understand it.
E.g.: I have application A with 4576 GB of data. I was able to calculate the space on HDFS, which is 1804 GB, and it requires 1.12 DataNodes, since each of our DataNodes has 23 TB.
I want to know how this affects the existing setup. As you mentioned above, how will DFS Used, Remaining, and Non DFS Used change?
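For a stream of incoming applications, one way to assess the impact is to add each application's footprint to DFS Used in turn and watch the cumulative percentage. This is a minimal sketch under stated assumptions: footprint = logical size x replication factor (which may differ from the poster's own method if compression was factored in), 23 TB of usable HDFS space per DataNode (taken from the post), decimal units, and a hypothetical 80% alert threshold. The function `plan_intake` and application "B-hypothetical" are illustrative only.

```python
from typing import List, NamedTuple, Tuple


class IntakeRow(NamedTuple):
    app: str
    footprint_tb: float      # logical size x replication factor
    node_equivalents: float  # footprint / usable capacity of one DataNode
    dfs_used_pct: float      # cumulative DFS Used % after this app lands
    over_threshold: bool     # True once cumulative usage exceeds alert_pct


def plan_intake(dfs_used_tb: float, non_dfs_used_tb: float, remaining_tb: float,
                apps: List[Tuple[str, float]], replication_factor: int = 3,
                node_capacity_tb: float = 23.0, alert_pct: float = 80.0) -> List[IntakeRow]:
    # Approximation: configured capacity ~= DFS Used + Non DFS Used + Remaining
    capacity_tb = dfs_used_tb + non_dfs_used_tb + remaining_tb
    used_tb = dfs_used_tb
    rows = []
    for app, logical_tb in apps:
        footprint_tb = logical_tb * replication_factor
        used_tb += footprint_tb  # each app adds to the running DFS Used total
        pct = 100.0 * used_tb / capacity_tb
        rows.append(IntakeRow(app, footprint_tb, footprint_tb / node_capacity_tb,
                              pct, pct > alert_pct))
    return rows


current = dict(dfs_used_tb=2300.0, non_dfs_used_tb=2.6, remaining_tb=918.4)
apps = [("A", 4.576), ("B-hypothetical", 150.0)]  # logical sizes in TB
for row in plan_intake(**current, apps=apps):
    print(f"{row.app}: +{row.footprint_tb:.1f} TB, "
          f"{row.node_equivalents:.2f} node-equivalents, "
          f"DFS Used {row.dfs_used_pct:.2f}%"
          + (" (over threshold)" if row.over_threshold else ""))
```

Keeping the running total makes it visible which application in the queue would push the cluster past the chosen threshold, which is the point at which new DataNodes would need to be budgeted.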
Created 09-26-2017 06:56 PM