I have to calculate how much space new applications will use on the existing cluster and how this will affect the overall utilized capacity of the cluster. The reasoning is that the intake process should address the impact each application will have on disk utilization within the cluster.
In Ambari I have the following details:
DFS Used: 2.3 PB (72.19%)
Non DFS Used: 2.6 TB (0.08%)
Remaining: 918.4 TB (27.73%)
Let's say I have an application A with 100 GB of data that is going to use HDP.
How much will it consume on HDFS?
What is your cluster's replication factor? If it is 3, then 100 GB x 3 = 300 GB, so 300 GB is the amount of HDFS storage that will be used.
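The replication arithmetic above can be written as a tiny helper. This is a minimal sketch assuming every file uses the cluster-wide replication factor (the HDFS `dfs.replication` setting, 3 by default) and ignoring compression and erasure coding; the function name is hypothetical.

```python
def hdfs_raw_usage_gb(logical_gb: float, replication: int = 3) -> float:
    """Raw HDFS capacity consumed by `logical_gb` of data.

    HDFS keeps `replication` full copies of every block, so the raw
    footprint is the logical size multiplied by the replication factor.
    Assumes no compression and no erasure coding.
    """
    if replication < 1:
        raise ValueError("replication factor must be >= 1")
    return logical_gb * replication


if __name__ == "__main__":
    # Application A: 100 GB of data at the default replication factor of 3.
    print(hdfs_raw_usage_gb(100))
```

If some directories use a different replication factor, compute each part separately with its own `replication` value and add the results.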
Thanks for the reply.
I know that 300 GB will be used by HDFS, but my question is: how will 300 GB of data affect the existing setup?
In Ambari I can see the details above. How will they change, and as more and more applications come in, how do I assess the impact?
DFS Used (2.3 PB) will go up by 0.3 TB, to roughly 2.3003 PB, so Ambari will still show about 2.3 PB.
Remaining (918.4 TB) will go down by 0.3 TB, to 918.1 TB.
Non DFS Used should not change.
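The before/after reasoning above can be sketched in a few lines. This assumes the configured capacity is roughly the sum of DFS Used, Non DFS Used and Remaining, that Ambari reports binary units (1 PB = 1024 TB), and, as in the answer, treats 300 GB as 0.3 TB. Because Ambari rounds 2.3 PB, the recomputed percentages may differ slightly from the ones it displays. The `Capacity` class and `project` function are hypothetical names.

```python
from dataclasses import dataclass

TB_PER_PB = 1024  # assumption: Ambari/HDFS report binary units


@dataclass
class Capacity:
    dfs_used_tb: float
    non_dfs_used_tb: float
    remaining_tb: float

    @property
    def total_tb(self) -> float:
        # Approximate the configured capacity as the sum of the three figures.
        return self.dfs_used_tb + self.non_dfs_used_tb + self.remaining_tb

    def pct(self, value_tb: float) -> float:
        return 100.0 * value_tb / self.total_tb


def project(current: Capacity, added_raw_tb: float) -> Capacity:
    """Capacity after writing `added_raw_tb` of raw (already replicated) data.

    New HDFS blocks move space from Remaining to DFS Used; Non DFS Used
    (local files outside HDFS) is unaffected, so the total stays the same.
    """
    if added_raw_tb > current.remaining_tb:
        raise ValueError("not enough DFS Remaining for this intake")
    return Capacity(
        dfs_used_tb=current.dfs_used_tb + added_raw_tb,
        non_dfs_used_tb=current.non_dfs_used_tb,
        remaining_tb=current.remaining_tb - added_raw_tb,
    )


if __name__ == "__main__":
    now = Capacity(dfs_used_tb=2.3 * TB_PER_PB, non_dfs_used_tb=2.6, remaining_tb=918.4)
    after = project(now, added_raw_tb=0.3)  # 300 GB of raw HDFS usage
    for name in ("dfs_used_tb", "non_dfs_used_tb", "remaining_tb"):
        value = getattr(after, name)
        print(f"{name}: {value:.1f} TB ({after.pct(value):.2f} %)")
```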
Let me know if you need more info.
Thanks for the info. Can you let me know how that is calculated? I have been reading forums but couldn't understand it.
For example: I have application A with 4576 GB of data. I was able to calculate the space on HDFS, which is 1804 GB, and it requires 1.12 data nodes, as the data nodes we have are 23 TB each.
I want to know how this affects the existing setup. As you mentioned above, how will DFS Used, Remaining, and Non DFS Used change?
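To assess a queue of applications rather than a single one, the same rule (DFS Used grows and Remaining shrinks by each application's raw HDFS footprint, while Non DFS Used stays put) can be applied cumulatively. This is a sketch under stated assumptions: application A's 1804 GB is taken as its raw HDFS usage (1.804 TB), the Ambari figures above are used as the starting point in binary units, and the 80% threshold is a hypothetical planning limit, not an HDFS setting. `intake_report` and `Row` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    app: str
    dfs_used_tb: float
    remaining_tb: float
    dfs_used_pct: float
    over_threshold: bool


def intake_report(
    dfs_used_tb: float,
    non_dfs_used_tb: float,
    remaining_tb: float,
    intakes: List[Tuple[str, float]],
    threshold_pct: float = 80.0,  # hypothetical planning limit; tune to your policy
) -> List[Row]:
    """Apply each application's raw HDFS footprint in turn and report the result."""
    # Non DFS Used does not change, so the total capacity stays constant.
    total_tb = dfs_used_tb + non_dfs_used_tb + remaining_tb
    rows = []
    for app, raw_tb in intakes:
        dfs_used_tb += raw_tb   # new blocks count toward DFS Used...
        remaining_tb -= raw_tb  # ...and come out of Remaining
        pct = 100.0 * dfs_used_tb / total_tb
        rows.append(Row(app, dfs_used_tb, remaining_tb, pct, pct >= threshold_pct))
    return rows


if __name__ == "__main__":
    # Starting figures from the Ambari summary; app A's 1804 GB taken as raw usage.
    for row in intake_report(2.3 * 1024, 2.6, 918.4, [("A", 1.804)]):
        print(f"{row.app}: DFS Used {row.dfs_used_tb:.1f} TB, "
              f"Remaining {row.remaining_tb:.1f} TB ({row.dfs_used_pct:.2f} % used)")
```

Appending each new application to `intakes` shows where the cluster crosses the chosen threshold, which is the point at which more data nodes would need to be planned.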