Kudu Backup Space Requirement

New Contributor


I want to work out how much space is needed to store a Kudu table backup taken with the spark-submit command.

How does the backup size relate to the Kudu table's on-disk size, and how can I calculate the space required to store the backup of a Kudu table?

I am using the following command as a reference to run the Kudu backup for a table:

spark-submit --class org.apache.kudu.backup.KuduBackup [***FULL PATH TO kudu-backup2_2.11-1.12.0.jar***] --kuduMasterAddresses [***KUDU MASTER HOSTNAME 1***]:7051,[***KUDU MASTER HOSTNAME 2***]:7051 --rootPath file:/// [***DIRECTORY TO USE FOR BACKUP***] impala::[***DATABASE NAME***].foo

To check the Kudu table's on-disk size, I used the kudu table statistics command:

kudu table statistics <master_addresses> <table_name>

I have a table with the following statistics:

TABLE sampleTable1
on disk size: 31667556474
live row count: 2000000000
on disk size limit: N/A
live row count limit: N/A

How much space will be required in rootPath to store the backup of this table?
Or how can we calculate the required space for storing backup?

Thank you


Expert Contributor

Hi Team, if you are storing the backup in HDFS or at a path like the --rootPath above (file:/// [***DIRECTORY TO USE FOR BACKUP***]), use the command below to check the size of the backup directory:

hdfs dfs -du -s -h file:/// [***DIRECTORY TO USE FOR BACKUP***] 

If it is a different storage such as Ozone (ofs/o3), the hdfs command will also work. If it is S3, use the aws command instead.
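
To compare the measured backup size with the table's on-disk size, something like the sketch below may help. It is only an illustration: the helper names bytes_to_gib and backup_ratio and the BACKUP_DIR placeholder are made up for this example, and the on-disk figure is the one from the statistics output in the question. The backup size itself has to be measured after a backup has run; the sketch does not predict it.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to GiB with two decimals.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Hypothetical helper: ratio of backup size to the table's on-disk size.
backup_ratio() {
  awk -v backup="$1" -v ondisk="$2" 'BEGIN { printf "%.2f\n", backup / ondisk }'
}

# On-disk size taken from the `kudu table statistics` output in the question.
on_disk_bytes=31667556474
echo "on-disk size: $(bytes_to_gib "$on_disk_bytes") GiB"

# After a backup has run, the byte count could be read like this.
# BACKUP_DIR is a placeholder; without -h, the first column of
# `hdfs dfs -du -s` is the size in bytes.
#   backup_bytes=$(hdfs dfs -du -s "$BACKUP_DIR" | awk '{print $1}')
#   echo "backup/on-disk ratio: $(backup_ratio "$backup_bytes" "$on_disk_bytes")"
#
# For S3, `aws s3 ls --recursive --summarize s3://<bucket>/<prefix>`
# prints a "Total Size" line for everything under the prefix.
```

Running a backup of one smaller, representative table first and applying the resulting ratio to the larger tables would give a rough estimate, assuming the tables have similar schemas and data.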