Created on 10-30-2023 02:46 PM - edited on 11-08-2023 11:44 PM by VidyaSargur
Kudu Command Line Copy:
The Copy Table command copies one Kudu table to another. The source and destination tables can be in the same cluster or in different clusters. They must have the same table schema but can have different partition schemas. Alternatively, the tool can create the destination table using the same table and partition schema as the source table.
Example:
Full table copy command:
kudu table copy <master_addresses> <source_table_name> <dest_master_addresses> -dst_table=<table_name> -write_type=upsert -num_threads=3
Incremental copy command:
kudu table copy <master_addresses> <source_table_name> <dest_master_addresses> -dst_table=<table_name> -write_type=upsert -num_threads=3 -predicates='["AND", [">=", "some_value", 234]]'
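The incremental copy depends on a predicate that selects only rows changed since the previous run. The following is a minimal sketch, not a definitive implementation, of how that predicate could be built from a stored watermark value. The helper `build_predicate`, the host names, and the table name are hypothetical placeholders; only the flags already shown above are used. The script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build the -predicates JSON for an incremental copy and print the
# resulting command for review (dry run, nothing is executed).

# build_predicate COLUMN VALUE  (hypothetical helper)
# Prints ["AND", [">=", "COLUMN", VALUE]]. VALUE is emitted unquoted,
# so it must be numeric (for example an integer or epoch timestamp).
build_predicate() {
    printf '["AND", [">=", "%s", %s]]' "$1" "$2"
}

# Placeholder values (assumptions); replace them with your own.
SRC_MASTERS="src-master:7051"
DST_MASTERS="dst-master:7051"
TABLE="default.sample"
WATERMARK_COLUMN="some_value"
LAST_COPIED=234

PREDICATE="$(build_predicate "$WATERMARK_COLUMN" "$LAST_COPIED")"

# Print the full command; paste the printed line into a shell to run it.
echo kudu table copy "$SRC_MASTERS" "$TABLE" "$DST_MASTERS" \
    "-dst_table=$TABLE" -write_type=upsert -num_threads=3 \
    "-predicates='$PREDICATE'"
```

After each successful run, the watermark (here `LAST_COPIED`) would need to be advanced to the highest value already copied, so that the next run picks up only newer rows.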
Spark Backup Utility:
Kudu supports both full and incremental table backups via a job implemented using Apache Spark. Additionally, it supports restoring tables from full and incremental backups via a restore job implemented using Apache Spark.
Example:
Backup:
spark-submit \
--driver-cores 1 \
--driver-memory 1G \
--executor-cores 3 \
--executor-memory 1G \
--master yarn \
--name KuduBackup_Job1 \
--class org.apache.kudu.backup.KuduBackup /opt/cloudera/parcels/CDH/lib/kudu/kudu-backup2_2.11.jar \
--kuduMasterAddresses xxx.xx.xxx.73 \
--rootPath hdfs:///user/root \
default.sample
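As far as the Kudu backup documentation describes it, the first backup run for a table is a full backup and later runs against the same root path are incremental, while a `--forceFull` option starts a new full backup. The sketch below assembles the backup command with that option switched on or off. It assumes `--forceFull` takes an explicit `true`/`false` value (like `--createTables` in the restore job); check the job's help output for your version. The helper `backup_cmd` and the master host name are hypothetical, and the script only prints the command.

```shell
#!/bin/sh
# Sketch: print (dry run) a KuduBackup spark-submit line, optionally forcing
# a new full backup. Nothing is submitted to the cluster.

BACKUP_JAR="/opt/cloudera/parcels/CDH/lib/kudu/kudu-backup2_2.11.jar"
KUDU_MASTERS="kudu-master:7051"   # assumption: placeholder master address
ROOT_PATH="hdfs:///user/root"

# backup_cmd FORCE_FULL TABLE...  (hypothetical helper)
# FORCE_FULL is "true" or "false"; remaining arguments are table names.
backup_cmd() {
    force_full="$1"
    shift
    extra=""
    if [ "$force_full" = "true" ]; then
        # Assumption: the option expects an explicit boolean value.
        extra="--forceFull true "
    fi
    echo "spark-submit --master yarn --class org.apache.kudu.backup.KuduBackup $BACKUP_JAR --kuduMasterAddresses $KUDU_MASTERS --rootPath $ROOT_PATH ${extra}$*"
}

# Regular scheduled run: incremental if a full backup already exists.
backup_cmd false default.sample

# Occasional run that starts a fresh full backup.
backup_cmd true default.sample
```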
Restore:
spark-submit \
--driver-cores 1 \
--driver-memory 1G \
--executor-cores 2 \
--executor-memory 1G \
--master yarn \
--name KuduRestore_Job1 \
--class org.apache.kudu.backup.KuduRestore /opt/cloudera/parcels/CDH/jars/kudu-backup2_2.11-1.15.0.7.1.7.0-551.jar \
--kuduMasterAddresses xxx.xx.xxx.136 \
--rootPath hdfs:///user/root \
--createTables false \
--newDatabaseName spark_copy \
default.sample
Kudu Spark Backup-Restore Incremental Scenarios:

| Scenario | Backup | Restore |
|---|---|---|
| Inserting new rows | A new partition is created for the incremental rows | Incremental data is loaded |
| Updating a row value | A new partition is created for the incremental/updated rows | Incremental/updated data is loaded |
| Changing a column data type | Not supported in Kudu | Not supported in Kudu |
| Adding a new column | A new partition is created for the incremental/updated rows. No full load required | The column must be added first, then the restore is run |
| Deleting a column | A new partition is created for the incremental/updated rows. No full load required | The column must be deleted first, then the restore is run |
| Deleting a row | A new partition is created for the deleted rows. No full load required | Rows are deleted by the restore utility |
Command Line Copy Incremental Scenarios:

| Scenario | Backup | Restore |
|---|---|---|
| Inserting new rows | Incremental copies are possible as long as the rows carry a timestamp | Incremental data is loaded |
| Updating a row value | Incremental copy works as long as the timestamp value is updated together with the row | Incremental copy works as long as the timestamp value is updated together with the row |
| Changing a column data type | Not supported in Kudu | Not supported in Kudu |
| Adding a new column | The column must be added first; otherwise the incremental copy fails | |
| Deleting a column | The column must be deleted first, then the incremental table copy is run | |
| Deleting a row | Rows must be deleted from the copy as soon as they are deleted in the actual (source) table | |
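The "Adding a new column" scenario for the command line copy implies an ordering: change the schema first, then rerun the copy. The sketch below prints the two steps in that order. It assumes the column is added on the destination table and that your Kudu CLI version provides `kudu table add_column <master_addresses> <table_name> <column_name> <data_type>`; verify both before use. The helper `plan_column_sync`, the host names, the column name, and the data type are hypothetical placeholders, and nothing is executed.

```shell
#!/bin/sh
# Sketch (dry run): print the commands for syncing a newly added column
# before the next incremental copy. Nothing is executed.

SRC_MASTERS="src-master:7051"   # assumption: source cluster masters
DST_MASTERS="dst-master:7051"   # assumption: destination cluster masters
TABLE="default.sample"

# plan_column_sync COLUMN DATA_TYPE  (hypothetical helper)
# Step 1: add the column on the destination table.
# Step 2: rerun the copy so the new column's values are transferred.
plan_column_sync() {
    echo "kudu table add_column $DST_MASTERS $TABLE $1 $2"
    echo "kudu table copy $SRC_MASTERS $TABLE $DST_MASTERS -dst_table=$TABLE -write_type=upsert -num_threads=3"
}

plan_column_sync new_col STRING
```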