Community Articles

VidyaSargur · 10-30-2023

Kudu Command Line Copy: The kudu table copy command copies one Kudu table to another. The destination table can be in the same cluster as the source table or in a different cluster. The two tables must have the same table schema but can have different partition schemas. Alternatively, the tool can create the destination table with the same table schema and partition schema as the source table.

Example:

Full table copy command:

kudu table copy <master_addresses> <source_table_name> <dest_master_addresses> -dst_table=<table_name>  -write_type=upsert -num_threads=3

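Incremental copy command (the -predicates filter restricts the copy to rows whose some_value column is greater than or equal to 234; with -write_type=upsert, rows that already exist in the destination table are updated instead of causing errors):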

kudu table copy <master_addresses> <source_table_name> <dest_master_addresses> -dst_table=<table_name>  -write_type=upsert -num_threads=3 -predicates='["AND", [">=", "some_value", 234]]'
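In practice the lower bound of an incremental copy moves forward on each run. A minimal sketch of that pattern, assuming the filter column holds a monotonically increasing value (for example an epoch-millisecond timestamp) and that the caller records the last value copied. The function names and the DRY_RUN switch are hypothetical; only the kudu table copy arguments and flags come from the example above.

```bash
#!/usr/bin/env bash
# Sketch only: watermark-driven incremental "kudu table copy".
# Assumption (not from the article): the filter column is monotonically
# increasing and the caller tracks the last copied value.
# build_predicate, incremental_copy and DRY_RUN are hypothetical names.
set -euo pipefail

# Build the JSON passed to -predicates: all rows where <column> >= <low>.
build_predicate() {
  local column="$1" low="$2"
  printf '["AND", [">=", "%s", %s]]' "$column" "$low"
}

# Run the incremental copy, or only print the command when DRY_RUN=1.
incremental_copy() {
  local src_masters="$1" src_table="$2" dst_masters="$3" dst_table="$4"
  local column="$5" low="$6"
  local cmd=(kudu table copy "$src_masters" "$src_table" "$dst_masters"
             "-dst_table=$dst_table" -write_type=upsert -num_threads=3
             "-predicates=$(build_predicate "$column" "$low")")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, calling incremental_copy with the column some_value and the bound 234 under DRY_RUN=1 prints the equivalent of the incremental command above without running it. Using >= rather than > re-copies rows equal to the previous bound; because -write_type=upsert overwrites rows with the same primary key, that overlap is harmless.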

Spark Backup Utility:

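Kudu supports both full and incremental table backups through a backup job implemented with Apache Spark, and it can restore tables from those full and incremental backups through a companion restore job, also implemented with Apache Spark. When the backup job runs for a table that already has a backup under the same root path, it takes an incremental backup containing only the changes since the previous backup.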

Example:

Backup:

spark-submit \
--driver-cores 1 \
--driver-memory 1G \
--executor-cores 3 \
--executor-memory 1G \
--master yarn \
--name KuduBackup_Job1 \
--class org.apache.kudu.backup.KuduBackup /opt/cloudera/parcels/CDH/lib/kudu/kudu-backup2_2.11.jar \
--kuduMasterAddresses xxx.xx.xxx.73 \
--rootPath hdfs:///user/root \
default.sample

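Restore (in this example into a different cluster; --createTables false means the destination table must already exist, and --newDatabaseName spark_copy restores default.sample under the spark_copy database instead of default):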

spark-submit \
--driver-cores 1 \
--driver-memory 1G \
--executor-cores 2 \
--executor-memory 1G \
--master yarn \
--name KuduRestore_Job1 \
--class org.apache.kudu.backup.KuduRestore /opt/cloudera/parcels/CDH/jars/kudu-backup2_2.11-1.15.0.7.1.7.0-551.jar \
--kuduMasterAddresses xxx.xx.xxx.136 \
--rootPath hdfs:///user/root \
--createTables false \
--newDatabaseName spark_copy default.sample
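When backups and restores are run regularly, it can help to wrap the two commands above in shell functions. A minimal sketch, assuming one backup jar serves both jobs and reusing the class names and options from the examples above. The function names, KUDU_BACKUP_JAR, ROOT_PATH and DRY_RUN are hypothetical, and the resource flags (--driver-cores, --executor-memory and so on) are left out for brevity.

```bash
#!/usr/bin/env bash
# Sketch only: reusable wrappers around the KuduBackup / KuduRestore jobs.
# KUDU_BACKUP_JAR, ROOT_PATH, DRY_RUN and the function names are hypothetical;
# defaults mirror the article's example. Resource flags are omitted.
set -euo pipefail

KUDU_BACKUP_JAR="${KUDU_BACKUP_JAR:-/opt/cloudera/parcels/CDH/lib/kudu/kudu-backup2_2.11.jar}"
ROOT_PATH="${ROOT_PATH:-hdfs:///user/root}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Back up one table from the cluster at <masters> into ROOT_PATH.
kudu_backup() {
  local masters="$1" table="$2"
  run spark-submit --master yarn --name KuduBackup_Job1 \
    --class org.apache.kudu.backup.KuduBackup "$KUDU_BACKUP_JAR" \
    --kuduMasterAddresses "$masters" --rootPath "$ROOT_PATH" "$table"
}

# Restore one table from ROOT_PATH into an existing table on the cluster at
# <masters> (--createTables false), under the database name <new_db>.
kudu_restore() {
  local masters="$1" new_db="$2" table="$3"
  run spark-submit --master yarn --name KuduRestore_Job1 \
    --class org.apache.kudu.backup.KuduRestore "$KUDU_BACKUP_JAR" \
    --kuduMasterAddresses "$masters" --rootPath "$ROOT_PATH" \
    --createTables false --newDatabaseName "$new_db" "$table"
}
```

For example, DRY_RUN=1 kudu_backup xxx.xx.xxx.73 default.sample prints the backup command without submitting it; without DRY_RUN the job is submitted to YARN.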

Kudu Spark Backup/Restore: Incremental Scenarios

Scenario	Backup	Restore
Inserting new rows	A new incremental backup is written for the inserted rows.	The incremental data is loaded.
Updating a row value	A new incremental backup is written for the updated rows.	The updated data is loaded.
Changing a column data type	Not supported in Kudu.	Not supported in Kudu.
Adding a new column	A new incremental backup is written for the new and updated rows; no full backup is required.	Add the column to the destination table first, then run the restore.
Dropping a column	A new incremental backup is written for the new and updated rows; no full backup is required.	Drop the column from the destination table first, then run the restore.
Deleting a row	A new incremental backup is written that records the deleted rows; no full backup is required.	The restore job deletes the rows.

Command Line Copy: Incremental Scenarios

Scenario	Backup	Restore
Inserting new rows	Incremental copy works as long as the table has a timestamp column to filter on with -predicates.	The incremental data is loaded.
Updating a row value	Incremental copy works as long as every update also updates the timestamp value.	The updated rows are upserted into the destination table, provided the update also changed the timestamp value.
Changing a column data type	Not supported in Kudu.	Not supported in Kudu.
Adding a new column		Add the column to the destination table first; otherwise the incremental copy fails.
Dropping a column		Drop the column from the destination table first, then run the incremental table copy.
Deleting a row		Deletes are not copied; rows must be deleted from the destination table whenever they are deleted from the source table.

Cloudera Community

Comparison: Kudu Copy Command vs Spark Backup Utility

Apache Kudu

Cloudera Data Platform (CDP)