Cloudera Employee

Kudu Command Line Copy:

The kudu table copy command copies the rows of one Kudu table into another. The two tables can be in the same cluster or in different clusters. They must have the same table schema, but their partition schemas can differ. Alternatively, the tool can create the destination table using the same table and partition schema as the source table.

Example:

Full table copy command:

kudu table copy <master_addresses> <source_table_name> <dest_master_addresses> -dst_table=<table_name> -write_type=upsert -num_threads=3

Incremental copy command (the predicate restricts the copy to rows whose some_value column is greater than or equal to 234):

kudu table copy <master_addresses> <source_table_name> <dest_master_addresses> -dst_table=<table_name> -write_type=upsert -num_threads=3 -predicates='["AND", [">=", "some_value", 234]]'
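For repeated incremental copies, the predicate usually has to be rebuilt from the last copied value on each run. The following is a minimal sketch of that idea, not a definitive implementation: the function names, the updated_at column, the host names, and the watermark value are all hypothetical, and the script only prints the command so it can be reviewed before it is run.

```shell
# Sketch only: function names, hosts, table names, and the watermark column are hypothetical.

# build_predicate COLUMN VALUE
# Prints a JSON predicate that selects rows where COLUMN >= VALUE (numeric VALUE assumed).
build_predicate() {
  local column="$1" value="$2"
  printf '["AND", [">=", "%s", %s]]' "$column" "$value"
}

# build_copy_command SRC_MASTERS SRC_TABLE DST_MASTERS DST_TABLE COLUMN VALUE
# Prints (does not execute) the incremental copy command, using the same flags as above.
build_copy_command() {
  local src_masters="$1" src_table="$2" dst_masters="$3" dst_table="$4"
  local column="$5" value="$6"
  printf "kudu table copy %s %s %s -dst_table=%s -write_type=upsert -num_threads=3 -predicates='%s'\n" \
    "$src_masters" "$src_table" "$dst_masters" "$dst_table" \
    "$(build_predicate "$column" "$value")"
}

# Example with placeholder values: copy rows whose updated_at is at least 234.
build_copy_command src-master:7051 default.sample dst-master:7051 default.sample_copy updated_at 234
```

Printing the command instead of running it keeps the sketch safe to try; once the output looks right it can be pasted into a shell or passed to eval.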

Spark Backup Utility:

Kudu supports both full and incremental table backups via a job implemented using Apache Spark. Additionally, it supports restoring tables from full and incremental backups via a restore job implemented using Apache Spark.

Example:

Backup: 

spark-submit \
--driver-cores 1 \
--driver-memory 1G \
--executor-cores 3 \
--executor-memory 1G \
--master yarn \
--name KuduBackup_Job1 \
--class org.apache.kudu.backup.KuduBackup /opt/cloudera/parcels/CDH/lib/kudu/kudu-backup2_2.11.jar \
--kuduMasterAddresses xxx.xx.xxx.73 \
--rootPath hdfs:///user/root \
default.sample
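The same backup job covers both full and incremental runs. As far as I understand the Kudu backup tooling, the first run for a table produces a full backup, later runs produce incrementals, and a --forceFull flag forces a new full backup; treat that flag and its exact form as an assumption to verify against the documentation for your Kudu version. The sketch below only assembles the job arguments that follow the jar path; the function name and host name are hypothetical.

```shell
# Sketch only: kudu_backup_args is a hypothetical helper, and --forceFull is an
# assumption to verify for your Kudu version.

# kudu_backup_args MASTERS ROOT_PATH MODE TABLE...
# Prints the arguments to place after the backup jar in spark-submit.
# MODE is "full" to request a forced full backup; any other value leaves the default behavior.
kudu_backup_args() {
  local masters="$1" root_path="$2" mode="$3"
  shift 3
  local args="--kuduMasterAddresses $masters --rootPath $root_path"
  if [ "$mode" = "full" ]; then
    args="$args --forceFull"
  fi
  # Remaining positional parameters are the table names to back up.
  echo "$args $*"
}

# Default behavior for one table, then a forced full backup of two tables.
kudu_backup_args master-host:7051 hdfs:///user/root incremental default.sample
kudu_backup_args master-host:7051 hdfs:///user/root full default.sample default.other
```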

Restore:

spark-submit \
--driver-cores 1 \
--driver-memory 1G \
--executor-cores 2 \
--executor-memory 1G \
--master yarn \
--name KuduRestore_Job1 \
--class org.apache.kudu.backup.KuduRestore /opt/cloudera/parcels/CDH/jars/kudu-backup2_2.11-1.15.0.7.1.7.0-551.jar \
--kuduMasterAddresses xxx.xx.xxx.136 \
--rootPath hdfs:///user/root \
--createTables false \
--newDatabaseName spark_copy \
default.sample

Kudu Spark Backup-Restore Incremental Scenarios

| Scenario | Backup | Restore |
|---|---|---|
| Inserting new rows | A new partition is created for the incremental rows | The incremental data is loaded |
| Updating a row value | A new partition is created for the incremental/updated rows | The incremental/updated data is loaded |
| Changing a column data type | Not supported in Kudu | Not supported in Kudu |
| Adding a new column | A new partition is created for the incremental/updated rows. No full load required | Add the column first, then run the restore |
| Deleting a column | A new partition is created for the incremental/updated rows. No full load required | Delete the column first, then run the restore |
| Deleting a row | A new partition is created for the deleted rows. No full load required | The restore utility deletes the rows |

Command Line Copy Incremental Scenarios

| Scenario | Backup | Restore |
|---|---|---|
| Inserting new rows | Works as long as the table has a timestamp column to filter on | The incremental data is loaded |
| Updating a row value | Works as long as each update also updates the timestamp value | Works as long as each update also updates the timestamp value |
| Changing a column data type | Not supported in Kudu | Not supported in Kudu |
| Adding a new column | | Add the column first; otherwise the incremental copy fails |
| Deleting a column | | Delete the column first, then run the incremental table copy |
| Deleting a row | | Rows must be deleted as soon as they are deleted in the actual table |
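For the add-column and delete-column scenarios above, the schema change has to be applied before the next incremental copy or restore. The sketch below prints the corresponding kudu CLI commands; it assumes a Kudu release that ships the kudu table add_column and kudu table delete_column subcommands, and the function name, host name, column name, and data type are hypothetical. Check kudu table --help on your cluster before relying on the exact argument order.

```shell
# Sketch only: schema_sync_command is a hypothetical helper; it prints commands
# rather than running them. Availability of add_column/delete_column depends on
# the Kudu version and is an assumption here.

# schema_sync_command ACTION MASTERS TABLE COLUMN [DATA_TYPE]
# ACTION is "add" (DATA_TYPE required) or "drop".
schema_sync_command() {
  local action="$1" masters="$2" table="$3" column="$4" data_type="${5:-}"
  case "$action" in
    add)
      echo "kudu table add_column $masters $table $column $data_type"
      ;;
    drop)
      echo "kudu table delete_column $masters $table $column"
      ;;
    *)
      echo "unknown action: $action" >&2
      return 1
      ;;
  esac
}

# Apply the schema change first, then rerun the incremental copy or restore.
schema_sync_command add dst-master:7051 default.sample new_col STRING
schema_sync_command drop dst-master:7051 default.sample old_col
```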

 
