Created on 12-20-2019 06:53 AM - last edited on 12-20-2019 08:49 AM by VidyaSargur
Hi Guys,
I got a request to copy a large Hive external table with 72,883 partitions from one database to another. I am new to Hive, so I need your help copying this table. I have done the following steps so far and am not sure about the next step.
Do I need to add those 72,883 partitions manually, or is there another way to add them to the target table?
Your early response is very much appreciated.
Steps:
1. Created the external table in the new database with the same definition as the source table, but with a different location.
2. Copied the CSV files from the source table's location to the target table's location.
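A sketch of step 1 might look like the following; the database, table, column, and path names here are hypothetical placeholders, and the real DDL would come from running SHOW CREATE TABLE against the source:

```sql
-- Step 1 sketch: create the target table with the source's definition but a
-- new LOCATION. All names and columns below are hypothetical.
SHOW CREATE TABLE source_db.sales_ext;

CREATE EXTERNAL TABLE target_db.sales_ext (
  id INT,
  amount DOUBLE
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/target_db/sales_ext';
```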
Regards
~Suresh D
Created 12-21-2019 06:47 AM
Best way to duplicate a partitioned table in Hive
HTH
Created 01-01-2020 11:06 AM
Any updates on this thread?
Created 01-02-2020 07:05 AM
Thank you for following up on this. I followed the steps below to copy the Hive external table successfully.
Steps:
1. Created the target external table in the new database with the same definition as the source table, but with a different location. Extracted the definition of the source table with "SHOW CREATE TABLE <Hive external source table>".
2. Copied the CSV files from the source table's location to the target table's location with "hadoop fs -cp".
3. Added the 72,883 partitions with the command below. I had to come up with a script for all 72,883 partitions, though.
"ALTER TABLE <Hive external target table> ADD PARTITION (<partition spec>);"
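Such a script can be sketched in shell. The table name, partition column, and values below are hypothetical; in practice the value list would be derived from the partition directory names under the table's HDFS location:

```shell
# Sketch: emit one ALTER TABLE ... ADD PARTITION statement per partition value
# into an .hql file that can then be run with beeline/hive -f.
# Table, column, and values are hypothetical placeholders.
table="target_db.sales_ext"
col="dt"
values="2019-01-01 2019-01-02 2019-01-03"

: > add_partitions.hql          # truncate/create the output file
for v in $values; do
  printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s');\n" \
    "$table" "$col" "$v" >> add_partitions.hql
done
```

IF NOT EXISTS makes the generated statements safe to re-run; the resulting file would then be executed with something like `beeline -f add_partitions.hql`.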
Regards
Suresh D
Created 01-02-2020 07:47 AM
Great that it worked out for you. Did you run MSCK REPAIR TABLE table_name; on the target table?
If you found this answer addressed your initial question, please take a moment to log in and click "Accept" on the answer.
Happy hadooping
Created 01-02-2020 08:56 AM
Hi Shelton,
I have not run "MSCK REPAIR TABLE table_name;" on the target table, and I am not sure if it is needed. Why do we have to run that command? My target table is working fine without it.
Regards
~Suresh D
Created 01-02-2020 09:06 AM
Great. If all went well, we usually run MSCK REPAIR TABLE daily once a new partition has been loaded into the HDFS location.
Why do you need to run the MSCK REPAIR TABLE statement after each ingestion?
Hive stores a list of partitions for each table in its metastore. If new partitions are added directly to HDFS, however, the metastore (and hence Hive) will not be aware of them unless the user registers the newly added partitions, either with ALTER TABLE ... ADD PARTITION or with MSCK REPAIR TABLE.
MSCK adds metadata to the Hive metastore for partitions for which such metadata does not already exist.
If you find the need, remember to do that; otherwise, please accept the answer and close the thread.
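As a concrete example (the table name here is hypothetical), after new partition directories land in HDFS the repair is a single statement:

```sql
-- Register any partition directories the metastore doesn't know about yet.
MSCK REPAIR TABLE target_db.sales_ext;
```

On Hive 3, `MSCK REPAIR TABLE target_db.sales_ext SYNC PARTITIONS;` additionally drops metastore entries for partitions whose directories no longer exist on the file system.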
Created 01-02-2020 09:15 AM
Thank you, Shelton, for sharing. I did not know about it. Next time I will run this command instead of adding the partitions with ALTER TABLE.
Thank you again.
Suresh D
Created 01-02-2020 10:57 AM
Hi Shelton,
I did find a note describing another way to add the partitions. Are you aware of it? If so, do you see any issues with it?
Regards
~Suresh D
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive-automate-msck.html
Hive automatically and periodically discovers discrepancies in partition metadata in the Hive metastore and corresponding directories on the file system, and then performs synchronization. Automating this operation for log data or data in Spark and Hive catalogs is especially helpful.
The discover.partitions table property enables and disables synchronization of the file system with partitions. In external partitioned tables, this property is enabled (true) by default when you create the table using Hive in HDP 3.1.4 and later. To a legacy external table (created using an earlier version of Hive), add discover.partitions to the table properties to enable partition discovery. By default, the discovery and synchronization of partitions occurs every 5 minutes, but you can configure the frequency as shown in this task.
ALTER TABLE exttbl SET TBLPROPERTIES ('discover.partitions' = 'true');
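The sync interval mentioned above is a metastore setting. Assuming Hive 3's `metastore.partition.management.task.frequency` property (value in seconds), hive-site.xml would be adjusted along these lines:

```xml
<!-- Hypothetical hive-site.xml fragment: run partition discovery every
     10 minutes instead of the 5-minute default. -->
<property>
  <name>metastore.partition.management.task.frequency</name>
  <value>600</value>
</property>
```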