
Copying a Hive external table from one database to another database

Explorer

Hi Guys,

I got a request to copy a large Hive external table with 72,883 partitions from one database to another. I am new to Hive, so I need your help copying this table. I have done the following steps so far and am not sure about the next step.

Do I need to add those 72,883 partitions to the target table manually, or is there another way to add them?

Your early response is very much appreciated.

 

Steps:

1. Created the external table in the new database with the same definition as the source table, but with a different location.

2. Copied the CSV files from the source table location to the target table location.
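
For reference, a minimal way to see which partitions each table currently has registered in the metastore; the database and table names (src_db, tgt_db, big_table) here are hypothetical:

    -- The target table will list no partitions until they are added,
    -- even though the copied files already sit in its location
    SHOW PARTITIONS src_db.big_table;
    SHOW PARTITIONS tgt_db.big_table;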

 

 

 

Regards

~Suresh D

 


8 REPLIES

Master Mentor

@Uppal 

 

Best way to duplicate a partitioned table in Hive (a sketch of the full sequence follows the list):

  • Create the new target table with the schema of the old table; DESCRIBE FORMATTED (or SHOW CREATE TABLE) can help generate the DDL.
  • Use hadoop fs -cp to copy all the partition directories from the source location to the target table location.
  • Run MSCK REPAIR TABLE table_name; on the target table.
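
Put together, the sequence might look like the sketch below. The database names (src_db, tgt_db), table name (big_table), and HDFS paths are hypothetical, used only for illustration:

    -- 1. Generate the DDL for the target table; edit the database name
    --    and the LOCATION clause before running it
    SHOW CREATE TABLE src_db.big_table;

    -- 2. From a shell, copy every partition directory to the new location:
    --      hadoop fs -cp /data/src_db/big_table/* /data/tgt_db/big_table/

    -- 3. Register all copied partition directories in the metastore at once
    MSCK REPAIR TABLE tgt_db.big_table;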

HTH

Master Mentor

@Uppal 

Any updates on this thread?

Explorer

Thank you for following up on this. I followed the steps below to copy the Hive external table successfully.

 

Steps:

1. Created the target external table in the new database with the same definition as the source table, but with a different location. Extracted the definition of the source table with "SHOW CREATE TABLE <Hive external source table>".

2. Copied the CSV files from the source table location to the target table location with "hadoop fs -cp".

3. Added the 72,883 partitions with the command below; I had to write a script to generate one statement per partition (a sketch of such a script follows).

    ALTER TABLE <Hive external target table> ADD PARTITION (<partition spec>);

         
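A sketch of how such a script might generate the statements, assuming a single partition column named dt and the same hypothetical names and paths as above; the actual script used in this thread is not shown and may differ:

    # Emit one ADD PARTITION statement per copied partition directory
    hadoop fs -ls /data/tgt_db/big_table \
      | grep -o 'dt=[^/ ]*$' \
      | while read part; do
          echo "ALTER TABLE tgt_db.big_table ADD PARTITION (dt='${part#dt=}');"
        done > add_partitions.hql
    # Then execute the generated statements, for example:
    #   beeline -u <jdbc url> -f add_partitions.hql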

Regards

Suresh D

Master Mentor

@Uppal 

Great, that worked out for you. Did you run MSCK REPAIR TABLE table_name; on the target table?

If you found this answer addressed your initial question, please take a moment to log in and click "accept" on the answer.

Happy hadooping

 

Explorer

Hi Shelton,

 

I have not run "MSCK REPAIR TABLE table_name;" on the target table, and I am not sure whether it is needed. Why do we have to run that command? My target table is working fine without it.

 

Regards

~Suresh D

 

Master Mentor

@Uppal 

 

Great. If all went well, note that we usually run MSCK REPAIR TABLE daily once new partitions have been loaded into the HDFS location.

Why do you need to run the MSCK REPAIR TABLE statement after each ingestion?

Hive stores a list of partitions for each table in its metastore. If new partitions are added directly to HDFS, the metastore (and hence Hive) will not be aware of them unless they are added in one of two ways: with ALTER TABLE ... ADD PARTITION (as you did), or with MSCK REPAIR TABLE, which adds metadata to the Hive metastore for any partitions for which such metadata doesn't already exist.

If you find the need, remember to do that; otherwise, please accept the answer and close the thread.
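
A minimal illustration of the point, using the same hypothetical names and paths as above:

    -- Suppose a new partition directory is created directly in HDFS:
    --   hadoop fs -mkdir /data/tgt_db/big_table/dt=2021-01-02
    --   hadoop fs -put new_day.csv /data/tgt_db/big_table/dt=2021-01-02/
    -- The metastore has no entry for it yet, so Hive cannot see the data:
    SHOW PARTITIONS tgt_db.big_table;   -- dt=2021-01-02 is missing
    -- MSCK adds metastore entries for all directories that lack them:
    MSCK REPAIR TABLE tgt_db.big_table;
    SHOW PARTITIONS tgt_db.big_table;   -- dt=2021-01-02 is now listed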


Explorer

Thank you, Shelton, for sharing. I did not know about it. Next time I will run this command instead of adding the partitions with an ALTER TABLE statement per partition.

 

Thank you again.

Suresh D

 

Explorer

Hi Shelton,

 

I did find a note describing another way to add the partitions (quoted below). Are you aware of it? If so, do you see any issues with it?

 

Regards

~Suresh D

 

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive-automate-msck.html

Automate partition discovery and repair

Hive automatically and periodically discovers discrepancies in partition metadata in the Hive metastore and corresponding directories on the file system, and then performs synchronization. Automating this operation is especially helpful for log data and for data in Spark and Hive catalogs.

The discover.partitions table property enables and disables synchronization of the file system with partitions. In external partitioned tables, this property is enabled (true) by default when you create the table using Hive in HDP 3.1.4 and later. For a legacy external table (created using an earlier version of Hive), add discover.partitions to the table properties to enable partition discovery. By default, the discovery and synchronization of partitions occurs every 5 minutes, but you can configure the frequency as shown in this task.

  1. Assuming you have an external table created using a version of Hive that does not support partition discovery, enable partition discovery for the table:
    ALTER TABLE exttbl SET TBLPROPERTIES ('discover.partitions' = 'true');
  2. Set synchronization of partitions to occur every 10 minutes, expressed in seconds: in Ambari > Hive > Configs, set metastore.partition.management.task.frequency to 600.
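
One way to confirm afterwards that the property took effect, using the table name from the docs example (exttbl):

    -- Returns 'true' once partition discovery is enabled on the table
    SHOW TBLPROPERTIES exttbl("discover.partitions");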