12-28-2015 12:57 AM
I am trying to copy Parquet files between two Impala databases that run on two different Cloudera installations.
I created the tables on the target with exactly the same column types and partitions.
I copied the Parquet files from the source to the target file system, moved them into the table's HDFS location, and then ran the refresh <table> command in impala-shell.
For non-partitioned tables --> there is no problem after the refresh command. I can see all the data in the target tables.
For partitioned tables --> there is no data after the refresh command, so I tried the following procedure
When I check the target table, I can see the data, except in the column(s) that were created as partition columns on the source table.
What is the proper way to copy Parquet files between two Impala databases when the tables have partitions?
12-28-2015 02:27 PM
For partitioned tables, you also need to re-create the partition metadata in the target table. A refresh alone will not re-create that metadata. Here are a few options for re-creating the partitions in the new table:
1. Use Impala's "ALTER TABLE new_table RECOVER PARTITIONS" (available in Impala 2.3 and later)
2. Use Hive's "MSCK REPAIR TABLE new_table" to recover the partition metadata
3. Re-create the partitions manually with ALTER TABLE new_table ADD PARTITION(...)
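For option 3, the ADD PARTITION statements can be generated from the partition directory names instead of being typed by hand. The following is only a rough sketch in Python, assuming the copied files sit in Hive-style directories such as year=2015/country=TR under the table location. The function names, the example column names, and the string_cols parameter are hypothetical and not part of Impala; the caller has to say which partition columns are strings so that their values get quoted.

```python
def partition_spec(rel_dir, string_cols=()):
    """Turn 'year=2015/country=TR' into "year=2015, country='TR'".

    rel_dir is a partition directory path relative to the table location.
    string_cols names the partition columns whose values must be quoted.
    """
    parts = []
    for segment in rel_dir.strip("/").split("/"):
        col, sep, value = segment.partition("=")
        if not sep or not col or not value:
            raise ValueError("not a col=value directory: %r" % segment)
        if col in string_cols:
            # Assumption: values contain no quote characters.
            if "'" in value:
                raise ValueError("quote in partition value: %r" % value)
            value = "'" + value + "'"
        parts.append("%s=%s" % (col, value))
    return ", ".join(parts)


def add_partition_statements(table, rel_dirs, string_cols=()):
    """Build one ALTER TABLE ... ADD PARTITION statement per directory."""
    return [
        "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s);"
        % (table, partition_spec(d, string_cols))
        for d in sorted(set(rel_dirs))
    ]
```

The resulting statements could be saved to a file and run with impala-shell -f. The list of partition directories would have to come from a listing of the table's HDFS location, which is not shown here.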
12-29-2015 01:36 AM
Thanks for your reply
1.) I am running the following Impala and CDH versions. Can I easily upgrade only Impala?
2.) I applied the following steps to a partitioned and a non-partitioned table, but it failed again.
I think there is an error in my steps, but I have not found it yet. Any advice?
3.) I am getting many Parquet tables from an external source, and I have many tables with various partitions (not only datetime info), so I am trying to automate this. I need a slightly easier way to automate it. For now, I have focused on 1 and 2 :)
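One way the automation for many tables might look on an Impala version without RECOVER PARTITIONS is to run Hive's MSCK REPAIR TABLE per table and then reload the table metadata in Impala. The sketch below only builds the command lines and does not run them. It assumes the hive and impala-shell clients are on the PATH, that the tables are in the default database, and that the files are already in the right partition directories; repair_commands and the impalad host value are hypothetical names for illustration.

```python
def repair_commands(tables, impalad="localhost"):
    """Return, per table, the Hive and Impala commands as argument lists.

    Assumption: each table's files already sit in col=value directories
    under the table location, so Hive can discover the partitions.
    """
    commands = []
    for table in tables:
        # Hive scans the table location and registers missing partitions.
        commands.append(["hive", "-e", "MSCK REPAIR TABLE %s" % table])
        # Impala then reloads the table metadata from the metastore.
        commands.append(
            ["impala-shell", "-i", impalad, "-q",
             "INVALIDATE METADATA %s" % table]
        )
    return commands
```

Each argument list could be passed to subprocess.check_call so that a failure on one table stops the run before the next table is processed.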