
parquet file copy on partitioned tables




I am trying to copy Parquet files between two Impala databases that run on two different Cloudera installations.


I created the tables on the target with exactly the same column types and partitions.

I copied the Parquet files from the source to the target file system, moved them into the HDFS location, and then ran the refresh <table> command in impala-shell.


For non-partitioned tables --> there is no problem. After the refresh command I can see all the data in the target tables.


For partitioned tables --> there is no data after the refresh command, so I tried the following procedure:


  • dropped the partitioned tables from the target
  • recreated the tables on the target without partitions, defining the source table's partition columns as normal columns
  • copied the Parquet files into the HDFS location
  • ran the refresh <table> command


When I check the target table, I can see the data, except in the columns that were created as partition columns on the source table.


What is the proper way to copy Parquet files between two Impala databases when the tables have partitions?






Master Collaborator

For partitioned tables, you also need to re-create the partition metadata in the target table. Just doing a refresh will not re-create that metadata. Here are a few options for re-creating the partitions in the new table:

1. Use Impala's "ALTER TABLE new_table RECOVER PARTITIONS" (available in Impala 2.3 and later)

2. Use Hive's "MSCK REPAIR TABLE new_table" to recover the partition metadata

3. Re-create the partitions manually with ALTER TABLE new_table ADD PARTITION(...)
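
For option 3, here is a rough sketch of how the statements could be generated. It assumes the copied files keep the usual key=value directory layout under the table location (for example .../year=2015/month=9/file.parq). With that layout the partition values are stored in the directory names rather than inside the Parquet files, which is why those columns come back empty when the files are dropped into a non-partitioned table. This is plain Python, not an Impala API. The function names, the example table "sales", its path, and the numeric_keys parameter are all made up for illustration. Values are single-quoted unless their column is listed as numeric, and no escaping is done.

```python
from pathlib import PurePosixPath


def partition_spec(file_path, table_root):
    """Parse key=value directories between table_root and the file name."""
    rel = PurePosixPath(file_path).relative_to(PurePosixPath(table_root))
    spec = []
    for part in rel.parts[:-1]:  # skip the file name itself
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"not a key=value partition directory: {part!r}")
        spec.append((key, value))
    return spec


def add_partition_statements(table, table_root, file_paths, numeric_keys=()):
    """Build one ALTER TABLE ... ADD PARTITION statement per distinct partition."""
    seen = []
    for path in file_paths:
        spec = partition_spec(path, table_root)
        if spec and spec not in seen:  # keep first-seen order, drop duplicates
            seen.append(spec)
    statements = []
    for spec in seen:
        cols = ", ".join(
            f"{k}={v}" if k in numeric_keys else f"{k}='{v}'" for k, v in spec
        )
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({cols});"
        )
    return statements


# Hypothetical example: paths as they might be listed under the table location.
root = "/user/hive/warehouse/mydb.db/sales"
paths = [
    root + "/year=2015/month=9/a.parq",
    root + "/year=2015/month=9/b.parq",
    root + "/year=2015/month=10/c.parq",
]
for stmt in add_partition_statements("sales", root, paths, {"year", "month"}):
    print(stmt)
```

The printed statements can then be run through impala-shell or Hive. If the files were copied flat, without the key=value directories, the partition values cannot be recovered from the paths this way.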




Hello Alex


Thanks for your reply


1.) I am running the following Impala and CDH versions. Can I easily upgrade only Impala?
impala.x86_64                  2.2.0+cdh5.4.5+0-1.cdh5.4.5.p0.8.el6
impala-shell.x86_64            2.2.0+cdh5.4.5+0-1.cdh5.4.5.p0.8.el6


2.) I applied the following steps with both a non-partitioned and a partitioned target table, but it failed again:

  • dropped table
  • recreated table without partitioned columns
  • copied parquet files into same directory (I collected directory information from show create table output)
  • run msck repair table <tablename> in hive CLI
  • there is no data in partitioned columns


  • dropped table
  • recreated table with partitioned columns
  • copied parquet files into same directory
  • run msck repair table <tablename> in hive CLI
  • no data in table

I think there is a mistake in my steps, but I have not found it yet. Any advice?


3.) I receive many Parquet tables from an external source, with various partition columns (not only date/time information), and I am trying to automate the process. So I need a somewhat easier way to automate it. For now, I am focusing on 1 and 2 🙂


Many thanks