
parquet file copy on partitioned tables


Contributor

Hello

I am trying to copy Parquet files between two Impala databases that run on two different Cloudera installations.

I created the tables on the target with exactly the same column types and partitions.

I copied the Parquet files from the source to the target's local file system, moved them into each table's HDFS location, and then ran the REFRESH <table> command in impala-shell.

For non-partitioned tables: there is no problem after the REFRESH command. I can see all the data in the target tables.

For partitioned tables: there is no data after the REFRESH command, so I tried the following procedure:

  • dropped the partitioned tables on the target
  • recreated the tables on the target without partitions, defining the source table's partition columns as normal columns
  • copied the Parquet files into the HDFS location
  • ran the REFRESH command on the table

When I check the target table, I can see the data, except that the column(s) that are partition columns on the source table are empty.
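A likely explanation for the empty columns: in Hive-style partitioned tables, which Impala also uses, the partition column values are encoded in the directory names (for example .../year=2015/month=06/) and are normally not written into the Parquet data files themselves. When the same files are read through a non-partitioned table, those columns have nothing to read from and come back as NULL. The minimal Python sketch below shows where the values actually live by parsing them out of a file's path; the function name and the example paths are hypothetical.

```python
from urllib.parse import unquote


def partition_values(file_path: str, table_root: str) -> dict:
    """Return the partition key/value pairs encoded in the directories
    between table_root and the data file (Hive-style key=value names)."""
    root = table_root.rstrip("/") + "/"
    if not file_path.startswith(root):
        raise ValueError(f"{file_path} is not under {table_root}")
    # Drop the table root and the file name; keep only the directory segments.
    segments = file_path[len(root):].split("/")[:-1]
    values = {}
    for segment in segments:
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"not a key=value directory: {segment!r}")
        # Hive percent-escapes special characters in partition directory names.
        values[key] = unquote(value)
    return values
```

For a file such as /user/hive/warehouse/sales/year=2015/month=06/part-0001.parq under the root /user/hive/warehouse/sales, this returns {"year": "2015", "month": "06"}. None of that information is inside the file, which is why it is lost once the directory structure is flattened.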

 

What is the proper way to copy Parquet files between two Impala databases when the tables are partitioned?

Thanks


Re: parquet file copy on partitioned tables

Master Collaborator

For partitioned tables, you also need to re-create the partition metadata in the target table. A REFRESH alone will not re-create that metadata. Here are a few options for re-creating the partitions in the new table:

1. Use Impala's "ALTER TABLE new_table RECOVER PARTITIONS" (available in Impala 2.3 and later)

2. Use Hive's "MSCK REPAIR TABLE new_table" to recover the partition metadata

3. Re-create the partitions manually with ALTER TABLE new_table ADD PARTITION(...)
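For option 3, the ALTER TABLE ... ADD PARTITION statements can be generated from the list of partition directories that were copied, which is also a workable route on releases older than Impala 2.3. The following is a sketch in Python, assuming the caller supplies the partition columns with their types in PARTITIONED BY order and the directories relative to the table root; the function name and the example table are hypothetical. String-typed values are quoted, other types are emitted as-is.

```python
from urllib.parse import unquote

STRING_TYPES = ("string", "varchar", "char")


def add_partition_statements(table, partition_cols, partition_dirs):
    """Build one ALTER TABLE ... ADD PARTITION statement per directory.

    partition_cols: list of (column_name, impala_type) in PARTITIONED BY order.
    partition_dirs: directories relative to the table root, e.g. "year=2015/country=TR".
    """
    statements = []
    for rel_dir in partition_dirs:
        # "year=2015/country=TR" -> {"year": "2015", "country": "TR"}
        pairs = dict(seg.split("=", 1) for seg in rel_dir.strip("/").split("/"))
        spec = []
        for name, col_type in partition_cols:
            value = unquote(pairs[name])  # undo Hive's %XX escaping in directory names
            if col_type.lower().startswith(STRING_TYPES):
                # Quote string values, escaping backslashes and single quotes.
                value = "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
            spec.append(f"{name}={value}")
        statements.append(f"ALTER TABLE {table} ADD PARTITION ({', '.join(spec)});")
    return statements
```

For example, add_partition_statements("new_table", [("year", "int"), ("country", "string")], ["year=2015/country=TR"]) returns ["ALTER TABLE new_table ADD PARTITION (year=2015, country='TR');"]. The partition spec always follows the declared column order, regardless of how the directory path is written.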

 

 

Re: parquet file copy on partitioned tables

Contributor

Hello Alex

Thanks for your reply.

1.) I am running the following Impala and CDH versions. Can I easily upgrade only Impala?
impala.x86_64                  2.2.0+cdh5.4.5+0-1.cdh5.4.5.p0.8.el6
impala-shell.x86_64            2.2.0+cdh5.4.5+0-1.cdh5.4.5.p0.8.el6
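Independent of the upgrade question, an automated copy job can check the installed package version to decide whether ALTER TABLE ... RECOVER PARTITIONS (Impala 2.3 and later, per the reply above) is available. A small Python sketch, assuming the version string begins with the Impala release number as in the package listing above; the function names are hypothetical.

```python
import re


def impala_release(package_version: str) -> tuple:
    """Extract (major, minor, patch) from a version like '2.2.0+cdh5.4.5+0-1...'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {package_version!r}")
    return tuple(int(part) for part in match.groups())


def supports_recover_partitions(package_version: str) -> bool:
    # ALTER TABLE ... RECOVER PARTITIONS first appeared in Impala 2.3.
    return impala_release(package_version)[:2] >= (2, 3)
```

With the 2.2.0 package shown above, supports_recover_partitions returns False, so partitions would have to be registered through Hive or with explicit ADD PARTITION statements. Comparing tuples rather than strings keeps a release such as 2.10 correctly ordered after 2.3.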

 

2.) I tried the following two sequences, one with a non-partitioned and one with a partitioned table definition, but both failed:

  • dropped the table
  • recreated the table without partitions
  • copied the Parquet files into the same directory (I took the directory from the SHOW CREATE TABLE output)
  • ran msck repair table <tablename> in the Hive CLI
  • result: the columns that are partition columns on the source have no data

  • dropped the table
  • recreated the table with partition columns
  • copied the Parquet files into the same directory
  • ran msck repair table <tablename> in the Hive CLI
  • result: no data in the table

I think one of my steps is wrong, but I have not found it yet. Any advice?
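A likely cause, consistent with both results: MSCK REPAIR TABLE (and Impala's RECOVER PARTITIONS) only registers partitions for subdirectories named key=value under the table location, nested in PARTITIONED BY order, and a partitioned table does not read files placed directly in its root directory. If the Parquet files were copied flat into the table directory instead of keeping the source's partition subdirectories, the partitioned table ends up with no partitions and therefore no data. (The empty columns in the first sequence are expected for any non-partitioned definition, since those values are not stored in the files.) The Python sketch below checks copied file paths, relative to the table root, against the expected partition columns; the function name and example paths are hypothetical.

```python
def check_partition_layout(relative_files, partition_cols):
    """Return a list of problems; an empty list means every file sits under
    key=value directories whose keys match partition_cols, in order."""
    expected = list(partition_cols)
    problems = []
    for rel_path in relative_files:
        # All path segments except the last one (the file name) are directories.
        dirs = rel_path.strip("/").split("/")[:-1]
        keys = [d.split("=", 1)[0] if "=" in d else None for d in dirs]
        if keys != expected:
            wanted = "/".join(f"{col}=<value>" for col in expected)
            problems.append(f"{rel_path}: directories {dirs}, expected {wanted}/<file>")
    return problems
```

A file listed as part-0001.parq directly under the root of a table partitioned by (year, month) is reported as a problem, while year=2015/month=06/part-0001.parq passes. Any reported file needs to be moved into its partition directory before running MSCK REPAIR TABLE.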

 

3.) I receive many Parquet tables from an external source, with various partition schemes (not only date/time columns), and I am trying to automate the whole process, so I need a simpler approach that can be scripted. For now, I have focused on 1 and 2. :)
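For automation across many tables, the per-table work can be reduced to a small plan: a non-partitioned table only needs REFRESH; on Impala 2.3 and later a partitioned table can use ALTER TABLE ... RECOVER PARTITIONS; on older releases Hive's MSCK REPAIR TABLE registers the partitions, and Impala then needs INVALIDATE METADATA to pick up the metastore changes. The Python sketch below turns a table list into two script texts, one for the Hive CLI and one for impala-shell (both accept a file of statements with -f). It assumes the key=value partition directories are already in place and that the caller knows which tables are partitioned; the trailing REFRESH after RECOVER PARTITIONS is a cautious extra step that may be redundant, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TablePlan:
    hive: list = field(default_factory=list)    # statements for the Hive CLI
    impala: list = field(default_factory=list)  # statements for impala-shell


def plan_for_table(table: str, partitioned: bool, recover_supported: bool) -> TablePlan:
    if not partitioned:
        # Files are already in place; Impala only has to reload the file list.
        return TablePlan(impala=[f"REFRESH {table};"])
    if recover_supported:
        # Impala 2.3+: scan key=value subdirectories and register them as partitions.
        # The REFRESH afterwards is a cautious extra step and may be redundant.
        return TablePlan(impala=[f"ALTER TABLE {table} RECOVER PARTITIONS;",
                                 f"REFRESH {table};"])
    # Older Impala: let Hive register the partitions in the metastore,
    # then make Impala reload this table's metadata from the metastore.
    return TablePlan(hive=[f"MSCK REPAIR TABLE {table};"],
                     impala=[f"INVALIDATE METADATA {table};"])


def build_scripts(tables: dict, recover_supported: bool) -> tuple:
    """tables maps 'db.table' -> True if partitioned. Returns (hive_script, impala_script)."""
    hive_lines, impala_lines = [], []
    for table, partitioned in sorted(tables.items()):
        plan = plan_for_table(table, partitioned, recover_supported)
        hive_lines.extend(plan.hive)
        impala_lines.extend(plan.impala)
    return "\n".join(hive_lines), "\n".join(impala_lines)
```

The Hive script has to run before the Impala script, because INVALIDATE METADATA only helps once the partitions exist in the metastore. Keeping the decision in one function makes it easy to switch every table to RECOVER PARTITIONS after an upgrade by flipping recover_supported.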

 

Many thanks

Suluhan