Hive table storage format conversion from TEXT to Parquet

New Contributor

Hi all,

I am running a POC on our DEV cluster. I tried to change the file format with the command below:

alter table students set fileformat PARQUET;

 

I get the error below when I run a SELECT on the students table:

Error: java.io.IOException: java.lang.RuntimeException: hdfs://nameservice1/<path> is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [46, 51, 49, 10] (state=,code=0)

 

Is there a workaround to convert existing Hive tables from 'org.apache.hadoop.mapred.TextInputFormat' to 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'?

 

Thanks in advance.

1 REPLY

Super Guru

@Genentech I am not sure if this is the answer you are looking for, but my recommendation is to leave your original table as is and select from it into a new Parquet table. I am a firm believer in keeping backup, staging, or temporary copies of the original data source at each step on the way to the final table.
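
One caveat, assuming the ALTER TABLE statement from the question has already been run: as far as I know, SET FILEFORMAT only updates the table metadata and does not rewrite the data files. That would explain the error, since [80, 65, 82, 49] is the ASCII for "PAR1" (the Parquet magic number) and the bytes actually found, [46, 51, 49, 10], are ".31" followed by a newline, i.e. plain text. Until the metadata points back at the text format, a SELECT on the original table will keep failing. A minimal sketch for restoring it, using the students table from the question; the SerDe line is an assumption for a plain delimited text table and may be unnecessary on your Hive version, so check the DESCRIBE FORMATTED output first:

```sql
-- Inspect the current InputFormat, OutputFormat and SerDe of the table.
DESCRIBE FORMATTED students;

-- Point the metadata back at plain text; the data files are not touched.
ALTER TABLE students SET FILEFORMAT TEXTFILE;

-- Assumption: the table originally used the default text SerDe.
-- Only needed if DESCRIBE FORMATTED still shows the Parquet SerDe.
ALTER TABLE students SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe';

-- Should now return rows instead of the "is not a Parquet file" error.
SELECT * FROM students LIMIT 10;
```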

 

Make a new empty table stored in the Parquet format you want. Its column definitions must match those of the source table. Next execute:

 

INSERT INTO final_table SELECT * from source_table;
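
For reference, a sketch of the whole copy, assuming a hypothetical two-column schema (id, name); substitute the real column list from the source table. final_table and source_table are the placeholder names used in the statement above.

```sql
-- Hypothetical columns: copy the real definitions from
-- DESCRIBE source_table.
CREATE TABLE final_table (
  id   INT,
  name STRING
)
STORED AS PARQUET;

-- Reads the text rows and rewrites them as Parquet files.
INSERT INTO TABLE final_table SELECT * FROM source_table;

-- Sanity check: both counts should agree before touching the original.
SELECT COUNT(*) FROM source_table;
SELECT COUNT(*) FROM final_table;
```

Hive also accepts CREATE TABLE ... STORED AS PARQUET AS SELECT ..., which creates and fills the table in one statement, if you do not need to declare the columns yourself.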

 

If you need to keep the original table name, rename or drop the original table first, then rename final_table to the original name.
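
The swap could look like this, keeping the text copy as a backup rather than dropping it; source_table_text_backup is a made-up name:

```sql
-- Move the original text table out of the way (kept as a backup).
ALTER TABLE source_table RENAME TO source_table_text_backup;

-- Give the Parquet table the original name.
ALTER TABLE final_table RENAME TO source_table;

-- Optional, once the new table has been verified:
-- DROP TABLE source_table_text_backup;
```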

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further questions on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

 

Thanks,


Steven @ DFHZ