Can we apply partitioning to an already existing Hive table?

Rising Star

Hi,

We have some fact tables that contain a large number of rows. They are currently partitioned by month. It is quite likely that in the near future we will need to partition by week number instead. Because the UPDATE command is missing in Hive, whenever we need to update historical data we just drop the partition and create a new one. So partitioning is necessary.
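For illustration, the drop-and-recreate approach described above might look roughly like the following HiveQL. This is only a sketch: the table names (`sales_fact`, `sales_staging`), the columns, and the partition column `sale_month` are hypothetical, and `sale_date` is assumed to be a string in `yyyy-MM-dd` form.

```sql
-- Hypothetical names throughout; adjust to the real schema.
-- Step 1: drop the month partition that holds the stale historical data.
ALTER TABLE sales_fact DROP IF EXISTS PARTITION (sale_month = '2016-01');

-- Step 2: recreate that partition from the corrected source rows.
INSERT INTO TABLE sales_fact PARTITION (sale_month = '2016-01')
SELECT order_id, amount, sale_date
FROM sales_staging
WHERE substr(sale_date, 1, 7) = '2016-01';  -- assumes 'yyyy-MM-dd' strings
```

The two steps can also be collapsed into one statement with `INSERT OVERWRITE TABLE sales_fact PARTITION (sale_month = '2016-01') ...`, which replaces the contents of the partition in place.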

I am wondering: is it possible to apply partitioning on existing columns of a Hive table?

How do we handle the situation where we have to apply partitioning dynamically based on the load?

I think dropping and recreating the table for most of these requirements is not a good approach.

11 REPLIES

Master Guru

http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data

Are you using CDH or HDP? On HDP I would propose the ORC format. It's very similar to Parquet, just better supported and tested on HDP.

If your load from SQL Server is slow, it's most likely not the Hive table creation but Sqoop. You could increase the number of mappers, but there might not be an easy fix. If the problems are in the INSERT INTO, you can look at the PPT for tips (specifically the distribution methods near the end).
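As a rough sketch of the kind of INSERT this refers to, the HiveQL below copies an existing month-partitioned table into a new ORC table partitioned by week, using dynamic partitioning and DISTRIBUTE BY. The table and column names (`sales_fact`, `sales_fact_by_week`, `order_id`, `amount`, `sale_date`) are hypothetical, `sale_date` is assumed to be a `yyyy-MM-dd` string, and the slides may recommend a different distribution key. Hive does not change the partition columns of an existing table in place, so the sketch writes to a new table.

```sql
-- Hypothetical names throughout; adjust to the real schema.
-- Let Hive derive partition values from the data instead of a static spec.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- New table with the desired weekly partitioning, stored as ORC.
CREATE TABLE sales_fact_by_week (
  order_id  BIGINT,
  amount    DOUBLE,
  sale_date STRING
)
PARTITIONED BY (sale_year INT, sale_week INT)
STORED AS ORC;

-- Dynamic-partition insert: partition columns come last in the SELECT list.
-- DISTRIBUTE BY sends all rows of one partition to the same reducer,
-- which keeps the number of files written per partition small.
INSERT OVERWRITE TABLE sales_fact_by_week PARTITION (sale_year, sale_week)
SELECT order_id, amount, sale_date, sale_year, sale_week
FROM (
  SELECT order_id,
         amount,
         sale_date,
         year(sale_date)       AS sale_year,
         weekofyear(sale_date) AS sale_week  -- ISO week; may not match the calendar year around 1 January
  FROM sales_fact
) t
DISTRIBUTE BY sale_year, sale_week;
```

On a distribution where Parquet is preferred, `STORED AS PARQUET` could be swapped in for `STORED AS ORC`; the rest of the sketch stays the same.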

Rising Star

We are on CDH. I will have a look at the PPT. Can you also answer my other comment at https://community.hortonworks.com/questions/14313/facing-issues-while-ingesting-data-into-hive.html