Created 02-08-2016 10:44 AM
Hi,
We have some fact tables that contain a large number of rows. Partitioning is currently applied by month. It is quite likely that in the near future we will need to partition by week number instead. Since Hive lacks an UPDATE command, whenever we need to update historical data we simply drop the affected partition and create a new one, so partitioning is essential for us.
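For reference, the drop-and-reload pattern described above might look like this in HiveQL (table, column, and partition names here are hypothetical, just to illustrate the shape):

```sql
-- Hypothetical monthly-partitioned fact table
CREATE TABLE IF NOT EXISTS fact_sales (
  order_id BIGINT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (load_month STRING)
STORED AS ORC;

-- Replace a historical month: drop the partition, then reload it
ALTER TABLE fact_sales DROP IF EXISTS PARTITION (load_month = '2016-01');

INSERT OVERWRITE TABLE fact_sales PARTITION (load_month = '2016-01')
SELECT order_id, amount
FROM   staging_sales
WHERE  load_month = '2016-01';
```

Note that INSERT OVERWRITE on a static partition already replaces that partition's data, so the explicit DROP is only needed when you want the partition gone without reloading it.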
I am wondering: is it POSSIBLE to apply partitioning on existing columns of a Hive table?
How do we handle situations where partitioning has to be applied dynamically based on the load?
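One common approach to changing the partition scheme (names below are illustrative, not from this thread) is to create a new table with the desired partitioning and repopulate it with a dynamic-partition insert, rather than trying to alter the existing table in place:

```sql
-- Enable dynamic partitioning for this session
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- New table partitioned by week instead of month
CREATE TABLE fact_sales_by_week (
  order_id BIGINT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (load_week STRING)
STORED AS ORC;

-- Hive routes each row to its partition based on the last SELECT column;
-- fact_sales_src and order_date are assumed placeholders
INSERT OVERWRITE TABLE fact_sales_by_week PARTITION (load_week)
SELECT order_id,
       amount,
       CONCAT(YEAR(order_date), '-', WEEKOFYEAR(order_date)) AS load_week
FROM   fact_sales_src;
```

The old table can then be dropped or kept as an archive; downstream queries are repointed (or a view is swapped) once the new table is verified.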
I think dropping and recreating the table for most requirements is not a good approach.
Created 02-08-2016 03:16 PM
http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
Are you using CDH or HDP? On HDP I would propose the ORC format. It is very similar to Parquet and just better supported and tested there.
If your load from SQL Server is slow, it is most likely not the Hive side but Sqoop. You could increase the number of mappers, but there may not be an easy fix. If the problem is in the INSERT INTO, the slides have some tips (specifically the distribution methods near the end).
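Increasing the mapper count in Sqoop might look like the sketch below; the connection string, table, and split column are placeholders you would replace with your own, and `--split-by` should point at an evenly distributed key so the mappers get balanced slices:

```
# Illustrative Sqoop import from SQL Server; all identifiers are placeholders
sqoop import \
  --connect "jdbc:sqlserver://<host>:1433;database=<db>" \
  --username <user> \
  --password-file /user/etl/.sqoop-pwd \
  --table dbo.FactSales \
  --split-by order_id \
  --num-mappers 8 \
  --hive-import \
  --hive-table fact_sales
```

More mappers only help up to the point where the source database or network becomes the bottleneck, so it is worth increasing the count incrementally and measuring.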
Created 02-08-2016 03:26 PM
We are on CDH. I will have a look at the PPT. Could you also answer my other comment at https://community.hortonworks.com/questions/14313/facing-issues-while-ingesting-data-into-hive.html