Created 02-08-2016 10:44 AM
Hi,
We have some fact tables that contain a large number of rows. Partitioning is currently applied by month. It is quite likely that in the near future we will need to partition by week number instead. Since Hive lacks an UPDATE command, whenever we need to update historical data we simply drop the affected partition and create a new one, so partitioning is essential for us.
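For reference, the drop-and-reload pattern described above might look like this in HiveQL (table, column, and partition names here are hypothetical, just to illustrate the shape):

```sql
-- Hypothetical monthly-partitioned fact table
CREATE TABLE IF NOT EXISTS fact_sales (
  order_id BIGINT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (load_month STRING)
STORED AS ORC;

-- Replace a historical month: drop the partition, then reload it
ALTER TABLE fact_sales DROP IF EXISTS PARTITION (load_month = '2016-01');

INSERT OVERWRITE TABLE fact_sales PARTITION (load_month = '2016-01')
SELECT order_id, amount
FROM   staging_sales
WHERE  load_month = '2016-01';
```

Note that INSERT OVERWRITE on a static partition already replaces that partition's data, so the explicit DROP is only needed when you want the partition gone without reloading it.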
I am wondering: is it POSSIBLE to apply partitioning on existing columns of a Hive table?
How do we handle situations where partitioning has to be applied dynamically based on the load?
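One common approach to changing the partition scheme (names below are illustrative, not from this thread) is to create a new table with the desired partitioning and repopulate it with a dynamic-partition insert, rather than trying to alter the existing table in place:

```sql
-- Enable dynamic partitioning for this session
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- New table partitioned by week instead of month
CREATE TABLE fact_sales_by_week (
  order_id BIGINT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (load_week STRING)
STORED AS ORC;

-- Hive routes each row to its partition based on the last SELECT column;
-- fact_sales_src and order_date are assumed placeholders
INSERT OVERWRITE TABLE fact_sales_by_week PARTITION (load_week)
SELECT order_id,
       amount,
       CONCAT(YEAR(order_date), '-', WEEKOFYEAR(order_date)) AS load_week
FROM   fact_sales_src;
```

The old table can then be dropped or kept as an archive; downstream queries are repointed (or a view is swapped) once the new table is verified.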
I think dropping and recreating the table for most requirements is not a good approach.
Created 02-08-2016 03:16 PM
http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data
Are you using CDH or HDP? On HDP I would propose the ORC format. It is very similar to Parquet and just better supported and tested there.
If your load from SQL Server is slow, it is most likely not the Hive side but Sqoop. You could increase the number of mappers, but there may not be an easy fix. If the problem is in the INSERT INTO, the slides have some tips (specifically the distribution methods near the end).
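Increasing the mapper count in Sqoop might look like the sketch below; the connection string, table, and split column are placeholders you would replace with your own, and `--split-by` should point at an evenly distributed key so the mappers get balanced slices:

```
# Illustrative Sqoop import from SQL Server; all identifiers are placeholders
sqoop import \
  --connect "jdbc:sqlserver://<host>:1433;database=<db>" \
  --username <user> \
  --password-file /user/etl/.sqoop-pwd \
  --table dbo.FactSales \
  --split-by order_id \
  --num-mappers 8 \
  --hive-import \
  --hive-table fact_sales
```

More mappers only help up to the point where the source database or network becomes the bottleneck, so it is worth increasing the count incrementally and measuring.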
Created 02-08-2016 03:26 PM
We are on CDH. I will have a look at the PPT. Could you also answer my other comment at https://community.hortonworks.com/questions/14313/facing-issues-while-ingesting-data-into-hive.html