11-14-2013 03:35 PM
Problem statement: Read data from one partitioned Hive table into another partitioned Hive table.
A Hive INSERT query takes a lot of time for this, and I want to optimize it. So I am using a MapReduce program that avoids the shuffle and sort phases: a map-only job with zero reducers. The block size is 512GB and the input data size is 1TB, so the job runs 2810 mappers. I am using MultipleOutputs to write into the partitions.
My problem is that the mappers emit about 1 lakh (100,000) output part files, which means each partition has 1620 output part files. For example:
/user/hive/warehouse/viji/visit_yr="2013"/month="12"/date="2"/* |wc -l
I have 12 months and 30 days like this, so total part files = 3*12*1620 = 1 lakh+.
Even though the data is copied very fast, queries take a lot of time to fetch results because there are 1 lakh part files.
Can anyone please help me control the number of part files in the mappers' output?
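To make the part-file count described above concrete, here is a rough sketch in plain Java (no Hadoop dependency). It relies on the fact that in a map-only job each mapper writes its own file into every partition it emits rows for, so the total is the sum over partitions of the mappers that touched each one. The class and method names are hypothetical, and treating the data as one year of 12 x 30 daily partitions with 1620 files each is an assumption based on the numbers in the post, not a measurement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartFileEstimate {

    /** Partition directory in the layout from the post (hypothetical helper, quotes omitted). */
    public static String partitionDir(int year, int month, int day) {
        return "visit_yr=" + year + "/month=" + month + "/date=" + day;
    }

    /**
     * Assumed layout: 12 months x 30 days for one year, where every partition
     * received rows from the same number of mappers.
     */
    public static Map<String, Integer> yearOfPartitions(int year, int mappersPerPartition) {
        Map<String, Integer> partitions = new LinkedHashMap<>();
        for (int month = 1; month <= 12; month++) {
            for (int day = 1; day <= 30; day++) {
                partitions.put(partitionDir(year, month, day), mappersPerPartition);
            }
        }
        return partitions;
    }

    /** Each mapper writes one file per partition it emits rows for, so the counts add up. */
    public static long totalPartFiles(Map<String, Integer> mappersPerPartition) {
        long total = 0;
        for (int mappers : mappersPerPartition.values()) {
            total += mappers;
        }
        return total;
    }

    /** Worst case: every mapper emits rows for every partition. */
    public static long upperBound(int mappers, int partitions) {
        return (long) mappers * partitions;
    }

    public static void main(String[] args) {
        Map<String, Integer> partitions = yearOfPartitions(2013, 1620);
        System.out.println("partitions:       " + partitions.size());
        System.out.println("estimated files:  " + totalPartFiles(partitions));
        System.out.println("worst-case files: " + upperBound(2810, partitions.size()));
    }
}
```

The point of the sketch is that the file count scales with mappers times partitions, so it only drops if fewer tasks write into each partition.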
11-14-2013 04:03 PM
I think the crux of your question relates to MapReduce, so I have moved this thread to that discussion board in the hope that some MR experts can help you here.