
Can we control the output part files of a mapper with no reducer?

Problem statement: Copy data from one partitioned Hive table into another partitioned Hive table.

A Hive INSERT query takes a long time for this, and I want to optimize it, so I am using a MapReduce program that avoids the shuffle and sort phases by running only mappers with zero reducers. The block size is 512 GB and the input data size is 1 TB, so the job runs 2810 mappers. I am writing with a MultipleOutput format to load the partitions, like

/user/hive/warehouse/viji/visit_yr="2013"/month="12"/date="2"/....
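As a rough sanity check on the mapper count, here is a small sketch. It assumes that "1TB" means 1024^4 bytes and that the job runs one mapper per input split. The 512 MiB split size in the last call is only a hypothetical comparison value, not something stated above.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4  # assumption: "1TB" in the post means 2**40 bytes


def implied_split_size_mib(input_bytes, num_mappers):
    """Average input per mapper in MiB, assuming one mapper per split."""
    return input_bytes / num_mappers / MIB


def expected_mappers(input_bytes, split_bytes):
    """Number of splits if every split holds split_bytes (the last may be short)."""
    return math.ceil(input_bytes / split_bytes)


# Average input each of the 2810 mappers would read from 1 TiB.
print(f"average split: {implied_split_size_mib(TIB, 2810):.0f} MiB")

# Mapper counts for the stated 512 GB size and a hypothetical 512 MiB size.
print(expected_mappers(TIB, 512 * GIB))
print(expected_mappers(TIB, 512 * MIB))
```

Under these assumptions, 2810 mappers corresponds to a few hundred MiB of input per mapper, so the effective split size is far smaller than 512 GB.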

My problem is that the mappers emit over 1 lakh (100,000) output part files, which means each partition has 1620 output part files:

/user/hive/warehouse/viji/visit_yr="2013"/month="12"/date="2"/* |wc -l

1620

I have 12 months and 30 days like this, so total part files = 30 * 12 * 1620 = 583,200, which is well over 1 lakh.
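The file count can be modelled with a few lines of arithmetic. This is only a sketch: it assumes that in a map-only job each mapper writes at most one part file into each partition directory it has records for, and it uses the figures quoted above (12 months, 30 days per month, 1620 files in the sampled partition, 2810 mappers). The function names are made up for illustration.

```python
def max_files_per_partition(num_mappers):
    """Upper bound on files in one partition: one part file per mapper
    (assumption: each mapper opens one output file per partition path)."""
    return num_mappers


def total_part_files(months, days_per_month, files_per_partition):
    """Total part files if every (month, day) partition holds the same count."""
    partitions = months * days_per_month
    return partitions * files_per_partition


total = total_part_files(months=12, days_per_month=30, files_per_partition=1620)
print(total)            # total part files across all partitions
print(total / 100_000)  # the same count expressed in lakh

# The sampled partition (1620 files) stays under the per-partition bound.
print(1620 <= max_files_per_partition(2810))
```

In this model the total grows with (number of partitions) x (number of mappers that write to each partition), so the count drops only if fewer mappers write into each partition or the files are merged afterwards.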

The copy itself is very fast, but queries that read the data take a long time because there are over 1 lakh part files.

Can anyone please help me control the number of part files in the mappers' output?

Thanks,

Viji


Re: Can we control the output part files of a mapper with no reducer?

I think the crux of your question relates to MapReduce, so I have moved this thread to that discussion board in the hope that some MR experts can help you here.

Regards,

Clint
