Explorer
Posts: 62
Registered: ‎01-22-2014

CPU time when using HCatalog API in MapReduce

Hi,

When we use the HCatalog APIs in MapReduce (HCatalog InputFormat and OutputFormat), the CPU time used by the jobs shoots up drastically (nearly 80% more) compared with reading the same Hive data without HCatalog.

The MapReduce job does the same work as an ordinary Hive query: select date, count(1) from table group by date;

Is there a specific reason for this? Please let me know. Thanks!

Cloudera Employee
Posts: 322
Registered: ‎01-16-2014

Re: CPU time when using HCatalog API in MapReduce

I would expect a higher CPU load when using HCatalog, but not that much higher. Can you check whether the tasks are spending a lot of time in garbage collection (GC)? If they are, you might need to give the mappers and reducers a little more memory to compensate for the extra work they are doing.
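
As a rough sketch of the GC check suggested above: Hadoop reports per-job task counters such as GC_TIME_MILLIS and CPU_MILLISECONDS, and comparing the two gives a quick indication of how much of the work is garbage collection. The class name GcCheck, its helper methods, and the counter values in main are hypothetical placeholders, not numbers from this job; the ratio is only approximate because GC time is elapsed time while the CPU counter is CPU time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Hadoop or HCatalog.
public class GcCheck {

    // Ratio of GC time to CPU time; returns 0 when no CPU time was recorded.
    static double gcShare(long gcMillis, long cpuMillis) {
        if (cpuMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / cpuMillis;
    }

    // Accumulated GC time of the current JVM in milliseconds,
    // summed over all collectors that report a value.
    static long jvmGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder values: substitute the GC_TIME_MILLIS and
        // CPU_MILLISECONDS counters shown for your own job.
        long gcMillis = 42_000L;
        long cpuMillis = 180_000L;
        System.out.printf("GC share of CPU time: %.1f%%%n", 100 * gcShare(gcMillis, cpuMillis));
        System.out.println("GC time in this JVM so far: " + jvmGcMillis() + " ms");
    }
}
```

If the share turns out to be large, the usual knobs for giving tasks more memory are mapreduce.map.memory.mb and mapreduce.map.java.opts (and the mapreduce.reduce.* equivalents); suitable values depend on the cluster.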

Wilfred