
CPU time when using HCatalog API in MapReduce






When we use the HCatalog APIs in MapReduce (HCatalog InputFormat and OutputFormat) to select data from Hive, the CPU time used by the jobs shoots up drastically, to nearly 80% more than when we do not use HCatalog.


The operation done in MapReduce is the same as a normal Hive query: select date, count(1) from table group by date;



Are there any specific reasons for this? Please let me know. Thanks!



Re: CPU time when using HCatalog API in MapReduce


I would expect a higher CPU load when using HCatalog, but not that high. Can you check whether you are spending a lot of time in GC (garbage collection)? You might need to give the mappers/reducers a little more memory to work with, to compensate for the extra work they are doing.
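
To make the GC check concrete, here is a minimal sketch in plain Java. It assumes you copy two numbers from the job's counter output by hand: "GC time elapsed (ms)" and "CPU time spent (ms)". The class name `TaskTuning`, both helper methods, and the 80% heap-to-container ratio are illustrative assumptions, not part of Hadoop or HCatalog. The only Hadoop names used are the standard properties `mapreduce.map.memory.mb` and `mapreduce.map.java.opts`. GC time is elapsed time and CPU time is processor time, so treat the ratio as a rough indicator only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: rough GC check plus memory settings.
class TaskTuning {

    // Ratio of GC time to CPU time, both in milliseconds, as read from the
    // job counters. Returns 0.0 when no CPU time was recorded.
    static double gcFraction(long gcMillis, long cpuMillis) {
        if (cpuMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / cpuMillis;
    }

    // Builds mapper memory properties for a given container size in MB.
    // Assumption: the JVM heap is set to about 80% of the container,
    // leaving the rest for non-heap memory. Adjust to your cluster.
    static Map<String, String> mapperMemorySettings(int containerMb) {
        int heapMb = (int) (containerMb * 0.8);
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapreduce.map.memory.mb", Integer.toString(containerMb));
        settings.put("mapreduce.map.java.opts", "-Xmx" + heapMb + "m");
        return settings;
    }

    public static void main(String[] args) {
        // Example counter values; replace with the numbers from your own job.
        double fraction = gcFraction(2000L, 10000L);
        System.out.println("GC fraction of CPU time: " + fraction);

        // Print the properties as -D options for the job command line.
        for (Map.Entry<String, String> e : mapperMemorySettings(2048).entrySet()) {
            System.out.println("-D" + e.getKey() + "=" + e.getValue());
        }
    }
}
```

If the fraction is large for the HCatalog job and small for the non-HCatalog job, that would point toward memory pressure. The same pattern applies to reducers with `mapreduce.reduce.memory.mb` and `mapreduce.reduce.java.opts`.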



