I am running my Pig scripts and Hive queries in Tez mode. For all of these Pig scripts/Hive queries, the mapper memory requested was more than the memory actually used, so I lowered mapreduce.map.memory.mb and adjusted mapreduce.map.java.opts accordingly.
Even after changing these values, the mapper memory requested is still more than the map memory used, and nothing seemed to change in the performance metrics (this was from analyzing the job in Dr. Elephant). Worse, after changing these settings the Pig script now aborts with the error below:
"java.lang.IllegalArgumentException: tez.runtime.io.sort.mb 1638 should be larger than 0 and should be less than the available task memory (MB):786"
I never set 786 MB anywhere in my configuration, so where did it take this value from?
Also, how do I configure the map and reduce memory in Tez execution mode? (I see documentation telling Hive users to set hive.tez.container.size, but nothing for Pig.)
And is it possible to configure the map and reduce memory differently in Tez mode? The Hive-on-Tez documentation only mentions the map memory setting, nothing for reducer memory. Also, since Tez creates a DAG of tasks, they are not like MapReduce, right? Are map and reduce each just seen as an individual task in the DAG, or can these DAG tasks still be classified into mapper/reducer roles?
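For context, the constraint the error enforces is simply that the in-memory sort buffer must fit inside the task container, and Tez tasks take their container size from the Tez properties rather than from mapreduce.map.memory.mb. A minimal sketch of self-consistent settings at the top of a Pig script (the sizes 2048/1640/512 are illustrative assumptions, not recommendations):

```
-- All Tez tasks in the DAG (map-like and reduce-like vertices alike) get this container size (MB)
set tez.task.resource.memory.mb 2048;
-- JVM heap for those tasks; keep it around 80% of the container size
set tez.task.launch.cmd-opts -Xmx1640m;
-- Sort buffer; must be strictly smaller than the available task memory
set tez.runtime.io.sort.mb 512;
```

The key point the sketch illustrates is that tez.runtime.io.sort.mb has to be sized against the Tez task memory, not the MapReduce mapper memory.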
Thanks for the suggestion. I have not tried these parameters. What are they for? Are they the ones that set the mapper memory size in Pig?
Since you are running Pig while hive.execution.engine is set to tez, you can tune these parameters, or set the upper limit in hive-env; either way you should be able to control how much memory is allocated to your job.
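For the Hive side of the same workload, a hedged sketch of the equivalent session-level settings (the property names are standard Hive-on-Tez settings; the sizes are placeholders you would tune for your cluster):

```
SET hive.execution.engine=tez;
-- Container size (MB) for Tez tasks launched by Hive
SET hive.tez.container.size=2048;
-- Heap for those containers; keep it below the container size
SET hive.tez.java.opts=-Xmx1640m;
```

If hive.tez.container.size is left unset, Hive falls back to cluster defaults, which is one way an unexpected task memory value can appear.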
This community article explains the ideal values in detail:
Try that and see if it helps.