
What is the Oozie Heap size recommendation for production?


If a cluster needs to run 100 Oozie workflows concurrently, is there a formula to estimate oozie_heapsize?

Or is there an internal/external best-practice document that covers heap sizing?

1 ACCEPTED SOLUTION

Master Mentor

@hosako@hortonworks.com

I found this very helpful

The Oozie launcher is just another MapReduce job, so any configuration you can set for a MapReduce job is valid for the launcher. The most relevant and useful settings are usually memory and queue (mapreduce.map.memory.mb and mapreduce.job.queuename). To set these for the launcher in an Oozie workflow action, prefix the setting with "oozie.launcher". For example, oozie.launcher.mapreduce.map.memory.mb controls the memory for the launcher mapper itself, whereas plain mapreduce.map.memory.mb only influences the memory setting for the underlying MapReduce job that the Hadoop, Hive, or Pig action runs. So if you have a Hive query that requires you to increase the client-side heap when you submit it through the Hive CLI, remember to increase the launcher mapper's memory when you define the Oozie action for it.
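As a rough sketch, the prefixed launcher properties described above would sit alongside the plain ones in the action's configuration block. The action name, memory values, queue name, and script name below are illustrative assumptions, not values from this thread:

```xml
<!-- Sketch only: action name, values, and script are placeholder assumptions -->
<action name="run-hive">
  <hive xmlns="uri:oozie:hive-action:0.5">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Memory for the launcher mapper itself (note the oozie.launcher prefix) -->
      <property>
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>4096</value>
      </property>
      <!-- Queue for the launcher job -->
      <property>
        <name>oozie.launcher.mapreduce.job.queuename</name>
        <value>launcher</value>
      </property>
      <!-- Memory for the underlying MapReduce job's mappers (no prefix) -->
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>2048</value>
      </property>
    </configuration>
    <script>query.q</script>
  </hive>
</action>
```

The same prefixing pattern applies to any MapReduce property you want to direct at the launcher rather than the underlying job.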


3 REPLIES



Thanks! Does this mean Oozie's Tomcat heap size is not important for running 100 concurrent workflows?

Master Mentor

If I were in your shoes, I would focus on this thread.