I'm trying to tune our cluster to optimize performance.
Currently, we still have default values for hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max.
According to the documentation, hive.exec.reducers.bytes.per.reducer defaults to 1 GB (1,000,000,000 bytes) in Hive 0.13 and earlier, and to 256 MB (256,000,000 bytes) in Hive 0.14. However, Ambari (our HDP stack is 2.2.8) appears to default it to 64 MB.
And then for hive.exec.reducers.max, the HDP default is 1,009.
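To see how the two settings interact, here is a rough sketch. It assumes Hive's input-size-based estimate, which applies when the reducer count isn't set explicitly: roughly one reducer per hive.exec.reducers.bytes.per.reducer bytes of stage input, rounded up, at least 1, and capped at hive.exec.reducers.max. The 100 GB input size is a made-up example. On Tez, Hive's auto reducer parallelism (if enabled) may adjust the count further at runtime, and that behavior isn't modeled here.

```python
import math

def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int,
                      max_reducers: int) -> int:
    """Approximate Hive's input-size-based reducer estimate.

    One reducer per bytes_per_reducer of input (rounded up),
    never fewer than 1 and never more than max_reducers.
    """
    reducers = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, reducers))

# Hypothetical stage input of 100 GB, with hive.exec.reducers.max = 1009.
input_bytes = 100_000_000_000
for label, per_reducer in [("64 MB", 67_108_864),
                           ("256 MB", 256_000_000),
                           ("1 GB", 1_000_000_000)]:
    print(label, "->", estimate_reducers(input_bytes, per_reducer, 1009))
```

On this input, the 64 MB setting asks for 1,491 reducers and is clipped to the 1,009 cap, while 256 MB and 1 GB give 391 and 100 reducers. So a smaller bytes-per-reducer value makes the max setting far more likely to be the one that actually decides the count.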
I'd like to understand how best to set these values. There seems to be a relationship between them, the cluster specs, and the YARN settings, but I don't understand what that relationship is.
For hive.exec.reducers.max, I would think it should be a multiple of the number of data nodes x the number of CPUs per node. So for a cluster with 10 data nodes and 16 CPUs per node, it would probably be a multiple of 160. Right? Maybe 320 or 480?
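The heuristic above can be written out as simple arithmetic. This encodes my own guess from the question, not a documented Hive rule, and the choice of multiples (1, 2, 3) is arbitrary:

```python
def reducer_max_candidates(data_nodes: int, cpus_per_node: int,
                           multiples=(1, 2, 3)) -> list[int]:
    """Candidate hive.exec.reducers.max values under the guessed
    'multiple of total CPU slots' heuristic (not an official rule)."""
    total_cpus = data_nodes * cpus_per_node
    return [total_cpus * m for m in multiples]

print(reducer_max_candidates(10, 16))  # 10 nodes x 16 CPUs
```

For 10 nodes with 16 CPUs each, this gives 160, 320, and 480.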
hive.exec.reducers.bytes.per.reducer is a bit more mysterious. Its default changed by a factor of about 4 between Hive 0.13 and 0.14. Why?
And then how does this all relate to YARN container sizes?
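One way to frame the YARN side is as a sketch. It assumes that each reducer runs in one YARN container, and that the number of containers a node can hold is governed by yarn.nodemanager.resource.memory-mb (and yarn.nodemanager.resource.cpu-vcores, if the scheduler counts vcores, as the CapacityScheduler's DominantResourceCalculator does; the DefaultResourceCalculator counts memory only). The reducer container size would come from mapreduce.reduce.memory.mb on MapReduce or hive.tez.container.size on Tez. The node sizes below (48 GB, 16 vcores, 4 GB containers) are hypothetical. The sketch also ignores ApplicationMaster containers and other concurrent jobs.

```python
import math

def containers_per_node(node_memory_mb: int, node_vcores: int,
                        container_memory_mb: int, container_vcores: int = 1,
                        memory_only: bool = True) -> int:
    """Containers of one size that fit on a single NodeManager.

    memory_only=True models a memory-only calculation; set it to
    False to also cap the count by available vcores.
    """
    by_memory = node_memory_mb // container_memory_mb
    if memory_only:
        return by_memory
    return min(by_memory, node_vcores // container_vcores)

def reducer_waves(num_reducers: int, data_nodes: int, per_node: int) -> int:
    """Rounds of reducers needed if every container slot runs one reducer."""
    return math.ceil(num_reducers / (data_nodes * per_node))

# Hypothetical node: 48 GB and 16 vcores given to YARN, 4 GB reducer containers.
per_node = containers_per_node(49_152, 16, 4_096)
for reducers in (100, 391, 1009):
    print(reducers, "reducers ->", reducer_waves(reducers, 10, per_node), "wave(s)")
```

With 12 containers per node on 10 nodes (120 concurrent reducers), 100 reducers finish in one wave, 391 need 4, and 1,009 need 9. Under this framing, a reducer count well above the cluster's concurrent container capacity mostly adds extra waves rather than extra parallelism.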
I would be interested in why hive.exec.reducers.max defaults to 1009, and what influences an appropriate choice for this setting. I couldn't find any detail on it in either of the linked articles.