Dear All,
I have a case where a simple query (below) reserved a huge amount of memory (122 GB):
select * from lyr1_raw.CI_CUSTMAST_HS;
The table is not partitioned and contains only 11 million records.
When I checked the YARN job monitoring UI, I found the root cause.
The query consumes so much memory because Hive needs 15 map tasks to run it and therefore creates 15 containers,
and each container reserves 8 GB,
since the current memory configuration is as follows:
mapreduce.map.memory.mb=8192 #8G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx6442450944 #6G
mapreduce.reduce.memory.mb=16384 #16G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx12884901888 #12G
yarn.scheduler.minimum-allocation-mb=2048 #2G
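Just to double-check my own numbers (a rough back-of-the-envelope calculation, assuming every map container is granted the full mapreduce.map.memory.mb):

  15 map containers x 8192 MB = 122,880 MB (~120 GB)

which is in line with the ~122 GB I see reserved in YARN.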
When I change the configuration as below,
mapreduce.map.memory.mb=1024 #1G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
mapreduce.reduce.memory.mb=1024 #1G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
yarn.scheduler.minimum-allocation-mb=2048 #2G
the query runs in the same amount of time and reserves only 32 GB.
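If I understand the allocation correctly, YARN rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb (at least with the capacity scheduler), so even though only 1 GB per map is requested, each container still gets 2 GB:

  15 map containers x 2048 MB = 30,720 MB (~30 GB)

plus the ApplicationMaster container, which roughly matches the 32 GB reserved.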
So my question is: is there any formula for configuring MapReduce memory?
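For completeness, I believe the mapreduce.* values can also be overridden per Hive session for a single query, something like the sketch below (assuming session-level set is allowed on the cluster; yarn.scheduler.minimum-allocation-mb is a ResourceManager-side setting, so it cannot be changed from the session):

-- sketch: per-session override before running the query
set mapreduce.map.memory.mb=1024;
set mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m;
set mapreduce.reduce.memory.mb=1024;
set mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m;
select * from lyr1_raw.CI_CUSTMAST_HS;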
Note: sorry for my English.
Thanks