I have a case where a simple query (below) reserved a huge amount of memory (122 GB):
select * from lyr1_raw.CI_CUSTMAST_HS;
The table is not partitioned and contains only 11 million records.
When I checked the YARN job monitoring, I found the root cause.
The query consumes that much memory because Hive runs it with 15 map tasks and creates 15 containers,
and each container reserves 8 GB.
That is because the current memory config is as follows:
mapreduce.map.memory.mb=8192 #8G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx6442450944 #6G
mapreduce.reduce.memory.mb=16384 #16G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx12884901888 #12G
yarn.scheduler.minimum-allocation-mb=2048 #2G
When I change the config as below:
mapreduce.map.memory.mb=1024 #1G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
mapreduce.reduce.memory.mb=1024 #1G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
yarn.scheduler.minimum-allocation-mb=2048 #2G
the query runs in the same time and only reserves 32 GB.
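Both totals can be reproduced with simple arithmetic. Below is a minimal sketch in Python, not an exact model of the scheduler. It assumes that YARN rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, that all 15 map containers are held at the same time, and that the application master asks for the stock default of 1536 MB. None of these is confirmed in this thread, and the function names are made up for illustration.

```python
import math


def container_mb(requested_mb, min_alloc_mb):
    """Round a request up to the next multiple of the scheduler minimum (assumed behaviour)."""
    return max(1, math.ceil(requested_mb / min_alloc_mb)) * min_alloc_mb


def job_reserved_mb(num_tasks, task_mb, min_alloc_mb, am_mb=1536):
    """Total reservation: all task containers plus one application master container.

    am_mb=1536 is an assumption (the stock default for the MapReduce AM).
    """
    tasks = num_tasks * container_mb(task_mb, min_alloc_mb)
    return tasks + container_mb(am_mb, min_alloc_mb)


# 15 map tasks, minimum allocation of 2048 MB, as in the question
before = job_reserved_mb(15, 8192, 2048)  # mapreduce.map.memory.mb=8192
after = job_reserved_mb(15, 1024, 2048)   # mapreduce.map.memory.mb=1024

print(before / 1024, "GB before,", after / 1024, "GB after")
```

Under these assumptions the sketch gives 122 GB and 32 GB, which suggests that with a 2 GB minimum allocation a 1 GB map request still occupies a 2 GB container.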
So my question is: is there any formula for configuring MapReduce memory?
Note: sorry for my English.
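On the formula question, one pattern is already visible in the first config above: the JVM heap (-Xmx) is 75% of the container size for both map (6G of 8G) and reduce (12G of 16G). The sketch below applies that ratio to any container size. The 0.75 ratio is a rule of thumb taken from those numbers, not an official formula, and the helper names are hypothetical.

```python
def heap_mb(container_mb, heap_ratio=0.75):
    """Heap size in MB, leaving the rest of the container for non-heap JVM memory."""
    return int(container_mb * heap_ratio)


def suggest_settings(kind, container_mb, heap_ratio=0.75):
    """Build the two properties for kind = 'map' or 'reduce' (illustrative helper)."""
    xmx = heap_mb(container_mb, heap_ratio)
    return {
        f"mapreduce.{kind}.memory.mb": str(container_mb),
        f"mapreduce.{kind}.java.opts": f"-Djava.net.preferIPv4Stack=true -Xmx{xmx}m",
    }


# Example: a 2048 MB map container, matching the 2G minimum allocation
for key, value in suggest_settings("map", 2048).items():
    print(f"{key}={value}")
```

Picking a container size that is a multiple of yarn.scheduler.minimum-allocation-mb avoids reserving memory that the task was never configured to use.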
@hendry Hi Hendry,
122 GB is not considered a lot in big data, and how much you need depends on the logic you run in the map tasks. Normally it is the logic in the reducer for which you should consider increasing memory.
To learn about MapReduce memory usage, I would recommend using a tool that can help you identify where you are losing memory.
On my side I am using Dr. Elephant, which helps me understand the best configuration to fully utilize my resources.
For more details:
Thanks for letting me know about Dr. Elephant. I will try it.
I agree with you. But I just ran a simple query.