Support Questions

hendry · 05-10-2018

Dear All,

I have a case where a simple query (below) reserved a huge amount of memory (122 GB):

select * from lyr1_raw.CI_CUSTMAST_HS;

The table is not partitioned and contains only 11 million records.

When I checked the YARN job monitoring, I found the root cause.

It consumes so much memory because Hive needs 15 map tasks to run that query and creates 15 containers,

and each container reserves 8 GB.

This is because the current memory config is as follows:

mapreduce.map.memory.mb=8192 #8G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx6442450944 #6G
mapreduce.reduce.memory.mb=16384 #16G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx12884901888 #12G
yarn.scheduler.minimum-allocation-mb=2048 #2G
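The -Xmx values above are given in bytes. As a quick sanity check, the sketch below converts them and compares each heap with its container size. The helper name heap_fraction is hypothetical and only used for illustration; the numbers are the ones from the config above.

```python
MB = 1024 * 1024  # bytes per megabyte, as used by the *.memory.mb settings


def heap_fraction(xmx_bytes, container_mb):
    """Return the JVM heap (-Xmx) as a fraction of the YARN container size."""
    return xmx_bytes / (container_mb * MB)


# Values copied from the config above.
map_frac = heap_fraction(6442450944, 8192)        # map: -Xmx vs mapreduce.map.memory.mb
reduce_frac = heap_fraction(12884901888, 16384)   # reduce: -Xmx vs mapreduce.reduce.memory.mb

print(map_frac, reduce_frac)
```

In both cases the heap comes out at 75% of the container, which leaves the remaining quarter for non-heap JVM memory.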

When I changed the config as follows:

mapreduce.map.memory.mb=1024 #1G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
mapreduce.reduce.memory.mb=1024 #1G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
yarn.scheduler.minimum-allocation-mb=2048 #2G

the query ran in the same time and reserved only 32 GB.
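Both totals can be reproduced with a small calculation. This is only a sketch under stated assumptions, not a description of the scheduler's exact logic: it assumes every container request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, that all 15 map containers run at the same time, that the query has no reduce phase, and that the job also holds one ApplicationMaster container requesting 1536 MB (this value is not stated in the thread). The function names are hypothetical.

```python
import math


def container_mb(request_mb, min_alloc_mb):
    """Round a request up to a multiple of the minimum allocation (assumed behaviour)."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def total_reserved_gb(map_tasks, map_request_mb, min_alloc_mb, am_request_mb=1536):
    """Memory held by all map containers plus one ApplicationMaster container, in GB.

    am_request_mb is an assumption; the thread does not give the AM setting.
    """
    maps = map_tasks * container_mb(map_request_mb, min_alloc_mb)
    am = container_mb(am_request_mb, min_alloc_mb)
    return (maps + am) / 1024


before = total_reserved_gb(15, 8192, 2048)  # mapreduce.map.memory.mb=8192
after = total_reserved_gb(15, 1024, 2048)   # mapreduce.map.memory.mb=1024

print(before, after)
```

Under these assumptions the results are 122 GB and 32 GB, matching the observed numbers. It also suggests that a 1024 MB request is still granted a 2048 MB container, so going below the minimum allocation would not reduce the reservation any further.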

So my question is: is there a formula for configuring MapReduce memory?

Note: sorry for my English.

Thanks

Fawze · 05-11-2018

@hendry Hi Hendry,

122 GB is not considered a lot in big data, and it depends on the logic you run in the map. Normally it is the logic in the reducer for which you should consider increasing memory.

To learn about MapReduce memory usage, I would recommend using a tool that can help you identify where you are losing memory.

On my side I'm using Dr. Elephant, which helps me understand the best configuration to fully utilize my resources.

For more details:

https://github.com/linkedin/dr-elephant

hendry · 05-13-2018

@Fawze

Thanks for letting me know about dr-elephant. I will try it.

"122 GB is not considered a lot in big data, and it depends on the logic you run in the map. Normally it is the logic in the reducer for which you should consider increasing memory."

I agree with you. But I am just running a simple query.

Thanks

Best Practice Map/Reduce Memory Config