Explorer
Posts: 10
Registered: 04-17-2018

Best Practice Map/Reduce Memory Config

Dear All,

I have a case where a simple query (below) reserved a huge amount of memory (122 GB):

select * from lyr1_raw.CI_CUSTMAST_HS;

The table is not partitioned and contains only 11 million records.

When I checked the YARN job monitoring, I found the root cause: to run the query, Hive needs 15 map tasks and creates 15 containers, and each container reserves 8 GB.

This is because the current memory configuration is as follows:

mapreduce.map.memory.mb=8192 #8G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx6442450944 #6G
mapreduce.reduce.memory.mb=16384 #16G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx12884901888 #12G
yarn.scheduler.minimum-allocation-mb=2048 #2G
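The 122 GB figure can be reproduced with a little arithmetic. The sketch below assumes that YARN rounds every container request up to a multiple of yarn.scheduler.minimum-allocation-mb (the exact rounding rule depends on the scheduler) and that the job also runs one ApplicationMaster container requesting the Hadoop default yarn.app.mapreduce.am.resource.mb=1536. That AM setting is not shown in the thread and is an assumption, although any AM request up to 2048 MB gives the same result. Under those assumptions, 15 map containers of 8 GB plus one 2 GB ApplicationMaster add up to 122 GB.

```python
import math


def normalize(request_mb: int, min_alloc_mb: int) -> int:
    """Round a container request up to a multiple of the scheduler minimum (assumed rule)."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def total_reserved_mb(num_tasks: int, task_mb: int, am_mb: int, min_alloc_mb: int) -> int:
    """Memory reserved for one job: the task containers plus one ApplicationMaster."""
    return num_tasks * normalize(task_mb, min_alloc_mb) + normalize(am_mb, min_alloc_mb)


# Values from the original configuration in this thread.
MAP_MB = 8192        # mapreduce.map.memory.mb
MIN_ALLOC_MB = 2048  # yarn.scheduler.minimum-allocation-mb
NUM_MAPS = 15        # map tasks seen in the YARN job monitor
# Assumption: AM uses the Hadoop default yarn.app.mapreduce.am.resource.mb (not in the thread).
AM_MB = 1536

total = total_reserved_mb(NUM_MAPS, MAP_MB, AM_MB, MIN_ALLOC_MB)
print(total, "MB =", total // 1024, "GB")  # 124928 MB = 122 GB
```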

When I changed the configuration as follows:

mapreduce.map.memory.mb=1024 #1G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
mapreduce.reduce.memory.mb=1024 #1G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
yarn.scheduler.minimum-allocation-mb=2048 #2G

the query ran in the same amount of time and reserved only 32 GB.
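The 32 GB can be explained the same way, under the same assumptions (requests rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and a 1536 MB ApplicationMaster request, neither confirmed in the thread). A 1024 MB map request is below the unchanged 2048 MB minimum, so each map container would still be granted 2 GB.

```python
import math


def normalize(request_mb: int, min_alloc_mb: int) -> int:
    """Round a container request up to a multiple of the scheduler minimum (assumed rule)."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


MIN_ALLOC_MB = 2048  # yarn.scheduler.minimum-allocation-mb (unchanged)
NEW_MAP_MB = 1024    # new mapreduce.map.memory.mb
NUM_MAPS = 15        # same 15 map tasks as before
AM_MB = 1536         # assumed default yarn.app.mapreduce.am.resource.mb (not in the thread)

granted_map_mb = normalize(NEW_MAP_MB, MIN_ALLOC_MB)
total_mb = NUM_MAPS * granted_map_mb + normalize(AM_MB, MIN_ALLOC_MB)
print("each map container:", granted_map_mb, "MB")  # each map container: 2048 MB
print("job total:", total_mb // 1024, "GB")         # job total: 32 GB
```

If that reading is right, the 748 MB heap uses only about a third of each 2 GB container, and lowering mapreduce.map.memory.mb further cannot shrink the reservation unless yarn.scheduler.minimum-allocation-mb is lowered as well.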

So my question is: is there a formula for configuring MapReduce memory?
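No official formula is given in this thread, but the values above follow a consistent pattern that can be written down as a rule of thumb. It is not a Hadoop API. Round the per-task memory you actually need up to a multiple of yarn.scheduler.minimum-allocation-mb, then set -Xmx to about 75% of that. This is the ratio the original configuration already uses: a 6 GB heap in 8 GB map containers and 12 GB in 16 GB reducer containers. The helper name suggest_task_memory and the 0.75 ratio are illustrative assumptions, and the needed memory itself still has to be measured for each job.

```python
import math

# Assumption: heap = 75% of the container, matching the ratio in the original config above.
HEAP_RATIO = 0.75


def suggest_task_memory(needed_mb: int, min_alloc_mb: int, heap_ratio: float = HEAP_RATIO) -> dict:
    """Hypothetical helper: turn a measured per-task need into *.memory.mb and -Xmx values.

    needed_mb must come from measuring the job; this only applies the container
    rounding and the heap/container ratio.
    """
    # Request a multiple of the scheduler minimum, since YARN would round up anyway.
    container_mb = math.ceil(needed_mb / min_alloc_mb) * min_alloc_mb
    # Keep the rest of the container for JVM memory outside the heap.
    heap_mb = int(container_mb * heap_ratio)
    return {"memory.mb": container_mb, "java.opts": f"-Xmx{heap_mb}m"}


print(suggest_task_memory(1024, 2048))  # {'memory.mb': 2048, 'java.opts': '-Xmx1536m'}
print(suggest_task_memory(6000, 2048))  # {'memory.mb': 6144, 'java.opts': '-Xmx4608m'}
```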

Note: sorry for my English.

Thanks

Expert Contributor
Posts: 316
Registered: 01-25-2017

Re: Best Practice Map/Reduce Memory Config

@hendry Hi Hendry,

In big data, 122 GB is not considered too much, and it depends on the logic you run in the map phase. Normally it is the reducer logic whose memory you should consider increasing.

To understand MapReduce memory usage, I would recommend using a tool that can help you identify where you are losing memory.

On my side, I'm using Dr. Elephant, which helps me understand the best configuration to fully utilize my resources.

For more details:

https://github.com/linkedin/dr-elephant
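As a rough illustration of the kind of check such a tool can perform, the sketch below compares the memory granted to each task's container with the memory the task actually used, and it flags tasks that left most of their container idle. This is not Dr. Elephant's implementation. The TaskMemory type, the function names, the 50% threshold, and the sample numbers are all hypothetical, and real per-task usage would have to come from the job's memory counters or job history.

```python
from dataclasses import dataclass


@dataclass
class TaskMemory:
    task_id: str
    container_mb: int  # memory YARN granted to the task's container
    used_mb: int       # physical memory the task actually used (measured elsewhere)


def oversized_tasks(tasks, max_unused_fraction=0.5):
    """Return IDs of tasks that left more than max_unused_fraction of their container unused."""
    return [
        t.task_id
        for t in tasks
        if 1 - t.used_mb / t.container_mb > max_unused_fraction
    ]


def total_unused_mb(tasks):
    """Sum of granted-but-unused memory across all tasks, in MB."""
    return sum(t.container_mb - t.used_mb for t in tasks)


# Hypothetical measurements, for illustration only (not from the thread).
sample = [
    TaskMemory("map_0", container_mb=8192, used_mb=900),
    TaskMemory("map_1", container_mb=8192, used_mb=7000),
]
print(oversized_tasks(sample))  # ['map_0']
print(total_unused_mb(sample))  # 8484
```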

Explorer
Posts: 10
Registered: 04-17-2018

Re: Best Practice Map/Reduce Memory Config

@Fawze

Thanks for letting me know about Dr. Elephant. I will try it.

Quoting @Fawze: "In big data, 122 GB is not considered too much, and it depends on the logic you run in the map phase. Normally it is the reducer logic whose memory you should consider increasing."

I agree with you, but I only ran a simple query.

Thanks
