Best Practice Map/Reduce Memory Config

Explorer

Dear All,

 

I have a case where a simple query (below) reserved a huge amount of memory (122 GB):

 

select * from lyr1_raw.CI_CUSTMAST_HS;

That is not a partitioned table, and it only contains 11 million records.

 

When I checked the YARN job monitoring, I found the root cause.

It consumes huge memory because, when Hive runs that query, it needs 15 map tasks and creates 15 containers,

and each container reserves 8 GB,

because the current memory config is as below:

 

mapreduce.map.memory.mb=8192 #8G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx6442450944 #6G
mapreduce.reduce.memory.mb=16384 #16G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx12884901888 #12G
yarn.scheduler.minimum-allocation-mb=2048 #2G
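For reference, the 122 GB figure roughly matches the container math (assuming one extra ~2 GB container for the MapReduce ApplicationMaster):

15 map containers x 8192 MB = 122880 MB (~120 GB), plus ~2 GB for the ApplicationMaster ≈ 122 GB reserved.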

 

When I change the config as below,

 

mapreduce.map.memory.mb=1024 #1G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
mapreduce.reduce.memory.mb=1024 #1G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
yarn.scheduler.minimum-allocation-mb=2048 #2G

the query runs in the same time and only reserves 32 GB.
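Rough math again (assuming YARN rounds each 1024 MB request up to the 2048 MB minimum allocation, plus one ~2 GB ApplicationMaster container): 15 x 2048 MB = 30720 MB (~30 GB), plus ~2 GB ≈ 32 GB reserved.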

 

So my question is: is there any formula for configuring MapReduce memory?

 

 

Note: sorry for my English.

 

Thanks

2 REPLIES

Master Collaborator

@hendry Hi Hendry,

 

122 GB in big data is not considered too much, and it depends on the logic you are doing in the map; normally it is the logic in the reducer whose memory you should consider increasing.

 

To understand the MapReduce memory usage, I would recommend using one of the tools that can help you identify where you are losing memory.

 

On my side, I'm using Dr. Elephant, which helps me understand what the best configuration is to fully utilize my resources.

 

For more details:

 

https://github.com/linkedin/dr-elephant
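As a starting point before tuning with Dr. Elephant, a rough rule of thumb I use (not an exact formula) is to set the JVM heap (-Xmx) to around 75-80% of the container size, and to keep the container sizes as multiples of yarn.scheduler.minimum-allocation-mb so YARN does not round the request up. For example:

mapreduce.map.memory.mb=2048
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx1638m #~80% of 2 GB
mapreduce.reduce.memory.mb=4096
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx3276m #~80% of 4 GB

Then increase the reducer side first if jobs actually run out of memory.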

Explorer

@Fawze

Thanks for letting me know about Dr. Elephant. I will try it.

 

Quoting @Fawze: "122 GB in big data is not considered too much, and it depends on the logic you are doing in the map; normally it is the logic in the reducer whose memory you should consider increasing."

I agree with you, but I just ran a simple query.

 

Thanks