
Best Practice Map/Reduce Memory Config


Explorer

Dear All,

 

I have a case where a simple query (below) reserved a huge amount of memory (122 GB):

 

select * from lyr1_raw.CI_CUSTMAST_HS;

That table is not partitioned and only contains 11 million records.

 

When I checked the YARN job monitoring, I found the root cause.

The query consumes a huge amount of memory because, when Hive runs it, it needs 15 map tasks and therefore creates 15 containers, and each container reserves 8 GB, since the current memory config is as below:

 

mapreduce.map.memory.mb=8192 #8G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx6442450944 #6G
mapreduce.reduce.memory.mb=16384 #16G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx12884901888 #12G
yarn.scheduler.minimum-allocation-mb=2048 #2G
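
For reference, the 122 GB figure is consistent with that config. A rough back-of-the-envelope check (assuming one roughly 2 GB ApplicationMaster container on top of the map containers, which is an assumption on my part, not something shown in the job monitoring):

15 map containers x 8192 MB = 122880 MB (~120 GB)
 1 AM container   x 2048 MB =   2048 MB (~  2 GB)
                      total ~ 124928 MB (~122 GB)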

 

When I changed the config as below:

 

mapreduce.map.memory.mb=1024 #1G
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
mapreduce.reduce.memory.mb=1024 #1G
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx748m
yarn.scheduler.minimum-allocation-mb=2048 #2G

the query ran in the same amount of time and only reserved 32 GB.
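
The 32 GB also lines up if each 1 GB container request is rounded up to the 2 GB scheduler minimum (yarn.scheduler.minimum-allocation-mb=2048), plus one ApplicationMaster container of about 2 GB (again an assumption on my part):

15 map containers x 2048 MB = 30720 MB (30 GB)
 1 AM container   x 2048 MB =  2048 MB ( 2 GB)
                      total = 32768 MB (32 GB)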

 

So my question is: is there a formula for configuring MapReduce memory?
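
For context, a commonly cited rule of thumb (this is a general guideline I am adding here, not something stated elsewhere in this thread) is to pick a container size that is a multiple of yarn.scheduler.minimum-allocation-mb and large enough for one task's working set, and to set the JVM heap (-Xmx) to roughly 75-80% of the container so there is headroom for JVM and native overhead. A hypothetical example, with values that would need tuning to the actual workload and node capacity:

mapreduce.map.memory.mb=2048 #container size, a multiple of the 2 GB scheduler minimum
mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx1638m #~80% of the map container
mapreduce.reduce.memory.mb=4096 #reducers are often given about 2x the map container
mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx3276m #~80% of the reduce container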

 

 

Note: sorry for my English.

 

Thanks

2 Replies

Re: Best Practice Map/Reduce Memory Config

Super Collaborator

@hendry Hi Hendry,

 

122 GB is not considered a lot of memory in big data workloads, and it depends on the logic you are running in the map phase; normally it is the reducer logic whose memory you should consider increasing.

 

To understand your MapReduce memory usage, I would recommend using one of the tools that can help you identify where you are losing memory.

 

On my side, I'm using Dr. Elephant, which helps me understand the best configuration to fully utilize my resources.

 

For more details:

 

https://github.com/linkedin/dr-elephant

Re: Best Practice Map/Reduce Memory Config

Explorer

@Fawze

Thanks for letting me know about Dr. Elephant; I will try it.

 

Quoting @Fawze:
"122 GB is not considered a lot of memory in big data workloads, and it depends on the logic you are running in the map phase; normally it is the reducer logic whose memory you should consider increasing."

I agree with you, but I only ran a simple query.

 

Thanks