
Odd behavior when pending mappers get stuck on a huge M/R job after Static Pool reconfig for Impala

Explorer

Hi,

 

We have a 12-node cluster running CDH 5.4.9 under CM 5.4.5.

 

Each node has 12 CPU cores (24 vcores with HT), 128 GB of RAM and 6 HDDs.
Each of them runs an HDFS DataNode daemon, a YARN NodeManager and an Impala daemon.

 

The cluster is pretty simple: no HBase, no Sentry, no Kerberos.

 

Since we were on 5.4.9, we were running Impala's resource management inside of YARN. Because this setup is not supported in CDH 5.5, we wanted to separate Impala from YARN and reconfigure the static pools accordingly.

 

These are the steps that were taken on May 17th:
1-Shut down Impala
2-Delete the 2 Llama role instances
3-Modify the Impala config to remove YARN as a resource manager
4-Start the Static Pool configuration wizard
5-Set the percentages as: 5% for HDFS, 10% for Impala and 85% for YARN (it was 5% HDFS and 95% YARN before; rough per-node numbers below)
6-Restart everything.
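
For context, a rough back-of-the-envelope split, assuming the percentages apply to each node's 128 GB of RAM and 24 vcores:

    HDFS    5%  : ~6 GB RAM
    Impala  10% : ~13 GB RAM
    YARN    85% : ~109 GB RAM and ~20 vcores (down from ~122 GB and ~23 vcores under the old 5%/95% split)

So after the change, each NodeManager has noticeably fewer resources to hand out.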


Now, after the restart, everything looked like it was running fine. Hive and Impala queries ran fine, with no errors whatsoever.

 

This morning (May 18th), we noticed that most of our ETLs (mainly Hive jobs) did not run. Looking at the "YARN Applications" page in Cloudera Manager, we saw 3 running applications and a huge list of pending ones. The 3 running applications all had the EXACT same problem:

 

1-The M/R job was started with 3000+ mappers to run and 1099 reducers
2-Most of the mappers completed successfully
3-The reducers started the copy phase while the rest of the mappers continued their work
4-Then, at one point, the job hangs because, for some reason, the pending mappers are never started.

 

So we get stuck with a job that has 2-3 pending mappers and 100+ running reducers, and it stays like that forever because, for some reason, the pending mappers never get started.

 

At first, I noticed that we had a failover of the ResourceManager during the night, but it was unrelated: re-running the query ends up hitting the exact same problem!

 

What I've tried:
1-Disabled CGroups altogether --> all M/R jobs started failing
2-Rolled back my config by manually putting Impala back inside of YARN using Llama, re-enabling CGroups and rolling back the following settings (likely underlying properties are sketched below):
Default value for: Container Memory Maximum, Container Virtual CPU Cores Maximum, Cgroup CPU Shares, Cgroup I/O Weight
Old values for: Default Number of Reduce Tasks per Job, Container Memory
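
For reference, in case it helps anyone map these Cloudera Manager display names back to the underlying settings (my best guess for CM 5.4, worth double-checking in your version):

    Container Memory Maximum               -> yarn.scheduler.maximum-allocation-mb
    Container Virtual CPU Cores Maximum    -> yarn.scheduler.maximum-allocation-vcores
    Container Memory                       -> yarn.nodemanager.resource.memory-mb
    Default Number of Reduce Tasks per Job -> mapreduce.job.reduces
    Cgroup CPU Shares / Cgroup I/O Weight  -> per-role cgroup cpu.shares and blkio.weight controls, not Hadoop properties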

 

#2 did the trick, but I'm back to square one, where I can't upgrade to CDH 5.5 or newer.

 

Any clues on where to start the investigation? It's a bit of a pain since this behavior cannot be replicated on our test cluster (not enough data), only in Production...

 

Thanks!

 


7 REPLIES

Rising Star

Sounds a lot like a deadlock problem in MapReduce. The only solution I can think of is to kill the job and start over. Because the chances of getting into a deadlock are very rare, the job should be able to progress on the second run. The deadlock problem has been seen in 5.7 as well; until it is completely solved in Apache Hadoop, it will stay around for a while.

Super Collaborator

I am not sure if you posted to the old mailing list before, but the numbers seem too similar for this not to be the same question.

In a case like the one you describe, the reducers can take over the cluster and cause the deadlock, like Haibo said. We have fixed some issues with this behaviour in CDH releases later than the one you are on.

 

The number of containers that you can run at the same time in the cluster is, I estimate, somewhere in the 250 to 300 range at a maximum. The only way to prevent the cluster from being taken over by just reducers is to set slow start for the job to 1. It might slow down the job a bit, but you should never see this kind of deadlock again.
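
Concretely, the slow start knob is the standard MRv2 property mapreduce.job.reduce.slowstart.completedmaps: the fraction of maps that must complete before reducers are scheduled (the default is 0.05). A minimal sketch of forcing it to 1 for a single job, assuming a plain Java MR driver (class and job names are just placeholders):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SlowStartExample {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Do not schedule any reducers until 100% of the mappers have completed,
            // so running reducers can never hold the containers that pending mappers need.
            conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 1.0f);
            Job job = Job.getInstance(conf, "example-etl-job");  // placeholder job name
            // ... set mapper/reducer classes, input/output paths, then job.waitForCompletion(true)
        }
    }

For Hive-driven ETLs the same property can be set per session (SET mapreduce.job.reduce.slowstart.completedmaps=1;) or pushed out through the MapReduce client configuration in Cloudera Manager.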

 

Try it and let us know how it went,

Wilfred

Explorer

We must be very unlucky, because our specific pattern of ETLs made the deadlock almost 100% reproducible when YARN had fewer resources.

I thought I had fully rolled back the config, but it turns out I had forgotten 2 settings and we still had problems (not 100% of the time, but it hung around 10% of the time):


Container Memory Maximum: the wizard set it to 85 GB, put it back down to 64 GB
Container Virtual CPU Cores Maximum: the wizard set it to 24, put it back to 32

With a proper rollback to the previous config, last night's run went well.

I'll monitor the situation, then switch back to static pools and make sure to double-check everything, including setting the slow start parameter.

The whole plan was to upgrade to CDH 5.7, the first step being to drop Llama.

I'll follow up once the change is done.

 

Thanks

 

 

Rising Star

Sorry, I missed the point that the issue can be reproduced 100% of the time. In CDH 5.4, both YARN and MR have some bugs that, together, can in some cases cause a deadlock or slow jobs down dramatically. In your case, it is more likely the latter. Keep us posted on how CDH 5.7 behaves.

Explorer

Alright,

 

Quick update: with the slow start setting at "1", and Impala in its own pool with YARN having a bit less memory, there are no more deadlocks. Jobs ran a little bit slower, but this is to be expected.

 

We're planning the upgrade to CDH 5.7 this week, and we'll set slow start back to 0.8 once upgraded.

 

I'll keep this thread updated.

Super Collaborator

Thank you for the update.

 

It might be better to start a new thread for any new issues you encounter, so we do not get tripped up by old information and can take a fresh look at the issue you have.

 

Wilfred

Explorer
Right.

Just accepted your reply as a solution.

Thanks