Member since: 08-08-2013
Posts: 35
Kudos Received: 4
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 10713 | 09-17-2014 10:12 AM |
| | 9539 | 08-12-2014 11:38 AM |
| | 3660 | 04-03-2014 10:44 AM |
| | 10946 | 03-19-2014 02:18 PM |
12-05-2014 06:11 AM
I just learned that this has nothing to do with YARN HA, so you're likely running into the NM recovery issue. If you upgrade to Cloudera Manager 5.2.1 (or later), it will automatically default the recovery dir to a non-/tmp location, and you'll be good. If you can't upgrade, you can manually set the config from the previous post.
11-21-2014 10:04 AM
Do you have YARN HA turned on? If so, could you add this to your NodeManagers' safety valve, and then restart the NMs?

<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/yarn-nm-recovery</value>
</property>

(Please create that /var/lib/yarn-nm-recovery directory, and change the owner to the `yarn` user.) And if you're not running YARN HA, then I'm at a loss. Could you paste your NM log, from /var/log/hadoop-yarn/...?
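A sketch of the one-time setup described above, to run on each NodeManager host; `RECOVERY_DIR` is an assumption added so the path can be overridden, and the chown is skipped when no `yarn` user exists on the host:

```shell
# Assumed setup sketch, not an official Cloudera script.
RECOVERY_DIR="${RECOVERY_DIR:-/var/lib/yarn-nm-recovery}"
mkdir -p "$RECOVERY_DIR"
# chown only if the yarn user actually exists on this host
if id yarn >/dev/null 2>&1; then
    chown yarn:yarn "$RECOVERY_DIR"
fi
echo "created $RECOVERY_DIR"
```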
10-30-2014 10:55 AM
> why ? why ? why ? i ask myself for many times, but no answer. but i believe this 1000 has connection to that 1000.

No. One is the default minimum user id; the other is an exit code. They happen to have the same numeric value, but there's no relationship.

> then i am going to create hdfs directory manually

You're not supposed to mess with /yarn/nm/usercache.

---

First of all, why do you want to run a job as user `hdfs` or `yarn`? And are you using Cloudera Manager? Let's say you have a legitimate reason to use the `hdfs` user. Did you restart YARN after modifying container-executor.cfg? What is the container log output on such a failed launch?
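For context, a container-executor.cfg might look roughly like the sketch below (illustrative values, not your cluster's actual file); the 1000 in question is `min.user.id`, and system users below that id are normally permitted via `allowed.system.users` rather than by manually creating usercache directories:

```ini
; /etc/hadoop/conf/container-executor.cfg -- illustrative sketch only
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=bin
min.user.id=1000
allowed.system.users=nobody
```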
09-17-2014 10:12 AM
1 Kudo
<name>mapreduce.reduce.java.opts</name>
<value>-Djava.net.preferIPv4Stack=true -Xmx1280m -Xmx825955249</value>

This limits the heap to 825955249 bytes (~788MB). Most JVMs resolve duplicate args by picking the last one, so this is nowhere close to the 3GB that you intended. You should find out where you set this in CM and change it. Do that before you play with parallelcopies. But to answer your questions: yes, it'll increase CPU, memory & network usage, and it could lead to more disk spills and slow down your job.
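A small sketch of the last-one-wins resolution described above; `effective_xmx_bytes` is a hypothetical helper (not part of Hadoop) that mimics how most JVMs pick the final `-Xmx` flag:

```python
# Hypothetical helper illustrating last-one-wins -Xmx resolution.
def effective_xmx_bytes(java_opts):
    """Return the heap limit in bytes implied by the LAST -Xmx flag."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    xmx = None
    for opt in java_opts.split():
        if opt.startswith("-Xmx"):
            val = opt[4:].lower()
            if val and val[-1] in units:
                xmx = int(val[:-1]) * units[val[-1]]
            else:
                xmx = int(val)  # a bare number is plain bytes
    return xmx

opts = "-Djava.net.preferIPv4Stack=true -Xmx1280m -Xmx825955249"
print(effective_xmx_bytes(opts) / 1024**2)  # ~787.69 MB, not the 1280m listed first
```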
09-16-2014 10:56 AM
I'm confused. Your initial post says that the reduce heap is 2457MB. Now it seems that's just 787.69MB. Which one is right? What does /etc/hadoop/conf/mapred-site.xml say?
09-16-2014 10:31 AM
Virtual memory checking is pointless. Please make sure that `yarn.nodemanager.vmem-check-enabled` is turned off; the CDH default is already off. That shouldn't matter here, though: you said the job died due to an OOME, not because it got killed by the NM.
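For reference, turning the check off explicitly would be a safety-valve snippet like the following (illustrative; as noted above, `false` is already the CDH default):

```xml
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```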
09-16-2014 09:11 AM
The default in MR1 is unlimited, for both mapred.cluster.max.reduce.memory.mb and mapred.job.reduce.memory.mb. What did you set for mapred.child.java.opts (MR1)? Do you have the job counters from a big MR1 job? They'll tell you the average memory usage across the reducers, which will give you a good idea of what to set for MR2.
09-15-2014 05:25 PM
I'm not asking about the heap of the TT process. I'm asking about the -Xmx of the reducers of this particular job (which used to work in MR1 and is failing in MR2). You said that the reducers are failing due to OOME. They're getting 2457MB in MR2. What did they get in MR1?
09-15-2014 09:33 AM
What are your MR1 settings? Did reducers get -Xmx2457m on MR1? Also, the AM memory at 1.5GB is a bit high. You could probably cut that to 1GB.
08-30-2014 09:35 PM
The "Broken pipe" suggests that the child Python process failed. I'd add logging statements in the Python code to log to a local file. You should also add a global catch-all, to log any fatal errors before exit.

> if fields[index].isdigit():

That is not safe: what if your input line doesn't have that many fields? Same for:

> val = int(fields[index])

It'll fail if that field cannot be converted to an int.
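A minimal streaming-script skeleton showing the defensive parsing and logging suggested above; the function names, log path, and tab-separated field layout are assumptions, not your actual job:

```python
# Assumed sketch of a defensive Hadoop streaming reducer, not your actual code.
import logging
import sys

logging.basicConfig(filename="/tmp/reducer-debug.log", level=logging.DEBUG)

def parse_value(fields, index):
    """Return int(fields[index]), or None if the field is missing or not an int."""
    if index >= len(fields):
        logging.warning("short line, only %d fields: %r", len(fields), fields)
        return None
    try:
        return int(fields[index])
    except ValueError:
        logging.warning("non-integer field %r at index %d", fields[index], index)
        return None

def main():
    total = 0
    for line in sys.stdin:
        val = parse_value(line.rstrip("\n").split("\t"), 1)
        if val is not None:
            total += val
    print(total)

if __name__ == "__main__":
    try:
        main()
    except Exception:
        # global catch-all: record the traceback locally before the pipe breaks
        logging.exception("fatal error in reducer")
        raise
```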