Oozie Mapreduce Error: Requested Replication 10 exceeds HDFS maximum

New Contributor

I have a Cloudera Standard 4.7.2 managed CDH3u6 cluster with Oozie support installed. The Oozie service seems to be running fine, and I am able to see my killed jobs in the Oozie 1.0 web UI.

Because we only have 4 Hadoop DataNodes, I set the maximum HDFS replication factor to 4 to prevent under-replicated blocks. With this setting, my Oozie MapReduce jobs are failing with this message:

2013-11-20 11:01:51,637 WARN org.apache.oozie.command.wf.ActionStartCommand: USER[root] GROUP[users] TOKEN[] APP[bi2r-batch-data-processing-framework] JOB[0000000-131120110118947-oozie-oozi-W] ACTION[0000000-131120110118947-oozie-oozi-W@bi2r-data-ingestion-action] Error starting action [bi2r-data-ingestion-action]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: java.io.IOException: file /user/root/.staging/job_201311151031_0053/job.split.
Requested replication 10 exceeds maximum 4

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyReplication(FSNamesystem.java:1170)
…(large trace through hadoop dfs classes and such)
at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:531)
... 10 more
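
From what I have read, the 10 in the error comes from mapred.submit.replication; stock Hadoop 1.x ships that property with a default of 10 in mapred-default.xml. A sketch of that default entry, based on my reading (the exact wording may differ in CDH3):

<property>
  <name>mapred.submit.replication</name>
  <!-- Replication for job submission files (job.xml, job.split, job.jar)
       staged under the job's .staging directory; a value above the
       NameNode's maximum trips the verifyReplication() check above. -->
  <value>10</value>
</property>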

I have not had this issue running these MapReduce jobs with the normal hadoop -job command.

I have already tried a few solutions, without success. I have tried both options below with the property names dfs.replication, dfs.replication.max, and mapred.submit.replication. After much reading, it seems that mapred.submit.replication is the proper name.

1.) Add the following to the Oozie Server Configuration Safety Valve for oozie-site.xml in Cloudera Manager, then restart the Oozie service. I have also tried editing the oozie-site.xml and oozie-default.xml files directly and then bouncing the Oozie service. Still no luck.

<property>
  <name>mapred.submit.replication</name>
  <value>3</value>
</property>
2.) Update the Oozie workflow.xml for the job and set the DFS replication inside the job configuration:
<job-xml>conf/bi2r-mapreduce.xml</job-xml>
<configuration>
  <property>
    <name>mapred.submit.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>mapred.job.queue.name</name>
    <value>${queueName}</value>
  </property>
</configuration>


Neither solution has worked for me yet. I would appreciate any information someone may have on this topic. Here are some sites I have been using for reference:
http://mail-archives.apache.org/mod_mbox/oozie-user/201307.mbox/<CE1BF5B6.6C008%25chitnis@yahoo-inc....
http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/
https://groups.google.com/a/cloudera.org/forum/#!topic/scm-users/6QPJ8e2hwHk


I have also verified that my mapred-site.xml has mapred.submit.replication set to 1, and that value is shown in the Cloudera-managed version of mapred-site.xml on the nodes after deploying the client configuration.
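
For completeness, the relevant stanza in the deployed client mapred-site.xml looks roughly like this (paraphrased from what I see on the nodes):

<property>
  <name>mapred.submit.replication</name>
  <!-- Set to 1 in the Cloudera-managed client config, yet the job
       still requests replication 10 at submit time. -->
  <value>1</value>
</property>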


Thanks in advance to anyone that can help,
Charles

http://championofcyrodiil.blogspot.com
3 REPLIES

Re: Oozie Mapreduce Error: Requested Replication 10 exceeds HDFS maximum

New Contributor

I have also tried two separate Oozie actions: a Java action (commented out now) and the MapReduce action. Both resulted in the default replication factor of 10, so I think the issue is occurring when the client node stages the resources on HDFS after the JobTracker gives it a job ID.


<action name="bi2r-data-ingestion-action">

  <!-- <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>conf/bi2r-mapreduce.xml</job-xml>
    <configuration>
      <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
      </property>
    </configuration>
    <main-class>com.somepackage.MapredMainClass</main-class>
  </java> -->

  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>conf/bi2r-mapreduce.xml</job-xml>
    <configuration>
      <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
      </property>
      <property>
        <name>mapred.submit.replication</name>
        <value>4</value>
      </property>
    </configuration>
  </map-reduce>
  <ok to="distr-copy-action" />
  <error to="fail" />
</action>

http://championofcyrodiil.blogspot.com

Re: Oozie Mapreduce Error: Requested Replication 10 exceeds HDFS maximum

Contributor

I haven't tested this, but can you try adding the following property to your action's <configuration> section?

dfs.replication=3
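
In workflow.xml terms, that would be something like this inside the action's <configuration> (again, untested):

<property>
  <name>dfs.replication</name>
  <!-- Ask the job's staged files to use a replication of 3,
       below your cluster's maximum of 4. -->
  <value>3</value>
</property>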


You can also try setting the following property in your hdfs-site.xml configuration; you may have to put it in the safety valve for HDFS if it's not exposed in CM:

dfs.replication.max=10
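
As an hdfs-site.xml (or HDFS safety valve) snippet, that would be roughly:

<property>
  <name>dfs.replication.max</name>
  <!-- Raise the NameNode's replication ceiling back to 10 so the
       staging files' requested replication passes verifyReplication();
       extra replicas on a 4-node cluster just show up as under-replicated. -->
  <value>10</value>
</property>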


As a general point, I'd strongly recommend you upgrade to a version of CDH 4.x (ideally the latest). We've made some significant improvements in Oozie since CDH 3uX, especially for Coordinators. CDH 3 has also reached end-of-life.

Software Engineer | Cloudera, Inc. | http://cloudera.com

Re: Oozie Mapreduce Error: Requested Replication 10 exceeds HDFS maximum

New Contributor
Thanks for the reply.

I'll try the action-configuration dfs.replication and post a follow-up. I initially set dfs.replication.max to the number (4) of DataNodes I had.

I recall that with the default of 10, having more replicas than DataNodes resulted in a lot of under-replicated blocks. The logging reports this as a warning, so I lowered the maximum replication factor to prevent software from replicating data beyond the number of DataNodes in the cluster. I assumed it would be best to limit replication at the server level, since I don't have control over all of the source. Do under-replicated blocks cause problems with a NameNode's performance, or is the warning just there to alert the administrator to the risk of data loss? That would be an acceptable risk in my development environment, I think.

I understand the general point. Unfortunately, I work under business requirements and use CDH3u6 to replicate the production environment.
http://championofcyrodiil.blogspot.com