
mapred.submit.replication for jobs that have the JobTracker as the submit host

Master Collaborator

Hi,

I see some jobs in my cluster that were submitted via the JobTracker node.

On all the data nodes mapred.submit.replication is set to 2. The JobTracker's mapred-site.xml had no mapred.submit.replication property, so I added it manually and restarted the JobTracker, but the job configuration for running jobs that have the JobTracker as the Submit Host still shows mapred.submit.replication as 10 and not 2.

1 ACCEPTED SOLUTION

Master Collaborator

I managed to solve it by adding a mapred-site.xml on the Oozie server under /etc/hadoop/conf and overriding the submit replication there.


5 REPLIES

Mentor
That property is job-applied, not server-controlled. Wherever you are
submitting your job from, the local or in-code configuration isn't loading
your custom value, so the default value gets used instead. An application
can usually discover your configs if the directory carrying the config XML
files is on the application's classpath. Read more at
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html

BTW, you shouldn't be using MRv1 anymore; it's deprecated. Use YARN with
MRv2 (although the above still applies).
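The 10 you are seeing is just the stock default shipped in mapred-default.xml; it applies whenever the submitting client finds no override on its classpath. Roughly:

<property>
  <name>mapred.submit.replication</name>
  <value>10</value>
</property>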

Master Collaborator

Hi,

Can I enforce this at the cluster level?

This is the coordinator job configuration for the running job; vlpr-mha01 acts as both the JT and NN.

 

<configuration>
  <property>
    <name>jobType</name>
    <value>rm</value>
  </property>
  <property>
    <name>dwhType</name>
    <value>da</value>
  </property>
  <property>
    <name>oozie.coord.application.path</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/code/server_dataaccess_retention/lp-dataaccess-retention-1.0.0.1/sched/</value>
  </property>
  <property>
    <name>recycleBinDir</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/data/server_dataaccess_retention/recycle_bin/</value>
  </property>
  <property>
    <name>freq</name>
    <value>1440</value>
  </property>
  <property>
    <name>workflowAppUri</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/code/server_dataaccess_retention/lp-dataaccess-retention-1.0.0.1/sched/</value>
  </property>
  <property>
    <name>start</name>
    <value>2014-03-02T10:24Z</value>
  </property>
  <property>
    <name>user.name</name>
    <value>dataaccess</value>
  </property>
  <property>
    <name>jobRoot</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/code/server_dataaccess_retention/lp-dataaccess-retention-1.0.0.1</value>
  </property>
  <property>
    <name>workingOnDir</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/data/server_dataaccess_retention/recycle_bin/</value>
  </property>
  <property>
    <name>oozie.libpath</name>
    <value>hdfs://vlpr-mha01:54310/liveperson/code/server_dataaccess_retention/lp-dataaccess-retention-1.0.0.1/lib</value>
  </property>
  <property>
    <name>nameNode</name>
    <value>hdfs://vlpr-mha01:54310</value>
  </property>
  <property>
    <name>end</name>
    <value>2020-01-01T00:00Z</value>
  </property>
  <property>
    <name>jobTracker</name>
    <value>vlpr-mha01:54311</value>
  </property>
</configuration>

 

 

This is an old cluster where I'm trying not to make changes at the job level; it should be dead in 6 months.

Mentor
You can pass custom configuration into your WF actions; most action types
load it automatically. If you use Java actions within the workflow,
however, Oozie prepares a config file that you need to load manually in
your code; this is described at
http://archive.cloudera.com/cdh5/cdh/5/oozie/WorkflowFunctionalSpec.html#a3.2.7_Java_Action

Replication for MR submit files (jars/etc.) is a client-side setting; it
cannot be controlled by a central server.
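For illustration, in a map-reduce action the override would go in the action's <configuration> block, roughly like this (the action name and transitions here are made up; ${jobTracker} and ${nameNode} are the properties already defined in your coordinator config):

<action name="retention-mr">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- submit replication matching the data nodes -->
      <property>
        <name>mapred.submit.replication</name>
        <value>2</value>
      </property>
    </configuration>
  </map-reduce>
  <ok to="end"/>
  <error to="fail"/>
</action>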

Master Collaborator

I want to find the place where I can stop the job from being submitted through a specific host.

I see that the Oozie launcher for the job is submitting from slpr-mha01, which is the JT, NN and Oozie node, but the job itself is submitted through a DN.

The jobs are scheduled using Oozie.

Master Collaborator

I managed to solve it by adding a mapred-site.xml on the Oozie server under /etc/hadoop/conf and overriding the submit replication there.
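For anyone hitting the same issue, the file only needs the single override; what I placed at /etc/hadoop/conf/mapred-site.xml on the Oozie server was essentially something like:

<?xml version="1.0"?>
<configuration>
  <!-- submit replication override, matching the value used on the data nodes -->
  <property>
    <name>mapred.submit.replication</name>
    <value>2</value>
  </property>
</configuration>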