I'm relatively new to Hadoop. I managed to install CDH5 using CM and the Free Edition. The problem with the very first example I'm trying is that the job is accepted by YARN but never executed.
The MR job runs fine in local mode. I've already read a lot about resource configuration for the containers, but I don't think that is the problem.
If I simply scp the jar onto the YARN master and execute it there, the job is accepted and executed without any errors.
$ hadoop jar my.jar MyDriver -conf conf/hadoop-yarn.yaml input output
does NOT work. The file passed via -conf sets:
fs.defaultFS = hdfs://cdh5-master/
mapreduce.framework.name = yarn
yarn.resourcemanager.address = cdh5-master:8032
scp my.jar hadoop@cdh5-master:.
yarn jar my.jar MyDriver input output
DOES work without any issues.
Could anyone help me out and point me in the right direction? I'm not sure what I'm currently missing.
Can you provide a bit more context as to what services are configured on the node that you're submitting the job from, where it is not executing?
If the submitting client node either (1) does not have any YARN services configured on it, (2) doesn't have a YARN gateway role configured, or (3) does not contain an updated YARN client config exported from CM, then it may not have all the necessary configs needed to properly run the job. Can you kindly check if that is the case?
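One rough way to check the third point is to scan the client's Hadoop config directory for the properties mentioned in the question. The snippet below is only a minimal sketch under some assumptions: it assumes the client configs live in the directory named by HADOOP_CONF_DIR (falling back to /etc/hadoop/conf), that they use the usual *-site.xml layout, and that the three property names from the question are the ones that matter. The function names are hypothetical, and on a cluster with ResourceManager HA the address may be defined under per-RM property names instead, so a "missing" result there is not conclusive.

```python
import os
import xml.etree.ElementTree as ET

# Property names taken from the question; treating these three as the
# required set is an assumption, not an official checklist.
REQUIRED = (
    "fs.defaultFS",
    "mapreduce.framework.name",
    "yarn.resourcemanager.address",
)


def load_properties(conf_dir):
    """Collect name/value pairs from every *-site.xml file in conf_dir."""
    props = {}
    if not os.path.isdir(conf_dir):
        return props  # no client config deployed at this path
    for fname in sorted(os.listdir(conf_dir)):
        if not fname.endswith("-site.xml"):
            continue
        root = ET.parse(os.path.join(conf_dir, fname)).getroot()
        # Hadoop config files hold <property> elements directly under <configuration>.
        for prop in root.findall("property"):
            name = prop.findtext("name")
            if name:
                props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(conf_dir, required=REQUIRED):
    """Return the required property names that are absent or empty."""
    props = load_properties(conf_dir)
    return [name for name in required if not props.get(name)]


if __name__ == "__main__":
    # Assumed default location of the deployed client configuration.
    conf_dir = os.environ.get("HADOOP_CONF_DIR", "/etc/hadoop/conf")
    missing = missing_properties(conf_dir)
    if missing:
        print("Missing in %s: %s" % (conf_dir, ", ".join(missing)))
    else:
        print("All checked properties are set in %s" % conf_dir)
```

If this reports missing properties on the client but not on a node that already has a YARN role, that would point to the client config not having been deployed.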
If the client node is part of the cluster (i.e. a node managed by CM) but does not have a YARN service set up on it, or a YARN client (gateway) config present, you can add it from CM via CM -> YARN -> Instances -> Add Role Instances -> Gateway, then save changes and Deploy Client Configs. Another way to check is to pick another node within the cluster that does have a YARN service configured (such as a NodeManager, JobHistoryServer, or ResourceManager) and submit the job from there. If it does run from that node, then we can confirm there is some missing configuration on the original client node that you submitted the job from.
My setup had 2 NodeManagers but was missing a YARN service on node1. Jobs completed after adding node1 as a NodeManager.
I added it from CM via CM -> YARN -> Instances -> Add Role Instances -> Gateway
Thanks for the additional feedback Fewcents! To clarify, the steps provided (CM -> YARN -> Instances -> Add Role Instances -> Gateway) added Node 1 as a YARN Gateway, which pushes the YARN client configurations to Node 1 and so allows Node 1 to properly submit YARN jobs to the cluster.
Note that Node 1 does not need to be a NodeManager in order to submit jobs from it (unless you added that role as well); it only needs to be configured as a YARN Gateway.