Created on 04-18-2016 01:42 PM - edited 04-18-2016 01:44 PM
I am using the WebHCat REST interface to launch Pig jobs in a Cloudera CDH 5.5.2 environment.
First, I added a WebHCat service through Cloudera Manager on one of the nodes. Then, I compressed the Pig installation from "/opt/cloudera/parcels/CDH/lib/pig" and uploaded the tarball into HDFS. Finally, I made the following configuration changes in "WebHCat Server Configuration Safety Valve for webhcat-site.xml".
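The packaging and upload steps were roughly the following (the tarball name matches the one referenced in the webhcat-site.xml configuration; the target HDFS directory is /user/webhcat):

```shell
# Package the Pig installation shipped with the CDH parcel.
# Archiving from the parent directory keeps a top-level "pig/" entry,
# which is what templeton.pig.path expects.
cd /opt/cloudera/parcels/CDH/lib
tar -czf /tmp/pig-0.12.0-cdh5.5.2.tar.gz pig

# Upload the tarball to HDFS so WebHCat can localize it for jobs
hdfs dfs -mkdir -p /user/webhcat
hdfs dfs -put /tmp/pig-0.12.0-cdh5.5.2.tar.gz /user/webhcat/
```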
<property>
  <name>templeton.libjars</name>
  <value>${env.TEMPLETON_HOME}/share/webhcat/svr/lib/zookeeper-3.4.5-cdh5.5.2.jar</value>
  <description>Jars to add to the classpath.</description>
</property>
<property>
  <name>templeton.storage.root</name>
  <value>/user/webhcat</value>
  <description>The path to the directory to use for storage.</description>
</property>
<property>
  <name>templeton.pig.archive</name>
  <value>hdfs:///user/webhcat/pig-0.12.0-cdh5.5.2.tar.gz</value>
  <description>The path to the Pig archive in HDFS.</description>
</property>
<property>
  <name>templeton.pig.path</name>
  <value>pig-0.12.0-cdh5.5.2.tar.gz/pig/bin/pig</value>
  <description>The path to the Pig executable within the archive.</description>
</property>
When I invoke a Pig script using curl, WebHCat launches a TempletonControllerJob with one map task, as expected. However, this controller job does not launch the actual job requested by the REST API call: in the Resource Manager page I can see only the controller (parent) job, and no PigLatin (child) job ever appears.
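For reference, my curl submission looks roughly like the following (host, user name, and script path are placeholders; the Pig resource of the WebHCat v1 API is used):

```shell
# Submit a Pig script stored in HDFS via WebHCat (default port 50111).
# "statusdir" is where WebHCat writes stdout/stderr/exit status of the job.
curl -s \
     -d user.name=myuser \
     -d file=hdfs:///user/myuser/script.pig \
     -d statusdir=/user/myuser/pig.output \
     'http://webhcat-host:50111/templeton/v1/pig'
```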
The controller job does complete with status SUCCEEDED, but looking inside this parent job I can see that the Pig script is actually executed locally within the controller's map task. I expected the child jobs to run as separate MapReduce jobs on the Hadoop cluster.
Why is the controller job not launching a separate MR job for the Pig script? Do I need to make any specific configuration changes?