WebHCat service - Pig job runs in local

I am using the WebHCat REST interface to launch Pig jobs in a Cloudera CDH 5.5.2 environment.

First, I added a WebHCat service through Cloudera Manager on one of the nodes. Then I compressed the Pig installation from "/opt/cloudera/parcels/CDH/lib/pig" and uploaded the tarball to HDFS. Finally, I made the following configuration changes in "WebHCat Server Configuration Safety Valve for webhcat-site.xml":

<property>
  <name>templeton.libjars</name>
  <value>${env.TEMPLETON_HOME}/share/webhcat/svr/lib/zookeeper-3.4.5-cdh5.5.2.jar</value>
  <description>Jars to add to the classpath.</description>
</property>
<property>
  <name>templeton.storage.root</name>
  <value>/user/webhcat</value>
  <description>The path to the directory to use for storage.</description>
</property>
<property>
  <name>templeton.pig.archive</name>
  <value>hdfs:///user/webhcat/pig-0.12.0-cdh5.5.2.tar.gz</value>
  <description>The path to the Pig archive.</description>
</property>
<property>
  <name>templeton.pig.path</name>
  <value>pig-0.12.0-cdh5.5.2.tar.gz/pig/bin/pig</value>
  <description>The path to the Pig executable.</description>
</property>
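
As a rough sketch of the packaging and upload step described above: the archive name and HDFS directory are taken from the `templeton.pig.archive` value, and the top-level `pig` directory inside the tarball is inferred from `templeton.pig.path`. The temporary directory and the exact commands are assumptions, not a record of what was actually run. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: package the Pig installation and upload it to HDFS.

PIG_PARENT_DIR="/opt/cloudera/parcels/CDH/lib"  # parent of the "pig" directory
ARCHIVE="pig-0.12.0-cdh5.5.2.tar.gz"            # name used in templeton.pig.archive
LOCAL_TMP="/tmp"                                # assumed scratch location
HDFS_DIR="/user/webhcat"                        # directory used in templeton.pig.archive

# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really run the commands.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the tarball with "pig" as its top-level directory, so that
# <archive>/pig/bin/pig resolves as configured in templeton.pig.path.
run tar -czf "${LOCAL_TMP}/${ARCHIVE}" -C "${PIG_PARENT_DIR}" pig

# Upload the tarball to HDFS.
run hdfs dfs -mkdir -p "${HDFS_DIR}"
run hdfs dfs -put "${LOCAL_TMP}/${ARCHIVE}" "${HDFS_DIR}/"
```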

When I invoke Pig scripts using curl, WebHCat launches a TempletonControllerJob with one map task, as expected. However, this controller job does NOT launch the actual Pig job requested through the REST API call. On the ResourceManager page, I see only the controller job (the parent job); the Pig Latin job (the child job) never appears.
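
For reference, a submission of this kind can be sketched as below. The `/templeton/v1/pig` resource, the `file` and `statusdir` parameters, and port 50111 follow the WebHCat documentation; the host name, user name, and HDFS paths are placeholders, not the values used in this environment. The script prints the curl command for review rather than sending it.

```shell
#!/usr/bin/env bash
# Sketch only: submit a Pig script through the WebHCat REST API.

WEBHCAT_HOST="webhcat-host.example.com"   # placeholder host name
WEBHCAT_PORT=50111                        # WebHCat default port
USER_NAME="someuser"                      # placeholder user
PIG_SCRIPT="/user/someuser/script.pig"    # placeholder HDFS path to the script
STATUS_DIR="/user/someuser/pig.output"    # placeholder HDFS directory for job status

URL="http://${WEBHCAT_HOST}:${WEBHCAT_PORT}/templeton/v1/pig?user.name=${USER_NAME}"

# Print the command for review; remove the leading "echo" to submit the job.
echo curl -s -d "file=${PIG_SCRIPT}" -d "statusdir=${STATUS_DIR}" "${URL}"
```

A successful submission returns a JSON document containing the job ID, which corresponds to the controller job mentioned above.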

The controller job completes with a status of succeeded, but inspecting this parent job shows that the Pig script is actually executed locally within it. I expect the child job to run as a separate MapReduce job on the Hadoop cluster.

Why is the controller job not launching a separate MapReduce job for the Pig scripts? Do I need to make any specific configuration changes?

1 REPLY

Re: WebHCat service - Pig job runs in local

We have exactly the same problem with Hive. I hope there is a solution, because this makes WebHCat totally unusable.
