New Contributor
Posts: 3
Registered: ‎03-16-2016

WebHCat service - Pig job runs in local

I am using the WebHCat REST interface to launch Pig jobs in a Cloudera CDH 5.5.2 environment.


First, I added a WebHCat service through Cloudera Manager on one of the nodes. Then I compressed the Pig installation from "/opt/cloudera/parcels/CDH/lib/pig" and uploaded the tarball to HDFS. Finally, I made the following configuration changes in "WebHCat Server Configuration Safety Valve for webhcat-site.xml".



  <description>Jars to add to the classpath.</description>
  <description>The path to the directory to use for storage</description>
  <description>The path to the Pig archive.</description>
  <description>The path to the Pig path.</description>



When I invoke Pig scripts using curl, WebHCat launches a TempletonControllerJob with one map task, as expected. However, this job does NOT launch the actual Pig job requested through the REST API call. On the ResourceManager page I see only the controller job (parent job); the Pig Latin job (child job) does not appear.
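For anyone trying to reproduce this, here is a minimal sketch of the kind of request involved. It only builds the URL and form body for the WebHCat `pig` resource and prints an equivalent curl command; nothing is sent. The host name, user name, and HDFS paths are hypothetical placeholders, not the values from my cluster, and the port assumes the WebHCat default of 50111.

```python
from urllib.parse import urlencode


def build_pig_request(host, user, script_path, status_dir, port=50111):
    """Return (url, body) for a WebHCat Pig submission. Nothing is sent.

    All argument values are illustrative; 50111 is assumed to be the
    WebHCat port (its default) and may differ on a given cluster.
    """
    # WebHCat takes the submitting user as the user.name query parameter.
    url = f"http://{host}:{port}/templeton/v1/pig?" + urlencode({"user.name": user})
    # 'file' is the HDFS path of the Pig script; 'statusdir' is the HDFS
    # directory where WebHCat writes the job's stdout, stderr and exit code.
    body = urlencode({"file": script_path, "statusdir": status_dir})
    return url, body


if __name__ == "__main__":
    # Hypothetical host, user and paths, for illustration only.
    url, body = build_pig_request(
        "webhcat-host.example.com",
        "some_user",
        "/user/some_user/script.pig",
        "/user/some_user/pig.output",
    )
    print(f"curl -s -d '{body}' '{url}'")
```

The stderr file under the `statusdir` directory is usually the quickest place to see which execution mode Pig actually connected with.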


The controller job itself completes with a succeeded status. Looking inside this parent job, however, the Pig scripts are executed locally within the controller job. I expect the child jobs to run as separate MR jobs on the Hadoop cluster.
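One thing that may be worth checking, offered as an assumption rather than a confirmed cause: Hadoop's `mapreduce.framework.name` property defaults to `local`, so a client that does not see a `mapred-site.xml` setting it to `yarn` falls back to the local job runner. The sketch below reads that property from the text of a Hadoop-style configuration file. The function names are my own, and which configuration file the controller task actually loads is something to verify on the cluster.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the value of a Hadoop-style <property> entry, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def effective_framework(xml_text):
    """Return the MapReduce framework a client reading this config would use.

    Falls back to "local", the default in Hadoop's mapred-default.xml,
    when the property is not set.
    """
    return get_property(xml_text, "mapreduce.framework.name") or "local"


if __name__ == "__main__":
    # Illustrative config text, not copied from a real cluster.
    sample = """
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
    """
    print(effective_framework(sample))
```

If this reports `local` for the configuration visible to the controller task, that would be consistent with the behaviour described above, though it does not prove it is the cause.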


Why is the controller job not launching a separate MR job for the Pig scripts? Do I need to make any specific configuration changes?

New Contributor
Posts: 1
Registered: ‎04-26-2016

Re: WebHCat service - Pig job runs in local

We have exactly the same problem with Hive. I hope there is a solution, because this makes WebHCat totally unusable.