CDH Quickstart 5.4 Pig and HCatalog


I have written a query in the Pig Editor, and on execution it gives an error. I am not sure how to resolve this. I am using Hue 3.7.0.

The query is:-


stock_a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE stock_a


The table nyse_stocks has been created in the Metastore tables. I have also added the hive-site.xml file in the properties of the script.



The log that I have is:-



Apache Pig version 0.12.0-cdh5.4.2 (rexported) 
compiled May 19 2015, 17:03:41

Run pig script using for Pig version 0.8+
2015-08-11 12:23:47,368 [uber-SubtaskRunner] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.4.2 (rexported) compiled May 19 2015, 17:03:41
2015-08-11 12:23:47,390 [uber-SubtaskRunner] INFO org.apache.pig.Main - Logging error messages to: /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1438833065513_0...
2015-08-11 12:23:47,738 [uber-SubtaskRunner] INFO org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-yarn/.pigbootup not found
2015-08-11 12:23:48,938 [uber-SubtaskRunner] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-08-11 12:23:48,947 [uber-SubtaskRunner] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-08-11 12:23:48,947 [uber-SubtaskRunner] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://quickstart.cloudera:8020
2015-08-11 12:23:48,990 [uber-SubtaskRunner] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8032
2015-08-11 12:23:49,016 [uber-SubtaskRunner] WARN org.apache.pig.PigServer - Empty string specified for jar path


After I visit the error link above I see:-



Cannot access: /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1438833065513_0010/container_1438833065513_0010_01_000001/pig-job_1438833065513_0010.log. Note: You are a Hue admin but not a HDFS superuser (which is "hdfs").

[Errno 2] File /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1438833065513_0010/container_1438833065513_0010_01_000001/pig-job_1438833065513_0010.log not found

Also I see:-


<file script.pig, line 1, column 35> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]



Can anyone guide me in solving this issue?


New Contributor

When I use:-

 stock_a = LOAD '/user/hive/warehouse/nyse_stocks/NYSE-2000-2001.tsv.gz' USING PigStorage() as (exchange:chararray, stock_symbol:chararray, date:chararray, stock_price_open:float,stock_price_high:float, stock_price_low:float,stock_price_close:float,stock_volume:double, stock_price_adj_close:float);


instead of

stock_a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();



the query executes successfully.


I am not sure what the problem is when using HCatalog.

Cloudera Employee

Hi G_Arti,


Could you try using org.apache.hive.hcatalog.pig.HCatLoader() instead of org.apache.hcatalog.pig.HCatLoader() ?


As part of HCatalog moving to the Hive project, all client-facing classes were moved from org.apache.hcatalog to org.apache.hive.hcatalog, resulting in org.apache.hcatalog being deprecated starting with CDH 5.3.
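If you have several scripts that still reference the old package, the rename can be applied mechanically. Below is a minimal sketch in Python, assuming the only change needed is the package prefix; the helper name migrate_hcatalog_package is hypothetical and not part of Pig, Hue, or CDH. It rewrites the LOAD statement from the question to use the new class name.

```python
import re

# Old HCatalog package prefix (deprecated) and its replacement under the Hive project.
# The word boundary keeps us from matching inside a longer identifier.
OLD_PREFIX = re.compile(r"\borg\.apache\.hcatalog\.")
NEW_PREFIX = "org.apache.hive.hcatalog."


def migrate_hcatalog_package(pig_script: str) -> str:
    """Hypothetical helper: point old HCatalog class references at the new package."""
    return OLD_PREFIX.sub(NEW_PREFIX, pig_script)


# The LOAD statement from the original question.
original = "stock_a = LOAD 'nyse_stocks' USING org.apache.hcatalog.pig.HCatLoader();"

migrated = migrate_hcatalog_package(original)
print(migrated)
```

Running the helper on a script that already uses org.apache.hive.hcatalog leaves it unchanged, so it should be safe to apply more than once. This only covers the class name; whether the HCatalog jars are visible to the Pig job in Hue is a separate question.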


