Configure PIG in HUE to work with Hive tables using HCatalog

Contributor

Hi All,

 

I am using Hue 3.6 on CDH 5.1 and I am having an issue reading Hive tables from Pig. I think some configuration needs to be done for HCatalog.

Can someone please point me to the documentation for configuring Hue so that Pig works with HCatalog?

Following is the error stack:

 

2014-09-09 10:55:58,010 [main] INFO  org.apache.pig.Main  - Apache Pig version 0.12.0-cdh5.1.0 (rexported) compiled Jul 12 2014, 08:41:26
2014-09-09 10:55:58,012 [main] INFO org.apache.pig.Main - Logging error messages to: /yarn/nm/usercache/cloudera/appcache/application_1410276487009_0001/container_1410276487009_0001_01_...
2014-09-09 10:55:58,067 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-yarn/.pigbootup not found
2014-09-09 10:55:58,189 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-09 10:55:58,189 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-09 10:55:58,189 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://txwlcloud2:8020
2014-09-09 10:55:58,197 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: txwlcloud2:8032
2014-09-09 10:55:59,013 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2014-09-09 10:55:59,013 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2014-09-09 10:55:59,013 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2014-09-09 10:55:59,014 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
2014-09-09 10:55:59,014 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
2014-09-09 10:55:59,014 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2014-09-09 10:55:59,014 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2014-09-09 10:55:59,276 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2014-09-09 10:55:59,327 [main] INFO org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called
2014-09-09 10:55:59,692 [main] INFO DataNucleus.Persistence - Property datanucleus.cache.level2 unknown - will be ignored
2014-09-09 10:56:00,004 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
Failed to parse: Can not retrieve schema from loader org.apache.hcatalog.pig.HCatLoader@d47f419
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1676)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1409)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:342)
at org.apache.pig.PigServer.executeBatch(PigServer.java:367)
at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:478)
at org.apache.pig.PigRunner.run(PigRunner.java:49)
at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:287)
at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:227)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: Can not retrieve schema from loader org.apache.hcatalog.pig.HCatLoader@d47f419
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:91)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:853)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 30 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245: Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:179)
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
... 37 more
Caused by: java.io.IOException: java.lang.Exception: Could not instantiate a HiveMetaStoreClient connecting to server uri:[null]
at org.apache.hcatalog.pig.PigHCatUtil.getTable(PigHCatUtil.java:191)
at org.apache.hcatalog.pig.HCatLoader.getSchema(HCatLoader.java:194)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
... 38 more
Caused by: java.lang.Exception: Could not instantiate a HiveMetaStoreClient connecting to server uri:[null]
at org.apache.hcatalog.pig.PigHCatUtil.getHiveMetaClient(PigHCatUtil.java:152)
at org.apache.hcatalog.pig.PigHCatUtil.getTable(PigHCatUtil.java:186)
... 40 more
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at org.apache.hcatalog.common.HiveClientCache.getOrCreate(HiveClientCache.java:167)
at org.apache.hcatalog.common.HiveClientCache.get(HiveClientCache.java:143)
at org.apache.hcatalog.common.HCatUtil.getHiveClient(HCatUtil.java:548)
at org.apache.hcatalog.pig.PigHCatUtil.getHiveMetaClient(PigHCatUtil.java:150)
... 41 more

Re: Configure PIG in HUE to work with Hive tables using HCatalog

Contributor

I found a partial solution:

1) Add the hive-site.xml to the job properties.

2) Add the jars to the Pig shared lib folder.

This might help.
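
In case it helps anyone else: the "connecting to server uri:[null]" part of the error suggests HCatLoader could not see hive.metastore.uris. A minimal hive-site.xml sketch, assuming a remote metastore (the Thrift host/port below are taken from the later log in this thread; substitute your own):

<?xml version="1.0"?>
<configuration>
  <!-- Tells HCatLoader where the Hive metastore Thrift service runs; value is an example from this thread -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://txwlcloud1:9083</value>
  </property>
</configuration>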

Re: Configure PIG in HUE to work with Hive tables using HCatalog

Good to know!

And normally you just need to specify the hive-site.xml and that's it. I created https://issues.cloudera.org/browse/HUE-2326 to improve this.

http://gethue.com/hadoop-tutorial-how-to-access-hive-in-pig-with/
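
For context, Hue submits the Pig script through an Oozie Pig action, so "specifying the hive-site.xml" amounts to shipping it with that action. A rough, illustrative sketch of such an action (Hue generates the real workflow; the action name and paths here are made up):

<action name="pig-hcat">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- The Pig script uploaded with the workflow -->
        <script>pigtest1.pig</script>
        <!-- Ships hive-site.xml into the action's working directory -->
        <file>hive-site.xml#hive-site.xml</file>
    </pig>
    <ok to="end"/>
    <error to="kill"/>
</action>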


Romain

Re: Configure PIG in HUE to work with Hive tables using HCatalog

Contributor

Thanks for your reply, but now I am only getting "Heart beat" in the Hue editor.

Attaching the log:

 

Apache Pig version 0.12.0-cdh5.1.2 (rexported) 
compiled Aug 25 2014, 19:51:48

Run pig script using PigRunner.run() for Pig version 0.8+
2014-09-11 11:52:03,939 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.1.2 (rexported) compiled Aug 25 2014, 19:51:48
2014-09-11 11:52:03,940 [main] INFO org.apache.pig.Main - Logging error messages to: /yarn/nm/usercache/cloudera/appcache/application_1410447707862_0001/container_1410447707862_0001_01_...
2014-09-11 11:52:04,002 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-yarn/.pigbootup not found
2014-09-11 11:52:04,126 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-11 11:52:04,127 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-11 11:52:04,127 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://txwlcloud1:8020
2014-09-11 11:52:04,134 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: txwlcloud2:8032
2014-09-11 11:52:05,046 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2014-09-11 11:52:05,046 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2014-09-11 11:52:05,047 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2014-09-11 11:52:05,047 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
2014-09-11 11:52:05,047 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
2014-09-11 11:52:05,047 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2014-09-11 11:52:05,047 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2014-09-11 11:52:05,198 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
2014-09-11 11:52:05,268 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://txwlcloud1:9083
2014-09-11 11:52:05,343 [main] INFO hive.metastore - Connected to metastore.
2014-09-11 11:52:05,722 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-11 11:52:05,838 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
2014-09-11 11:52:05,906 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-11 11:52:06,033 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-09-11 11:52:06,063 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-11 11:52:06,196 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-11 11:52:06,281 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-11 11:52:06,281 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-11 11:52:06,377 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at txwlcloud2/10.215.204.203:8032
2014-09-11 11:52:06,456 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-11 11:52:06,531 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-09-11 11:52:06,531 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-11 11:52:06,531 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-09-11 11:52:06,600 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
2014-09-11 11:52:06,752 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-11 11:52:06,755 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2014-09-11 11:52:06,756 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4660172352052698884.jar
2014-09-11 11:52:09,726 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4660172352052698884.jar created
2014-09-11 11:52:09,726 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-09-11 11:52:09,764 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-11 11:52:09,853 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-11 11:52:09,856 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2014-09-11 11:52:09,873 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at txwlcloud2/10.215.204.203:8032
2014-09-11 11:52:11,091 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2014-09-11 11:52:11,155 [JobControl] INFO org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
2014-09-11 11:52:11,171 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-09-11 11:52:11,336 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2014-09-11 11:52:11,514 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1410447707862_0002
2014-09-11 11:52:11,515 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: mapreduce.job, Service: job_1410447707862_0001, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@5415cf91)
2014-09-11 11:52:11,515 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Kind: RM_DELEGATION_TOKEN, Service: 10.215.204.203:8032, Ident: (owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1410454288077, maxDate=1411059088077, sequenceNumber=2, masterKeyId=2)
2014-09-11 11:52:12,572 [JobControl] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://txwlcloud1:8020/user/oozie/share/lib/lib_20140910164806/pig/commons-httpclient-3.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://txwlcloud1:8020/user/oozie/share/lib/lib_20140910164806/hcatalog/commons-httpclient-3.1.jar This will be an error in Hadoop 2.0
2014-09-11 11:52:12,576 [JobControl] WARN org.apache.hadoop.mapreduce.v2.util.MRApps - cache file (mapreduce.job.cache.files) hdfs://txwlcloud1:8020/user/oozie/share/lib/lib_20140910164806/pig/commons-io-2.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://txwlcloud1:8020/user/oozie/share/lib/lib_20140910164806/hcatalog/commons-io-2.1.jar This will be an error in Hadoop 2.0
2014-09-11 11:52:12,819 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1410447707862_0002
2014-09-11 11:52:12,910 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://txwlcloud2:8088/proxy/application_1410447707862_0002/
2014-09-11 11:52:12,911 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1410447707862_0002
2014-09-11 11:52:12,911 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A
2014-09-11 11:52:12,911 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4] C: R:
2014-09-11 11:52:12,911 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://txwlcloud2:50030/jobdetails.jsp?jobid=job_1410447707862_0002
2014-09-11 11:52:12,991 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat

Re: Configure PIG in HUE to work with Hive tables using HCatalog

Contributor

I tried running the same query:

A = LOAD 'revenue_subs' USING org.apache.hcatalog.pig.HCatLoader();
DUMP A;

 

through the CLI:

>pig -useHCatalog pigtest1.pig

 

The job ran just fine, but when I run it through Hue I get the issue.

I do not know if I am missing something in the configuration.

 

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat

Can someone please help? I have been dealing with this for the last two days.

Re: Configure PIG in HUE to work with Hive tables using HCatalog

Contributor

Romain,

That is really helpful information; thank you so much. I increased the number of apps per namenode to 4. However, I resolved the issue by changing some of the resource settings in the YARN resource management configuration, such as the following (example values are sketched after this list):

 

mapreduce.map.memory.mb

yarn.nodemanager.resource.memory-mb

ApplicationMaster Java Maximum Heap Size
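
For anyone else tuning the same knobs, here is a rough sketch of the corresponding raw properties with purely illustrative values (the right sizes depend on each node's memory, and the CM field "ApplicationMaster Java Maximum Heap Size" is assumed here to map to the -Xmx in yarn.app.mapreduce.am.command-opts):

<!-- yarn-site.xml: total memory a NodeManager can offer to containers (example value) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>

<!-- mapred-site.xml: memory requested per map container (example value) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>

<!-- mapred-site.xml: ApplicationMaster JVM heap (example value only) -->
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx800m</value>
</property>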

 

It would be great, though, if I could find a doc explaining the ideal values for these configurations. When I installed using CM, it set some values (I think based on the cluster's capability). When I look at the individual configurations I see there are default values, which in some cases are higher than the current values and in some cases lower. It would be great to find somewhere that explains the strategy behind these allocations.

 

For now it is working. And thanks again for your help.

 

Amit