Support Questions

PentaReddy · ‎12-16-2016

aervits · ‎12-16-2016

it's easier to drop a table and recreate, delete from a table and truncate are not supported by API

set hcat.bin /usr/bin/hcat;
sql drop table if exists codes;
sql create table codes(code string, description string, total_emp int, salary int);
a = load 'sample_07' using org.apache.hive.hcatalog.pig.HCatLoader();
b = load 'sample_08' using org.apache.hive.hcatalog.pig.HCatLoader();
c = join b by code, a by code;d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary;
store d into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer();

View solution in original post

PentaReddy · ‎12-18-2016

I think that's the only solution i think is the best at the moment using temporary table. Does Overwrite exists in Pig? Like Hive Overwrite table data.

aervits · ‎12-18-2016

According to the wiki it exists but in our mutual tests, we were not able to fit your use case. Perhaps you'd like to open an enhancement Jira. I do like integration of Pig and HCatalog though I would argue instead of spending hours to figure it out, consider looking into Spark as it's under a heavy development and benefits from a wider community. If my workaround fits your needs, please consider accepting my answer as best.

PentaReddy · ‎12-19-2016

Yes, This is testing only and i am doing it out of my curiosity. Thanks you very much for your valuable responses.

Could you let me know how to open an enhancement Jira request ?

aervits · ‎12-19-2016

@Praveen PentaReddy create an Apache Jira account and open a ticket on https://issues.apache.org/jira/browse/pig/?selectedTab=com.atlassian.jira.jira-projects-plugin:summa... apache pig jira page.

PentaReddy · ‎12-19-2016

Thanks, i have raised an enhancement Jira.

aervits · ‎12-20-2016

Can you paste a link to your Jira and accept my answer as workaround. Thanks

aervits · ‎12-29-2016

@Praveen PentaReddy

to close the loop on this, turns out append is the default behavior and if you read the comments in https://issues.apache.org/jira/browse/HIVE-6897 you can see that it is not advisable to force an overwrite of a table via HCatalog. So to turn the feature off completely and not promote "bad" behavior I did the following

grunt> sql alter table codeZ set TBLPROPERTIES ('immutable' = 'true');
2016-12-29 20:49:51,924 [main] INFO  org.apache.pig.tools.grunt.GruntParser - Going to run hcat command: alter table codeZ set TBLPROPERTIES ('immutable' = 'true');
OK
Time taken: 2.041 seconds
grunt> a = load 'sample_07' using org.apache.hive.hcatalog.pig.HCatLoader();
2016-12-29 20:51:00,125 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-12-29 20:51:00,163 [main] INFO  hive.metastore - Connected to metastore.
grunt> b = load 'sample_08' using org.apache.hive.hcatalog.pig.HCatLoader();
grunt> c = join b by code, a by code;
grunt> d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary;
grunt> store d into 'codeZ' using org.apache.hive.hcatalog.pig.HCatStorer();
2016-12-29 20:52:26,894 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000:
<line 5, column 0> Output Location Validation Failed for: 'codeZ More info to follow:
org.apache.hive.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.codez
Details at logfile: /home/guest/pig_1483044536026.log
grunt> quit
2016-12-29 20:56:25,336 [main] INFO  org.apache.pig.Main - Pig script completed in 7 minutes, 29 seconds and 397 milliseconds (449397 ms)
[guest@sandbox ~]$ less /home/guest/pig_1483044536026.log

and snippet from log

ERROR 6000:
<line 5, column 0> Output Location Validation Failed for: 'codeZ More info to follow:
org.apache.hive.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.codez


org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias d
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:707)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1075)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:566)
        at org.apache.pig.Main.main(Main.java:178)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 6000:
<line 5, column 0> Output Location Validation Failed for: 'codeZ More info to follow:
org.apache.hive.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.codez
        at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:95)
        at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
        at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
        at org.apache.pig.PigServer$Graph.compile(PigServer.java:1851)
        at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1527)
        at org.apache.pig.PigServer.execute(PigServer.java:1440)
        at org.apache.pig.PigServer.access$500(PigServer.java:118)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1773)
        ... 14 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.codez
        at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.handleDuplicatePublish(FileOutputFormatContainer.java:206)
        at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.checkOutputSpecs(FileOutputFormatContainer.java:121)
        at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:65)
        at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:69)

in other words, it is not advisable to overwrite via HCatStorer as Hive handles append/overwrite. The only workaround here is to use a temporary table as I suggested earlier.

PentaReddy · ‎12-16-2016

Any Help highly appreciated.

PentaReddy · ‎12-20-2016

https://issues.apache.org/jira/browse/PIG-5079

cstanca · ‎12-20-2016

@Praveen PentaReddy

Thanks for submitting the JIRA ticket. I reviewed Artem's response and it seems that until the enhancement is implemented, his response is still the best option as of right now.

Cloudera Community

Support Questions

HCatStorer is not overwriting a Hive table. or How to overwrite a hive table using HCatStorer ?