Created 08-02-2016 05:56 PM
Hi, I have a requirement where I need to overwrite (or append) data in an existing partition of a Hive table from Pig.
However, while storing with HCatStorer, the job failed with the error below.
Job commit failed: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:264)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:262)
    ... 5 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in hdfs://sandbox.hortonworks.com:8020/input/externalHiveData/part=1990, duplicate publish not possible.
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:609)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:565)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:928)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:249)
    ... 10 more
CREATE TABLE testing.emp_tab_int (empid string, name string, year int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile;
LOAD DATA LOCAL INPATH '/somepath' OVERWRITE INTO TABLE testing.emp_tab_int;
CREATE TABLE testing.emp_tab_part_int (empid string, name string, year int) PARTITIONED BY (part int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile;
INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION (part) SELECT empid, name, year, year FROM testing.emp_tab_int;
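Note that an INSERT with `PARTITION (part)` and no static value is a fully dynamic-partition insert, so the Hive session must allow dynamic partitioning in nonstrict mode. A minimal sketch of the session settings this usually requires (exact defaults depend on the Hive version in use):

```sql
-- assumption: dynamic partitioning is not already enabled for this session
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION (part)
SELECT empid, name, year, year FROM testing.emp_tab_int;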
A = LOAD '/input/incr_dat' USING PigStorage(',') AS (empid: chararray, name: chararray, year: int);
B = FOREACH A GENERATE $0 .., $2 AS part;
STORE B INTO 'testing.emp_tab_part_int' USING org.apache.hive.hcatalog.pig.HCatStorer(); -- error is thrown here
hadoop fs -cat /input/incr_dat
em204,ajay,2005
em205,sikha,1990
em206,satya,1991
em207,krishna,1991
em2000,hello am new data,1990
em2001,hello am too new data,1990
em20080,hello this is new data,2050
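Since the error is HCatalog refusing a "duplicate publish" into part=1990, one workaround (assuming it is acceptable to rebuild that partition, i.e. overwrite semantics rather than append) is to drop the existing partition in Hive before the Pig job publishes it again:

```sql
-- hypothetical workaround: remove the partition so HCatStorer can publish it fresh;
-- the old data in part=1990 is lost, so this is overwrite, not append
ALTER TABLE testing.emp_tab_part_int DROP IF EXISTS PARTITION (part = 1990);
```

After this, the STORE ... USING HCatStorer() step can write part=1990 again without the 2002 error.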
Created 08-04-2016 01:51 PM
@Prasanna Kulkarni It looks like there are JIRAs for this. They are not resolved and there hasn't been any recent activity:
Created 08-02-2016 06:05 PM
According to the documentation, you can't update partitioning or bucketing columns.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update
Partitioning columns cannot be updated. Bucketing columns cannot be updated.
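Because the partitioning column itself cannot be updated, the usual pattern is to rewrite the whole partition instead. A sketch using the tables from the question:

```sql
-- rewrite one partition in place rather than updating its key column
INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION (part = 1990)
SELECT empid, name, year FROM testing.emp_tab_int WHERE year = 1990;
```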
Created 08-03-2016 07:13 AM
Thanks Michael Young, I am not able to overwrite into a Hive table using HCatStorer from Pig.
However, I learned that HCatalog cannot overwrite into an existing Hive partition.
It would be nice to have Pig write directly into an existing Hive partition.
Is there any patch, or i
Created 08-03-2016 06:38 PM
A limitation of HCatStorer is that the table must be an HCatalog-managed table; it cannot be a regular Hive table. Also, the data types must be supported by HCatalog; any other data types will cause problems. https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore @Prasanna Kulkarni
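For reference, the documented load counterpart works the same way; a sketch reading the partitioned table from the question back through HCatalog (alias names are illustrative):

```pig
-- read the table via HCatLoader; the partition filter is pushed down to HCatalog
E = LOAD 'testing.emp_tab_part_int' USING org.apache.hive.hcatalog.pig.HCatLoader();
F = FILTER E BY part == 1990;
DUMP F;
```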