Support Questions


How to append to or overwrite an existing partition in Hive using HCatStorer

Explorer

Hi, I have a requirement where I need to overwrite (or append to) an existing partition in Hive from Pig.

However, while storing using HCatStorer, the job failed with the error below.

Job commit failed: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:264)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:262)
    ... 5 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in hdfs://sandbox.hortonworks.com:8020/input/externalHiveData/part=1990, duplicate publish not possible.
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:609)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:565)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:928)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:249)
    ... 10 more

CREATE TABLE testing.emp_tab_int (empid string, name string, year int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile;

load data local inpath '/somepath' overwrite into table testing.emp_tab_int ;

CREATE TABLE testing.emp_tab_part_int (empid string, name string, year int) PARTITIONED BY (part int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile;

INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION(part) SELECT empid,name,year,year from testing.emp_tab_int;
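For reference, because the partition column (part) in the INSERT above is fully dynamic, Hive normally needs dynamic partitioning enabled in the session (a sketch, assuming default configuration):

```sql
-- Enable dynamic partitioning; nonstrict mode is required because no
-- static value is given for the partition column in the INSERT above.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
```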

A = LOAD '/input/incr_dat' USING PigStorage(',') AS (empid:chararray, name:chararray, year:int);

B = FOREACH A GENERATE $0.., $2 AS part;

STORE B INTO 'testing.emp_tab_part_int' USING org.apache.hive.hcatalog.pig.HCatStorer(); -- error is thrown here

hadoop fs -cat /input/incr_dat

em204,ajay,2005

em205,sikha,1990

em206,satya,1991

em207,krishna,1991

em2000,hello am new data,1990

em2001,hello am too new data,1990

em20080,hello this is new data,2050

1 ACCEPTED SOLUTION

Super Guru

@Prasanna Kulkarni It looks like there are JIRAs for this. They are not resolved, and there hasn't been any recent activity:

https://issues.apache.org/jira/browse/HIVE-6897

https://issues.apache.org/jira/browse/HCATALOG-551
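Until those JIRAs are resolved, a common workaround (a sketch — it deletes the partition's existing data, so only use it when a full overwrite of that partition is acceptable) is to drop the conflicting partition in Hive before re-running the Pig job, so HCatStorer publishes it as a new partition:

```sql
-- part=1990 is the partition named in the "duplicate publish" error above.
ALTER TABLE testing.emp_tab_part_int DROP IF EXISTS PARTITION (part=1990);
```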


5 REPLIES

Super Guru

According to the documentation, you can't update partitioning or bucketing columns.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update

Partitioning columns cannot be updated.
Bucketing columns cannot be updated.

Explorer

Thanks Michael Young. I am not able to overwrite into a Hive table using HCatStorer from Pig.

However, I learned that HCatalog can't overwrite into Hive's existing partition.

It would be nice to have Pig write directly into Hive's existing partition.

Is there any patch, or ...

Master Mentor

The limitation of HCatStorer is that the table must be an HCatalog-managed table; it cannot be a regular Hive table. Also, the datatypes must be supported by HCatalog; any other datatypes will cause problems. https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore @Prasanna Kulkarni
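Per the HCatalog LoadStore page linked above, HCatStorer can also take an explicit partition spec as its first constructor argument. A minimal sketch against the table from the question (this targets one static partition, and it still fails if that partition already exists):

```pig
A = LOAD '/input/incr_dat' USING PigStorage(',')
    AS (empid:chararray, name:chararray, year:int);
-- 'part=1990' is a static partition spec; with it, the stored relation
-- must contain only the non-partition columns.
STORE A INTO 'testing.emp_tab_part_int'
    USING org.apache.hive.hcatalog.pig.HCatStorer('part=1990');
```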
