How to append to or overwrite an existing partition in Hive using HCatStorer

Explorer

Hi, I have a requirement where I need to overwrite (or append) data to an existing partition in Hive from Pig.

However, when storing with HCatStorer, the job failed with the following error.

Job commit failed: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:264)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:262)
    ... 5 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in hdfs://sandbox.hortonworks.com:8020/input/externalHiveData/part=1990, duplicate publish not possible.
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:609)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:565)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:928)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:249)
    ... 10 more

CREATE TABLE testing.emp_tab_int (empid string, name string, year int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/somepath' OVERWRITE INTO TABLE testing.emp_tab_int;

CREATE TABLE testing.emp_tab_part_int (empid string, name string, year int) PARTITIONED BY (part int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION (part) SELECT empid, name, year, year FROM testing.emp_tab_int;

A = load '/input/incr_dat' USING PigStorage(',') as (empid: chararray, name: chararray ,year : int);

B = foreach A generate $0.., $2 as part;

Store B into 'testing.emp_tab_part_int' using org.apache.hive.hcatalog.pig.HCatStorer(); -- error is thrown here

hadoop fs -cat /input/incr_dat

em204,ajay,2005

em205,sikha,1990

em206,satya,1991

em207,krishna,1991

em2000,hello am new data,1990

em2001,hello am too new data ,1990

em20080,hello this is new data,2050

1 ACCEPTED SOLUTION

Super Guru

@Prasanna Kulkarni It looks like there are JIRAs for this. They are not resolved, and there hasn't been any recent activity:

https://issues.apache.org/jira/browse/HIVE-6897

https://issues.apache.org/jira/browse/HCATALOG-551
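Until those issues are resolved, one workaround that is often suggested is to have Pig store into an unpartitioned staging table and then let Hive move the rows into the partitioned table. The HiveQL below is only a sketch under assumptions, not a tested fix: the staging table name testing.emp_tab_stage is hypothetical, and the Pig script is assumed to be changed so that it stores B into that table instead of testing.emp_tab_part_int. Depending on the Hive/HCatalog version, HCatStorer may also refuse to write into a non-empty unpartitioned table, which is why the sketch empties the staging table after each load.

```sql
-- Hypothetical staging table; its columns match relation B in the Pig script.
CREATE TABLE IF NOT EXISTS testing.emp_tab_stage (
  empid string,
  name  string,
  year  int,
  part  int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- In Pig (assumed change to the original script):
--   STORE B INTO 'testing.emp_tab_stage' USING org.apache.hive.hcatalog.pig.HCatStorer();

-- Allow fully dynamic partition inserts for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Append: add the staged rows to existing (or new) partitions.
INSERT INTO TABLE testing.emp_tab_part_int PARTITION (part)
SELECT empid, name, year, part FROM testing.emp_tab_stage;

-- Overwrite instead: replaces only the partitions that appear in the staged data.
-- INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION (part)
-- SELECT empid, name, year, part FROM testing.emp_tab_stage;

-- Empty the staging table so the next Pig run starts clean.
TRUNCATE TABLE testing.emp_tab_stage;
```

Run either the INSERT INTO or the INSERT OVERWRITE statement, not both. The extra Hive step costs one more job, but it keeps the append/overwrite decision in Hive, where both operations are supported for existing partitions.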

Super Guru

According to the documentation, you can't update partitioning or bucketing columns.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update

Partitioning columns cannot be updated.
Bucketing columns cannot be updated.

Explorer

Thanks, Michael Young. I am not able to overwrite a Hive table using HCatStorer from Pig.

However, I have learned that HCatalog can't overwrite an existing Hive partition.

It would be nice to have Pig write directly into an existing Hive partition.

Is there any patch ,,, or i

Master Mentor

The limitation of HCatStorer is that the table must be an HCatalog-managed table; it cannot be a regular Hive table. Also, the data types must be supported by HCatalog; any other data types will cause problems. https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore @Prasanna Kulkarni
