Created 08-02-2016 05:56 PM
Hi, I have a requirement where I need to overwrite (or append) data in an existing Hive partition from Pig.
However, while storing with HCatStorer, the job failed with the following error.
Job commit failed: java.lang.reflect.InvocationTargetException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(
    at$EventProcessor.handleJobCommit(
    at$
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
    at
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(
    ... 5 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in hdfs://, duplicate publish not possible.
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(
    ... 10 more
CREATE TABLE testing.emp_tab_int( empid string, name string, year int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile;
load data local inpath '/somepath' overwrite into table testing.emp_tab_int ;
CREATE TABLE testing.emp_tab_part_int( empid string, name string, year int) PARTITIONED BY (part int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile;
INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION(part) SELECT empid,name,year,year from testing.emp_tab_int;
A = load '/input/incr_dat' USING PigStorage(',') as (empid: chararray, name: chararray ,year : int);
B = foreach A generate $0.., $2 as part;
Store B into 'testing.emp_tab_part_int' using org.apache.hive.hcatalog.pig.HCatStorer(); -- error is thrown here
hadoop fs -cat /input/incr_dat
em2000,hello am new data,1990
em2001,hello am too new data ,1990
em20080,hello this is new data,2050
Created 08-04-2016 01:51 PM
@Prasanna Kulkarni It looks like there are JIRAs for this. They are not resolved and there hasn't been any recent activity:
Created 08-02-2016 06:05 PM
According to the documentation, you can't update partitioning or bucketing columns:
Partitioning columns cannot be updated. Bucketing columns cannot be updated.
Created 08-03-2016 07:13 AM
Thanks Michael Young. I am not able to overwrite a Hive table using HCatStorer from Pig.
However, I learned that HCatalog can't overwrite an existing Hive partition.
It would be nice to have Pig write directly into an existing Hive partition.
Is there any patch ,,, or i
Created 08-03-2016 06:31 PM
Created 08-03-2016 06:38 PM
The limitation of HCatStorer is that the table must be an HCatalog-managed table; it cannot be a regular Hive table. Also, the data types must be supported by HCatalog; any other data types will cause problems. @Prasanna Kulkarni
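Given that HCatStorer refuses to publish into a partition that already exists, one possible workaround is to keep Pig away from the partitioned table and let Hive do the final write. The sketch below is an assumption-based illustration, not something confirmed in this thread: it assumes the Pig script writes B as plain comma-delimited files with PigStorage(',') to an HDFS directory, and that Hive then appends (or overwrites) from an external staging table over that directory. The directory '/staging/emp_incr' and the table name testing.emp_tab_stage are hypothetical names made up for this example.

```sql
-- Hypothetical staging table over the directory Pig wrote to with PigStorage(',').
-- Neither the table name nor the LOCATION path comes from the original thread.
CREATE EXTERNAL TABLE IF NOT EXISTS testing.emp_tab_stage (
  empid string,
  name  string,
  year  int,
  part  int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/staging/emp_incr';

-- Dynamic partition inserts with no static partition key need nonstrict mode.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Append the new rows to whichever partitions they belong to,
-- including partitions that already exist.
INSERT INTO TABLE testing.emp_tab_part_int PARTITION (part)
SELECT empid, name, year, part
FROM testing.emp_tab_stage;

-- Alternative: replace the contents of only those partitions that appear
-- in the staging data (other partitions are left untouched).
-- INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION (part)
-- SELECT empid, name, year, part
-- FROM testing.emp_tab_stage;
```

With the sample data from the question, the INSERT INTO variant would add the two 1990 rows to the existing part=1990 partition (if present) and create part=2050. The staging directory would need to be cleared between runs so the same rows are not appended twice.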