Support Questions

Find answers, ask questions, and share your expertise

HCatStorer is not overwriting a Hive table. or How to overwrite a hive table using HCatStorer ?

avatar
Expert Contributor
 
1 ACCEPTED SOLUTION

avatar
Master Mentor

it's easier to drop a table and recreate, delete from a table and truncate are not supported by API

set hcat.bin /usr/bin/hcat;
sql drop table if exists codes;
sql create table codes(code string, description string, total_emp int, salary int);
a = load 'sample_07' using org.apache.hive.hcatalog.pig.HCatLoader();
b = load 'sample_08' using org.apache.hive.hcatalog.pig.HCatLoader();
c = join b by code, a by code;d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary;
store d into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer();

View solution in original post

19 REPLIES 19

avatar
Master Mentor

it's easier to drop a table and recreate, delete from a table and truncate are not supported by API

set hcat.bin /usr/bin/hcat;
sql drop table if exists codes;
sql create table codes(code string, description string, total_emp int, salary int);
a = load 'sample_07' using org.apache.hive.hcatalog.pig.HCatLoader();
b = load 'sample_08' using org.apache.hive.hcatalog.pig.HCatLoader();
c = join b by code, a by code;d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary;
store d into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer();

avatar
Expert Contributor

Thanks for you response, The above code is writing back the data to another table after the transformation.however, i don't want to delete the data/ drop the table and i want to overwrite the same table after my transformation in pig.

avatar
Master Mentor

then you need a temporary table to store the intermediary data. This is ugly but works, there's probably a better way but it's late where I am 🙂

set hcat.bin /usr/bin/hcat;
a = load 'codes' using org.apache.hive.hcatalog.pig.HCatLoader();
b = foreach a generate $0 as code, $1 as description, $2 as total_emp, $3 as salary;
sql drop table if exists codes_temp;
sql create table codes_temp(code string, description string, total_emp int, salary int);
store b into 'codes_temp' using org.apache.hive.hcatalog.pig.HCatStorer();
sql drop table if exists codes;
sql create table codes(code string, description string, total_emp int, salary int);
c = load 'codes_temp' using org.apache.hive.hcatalog.pig.HCatLoader();
d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary;
store d into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer();
sql drop table if exists codes_temp;

avatar
Master Mentor

@Praveen PentaReddy I think I understand what you mean now, you may have run into a bug. What version of Pig and HCatalog are you running? I just tested on Sandbox with HDP 2.5 and it works

set hcat.bin /usr/bin/hcat;
a = load 'codes' using org.apache.hive.hcatalog.pig.HCatLoader();
b = foreach a generate $0 as code, $1 as description, $2 as total_emp, $3 as salary;
store b into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer();

my codes table contained 823 rows, after this execution, it contains 1646 as expected

per hcatalog wiki

You can write to a non-partitioned table simply by using HCatStorer. The contents of the table will be overwritten:

store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer();

https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-HCatStorer

avatar
Expert Contributor

Thanks for your replay.

Should i need to add the first step(set command bin/hcat) every time when i launch Hcat.

I am using Hadoop 2.7.1.2.4.2.0-258 (version)

Apache Pig version 0.15.0.2.4.2.0-258 (rexported) - verison

Hive 1.2.1000.2.4.2.0-258 - version,

avatar
Master Mentor
@Praveen PentaReddy

that's interesting, do you have another environment to test your script on? Like I said I have no issues on my end. The answer to your question about set command, I answered that in the following thread https://community.hortonworks.com/questions/1954/hcatbin-is-not-defined-define-it-to-be-your-hcat-s....

If my answer was useful and resolved your issue, please accept it as best.

avatar
Expert Contributor

Yaah.. i have another version and let me know test on it and will get back with you with the result.

avatar
Expert Contributor

Per your last conversation it looks like HcatStorer is not overwriting the data, rather it is appending the data.

In your last test, you were having the record count of 823 before the transformation and after the transformation you stored back the results and the result count is 1646.(which is nothing but appending the data).

My expectations are below.

Read the records of 823 and do transformation on it and update back the result and once updated the result count should be 823(overwriting the data with new transformation data) and not 1643.

Is this kind of overwriting the data is possible through HCatStorer ? if not is there any other alternatives ways of overwriting the data in pig ( Eg :- PigStorage or any other functions).

avatar
Master Mentor

Is my workaround with a temporary table not good enough for your use case? Save output of your transformation to a temp table, drop and create your original table and save your temp result back.