Created 12-16-2016 06:45 AM
Created 12-16-2016 06:50 AM
it's easier to drop a table and recreate, delete from a table and truncate are not supported by API
set hcat.bin /usr/bin/hcat; sql drop table if exists codes; sql create table codes(code string, description string, total_emp int, salary int); a = load 'sample_07' using org.apache.hive.hcatalog.pig.HCatLoader(); b = load 'sample_08' using org.apache.hive.hcatalog.pig.HCatLoader(); c = join b by code, a by code;d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary; store d into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer();
Created 12-16-2016 06:50 AM
it's easier to drop a table and recreate, delete from a table and truncate are not supported by API
set hcat.bin /usr/bin/hcat; sql drop table if exists codes; sql create table codes(code string, description string, total_emp int, salary int); a = load 'sample_07' using org.apache.hive.hcatalog.pig.HCatLoader(); b = load 'sample_08' using org.apache.hive.hcatalog.pig.HCatLoader(); c = join b by code, a by code;d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary; store d into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer();
Created 12-16-2016 06:55 AM
Thanks for you response, The above code is writing back the data to another table after the transformation.however, i don't want to delete the data/ drop the table and i want to overwrite the same table after my transformation in pig.
Created 12-16-2016 07:16 AM
then you need a temporary table to store the intermediary data. This is ugly but works, there's probably a better way but it's late where I am 🙂
set hcat.bin /usr/bin/hcat; a = load 'codes' using org.apache.hive.hcatalog.pig.HCatLoader(); b = foreach a generate $0 as code, $1 as description, $2 as total_emp, $3 as salary; sql drop table if exists codes_temp; sql create table codes_temp(code string, description string, total_emp int, salary int); store b into 'codes_temp' using org.apache.hive.hcatalog.pig.HCatStorer(); sql drop table if exists codes; sql create table codes(code string, description string, total_emp int, salary int); c = load 'codes_temp' using org.apache.hive.hcatalog.pig.HCatLoader(); d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary; store d into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer(); sql drop table if exists codes_temp;
Created 12-16-2016 07:26 AM
@Praveen PentaReddy I think I understand what you mean now, you may have run into a bug. What version of Pig and HCatalog are you running? I just tested on Sandbox with HDP 2.5 and it works
set hcat.bin /usr/bin/hcat; a = load 'codes' using org.apache.hive.hcatalog.pig.HCatLoader(); b = foreach a generate $0 as code, $1 as description, $2 as total_emp, $3 as salary; store b into 'codes' using org.apache.hive.hcatalog.pig.HCatStorer();
my codes table contained 823 rows, after this execution, it contains 1646 as expected
per hcatalog wiki
You can write to a non-partitioned table simply by using HCatStorer. The contents of the table will be overwritten:
store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer();
https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-HCatStorer
Created 12-16-2016 03:56 PM
Thanks for your replay.
Should i need to add the first step(set command bin/hcat) every time when i launch Hcat.
I am using Hadoop 2.7.1.2.4.2.0-258 (version)
Apache Pig version 0.15.0.2.4.2.0-258 (rexported) - verison
Hive 1.2.1000.2.4.2.0-258 - version,
Created 12-16-2016 04:08 PM
that's interesting, do you have another environment to test your script on? Like I said I have no issues on my end. The answer to your question about set command, I answered that in the following thread https://community.hortonworks.com/questions/1954/hcatbin-is-not-defined-define-it-to-be-your-hcat-s....
If my answer was useful and resolved your issue, please accept it as best.
Created 12-16-2016 04:13 PM
Yaah.. i have another version and let me know test on it and will get back with you with the result.
Created 12-16-2016 06:29 PM
Per your last conversation it looks like HcatStorer is not overwriting the data, rather it is appending the data.
In your last test, you were having the record count of 823 before the transformation and after the transformation you stored back the results and the result count is 1646.(which is nothing but appending the data).
My expectations are below.
Read the records of 823 and do transformation on it and update back the result and once updated the result count should be 823(overwriting the data with new transformation data) and not 1643.
Is this kind of overwriting the data is possible through HCatStorer ? if not is there any other alternatives ways of overwriting the data in pig ( Eg :- PigStorage or any other functions).
Created 12-17-2016 03:25 AM
Is my workaround with a temporary table not good enough for your use case? Save output of your transformation to a temp table, drop and create your original table and save your temp result back.