Loading data from Pig to Hive

Explorer

Hi

I'm trying to load data from Pig into Hive using HCatalog. It's just for training.

I have the following data:

year  name  salary
2015  Marc  100
2016  Marc  200
2017  Marc  300
2015  Lucy  100
2016  Lucy  200
2017  Lucy  300
2015  John  100
2016  John  200
2017  John  300

I created a table in Hive:

create table salary ( year int, name string, salary int );

and the following script in pig:

a = load '/user/horton/salary';
b = FOREACH a GENERATE $0 as year:int, $1 as name:chararray, $2 as salary:int;
store b into 'salary' using org.apache.hive.hcatalog.pig.HCatStorer();

But when I run it with pig -useHCatalog, I get an error:

org.apache.pig.data.DataByteArray cannot be cast to java.lang.Integer

What's wrong?

Any suggestion will be appreciated.

10 REPLIES

Explorer

Hey Mauro,

You seem to have a data type compatibility issue.

Which versions of Hive and Pig are you using?

Check this link: https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-DataTypeMappin...

Cheers

Explorer

Also make sure that the FOREACH statement in your Pig script matches the Hive DDL schema. If you don't provide a schema, every field is treated as bytearray.
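
A minimal way to see this, assuming the file at /user/horton/salary is tab-delimited (the delimiter is an assumption, and the aliases typed and untyped are only illustrative), is to compare what DESCRIBE reports with and without a declared schema:

```pig
-- No AS clause: Pig has no schema for the relation,
-- so each field defaults to bytearray when it is used.
untyped = LOAD '/user/horton/salary';
DESCRIBE untyped;   -- reports that the schema is unknown

-- AS clause with types: fields are converted to the declared types at load time.
-- The '\t' delimiter is an assumption about the input file.
typed = LOAD '/user/horton/salary' USING PigStorage('\t')
        AS (year:int, name:chararray, salary:int);
DESCRIBE typed;     -- reports year, name and salary with the declared types
```

If the types shown for the relation you store differ from the Hive table's column types, HCatStorer is likely to reject the data.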

Explorer
Hi, I'm working on an EC2 Hortonworks machine for the HDPCD certification exam.

Explorer

Is your Hive table already created?

Explorer
Hi, yes, as a first step I created the table in Hive.

Explorer

OK, when you load your file in Pig, could you use PigStorage with the proper field separator? Something similar to the following:

A = LOAD 'myfile.txt' USING PigStorage('\t') AS (f1,f2,f3);

Explorer

You can change the field separator '\t' to match your file.

Explorer
Hi, it's quite late... I will try it and update the thread soon. Thanks for now. Bye

Explorer

No problem, good luck.

Explorer
Hi, it seems it is not able to cast in the right way. The following three pieces of code work properly:

a = load '/user/horton/salary' using PigStorage('\t') as ( year:int, name:chararray, salary:int );
store a into 'salary' using org.apache.hive.hcatalog.pig.HCatStorer();

a = load '/user/horton/salary' as ( year:int, name:chararray, salary:int );
store a into 'salary' using org.apache.hive.hcatalog.pig.HCatStorer();

a = load '/user/horton/salary';
b = FOREACH a GENERATE (int)$0 as year:int, (chararray)$1 as name:chararray, (int)$2 as salary:int;
store b into 'salary' using org.apache.hive.hcatalog.pig.HCatStorer();

Thanks
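
To confirm that the store worked, one option is to read the table back through HCatalog. This is only a sketch, assuming the same Hive table salary and a session started with pig -useHCatalog; the alias c is illustrative:

```pig
-- Read the Hive table back through HCatalog; the schema comes from the Hive metastore.
c = LOAD 'salary' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE c;   -- should list the columns with the types from the Hive DDL
DUMP c;       -- prints the stored rows
```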