Created 09-10-2016 08:12 AM
I am new to Pig. I am trying to filter a text file and store the result in HBase.
Here is the sample input file, sample.txt:
{"pattern":"google_1473491793_265244074740","tweets":[{"tweet::created_at":"18:47:31 ","tweet::id":"252479809098223616","tweet::user_id":"450990391","tweet::text":"rt @joey7barton: ..give a google about whether the americans wins a ryder cup. i mean surely he has slightly more important matters. #fami ..."}]} {"pattern":"facebook_1473491793_265244074740","tweets":[{"tweet::created_at":"11:33:16 ","tweet::id":"252370526411051008","tweet::user_id":"845912316","tweet::text":"@maarionymcmb facebook mere ta dit tu va resté chez toi dnc tu restes !"}]}
Script:-
data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {t1:tuple(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
A = FILTER data BY pattern == 'google_*';
grouped = foreach (group A by pattern) {
    tweets1 = foreach data generate tweets.(created_at), tweets.(id), tweets.(user_id), tweets.(text);
    generate group as pattern1, tweets1;
}
But I got the following error when I run the grouped step:
2016-09-10 13:38:52,995 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: <line 41, column 57> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)
Please tell me what I am doing wrong.
Thank you.
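A hedged sketch of how the filter and grouping could be written; this is not the route the thread eventually took, and the exact fix depends on the output shape you want. Two things stand out in the script above: in Pig, == is a literal string comparison, so a wildcard-style match needs MATCHES with a regular expression, and the nested FOREACH projects from the outer relation data instead of the grouped bag A, which is consistent with the "not a project expression ... ScalarExpression" parse error.

data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {t1:tuple(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
-- MATCHES takes a regular expression; == would only match the literal string 'google_*'
A = FILTER data BY pattern MATCHES 'google_.*';
-- project from the grouped bag A, not the outer relation data;
-- A.tweets is a bag containing each matching record's tweets bag
grouped = foreach (group A by pattern) generate group as pattern1, A.tweets as tweets1;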
Created 09-10-2016 12:31 PM
There is a lot going on here -- when writing a complex script like this, it helps to build and debug it step by step, checking each relation's output before adding the next one (see the sketch below).
Let me know how that goes.
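As a sketch of what an incremental build-and-debug pass can look like here (the file name and schema are taken from the question; DESCRIBE, LIMIT, and DUMP are standard Pig diagnostics, and the MATCHES filter is an assumption about the intended wildcard match):

data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {t1:tuple(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
describe data;          -- confirm the schema JsonLoader actually produced
few = limit data 5;
dump few;               -- eyeball a handful of records before adding the next step
A = FILTER data BY pattern MATCHES 'google_.*';
dump A;                 -- check that the filter keeps the expected records
-- only then add the grouping and the HBase STORE, one statement at a time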
Created 09-10-2016 02:24 PM
Thank you for your valuable suggestions, gkeys.
I didn't expect it would become such a complex script.
As I said, I am just a beginner in Pig.
So please suggest how to solve it.
Created 09-11-2016 07:04 AM
I think I got it on my own.
As gkeys said, I made it too complex.
In the end I realized that I don't need the third step (the grouping), and the data is now successfully stored into HBase.
Here is the Script:-
data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
A = FILTER data BY pattern == 'google_*';
STORE A into 'hbase://tablename' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('tweets:tweets');
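A couple of hedged notes on this script rather than changes to it: HBaseStorage uses the first field of each tuple as the HBase row key, so here pattern becomes the row key and the tweets bag is written to the tweets:tweets column. Also, pattern == 'google_*' is a literal string comparison; if wildcard matching on the pattern prefix is the intent, MATCHES with a regular expression is the usual alternative:

data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
-- MATCHES takes a regular expression; == only matches the literal string 'google_*'
A = FILTER data BY pattern MATCHES 'google_.*';
STORE A into 'hbase://tablename' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('tweets:tweets');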
Created 09-11-2016 12:48 PM
Very glad to see you solved it yourself by debugging -- it is the best way to learn and improve your skills 🙂