Pig Script Error
Labels: Apache Hadoop, Apache HBase, Apache Pig
Created ‎09-10-2016 08:12 AM
I am new to Pig. I am trying to filter a text file and store the result in HBase.
Here is the sample input file, sample.txt:
{"pattern":"google_1473491793_265244074740","tweets":[{"tweet::created_at":"18:47:31 ","tweet::id":"252479809098223616","tweet::user_id":"450990391","tweet::text":"rt @joey7barton: ..give a google about whether the americans wins a ryder cup. i mean surely he has slightly more important matters. #fami ..."}]}
{"pattern":"facebook_1473491793_265244074740","tweets":[{"tweet::created_at":"11:33:16 ","tweet::id":"252370526411051008","tweet::user_id":"845912316","tweet::text":"@maarionymcmb facebook mere ta dit tu va resté chez toi dnc tu restes !"}]}
Script:
data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {t1:tuple(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
A = FILTER data BY pattern == 'google_*';
grouped = foreach (group A by pattern) {
    tweets1 = foreach data generate tweets.(created_at), tweets.(id), tweets.(user_id), tweets.(text);
    generate group as pattern1, tweets1;
};
But I got this error when I ran the grouped statement:
2016-09-10 13:38:52,995 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: <line 41, column 57> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)
Please tell me what I am doing wrong.
Thank you.
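For reference, the parse error most likely comes from the nested FOREACH: it projects from the outer relation data instead of the grouped relation A, and inside a nested block you can only project from the relation being grouped. Note also that == in Pig is an exact string comparison, so 'google_*' is not treated as a wildcard; the regex operator matches is likely what was intended. A hedged sketch of a version that should parse (relation and field names are taken from the script above, not verified against a cluster):

```pig
data = LOAD 'sample.txt' USING JsonLoader('pattern:chararray, tweets: bag {t1:tuple(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
-- 'matches' does regex matching; '==' would only match the literal string 'google_*'
A = FILTER data BY pattern matches 'google_.*';
-- project the tweets bag from A (the grouped relation), not from data
grouped = FOREACH (GROUP A BY pattern) GENERATE group AS pattern1, A.tweets AS tweets1;
```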
Created ‎09-11-2016 07:04 AM
I think I got it on my own.
As gkeys said, I made it too complex.
I realized that I don't need the third step (the grouping), and the data is now successfully stored into HBase.
Here is the script:
data = load 'sample.txt' using JsonLoader('pattern:chararray, tweets: bag {(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
A = FILTER data BY pattern == 'google_*';
STORE A INTO 'hbase://tablename' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('tweets:tweets');
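For context, HBaseStorage treats the first field of each tuple (here, pattern) as the HBase row key, and maps the remaining fields to the columns named in its constructor argument; the table and its column family must already exist before the STORE runs. Assuming the table name tablename and column family tweets from the script above, creating and inspecting the table from the hbase shell could look like this (a sketch, not a verified run):

```
# in the hbase shell: create the target table with the 'tweets' column family
create 'tablename', 'tweets'
# after the Pig job finishes, inspect what was written
scan 'tablename'
```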
Created ‎09-10-2016 12:31 PM
There is a lot going on here -- when writing a complex script like this, the following approach is useful for building and debugging:
- Run locally against a small subset of records (pig -x local -f <scriptOnLocalFileSystem>.pig). This makes each iteration of the script run faster.
- Build the script statement by statement until you reach the failing one (run the first statement, add the second and run, and so on until it fails). When it fails, focus on the last statement you added and fix it.
- These steps are good for finding grammar issues (which it looks like you have, based on the error message). If you also want to make sure your data is being processed correctly, put a DUMP statement after each line during each iteration, so you can inspect the result of each statement.
- If you are using inline (nested) statements, like your grouped = statement, separate them out into standalone statements until the script works. This makes the issue easier to isolate.
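The steps above can be sketched as follows (the file name and schema are taken from the original script; this only illustrates the build-one-statement-at-a-time approach, run with pig -x local -f debug.pig):

```pig
-- iteration 1: load only, then inspect the parsed records
data = LOAD 'sample.txt' USING JsonLoader('pattern:chararray, tweets: bag {t1:tuple(tweet::created_at: chararray, tweet::id: chararray, tweet::user_id: chararray, tweet::text: chararray)}');
DUMP data;

-- iteration 2: add the filter and inspect again
A = FILTER data BY pattern == 'google_*';
DUMP A;
-- only once each step looks right, add the next statement
```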
Let me know how that goes.
Created ‎09-10-2016 02:24 PM
Thank you for your valuable suggestions, gkeys.
I didn't expect that it would become such a complex script.
As I said, I am just a beginner in Pig.
So please suggest a solution for the same.
Created ‎09-11-2016 12:48 PM
Very glad to see you solved it yourself by debugging -- it is the best way to learn and improve your skills 🙂
