Pig JsonLoader (UDF_WARNING_1): Bad map field, could not find start of object

Expert Contributor

Hi,

I am trying to process a sample tweet and extract the complete tweet by filtering its text on a particular word.

I used the following script:

twitter = LOAD 'sample.json' USING JsonLoader('coordinates:map[], created_at:chararray, entities:map[], favorited:chararray, id:int, favorite_count:int, id_str:chararray, metadata:map[], in_reply_to_screen_name:chararray, in_reply_to_status_id_str:chararray, place:map[], possibly_sensitive:chararray, retweet_count:int, source:chararray, text:chararray, truncated:chararray, user:map[], withheld_in_countries:{t:(country:chararray)}');
filtered = FILTER twitter BY (text MATCHES '.*word.*');
extracted = FOREACH filtered GENERATE text, id;
DUMP extracted;

When I ran the script, it completed and reported success at the end.

But there was no output, and I found warnings like these in the log:

2016-08-30 17:40:22,054 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, could not find start of record
2016-08-30 17:40:22,054 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad map field, could not find start of object, field 2
2016-08-30 17:40:22,054 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.JsonLoader(UDF_WARNING_1): Bad record, returning null for {"cordinates":{"type":"Point","coordinates":["-82.695728","38.502019"]},"created_at":"Wed May 29 15:47:17 +0000 2013","current_user_retweet":null,"entities":{"hashtags":[{"indices":["64","73"],"text":"palecity"}],"symbols":[],"urls":[{"expanded_url":"http://path.com/p/2OpKGV","indices":["103","125"],"display_url":"path.com/p/2OpKGV","url":"http://t.co/s4X71J1xEv"}],"user_mentions":[]},"favorited":"false","id_str":"339769924257988608","in_reply_to_screen_name":null,"in_reply_to_status_id_str":null,"place":{"id":"14bdb1a6511724ec","place_type":"city","bounding_box":{"type":"Polygon","coordinates":[[["-82.735154","38.485755"],["-82.735154","38.545196"],["-82.674361","38.545196"],["-82.674361","38.485755"]]]},"name":"Russell","attributes":{},"country_code":"US","url":"http://api.twitter.com/1/geo/id/14bdb1a6511724ec.json","full_name":"Russell, KY","country":"United States"},"possibly_sensitive":"false","retweet_count":0,"source":"<a href=\"https://path.com/\" rel=\"nofollow\">Path</a>","text":"Getting ready word to lay out poolside for the first time this year! 
#palecity (at Jeff's Big Deck-South) — http://t.co/s4X71J1xEv","truncated":"false","user":{"location":"AshRussFonte, KY","default_profile":"false","profile_background_tile":"false","statuses_count":"375","lang":"en","profile_link_color":"0084B4","profile_banner_url":"https://pbs.twimg.com/profile_banners/32952244/1348409088","id":"32952244","following":null,"protected":"false","favourites_count":"60","profile_text_color":"333333","contributors_enabled":"false","verified":"false","description":"Wife, daughter, SLP and celebrity gossip enthusiast.","name":"Beth ","profile_sidebar_border_color":"C0DEED","profile_background_color":"C0DEED","created_at":"Sat Apr 18 17:36:45 +0000 2009","default_profile_image":"false","followers_count":"19","geo_enabled":"true","profile_image_url_https":"https://si0.twimg.com/profile_images/3624785207/ede699f51f98b4da3ee700da3a7ed973_normal.jpeg","profile_background_image_url":"http://a0.twimg.com/profile_background_images/75611058/599663425_xB7Dp-M.jpg","profile_background_image_url_https":"https://si0.twimg.com/profile_background_images/75611058/599663425_xB7Dp-M.jpg","follow_request_sent":null,"url":null,"utc_offset":"-18000","time_zone":"Eastern Time (US & Canada)","notifications":null,"friends_count":"293","profile_use_background_image":"true","profile_sidebar_fill_color":"DDEEF6","screen_name":"BBS610","id_str":"32952244","profile_image_url":"http://a0.twimg.com/profile_images/3624785207/ede699f51f98b4da3ee700da3a7ed973_normal.jpeg","is_translator":"false","listed_count":"0"},"withheld_copyright":null,"withheld_in_countries":null,"withheld_scope":null}

I think the JSON that I loaded is not well formatted. Please suggest how to fix this.

Thank you.

Mohan.V
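
One way to test the suspicion that the file is not well formatted is to check that every physical line parses as a complete JSON object. The sketch below uses Python rather than Pig and assumes the loader reads one record per line. The function name find_bad_lines and the sample lines are made up for illustration. In the log above, the rejected record stops inside the "text" value, which is what a raw line break inside a tweet would look like.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, reason) for each non-empty line that is not a complete JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are skipped, but still counted in the numbering
        try:
            value = json.loads(stripped)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Hypothetical input: the second record contains a raw line break inside "text",
# so it occupies two physical lines and neither half parses on its own.
sample = [
    '{"id_str": "1", "text": "all on one line"}',
    '{"id_str": "2", "text": "broken ',
    'across two lines"}',
]

for number, reason in find_bad_lines(sample):
    print(f"line {number}: {reason}")
```

To check a real file, pass an open file object instead of the list, for example find_bad_lines(open('sample.json', encoding='utf-8')).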

1 ACCEPTED SOLUTION

Expert Contributor

I figured it out on my own.

I think the problem was that my script mixed different versions of the Elephant Bird jars.

When I used the same version for all of the Elephant Bird jars, as suggested by @gkeys, it worked fine.

Script:

REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER json-simple-1.1.1.jar
-- Without arguments, the loader returns each record as a single map ($0); fields are looked up by key.
twitter = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader();
extracted = FOREACH twitter GENERATE
    (chararray)$0#'created_at' AS created_at, (chararray)$0#'id' AS id, (chararray)$0#'id_str' AS id_str,
    (chararray)$0#'text' AS text, (chararray)$0#'source' AS source,
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'entities') AS entities,
    (boolean)$0#'favorited' AS favorited, (long)$0#'favorite_count' AS favorite_count,
    (long)$0#'retweet_count' AS retweet_count, (boolean)$0#'retweeted' AS retweeted,
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'place') AS place;

DUMP extracted;

With these changes the script worked fine.


3 REPLIES

Master Guru

The built-in JsonLoader has somewhat limited functionality: it expects every entry (tweet) to contain its elements in the same order as the Pig schema, so first make sure that condition holds. For example, your schema declares "id:int", but the record shown in the warnings has no integer element at that position. Element names are also not preserved; Pig reads the values one by one in the order they appear in the input, so you could just as well name them a, b, c, ... You may also wish to try the Elephant Bird JsonLoader, which has more advanced features.
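
The positional behaviour described above can be checked outside Pig. The following Python sketch is only a rough approximation of positional reading, not the loader's actual logic. It walks the first few schema fields from the question and compares each expected kind with the JSON value at the same position. The helper name first_positional_mismatch, the "kind" labels, and the shortened record are assumptions made for illustration; the key order is taken from the record in the warning.

```python
import json


def first_positional_mismatch(record_json, schema):
    """Return (index, schema_name, json_key) for the first position whose JSON
    value does not fit the kind the schema expects there, or None if all fit."""
    items = list(json.loads(record_json).items())  # key order is preserved
    for index, (name, kind) in enumerate(schema):
        if index >= len(items):
            return index, name, None  # record has fewer elements than the schema
        key, value = items[index]
        if kind == "map" and not isinstance(value, dict):
            return index, name, key
        if kind == "int" and (isinstance(value, bool) or not isinstance(value, int)):
            return index, name, key
    return None


# First five fields of the schema from the question, as (name, kind) pairs.
schema = [
    ("coordinates", "map"),
    ("created_at", "chararray"),
    ("entities", "map"),
    ("favorited", "chararray"),
    ("id", "int"),
]

# First keys of the rejected record, in their original order, with values shortened.
record = (
    '{"cordinates": {"type": "Point"}, '
    '"created_at": "Wed May 29 15:47:17 +0000 2013", '
    '"current_user_retweet": null, '
    '"entities": {"hashtags": []}, '
    '"favorited": "false", '
    '"id_str": "339769924257988608"}'
)

print(first_positional_mismatch(record, schema))
# prints (2, 'entities', 'current_user_retweet')
```

Position 2 is where the schema expects the entities map but the record holds "current_user_retweet": null, which is consistent with the "field 2" mentioned in the warning.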

Expert Contributor

Thanks for your reply, Predrag Minovic.

I tried the Elephant Bird JsonLoader with the following script:

REGISTER piggybank.jar
REGISTER json-simple-1.1.1.jar
REGISTER elephant-bird-pig-4.3.jar
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.3.jar
json = LOAD 'sample.json' USING com.twitter.elephantbird.pig.load.JsonLoader('created_at:chararray, id:chararray, id_str:chararray, text:chararray, source:chararray, in_reply_to_status_id:chararray, in_reply_to_status_id_str:chararray, in_reply_to_user_id:chararray, in_reply_to_user_id_str:chararray, in_reply_to_screen_name:chararray, geo:chararray, coordinates:chararray, place:chararray, contributors:chararray, is_quote_status:bytearray, retweet_count:long, favorite_count:chararray, entities:map[], favorited:bytearray, retweeted:bytearray, possibly_sensitive:bytearray, lang:chararray');
DESCRIBE json;

Output:

Schema for json unknown.

Please advise.
