Support Questions
Find answers, ask questions, and share your expertise

Converting Nested Json into flat file using pig script

Converting Nested Json into flat file using pig script

Explorer

I am facing issues in converting nested json into flat file using pig. I am trying to process using various methods but none of them are giving me the exact solution. Below is the sample dataset and different methods I am using. The below method is printing only 1st 3 columns. Can someone provide me the exact solution that prints all the data.

{"id":"abcdefgh","key":"abcdefgh","value":{"rev":"1-cdfasefqwvc"},"doc":{"_id":"abcdefgh","_rev":"1-cdfasefqwvc","geometry":{"coordinates":[0.40799993419202753,51.5822858308476],"type":"Point"},"properties":{"frTraj":{"altitude":0,"customerLinkID":"12345","distance":138.7360491082252,"drivingDirection":1,"fixedLat":71.5823084728586,"fixedLon":0.19202754079399934,"heading":0,"latitude":81.582625,"linkID":"12345","longitude":0.824043,"moID":"123457","speed":0.2129800021648407,"tripID":"abbcdd-1234-abcd-a84a-abcd","ts":12345678},"packet":{"meta":{"account":"abcd","event":"track"},"payload":{"asset":"1234567","connection_id":12433453542,"connection_id_str":"325443543432","fields":{"GPS_SPEED":{"b64_value":"wsfwef","value":0.80000212900000003}},"id":443547204223,"id_str":"7443540707204223","index":4435403,"loc":[0.44435443,51.44354625],"received_at":"2022-04-22T14:39:14Z","recorded_at":"2012-04-22T14:37:56Z","recorded_at_ms":"2000-04-22T14:37:56.000Z","streams_received_at":"2022-04-22T14:39:14Z"}},"playId":"123456"},"type":"feature"}},
{"id":"abcdefgh","key":"abcdefgh","value":{"rev":"1-cdfasefqwvc"},"doc":{"_id":"abcdefgh","_rev":"1-cdfasefqwvc","geometry":{"coordinates":[0.40799993419202753,51.5822858308476],"type":"Point"},"properties":{"frTraj":{"altitude":0,"customerLinkID":"12345","distance":138.7360491082252,"drivingDirection":1,"fixedLat":71.5823084728586,"fixedLon":0.19202754079399934,"heading":0,"latitude":81.582625,"linkID":"12345","longitude":0.824043,"moID":"123457","speed":0.2129800021648407,"tripID":"abbcdd-1234-abcd-a84a-abcd","ts":12345678},"packet":{"meta":{"account":"abcd","event":"track"},"payload":{"asset":"1234567","connection_id":12433453542,"connection_id_str":"325443543432","fields":{"GPS_SPEED":{"b64_value":"wsfwef","value":0.80000212900000003}},"id":443547204223,"id_str":"7443540707204223","index":4435403,"loc":[0.44435443,51.44354625],"received_at":"2022-04-22T14:39:14Z","recorded_at":"2012-04-22T14:37:56Z","recorded_at_ms":"2000-04-22T14:37:56.000Z","streams_received_at":"2022-04-22T14:39:14Z"}},"playId":"123456"},"type":"feature"}}
a = LOAD '/user/rahul/cloudtest.json' USING JsonLoader ('id:chararray, key:chararray, value:(rev:chararray), doc:(id:chararray, rev:chararray,geometry:(coordinates: (i: chararray, j: chararray), type:chararray),properties:(frTraj:(altitude:chararray, customerLinkID:chararray, distance:chararray, drivingDirection:chararray, fixedLat:chararray, fixedLon:chararray, heading:chararray, latitude:chararray, linkID:chararray, longitude:chararray, moID:chararray, speed:chararray, tripID:chararray, ts:chararray),packet:(meta:(account:chararray, event:chararray), payload:(asset:chararray, connectionid:chararray, connectionidstr:chararray, fields:( GPSSPEED: (b64value: chararray, value: chararray)), id:chararray, idstr:chararray, index:chararray, loc:(i: chararray, j: chararray), receivedat:chararray, recordedat:chararray, recordedatms:chararray, streamsreceivedat:chararray)),playId:chararray),type:chararray)');
dump a;

Thanks in Advance.

-Rahul

8 REPLIES 8

Re: Converting Nested Json into flat file using pig script

Re: Converting Nested Json into flat file using pig script

Explorer

Hi Neeraj,

I already checked this document and its not working when i am trying with the same syntax that it is showing.

Thanks,

Rahul

Re: Converting Nested Json into flat file using pig script

@Rahul Reddy Kamuru Whats the error?

Re: Converting Nested Json into flat file using pig script

Explorer

I am getting syntax error. Unexpected symbol at or near 'chararray'.

Thanks,

Rahul

Re: Converting Nested Json into flat file using pig script

@Rahul Reddy Kamuru
Did you get any information on line number ? 
Unexpected symbol at or near 'chararray'.

a = LOAD '/user/rahul/cloudtest.json' USING JsonLoader ('id:chararray, key:chararray, value:(rev:chararray), doc:(id:chararray, rev:chararray,geometry:(coordinates: (i: chararray, j: chararray), type:chararray),properties:(frTraj:(altitude:chararray, customerLinkID:chararray, distance:chararray, drivingDirection:chararray, fixedLat:chararray, fixedLon:chararray, heading:chararray, latitude:chararray, linkID:chararray, longitude:chararray, moID:chararray, speed:chararray, tripID:chararray, ts:chararray),packet:(meta:(account:chararray, event:chararray), payload:(asset:chararray, connectionid:chararray, connectionidstr:chararray, fields:( GPSSPEED: (b64value: chararray, value: chararray)), id:chararray, idstr:chararray, index:chararray, loc:(i: chararray, j: chararray), receivedat:chararray, recordedat:chararray, recordedatms:chararray, streamsreceivedat:chararray)),playId:chararray),type:chararray)'); 
dump a;

Re: Converting Nested Json into flat file using pig script

Mentor

@Rahul Reddy Kamuru

Which version of Pig do you use?

I noticed some white spaces hence the syntax error. Unexpected symbol at or near 'chararray'.

eg (i: chararray, j: chararray)

fields:( GPSSPEED: (b64value: chararray, value: chararray))

fields:( GPSSPEED: (b64value: chararray, value: chararray))

Take note depending on the pig version it will e interpreted differently! Can you use Vi to remove the white space

Here is a quick snippet

First delete the blank lines:

:g/^\s*$/d

Then use a substitution (:s///) over each line (%) to replace all (g) continuous whitespace (\s\+) with a comma (,).

:%s/\s\+/,/g

Re: Converting Nested Json into flat file using pig script

Mentor

@Rahul Reddy Kamuru

Geoffrey is correct, you have blanks in your schema. Pasting the findings here as answer doesn't fit in comments section. Once blanks are removed, I get the following result

output of describe a;

a: {id: chararray,key: chararray,value: (rev: chararray),doc: (id: chararray,rev: chararray,geometry: (coordinates: (i: chararray,j: chararray),type: chararray),properties: (frTraj: (altitude: chararray,customerLinkID: chararray,distance: chararray,drivingDirection: chararray,fixedLat: chararray,fixedLon: chararray,heading: chararray,latitude: chararray,linkID: chararray,longitude: chararray,moID: chararray,speed: chararray,tripID: chararray,ts: chararray),packet: (meta: (account: chararray,event: chararray),payload: (asset: chararray,connectionid: chararray,connectionidstr: chararray,fields: (GPSSPEED: (b64value: chararray,value: chararray)),id: chararray,idstr: chararray,index: chararray,loc: (i: chararray,j: chararray),receivedat: chararray,recordedat: chararray,recordedatms: chararray,streamsreceivedat: chararray)),playId: chararray),type: chararray)}

output of dump a;

(abcdefgh,abcdefgh,(1-cdfasefqwvc),)
(abcdefgh,abcdefgh,(1-cdfasefqwvc),)

script without blanks

a = LOAD 'sample.json' USING JsonLoader ('id:chararray, key:chararray, value:(rev:chararray), doc:(id:chararray, rev:chararray,geometry:(coordinates: (i:chararray, j:chararray), type:chararray),properties:(frTraj:(altitude:chararray, customerLinkID:chararray, distance:chararray, drivingDirection:chararray, fixedLat:chararray, fixedLon:chararray, heading:chararray, latitude:chararray, linkID:chararray, longitude:chararray, moID:chararray, speed:chararray, tripID:chararray, ts:chararray),packet:(meta:(account:chararray, event:chararray), payload:(asset:chararray, connectionid:chararray, connectionidstr:chararray, fields:( GPSSPEED:(b64value:chararray, value:chararray)), id:chararray, idstr:chararray, index:chararray, loc:(i:chararray, j:chararray), receivedat:chararray, recordedat:chararray, recordedatms:chararray, streamsreceivedat:chararray)),playId:chararray),type:chararray)');
describe a;
dump a;

Re: Converting Nested Json into flat file using pig script

New Contributor
https://www.welookups.com
  1. a = LOAD 'sample.json' USING JsonLoader('id:chararray, key:chararray, value:(rev:chararray), doc:(id:chararray, rev:chararray,geometry:(coordinates: (i:chararray, j:chararray), type:chararray),properties:(frTraj:(altitude:chararray, customerLinkID:chararray, distance:chararray, drivingDirection:chararray, fixedLat:chararray, fixedLon:chararray, heading:chararray, latitude:chararray, linkID:chararray, longitude:chararray, moID:chararray, speed:chararray, tripID:chararray, ts:chararray),packet:(meta:(account:chararray, event:chararray), payload:(asset:chararray, connectionid:chararray, connectionidstr:chararray, fields:( GPSSPEED:(b64value:chararray, value:chararray)), id:chararray, idstr:chararray, index:chararray, loc:(i:chararray, j:chararray), receivedat:chararray, recordedat:chararray, recordedatms:chararray, streamsreceivedat:chararray)),playId:chararray),type:chararray)');
  2. describe a;
  3. dump a;
,
www.welookups.com

a = LOAD 'sample.json' USING JsonLoader ('id:chararray, key:chararray, value:(rev:chararray), doc:(id:chararray, rev:chararray,geometry:(coordinates: (i:chararray, j:chararray), type:chararray),properties:(frTraj:(altitude:chararray, customerLinkID:chararray, distance:chararray, drivingDirection:chararray, fixedLat:chararray, fixedLon:chararray, heading:chararray, latitude:chararray, linkID:chararray, longitude:chararray, moID:chararray, speed:chararray, tripID:chararray, ts:chararray),packet:(meta:(account:chararray, event:chararray), payload:(asset:chararray, connectionid:chararray, connectionidstr:chararray, fields:( GPSSPEED:(b64value:chararray, value:chararray)), id:chararray, idstr:chararray, index:chararray, loc:(i:chararray, j:chararray), receivedat:chararray, recordedat:chararray, recordedatms:chararray, streamsreceivedat:chararray)),playId:chararray),type:chararray)');describe a;dump a;










web tutorial