Parse nested JSON using Spark RDD

Expert Contributor

I am trying to parse a nested JSON document using RDDs rather than a DataFrame. I cannot use a DataFrame (the typical approach, spark.read.json) because the document structure is very complicated: child nodes at the same level have different schemas, so the schema inferred by the reader is useless. So I tried the script below.

import json
s='{"key1":{"myid": "123","myname":"test"}}'
# wrap the string in a list: parallelize(s) would create one record per character
rdd=sc.parallelize([s]).map(json.loads)

My next step is to use a map transformation to parse the JSON string, but I do not know where to start. I tried the script below, but it failed.

rdd2=rdd.map(lambda j: (j[x]) for x in j)

I would appreciate any resources on using RDD transformations to parse JSON.
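
For illustration, here is a minimal sketch of one way to do this with RDD transformations; it is not taken from the accepted solution. It assumes the goal is to turn each nested document into (path, value) pairs. The helper names parse_line, flatten and flatten_rdd are hypothetical, and sc is assumed to be an existing SparkContext. Two things seem to go wrong in the attempts above: sc.parallelize expects a collection, so a bare string is split into characters, and the map call appears to be read by Python as a generator expression of lambdas rather than a single function, so j is undefined when the call is made.

```python
import json

def parse_line(line):
    """Parse one JSON string into a Python object (dict, list, or scalar)."""
    return json.loads(line)

def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict/list.

    Hypothetical helper: it walks whatever structure each record has,
    so no shared schema is needed across sibling nodes.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{i}]")
    else:
        yield prefix, node

# Plain-Python check on the sample document from the question.
s = '{"key1":{"myid": "123","myname":"test"}}'
pairs = list(flatten(parse_line(s)))
# pairs == [("key1.myid", "123"), ("key1.myname", "test")]

def flatten_rdd(sc, json_strings):
    """Apply the same helpers on an RDD; sc is assumed to be a SparkContext."""
    return (
        sc.parallelize(json_strings)  # a list of strings, one record each
          .map(parse_line)            # str -> nested Python object
          .flatMap(flatten)           # one (path, value) pair per leaf
    )
```

With a running SparkContext, flatten_rdd(sc, [s]).collect() should return the same pairs as the plain-Python check. flatMap is used instead of map because each document produces several output records.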

1 ACCEPTED SOLUTION

Super Collaborator

Expert Contributor

@RangaReddy

The link is exactly what I need. Thanks for your help.