
Parse nested json using Spark RDD

Expert Contributor

I am trying to parse a nested JSON document using an RDD rather than a DataFrame. The reason I cannot use a DataFrame (the typical approach, e.g. spark.read.json) is that the document structure is very complicated: the schema inferred by the reader is useless because child nodes at the same level have different schemas. So I tried the script below.

 

import json

# Wrap the string in a list: sc.parallelize(s) would distribute the
# individual characters of the string, so json.loads would fail on each one.
s = '{"key1":{"myid": "123","myname":"test"}}'
rdd = sc.parallelize([s]).map(json.loads)

 

My next step is to use a map transformation to parse the JSON, but I do not know where to start. I tried the script below, but it failed.

 

# Fails with a SyntaxError: the generator expression sits outside the
# lambda body. Even written as lambda j: [j[x] for x in j], it would only
# return top-level values without recursing into nested objects.
rdd2 = rdd.map(lambda j: (j[x]) for x in j)

 

I would appreciate any resources on using RDD transformations to parse JSON.
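For what it is worth, the per-record parsing step can be written as an ordinary Python function and handed to rdd.map. The sketch below shows one common approach, flattening a nested dict into dotted-path keys; flatten is a hypothetical helper (not part of any Spark API), and the logic is plain Python, so it can be tested without a cluster.

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten a nested dict (parsed from JSON) into a flat
    dict mapping dotted-path keys to leaf values. Hypothetical helper."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

s = '{"key1":{"myid": "123","myname":"test"}}'
record = flatten(json.loads(s))
print(record)  # {'key1.myid': '123', 'key1.myname': 'test'}

# In Spark the same function would be applied per record, e.g.:
#   rdd = sc.parallelize([s]).map(json.loads).map(flatten)
```

Because flatten is pure Python, it sidesteps schema inference entirely: records with different shapes at the same level simply produce different key sets.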

1 ACCEPTED SOLUTION

2 REPLIES

Expert Contributor


@RangaReddy

 

The link is exactly what I need. Thanks for your help.