Parse nested json using Spark RDD
Labels: Apache Spark
Created 09-09-2021 01:18 AM
I am trying to parse a nested JSON document using an RDD rather than a DataFrame. I cannot use a DataFrame (the typical code is spark.read.json) because the document structure is very complicated: child nodes at the same level have different schemas, so the schema inferred by the reader is useless. So I tried the script below.
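import json
from pyspark import SparkContext
sc = SparkContext.getOrCreate()  # reuses the existing context in the pyspark shell
s = '{"key1":{"myid": "123","myname":"test"}}'
rdd = sc.parallelize([s]).map(json.loads)  # wrap s in a list; parallelize(s) would split it into single characters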
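If the documents live in a file rather than in a Python string, a common pattern is to read the file with sc.textFile and parse each line. The sketch below assumes one JSON object per line (JSON Lines); the helper name parse_json_line and the file name "docs.jsonl" are hypothetical. Returning a list lets the helper be used with flatMap, so a malformed line is dropped instead of failing the whole job. The parsing function is plain Python, so it can be tested without a cluster.

```python
import json


def parse_json_line(line):
    """Parse one JSON document; return a one-element list, or [] if malformed.

    Returning a list (rather than the dict itself) makes the function usable
    with flatMap, which then simply emits nothing for a bad line.
    """
    try:
        return [json.loads(line)]
    except json.JSONDecodeError:
        return []


# With Spark (assumes an existing SparkContext `sc`; "docs.jsonl" is a
# hypothetical file holding one JSON object per line):
# rdd = sc.textFile("docs.jsonl").flatMap(parse_json_line)
#
# The same helper works for a single in-memory string:
# rdd = sc.parallelize([s]).flatMap(parse_json_line)
```

If one JSON document spans several lines, textFile will split it apart; in that case each whole file has to be parsed as a unit instead.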
My next step is to use a map transformation to extract values from the parsed JSON, but I do not know where to start. I tried the script below, but it failed:
rdd2 = rdd.map(lambda j: (j[x]) for x in j)  # NameError: "for x in j" sits outside the lambda, where j is undefined
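A minimal sketch of a working version of that attempt: the comprehension has to live inside the function that map receives, so that j refers to the record being processed. The helper names child_values and child_pairs are hypothetical. map keeps one output element per document, while flatMap emits one element per top-level child, which is often more convenient when siblings have different shapes.

```python
import json


def child_values(doc):
    """Return the values of the top-level keys of one parsed JSON object."""
    # The loop is inside the function, so `doc` is the current record.
    return [doc[key] for key in doc]


def child_pairs(doc):
    """Return (key, value) pairs so each child can become its own RDD element."""
    return list(doc.items())


# With Spark (assumes an existing SparkContext `sc` and the string `s`):
# rdd = sc.parallelize([s]).map(json.loads)
# rdd2 = rdd.map(child_values)      # one list of children per document
# rdd3 = rdd.flatMap(child_pairs)   # one (key, child) element per child

if __name__ == "__main__":
    doc = json.loads('{"key1":{"myid": "123","myname":"test"}}')
    print(child_values(doc))  # [{'myid': '123', 'myname': 'test'}]
    print(child_pairs(doc))   # [('key1', {'myid': '123', 'myname': 'test'})]
```

The inline equivalent of the first helper is rdd.map(lambda j: [j[x] for x in j]).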
I would appreciate any resources on using RDD transformations to parse JSON.
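Because the child nodes have different shapes, one schema-free option is to flatten every document into (path, value) pairs with a recursive function and pass it to flatMap. This is a sketch under assumptions: the name flatten, the dotted-path and [i] naming convention, and the fact that empty objects and arrays produce no pairs are all illustrative choices, not part of any Spark API.

```python
import json


def flatten(obj, prefix=""):
    """Recursively walk parsed JSON and yield (path, leaf_value) pairs.

    Dict keys are joined with '.', list positions are written as [i].
    No schema is needed, so siblings with different structures are fine.
    Empty dicts and lists yield nothing.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(value, path)
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from flatten(value, f"{prefix}[{i}]")
    else:
        # Leaf: string, number, bool, or None.
        yield (prefix, obj)


# With Spark (assumes `rdd` holds dicts produced by json.loads):
# pairs = rdd.flatMap(flatten)

if __name__ == "__main__":
    doc = json.loads(
        '{"key1":{"myid": "123","myname":"test"},'
        '"key2":[{"a":1},{"b":{"c":true}}]}'
    )
    print(list(flatten(doc)))
    # [('key1.myid', '123'), ('key1.myname', 'test'), ('key2[0].a', 1), ('key2[1].b.c', True)]
```

Once the data is in (path, value) form, ordinary pair-RDD operations apply, for example pairs.filter(lambda kv: kv[0].endswith(".myid")) to pick out one field wherever it appears in the tree.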
Created 09-14-2021 10:46 PM
Created 09-15-2021 10:32 PM
