JSON Schema for dynamic key field in Spark Structured Streaming?

Contributor

I receive JSON data from Kafka and parse it with the from_json() method, which expects a schema from me. My JSON structure looks like this:

 

{     "Items": {         "key1": [             {                 "id": "",                 "name": "",                 "val": ""             }         ],         "key2": [             {                 "id": "",                 "name": "",                 "val": ""             }         ],         "key3": [             {                 "id": "",                 "name": "",                 "val": ""             }         ]     }
}

 

key1, key2, and key3 are dynamic, so they may change. For example, another JSON payload could be:

{     "Items": {         "hortoworks": [             {                 "id": "",                 "name": "",                 "val": ""             }         ],         "community": [             {                 "id": "",                 "name": "",                 "val": ""             }         ],         "question": [             {                 "id": "",                 "name": "",                 "val": ""             }         ]     }
}


These key names are unknown in advance, but the "id", "name", and "val" fields inside each key are always the same.

I must define a JSON schema to read this data from Kafka in Spark Structured Streaming. How can I do this?
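
One way to avoid naming the keys at all is to model "Items" as a map rather than a struct. Below is a minimal sketch in Scala, assuming Spark 2.2+ (where from_json accepts MapType) and placeholder broker/topic names; since every key maps to an array of the same id/name/val struct, a MapType schema matches any key name.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{from_json, explode}
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("DynamicKeys").getOrCreate()
import spark.implicits._

// The struct that every dynamic key contains: id, name, val.
val itemStruct = new StructType()
  .add("id", StringType)
  .add("name", StringType)
  .add("val", StringType)

// "Items" becomes a map from any string key to an array of that struct,
// so key1/key2/... and hortoworks/community/... all match the same schema.
val schema = new StructType()
  .add("Items", MapType(StringType, ArrayType(itemStruct)))

val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // placeholder
  .option("subscribe", "my-topic")                    // placeholder
  .load()
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", schema).as("data"))

// One row per dynamic key; a second explode on "items" would give
// one row per id/name/val record.
val byKey = parsed.select(explode($"data.Items").as(Seq("key", "items")))

Exploding the map column yields one row per key, so downstream logic never needs to know the key names.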

1 REPLY

New Contributor

Hi @sosyalmedya_ogu,

 

Did you ever find a workable solution for this?


I have run into a similar use case where the JSON schema might change. The producer application for our Kafka topic listens to an external API endpoint, so we have no control over the schema. I am therefore looking for a way to handle a dynamic JSON schema while processing the stream in Structured Streaming.
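
One pattern that may help when the schema can drift arbitrarily (not just in the key names) is to keep the Kafka value as a plain string and re-infer the schema per micro-batch with foreachBatch. A rough sketch in Scala, assuming Spark 2.4+ (for foreachBatch) and placeholder broker, topic, and output paths:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("DynamicSchema").getOrCreate()
import spark.implicits._

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // placeholder
  .option("subscribe", "my-topic")                    // placeholder
  .load()
  .selectExpr("CAST(value AS STRING) AS json")

val query = raw.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Infer the schema from this batch's payloads only, so schema
    // drift in the producer is picked up batch by batch.
    val parsed = batch.sparkSession.read.json(batch.select("json").as[String])
    parsed.write.mode("append").json("/tmp/out")      // placeholder sink
  }
  .start()

The trade-off is an extra pass over each batch for inference, and columns can appear or disappear between batches, so the sink has to tolerate an evolving schema.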

 

Any help would be highly appreciated.

 

Thanks,

Kumar Rohit