
JSON Schema for dynamic key field in Spark Structured Streaming?

Contributor

I receive JSON data from Kafka and parse it with the from_json() method, which expects a schema from me. My JSON structure is like this:

 

{
    "Items": {
        "key1": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ],
        "key2": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ],
        "key3": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ]
    }
}

 

key1, key2, and key3 are dynamic, so they may change. For example, another JSON message is:

{
    "Items": {
        "hortoworks": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ],
        "community": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ],
        "question": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ]
    }
}


These key names are unknown in advance, but the "id", "name", and "val" fields inside each key are always the same.

I must define a JSON schema to read this data from Kafka in Spark Structured Streaming. How can I do this?

1 REPLY

New Contributor

Hi @sosyalmedya_ogu ,

 

Did you find any workable solution for this?


I have run into a similar use case where the JSON might have a change in schema. The producer application for our Kafka cluster listens to an external API endpoint, so we do not have control over the schema. Therefore, I am looking for a solution to handle a dynamic JSON schema while processing it in Structured Streaming.

 

Any help would be highly appreciated.

 

Thanks,

Kumar Rohit