
JSON Schema for dynamic key field in Spark Structured Streaming?

I receive JSON data from Kafka and parse it with the from_json() method, which expects a schema from me. My JSON structure looks like this:


{
    "Items": {
        "key1": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ],
        "key2": [
            {
                "id": "", 
               "name": "",
                "val": ""
            }
        ],
        "key3": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ]
    }
}
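
For reference, I read and parse the stream roughly like this (the broker address and topic name below are just placeholders, and itemsSchema is the schema I am asking about):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder.appName("kafka-json").getOrCreate()
import spark.implicits._

// Raw Kafka source; bootstrap servers and topic name are placeholders.
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "items-topic")
  .load()

// This is the schema I need to define -- the subject of this question.
val itemsSchema: StructType = ???

// The Kafka value is binary, so cast it to a string and let from_json
// turn it into columns; this is where the schema is required.
val parsed = raw
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", itemsSchema).as("data"))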


key1, key2, and key3 are dynamic, so they may change. For example, another JSON message could be:

{
    "Items": {
        "hortoworks": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ],
        "community": [
            {
                "id": "", 
               "name": "",
                "val": ""
            }
        ],
        "question": [
            {
                "id": "",
                "name": "",
                "val": ""
            }
        ]
    }
}


These key names are not known in advance, but the "id", "name", and "val" fields inside each key are always the same.

I must define a JSON schema to read this data from Kafka in Spark Structured Streaming. How can I do this?
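
One idea I have sketched out is to model "Items" as a MapType so the dynamic key names do not have to be listed in the schema. A rough sketch, reusing the raw stream from above (I am not sure this is the right approach):

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.{col, explode, from_json}

// The fixed record that appears under every dynamic key.
val entrySchema = new StructType()
  .add("id", StringType)
  .add("name", StringType)
  .add("val", StringType)

// Model "Items" as a map from the unknown key names to arrays of that record.
val itemsSchema = new StructType()
  .add("Items", MapType(StringType, ArrayType(entrySchema)))

// Parse the Kafka value, then explode the map into (key, entries) rows.
val parsed = raw
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json(col("json"), itemsSchema).as("data"))
  .select(explode(col("data.Items")).as(Seq("key", "entries")))

Would a MapType like this work with from_json, or is there a better way to handle the dynamic keys?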