Support Questions

Find answers, ask questions, and share your expertise

Best Practice - JSON to Avro, data type preserving

avatar
Super Collaborator

Hi All, What is the best approach to convert a JSON to AVRO preserving the source datatypes. My source JSON has a field with timestamp (value would look like 2017-01-26T00:00:00-05:00) which I need to eventually insert to a hive table with column type timestamp. When I infer the schema, I get String for the timestamp field. Is there some pre-formatting that I can do on the timestamp field so that it gets inferred as timestamp field. Current flow is as below - JSON>>AVRO(infer/manually add schema)>>Streaming Insert to hive

21 REPLIES 21

avatar
Super Collaborator

i tried on 1.1 it worked fine.

json2avro.xml

{    
"type": "record",
    "name": "testdata",
    "fields": [
        {
            "name": "timestamp",
            "type": {"type":"string","logicalType":"timestamp-millis"}
        },
        {
            "name": "c1",
            "type": "double"
        },
        {
            "name": "c2",
            "type": "double"
        },
        {
            "name": "c3",
            "type": "double"
        },
        {
            "name": "c4",
            "type": "double"
        },
        {
            "name": "c5",
            "type": "double"
        },
        {
            "name": "c6",
            "type": "double"
        },
        {
            "name": "c7",
            "type": "double"
        },
        {
            "name": "c8",
            "type": "double"
        },
        {
            "name": "c9",
            "type": "double"
        },
        {
            "name": "c10",
            "type": "double"
        },
        {
            "name": "c11",
            "type": "double"
        },
        {
            "name": "postal_code",
            "type": "string"
        },
        {
            "name": "country",
            "type": "string"
        }
    ]
}

avatar
Rising Star

Hi @Arun A K

This is a known issue where the datatypes are not preserved. https://issues.apache.org/jira/browse/NIFI-2624 which talks about Oracle/SQL datatypes not being preserved. You should also check out https://gist.github.com/ijokarumawak/69b29fa7b11c2ada656823db614af373

As mentioned by @Karthik Narayanan, best approach would be to use Record Oriented processors.