Created 01-26-2017 05:37 PM
Hi All, What is the best approach to convert a JSON to AVRO preserving the source datatypes. My source JSON has a field with timestamp (value would look like 2017-01-26T00:00:00-05:00) which I need to eventually insert to a hive table with column type timestamp. When I infer the schema, I get String for the timestamp field. Is there some pre-formatting that I can do on the timestamp field so that it gets inferred as timestamp field. Current flow is as below - JSON>>AVRO(infer/manually add schema)>>Streaming Insert to hive
Created 01-26-2017 05:56 PM
What are you using for your AVRO Schema? If you specify that the field as timestamp.
https://avro.apache.org/docs/1.8.0/spec.html#Timestamp+%28millisecond+precision%29
Can you post an example JSON file, AVRO Schema, data provenance of the run, Hive DDL
Standard Format "YYYY-MM-DD HH:MM:SS.fffffffff
"
For the schema, copy the inferred one from a data provenance run and then change the type from string to timestamp and save that schema and use that for the next run.
If there's empty strings or null in the first 20 records, I am thinking you will get String as a type.
Created 01-26-2017 06:31 PM
My Sample JSON File is
{ "timestamp": "2017-01-26T00:00:00-05:00", "c1": 73.0, "c2": 36.5, "c3": 43.8, "c4": 0.1, "c5": 75.4, "c6": 997.8, "c7": 0.5, "c8": 4.58, "c9": 43.8, "c10": 1.5, "c11": 40.6, "postal_code": "08863", "country": "us" }And the Avro schema inferred by NiFi is
{ "type": "record", "name": "date", "fields": [ { "name": "timestamp", "type": "string" }, { "name": "c1", "type": "double", "doc": "Type inferred from '73'" }, { "name": "c2", "type": "double", "doc": "Type inferred from '36.5'" }, { "name": "c3", "type": "double", "doc": "Type inferred from '43.8'" }, { "name": "c4", "type": "double", "doc": "Type inferred from '0'" }, { "name": "c5", "type": "double", "doc": "Type inferred from '75.4'" }, { "name": "c6", "type": "double", "doc": "Type inferred from '997.8'" }, { "name": "c7", "type": "double", "doc": "Type inferred from '0'" }, { "name": "c8", "type": "double", "doc": "Type inferred from '4.58'" }, { "name": "c9", "type": "double", "doc": "Type inferred from '43.8'" }, { "name": "c10", "type": "double", "doc": "Type inferred from '1.5'" }, { "name": "c11", "type": "double", "doc": "Type inferred from '40.6'" }, { "name": "postal_code", "type": "string", "doc": "Type inferred from '\"08863\"'" }, { "name": "country", "type": "string", "doc": "Type inferred from '\"us\"'" } ] }
Created 01-26-2017 07:57 PM
@Timothy Spann : I tried editing the avro schema manually adding "type": "timestamp-millis" in place of "string". However the processor does not accept this and notifies a "Schema Validation Failure"
Created 01-26-2017 08:40 PM
try as timestamp and not timestamp-millis. It may not support Avro 1.8
Created 01-27-2017 06:45 PM
Thanks @Timothy Spann The Avro version is 1.7.7 (looking at the pom.xml). Is there a timestamp type in avro 1.7.7? I guess not. I could see the logicalType timestamp-* in 1.8 Should I assume that the timestamp would not work.
Created 01-27-2017 07:08 PM
What version of HDF are you using? can you upgrade?
No timestamps in 1.7.7 http://avro.apache.org/docs/1.7.7/spec.html#schema_primitive
I seem to remember that being added in 1.8
Here are some AVRO notes I did last year for a meetup: https://github.com/airisdata/avroparquet
Created 01-27-2017 09:39 PM
@Timothy Spann This is NiFi 1.1.1. I believe avro is still 1.7.7. Not sure if it is 1.8.
Created 01-26-2017 08:40 PM
Created 01-26-2017 06:06 PM
{ "type" : "record", "namespace" : "hivetable", "name" : "hivetable", "fields" : [ { "name" : "status" , "type" : "string" }, { "name" : "rtimestamp" , "type" : "timestamp-millis" }, { "name" : "country" , "type" : "string" }, { "name" : "ip" , "type" : "string" } ] }