avro-tools.jar and ConvertJsonToAvro Processor totally different behaviour with same Schema?

Hi,

I ran into a problem that is quite weird:

My test schema is:

{
  "type": "record",
  "name": "projectName",
  "namespace": "some.namespace",
  "fields": [
    {
      "name": "dataTime",
      "type": "long"
    },
    {
      "name": "fileTime",
      "type": "long"
    },
    {
      "name": "dataType",
      "type": "string"
    },
    {
      "name": "mediaType",
      "type": "string"
    },
    {
      "name": "tSimpleStreamData",
      "type": [
        "null",
        {
          "type": "record",
          "name": "tSimpleStreamData",
          "fields": [
            {
              "name": "streamName",
              "type": "string"
            },
            {
              "name": "value",
              "type": "double"
            }
          ], "default" : null
        }
      ],
      "default": null
    }
  ]
}

Now I have 3 files:

1) data.json

{"dataTime":1485504480599628,"fileTime":1485504480597669,"dataType":"tSimpleStreamData","mediaType":"StructuredData","tSimpleStreamData":{"some.namespace.tSimpleStreamData": {"streamName":"pressure","value":6.2999999999999989}}}

2) data_without_namespace.json

{"dataTime":1485504480599628,"fileTime":1485504480597669,"dataType":"tSimpleStreamData","mediaType":"StructuredData","tSimpleStreamData":{"tSimpleStreamData": {"streamName":"pressure","value":6.2999999999999989}}}

3) data_without_type_object.json

{"dataTime":1485504480599628,"fileTime":1485504480597669,"dataType":"tSimpleStreamData","mediaType":"StructuredData","tSimpleStreamData":{"streamName":"pressure","value":6.2999999999999989}}

When I run on Windows with avro-tools-1.8.1.jar:

1) Works fine, no error.

2) Fails. Error:

Exception in thread "main" org.apache.avro.AvroTypeException: Unknown union branch tSimpleStreamData
        at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
        at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
        at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
        at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
        at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
        at org.apache.avro.tool.Main.run(Main.java:87)
        at org.apache.avro.tool.Main.main(Main.java:76)

3) Fails, with a different error from the one in 2):

Exception in thread "main" org.apache.avro.AvroTypeException: Unknown union branch streamName
        at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
        at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
        at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
        at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
        at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
        at org.apache.avro.tool.Main.run(Main.java:87)
        at org.apache.avro.tool.Main.main(Main.java:76)

Now when I run this with NiFi 1.1.2, I get the following behaviour:

1) Fails:

[Screenshot: 15360-a.jpg]

2) Fails too:

[Screenshot: 15371-b.jpg]

3) Works:

[Screenshot: 15372-c.jpg]

Can someone enlighten me as to what's going on here? Why is the behaviour completely different between avro-tools and the ConvertJsonToAvro processor?

Is this intended? It's hard to model my schemas when I can barely test them on my Windows PC before getting the data and schema over to the NiFi cluster.

Thanks!

2 REPLIES

Re: avro-tools.jar and ConvertJsonToAvro Processor totally different behaviour with same Schema?

ConvertJsonToAvro uses the Kite library to perform the conversion, which is likely doing something different from avro-tools.

The Kite bundle in NiFi has ConvertJsonToAvro, ConvertCsvToAvro, InferAvroSchema, and ConvertAvroSchema processors.

It might be interesting to take each one of your JSON documents and send them through InferAvroSchema to see what schema it produces and see how that compares to your schema.

Also worth mentioning is that in Apache NiFi 1.2.0 (just released) there is a new record reader/writer concept and a new ConvertRecord processor where you could select a JSON reader and an Avro writer to convert between them. This conversion is implemented directly by NiFi.
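For testing locally in the meantime, it may help to know that avro-tools reads Avro's own JSON encoding, in which a non-null union value is wrapped in an object keyed by the branch type (the full name, including namespace, for a named type). That matches data.json working and the "Unknown union branch" errors for the other two files. Kite appears to accept plain JSON instead, which would explain why only the third file works in NiFi. Below is a minimal Python sketch that rewrites plain JSON into the wrapped form using the schema from the question. The helper names full_name and to_avro_json are made up for this example, and the sketch assumes each union has at most one non-null branch and ignores arrays, maps and named-type references.

```python
import json

NAMED = ("record", "enum", "fixed")

def full_name(schema, enclosing_ns):
    """Return (full name, namespace) of a named schema (hypothetical helper)."""
    name = schema["name"]
    if "." in name:
        return name, name.rsplit(".", 1)[0]
    ns = schema.get("namespace", enclosing_ns)
    return (f"{ns}.{name}" if ns else name), ns

def to_avro_json(value, schema, ns=None):
    """Wrap union values in plain JSON the way Avro's JSON encoding expects."""
    if isinstance(schema, list):  # a union such as ["null", {...}]
        if value is None:
            return None  # the null branch is written as a bare null
        # Assumption: exactly one non-null branch, so no type matching is needed.
        branch = next(s for s in schema if s != "null")
        if isinstance(branch, dict) and branch.get("type") in NAMED:
            label = full_name(branch, ns)[0]
        elif isinstance(branch, dict):
            label = branch["type"]
        else:
            label = branch
        return {label: to_avro_json(value, branch, ns)}
    if isinstance(schema, dict) and schema.get("type") == "record":
        ns = full_name(schema, ns)[1]  # nested names inherit this namespace
        return {f["name"]: to_avro_json(value.get(f["name"], f.get("default")),
                                        f["type"], ns)
                for f in schema["fields"]}
    return value  # primitives pass through unchanged

SCHEMA = json.loads("""
{"type": "record", "name": "projectName", "namespace": "some.namespace",
 "fields": [
  {"name": "dataTime", "type": "long"},
  {"name": "fileTime", "type": "long"},
  {"name": "dataType", "type": "string"},
  {"name": "mediaType", "type": "string"},
  {"name": "tSimpleStreamData", "default": null, "type": ["null",
    {"type": "record", "name": "tSimpleStreamData", "fields": [
      {"name": "streamName", "type": "string"},
      {"name": "value", "type": "double"}]}]}]}
""")

# Same content as data_without_type_object.json from the question.
PLAIN = json.loads('{"dataTime":1485504480599628,"fileTime":1485504480597669,'
                   '"dataType":"tSimpleStreamData","mediaType":"StructuredData",'
                   '"tSimpleStreamData":{"streamName":"pressure",'
                   '"value":6.2999999999999989}}')

print(json.dumps(to_avro_json(PLAIN, SCHEMA)))
```

With something like this, the plain file can stay the single source of test data: feed it to NiFi as-is, and feed the converted output to avro-tools.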

Re: avro-tools.jar and ConvertJsonToAvro Processor totally different behaviour with same Schema?

Would you say that the NiFi 1.2.0 conversion is better than the old one that used the Kite library? Can I expect the new ConvertRecord processor to match avro-tools, or is it still different?