Error while ingesting Plain CSV to SAM via NIFI

Explorer

I'm trying to upgrade an existing visualization solution (Kafka > Flink > Druid > Superset) to work with HWX SAM and Registry.

Currently NiFi works as an HTTP proxy that collects events and pushes them to Kafka. I'm trying to convert the events (CSV) to Avro at this stage before pushing them to Kafka, so that SAM can consume them.

The output of SplitContent looks similar to "abc,def,ghi,jkl,,"

I'm getting this error in the Storm UI:

com.hortonworks.registries.schemaregistry.serde.SerDesException: Unknown protocol id [49] received while deserializing the payload at com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapsho

Is there something I should pay closer attention to when processing CSV? Any troubleshooting recommendations?

40414-screen-shot-2017-09-19-at-114110-am.png

1 ACCEPTED SOLUTION

Master Guru

The reader on the SAM side is trying to read the encoded schema reference, but it is likely not there. The AvroRecordSetWriter being used by PublishKafkaRecord_0_10 must be configured with a "Schema Write Strategy" of "Hortonworks Content Encoded Schema Reference".
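To illustrate why a missing header produces "Unknown protocol id", here is a minimal Python sketch of how a consumer might read the encoded schema reference. The 13-byte layout (1-byte protocol id, 8-byte schema id, 4-byte schema version, big-endian) is an assumption about protocol version 1, and `read_schema_reference` is a hypothetical helper, not the registry's actual deserializer. Note that 49 is the ASCII code for "1", so any payload that does not begin with the header has its first byte misread as the protocol id.

```python
import struct

# Assumed header layout for protocol version 1 (not taken from the thread):
# 1-byte protocol id, 8-byte schema id, 4-byte schema version, big-endian.
HEADER = struct.Struct(">Bqi")


def read_schema_reference(payload: bytes):
    """Split a payload into (schema_id, schema_version, avro_body).

    Hypothetical helper for illustration only.
    """
    if len(payload) < HEADER.size:
        raise ValueError("payload too short to contain a schema reference")
    protocol_id, schema_id, schema_version = HEADER.unpack_from(payload)
    if protocol_id != 1:
        # The first byte is whatever the producer wrote first.
        raise ValueError(f"Unknown protocol id [{protocol_id}]")
    return schema_id, schema_version, payload[HEADER.size:]


if __name__ == "__main__":
    # A payload that carries the header parses cleanly.
    good = HEADER.pack(1, 7, 2) + b"\x02abc"
    print(read_schema_reference(good))

    # A made-up payload that starts with the character "1" (byte 49).
    try:
        read_schema_reference(b"1505812870000,abc,def,ghi")
    except ValueError as err:
        print(err)
```

Printing the first few bytes of a consumed Kafka message is a quick way to check whether the writer actually prepended the reference.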

Contributor

@Roshan Dissanayake

Can you please show the configuration of the PublishKafka record reader and writer controller services?

This looks like an issue with setting the attributes of the flowfile when it is sent to retrieve the schema from the registry.

Explorer

@mkalyanpur

The CSVReader 1.2.0.3.0.1.1-5 and AvroRecordSetWriter 1.2.0.3.0.1.1-5 configurations are shown in the attached images.

My Avro schema in the registry is similar to this, with a bunch more string fields:

{
  "type": "record",
  "name": "tracking_sdk_event",
  "fields": [
    {
      "name": "timeStamp",
      "type": "long",
      "default": null
    },
    {
      "name": "isoTime",
      "type": "string",
      "default": null
    }
  ] 
}

@Bryan Bende

After changing the "Schema Write Strategy" to "Hortonworks Content Encoded Schema Reference" I'm getting an error with the timeStamp field. I have attached an image of it.

avrorecordsetwriter-1203011-5.png
screen-shot-2017-09-20-at-112540-am.png
csvreader-1203011-5.png

Master Guru

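If you want a default value of null, the type of your field needs to be a union of "null" and the real type. "null" should be listed first, because Avro matches a union field's default against the first type in the union.

For example, for timeStamp you would need: "type": ["null", "long"]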
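As a rough way to catch this before registering a schema, the sketch below scans a record schema for fields that declare "default": null without a type that permits it. It assumes the Avro rule that a union field's default must match the first branch of the union, so it expects "null" to come first. `fields_with_invalid_null_default` is a hypothetical helper written for this thread, not part of NiFi or the registry.

```python
import json

ORIGINAL = json.loads("""
{
  "type": "record",
  "name": "tracking_sdk_event",
  "fields": [
    {"name": "timeStamp", "type": "long", "default": null},
    {"name": "isoTime", "type": "string", "default": null}
  ]
}
""")

FIXED = json.loads("""
{
  "type": "record",
  "name": "tracking_sdk_event",
  "fields": [
    {"name": "timeStamp", "type": ["null", "long"], "default": null},
    {"name": "isoTime", "type": ["null", "string"], "default": null}
  ]
}
""")


def fields_with_invalid_null_default(schema: dict) -> list:
    """Return names of fields with "default": null whose type disallows it."""
    bad = []
    for field in schema.get("fields", []):
        # Skip fields with no default or with a non-null default.
        if "default" not in field or field["default"] is not None:
            continue
        ftype = field["type"]
        # Assumption: a null default needs type "null" or a union led by "null".
        allows_null = ftype == "null" or (
            isinstance(ftype, list) and len(ftype) > 0 and ftype[0] == "null"
        )
        if not allows_null:
            bad.append(field["name"])
    return bad


if __name__ == "__main__":
    print(fields_with_invalid_null_default(ORIGINAL))
    print(fields_with_invalid_null_default(FIXED))
```

The check only looks at top-level fields; nested records would need a recursive walk.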