Nifi InferAVROSchema
Labels: Apache NiFi
Created ‎12-08-2016 09:04 AM
Hi,
How does InferAvroSchema work in a flow? Does it infer the schema again for every single flow file, and is that a good approach? Shouldn't we use ConvertCSVToAvro instead, providing an .avsc file created by Kite?
Thanks.
Avijeet
Created ‎12-08-2016 02:25 PM
Each CSV or JSON file that comes into InferAvroSchema could be different, so it infers the schema for each flow file and puts it wherever you set the schema destination: either the flow file content or a flow file attribute. You can then use that attribute as the schema in ConvertCSVToAvro by referencing ${inferred.avro.schema}.
If you are sending only one type of CSV into ConvertCSVToAvro, it is more efficient to define the Avro schema you want yourself and not use InferAvroSchema.
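For the single-CSV-type case, a hand-written schema might look like the sketch below. It only builds the JSON text that an .avsc file would contain; the record name and field names are made up for illustration and are not taken from the question.

```python
import json

# Hypothetical record layout; the record and field names are illustrative only.
STATIC_SCHEMA = {
    "type": "record",
    "name": "Measurement",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "label", "type": "string"},
        # Nullable long: a union with "null" listed first and a null default.
        {"name": "count", "type": ["null", "long"], "default": None},
    ],
}


def schema_as_avsc(schema: dict) -> str:
    """Serialize the schema dict to the JSON text an .avsc file contains."""
    return json.dumps(schema, indent=2)


if __name__ == "__main__":
    print(schema_as_avsc(STATIC_SCHEMA))
```

The printed JSON is what you would supply as the fixed schema for ConvertCSVToAvro, in place of the ${inferred.avro.schema} reference.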
Created ‎12-08-2016 02:39 PM
Hi @Bryan Bende, thanks.
Won't it usually be the case that a stream contains messages for one particular schema? I noticed Kafka is trying to implement something similar. Putting InferAvroSchema in a dataflow seems like a dangerous thing to do.
Created ‎12-08-2016 03:01 PM
It depends how you construct your dataflow in NiFi... You could set it up so that you have several logical streams that each have their own ConvertCsvToAvro processor, or you could have several processors feeding into the same ConvertCsvToAvro processor.
Kafka itself does not enforce anything related to a schema, but Confluent has a schema registry with serializers and deserializers and they can enforce that any message being written to a topic must conform to the schema for that topic.
Created ‎12-08-2016 03:17 PM
@Avijeet Dash Take a look at this template for some examples.
Created ‎12-09-2016 08:02 AM
I have been using InferAvroSchema in dataflows for a while and:
1. It infers the schema for each input file
2. It saves the schema into the ${inferred.avro.schema} attribute of that flowfile
3. It is not good for production use
As schema inference is only a guess, I would recommend inferring your schema once (double-check it manually for correctness) and then using it as a static schema in ConvertAvroTo... processors (prepend RouteOnAttribute if you need different schemas). In production, this is what you want. Sometimes the data can mislead the inference. For example, I have an input CSV with an empty column that is in fact a nullable long column. Schema inference cannot guess that it is a nullable long. So for one input file, where the values are filled in as numbers, it guesses long, and for another, where the column is empty, it guesses nullable string...
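The empty-column problem can be reproduced with a toy inference routine. This is a deliberately simplified sketch, not the algorithm InferAvroSchema or Kite actually uses; the function names and sample data are hypothetical.

```python
import csv
import io


def infer_column_type(values):
    """Toy guess: 'long' if every non-empty value parses as an integer,
    otherwise 'string'; becomes a union with 'null' if any value is empty."""
    non_empty = [v for v in values if v != ""]
    has_empty = len(non_empty) < len(values)

    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False

    # With no non-empty values there is nothing to go on, so fall back to string.
    base = "long" if non_empty and all(is_int(v) for v in non_empty) else "string"
    return ["null", base] if has_empty else base


def infer_schema(csv_text):
    """Return a {column name: guessed type} mapping for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {c: infer_column_type([r[c] for r in rows]) for c in columns}


# Same logical layout, but 'count' is filled in one file and empty in the other.
FILE_A = "id,count\n1,10\n2,20\n"
FILE_B = "id,count\n1,\n2,\n"

if __name__ == "__main__":
    print(infer_schema(FILE_A))
    print(infer_schema(FILE_B))
```

The two files yield different types for the same column, which is why a schema that is written once and checked by hand is safer when the layout is known in advance.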
Created ‎12-12-2016 05:04 AM
Hi @Michal Klempa I agree. Thanks.
