Created on 08-17-2022 05:36 AM - edited 08-17-2022 05:40 AM
I am trying to read a Kafka topic that contains a CSV file with no headers. The file is read correctly when the CSV contains two or more rows (no headers), but when I use a CSV file with only one row (no headers), the Read from Kafka processor does nothing: no error message and no file ingested (0 in, 0 out).
The controller services in use are:
- CSVReader
- CSVRecordSetWriter
Created on 08-26-2022 09:59 PM - edited 08-26-2022 10:15 PM
@Omarb ,
Initially I thought this was a problem with the CSVRecordSetWriter, but I was mistaken.
The issue here is that even though your CSVReader is set to ignore the header line, its Schema Access Strategy is set to "Infer Schema". This causes the reader to consume the first line of the flowfile to infer the schema, even though the other property tells it to ignore the header. With a one-row file, that leaves no records to output.
To avoid this, set the Schema Access Strategy property to "Use 'Schema Text' Property" and provide a schema that matches your flowfile structure. For example:
{
  "type": "record",
  "name": "MyFlowFile",
  "fields": [
    { "name": "col_a", "type": "string" },
    { "name": "col_b", "type": "string" },
    { "name": "col_c", "type": "string" },
    ...
  ]
}
This will stop the first line being "consumed" by the reader.
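To make the idea concrete outside NiFi, here is a minimal Python sketch. It is not NiFi code and does not reproduce how CSVReader works internally. It only shows how schema text like the one above can be generated for a list of column names, and that once the field names come from the schema, every line of a headerless file, including the only line, can be treated as data. The helper names build_schema_text and read_headerless_csv, and the sample row, are made up for illustration.

```python
import csv
import io
import json


def build_schema_text(column_names, record_name="MyFlowFile"):
    """Return an Avro record schema (JSON text) with one string field per column."""
    schema = {
        "type": "record",
        "name": record_name,
        "fields": [{"name": name, "type": "string"} for name in column_names],
    }
    return json.dumps(schema, indent=2)


def read_headerless_csv(text, schema_text):
    """Map every line of a headerless CSV onto the field names from the schema."""
    field_names = [f["name"] for f in json.loads(schema_text)["fields"]]
    rows = csv.reader(io.StringIO(text))
    # No line is set aside for headers or inference: each row becomes a record.
    return [dict(zip(field_names, row)) for row in rows]


# Hypothetical column names and a single-row sample, as in the question.
schema_text = build_schema_text(["col_a", "col_b", "col_c"])
records = read_headerless_csv("1,foo,bar\n", schema_text)

print(schema_text)  # text that could be pasted into the Schema Text property
print(records)
```

The generated text still has to be checked against the real column order and types of your flowfile before you paste it into the Schema Text property.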
Cheers,
André
Created 08-22-2022 06:38 AM
@Omarb Not sure if this is helpful, but sometimes I do something like this:
Take the test that works, have the schema written out, and capture it from one of your test flowfiles (check the attributes for the schema). Then reuse that schema for the test that does not work, instead of inferring the schema. I only like to use Infer Schema to help me write the schema, especially when it is complicated.
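If you would rather draft the schema offline than capture it from a flowfile attribute, the same "infer once, then pin it" idea can be sketched in Python. This is only an illustration, not NiFi's inference logic: the infer_avro_type and infer_schema_text helpers, the column names, and the sample data are all hypothetical, and the type guessing is deliberately simple (long, then double, then string).

```python
import csv
import io
import json


def infer_avro_type(values):
    """Pick the narrowest of long/double/string that fits every sample value."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "long"
    if all_parse(float):
        return "double"
    return "string"


def infer_schema_text(sample_csv, column_names, record_name="MyFlowFile"):
    """Draft an Avro record schema from a multi-row, headerless CSV sample."""
    rows = list(csv.reader(io.StringIO(sample_csv)))
    columns = list(zip(*rows))  # transpose rows into columns
    fields = [
        {"name": name, "type": infer_avro_type(column)}
        for name, column in zip(column_names, columns)
    ]
    schema = {"type": "record", "name": record_name, "fields": fields}
    return json.dumps(schema, indent=2)


# Hypothetical two-row sample: the case where inference has enough data.
sample = "1,foo,2.5\n2,bar,3.0\n"
schema_text = infer_schema_text(sample, ["col_a", "col_b", "col_c"])
print(schema_text)
```

Review the drafted schema by hand before reusing it, since a small sample can suggest a narrower type than the full data needs.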
Hope this helps,
Steven
Created 09-04-2022 10:29 PM
@Omarb, have any of the replies helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,