Unable to read a Kafka topic containing a CSV file with one row and no headers in NiFi
- Labels: Apache NiFi
Created on 08-17-2022 05:36 AM - edited 08-17-2022 05:40 AM
I am trying to read a Kafka topic that contains a CSV file with no headers. The file is read correctly when the CSV contains two or more rows (no headers), but when I use a CSV file with only one row (no headers), the processor reading from Kafka does nothing: no error message and no file ingested (0 in, 0 out).
The controller services are shown below:
- CSVReader
- CSVRecordSetWriter
Created on 08-26-2022 09:59 PM - edited 08-26-2022 10:15 PM
@Omarb ,
Initially I thought this was a problem with the CSVRecordSetWriter, but I was mistaken.
The issue is that, even though your CSVReader is set to ignore the header line, its Schema Access Strategy is set to "Infer Schema". With that strategy the reader consumes the first line of the flow file to infer the schema, regardless of what the header property says. When the file has only one row, nothing is left to emit as a record.
To avoid this, set the Schema Access Strategy property to "Use 'Schema Text' Property" and provide a schema that matches your flow file structure. For example:
{
  "type": "record",
  "name": "MyFlowFile",
  "fields": [
    { "name": "col_a", "type": "string" },
    { "name": "col_b", "type": "string" },
    { "name": "col_c", "type": "string" },
    ...
  ]
}
This will stop the first line being "consumed" by the reader.
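The same effect can be reproduced outside NiFi. The sketch below is only an analogy using Python's standard `csv` module, not NiFi's reader code: when no field names are supplied, the first line is used up to work out the columns, so a one-row file yields zero records. The payload and the column names `col_a`, `col_b`, `col_c` are placeholders.

```python
import csv
import io

# Hypothetical one-row, headerless payload standing in for the Kafka message.
payload = "a1,b1,c1\n"

# No field names given: DictReader takes the first line as the header,
# so a single-row file produces no records at all.
inferred = list(csv.DictReader(io.StringIO(payload)))

# Field names given up front (the analogue of supplying the schema text):
# the first line is kept as data.
explicit = list(
    csv.DictReader(io.StringIO(payload), fieldnames=["col_a", "col_b", "col_c"])
)

print(len(inferred))  # number of records when the columns are derived from the file
print(len(explicit))  # number of records when the columns are declared explicitly
```

With two or more rows, the first variant would still return records, which matches the symptom of the problem only appearing for single-row files.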
Cheers,
André
Created 08-22-2022 06:38 AM
@Omarb Not sure if this is helpful, but sometimes I do something like this:
Take the test that works, have it write the schema, then capture the schema from one of your runs (check the flow file attributes for it). Then reuse that schema for the test that does not work, instead of using Infer Schema. I only like to infer the schema to help me write it, especially if it is complicated.
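For a simple file where every column can be treated as a string, the schema text can also be generated instead of captured. This is a minimal sketch, assuming an Avro-style record schema; the helper `build_schema_text`, the record name, and the column names are hypothetical and only for illustration.

```python
import json


def build_schema_text(record_name, column_names):
    """Return Avro-style record schema JSON with every column typed as string."""
    schema = {
        "type": "record",
        "name": record_name,
        "fields": [{"name": name, "type": "string"} for name in column_names],
    }
    return json.dumps(schema, indent=2)


# Placeholder names; replace them with the real columns of the flow file.
schema_text = build_schema_text("MyFlowFile", ["col_a", "col_b", "col_c"])
print(schema_text)
```

The printed JSON is the text that would be pasted into the reader's schema text property; columns that are not plain strings would need their types adjusted by hand.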
Hope this helps,
Steven
Created 09-04-2022 10:29 PM
@Omarb, have any of the replies helped resolve your issue? If so, please mark the appropriate reply as the solution, as that will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur, Community Manager
