Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

NiFI - Converting CSV to Avro, header contains spaces

avatar
Contributor

I've had pretty good success converting csv to json and avro using the ConvertRecord processor.

However I'm having issues converting a csv file with spaces in the header (column names)

Ex CSV:

"Date of Birth"

01-23-1981

Is there a way to replace the spaces ' ' with '_' on just the header row? Is there another way to handle column/field names with spaces when using the ConvertRecord procesors when converting to avro?

1 ACCEPTED SOLUTION

avatar
Master Guru

One way would be to define the schema ahead of time in one of the schema registries, and then have your CSVReader's Schema Access Strategy set to "Schema Name" so that it uses the schema from the registry, and then tell it to ignore the first line of the CSV. The downside is you have to define the schema rather than just using the column headers.

Besides that, the next easiest option would probably be to use ExecuteScript to write a simple script that reads the first line and converts the spaces in the column names to underscores, and then wrote it back out converted along with all the other unmodified lines.

It is possible there might be a way to do it with ReplaceText, but I'm not exactly sure how to modify only the first line.

View solution in original post

4 REPLIES 4

avatar
Master Guru

One way would be to define the schema ahead of time in one of the schema registries, and then have your CSVReader's Schema Access Strategy set to "Schema Name" so that it uses the schema from the registry, and then tell it to ignore the first line of the CSV. The downside is you have to define the schema rather than just using the column headers.

Besides that, the next easiest option would probably be to use ExecuteScript to write a simple script that reads the first line and converts the spaces in the column names to underscores, and then wrote it back out converted along with all the other unmodified lines.

It is possible there might be a way to do it with ReplaceText, but I'm not exactly sure how to modify only the first line.

avatar
Master Guru

Adding to Bryan's answer, if you have the schema available to put in the registry, you can set it to Validate Field Names to false, meaning you could have field names defined in the Avro schema that do not conform to the stricter Avro rules.

We should consider adding this property to readers that generate their own schema, such as CSVReader...

avatar
Contributor

If you use an "invalid" schema will it be able to write to avro? I can see how that could work for transforming from csv to json - but I don't think it will work for avro, due to the rules.

avatar
Master Guru

Yeah that's true, I misread the first sentence of your question and was thinking conversion to JSON only, my bad