Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

NiFI - Converting CSV to Avro, header contains spaces

Solved Go to solution
Highlighted

NiFI - Converting CSV to Avro, header contains spaces

New Contributor

I've had pretty good success converting csv to json and avro using the ConvertRecord processor.

However I'm having issues converting a csv file with spaces in the header (column names)

Ex CSV:

"Date of Birth"

01-23-1981

Is there a way to replace the spaces ' ' with '_' on just the header row? Is there another way to handle column/field names with spaces when using the ConvertRecord procesors when converting to avro?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: NiFI - Converting CSV to Avro, header contains spaces

One way would be to define the schema ahead of time in one of the schema registries, and then have your CSVReader's Schema Access Strategy set to "Schema Name" so that it uses the schema from the registry, and then tell it to ignore the first line of the CSV. The downside is you have to define the schema rather than just using the column headers.

Besides that, the next easiest option would probably be to use ExecuteScript to write a simple script that reads the first line and converts the spaces in the column names to underscores, and then wrote it back out converted along with all the other unmodified lines.

It is possible there might be a way to do it with ReplaceText, but I'm not exactly sure how to modify only the first line.

4 REPLIES 4

Re: NiFI - Converting CSV to Avro, header contains spaces

One way would be to define the schema ahead of time in one of the schema registries, and then have your CSVReader's Schema Access Strategy set to "Schema Name" so that it uses the schema from the registry, and then tell it to ignore the first line of the CSV. The downside is you have to define the schema rather than just using the column headers.

Besides that, the next easiest option would probably be to use ExecuteScript to write a simple script that reads the first line and converts the spaces in the column names to underscores, and then wrote it back out converted along with all the other unmodified lines.

It is possible there might be a way to do it with ReplaceText, but I'm not exactly sure how to modify only the first line.

Re: NiFI - Converting CSV to Avro, header contains spaces

Adding to Bryan's answer, if you have the schema available to put in the registry, you can set it to Validate Field Names to false, meaning you could have field names defined in the Avro schema that do not conform to the stricter Avro rules.

We should consider adding this property to readers that generate their own schema, such as CSVReader...

Re: NiFI - Converting CSV to Avro, header contains spaces

New Contributor

If you use an "invalid" schema will it be able to write to avro? I can see how that could work for transforming from csv to json - but I don't think it will work for avro, due to the rules.

Re: NiFI - Converting CSV to Avro, header contains spaces

Yeah that's true, I misread the first sentence of your question and was thinking conversion to JSON only, my bad