Created 05-29-2018 07:02 PM
I've had pretty good success converting csv to json and avro using the ConvertRecord processor.
However, I'm having issues converting a csv file with spaces in the header (column names).
Ex CSV:
"Date of Birth"
01-23-1981
Is there a way to replace the spaces ' ' with '_' on just the header row? Is there another way to handle column/field names with spaces when using the ConvertRecord processor to convert to avro?
Created 05-29-2018 07:49 PM
One way would be to define the schema ahead of time in one of the schema registries, and then have your CSVReader's Schema Access Strategy set to "Schema Name" so that it uses the schema from the registry, and then tell it to ignore the first line of the CSV. The downside is you have to define the schema rather than just using the column headers.
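As a rough illustration of the schema-registry route, here is a sketch that builds an Avro schema for the example CSV and prints it as JSON. The record name "person" and the string type for the column are assumptions made for the example; the printed text is what you would store in the registry under the schema name you choose.

```python
import json

# Hypothetical schema for the example CSV. The record name "person" and the
# "string" field type are assumptions, not something given in the question.
schema = {
    "type": "record",
    "name": "person",
    "fields": [
        # Underscores instead of spaces, so the name is legal in Avro.
        {"name": "date_of_birth", "type": "string"},
    ],
}

# Serialize to the JSON text that a schema registry entry would hold.
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

Because the reader is told to ignore the CSV header, the field names come from this schema, so the spaces in the original column names never reach the Avro writer.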
Besides that, the next easiest option would probably be to use ExecuteScript with a simple script that reads the first line, converts the spaces in the column names to underscores, and then writes it back out along with all the other lines unmodified.
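A minimal sketch of the header-rewriting logic in plain Python is below. It only shows the text transformation; wiring it into ExecuteScript (reading and writing the flow file content) is not shown, and the function name `underscore_header` is made up for this example. It assumes the header fits on a single physical line.

```python
import csv
import io


def underscore_header(csv_text):
    """Return csv_text with spaces in the header's column names replaced by '_'.

    Only the first line is touched; every other line is passed through as-is.
    """
    # Split off the first line, keeping whatever line ending it used.
    line, newline, rest = csv_text.partition("\n")
    header = line.rstrip("\r")
    eol = line[len(header):] + newline

    # Parse the header as CSV so quoted names are handled correctly.
    names = next(csv.reader([header]), [])
    fixed = [name.strip().replace(" ", "_") for name in names]

    # Write the cleaned names back out as a single CSV row.
    out = io.StringIO()
    csv.writer(out, lineterminator="").writerow(fixed)
    return out.getvalue() + eol + rest


if __name__ == "__main__":
    print(underscore_header('"Date of Birth"\n01-23-1981\n'))
```

Parsing the header with the `csv` module rather than a blind string replace means a space after a comma (as in `First Name, Last Name`) is trimmed instead of turned into a leading underscore.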
It is possible there might be a way to do it with ReplaceText, but I'm not exactly sure how to modify only the first line.
Created 05-30-2018 03:06 AM
Adding to Bryan's answer, if you have the schema available to put in the registry, you can set the registry's Validate Field Names property to false, meaning you could have field names defined in the Avro schema that do not conform to the stricter Avro rules.
We should consider adding this property to readers that generate their own schema, such as CSVReader...
Created 05-30-2018 02:30 PM
If you use an "invalid" schema will it be able to write to avro? I can see how that could work for transforming from csv to json - but I don't think it will work for avro, due to the rules.
Created 05-30-2018 09:58 PM
Yeah, that's true. I misread the first sentence of your question and was thinking of conversion to JSON only, my bad.
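For reference, the rule at issue in the last two posts: the Avro specification requires names to start with a letter or underscore and to contain only letters, digits, and underscores after that. A small sketch of a check for this follows; the helper name `is_valid_avro_name` is made up for the example.

```python
import re

# Avro name rule: first character [A-Za-z_], remaining characters [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name):
    """Return True if name is a legal Avro record or field name."""
    return AVRO_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Date of Birth", "Date_of_Birth"]:
        print(candidate, is_valid_avro_name(candidate))
```

Running a check like this over the CSV header before the conversion shows which column names would need to be renamed for an Avro target.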