I have text files that sometimes use various delimiters such as quotation marks, commas, tabs, etc. What processor can I use to handle such delimiters, and how do I configure it to handle the delimiters in my file? The ConvertCSVToAvro processor's "properties" section has properties similar to what I want to achieve. Thanks
In your flow from the other post, you are getting a file with FetchSFTP, splitting it into individual lines with SplitText, using ExtractText to parse out the values, and using ReplaceText to construct a SQL statement.
The ExtractText processor is the one that needs to understand the delimiter in order to get the value of each column. Since your flow was working, you must have already configured ExtractText with a pattern to parse the line, right? So are you just asking how to handle more delimiters?
There isn't a specific processor made just for parsing delimited lines, mostly because you can already do that with ExtractText. You should only need one pattern to parse the whole line. Let's say I have a simple CSV with 4 columns; you could add one property like this:
csv = (.+),(.+),(.+),(.+)
That will add attributes csv.1, csv.2, csv.3, csv.4 containing each respective column.
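Those attributes can then be referenced downstream with NiFi's expression language. For example, the ReplaceText replacement value could build the SQL statement like this (a sketch assuming a hypothetical table my_table with four columns):

INSERT INTO my_table (col1, col2, col3, col4)
VALUES ('${csv.1}', '${csv.2}', '${csv.3}', '${csv.4}')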
You could have different instances of ExtractText to handle the different types of delimiters; you would need to route the data to each one appropriately, as sketched below.
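One way to do that routing (a sketch, assuming your files can be distinguished by their content) is RouteOnContent with one dynamic property per delimiter, where the property names tab_delimited and comma_delimited are made up for this example:

Match Requirement = content must contain match
tab_delimited     = \t
comma_delimited   = ,

Each dynamic property becomes a relationship, so you would connect tab_delimited to the tab-configured ExtractText and comma_delimited to the comma-configured one. Note that a line containing both characters would route to both relationships, so you may need stricter patterns in practice.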
For a more user-friendly option, you could implement a custom processor like ParseCSV or ParseDelimited, where you would have a property to take the delimiter and then use some kind of parsing library, or your own code, to parse the line.
A second alternative is to write a Groovy or Jython script to do the parsing and use the ExecuteScript processor.
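For example, a minimal Groovy script for ExecuteScript might look like the sketch below. The comma delimiter is hardcoded as an assumption; you could expose it as a dynamic property instead.

import org.apache.nifi.processor.io.InputStreamCallback
import java.nio.charset.StandardCharsets
import java.util.regex.Pattern

def flowFile = session.get()
if (flowFile == null) return

def delimiter = ','  // assumption: change or parameterize per file type

// Read the single-line content produced by SplitText
def line = ''
session.read(flowFile, { inputStream ->
    line = inputStream.getText(StandardCharsets.UTF_8.name()).trim()
} as InputStreamCallback)

// Split on the delimiter and add csv.1, csv.2, ... attributes,
// mirroring what the ExtractText pattern above produces
line.split(Pattern.quote(delimiter)).eachWithIndex { value, i ->
    flowFile = session.putAttribute(flowFile, "csv.${i + 1}", value)
}

session.transfer(flowFile, REL_SUCCESS)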
@Bryan Bende let's say I want to go the route where I use different ExtractText instances to handle different delimiters; how do I go about that? I am quite confused here (a vivid example would be helpful). From my understanding, the ExtractText processor will parse a file regardless of the file's delimiters, and what actually matters is the regular expression used to extract the data? Correct me if I'm wrong. I also tried replicating your example above; the ingestion of the flow file was successful, but there was no data in the database tables.
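For illustration: ExtractText itself has no notion of delimiters, only the regular expression does. A tab-delimited variant of the earlier pattern would simply be a second ExtractText instance with a property like:

csv = (.+)\t(.+)\t(.+)\t(.+)

with comma files routed to the original instance and tab files routed to this one. If the csv.* attributes are populated but nothing reaches the database, check the ReplaceText output and whatever processor executes the SQL (e.g. PutSQL) for errors.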