I've googled everywhere for this, and everything I run across is super complicated. It should be relatively simple to do. The recommendations say to look at the "Example_With_CSV.xml" from here.
So, given a flowfile that's a CSV:
2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"
I'd like to pull the fields out into flowfile attributes, e.g.:
$date = 2017-09-20 23:49:38.637
$id = 162929511757
$instanceid = 36095
$comment = "Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"
or, with a csv. prefix:
$csv.date = ...
$csv.id = ...
$csv.instanceid = ...
$csv.comment = ...
Is there an easier option to do this besides RegEx? I can't stand doing anything with RegEx because of how unreadable and overly complicated regexes are. To me, there should be a significantly easier way of doing this than with RegEx.
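For comparison, outside of NiFi a quoted field with embedded commas doesn't need a regex at all: a CSV parser already understands the quoting. Below is a minimal plain-Python sketch (not a NiFi processor) using the standard-library csv module. It assumes the column positions from the single sample line above (date in column 0, id in column 1, instanceid in column 3, comment in the last column); the function name line_to_attributes is hypothetical.

```python
import csv
import io

# The sample flowfile content from the question (one CSV line).
SAMPLE = ('2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,'
          '"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"')


def line_to_attributes(line, prefix=""):
    """Parse one CSV line and pick out the fields the question asks for.

    The column positions (0, 1, 3 and the last column) are assumptions
    based on the single sample line; adjust them for the real file.
    """
    # csv.reader handles the quoted comment, including its embedded commas.
    row = next(csv.reader(io.StringIO(line)))
    return {
        prefix + "date": row[0],
        prefix + "id": row[1],
        prefix + "instanceid": row[3],
        prefix + "comment": row[-1],
    }


attrs = line_to_attributes(SAMPLE, prefix="csv.")
for name, value in attrs.items():
    print(f"${name} = {value}")
```

Running it prints the four $csv.* lines from the question, with the comment unquoted and its embedded commas intact. Calling line_to_attributes(SAMPLE) without a prefix gives the $date/$id/... variant.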
Are you using a version of NiFi greater than or equal to v1.2? If so, you will have access to the Record-Oriented processors available in these versions of NiFi, which provide a great way to handle CSV data and should provide significant performance improvements. Mark Payne provides an excellent example here of some of these processors and their advantages. There are also other articles on Record-Oriented processors on this community site here, and from other NiFi rockstars like @Bryan Bende on their personal blogs here.

I'm not sure of your downstream use case, but in my experience, storing CSV data as attributes on flow files is most commonly used for flow file routing or for data restructuring and persistence (HDFS, a database, etc.). If this is one of your intended downstream purposes, once the data is in Record-Oriented form you get a lot of flexibility in what you can do with it, including executing SQL queries against your records with a QueryRecord processor and inserting your records directly into a database with a PutDatabaseRecord processor. There is a little additional set-up with this approach, as you need to define an Avro schema that represents your CSV file (and select/define a schema registry), but conveniently NiFi even has an out-of-the-box InferAvroSchema processor to help you fast-track this step (see @Timothy Spann's article here).

If you check the release notes for each NiFi version, you'll be able to see which processors are available to you, as the Record-Oriented paradigm is relatively new and growing quickly with each NiFi release. If you're not at NiFi v1.2, I'd suggest upgrading to the latest NiFi version if possible. Having gone through the upgrade process a number of times, I've found it relatively straightforward, provided you have configured your NiFi instance according to the suggested best practices. The Record-Oriented processors are an exciting addition to the NiFi toolkit, so I'd suggest investing the time to embrace them: they will make your flows cleaner and more performant, and give you greater flexibility in how you handle your data.
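To make the Avro schema step concrete, here is a minimal Python sketch that derives a record schema with one field per column of the sample line. The record name FitEvent, the placeholder names col2, col4, ..., and the chosen types are assumptions for illustration only; InferAvroSchema or a hand-written schema would serve the same purpose. The resulting JSON is the kind of schema text you would register in a schema registry such as NiFi's AvroSchemaRegistry.

```python
import csv
import io
import json

# Sample line from the question; only used to count the columns.
SAMPLE = ('2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,'
          '"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"')

# Hypothetical names/types for the columns the question cares about,
# keyed by column position. The comment is always the last column.
KNOWN = {
    0: ("date", "string"),
    1: ("id", "long"),
    3: ("instanceid", "int"),
}


def build_schema(sample_line, record_name="FitEvent"):
    """Return an Avro record schema (as a dict) with one field per CSV column."""
    row = next(csv.reader(io.StringIO(sample_line)))
    last = len(row) - 1
    fields = []
    for i in range(len(row)):
        if i == last:
            fields.append({"name": "comment", "type": "string"})
        elif i in KNOWN:
            name, avro_type = KNOWN[i]
            fields.append({"name": name, "type": avro_type})
        else:
            # Unnamed or mostly empty columns: nullable strings with a null default.
            fields.append({"name": f"col{i}", "type": ["null", "string"], "default": None})
    return {"type": "record", "name": record_name, "fields": fields}


schema = build_schema(SAMPLE)
print(json.dumps(schema, indent=2))
```

The id column is declared long because 162929511757 does not fit in a 32-bit int. The sketch also assumes a headerless file like the sample, so fields are matched to columns purely by position.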
I ended up not using NiFi for this. Looking back, I was trying to force a solution out of NiFi that wasn't a good fit. I spent several weeks, entirely too long, trying to solve the simplest case of this project (formatting some text and dumping it to a db).
I could certainly see NiFi being useful for moving source data files around the folders I'm working with (copying, moving, etc.), but applying any amount of logic or manipulation to anything but the happy path is extremely tedious and seemingly difficult to do.
Knowing that I was going to have to do a lot more work on the data to make it even close to usable, I just scrapped NiFi and implemented it in Python.
After dealing with this data and running into edge case after edge case that I wasn't even aware of when I wrote this topic, the data, IMO, was just too dirty and had too many exceptions to handle with NiFi. On top of that, this was only the import of the data, not even using it, so I would have needed another tool anyway to actually process the data into a usable form.
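The Python that replaced the flow isn't shown here, so the following is only a minimal sketch of that kind of import. It assumes an in-memory SQLite database, a hypothetical fit_events table, and the column positions from the sample line; the second input row is made up to show a malformed record. Rows that don't match the expected shape are set aside with a reason instead of aborting the whole load, which is one way to cope with data that has many exceptions.

```python
import csv
import io
import sqlite3

# Hypothetical input: the sample line from the question plus one malformed row.
RAW = (
    '2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,'
    '"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"\n'
    'not,a,valid,row\n'
)


def parse_rows(text):
    """Split CSV text into clean (date, id, instanceid, comment) tuples and rejects.

    Column positions are assumed from the single sample line in the question.
    """
    good, rejects = [], []
    # recno counts CSV records, which can differ from physical line numbers
    # when a quoted field contains a newline.
    for recno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        try:
            good.append((row[0], int(row[1]), int(row[3]), row[-1]))
        except (IndexError, ValueError) as exc:
            # Keep the bad record and the reason instead of failing the whole load.
            rejects.append((recno, row, str(exc)))
    return good, rejects


good, rejects = parse_rows(RAW)

conn = sqlite3.connect(":memory:")  # hypothetical target; swap in the real database
conn.execute(
    "CREATE TABLE fit_events (event_time TEXT, id INTEGER, instanceid INTEGER, comment TEXT)"
)
conn.executemany("INSERT INTO fit_events VALUES (?, ?, ?, ?)", good)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM fit_events").fetchone()[0], "row(s) loaded")
for recno, row, reason in rejects:
    print(f"skipped record {recno}: {reason}")
```

Keeping the rejects (record number, raw fields, reason) makes it easy to review the edge cases afterwards and add handling for them one at a time.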
I appreciate the response. You took the time to answer, so I figured it was reasonable to reply even though I didn't end up using the solution.