Created 02-28-2018 08:31 AM
I have many CSV files, each with 1M+ records, in which Date and Time are separate columns.
What I need to do is combine them into a single TimeStamp column.
Is it possible, and efficient, to combine the columns into a timestamp in the flowfiles via a record processor of some sort?
Created 02-28-2018 03:15 PM
Using UpdateRecord should be efficient.
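To make the target transformation concrete, here is a minimal Python sketch of what each record should look like before and after. It is not NiFi configuration; in NiFi the equivalent would be set up on UpdateRecord with a record reader and writer. The column names `Date`, `Time`, and `TimeStamp`, the input format `%Y-%m-%d %H:%M:%S`, and the helper name `combine_date_time` are all assumptions for illustration, since the thread does not show the actual data.

```python
import csv
import io
from datetime import datetime


def combine_date_time(src, dst, date_col="Date", time_col="Time",
                      in_fmt="%Y-%m-%d %H:%M:%S", out_col="TimeStamp"):
    """Stream CSV rows from src to dst, merging two columns into one.

    Column names and in_fmt are assumed; adjust them to the real files.
    Rows are handled one at a time, so memory use stays flat for 1M+ records.
    """
    reader = csv.DictReader(src)
    # Keep every other column in its original order, then append the new one.
    fields = [f for f in reader.fieldnames if f not in (date_col, time_col)]
    fields.append(out_col)
    writer = csv.DictWriter(dst, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Join the two values with a space and parse them as one timestamp.
        combined = f"{row.pop(date_col)} {row.pop(time_col)}"
        ts = datetime.strptime(combined, in_fmt)
        row[out_col] = ts.isoformat(sep=" ")
        writer.writerow(row)


# Usage with in-memory data standing in for a real file.
src = io.StringIO("Date,Time,value\n2018-02-28,08:31:00,42\n")
dst = io.StringIO()
combine_date_time(src, dst)
print(dst.getvalue())
```

Parsing the combined string, rather than only concatenating it, catches malformed dates early; if the downstream system accepts a plain string, simple concatenation would be cheaper.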
Created 02-28-2018 03:53 PM
As Bryan mentioned, you can use the UpdateRecord processor. For better performance, you can also distribute the workload across multiple instances and process the files in parallel, either with a Remote Process Group (RPG) or by configuring multiple concurrent tasks.
Created 02-28-2018 06:29 PM
That should not be a problem. I would make sure you have enough RAM and CPUs on your machine, SSD drives for the NiFi repositories, and enough nodes in your cluster.
Are these wide records? How big is each file? How many do you want to process at once? Where are you landing them?