NIFI Large Dataset

@Matt Burgess

I have many CSV files, each with 1M+ records, where Date and Time are separate columns.

What I need to do is combine them into a single TimeStamp column.

Is it possible, and efficient, to combine the columns into a timestamp within the flowfiles using a record processor of some sort?

3 REPLIES

Using UpdateRecord should be efficient.
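To make the per-record transformation concrete, here is a minimal Python sketch of what a record-oriented step does: it streams rows one at a time and replaces the two columns with a single timestamp. This is not NiFi configuration. The column names (Date, Time, TimeStamp), the date format, and the helper name combine_date_time are all assumptions for illustration. In UpdateRecord itself, the rough equivalent would be a RecordPath expression along the lines of concat(/Date, ' ', /Time), assuming those field names match your schema.

```python
import csv
import io
from datetime import datetime


def combine_date_time(rows, date_col="Date", time_col="Time",
                      out_col="TimeStamp", fmt="%Y-%m-%d %H:%M:%S"):
    """Yield rows with date_col and time_col merged into out_col.

    Column names and fmt are assumed; adjust them to the real CSV layout.
    Rows are handled one at a time, so memory use stays flat even for
    files with millions of records.
    """
    for row in rows:
        row = dict(row)  # copy so the caller's row is not mutated
        date_part = row.pop(date_col)
        time_part = row.pop(time_col)
        # Parsing validates the values instead of blindly concatenating.
        ts = datetime.strptime(f"{date_part} {time_part}", fmt)
        row[out_col] = ts.isoformat(sep=" ")
        yield row


# Small in-memory sample standing in for one of the large CSV files.
sample = io.StringIO("Date,Time,Value\n2020-01-15,08:30:00,42\n")
result = list(combine_date_time(csv.DictReader(sample)))
```

Because the function is a generator, the same code works unchanged on a real file handle, which is the same reason a record processor scales to large flowfiles: it never needs the whole file in memory.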

As Bryan mentioned, you can use the UpdateRecord processor. You can also distribute the workload, either across cluster nodes with a Remote Process Group (RPG) or by running multiple concurrent tasks, so that the files are processed in parallel for better performance.

That should not be a problem. Make sure you have enough RAM and CPU cores on your machine, SSD drives for the NiFi repositories, and enough nodes in your cluster.

Are these wide records? How big is each file? How many do you want to process at once? Where are you landing them?
