NiFi MergeRecord CRON scheduling

New Contributor

Hi guys,

I'm working on a 3-machine NiFi cluster and trying to get MergeRecord to merge all the items in its queue every hour.

I set CRON scheduling to 0 0 * * * ? and tried all sorts of combinations of values for bin size and age, but nothing worked: it either gets just a few items at a time or none at all.

If I set it to timer driven, all the bin settings work exactly as I expect them to.

I think there is some sort of connection between having it run once an hour and the max bin age. It looks like some sort of deadlock condition, but I don't understand exactly what's going on.

Thanks a lot!

3 REPLIES

Re: NiFi MergeRecord CRON scheduling

Super Guru

See my example here: https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte...

1. Try with just one machine, the primary node.

2. Please post your template, logs and more details. How big are the files? What types? How many are coming in, what is in the data provenance, and what is in the queue? A screenshot would help.

With that cron setting it will only run once an hour, on the hour (NiFi cron uses Quartz syntax, so the first field is seconds), and each trigger is a single run that picks up only a limited batch from the queue. So you only get a small batch added per hour, and at that pace the bin will take far too long to fill.

After the merge, just put an UpdateAttribute with a once-an-hour cron.

In NiFi's Quartz syntax, once an hour is: 0 0 * * * ?
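
To make the field order concrete, here is a minimal Python sketch that labels the fields of a Quartz-style expression and counts how often it fires per day. It assumes the 6-field Quartz layout (seconds first) that NiFi's CRON-driven scheduling is documented to use, handles only `*` or a single number in the time fields, and ignores the day and month fields. The function names are hypothetical helpers, not part of NiFi.

```python
# Quartz-style field order (assumed from NiFi's CRON-driven scheduling docs):
# seconds come first, unlike 5-field Unix cron.
QUARTZ_FIELDS = ("seconds", "minutes", "hours",
                 "day_of_month", "month", "day_of_week")
FIELD_SIZES = {"seconds": 60, "minutes": 60, "hours": 24}


def label_fields(expression):
    """Map each field of a 6-field Quartz expression to its name."""
    parts = expression.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {expression!r}")
    return dict(zip(QUARTZ_FIELDS, parts))


def fires_per_day(expression):
    """Count triggers per day, assuming the day/month fields match every day.

    Only '*' or a single number is supported in the time fields.
    """
    fields = label_fields(expression)
    total = 1
    for name, size in FIELD_SIZES.items():
        value = fields[name]
        if value == "*":
            total *= size          # every second / minute / hour
        elif not value.isdigit():
            raise ValueError(f"unsupported value {value!r} in {name}")
        # a single number contributes exactly one slot, so total is unchanged
    return total


if __name__ == "__main__":
    print(label_fields("0 0 * * * ?"))
    print(fires_per_day("0 0 * * * ?"))  # top of every hour
    print(fires_per_day("0 0 0 * * ?"))  # midnight only
```

A 5-field Unix-style expression such as `0 * * * *` is rejected by this sketch, which mirrors the fact that Quartz expects the leading seconds field.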

Also take a screenshot of the merge properties.

Make it timer driven and set max bin age to 1 hour.

Re: NiFi MergeRecord CRON scheduling

Super Guru

Get rid of cron scheduling and just set max bin age to 1 hour. That should be what you are looking for. I like to set the merge for size and have the time as a backup in case you have a busy hour. It's nice to have evenly sized files.
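
The "size first, time as a backup" idea can be sketched as a simple close rule. This is a simplified model in Python, not NiFi's implementation: `Bin`, `should_close`, and the threshold names are hypothetical, and the real MergeRecord has additional properties (maximum records, minimum bin size in bytes) that are left out here.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical stand-in for a merge bin; not NiFi's actual class."""
    created_at: float       # seconds; when the first record was added
    record_count: int = 0


def should_close(bin_, now, min_records, max_age_seconds):
    """Return why the bin should close, or None if it should stay open.

    The size threshold is checked first; the age limit is the fallback.
    """
    if bin_.record_count >= min_records:
        return "size"
    if now - bin_.created_at >= max_age_seconds:
        return "age"
    return None


if __name__ == "__main__":
    full = Bin(created_at=0.0, record_count=1_000_000)
    stale = Bin(created_at=0.0, record_count=42)
    print(should_close(full, now=600.0, min_records=1_000_000, max_age_seconds=3600))
    print(should_close(stale, now=3600.0, min_records=1_000_000, max_age_seconds=3600))
```

With a rule like this, a bin that reaches the record threshold produces a full-size file, and the age limit only flushes whatever has accumulated when the threshold is not reached in time.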

Re: NiFi MergeRecord CRON scheduling

New Contributor

Thanks for your replies!

I have small JSON objects coming into the MergeRecord processor and I'm merging them into a CSV file.

The upper limit is 1,000,000 per hour; processing is very fast.

As a workaround I have it set up with min bin size of 1,000,000 and max bin age of 1h, but this means I have to start it exactly on the hour, and it drifts as time goes by. Because the processing time is not fixed (different loads for different hours), even if I set the bin age to something like 59 min it will still drift up or down.
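
The drift described above can be illustrated with a small Python sketch. It assumes a simplified model in which each bin closes exactly one max bin age after it opens and the next bin opens after a variable delay; the delay values are made up for illustration and the function names are hypothetical. For contrast, `next_top_of_hour` computes a wall-clock boundary, which does not accumulate error.

```python
from datetime import datetime, timedelta


def age_based_closes(start, max_bin_age, delays):
    """Close times when each bin lives for max_bin_age and the next bin
    opens only after a variable delay (hypothetical values)."""
    closes = []
    opened = start
    for delay in delays:
        closed = opened + max_bin_age
        closes.append(closed)
        opened = closed + delay   # next bin starts late by `delay`
    return closes


def drift_seconds(closes, start, period):
    """Seconds each close lags behind its ideal wall-clock boundary."""
    return [(closed - (start + period * (i + 1))).total_seconds()
            for i, closed in enumerate(closes)]


def next_top_of_hour(now):
    """Wall-clock boundary strictly after `now`; independent of past runs."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, 0)
    hour = timedelta(hours=1)
    delays = [timedelta(seconds=s) for s in (5, 12, 3)]  # made-up delays
    closes = age_based_closes(start, hour, delays)
    print(drift_seconds(closes, start, hour))
    print(next_top_of_hour(closes[-1]))
```

In this model the lag only ever grows, because every delay is added to all later bins. Shortening the bin age (for example to 59 minutes) shifts the error but does not anchor the bins to the clock.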
