
NiFi : How can we make sure 100% data collection

Contributor

How can we ensure 100% data collection in NiFi?

1 ACCEPTED SOLUTION


Re: NiFi : How can we make sure 100% data collection

Master Guru
@mel mendoza

NiFi supports many different protocols that can be used for data ingestion. Many of those are fault-tolerant, but some, like UDP, are not. For the fault-tolerant protocols, NiFi is built to ensure that data is ingested at least once. NiFi does this in three phases:

1. Receives data over a fault-tolerant protocol.

2. Commits the session to NiFi.

3. Depending on the processor, either deletes the data from the source or acknowledges success to it, or saves state about the data that was ingested.

With this model comes the small possibility that a NiFi fault or server failure between phases 2 and 3 prevents phase 3 from happening. In that case the source still considers the data undelivered, so upon recovery that particular data may be ingested a second time. Data is not lost, but it can be duplicated.
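The three phases, and the duplicate that a failure between phases 2 and 3 can produce, can be illustrated with a toy simulation. This is only a sketch of the at-least-once idea and is not NiFi code: `Source`, `ingest`, and `crash_before_ack` are hypothetical names invented for the example, and a plain Python list stands in for NiFi's repositories.

```python
class Source:
    """Toy fault-tolerant source: an item stays pending until it is acknowledged."""

    def __init__(self, items):
        self.pending = list(items)

    def peek(self):
        # Return the next unacknowledged item without removing it.
        return self.pending[0] if self.pending else None

    def ack(self, item):
        # Phase 3 from the receiver's side: the source may now forget the item.
        self.pending.remove(item)


def ingest(source, repository, crash_before_ack=None):
    """Run the three phases for every pending item.

    crash_before_ack (hypothetical knob) simulates a failure that happens
    after the commit (phase 2) but before the acknowledgement (phase 3).
    """
    while True:
        item = source.peek()            # phase 1: receive the data
        if item is None:
            return
        repository.append(item)         # phase 2: commit the session
        if item == crash_before_ack:
            raise RuntimeError("simulated failure between phases 2 and 3")
        source.ack(item)                # phase 3: acknowledge to the source


source = Source(["a", "b", "c"])
repository = []

try:
    ingest(source, repository, crash_before_ack="b")
except RuntimeError:
    pass  # the "server" went down before acknowledging "b"

ingest(source, repository)  # recovery: the source redelivers "b"
print(repository)           # ['a', 'b', 'b', 'c']
```

Nothing is lost, but "b" is ingested twice. If duplicates matter downstream, the flow has to detect or tolerate them itself.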

Thanks,

Matt


Re: NiFi : How can we make sure 100% data collection

Master Guru

@mel mendoza

By default, NiFi logs processor-level events to nifi-app.log. The default overall log level for nifi-app.log in the latest releases is set to WARN, which means that only WARN and ERROR level events are written to the log. The events that report successful data delivery are INFO level events, so you would need to adjust the NiFi logging to get the output you are looking for. Just setting the default logging level to INFO for nifi-app.log may make the log far too noisy. NiFi's logging is configured in the logback.xml file. You will see in logback.xml that NiFi has three default appenders, for nifi-app.log, nifi-bootstrap.log, and nifi-user.log.

While you cannot configure logging down to a specific processor instance, you can configure logging for a specific processor class. So it is possible to create a new appender (which writes to a new log file, if desired) and then create additional loggers for the specific processor classes you want INFO level enabled for.
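As a rough sketch of what that could look like, the script below adds one extra appender and one INFO-level logger per processor class to a logback configuration. Several things here are assumptions rather than anything taken from a real installation: the appender name `PUT_FILE`, the file name `nifi-put.log`, the log pattern, and the tiny `BASE` document that stands in for the real logback.xml. The two logger names are the class names of the standard PutFile and PutHDFS processors. Check everything against your own logback.xml before using it.

```python
import xml.etree.ElementTree as ET

# Hypothetical appender: the name "PUT_FILE" and the file "nifi-put.log" are
# made up for this example. The log directory property is the one NiFi's own
# appenders normally use; verify it in your logback.xml.
APPENDER_XML = """
<appender name="PUT_FILE" class="ch.qos.logback.core.FileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-put.log</file>
    <encoder>
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
"""

# Processor classes that should log at INFO into the new file.
PROCESSOR_CLASSES = [
    "org.apache.nifi.processors.standard.PutFile",
    "org.apache.nifi.processors.hadoop.PutHDFS",
]


def add_processor_loggers(config_xml: str) -> str:
    """Return config_xml with the appender and per-class loggers appended."""
    root = ET.fromstring(config_xml)
    # Define the appender first so the loggers below can reference it.
    root.append(ET.fromstring(APPENDER_XML))
    for class_name in PROCESSOR_CLASSES:
        logger = ET.SubElement(
            root,
            "logger",
            # additivity="false" keeps these events out of the parent
            # loggers' appenders, so they go only to the new file.
            {"name": class_name, "level": "INFO", "additivity": "false"},
        )
        ET.SubElement(logger, "appender-ref", {"ref": "PUT_FILE"})
    return ET.tostring(root, encoding="unicode")


# Minimal stand-in for the real logback.xml, just to make the sketch runnable.
BASE = '<configuration><root level="WARN"/></configuration>'
print(add_processor_loggers(BASE))
```

Note that ElementTree drops XML comments when it parses a file, so for a real logback.xml it is probably simpler to paste the generated `<appender>` and `<logger>` elements in by hand.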

If you feel I have addressed your original question, please accept my answer.

Matt

Re: NiFi : How can we make sure 100% data collection

Contributor

@Matt

Does NiFi have a log file for every successful Put (PutFile, PutHDFS, etc.)? If yes, is it located in $NIFI_HOME/logs, or do I need to add a processor to record successful collection? Basically, we just want a log file of successful and failed puts or collections of data, which we can then process for visualization.
