Very slow event processing in HUNK using NiFi

Expert Contributor

I have an unusual issue. I am using both Flume and NiFi to process data in Hadoop and then read those HDFS files with HUNK (Splunk Analytics for Hadoop). When I run a query on the data ingested through Flume, it is read quickly (about 100,000+ events per second). **Lately**, when I run a query on the data ingested by NiFi, it is slow, processing about 1,000-2,000 events per second. I say **lately** because just two days ago, queries on the NiFi-ingested data were as quick as on the Flume data. 😮 Whaa?

Some specifics:

Flume: HDFS folders organized by Year/Month/Day/ (one single log file for the whole day)

NiFi: HDFS folders organized by Year/Month/Day/Hour/ (many log files, about 13 KB each; done this way to prevent possible data corruption if ingestion through NiFi stops while data is being appended to a single file, as it is in Flume). Rough sketches of both write paths are at the end of this post.

I'd say it was the difference in file structure, except: (1) the more granular directory structure of the NiFi-ingested files is supposed to produce FASTER searches in HUNK, and (2) as I said, just two days ago, queries run on this NiFi-ingested data were just as fast as on the data ingested by Flume.

I can provide more info, but I'm looking for ideas on what might be causing this or how to troubleshoot such a problem.
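
For reference, the two write paths in the specifics above are set up roughly like this. The property names are the standard Flume HDFS sink and NiFi PutHDFS ones, but the paths and exact values here are only illustrative, not my real configs:

```
# Flume HDFS sink (illustrative): one directory per day, rolling disabled so the
# whole day's data lands in a single log file
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = /data/logs/%Y/%m/%d
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 0
agent.sinks.hdfs-sink.hdfs.rollCount = 0

# NiFi PutHDFS (illustrative): one directory per hour, each FlowFile written as
# its own small (~13 KB) file
Directory = /data/logs/${now():format('yyyy/MM/dd/HH')}
```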


4 REPLIES

Master Guru (Accepted Solution)

I don't really know anything about HUNK, so I can't speak to why it would be slower. Maybe it hit some threshold where there are finally enough small files under the NiFi data that it is now slowing down?

Just wanted to mention that after your previous post I asked a couple of people about the possibility of partially appended data during an error scenario, and the consensus seemed to be that HDFS wouldn't let this happen. So you might be fine just appending to one file per hour or day from NiFi.
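
One quick way to check whether the NiFi side has simply crossed a small-files threshold is to compare file counts and total sizes under the two ingest roots. The paths below are placeholders for wherever Flume and NiFi are writing:

```
# -count prints: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
hdfs dfs -count /data/flume/2016
hdfs dfs -count /data/nifi/2016

# Total size per root; dividing by the file count gives the average file size
hdfs dfs -du -s -h /data/flume/2016
hdfs dfs -du -s -h /data/nifi/2016
```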

Expert Contributor

Thanks for following up on that!

Expert Contributor

Not exactly the solution, but it led me to the solution. See my comment below.

Thanks for following through on the question of HDFS corruption due to partially appended files. I appreciate it.

Expert Contributor

The issue was my chosen file size.

When I went back to creating files that were 1.0 MB in size, HUNK would read those files very quickly and then drop back to reading 1,000-2,000 events per second on the older files, which were about 13 KB each. So I think maybe the overhead of opening, reading, and closing each of those small files is what slows the processing down.

Either way, it's fixed as far as this problem is concerned.
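
For anyone who hits the same thing: at roughly 13 KB per file, every 1 MB of data is spread across about 80 files, so each MB read means about 80 open/read/close cycles instead of one. A common way to get larger files out of NiFi is to put a MergeContent processor in front of PutHDFS so small FlowFiles are bundled up before they are written; the settings below are only a sketch, not the exact flow used here:

```
# NiFi MergeContent -> PutHDFS (illustrative settings)
MergeContent
  Merge Strategy     = Bin-Packing Algorithm
  Merge Format       = Binary Concatenation
  Minimum Group Size = 1 MB     # hold a bin until ~1 MB has accumulated
  Max Bin Age        = 5 min    # but flush anyway if traffic is slow
PutHDFS
  Directory          = /data/logs/${now():format('yyyy/MM/dd/HH')}
```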