Member since: 01-05-2017
Posts: 153
Kudos Received: 10
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3469 | 02-20-2018 07:40 PM
 | 2421 | 05-04-2017 06:46 PM
03-15-2017
05:03 PM
Not exactly the solution, but it led me to the solution; see my comment above. Thanks for following through on the HDFS filesize corruption issue caused by partially appended files. I appreciate it.
03-15-2017
05:03 PM
The issue was my chosen filesize. When I went back to creating files that were 1.0 MB in size, it would read those files very quickly and then return to reading only 1000-2000 events per second on the old data. The slow files were 13 KB each, so I suspect the overhead of opening, reading, and closing so many small files was slowing down processing. Either way, it's fixed in relation to this problem.
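To put rough numbers on the small-file effect described above: only the 13 KB and 1.0 MB file sizes come from the post; the daily data volume below is a fabricated assumption for illustration.

```python
# Hypothetical illustration: how many HDFS files one day of data becomes
# at different target file sizes. The 13 KB and 1.0 MB figures are from
# the post; the 1 GB/day ingest volume is an assumption.
daily_bytes = 1 * 1024**3          # assume ~1 GB ingested per day

small_file = 13 * 1024             # ~13 KB files (the slow case)
large_file = 1 * 1024**2           # ~1.0 MB files (the fast case)

small_count = daily_bytes // small_file
large_count = daily_bytes // large_file

print(f"13 KB files per day: {small_count}")   # ~80,000 open/read/close cycles
print(f"1 MB files per day:  {large_count}")   # ~1,000 open/read/close cycles
print(f"ratio: {small_count / large_count:.0f}x more files to touch")
```

Each extra file costs a NameNode lookup plus an open/read/close cycle, so a ~79x jump in file count can easily dominate the scan time even though the total bytes are identical.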
03-15-2017
04:49 PM
Thanks for following up on that!
03-15-2017
02:53 PM
I have an unusual issue.

I am using both Flume and NiFi to ingest data into Hadoop and then read those HDFS files with HUNK (Splunk for Hadoop analytics). When I run a query on the data ingested through Flume, it is read quickly (about 100,000+ events per second). **Lately**, when I run a query on the data ingested by NiFi, it is slow, processing about 1000-2000 events per second. I say **lately** because just two days ago, queries on the NiFi-ingested data were as quick as on the Flume data. 😮 Whaa?

Some specifics:

- Flume: folders in HDFS organized as Year/Month/Day/(one single log file for the whole day)
- NiFi: folders in HDFS organized as Year/Month/Day/Hour/(many log files, about 13 KB each; done this way to prevent possible data corruption if ingestion through NiFi stops while data is being appended to a single file, as in Flume)

I'd say it was the difference in file structure, except:

1. The more granular directory structure of the NiFi-ingested files is supposed to produce FASTER searches in HUNK.
2. As I said, just two days ago, queries run on this NiFi-ingested data were just as fast as on the Flume-ingested data.

I can provide more info, but I'm looking for ideas on what might be causing this or how to troubleshoot such a problem.
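A first troubleshooting step for a question like this is to compare file count and average file size per partition between the two layouts (on a real cluster, `hdfs dfs -count` gives the same numbers). Here is a local, self-contained sketch of that check; the directory names and sizes are fabricated stand-ins for the real HDFS paths.

```python
import os
import tempfile

# Build a throwaway directory tree that mimics the two layouts described
# in the post (paths and sizes are fabricated stand-ins for real HDFS dirs).
root = tempfile.mkdtemp()

# Flume-style: Year/Month/Day with one big file for the whole day
flume_day = os.path.join(root, "flume", "2017", "03", "15")
os.makedirs(flume_day)
with open(os.path.join(flume_day, "day.log"), "wb") as f:
    f.write(b"x" * 1_000_000)                      # one ~1 MB file

# NiFi-style: Year/Month/Day/Hour with many ~13 KB files
nifi_hour = os.path.join(root, "nifi", "2017", "03", "15", "14")
os.makedirs(nifi_hour)
for i in range(50):
    with open(os.path.join(nifi_hour, f"part-{i}.log"), "wb") as f:
        f.write(b"x" * 13_000)

def summarize(path):
    """Return (file_count, avg_size_bytes) under path, like `hdfs dfs -count`."""
    sizes = [os.path.getsize(os.path.join(d, name))
             for d, _, names in os.walk(path) for name in names]
    return len(sizes), sum(sizes) // len(sizes)

print("flume:", summarize(os.path.join(root, "flume")))
print("nifi: ", summarize(os.path.join(root, "nifi")))
```

If the NiFi partitions show file counts exploding over time while average size stays tiny, the slowdown is a per-file overhead problem rather than a directory-depth problem.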
Labels:
- Apache Flume
- Apache Hadoop
- Apache NiFi
03-07-2017
07:41 PM
It's interesting, because I am trying your method of having the bin complete only according to a period of time, and it isn't working. I have added an attribute called "hour" which retrieves yyyy-MM-dd-HH and saves it. Then I set the MergeContent processor's Correlation Attribute Name property to group by "hour". I can see the actual attribute when I view the files in the queue, and the hour attribute looks correct... it almost seems like the value in Minimum Group Size is overriding the Correlation Attribute Name. Is there a way to tell MergeContent to use ONLY the Correlation Attribute Name to decide binning and ignore the number of entries and Group Size? Attached are a screenshot of my MergeContent config values and a screenshot of the value of my "hour" attribute.
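For reference, a sketch of one property combination that aims at the behavior described above: the correlation attribute decides *which* bin a flowfile joins, while Max Bin Age (not the minimums) decides *when* a bin flushes. All values below are illustrative assumptions, not taken from the attached screenshots, and the `${now()}` expression assumes the hour is stamped at ingest time.

```
# UpdateAttribute (sets the correlation key on each flowfile)
hour = ${now():format('yyyy-MM-dd-HH')}

# MergeContent (illustrative values, not from the screenshots)
Merge Strategy              = Bin-Packing Algorithm
Correlation Attribute Name  = hour
Minimum Number of Entries   = 1        # so entry count never holds a bin open
Minimum Group Size          = 0 B      # so size never holds a bin open
Maximum Number of Entries   = 100000
Max Bin Age                 = 65 min   # force each hour's bin to flush on time
```

With the minimums effectively disabled, a bin is eligible to merge immediately, so Max Bin Age becomes the practical trigger for closing out each hour's bin.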
03-07-2017
07:09 PM
Got it! Thanks!
03-07-2017
07:06 PM
Crap. I thought the ZooKeeper port and the Kafka broker's port were the same. I'll hunt down what ports my Kafka broker is listening on; not sure where to begin looking. Any suggestions? (I'm sure I'll find it through Google.)
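The broker port is usually defined in Kafka's `server.properties` via the `listeners` (or, on older versions, `port`) setting; the Apache default is 9092, though HDP distributions commonly use 6667. A minimal sketch of pulling the port out of that file, using a fabricated config string in place of the real file (the path and values shown are assumptions; check your own broker config):

```python
import re

# Fabricated stand-in for something like /etc/kafka/conf/server.properties
# (path and values are assumptions, not from the poster's cluster).
server_properties = """\
broker.id=0
listeners=PLAINTEXT://0.0.0.0:6667
zookeeper.connect=localhost:2181
"""

def broker_ports(conf: str):
    """Pull the port number(s) out of the listeners= line."""
    m = re.search(r"^listeners=(.+)$", conf, re.MULTILINE)
    if not m:
        return []
    # Each listener URI ends in :<port>; there may be several, comma-separated.
    return [int(uri.rsplit(":", 1)[1]) for uri in m.group(1).split(",")]

print(broker_ports(server_properties))
```

Alternatively, something like `netstat -tlnp` on the broker host will show which ports the Kafka process is actually bound to.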
03-07-2017
06:51 PM
For some reason, even though my Kafka version is listed as kafka_2.10-0.10.0.2.5.0.0-1245.jar (which I am pretty sure is Kafka 0.10), the only processor that works for me is GetKafka. ConsumeKafka and ConsumeKafka_0_10 both hang and do not process any data... if I stop the NiFi flow, it gives the error seen in the screenshot. I have left it running for up to 10 minutes waiting for it to connect, but it never seems to. I have also included a screenshot of my ConsumeKafka properties window, which uses the same values as the ones I put in GetKafka when it works.
Labels:
- Apache Kafka
- Apache NiFi
03-07-2017
06:26 PM
I accepted this, for future reference for others, because your solution was in a comment down below: using a colon in the filename was the problem.