Member since: 01-05-2017
Posts: 153
Kudos Received: 10
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 3424 | 02-20-2018 07:40 PM |
| | 2358 | 05-04-2017 06:46 PM |
01-26-2017
03:55 PM
Hello. We did an install of Hortonworks Ambari using the official guide with default settings and, surprisingly, it does not list NiFi under Add Service. We have that option in the Hortonworks VM with Ambari, but not with our actual local install. How would we go about adding NiFi to the services that can be installed via Ambari in our local cluster? Thanks
Labels:
- Apache Ambari
- Apache NiFi
01-24-2017
07:45 PM
This may be a simple question, but I have searched for information on it and cannot find any. I am exploring various data ingestion tools that can be managed through Ambari (configured, started, stopped, restarted), and I know Flume works this way. I was hoping Kafka Connect could be managed like this, but I've seen evidence that it can't. Now I am looking at Spark Streaming and hoping there's a way to start, stop, and restart a Spark Streaming job, somewhat like you do with Flume by creating custom interceptors as .jar files and referencing them in the config. Any insight would be greatly appreciated.
Labels:
- Apache Ambari
- Apache Spark
01-17-2017
04:28 PM
The issue is resolved. It has to do with our cluster setup. Once I turned off the Flume agents on all but the one machine we are using in the config, we experienced no data loss and only one .tmp file.
01-16-2017
07:36 PM
Wow, more unusual behavior now... stopping one of the slaves on our cluster caused one of the .tmp files to resolve itself, and then immediately another .tmp file appeared. I'm going to try stopping all the Flume agents on m1, m2, and the 5 slaves and only starting the one on m1.
01-16-2017
07:33 PM
Thanks, but actually, while syslog is our original source, it's not the source for the HDFS sink. We have syslog source -> Kafka sink, then Kafka source -> HDFS sink.

The data loss isn't occurring, but as I specified in my other question, I'm still getting two .tmp files. I'm trying to figure out whether I could have more than one Flume agent running, but I only see one. I thought that, since we have two masters in our cluster, I would stop one of them and let the other run, to see whether a Flume agent on master1 was creating one .tmp file and a Flume agent on master2 was creating the other, and I'm getting unusual results.

For instance, it seemed to indicate this was the issue: when I stopped m1, only one .tmp resolved itself, and when I stopped m2, the other resolved itself. Oddly enough, though, when I started m1 again, two .tmp files appeared. And when I stopped it, only one resolved itself! Then I started m2 again and a new .tmp file appeared! I'm completely baffled. I don't see how m2 could be writing into HDFS, since the configuration file never mentions the IP address of m2, only m1... I am starting to think there is some aspect of the clustering causing this that I don't understand.
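For reference, here is a minimal sketch of the kind of single-agent Kafka-to-HDFS Flume configuration being described. The agent, channel, and host names are placeholders rather than the actual config from this thread, and the exact Kafka source properties vary by Flume version. The point it illustrates: if the same file is deployed and started on more than one host, each host runs an independent agent that opens its own .tmp file under the same HDFS path, which would produce exactly this kind of back-and-forth behavior.

```
# Single Flume agent: Kafka source -> memory channel -> HDFS sink
# (placeholder names; adjust to the real agent name and ZooKeeper/broker hosts)
agent1.sources  = kafka-src
agent1.channels = ch1
agent1.sinks    = hdfs-sink

# Kafka source (Flume 1.6-style property names)
agent1.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.kafka-src.zookeeperConnect = m1.example.com:2181
agent1.sources.kafka-src.topic = firewall
agent1.sources.kafka-src.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Every host running this agent opens and rolls its own .tmp file under this path
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = ch1
agent1.sinks.hdfs-sink.hdfs.path = /topics/firewall/%m-%d-%Y
agent1.sinks.hdfs-sink.hdfs.filePrefix = firewall1
agent1.sinks.hdfs-sink.hdfs.fileSuffix = .log
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true  # needed for the date escapes unless events carry a timestamp header
```

Under that assumption, stopping every agent except the one on m1 should leave exactly one open .tmp file, which is what the later replies report.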
01-10-2017
08:44 PM
I do notice that it sometimes crashes when I restart Flume, but the data loss occurs at times when I am not restarting it, so I doubt that is the culprit. You've given me a lot of different avenues to explore as possible causes. We are running a cluster with two masters and 5 slaves. When I do the rolling restart, it restarts all 7, so Flume is running on all 7 machines. But in the config file, only the Kafka we are getting data from is listed (see config file above). It's true that we have data being sent from syslog to Kafka on both master1 and master2, but master1's Kafka is the only one a Flume agent is retrieving data from. So there is only one Flume agent active, I believe.
01-10-2017
08:32 PM
For clarity on how the two files are divided, I'll use the names file1 and file2 to illustrate: file1 begins with an event at 20:10:53 and continues without skipping events until 20:23:20; file2 begins with an event at 20:23:20 and continues until 20:26:53. If it follows the same pattern as in the past, file2 will stop at some point, say 20:30:00, and then file1 will start having events appended to it where file2 left off, and it goes back and forth like that.
01-10-2017
08:27 PM
Now it seems another .tmp file has appeared after I did a refresh. As a side note, I do a rolling restart on the Flume agents in our cluster each time, but I think the Flume agent grabbing this data runs on only one server. Also, these double .tmp files didn't exist a week ago (it was only putting one .tmp file in each folder, as we wanted, though sadly with the data loss...).
01-10-2017
08:25 PM
/topics/firewall/01-10-2017/firewall1.1484078093784.log.tmp
/topics/firewall/01-10-2017/firewall1.1484076220477.log.tmp

I only have one agent running. As a side note, I removed the hdfs.batchSize, channel1.byteCapacity, and channel1.byteCapacityBufferPercentage parameters and started it up again. It then produced only one .tmp file:

/topics/firewall/01-10-2017/firewall1.1484079147746.log.tmp

This would lead me to believe those parameters were the culprit, but not necessarily. As a side note, the reason I included those parameters is that I was experiencing data loss, as per the other question I posted here: https://community.hortonworks.com/questions/76473/data-loss-missing-using-flume-with-kafka-source-an.html. I expect that, now that I've returned to the configuration that experienced data loss, I will see the missing events again (yet now it only writes to one file again). Does Flume split the output into multiple files for some reason because of the batchSize or byteCapacity parameters?
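On that closing question: in the stock Flume HDFS sink it is the roll settings, not hdfs.batchSize or the channel's byteCapacity, that decide when a new output file (and therefore a new .tmp) is opened; batchSize only controls how many events are flushed per write, and byteCapacity only bounds the memory channel. A hedged sketch of those knobs follows, with illustrative values rather than this thread's actual settings (the agent and sink names are placeholders; channel1 matches the name used above).

```
# HDFS sink roll settings decide when a new output file (a new .tmp) is started
agent1.sinks.hdfs-sink.hdfs.rollInterval = 300   # seconds; 0 disables time-based rolling
agent1.sinks.hdfs-sink.hdfs.rollSize = 0         # bytes; 0 disables size-based rolling
agent1.sinks.hdfs-sink.hdfs.rollCount = 0        # events; 0 disables count-based rolling
agent1.sinks.hdfs-sink.hdfs.idleTimeout = 60     # close an idle file after N seconds

# batchSize controls how many events are flushed to HDFS per write, not how files split
agent1.sinks.hdfs-sink.hdfs.batchSize = 1000

# byteCapacity bounds the memory channel's heap usage; a too-small channel makes
# source puts fail (ChannelException) and, with some sources, can mean lost events,
# but it does not split HDFS output into multiple files
agent1.channels.channel1.byteCapacity = 800000
agent1.channels.channel1.byteCapacityBufferPercentage = 20
```

The back-and-forth writing between two files described earlier looks more like two agents writing to the same path than like an effect of any of these settings.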