
Can HDF be used to feed Logstash?

Contributor

I am thinking about setting up a Logstash infrastructure to monitor my system. (It happens to be a Hortonworks HDP Hadoop cluster, but assume it isn't.) I have various things that generate logs, and I want to transfer those logs out of my system to a new one, such as an ElasticSearch cluster fed by Logstash. And I want to do this securely.

I don't really want Flume for this as there are better tools.

Now I might use Logstash forwarders, which most recently seem to have been superseded by a new system called "Beats", in particular Filebeat. However, I would prefer to use Apache NiFi because of its security reputation. I would like to use HDF since I am a Hortonworks Partner and we are already using HDP.

Can anyone say:

"Yes this makes sense", "Yes, I have done it", "You need to read URL blah blah blah"?

Or have I got the wrong end of the stick?

PS I know that Ambari Metrics moves operational logs from the Hadoop cluster into the HDFS system - this is separate from that.

1 ACCEPTED SOLUTION

Alex – This makes sense to me. If you’re tailing files or listening to syslog, NiFi has easy-to-use processors that can forward this information to a downstream search tool (Solr, ES, etc.) or even persist it in long-term storage (HDFS). You can encrypt and compress the data as you capture it, send it over a secure wire, and do the simple event processing you need in order to route the information to the appropriate endpoint. There are also processors such as ScanContent or RouteOnContent that allow you to route messages based on patterns (think regexes and whitelists) found in the message payload (e.g. route errors here, info there) or to assign priorities to those messages.
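
To make the routing idea concrete (this is an illustrative Python sketch, not NiFi code — in NiFi you would configure RouteOnContent in the UI; the route names and patterns below are hypothetical):

```python
import re

# Hypothetical routing rules, analogous to RouteOnContent: each named
# route is a regex matched against the message payload.
ROUTES = {
    "errors": re.compile(r"\bERROR\b"),
    "warnings": re.compile(r"\bWARN(?:ING)?\b"),
}

def route(message: str) -> str:
    """Return the first route whose pattern matches, else 'unmatched'."""
    for name, pattern in ROUTES.items():
        if pattern.search(message):
            return name
    return "unmatched"
```

Messages landing in "unmatched" correspond to RouteOnContent's unmatched relationship, which you could send to a default index or drop.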

The other place where NiFi helps tremendously is data conversion, with processors such as ConvertAvroToJSON, ConvertCSVToAvro, or AttributesToJSON. These help you get messages into the proper file streams to be indexed by your search tool.
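
As a rough sketch of what a conversion step like AttributesToJSON produces (plain Python standing in for the processor; the attribute names are invented for illustration):

```python
import json

def attributes_to_json(attributes: dict) -> str:
    # Roughly what AttributesToJSON does: serialize selected flowfile
    # attributes into a JSON document that a search tool can index.
    return json.dumps(attributes, sort_keys=True)
```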

The one place I would look at closely is the amount of log parsing you need to do. For unique formats, you may need to create a custom processor in NiFi to help extract log attributes. There are processors such as EvaluateXPath or EvaluateXQuery that let you use XPath or XQuery to pull attribute-value pairs out of XML (EvaluateJsonPath does the same for JSON), which is extremely helpful and may be all you need. Otherwise, it’s really easy to get started and play around with your use case to see if there’s a fit.
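
The attribute-extraction step above can be sketched in Python for a hypothetical `key=value` log format (in NiFi you would reach for ExtractText or a custom processor instead; the format and field names here are assumptions for illustration):

```python
import re

# Matches key=value pairs where the value is either quoted or a bare token.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_log_line(line: str) -> dict:
    """Extract attribute-value pairs from a key=value style log line."""
    return {k: v.strip('"') for k, v in KV_PATTERN.findall(line)}
```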


4 REPLIES

Guru

The advantage of using HDF here is that you can do any preprocessing/filtering on your logs before you put them into ElasticSearch. This is a common use case: logs are preprocessed before being loaded into a system like Splunk or Logstash.
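
To illustrate the kind of preprocessing meant here (a minimal sketch, not HDF itself — in a NiFi flow this filtering would be done by processors; the filtering rule is an assumed example):

```python
def preprocess(lines):
    # Drop empty lines and noisy DEBUG entries, and normalize whitespace,
    # before shipping records downstream to a search backend.
    for line in lines:
        line = line.strip()
        if line and "DEBUG" not in line:
            yield line
```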

Contributor

Thanks. That is helpful.


Contributor

Thanks people. That is very helpful. It sounds like I have some learning to do 🙂