
One of the key design principles of Apache Metron is that it should be easily extensible. We envision many users treating Metron as a platform and building custom capabilities on top of it, one of which will be adding new telemetry data sources. In this multi-part article series, we will walk you through how to add a new telemetry data source: Squid proxy logs.

This multi-part article series consists of the following:

  1. This Article: Sets up the use case for this multi-part article series
  2. Use Case 1: Collecting and Parsing Telemetry Events - This tutorial walks you through how to collect/ingest events into Metron and then parse them.
  3. Use Case 2: Enriching Telemetry Data - Describes how to enrich elements of telemetry events with Apache Metron.
  4. Use Case 3: Adding/Enriching/Validating with Threat Intel Feeds - Describes how to add new threat intel feeds to the system and how those feeds can be used to cross-reference every telemetry event that comes in. When a hit occurs, an alert will be generated and displayed on the Metron UI.

Setting up the Use Case Scenario

Customer Foo has installed Metron TP1 and they are using the out-of-the-box data sources (PCAP, YAF/Netflow, Snort, and Bro). They love Metron! But now they want to add a new data source to the platform: Squid proxy logs.

Customer Foo's Requirements

The following are the customer's requirements for Metron with respect to this new data source:

  1. The proxy events from Squid logs need to be ingested in real time.
  2. The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
  3. In real time, each Squid proxy event needs to be enriched so that its domain names are annotated with the corresponding IP information.
  4. In real time, the IP within the proxy event must be checked against threat intel feeds.
  5. If there is a threat intel hit, an alert needs to be raised.
  6. The end user must be able to see the new telemetry events and the alerts from the new data source.
  7. All of these requirements will need to be implemented easily without writing any new Java code.

What is Squid?

Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently requested web pages. For more information on Squid, see squid-cache.org.

How Metron Enriches a Squid Telemetry Event

When you make an outbound HTTP connection to https://www.cnn.com from a given host, the following entry is added to a Squid file called access.log.

[Figure: 3808-squid-log.png (sample Squid access.log entry)]
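To make the parsing requirement concrete, here is a minimal Python sketch that turns one line in Squid's native access.log layout into a JSON-ready dictionary. It is only an illustration, not how Metron parses Squid logs. The sample line, the output field names, and the helper `parse_squid_line` are assumptions made for this example, and the sample is not the entry shown in the figure.

```python
import json
import re

# Squid's native access.log layout is:
#   time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type
# The group names below are illustrative, not Metron's field names.
SQUID_LINE = re.compile(
    r"^(?P<timestamp>\d+\.\d+)\s+"
    r"(?P<elapsed>\d+)\s+"
    r"(?P<client_ip>\S+)\s+"
    r"(?P<action>[A-Z_]+)/(?P<code>\d{3})\s+"
    r"(?P<bytes>\d+)\s+"
    r"(?P<method>\S+)\s+"
    r"(?P<url>\S+)\s+"
    r"(?P<user>\S+)\s+"
    r"(?P<hierarchy>[A-Z_]+)/(?P<peer>\S+)\s+"
    r"(?P<content_type>\S+)$"
)


def parse_squid_line(line):
    """Return a dict for one access.log line, or None if it does not match."""
    match = SQUID_LINE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # Convert "seconds.millis" to epoch milliseconds without float rounding.
    seconds, fraction = event["timestamp"].split(".")
    event["timestamp"] = int(seconds) * 1000 + int(fraction.ljust(3, "0")[:3])
    for key in ("elapsed", "code", "bytes"):
        event[key] = int(event[key])
    return event


# Made-up sample line in the native format (an assumption for this sketch).
SAMPLE = ("1461576382.642    161 127.0.0.1 TCP_MISS/200 103701 GET "
          "http://www.cnn.com/ - DIRECT/199.27.79.73 text/html")

print(json.dumps(parse_squid_line(SAMPLE), indent=2))
```

Lines that do not match the expected layout return `None` rather than raising, so a caller can count or log them separately.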

The following shows what Metron does to this telemetry event as it streams through the platform in real time:

[Figure: 3809-squid-event-metron-magic.png (the Squid event after Metron processing)]
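As a rough illustration of requirements 3 through 5 (enrich the event, cross-check it against threat intel, raise an alert on a hit), here is a minimal Python sketch. It is not Metron code. The lookup tables, the field names (`domain`, `domain_ip`, `is_alert`), and the functions `enrich`, `check_threat_intel`, and `process` are all hypothetical, and the addresses come from reserved documentation ranges.

```python
import json
from urllib.parse import urlsplit

# Hypothetical in-memory stand-ins for an enrichment store and a threat intel feed.
DOMAIN_TO_IP = {
    "www.example.com": "198.51.100.10",
    "bad.example.net": "203.0.113.7",
}
THREAT_INTEL_IPS = {"203.0.113.7"}


def enrich(event):
    """Return a copy of the event with the URL's domain and its IP added."""
    enriched = dict(event)
    domain = urlsplit(event["url"]).hostname
    enriched["domain"] = domain
    enriched["domain_ip"] = DOMAIN_TO_IP.get(domain)
    return enriched


def check_threat_intel(event):
    """Return a copy of the event flagged as an alert if its IP is in the feed."""
    checked = dict(event)
    checked["is_alert"] = checked.get("domain_ip") in THREAT_INTEL_IPS
    return checked


def process(event):
    """Run the two steps in order: enrich, then threat intel check."""
    return check_threat_intel(enrich(event))


for url in ("http://www.example.com/", "http://bad.example.net/download"):
    print(json.dumps(process({"url": url})))
```

Each step returns a new dictionary rather than modifying its input, which keeps the stages independent and easy to reorder in this sketch. URLs without a scheme (for example the `host:port` form logged for CONNECT requests) are not handled here.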

Key Points

Some key points to highlight as you go through this multi-part article series:

  • We will be adding a net new data source without writing any code. Metron strives for easy extensibility and this is a good example of it.
  • This is a repeatable pattern for a majority of telemetry data sources.
Read the next article to learn how to collect and push data into Metron and then parse it in the Metron platform: Collecting and Parsing Telemetry Data.
Last update: 08-17-2019 12:34 PM (revision 2 of 2)