Created on 05-02-2016 05:22 PM - edited 08-17-2019 12:33 PM
In the previous article of the series, Adding a New Telemetry Data Source to Apache Metron, we walked through how to add a new data source, Squid, to Apache Metron. The inevitable next question is how to enrich the telemetry events in real time as they flow through the platform. Enrichment is critical when identifying threats, or, as we like to call it, "finding the needle in the haystack". The customer's requirements are the following:
1. The proxy events from Squid logs need to be ingested in real time.
2. The proxy logs have to be parsed into a standardized JSON structure that Metron can understand.
3. In real time, the Squid proxy event needs to be enriched so that the domain names are enriched with the IP information.
4. In real time, the IP within the proxy event must be checked against threat intel feeds.
5. If there is a threat intel hit, an alert needs to be raised.
6. The end user must be able to see the new telemetry events and the alerts from the new data source.
7. All of these requirements need to be implemented easily, without writing any new Java code.
In this article, we will walk you through how to implement requirement 3.
Metron Enrichment Framework Explained
Step 1: Enrichment Source
Whois data is expensive, so we will not be providing it. Instead, we wrote a basic whois scraper (out of scope for this exercise) that produces whois data in CSV format as follows:
Cut and paste this data into a file called "whois_ref.csv" on your virtual machine. This CSV file represents our enrichment source.
The schema of this enrichment source is domain|owner|registeredCountry|registeredTimestamp. Make sure the CSV file does not end with an empty line, as that will result in a null pointer exception.
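Since a trailing empty line is enough to break the load, it can help to sanity-check whois_ref.csv before loading it. The sketch below is a minimal Python check, assuming a comma-separated file whose columns follow the four-field schema above and whose values may be wrapped in double quotes. The function name parse_whois_csv is hypothetical and not part of Metron.

```python
import csv
import io

# Column order from the schema given in the article:
# domain|owner|registeredCountry|registeredTimestamp
WHOIS_COLUMNS = ["domain", "owner", "registeredCountry", "registeredTimestamp"]


def parse_whois_csv(text, columns=WHOIS_COLUMNS):
    """Parse whois CSV text into one dict per row.

    Raises ValueError on an empty line or on a row whose field count
    does not match the expected columns.
    """
    records = []
    # skipinitialspace lets a quoted value follow ", " (comma plus space)
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if not row:
            # csv.reader yields [] for a blank line, such as an extra
            # empty line at the end of the file
            raise ValueError(f"line {reader.line_num}: empty line")
        if len(row) != len(columns):
            raise ValueError(
                f"line {reader.line_num}: expected {len(columns)} fields, got {len(row)}"
            )
        records.append(dict(zip(columns, row)))
    return records
```

For example, a hypothetical row such as `example.com, "Example Org", "US", 874306800000` parses into one dict keyed by the four schema names, while the same row followed by an extra blank line raises ValueError. To check the real file, read it with `open("whois_ref.csv", newline="")` and pass the contents in; if your file carries more columns than the schema lists, pass your own column list.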
Next, we need to configure an extractor config file that describes the enrichment source.
Please cut and paste this file into a file called "extractor_config_temp.json" on the virtual machine. Because copying and pasting from this blog will include some invisible non-ASCII characters, strip them out by running:
Next, cut and paste this file into a file called "enrichment_config_temp.json" on the virtual machine. Again, copying and pasting from this blog will include some invisible non-ASCII characters; to strip them out, run:
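If you would rather do the non-ASCII cleanup in Python, the sketch below reads a pasted config, drops every character outside 7-bit ASCII (non-breaking and zero-width spaces are typical culprits), and checks that what remains still parses as JSON before writing it out. The function names and the cleaned output file names (for example extractor_config.json) are assumptions for illustration.

```python
import json


def strip_non_ascii(text):
    """Remove every character outside the 7-bit ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def clean_config_file(src_path, dst_path):
    """Strip non-ASCII characters from a pasted JSON config and write the result.

    json.loads raises json.JSONDecodeError if the cleaned text is not valid
    JSON, so a damaged paste is caught before anything is written.
    """
    with open(src_path, encoding="utf-8") as src:
        cleaned = strip_non_ascii(src.read())
    json.loads(cleaned)
    with open(dst_path, "w", encoding="ascii") as dst:
        dst.write(cleaned)
    return cleaned
```

For example, `clean_config_file("extractor_config_temp.json", "extractor_config.json")`, and the same call for enrichment_config_temp.json. Silently dropping characters is only safe assuming the intended configs are plain ASCII; a non-ASCII character that genuinely belongs inside a JSON string value would be lost too.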
Now that we have the enrichment source and enrichment config defined, we can run the loader to move the data from the enrichment source into the Metron enrichment store and to store the enrichment config in ZooKeeper.
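Conceptually, the loader turns each CSV row into one enrichment-store entry keyed by the enrichment type and an indicator value (here the domain), with the remaining columns stored as the enrichment values. The sketch below models that with a plain Python dict standing in for the HBase table. The function name, the dict layout, and the choice of "domain" as the indicator are illustrative assumptions, not Metron's actual row-key format or HBase schema.

```python
import csv
import io


def load_enrichment_store(csv_text, columns, indicator_column, enrichment_type):
    """Model of the loader: map (enrichment_type, indicator) -> remaining columns.

    A dict stands in for the HBase enrichment table.
    """
    store = {}
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # this model skips blank lines instead of failing
        record = dict(zip(columns, row))
        indicator = record.pop(indicator_column)
        # A later row with the same indicator replaces the earlier entry.
        store[(enrichment_type, indicator)] = record
    return store
```

Because entries are keyed by indicator in this model, loading two rows for the same domain leaves a single entry. If Metron's table follows the same one-row-per-indicator pattern, the row count you check in HBase should match the number of distinct domains in whois_ref.csv, assuming the table was empty beforehand.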
After running the loader, your enrichment data will be loaded into HBase and a ZooKeeper mapping will be established. The data will be populated into an HBase table called enrichment. To verify that the data was properly ingested into HBase, run the following command:
You should see the table bulk loaded with data from the CSV file. Now check whether the enrichment config was properly populated in ZooKeeper:
To demonstrate Metron's enrichment capabilities, you need to drop all existing Squid indexes, since their data was ingested before enrichment was enabled. To do so, go back to the head plugin and delete the indexes like so:
Make sure you delete all Squid indexes. Then re-ingest the data (see the previous blog post); the messages should now be enriched automatically.
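The head plugin is a browser front end for Elasticsearch's REST API, so the index cleanup can also be scripted as an HTTP DELETE against the index names. The sketch below uses only the Python standard library. The host node1:9200 and the index pattern squid_index* are hypothetical; replace them with your Elasticsearch address and the Squid index names the head plugin shows. Depending on the Elasticsearch version and the action.destructive_requires_name setting, wildcard deletes may be rejected, in which case list each index name explicitly.

```python
import urllib.request

# Hypothetical values: point these at your own Elasticsearch node and Squid indexes.
ES_URL = "http://node1:9200"
SQUID_INDEX_PATTERN = "squid_index*"


def build_delete_index_request(es_url=ES_URL, index_pattern=SQUID_INDEX_PATTERN):
    """Build, without sending, a delete-index request for the given index pattern."""
    return urllib.request.Request(f"{es_url}/{index_pattern}", method="DELETE")


def delete_indexes(es_url=ES_URL, index_pattern=SQUID_INDEX_PATTERN):
    """Send the delete-index request and return Elasticsearch's response body."""
    request = build_delete_index_request(es_url, index_pattern)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Inspect the URL from build_delete_index_request() first, and only call delete_indexes() once it matches the indexes you intend to drop; deleting an index cannot be undone.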
In the Metron UI, refresh the dashboard and view the data in the Squid panel:
Notice the enrichments here (whois.owner, whois.domain_created_timestamp, whois.registrar, whois.home_country).
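The whois.* fields above are enrichment values attached to each Squid message. As a conceptual model only, the sketch below looks up a message field in a (type, indicator) store like the one the loader populates, and copies any hit into the message under a `<type>.<column>` name. The message field name domain, the store layout, and this exact naming are assumptions for illustration; Metron's own enrichment topology may use a different field and a longer prefix.

```python
def enrich_message(message, store, field, enrichment_type):
    """Return a copy of message with enrichment values added on a lookup hit.

    store maps (enrichment_type, indicator) -> {column: value}; each stored
    column is added to the copy as '<enrichment_type>.<column>'.
    """
    enriched = dict(message)  # leave the caller's message unchanged
    values = store.get((enrichment_type, message.get(field)))
    if values:
        for column, value in values.items():
            enriched[f"{enrichment_type}.{column}"] = value
    return enriched


# Hypothetical store entry and Squid message, for illustration only.
store = {("whois", "example.com"): {"owner": "Example Org", "home_country": "US"}}
message = {"domain": "example.com", "url": "http://example.com/index.html"}
print(enrich_message(message, store, "domain", "whois"))
```

A message whose domain has no entry in the store passes through unchanged, which matches the expectation that only domains present in whois_ref.csv gain whois.* fields.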