Platform Theme Key Features

Fully Automated Scripted Install of Metron on AWS

One of the largest hurdles we have heard about from the community and customers working with the original OpenSoc code base was that it was nearly impossible to get the application up and running. Hence, our engineering team collaborated with the community to provide a scripted automated install of Metron on AWS.

The install requires only the user’s AWS credentials. It uses a set of Ansible scripts/playbooks, Ambari Blueprints and APIs, and AWS APIs to deploy the full end-to-end Metron application. The table below summarizes the steps that occur during the automated install.

Step, Description, and Components Deployed

Step 1: Spin up the EC2 instances where HDP and Metron will be installed and deployed.
Components deployed:
  • 10 m4.xlarge instances

Step 2: Spin up an AWS VPC.
Components deployed:
  • 1 AWS VPC

Step 3: Install Ambari Server and Agents via Ansible scripts.
Components deployed:
  • Ambari Server on master node
  • Ambari Agents on slave nodes

Step 4: Using Ambari Blueprints and APIs, install a 7 node HDP 2.3 cluster with the following services: HDFS, YARN, Zookeeper, Storm, HBase, and Kafka. The blueprint used to deploy the HDP cluster can be found here: Metron Small Cluster Ambari BluePrint
Components deployed:
  • 7 Node HDP Cluster
  • HDP Services: HDFS, YARN, Zookeeper, Storm, HBase & Kafka

Step 5: Install a 2 node Elasticsearch cluster.
Components deployed:
  • 2 Node ES 1.7 Cluster

Step 6: Install and start the following data source probes: Bro, Snort, PCAP probe, YAF (netflow). This entails the following:
  • Install and start the C++ PCAP probe that captures PCAP data and pushes it into a Kafka topic
  • Install and start the YAF probe to capture netflow data
  • Install Bro and the Kafka Bro plugin, and start these services
  • Install and start Snort with community Snort rules configured
Components deployed:
  • C++ PCAP Probe
  • YAF/Netflow Probe
  • Bro Server and Bro Kafka Plugin
  • Snort Server

Step 7: Deploy 5 Metron Storm topologies:
  • 4 parser topologies, one for each data source supported (PCAP, Bro, YAF, Snort)
  • 1 common enrichment topology
Components deployed:
  • Installation and deployment of 5 Storm topologies

Step 8: Configure Kafka topics and HBase tables.

Step 9: Install MySQL to store GeoIP enrichment data. The MySQL DB will be populated with GeoIP information from Maxmind Geolite.
Components deployed:
  • Install of MySQL with GeoIP information

Step 10: Install a Metron UI for the SOC Analyst and Investigator personas.
Components deployed:
  • Metron UI (Kibana Dashboard)
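
Step 4 relies on an Ambari Blueprint, a JSON document that maps service components to host groups. The sketch below builds a small blueprint-shaped document to show the general structure. The blueprint name, the host group names, and the placement of components on a 1 master + 6 worker layout are assumptions made for illustration; this is not the Metron Small Cluster blueprint referenced above.

```python
import json

def build_blueprint(name, stack_version, host_groups):
    """Return a dict shaped like an Ambari Blueprint document."""
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(count),
                "components": [{"name": c} for c in components],
            }
            for group, (count, components) in host_groups.items()
        ],
    }

# Hypothetical layout (an assumption): 1 master + 6 workers = 7 nodes.
HOST_GROUPS = {
    "master": (1, ["NAMENODE", "RESOURCEMANAGER", "HBASE_MASTER",
                   "NIMBUS", "ZOOKEEPER_SERVER"]),
    "worker": (6, ["DATANODE", "NODEMANAGER", "HBASE_REGIONSERVER",
                   "SUPERVISOR", "KAFKA_BROKER"]),
}

blueprint = build_blueprint("metron-small-example", "2.3", HOST_GROUPS)
print(json.dumps(blueprint, indent=2))
```

In Ambari, a document like this is registered over the REST API and then referenced by a cluster creation request that maps real hosts to the host groups; those calls are omitted here.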

Deployment Architecture After Install

The installer takes about 60-90 minutes to execute fully, although this can vary considerably depending on how responsive AWS is during the run. After the installer finishes, the deployment architecture of the app will look like the following.

Metron Storm Topology Refactor / Re-Architecture

Another area of focus for Metron TP1 was addressing the following challenges with the old OpenSoc topology architecture:

  • Code was extremely brittle
  • Storm Topologies were designed without taking advantage of full parallelism
  • Numerous “redundant” topologies
  • Management of the app was difficult due to a number of complex topologies
  • Very complex to add new Data Sources to the platform
  • Very little unit and integration Testing

The key re-architecture and refactoring work done in TP1 to address these challenges was the following:

  • Made the Metron code base simpler and easier to maintain by converting all Storm topologies to use Flux configuration (a declarative way to wire topologies together).
  • Ability to add new data source parsers without writing code, using the Grok Framework parser.
  • Enrichment, model and threat intel cross reference are now done in parallel, as opposed to sequentially, in the Storm configuration
  • Minimized the incremental costs of adding new topologies by having one common enrichment topology for all data sources
  • All App configuration is stored in Zookeeper allowing one to manage app config at runtime without stopping the topology
  • Improved code with new unit and integration test harness utilities
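
To illustrate the idea of adding a parser through configuration rather than code, here is a minimal sketch of Grok-style pattern expansion. It is not Metron's Grok parser: the pattern library, the `compile_grok` and `parse` helpers, and the sample firewall line format are all hypothetical, chosen only to show how `%{PATTERN:field}` references can be turned into named regex groups.

```python
import re

# Tiny illustrative pattern library (an assumption, not Metron's pattern set).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "INT": r"\d+",
    "WORD": r"\w+",
}

GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")

def compile_grok(expression):
    """Expand %{PATTERN:field} references into named regex groups."""
    def expand(match):
        pattern_name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[pattern_name]})"
    return re.compile(GROK_REF.sub(expand, expression))

def parse(line, grok):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = grok.match(line)
    return m.groupdict() if m else None

# A hypothetical data source described purely by a pattern string.
FIREWALL = compile_grok(
    r"%{WORD:action} %{IP:src_ip} -> %{IP:dest_ip} port %{INT:dest_port}"
)
print(parse("DENY 10.0.0.5 -> 8.8.8.8 port 53", FIREWALL))
```

Supporting another log format in this sketch means adding another pattern string, which is the property the Grok-based approach is after.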

Old OpenSoc Architecture

In the Old OpenSoc Architecture, some key limitations were the following:

  • For every new data source, a new complex storm topology had to be added
  • Each enrichment, threat intel reference and model execution was done sequentially
  • No in-memory caching for enrichments or threat intel checks
  • No Loader frameworks to load Enrichment or Threat Intel Stores

The below diagram illustrates the old architecture.

New Metron Architecture

With the new Metron Architecture, the key changes are:

  • Adding a new data source means simply adding a new normalizing/parser topology
  • 1 common enrichment topology can be used for all data sources
  • Using the Splitter/Joiner pattern, enrichments/models/threat intel execution is done in parallel
  • Loader frameworks have been added to load the Enrichment and Threat Intel Stores
  • Fast Cache has been added for enrichment and threat intel look ups
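
The Splitter/Joiner pattern mentioned in the list above can be sketched roughly as follows. This is a plain Python illustration using a thread pool, not Storm code; the three enricher functions, their output field names, and the sample values are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical enrichers: each returns the extra fields it contributes.
def geo_enrich(message):
    return {"geo.country": "XX"} if message.get("dest_ip") else {}

def host_enrich(message):
    return {"host.known": message.get("src_ip") == "10.0.0.5"}

def threat_intel(message):
    return {"threat.hit": message.get("dest_ip") in {"203.0.113.9"}}

ENRICHERS = [geo_enrich, host_enrich, threat_intel]

def split_and_join(message, enrichers=ENRICHERS):
    """Fan the message out to every enricher in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(enrichers)) as pool:
        # Split: each branch works on its own copy of the message.
        futures = [pool.submit(fn, dict(message)) for fn in enrichers]
        # Join: wait for every branch and fold its fields into one message.
        joined = dict(message)
        for future in futures:
            joined.update(future.result())
    return joined

print(split_and_join({"src_ip": "10.0.0.5", "dest_ip": "203.0.113.9"}))
```

Because the branches are independent, the time spent per message is closer to that of the slowest enrichment than to the sum of all of them, which is the gain over the sequential design.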

The below diagram illustrates the new architecture.

Telemetry Data Source Theme Key Features

PCAP - Packet Capture

PCAP represents the most granular data collected in Metron, consisting of individual packets and frames. Metron uses DPDK, which provides a set of libraries and drivers for fast packet collection and processing.

See the following for more details: Metron Packet Capture Probe Design


Netflow data represents PCAP data rolled up to the flow/session level: a summary of the sequence of packets between two machines up to the layer 4 protocol. If one doesn’t want to ingest PCAP because of space constraints and the load exerted on infrastructure, then netflow is recommended. Metron uses YAF (Yet Another Flowmeter) to generate IPFIX (Netflow) data from Metron’s PCAP probe. Hence the output of the YAF probe is IPFIX instead of raw packets.
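
As a rough illustration of what rolling packets up to the flow level means, the sketch below groups packet records by their 5-tuple and keeps only counters and timestamps. It is not how YAF works internally; the packet record layout and the `rollup_flows` helper are assumptions, and concerns a real flow meter handles, such as flow timeouts and direction, are ignored.

```python
from collections import defaultdict

def rollup_flows(packets):
    """Summarize packet records into flows keyed by the 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "start": None, "end": None})
    for p in packets:
        key = (p["src_ip"], p["src_port"], p["dst_ip"], p["dst_port"], p["proto"])
        flow = flows[key]
        flow["packets"] += 1
        flow["bytes"] += p["length"]
        flow["start"] = p["ts"] if flow["start"] is None else min(flow["start"], p["ts"])
        flow["end"] = p["ts"] if flow["end"] is None else max(flow["end"], p["ts"])
    return dict(flows)

# Hypothetical packet records: two packets of one flow, one packet of another.
PACKETS = [
    {"ts": 1.0, "src_ip": "10.0.0.5", "src_port": 40000,
     "dst_ip": "8.8.8.8", "dst_port": 53, "proto": "udp", "length": 80},
    {"ts": 1.2, "src_ip": "10.0.0.5", "src_port": 40000,
     "dst_ip": "8.8.8.8", "dst_port": 53, "proto": "udp", "length": 120},
    {"ts": 2.0, "src_ip": "10.0.0.6", "src_port": 51000,
     "dst_ip": "8.8.4.4", "dst_port": 443, "proto": "tcp", "length": 1500},
]

for key, summary in rollup_flows(PACKETS).items():
    print(key, summary)
```

The flow summary drops the payloads entirely, which is why netflow is much cheaper to store than raw PCAP.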

See the following for more details: Metron YAF Capture Design


Bro is an IDS (Intrusion Detection System), but Metron uses Bro primarily as a Deep Packet Inspection (DPI) metadata generator. The metadata consists of network activity details up to layer 7, the application-level protocol layer (DNS, HTTP, FTP, SSH, SSL). Extracting DPI metadata (layer 7 visibility) is expensive and is therefore performed only on selected protocols; the recommendation is to turn on DPI for the HTTP and DNS protocols. So, while the PCAP probe records every single packet it sees on the wire, the DPI metadata is extracted only for a subset of these packets. This metadata is among the most valuable network data for analytics.

See the following for more details: Metron Bro Capture Design


Snort is a popular Network Intrusion Prevention System (NIPS). Snort monitors network traffic and produces alerts based on signatures from community rules. Metron plays the output of the packet capture probe to Snort, and whenever Snort alerts are triggered, Metron uses Apache Flume to pipe these alerts to a Kafka topic.

See the following for more details: Metron Snort Capture Design

Why are these Network Telemetry Sources Important?

A common question is why we focused first on this initial set of network telemetry data sources. Keep in mind that the end vision of Apache Metron is to be an analytics platform. These 4 network telemetry data sources are among the key inputs required for the next-generation ML, MLP and statistical models that we are planning to build in future releases. The table below describes some of these models and their data input requirements.

Analytics Pack, Analytics Pack Description, and Telemetry Data Source

Domain Pack: A collection of Machine Learning models that identify anomalies for incoming and outgoing connections made to a specific domain that appear to be malicious.
Telemetry data sources:
  • Bro

UEBA Pack: A collection of Machine Learning models that monitor assets and users known to be legitimate to identify anomalies from their normal behavior.
Telemetry data sources:
  • Bro
  • User Enrichment
  • Asset Enrichment
  • User Auth Logs
  • Asset Inventory Logs

Relevancy/Correlation Engine Pack: A collection of Machine Learning models that identify alerts that are related within the massive volumes of alerts being processed by the cyber solutions.
Telemetry data sources:
  • Snort
  • Suricata
  • Third Party Alerts

Protocol Anomaly Pack: A collection of Machine Learning models that identify whether there is anything unusual about network traffic monitored via deep packet inspection (PCAP).
Telemetry data sources:
  • PCAP
  • YAF/Netflow
  • Bro

The system is configurable so that one can enable only the data sources of interest.

In future Metron tech previews, we will be adding support for these types of security data sources:

  • FireEye
  • Palo Alto Networks
  • Active Directory
  • BlueCoat
  • SourceFire
  • Bit9 CarbonBlack
  • Lancope
  • Cisco ISE

Real-time Data Processing Theme Key Features

Enrichment Services

The below diagram illustrates the Enrichment framework that was built in Metron TP1. The key components of the framework are:

  • Enrichment Loader Framework - A framework that bulk loads or polls data from an enrichment source. The framework supports plugging in any enrichment source
  • Enrichment Store - The Store where all enrichment data is stored. HBase will be the primary store. The store will also provide services to de-dup and age data.
  • Enrichment Bolt - A Storm Bolt that enriches Metron telemetry events
  • Enrichment Cache - Cache used by the bolt so that lookups to the enrichment store are cached
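
The role of the Enrichment Cache can be sketched as a small least-recently-used cache sitting in front of the store. This is an illustration only, not Metron's implementation: the `EnrichmentCache` class, its `store_hits` counter, and the dict standing in for the enrichment store are all assumptions.

```python
from collections import OrderedDict

class EnrichmentCache:
    """Small LRU cache placed in front of a slower enrichment store."""

    def __init__(self, store_lookup, max_entries=1000):
        self._lookup = store_lookup        # callable: key -> enrichment dict
        self._max = max_entries
        self._entries = OrderedDict()
        self.store_hits = 0                # how often the store had to be queried

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            return self._entries[key]
        self.store_hits += 1
        value = self._lookup(key)
        self._entries[key] = value
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

# A dict stands in for the enrichment store (hypothetical data).
STORE = {"10.0.0.5": {"hostname": "web-01"}, "10.0.0.6": {"hostname": "db-01"}}
cache = EnrichmentCache(lambda ip: STORE.get(ip, {}), max_entries=1)

print(cache.get("10.0.0.5"), cache.store_hits)
print(cache.get("10.0.0.5"), cache.store_hits)
```

Repeated lookups for the same key are served from memory, so the bolt only pays the cost of a store round trip on a miss.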

The specific enrichments supported in Metron TP1 are listed below.


Enrichment, Description, Enrichment Source/Store/Loader Type/Refresh Rate, and Metron Message Field Names that will be Enriched

GeoIP: Tags GeoIP data (lat-lon coordinates plus city/state/country) onto any external IP address. This can be applied to alerts as well as metadata telemetries in order to map them to a geo location.
  • Enrich Source: Maxmind Geolite
  • Metron Store: MySQL (Will Use HBase in next TP)
  • Loader Type: Bulk load from HDFS
  • Refresh Rate: Every 3 months
Metron message fields enriched: src_ip, dest_ip

Host: Enriches IP with host details.
  • Enrich Source: Enterprise Inventory/Asset Store
  • Metron Store: HDFS
  • Loader Type: Bulk load from HDFS
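
A minimal sketch of the GeoIP enrichment described above: only external addresses in the `src_ip` and `dest_ip` fields are tagged. The in-memory `GEO_TABLE`, its placeholder location values, and the output field naming (`<field>.geo.<key>`) are assumptions for illustration; in TP1 the data would come from the MySQL store populated from Maxmind Geolite.

```python
import ipaddress

# Hypothetical stand-in for the GeoIP store, with placeholder values.
GEO_TABLE = {
    ipaddress.ip_network("8.8.8.0/24"): {
        "country": "XX", "city": "Example City", "latitude": 0.0, "longitude": 0.0,
    },
}

def geo_lookup(ip):
    """Return the location record for an IP, or None if it is not in the table."""
    addr = ipaddress.ip_address(ip)
    for network, location in GEO_TABLE.items():
        if addr in network:
            return location
    return None

def enrich_geo(message, fields=("src_ip", "dest_ip")):
    """Return a copy of the message with geo fields added for external IPs."""
    enriched = dict(message)
    for field in fields:
        ip = message.get(field)
        if ip is None or ipaddress.ip_address(ip).is_private:
            continue  # internal addresses are not geo-tagged
        location = geo_lookup(ip)
        if location:
            for key, value in location.items():
                enriched[f"{field}.geo.{key}"] = value
    return enriched

print(enrich_geo({"src_ip": "10.0.0.5", "dest_ip": "8.8.8.8"}))
```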

More details can be found here: Metron Enrichment Services

Threat Intel Services

The Threat Intel framework is very similar to the Enrichment framework. See the architecture diagram below.

The specific threat intel services supported in TP1 are listed below.

Threat Feed, Feed Description, Feed Format, and Refresh Rate

Soltra
  • Feed description: Threat Intel Aggregator
  • Feed format: Stix/Taxii
  • Refresh rate: Poll every 5 minutes

Hail a TAXII
  • Feed description: Repository of Open Source Cyber Threat Intelligence feeds in STIX format.
  • Refresh rate: Poll every 5 minutes
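
The 5 minute polling behaviour can be sketched as an indicator set that refreshes itself when the interval has elapsed. This is not Metron's loader: the `ThreatIntelStore` class is hypothetical, the feed is a plain Python list, and no STIX/TAXII client is shown. The clock value is passed in explicitly so the behaviour is easy to follow.

```python
class ThreatIntelStore:
    """Indicator set refreshed from a feed at a fixed polling interval."""

    def __init__(self, fetch_indicators, poll_seconds=300):
        self._fetch = fetch_indicators     # callable returning an iterable of indicators
        self._poll_seconds = poll_seconds  # 300 seconds = poll every 5 minutes
        self._indicators = set()
        self._last_poll = None

    def refresh_if_due(self, now):
        """Poll the feed when the interval has elapsed; return True if it was polled."""
        if self._last_poll is None or now - self._last_poll >= self._poll_seconds:
            self._indicators = set(self._fetch())
            self._last_poll = now
            return True
        return False

    def is_known_threat(self, value):
        return value in self._indicators

# Hypothetical feed contents standing in for a remote threat feed.
feed = ["203.0.113.9"]
store = ThreatIntelStore(lambda: list(feed), poll_seconds=300)
store.refresh_if_due(now=0)
print(store.is_known_threat("203.0.113.9"))
```

Indicators added to the feed between polls only become visible after the next refresh, which is the trade-off a fixed polling interval implies.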

More details can be found here: Metron Threat Intel Services


Hello @George Vetticaden

Looking at the Metron source, there is no Hive, but the new Metron architecture picture shown above has an HDFS Bolt writing to Hive in the enrichment Storm topology. Is HBase currently the only big data store used in Metron?

Also, Metron currently seems to have just one Index Bolt (actually writer bolts) writing to HDFS and/or ES/Solr in the enrichment topology, and only PCAP is written to HDFS and HBase without indexing. So is there no Alert Bolt or Kafka Bolt?

Are they planned new features?

Thanks in advance.

Last update: 04-06-2016 01:33 AM