
gvetticaden1 · 04-06-2016

Platform Theme Key Features

Fully Automated Scripted Install of Metron on AWS

One of the largest hurdles we have heard about from the community and customers working with the original OpenSoc code base was that it was nearly impossible to get the application up and running. Hence, our engineering team collaborated with the community to provide a scripted, fully automated install of Metron on AWS.

The install requires only the user’s AWS credentials. It then uses a set of Ansible scripts/playbooks, Ambari Blueprints and APIs, and AWS APIs to deploy the full end-to-end Metron application. The table below summarizes the steps that occur during the automated install.

Step	Description	Components Deployed
Step 1	Spin up the EC2 instances where HDP and Metron will be installed and deployed	10 m4.xlarge instances
Step 2	Spin up an AWS VPC	1 AWS VPC
Step 3	Install Ambari Server and Agents via Ansible scripts	Ambari Server 2.1.2.1 on the master node; Ambari Agents on the slave nodes
Step 4	Using Ambari Blueprints and APIs, install a 7-node HDP 2.3 cluster with the following services: HDFS, YARN, ZooKeeper, Storm, HBase, and Kafka. The blueprint used to deploy the HDP cluster can be found here: Metron Small Cluster Ambari BluePrint	7-node HDP cluster; HDP services: HDFS, YARN, ZooKeeper, Storm, HBase, and Kafka
Step 5	Install a 2-node Elasticsearch cluster	2-node ES 1.7 cluster
Step 6	Install and start the following data source probes: Bro, Snort, PCAP probe, and YAF (Netflow). This entails: installing and starting the C++ PCAP probe, which captures PCAP data and pushes it into a Kafka topic; installing and starting the YAF probe to capture Netflow data; installing Bro and the Bro Kafka plugin and starting these services; and installing and starting Snort with the community Snort rules configured	C++ PCAP probe; YAF/Netflow probe; Bro server and Bro Kafka plugin; Snort server
Step 7	Deploy 5 Metron Storm topologies: 4 parser topologies, one for each supported data source (PCAP, Bro, YAF, Snort), and 1 common enrichment topology	5 Storm topologies
Step 8	Configure Kafka topics and HBase tables	Kafka topics and HBase tables
Step 9	Install MySQL to store GeoIP enrichment data. The MySQL DB is populated with GeoIP information from MaxMind GeoLite	MySQL with GeoIP information
Step 10	Install a Metron UI for the SOC Analyst and Investigator personas	Metron UI (Kibana dashboard)
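Step 4 registers a blueprint with Ambari and then creates a cluster from it. The sketch below shows the shape of those two Ambari REST calls (POST /api/v1/blueprints/... and POST /api/v1/clusters/...) using only Python's standard library. The Ambari host, credentials, blueprint name, and component layout are all hypothetical placeholders. The real layout is in the Metron Small Cluster Ambari BluePrint.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders, not the values used by the Metron installer.
AMBARI = "http://ambari-master.example.com:8080"
AUTH = base64.b64encode(b"admin:admin").decode("ascii")

def ambari_request(path, body):
    """Build (but do not send) a POST request against the Ambari REST API."""
    req = urllib.request.Request(
        AMBARI + path, data=json.dumps(body).encode("utf-8"), method="POST"
    )
    # Ambari rejects modifying requests that lack the X-Requested-By header.
    req.add_header("X-Requested-By", "ambari")
    req.add_header("Authorization", "Basic " + AUTH)
    req.add_header("Content-Type", "application/json")
    return req

# An illustrative two-host-group blueprint for an HDP 2.3 stack.
blueprint = {
    "Blueprints": {"blueprint_name": "metron_small", "stack_name": "HDP", "stack_version": "2.3"},
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": n} for n in (
             "NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER",
             "NIMBUS", "HBASE_MASTER", "KAFKA_BROKER")]},
        {"name": "workers", "cardinality": "6",
         "components": [{"name": n} for n in (
             "DATANODE", "NODEMANAGER", "SUPERVISOR", "HBASE_REGIONSERVER")]},
    ],
}

# The cluster creation template maps each host group onto concrete hosts (7 total).
cluster = {
    "blueprint": "metron_small",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "node1.example.com"}]},
        {"name": "workers", "hosts": [{"fqdn": f"node{i}.example.com"} for i in range(2, 8)]},
    ],
}

register = ambari_request("/api/v1/blueprints/metron_small", blueprint)
create = ambari_request("/api/v1/clusters/metron", cluster)
# Sending them in order would be: urllib.request.urlopen(register), then urlopen(create).
```

Keeping the blueprint (what runs where) separate from the cluster template (which hosts) is what lets the same blueprint be reused across environments.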

Deployment Architecture After Install

The installer takes about 60 to 90 minutes to run fully, although this can vary considerably depending on AWS conditions during the run. After the installer finishes, the deployment architecture of the app will look like the following.

Metron Storm Topology Refactor / Re-Architecture

Another area of focus for Metron TP1 was addressing the following challenges with the old OpenSoc topology architecture:

Code was extremely brittle
Storm Topologies were designed without taking advantage of full parallelism
Numerous “redundant” topologies
Management of the app was difficult due to a number of complex topologies
Very complex to add new Data Sources to the platform
Very little unit and integration testing

Key re-architecture and refactoring work done in TP1 to address these challenges included:

Made the Metron code base simpler and easier to maintain by converting all Storm topologies to use Flux configuration (a declarative way to wire topologies together).
Added the ability to add new data source parsers without writing code, using the Grok framework parser.
Enrichment, model execution, and threat intel cross-referencing are now done in parallel rather than sequentially in the Storm topology.
Minimized the incremental cost of adding new topologies by having one common enrichment topology for all data sources.
All application configuration is stored in ZooKeeper, allowing app config to be managed at runtime without stopping the topology.
Improved code quality with new unit and integration test harness utilities.
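Grok-based parsing works because a Grok expression references named patterns as %{PATTERN:field}, and these expand into regular-expression named groups. The minimal Python sketch below illustrates that idea with a small, hypothetical pattern table. It is not Metron's actual Grok parser, and the field names are illustrative.

```python
import re

# A tiny, hypothetical subset of a Grok pattern library.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "INT": r"\d+",
    "WORD": r"\w+",
    "DATA": r".*?",
}

# Matches %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def compile_grok(expression):
    """Expand %{PATTERN:field} references into a regex with named groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        # Named references become capture groups; bare ones are non-capturing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(GROK_REF.sub(expand, expression))

def parse(expression, line):
    """Return the captured fields as a dict, or None if the line does not match."""
    m = compile_grok(expression).fullmatch(line)
    return m.groupdict() if m else None
```

For example, parse("%{IP:src_ip} %{WORD:method} %{INT:status}", "10.0.0.1 GET 200") returns {"src_ip": "10.0.0.1", "method": "GET", "status": "200"}. With this approach, supporting a new log format means writing a new expression string rather than new parser code.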

Old OpenSoc Architecture

In the old OpenSoc architecture, the key limitations were:

For every new data source, a new complex Storm topology had to be added
Each enrichment, threat intel cross-reference, and model execution was done sequentially
No in-memory caching for enrichments or threat intel checks
No loader frameworks to load the Enrichment or Threat Intel Stores

The below diagram illustrates the old architecture.

New Metron Architecture

With the new Metron architecture, the key changes are:

Adding a new data source simply means adding a new normalizing/parser topology
One common enrichment topology can be used for all data sources
Using the splitter/joiner pattern, enrichment, model, and threat intel execution is done in parallel
Loader frameworks have been added to load the Enrichment and Threat Intel Stores
A fast cache has been added for enrichment and threat intel lookups

The below diagram illustrates the new architecture.
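In Metron the splitter and joiner are Storm bolts. The Python sketch below shows the same splitter/joiner pattern inside a single process, using a thread pool in place of Storm's parallel bolts. The enrichment functions, IP addresses, and field names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical enrichers: each returns only the fields it adds to the message.
def geo_enrich(msg):
    return {"src_ip_geo_country": "ZZ"} if msg["src_ip"].startswith("203.0.113.") else {}

def host_enrich(msg):
    return {"dest_ip_host": "web-01"} if msg["dest_ip"] == "10.0.0.5" else {}

def threat_intel(msg):
    return {"is_alert": msg["src_ip"] in {"203.0.113.9"}}

ENRICHERS = [geo_enrich, host_enrich, threat_intel]

def split_join(msg, pool):
    """Split: hand the same message to every enricher concurrently.
    Join: merge each enricher's partial result back into one message."""
    futures = [pool.submit(fn, msg) for fn in ENRICHERS]  # split
    joined = dict(msg)
    for fut in futures:
        joined.update(fut.result())  # join: blocks until each branch is done
    return joined

with ThreadPoolExecutor(max_workers=len(ENRICHERS)) as pool:
    enriched = split_join({"src_ip": "203.0.113.9", "dest_ip": "10.0.0.5"}, pool)
```

Each branch returns only its own fields, so the join is a simple merge. Total latency is bounded by the slowest enrichment rather than the sum of all of them.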

Telemetry Data Source Theme Key Features

PCAP - Packet Capture

PCAP represents the most granular data collected in Metron, consisting of individual packets and frames. Metron uses DPDK (Data Plane Development Kit), which provides a set of libraries and drivers for fast packet collection and processing.

See the following for more details: Metron Packet Capture Probe Design
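To show what packet-level data looks like when stored, the sketch below reads the classic libpcap file format. In that format, a 24-byte global header is followed by a 16-byte header and the captured bytes for each packet. This is a generic reader for illustration, not part of the DPDK-based Metron probe, which pushes packets into Kafka. It handles only little-endian files with microsecond timestamps.

```python
import struct

# Classic libpcap file layout (little-endian, microsecond timestamps).
GLOBAL_HDR = struct.Struct("<IHHiIII")  # magic, major, minor, thiszone, sigfigs, snaplen, linktype
RECORD_HDR = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len

def read_pcap(data):
    """Yield (timestamp, captured_bytes, original_length) for each packet record."""
    magic = GLOBAL_HDR.unpack_from(data, 0)[0]
    if magic != 0xA1B2C3D4:
        raise ValueError("only little-endian, microsecond pcap files are handled here")
    offset = GLOBAL_HDR.size
    while offset + RECORD_HDR.size <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = RECORD_HDR.unpack_from(data, offset)
        offset += RECORD_HDR.size
        # incl_len bytes were captured; orig_len is the packet's length on the wire.
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len], orig_len
        offset += incl_len
```

For example, a global header followed by one record header and a 60-byte frame yields exactly one (timestamp, frame, 60) tuple.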

YAF/Netflow

Netflow data is PCAP data rolled up to the flow/session level: a summary of the sequence of packets exchanged between two machines, up to the layer 4 protocol. If ingesting PCAP is impractical because of storage constraints or the load it places on infrastructure, Netflow is recommended. Metron uses YAF (Yet Another Flowmeter) to generate IPFIX (Netflow) data from Metron's PCAP probe. Hence, the output of the YAF probe is IPFIX rather than raw packets.

See the following for more details: Metron YAF Capture Design
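The roll-up from packets to flows can be sketched as grouping packets by their 5-tuple (source and destination address, source and destination port, protocol) and keeping a few counters per flow. This simplified, unidirectional sketch only illustrates the concept. YAF itself does considerably more, such as applying flow timeouts and exporting IPFIX records.

```python
from collections import defaultdict
from typing import NamedTuple

class Packet(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int   # bytes on the wire
    ts: float   # capture time in seconds

def flows(packets):
    """Roll packets up into unidirectional flows keyed by the 5-tuple."""
    table = defaultdict(lambda: {"packets": 0, "bytes": 0, "start": None, "end": None})
    for p in packets:
        f = table[(p.src, p.dst, p.sport, p.dport, p.proto)]
        f["packets"] += 1
        f["bytes"] += p.size
        f["start"] = p.ts if f["start"] is None else min(f["start"], p.ts)
        f["end"] = p.ts if f["end"] is None else max(f["end"], p.ts)
    return dict(table)
```

Storing a handful of counters per flow instead of every packet is why flow data is so much cheaper to ingest than full PCAP.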

Bro

Bro is an IDS (Intrusion Detection System), but Metron uses Bro primarily as a Deep Packet Inspection (DPI) metadata generator. The metadata consists of network activity details up to layer 7, the application-level protocols (DNS, HTTP, FTP, SSH, SSL). Extracting DPI metadata (layer 7 visibility) is expensive and is therefore performed only on selected protocols; the recommendation is to turn on DPI for the HTTP and DNS protocols. So while the PCAP probe records every single packet it sees on the wire, DPI metadata is extracted only for a subset of these packets. This metadata is among the most valuable network data for analytics.

See the following for more details: Metron Bro Capture Design
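The sketch below gives a feel for the layer 7 metadata Bro produces by parsing Bro's default tab-separated ASCII log format (for example http.log). In that format a #fields line names the columns and "-" marks an unset value. The sample record is made up and shows only a few columns. In Metron, the Bro Kafka plugin publishes records to Kafka rather than to log files, so this only illustrates the shape of the data.

```python
def parse_bro_log(lines):
    """Turn Bro's tab-separated ASCII log lines into one dict per record.

    Assumes the default tab separator; "-" marks an unset field and is dropped.
    """
    fields, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # column names follow the tag
        elif not line or line.startswith("#"):
            continue  # other metadata lines: #separator, #types, #close, ...
        else:
            values = line.split("\t")
            records.append({k: v for k, v in zip(fields, values) if v != "-"})
    return records

# A made-up http.log excerpt.
SAMPLE = [
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tmethod\thost\turi\tuser_agent",
    "1460000000.123456\tCx1abc\t10.0.0.5\t49152\t203.0.113.7\t80\tGET\texample.com\t/index.html\t-",
]
records = parse_bro_log(SAMPLE)
```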

Snort

Snort is a popular Network Intrusion Prevention System (NIPS). Snort monitors network traffic and generates alerts based on signatures from community rules. Metron feeds the output of the packet capture probe to Snort, and whenever Snort alerts are triggered, Metron uses Apache Flume to pipe these alerts to a Kafka topic.

See the following for more details: Metron Snort Capture Design
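The sketch below illustrates what a parser for these alerts has to extract. It parses Snort's one-line alert_fast text format: timestamp, [gid:sid:rev], message, optional classification, priority, protocol, and the source and destination endpoints. It assumes alerts arrive in alert_fast form, which may not match the format Metron's Flume pipeline actually forwards. The sample alerts are made up.

```python
import re

ALERT_FAST = re.compile(
    r"(?P<timestamp>\S+)\s+\[\*\*\]\s+"
    r"\[(?P<gid>\d+):(?P<sid>\d+):(?P<rev>\d+)\]\s+(?P<msg>.*?)\s+\[\*\*\]\s+"
    r"(?:\[Classification:\s*(?P<classification>[^\]]*)\]\s+)?"  # optional
    r"\[Priority:\s*(?P<priority>\d+)\]\s+"
    r"\{(?P<proto>\w+)\}\s+"
    r"(?P<src_ip>[\d.]+)(?::(?P<src_port>\d+))?\s+->\s+"  # ports absent for ICMP
    r"(?P<dest_ip>[\d.]+)(?::(?P<dest_port>\d+))?"
)

def parse_alert(line):
    """Return the alert's fields as a dict (missing parts are None), or None."""
    m = ALERT_FAST.match(line.strip())
    return m.groupdict() if m else None
```

For example, a TCP alert yields src_port and dest_port strings, while an ICMP alert leaves both as None.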

Why are these Network Telemetry Sources Important?

A common question is why we focused first on this initial set of network telemetry data sources. Keep in mind that the end vision of Apache Metron is to be an analytics platform. These 4 network telemetry data sources are among the key inputs required by the next-generation ML, MLP, and statistical models that we plan to build in future releases. The table below describes some of these models and their data input requirements.

Analytics Pack	Analytics Pack Description	Telemetry Data Sources Required
Domain Pack	A collection of machine learning models that identify anomalies in incoming and outgoing connections made to specific domains that appear to be malicious	Bro
UEBA Pack	A collection of machine learning models that monitor assets and users known to be legitimate to identify anomalies from their normal behavior	Bro; user enrichment; asset enrichment; user auth logs; asset inventory logs
Relevancy/Correlation Engine Pack	A collection of machine learning models that identify related alerts within the massive volumes of alerts processed by cyber security solutions	Snort; Suricata; third-party alerts
Protocol Anomaly Pack	A collection of machine learning models that identify whether there is anything unusual about network traffic monitored via deep packet inspection (PCAP)	PCAP; YAF/Netflow; Bro

The system is configurable so that one can enable only the data sources of interest.

In future Metron tech previews, we will be adding support for these types of security data sources:

FireEye
Palo Alto Networks
Active Directory
BlueCoat
SourceFire
Bit9 CarbonBlack
Lancope
Cisco ISE

Real-time Data Processing Theme Key Features

Enrichment Services

The below diagram illustrates the Enrichment framework that was built in Metron TP1. The key components of the framework are:

Enrichment Loader Framework - A framework that bulk loads or polls data from an enrichment source. The framework supports plugging in any enrichment source.
Enrichment Store - The store where all enrichment data is kept. HBase will be the primary store. The store will also provide services to de-duplicate and age data.
Enrichment Bolt - A Storm bolt that enriches Metron telemetry events
Enrichment Cache - A cache used by the bolt so that lookups to the enrichment store are cached
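The Enrichment Cache idea can be sketched as a size-bounded, time-limited LRU cache wrapped around the store lookup, so repeated lookups for hot keys such as busy IP addresses skip the store entirely. The sketch below is a Python illustration with hypothetical sizes and TTL. It is not Metron's cache implementation.

```python
import time
from collections import OrderedDict

class EnrichmentCache:
    """LRU cache with a time-to-live, placed in front of a slow store lookup."""

    def __init__(self, lookup, max_size=10_000, ttl_seconds=600.0, clock=time.monotonic):
        self._lookup = lookup          # hypothetical: e.g. a function that queries HBase
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, expiry time)
        self.misses = 0

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            self._entries.move_to_end(key)  # mark as most recently used
            return hit[0]
        self.misses += 1
        value = self._lookup(key)           # miss or expired: go to the store
        self._entries[key] = (value, now + self._ttl)
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

The TTL bounds how stale a cached enrichment can get after the store is refreshed. The size bound keeps the bolt's memory use predictable.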

The specific enrichments supported in Metron TP1 are listed below.

Enrichment	Description	Enrichment Source, Store, Loader Type, Refresh Rate	Metron Message Fields Enriched
GeoIP	Tags GeoIP information (lat-lon coordinates plus city/state/country) onto any external IP address. This can be applied to both alerts and metadata telemetry so that they can be mapped to a geo location.	Enrichment source: MaxMind GeoLite; Metron store: MySQL (HBase will be used in the next TP); Loader type: bulk load from HDFS; Refresh rate: every 3 months	src_ip, dest_ip
Host	Enriches IPs with host details	Enrichment source: enterprise inventory/asset store; Metron store: HDFS; Loader type: bulk load from HDFS	dest_ip

More details can be found here: Metron Enrichment Services
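GeoIP data such as MaxMind GeoLite maps IP address ranges to locations, so a lookup amounts to finding the range that contains an address. The sketch below does this with a binary search over sorted ranges and skips internal (RFC 1918) addresses, since only external IPs are tagged. The ranges, locations, and the *_geo output field names are made up. In TP1 the data lives in MySQL rather than in an in-memory list.

```python
import bisect
import ipaddress

INTERNAL = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def _row(start, end, location):
    return int(ipaddress.ip_address(start)), int(ipaddress.ip_address(end)), location

# Made-up rows standing in for a GeoLite-style range table.
RANGES = sorted([
    _row("198.51.100.0", "198.51.100.255", {"city": "Example City A", "country": "ZZ", "lat": 10.0, "lon": 20.0}),
    _row("203.0.113.0", "203.0.113.255", {"city": "Example City B", "country": "ZZ", "lat": 30.0, "lon": 40.0}),
], key=lambda r: r[0])
STARTS = [r[0] for r in RANGES]

def geo_lookup(ip):
    """Return the location of an external IP, or None if internal or unknown."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in INTERNAL):
        return None
    n = int(addr)
    i = bisect.bisect_right(STARTS, n) - 1  # last range starting at or before n
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2]
    return None

def enrich(message):
    """Add hypothetical <field>_geo entries for src_ip and dest_ip."""
    for field in ("src_ip", "dest_ip"):
        geo = geo_lookup(message[field]) if field in message else None
        if geo:
            message[field + "_geo"] = geo
    return message
```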

Threat Intel Services

The Threat Intel framework is very similar to the Enrichment framework. See below architecture diagram.

The specific threat intel services supported in TP1 are listed below.

Threat Feed	Feed Description	Feed Format	Refresh Rate
Soltra	Threat intel aggregator	STIX/TAXII	Poll every 5 minutes
Hail a TAXII	Repository of open source cyber threat intelligence feeds in STIX format	STIX/TAXII	Poll every 5 minutes

More details can be found here: Metron Threat Intel Services
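The threat intel feeds are polled on a fixed interval, and telemetry is then cross-referenced against the resulting indicators. The sketch below shows that check for IP indicators only. The fetch function stands in for a STIX/TAXII client. The store class and the is_alert and threat_intel_hits field names are hypothetical, not Metron's API.

```python
import time

class ThreatIntelStore:
    """Holds indicators of compromise and refreshes them on a fixed poll interval."""

    def __init__(self, fetch, poll_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # hypothetical: returns an iterable of bad IPs
        self._poll = poll_seconds    # 300 s matches the 5-minute polling in the table
        self._clock = clock
        self._indicators = frozenset()
        self._next_poll = 0.0

    def indicators(self):
        now = self._clock()
        if now >= self._next_poll:   # time to re-poll the feed
            self._indicators = frozenset(self._fetch())
            self._next_poll = now + self._poll
        return self._indicators

def check(message, store):
    """Tag the message with which IP fields matched a known-bad indicator."""
    bad = store.indicators()
    hits = [f for f in ("src_ip", "dest_ip") if message.get(f) in bad]
    message["is_alert"] = bool(hits)
    message["threat_intel_hits"] = hits
    return message
```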

hakansel05 · 04-14-2016

Hello @George Vetticaden

Again, regarding the Metron source code: there is no Hive, but the new Metron architecture diagram shown above depicts an HDFS Bolt writing to Hive in the enrichment Storm topology. Currently, however, isn't HBase the only big data store used in Metron?

Also, Metron currently has just one Index Bolt (actually writer bolts) writing to HDFS and/or ES/Solr in the enrichment topology, and only PCAP data is written to HDFS and HBase, without indexing. So there is no Alert Bolt or Kafka Bolt?

Are these planned as new features?

Thanks in advance.

Cloudera Community