Created on 04-06-2016 01:33 AM
Platform Theme Key Features
Fully Automated Scripted Install of Metron on AWS
One of the largest hurdles we have heard about from the community and from customers working with the original OpenSOC code base was that it was nearly impossible to get the application up and running. Hence, our engineering team collaborated with the community to provide a scripted, automated install of Metron on AWS.
The install requires only the user's AWS credentials; it uses a set of Ansible scripts/playbooks together with Ambari Blueprints, the Ambari APIs, and the AWS APIs to deploy the full end-to-end Metron application. The table below summarizes the steps that occur during the automated install.
| Step | Description | Components Deployed |
|---|---|---|
| Step 1 | Spin up the EC2 instances where HDP and Metron will be installed and deployed | |
| Step 2 | Spin up an AWS VPC | |
| Step 3 | Install the Ambari server and agents via Ansible scripts | |
| Step 4 | Using Ambari Blueprints and APIs, install a 7-node HDP 2.3 cluster with the following services: HDFS, YARN, ZooKeeper, Storm, HBase, and Kafka. The blueprint used to deploy the HDP cluster can be found here: Metron Small Cluster Ambari Blueprint | |
| Step 5 | Install a 2-node Elasticsearch cluster | |
| Step 6 | Install and start the following data source probes: Bro, Snort, the PCAP probe, and YAF (NetFlow) | |
| Step 7 | Deploy the 5 Metron Storm topologies | |
| Step 8 | Configure the Kafka topics and HBase tables | |
| Step 9 | Install MySQL to store GeoIP enrichment data. The MySQL DB is populated with GeoIP information from MaxMind GeoLite | |
| Step 10 | Install the Metron UI for the SOC Analyst and Investigator personas | |
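Step 4 drives the HDP install by POSTing a blueprint to the Ambari REST API. A heavily trimmed, hypothetical blueprint fragment is sketched below; the host-group names and component layout are illustrative only, not the actual Metron small-cluster blueprint.

```json
{
  "Blueprints": {
    "blueprint_name": "metron-small",
    "stack_name": "HDP",
    "stack_version": "2.3"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "RESOURCEMANAGER" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "NIMBUS" }
      ]
    },
    {
      "name": "workers",
      "cardinality": "6",
      "components": [
        { "name": "DATANODE" },
        { "name": "NODEMANAGER" },
        { "name": "SUPERVISOR" },
        { "name": "HBASE_REGIONSERVER" },
        { "name": "KAFKA_BROKER" }
      ]
    }
  ]
}
```

Ambari expands a blueprint like this into a full cluster definition once a matching host mapping is supplied, which is what lets the installer stand up the whole HDP stack from scripts alone.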
Deployment Architecture After Install
The installer takes about 60-90 minutes to execute fully, though this can vary considerably depending on AWS provisioning times during the run. After the installer finishes, the deployment architecture of the app looks like the following.
Metron Storm Topology Refactor / Re-Architecture
Another area of focus for Metron TP1 was to address the following challenges with the old OpenSOC topology architecture:
- Code was extremely brittle
- Storm Topologies were designed without taking advantage of full parallelism
- Numerous "redundant" topologies
- Management of the app was difficult due to a number of complex topologies
- Very complex to add new Data Sources to the platform
- Very little unit and integration testing
Some key re-architecture and refactor work done in TP1 to address these challenges were the following:
- Made the Metron code base simpler and easier to maintain by converting all Storm topologies to use Flux configuration (a declarative way to wire topologies together)
- Ability to add new data source parsers without writing code, using the Grok parser framework
- Enrichment, model, and threat intel cross-references are now done in parallel rather than sequentially in the Storm configuration
- Minimized the incremental cost of adding new topologies by having one common enrichment topology for all data sources
- All app configuration is stored in ZooKeeper, allowing one to manage app config at runtime without stopping the topology
- Improved code quality with new unit and integration test harness utilities
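To illustrate the Grok-style parsing mentioned above: a Grok expression maps named patterns onto fields of a raw log line. A rough Python equivalent using regex named groups is sketched below; the log format and field names are made up for illustration and are not Metron's actual Grok patterns.

```python
import re

# Hypothetical proxy-style log line; Metron's real telemetry formats differ.
LINE = ("1461576382.642    161 127.0.0.1 TCP_MISS/200 103701 "
        "GET http://www.example.com/ - DIRECT/93.184.216.34 text/html")

# Named groups play the role of Grok's %{PATTERN:field} captures.
PATTERN = re.compile(
    r"(?P<timestamp>\d+\.\d+)\s+(?P<elapsed>\d+)\s+(?P<ip_src_addr>\S+)\s+"
    r"(?P<action>\S+)/(?P<code>\d+)\s+(?P<bytes>\d+)\s+(?P<method>\S+)\s+(?P<url>\S+)"
)

def parse(line):
    """Turn a raw log line into a flat dict of telemetry fields."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None

fields = parse(LINE)
```

The appeal of the Grok approach is that adding a new data source becomes a matter of writing one such pattern rather than a new Storm bolt.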
Old OpenSoc Architecture
In the old OpenSOC architecture, the key limitations were the following:
- For every new data source, a new complex Storm topology had to be added
- Each enrichment, threat intel reference, and model execution was done sequentially
- No in-memory caching for enrichments or threat intel checks
- No loader frameworks to load the enrichment or threat intel stores
The below diagram illustrates the old architecture.
New Metron Architecture
With the new Metron Architecture, the key changes are:
- Adding a new data source simply means adding a new normalizing/parser topology
- One common enrichment topology can be used for all data sources
- Using the splitter/joiner pattern, enrichment, model, and threat intel execution is done in parallel
- Loader frameworks have been added to load the Enrichment and Threat Intel Stores
- Fast Cache has been added for enrichment and threat intel look ups
The below diagram illustrates the new architecture.
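The splitter/joiner pattern above can be sketched in a few lines of Python; the enrichment functions and field names here are stand-ins, not Metron's actual bolts, which run inside Storm and hit HBase, GeoIP, and threat intel stores.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in enrichment functions; each returns a partial result to merge.
def geo_enrich(msg):
    return {"geo": "US"} if msg.get("ip_dst_addr") else {}

def host_enrich(msg):
    return {"host": "webserver-01"} if msg.get("ip_dst_addr") else {}

def threat_intel(msg):
    return {"is_alert": msg.get("ip_dst_addr") == "10.0.0.66"}

ENRICHMENTS = [geo_enrich, host_enrich, threat_intel]

def split_join(message):
    """Splitter/joiner: fan the message out to every enrichment in
    parallel, then merge (join) the partial results into one message."""
    with ThreadPoolExecutor(max_workers=len(ENRICHMENTS)) as pool:
        partials = list(pool.map(lambda f: f(message), ENRICHMENTS))
    enriched = dict(message)
    for part in partials:
        enriched.update(part)
    return enriched

out = split_join({"ip_dst_addr": "10.0.0.66"})
```

Running the enrichments concurrently rather than chaining them is what removes the sequential latency of the old OpenSOC design.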
Telemetry Data Source Theme Key Features
PCAP - Packet Capture
PCAP represents the most granular data collected in Metron, consisting of individual packets and frames. Metron uses DPDK, which provides a set of libraries and drivers for fast packet collection and processing.
See the following for more details: Metron Packet Capture Probe Design
YAF/Netflow
NetFlow data represents PCAP data rolled up to the flow/session level: a summary of the sequence of packets between two machines, up to the layer 4 protocol. If one does not want to ingest PCAP due to space constraints and the load exerted on infrastructure, then NetFlow is recommended. Metron uses YAF (Yet Another Flowmeter) to generate IPFIX (NetFlow) data from Metron's PCAP probe; hence the output of the YAF probe is IPFIX rather than raw packets.
See the following for more details: Metron YAF Capture Design
Bro
Bro is an IDS (Intrusion Detection System), but Metron uses Bro primarily as a Deep Packet Inspection (DPI) metadata generator. The metadata consists of network activity details up to layer 7, the application-level protocols (DNS, HTTP, FTP, SSH, SSL). Extracting DPI metadata (layer 7 visibility) is expensive and is therefore performed only on selected protocols; the recommendation is to turn on DPI for the HTTP and DNS protocols. So while the PCAP probe records every single packet it sees on the wire, DPI metadata is extracted only for a subset of those packets. This metadata is among the most valuable network data for analytics.
See the following for more details: Metron Bro Capture Design
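Bro writes its DPI metadata as tab-separated log files with a `#fields` header naming the columns. A minimal sketch of reading such a record into a flat dict is below; the sample record is simplified and is not Metron's exact ingest format.

```python
# A simplified dns.log-style Bro record (tab-separated, with a
# "#fields" header line naming the columns).
SAMPLE_LOG = (
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tquery\tqtype_name\n"
    "1461602233.0\tCab1\t192.168.1.5\t8.8.8.8\texample.com\tA\n"
)

def parse_bro_log(text):
    """Yield one dict per record, keyed by the #fields header."""
    fields = None
    records = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            records.append(dict(zip(fields, line.split("\t"))))
    return records

records = parse_bro_log(SAMPLE_LOG)
```

Records in this shape map naturally onto the flat JSON messages that flow through the Metron topologies.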
Snort
Snort is a popular Network Intrusion Prevention System (NIPS). Snort monitors network traffic and produces alerts based on signatures from community rules. Metron plays the output of the packet capture probe to Snort, and whenever Snort alerts are triggered, Metron uses Apache Flume to pipe these alerts to a Kafka topic.
See the following for more details: Metron Snort Capture Design
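To make the Snort-to-Kafka path concrete, the sketch below turns one Snort fast-alert-style line into the kind of JSON payload a Flume sink would hand to a Kafka topic. The alert line and field names are illustrative; the exact format depends on Snort's output configuration.

```python
import json
import re

# A Snort fast-alert-style line (simplified; real lines vary by config).
ALERT = ("04/06-13:00:00.000000  [**] [1:2019401:3] ET SCAN Possible Nmap Scan "
         "[**] [Priority: 2] {TCP} 10.0.2.15:61692 -> 10.0.2.3:80")

PATTERN = re.compile(
    r"\[\*\*\]\s+\[(?P<sig>[\d:]+)\]\s+(?P<msg>.*?)\s+\[\*\*\].*"
    r"\{(?P<proto>\w+)\}\s+(?P<src>[\d.]+):(?P<src_port>\d+)\s+->\s+"
    r"(?P<dst>[\d.]+):(?P<dst_port>\d+)"
)

def alert_to_json(line):
    """Convert one alert line into the JSON string that would be
    published to a Kafka topic downstream."""
    m = PATTERN.search(line)
    return json.dumps(m.groupdict()) if m else None

payload = alert_to_json(ALERT)
```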
Why are these Network Telemetry Sources Important?
A common question is why we focused first on this initial set of network telemetry data sources. Keep in mind that the end vision of Apache Metron is to be an analytics platform. These 4 network telemetry data sources are among the key inputs required for the next-generation ML, NLP, and statistical models that we are planning to build in future releases. The table below describes some of these models and their data input requirements.
| Analytics Pack | Analytics Pack Description | Telemetry Data Sources Required |
|---|---|---|
| Domain Pack | A collection of machine learning models that identify anomalies in incoming and outgoing connections made to a specific domain that appear to be malicious | |
| UEBA Pack | A collection of machine learning models that monitor assets and users known to be legitimate to identify anomalies from their normal behavior | |
| Relevancy/Correlation Engine Pack | A collection of machine learning models that identify related alerts within the massive volumes of alerts being processed by the cyber solution | |
| Protocol Anomaly Pack | A collection of machine learning models that identify whether there is anything unusual about network traffic monitored via deep packet inspection (PCAP) | |
The system is configurable so that one can enable only the data sources of interest.
In future Metron tech previews, we will be adding support for these types of security data sources:
- FireEye
- Palo Alto Networks
- Active Directory
- BlueCoat
- SourceFire
- Bit9 CarbonBlack
- Lancope
- Cisco ISE
Real-time Data Processing Theme Key Features
Enrichment Services
The below diagram illustrates the Enrichment framework that was built in Metron TP1. The key components of the framework are:
- Enrichment Loader Framework - A framework that bulk loads or polls data from an enrichment source. The framework supports plugging in any enrichment source
- Enrichment Store - The store where all enrichment data is kept. HBase is the primary store. The store also provides services to de-duplicate and age data
- Enrichment Bolt - A Storm bolt that enriches Metron telemetry events
- Enrichment Cache - A cache used by the bolt so that lookups to the enrichment store are cached
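A minimal sketch of the enrichment cache idea is below; the TTL value and the shape of the store lookup are assumptions for illustration, not Metron's actual implementation.

```python
import time

class EnrichmentCache:
    """Tiny TTL cache in front of an enrichment store lookup, so repeated
    keys (e.g. the same IP address) skip the store round trip."""
    def __init__(self, lookup_fn, ttl_seconds=300):
        self.lookup_fn = lookup_fn
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (expiry_time, value)

    def get(self, key):
        expiry, value = self._entries.get(key, (0, None))
        if time.time() < expiry:
            return value  # cache hit: no store round trip
        value = self.lookup_fn(key)  # cache miss: query the store
        self._entries[key] = (time.time() + self.ttl, value)
        return value

# Stand-in for an HBase enrichment store; counts how often it is queried.
calls = {"n": 0}
def store_lookup(ip):
    calls["n"] += 1
    return {"asn": "AS15169"} if ip == "8.8.8.8" else {}

cache = EnrichmentCache(store_lookup)
first = cache.get("8.8.8.8")
second = cache.get("8.8.8.8")  # served from cache; store not queried again
```

Because telemetry streams repeat the same IPs heavily, even a small cache like this cuts the store load dramatically.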
The specific enrichments supported in Metron TP1 are listed below.
| Enrichment | Description | Enrichment Source, Store, Loader Type, Refresh Rate | Metron Message Fields Enriched |
|---|---|---|---|
| GeoIP | Tags GeoIP information (lat/lon coordinates + city/state/country) onto any external IP address. This can be applied both to alerts and to metadata telemetries so they can be mapped to a geographic location | | src_ip, dest_ip |
| Host | Enriches an IP with host details | | dest_ip |
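The GeoIP enrichment above can be sketched as a lookup keyed on network ranges, applied to the `src_ip`/`dest_ip` fields. The in-memory table and its contents below are toy data; in Metron the data comes from MaxMind GeoLite loaded into MySQL.

```python
import ipaddress

# Toy in-memory GeoIP table; real data comes from MaxMind GeoLite.
GEO_TABLE = [
    (ipaddress.ip_network("93.184.216.0/24"), {"country": "US", "city": "Norwell"}),
    (ipaddress.ip_network("203.0.113.0/24"), {"country": "AU", "city": "Sydney"}),
]

def geo_lookup(ip):
    addr = ipaddress.ip_address(ip)
    if addr.is_private:
        return None  # geo enrichment only applies to external addresses
    for net, geo in GEO_TABLE:
        if addr in net:
            return geo
    return None

def enrich_message(msg):
    """Tag geo info onto the src_ip/dest_ip fields of a telemetry message."""
    out = dict(msg)
    for field in ("src_ip", "dest_ip"):
        geo = geo_lookup(msg[field])
        if geo:
            out[field + ".geo"] = geo
    return out

enriched = enrich_message({"src_ip": "10.0.0.5", "dest_ip": "93.184.216.34"})
```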
More details can be found here: Metron Enrichment Services
Threat Intel Services
The Threat Intel framework is very similar to the Enrichment framework. See the architecture diagram below.
The specific threat intel feeds supported in TP1 are listed below.
| Threat Feed | Feed Description | Feed Format | Refresh Rate |
|---|---|---|---|
| Soltra | Threat intel aggregator | STIX/TAXII | Poll every 5 minutes |
| Hail a TAXII | Repository of open source cyber threat intelligence feeds in STIX format | STIX/TAXII | Poll every 5 minutes |
More details can be found here: Metron Threat Intel Services
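Once a feed has been loaded, the threat intel cross-reference itself reduces to checking message fields against the indicator set. A minimal sketch is below; the IOC addresses and field names are illustrative, and in Metron the indicator store is HBase rather than an in-memory set.

```python
# Illustrative IOC set; real indicators arrive via STIX/TAXII feeds and
# are loaded into the threat intel store by the loader framework.
IOC_IPS = {"198.51.100.23", "203.0.113.99"}

def threat_intel_check(msg):
    """Flag a message whose src/dest IP matches a known indicator."""
    hits = [f for f in ("src_ip", "dest_ip") if msg.get(f) in IOC_IPS]
    out = dict(msg)
    out["is_alert"] = bool(hits)
    out["threat_matches"] = hits
    return out

flagged = threat_intel_check({"src_ip": "10.0.0.5", "dest_ip": "203.0.113.99"})
```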
Created on 04-14-2016 08:31 AM
Hello @George Vetticaden
As far as I can see in the Metron source, there is no Hive, yet the new Metron architecture picture shown above includes an HDFS bolt to Hive in the enrichment Storm topology. Is HBase currently the only big data store used in Metron?
Also, Metron currently has just one index bolt (actually writer bolts) writing to HDFS and/or ES/Solr in the enrichment topology, and only PCAP is written to HDFS and HBase without indexing, so there is no alert bolt or Kafka bolt?
Are these planned as new features?
Thanks in advance.