05-05-2016
11:48 PM
In the previous article of the series, Enriching Telemetry Events, we walked through how to enrich a domain element of a given telemetry event with WhoIs data such as the home country, the company associated with the domain, etc. In this article, we will enrich events with a special type of data: threat intel feeds. When a given telemetry event matches data in a threat intel feed, an alert is generated. Again, the customer's requirements are the following:
The proxy events from Squid logs need to be ingested in real-time.
The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
In real-time, the Squid proxy event must be enriched so that the domain names are enriched with the IP information.
In real-time, the IP within the proxy event must be checked against threat intel feeds.
If there is a threat intel hit, an alert needs to be raised.
The end user must be able to see the new telemetry events and the alerts from the new data source.
All of these requirements will need to be implemented easily without writing any new Java code.
In this article, we will walk you through how to do 4 and 5.
Threat Intel Framework Explained
Metron currently provides an extensible framework to plug in threat intel sources. Each threat intel source has two components: an enrichment data source and an enrichment bolt. The threat intelligence feeds are bulk loaded and streamed into a threat intelligence store, similar to how the enrichment feeds are loaded. The entries are loaded in a key-value format: the key is the indicator and the value is a JSON-formatted description of what the indicator is. It is recommended to use a threat feed aggregator such as Soltra to dedup and normalize the feeds via STIX/TAXII. Metron provides an adapter that is able to read Soltra-produced STIX/TAXII feeds and stream them into HBase, which is Metron's data store of choice to back high-speed threat intel lookups. Metron additionally provides flat file and STIX bulk loaders that can normalize, dedup, and bulk load or stream threat intel data into HBase even without the use of a threat feed aggregator. The diagram below illustrates the architecture.
Step 1: Threat Intel Feed Source
Metron is designed to work with STIX/TAXII threat feeds, but can also be bulk loaded with threat data from a CSV file. In this example we will explore the CSV option.
The same loader framework that is used for enrichment is used for threat intelligence. As with enrichments, we need to set up a data .csv file, the extractor config JSON, and the enrichment config JSON. For this example we will be using a Zeus malware tracker list located here: https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist. Copy the data from the above link into a file called domainblocklist.txt on your VM. Run the following command to parse the above file into a csv file called domainblocklist.csv:
cat domainblocklist.txt | grep -v "^#" | grep -v "^$" | grep -v "^https" | awk '{print $1",abuse.ch"}' > domainblocklist.csv
Now that we have the threat intel feed source, we need to configure an extractor config file that describes the source. Create a file called extractor_config_temp.json and put the following contents in it. {
"config" : {
"columns" : {
"domain" : 0
,"source" : 1
}
,"indicator_column" : "domain"
,"type" : "zeusList"
,"separator" : ","
}
,"extractor" : "CSV"
}
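Before running the loader, it is worth sanity-checking that the CSV lines up with the column indices in this config (domain in column 0, source in column 1). A small sketch with hypothetical blocklist entries standing in for the real download:

```shell
# Hypothetical blocklist lines standing in for the real abuse.ch download
printf '%s\n' \
  '# abuse.ch ZeuS domain blocklist' \
  '' \
  'baddomain.example' \
  'evil.example' > /tmp/domainblocklist.txt

# Same pipeline as above: drop comments, blank lines, and https entries,
# then append the feed source as the second CSV column
cat /tmp/domainblocklist.txt \
  | grep -v "^#" | grep -v "^$" | grep -v "^https" \
  | awk '{print $1",abuse.ch"}' > /tmp/domainblocklist.csv

cat /tmp/domainblocklist.csv
# baddomain.example,abuse.ch
# evil.example,abuse.ch
```

If the rows do not look like domain,abuse.ch, the indicator_column lookup will not line up with the extractor config.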
Run the following to remove any non-ASCII characters: iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
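What the -c flag does here can be seen on a tiny example: iconv silently drops any character that has no ASCII equivalent, such as the curly quotes that copying from a browser tends to introduce:

```shell
# Write a string containing UTF-8 curly quotes (octal escapes for the “ and ” characters)
printf 'type\342\200\234zeusList\342\200\235\n' > /tmp/dirty.txt

# -c drops characters that cannot be converted to the target encoding
iconv -c -f utf-8 -t ascii /tmp/dirty.txt > /tmp/clean.txt

cat /tmp/clean.txt
# typezeusList
```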
Step 2: Configure Element to Threat Intel Feed Mapping We now have to configure which element of a tuple to cross-reference with which threat intel feed. This configuration will be stored in ZooKeeper. The config looks like the following: {
"zkQuorum" : "node1:2181"
,"sensorToFieldList" : {
"bro" : {
"type" : "THREAT_INTEL"
,"fieldToEnrichmentTypes" : {
"url" : [ "zeusList" ]
}
}
}
}
Cut and paste this into a file called "enrichment_config_temp.json" on the virtual machine. Because copying and pasting from this blog will include some non-ASCII invisible characters, strip them out by running:
iconv -c -f utf-8 -t ascii enrichment_config_temp.json -o enrichment_config.json
Step 3: Run the Threat Intel Loader
Now that we have the threat intel source and threat intel config defined, we can run the loader to move the data from the threat intel source to the Metron threat intel store and store the enrichment config in ZooKeeper.
/usr/metron/0.1BETA/bin/flatfile_loader.sh -n enrichment_config.json -i domainblocklist.csv -t threatintel -c t -e extractor_config.json
After this, the threat intel data will be loaded into HBase and a ZooKeeper mapping will be established. The data will be populated into an HBase table called threatintel. To verify that the feed was properly ingested into HBase, run the following commands: hbase shell
scan 'threatintel'
You should see the table bulk loaded with data from the CSV file. Now check that the ZooKeeper enrichment tag was properly populated: /usr/metron/0.1BETA/bin/zk_load_configs.sh -z localhost:2181
Generate some data by using the squid client to execute http requests (do this about 20 times):
squidclient http://www.alamman.com
squidclient http://www.atmape.ru
View the Threat Alerts in Metron UI
When the logs are ingested, we get messages that have a hit against threat intel. Notice a key characteristic of such a message: it has is_alert=true, which designates it as an alert message. Now that we have alerts coming through, we need to visualize them in Kibana. First, we set up a pinned query to look for messages where is_alert=true; then, once we point the alerts table at this pinned query, the alerts appear in the dashboard.
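For reference, the pinned query is just a field filter. As a sketch, the equivalent Elasticsearch query body (the Kibana UI builds this for you) would look something like:

```json
{
  "query": {
    "query_string": {
      "query": "is_alert:true"
    }
  }
}
```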
05-02-2016
05:22 PM
In the previous article of the series, Adding a New Telemetry Data Source to Apache Metron, we walked through how to add a new data source, squid, to Apache Metron. The inevitable next question is how to enrich the telemetry events in real-time as they flow through the platform. Enrichment is critical when identifying threats, or as we like to call it, "finding the needle in the haystack". The customer's requirements are the following:
The proxy events from Squid logs need to be ingested in real-time.
The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
In real-time, the Squid proxy event must be enriched so that the domain names are enriched with the IP information.
In real-time, the IP within the proxy event must be checked against threat intel feeds.
If there is a threat intel hit, an alert needs to be raised.
The end user must be able to see the new telemetry events and the alerts from the new data source.
All of these requirements will need to be implemented easily without writing any new Java code.
In this article, we will walk you through how to do 3.
Metron Enrichment Framework Explained
Step 1: Enrichment Source
Whois data is expensive, so we will not be providing it. Instead we wrote a basic whois scraper (out of scope for this exercise) that produces whois data in the following CSV format: google.com, "Google Inc.", "US", "Dns Admin",874306800000
work.net, "", "US", "PERFECT PRIVACY, LLC",788706000000
capitalone.com, "Capital One Services, Inc.", "US", "Domain Manager",795081600000
cisco.com, "Cisco Technology Inc.", "US", "Info Sec",547988400000
cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000
news.com, "CBS Interactive Inc.", "US", "Domain Admin",833353200000
nba.com, "NBA Media Ventures, LLC", "US", "C/O Domain Administrator",786027600000
espn.com, "ESPN, Inc.", "US", "ESPN, Inc.",781268400000
pravda.com, "Internet Invest, Ltd. dba Imena.ua", "UA", "Whois privacy protection service",806583600000
hortonworks.com, "Hortonworks, Inc.", "US", "Domain Administrator",1303427404000
microsoft.com, "Microsoft Corporation", "US", "Domain Administrator",673156800000
yahoo.com, "Yahoo! Inc.", "US", "Domain Administrator",790416000000
rackspace.com, "Rackspace US, Inc.", "US", "Domain Admin",903092400000
Cut and paste this data into a file called "whois_ref.csv" on your virtual machine. This csv file represents our enrichment source. The schema of this enrichment source is domain|owner|registeredCountry|registrar|registeredTimestamp. Make sure you don't have an empty newline as the last line of the CSV file, as that will result in a null pointer exception. We now need to configure an extractor config file that describes the enrichment source. {
"config" : {
"columns" : {
"domain" : 0
,"owner" : 1
,"home_country" : 2
,"registrar": 3
,"domain_created_timestamp": 4
}
,"indicator_column" : "domain"
,"type" : "whois"
,"separator" : ","
}
,"extractor" : "CSV"
}
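One way to guard against the trailing blank line mentioned above is to filter empty lines before handing the file to the loader. A sketch using two of the rows from this article:

```shell
# Build a CSV that accidentally ends with a blank line
printf '%s\n' \
  'google.com, "Google Inc.", "US", "Dns Admin",874306800000' \
  'yahoo.com, "Yahoo! Inc.", "US", "Domain Administrator",790416000000' \
  '' > /tmp/whois_ref.csv

# Strip empty lines so the loader never sees a blank record
grep -v '^$' /tmp/whois_ref.csv > /tmp/whois_ref_clean.csv

grep -c '' /tmp/whois_ref_clean.csv   # line count of the cleaned file
```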
Please cut and paste this into a file called "extractor_config_temp.json" on the virtual machine. Because copying and pasting from this blog will include some non-ASCII invisible characters, strip them out by running: iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
Step 2: Configure Element to Enrichment Mapping We now have to configure which element of a tuple should be enriched with which enrichment type. This configuration will be stored in ZooKeeper. The config looks like the following: {
"zkQuorum" : "node1:2181"
,"sensorToFieldList" : {
"squid" : {
"type" : "ENRICHMENT"
,"fieldToEnrichmentTypes" : {
"url" : [ "whois" ]
}
}
}
}
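Once this mapping is live, a squid message whose url field matches a row in the enrichment store picks up the whois fields. As a sketch (values taken from the cnn.com row of whois_ref.csv; the actual Metron envelope carries additional fields), an enriched message would contain something like:

```json
{
  "url": "cnn.com",
  "whois.owner": "Turner Broadcasting System, Inc.",
  "whois.home_country": "US",
  "whois.registrar": "Domain Name Manager",
  "whois.domain_created_timestamp": 748695600000
}
```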
Cut and paste this into a file called "enrichment_config_temp.json" on the virtual machine. Because copying and pasting from this blog will include some non-ASCII invisible characters, strip them out by running: iconv -c -f utf-8 -t ascii enrichment_config_temp.json -o enrichment_config.json
Step 3: Run the Enrichment Loader
Now that we have the enrichment source and enrichment config defined, we can run the loader to move the data from the enrichment source to the Metron enrichment store and store the enrichment config in ZooKeeper.
/usr/metron/0.1BETA/bin/flatfile_loader.sh -n enrichment_config.json -i whois_ref.csv -t enrichment -c t -e extractor_config.json
After this, your enrichment data will be loaded into HBase and a ZooKeeper mapping will be established. The data will be populated into an HBase table called enrichment. To verify that the data was properly ingested into HBase, run the following commands: hbase shell
scan 'enrichment'
You should see the table bulk loaded with data from the CSV file. Now check that the ZooKeeper enrichment tag was properly populated: /usr/metron/0.1BETA/bin/zk_load_configs.sh -z localhost:2181
Generate some data by using the squid client to execute http requests (do this about 20 times):
squidclient http://www.cnn.com
View the Enriched Telemetry Events in the Metron UI
In order to demonstrate the enrichment capabilities of Metron, you need to drop all existing indexes for Squid where the data was ingested prior to enrichments being enabled. To do so, go back to the head plugin and delete the indexes; make sure you delete all Squid indexes. Re-ingest the data (see the previous blog post) and the messages should be automatically enriched. In the Metron UI, refresh the dashboard and view the data in the Squid panel. Notice the enrichments here (whois.owner, whois.domain_created_timestamp, whois.registrar, whois.home_country).
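The whois.domain_created_timestamp values are epoch milliseconds, which are not very readable in the dashboard. A quick sketch for decoding one by hand (GNU date assumed; on BSD/macOS use date -r instead of -d):

```shell
# registeredTimestamp for the google.com row of whois_ref.csv, in milliseconds
ts_ms=874306800000

# Convert to seconds and render as a UTC calendar date
date -u -d "@$((ts_ms / 1000))" '+%Y-%m-%d'
# 1997-09-15
```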
05-02-2016
05:22 PM
When adding a net new data source to Metron, the first step is to decide how to push the events from the new telemetry data source into Metron. You can use a number of data collection tools and that decision is decoupled from Metron. However, we recommend evaluating Apache Nifi as it is an excellent tool to do just that (this article uses Nifi to push data into Metron). The second step is to configure Metron to parse the telemetry data source so that downstream processing can be done on it. In this article we will walk you through how to perform both of these steps.
In the previous article of this blog series, we described the following set of requirements for Customer Foo, who wanted to add the Squid telemetry data source into Metron.
The proxy events from Squid logs need to be ingested in real-time.
The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
In real-time, the squid proxy event must be enriched so that the domain names are enriched with the IP information.
In real-time, the IP within the proxy event must be checked for threat intel feeds.
If there is a threat intel hit, an alert needs to be raised.
The end user must be able to see the new telemetry events and the alerts from the new data source.
All of these requirements will need to be implemented easily without writing any new Java code.
In this article, we will walk you through how to perform steps 1, 2, and 6.
How to Parse the Squid Telemetry Data Source to Metron
The following steps guide you through how to add this new telemetry.
Step 1: Spin Up Single Node Vagrant VM
Download the code from https://github.com/apache/incubator-metron/archive/codelab-v1.0.tar.gz.
untar the file ( tar -zxvf incubator-metron-codelab-v1.0.tar.gz).
Navigate to the metron-platform directory (incubator-metron-codelab-v1.0/metron-platform) and build the package: mvn clean package -DskipTests=true
Navigate to the codelab-platform directory: incubator-metron-codelab-v1.0/metron-deployment/vagrant/codelab-platform/
Follow the instructions here: https://github.com/apache/incubator-metron/tree/codelab-v1.0/metron-deployment/vagrant/codelab-platform. Note: The Metron Development Image is named launch_image.sh not launch_dev_image.sh.
Step 2: Create a Kafka Topic for the New Data Source
Every data source whose events you are streaming into Metron must have its own Kafka topic. The ingestion tool of choice (for example, Apache Nifi) will push events into this Kafka topic.
ssh to your VM
vagrant ssh
Create a Kafka topic called "squid" in the directory /usr/hdp/current/kafka-broker/bin/:
cd /usr/hdp/current/kafka-broker/bin/
./kafka-topics.sh --zookeeper localhost:2181 --create --topic squid --partitions 1 --replication-factor 1
List all of the Kafka topics to ensure that the new topic exists:
./kafka-topics.sh --zookeeper localhost:2181 --list
You should see the following list of Kafka topics:
bro
enrichment
pcap
snort
squid
yaf
Step 3: Install Squid
Install and start Squid:
sudo yum install squid
sudo service squid start
With Squid started, look at the the different log files that get created:
sudo su -
cd /var/log/squid
ls
You see that there are three types of logs available: access.log, cache.log, and squid.out. We are interested in access.log because that is the log that records the proxy usage.
Initially the access.log is empty. Let's generate a few entries for the log, then list the new contents of the access.log. The "-h 127.0.0.1" indicates that squidclient will only use the IPv4 interface.
squidclient -h 127.0.0.1 http://www.hostsite.com
squidclient -h 127.0.0.1 http://www.hostsite.com
cat /var/log/squid/access.log
In production environments you would configure your users' web browsers to point to the proxy server, but for the sake of simplicity in this tutorial we will use the client that is packaged with the Squid installation. After we use the client to simulate proxy requests, the Squid log entries should look as follows:
1461576382.642 161 127.0.0.1 TCP_MISS/200 103701 GET http://www.hostsite.com/ - DIRECT/199.27.79.73 text/html
1461576442.228 159 127.0.0.1 TCP_MISS/200 137183 GET http://www.hostsite.com/ - DIRECT/66.210.41.9 text/html
Using the Squid log entries, we can determine the format of the log entries, which is:
timestamp | time elapsed | remotehost | code/status | bytes | method | URL rfc931 peerstatus/peerhost | type
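Before writing a Grok expression, it helps to confirm the field positions with plain awk. A quick sketch against the first sample log line above:

```shell
# First access.log entry from above
line='1461576382.642    161 127.0.0.1 TCP_MISS/200 103701 GET http://www.hostsite.com/ - DIRECT/199.27.79.73 text/html'

# Whitespace-split fields: 1=timestamp, 2=elapsed, 3=remotehost,
# 4=code/status, 5=bytes, 6=method, 7=URL, 8=rfc931, 9=peerstatus/peerhost, 10=type
echo "$line" | awk '{print $1, $4, $7}'
# 1461576382.642 TCP_MISS/200 http://www.hostsite.com/
```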
Step 4: Create a Grok Statement to Parse the Squid Telemetry Event
Now we are ready to tackle the Metron parsing topology setup.
The first thing we need to do is decide whether we will be using the Java-based parser or the Grok-based parser for the new telemetry. In this example we will be using the Grok parser. The Grok parser is perfect for structured or semi-structured logs that are well understood (check) and for telemetries with lower volumes of traffic (check).
Next we need to define the Grok expression for our log. Refer to Grok documentation for additional details. In our case the pattern is:
WDOM [^(?:http:\/\/|www\.|https:\/\/)]([^\/]+) SQUID_DELIMITED %{NUMBER:timestamp} %{SPACE:UNWANTED} %{INT:elapsed} %{IPV4:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} http:\/\/\www.%{WDOM:url}\/ - %{WORD:UNWANTED}\/%{IPV4:ip_dst_addr} %{WORD:UNWANTED}\/%{WORD:UNWANTED}
Notice the WDOM pattern (that is more tailored to Squid instead of using the generic Grok URL pattern) before defining the Squid log pattern. This is optional and is done for ease of use. Also, notice that we apply the UNWANTED tag for any part of the message that we don't want included in our resulting JSON structure. Finally, notice that we applied the naming convention to the IPV4 field by referencing the following list of field conventions.
The last thing we need to do is validate the Grok pattern to make sure it works. For our test we will be using a free Grok validator called Grok Constructor. Paste the pattern and a sample log line into the validator and confirm that each named field is extracted correctly.
Now that the Grok pattern has been defined, we need to save it and move it to HDFS. Create a file called "squid" in the tmp directory and copy the Grok pattern into the file.
touch /tmp/squid
vi /tmp/squid
//copy the grok pattern above to the squid file
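The intent of the custom WDOM pattern — capture the bare domain with the scheme and leading www. stripped — can be prototyped with sed before baking it into Grok. A rough shell equivalent (an approximation, not the Grok pattern itself):

```shell
# Strip an optional scheme and leading "www.", then drop everything after the first "/"
extract_domain() {
  printf '%s\n' "$1" | sed -E 's#^(https?://)?(www\.)?##; s#/.*$##'
}

extract_domain 'http://www.hostsite.com/'
# hostsite.com
```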
Now put the squid file into the directory where Metron stores its Grok parsers. Existing Grok parsers that ship with Metron are staged under /apps/metron/patterns/.
su - hdfs
hdfs dfs -put /tmp/squid /apps/metron/patterns/
exit
Step 5: Create a Flux configuration for the new Squid Storm Parser Topology
Now that the Grok pattern is staged in HDFS, we need to define a Storm Flux configuration for the Metron parsing topology. The configs are staged under /usr/metron/0.1BETA/flux/ and each parsing topology has its own set of configs. Each topology directory has a remote.yaml, which is designed to be run on AWS, and a local/test.yaml designed to run locally on a single-node VM. The easiest way to create a config for Squid is to copy one of the existing Grok-based configs (YAF) and tailor it for Squid.
mkdir /usr/metron/0.1BETA/flux/squid
cp /usr/metron/0.1BETA/flux/yaf/remote.yaml /usr/metron/0.1BETA/flux/squid/remote.yaml
vi /usr/metron/0.1BETA/flux/squid/remote.yaml
Edit your config to look like the following (replace yaf with squid and update the constructorArgs section):
name: "squid"
config:
topology.workers: 1
components:
- id: "parser"
className: "org.apache.metron.parsers.GrokParser"
constructorArgs:
- "/apps/metron/patterns/squid"
- "SQUID_DELIMITED"
configMethods:
- name: "withTimestampField"
args:
- "timestamp"
- id: "writer"
className: "org.apache.metron.parsers.writer.KafkaWriter"
constructorArgs:
- "${kafka.broker}"
- id: "zkHosts"
className: "storm.kafka.ZkHosts"
constructorArgs:
- "${kafka.zk}"
- id: "kafkaConfig"
className: "storm.kafka.SpoutConfig"
constructorArgs:
# zookeeper hosts
- ref: "zkHosts"
# topic name
- "squid"
# zk root
- ""
# id
- "squid"
properties:
- name: "ignoreZkOffsets"
value: true
- name: "startOffsetTime"
value: -1
- name: "socketTimeoutMs"
value: 1000000
spouts:
- id: "kafkaSpout"
className: "storm.kafka.KafkaSpout"
constructorArgs:
- ref: "kafkaConfig"
bolts:
- id: "parserBolt"
className: "org.apache.metron.parsers.bolt.ParserBolt"
constructorArgs:
- "${kafka.zk}"
- "squid"
- ref: "parser"
- ref: "writer"
streams:
- name: "spout -> bolt"
from: "kafkaSpout"
to: "parserBolt"
grouping:
type: SHUFFLE
Step 6: Deploy the new Parser Topology
Now that we have the Squid parser topology defined, lets deploy it to our cluster.
Deploy the new squid parser topology:
sudo storm jar /usr/metron/0.1BETA/lib/metron-parsers-0.1BETA.jar org.apache.storm.flux.Flux --filter /usr/metron/0.1BETA/config/elasticsearch.properties --remote /usr/metron/0.1BETA/flux/squid/remote.yaml
If you currently have four topologies in Storm, you need to kill one to make a worker available for Squid. To do this, from the Storm UI, click the name of the topology you want to kill in the Topology Summary section, then click Kill under Topology Actions. Storm will kill the topology and make a worker available for Squid.
Go to the Storm UI; you should now see the new "squid" topology. Ensure that the topology has no errors.
This squid processor topology will ingest from the squid Kafka topic we created earlier and then parse the event with Metron's Grok framework using the grok pattern we defined earlier. The result of the parsing is a standard JSON Metron structure that then gets put on the "enrichment" Kafka topic for further processing.
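As a sketch, the parsed JSON for the first sample access.log line carries the named captures from the Grok pattern defined earlier (field names are from that pattern; the actual Metron message adds envelope fields, and the UNWANTED captures are dropped):

```json
{
  "timestamp": 1461576382.642,
  "elapsed": 161,
  "ip_src_addr": "127.0.0.1",
  "action": "TCP_MISS",
  "code": 200,
  "bytes": 103701,
  "method": "GET",
  "url": "hostsite.com",
  "ip_dst_addr": "199.27.79.73"
}
```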
But how do the squid events in the access.log get into the "squid" Kafka topic so that the parser topology can parse them? We will do that using Apache Nifi.
Using Apache Nifi to Stream data into Metron
Put simply, NiFi was built to automate the flow of data between systems. Hence it is a fantastic tool to collect, ingest, and push data to Metron. The instructions below show how to install and configure NiFi and create a flow to push squid events into Metron.
Install, Configure, and Start Apache Nifi
The following shows how to install Nifi on the VM. Do the following as root:
Download Nifi:
cd /usr/lib
wget http://public-repo-1.hortonworks.com/HDF/centos6/1.x/updates/1.2.0.0/HDF-1.2.0.0-91.tar.gz
tar -zxvf HDF-1.2.0.0-91.tar.gz
Edit Nifi Configuration to update the port of the nifi web app: nifi.web.http.port=8089
cd HDF-1.2.0.0/nifi
vi conf/nifi.properties
//update nifi.web.http.port to 8089
Install Nifi as service
bin/nifi.sh install nifi
Start the Nifi Service
service nifi start
Go to the Nifi Web: http://node1:8089/nifi/
Create a Nifi Flow to stream events to Metron
Now we will create a flow to capture events from squid and push them into metron
Drag a processor onto the canvas (do this by dragging the processor icon, the first icon).
Search for the TailFile processor and select Add. Right-click on the processor and select Configure. In the Settings tab, change the name to "Ingest Squid Events".
In the Properties tab, point the processor at the squid access log (the file to tail is /var/log/squid/access.log).
Drag another processor onto the canvas.
Search for PutKafka and select Add
Right-click on the processor and select Configure. In Settings, change the name to "Stream to Metron" and check the failure and success checkboxes under relationships.
Under Properties, set the following three properties:
Known Brokers: node1:6667
Topic Name: squid
Client Name: nifi-squid
Create a connection by dragging the arrow from Ingest Squid Events to Stream to Metron
Select the entire flow and click the play button. You should see all processors turn green.
Generate some data using squidclient (do this for about 20+ sites)
squidclient http://www.hostsite.com
You should see metrics on the processor of data being pushed into Metron.
Look at the Storm UI for the parser topology and you should see tuples coming in
After about 5 minutes, you should see a new Elastic Search index called squid_index* in the Elastic Admin UI
Verify Events are Indexed
By convention the index where the new messages will be indexed is called squid_index_[timestamp] and the document type is squid_doc.
In order to verify that the messages were indexed correctly, we can use the elastic search Head plugin.
Install the head plugin:
/usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head/1.x
You should see the message: Installed mobz/elasticsearch-head/1.x into /usr/share/elasticsearch/plugins/head
Navigate to the elastic head UI: http://node1:9200/_plugin/head/
Click on the Browser tab, select squid doc on the left panel, and then select one of the sample docs. You should see something like the following:
Configure Metron UI to view the Squid Telemetry Events
Now that we have Metron configured to parse, index, and persist telemetry events, and Nifi pushing data to Metron, let's visualize this streaming telemetry data in the Metron UI.
Go to the Metron UI.
Add a New Pinned query
Click the + to add new pinned query
Create a query: _type: squid_doc
Click the colored circle icon, name the saved query and click Pin. See below
Add a new histogram panel for the Squid events
Click the add panel + icon
Select histogram panel type
Set title as “Squid Events”
Change Time Field to: timestamp
Configure span to 12
In the queries dropdown select “Selected” and only select the “Squid Events” pinned query
Click Save and you should see data in the histogram
You should now see the new Squid events
What Next?
The next article in the series covers Enriching Telemetry Data.
05-02-2016
05:22 PM
One of the key design principles of Apache Metron is that it should be easily extensible. We envision many users using Metron as a platform and building custom capabilities on top of it, one of which will be adding new telemetry data sources. In this multi-part article series, we will walk you through how to add a new telemetry data source: Squid proxy logs. This multi-part article series consists of the following:
This Article: Sets up the use case for this multi-part article series.
Use Case 1: Collecting and Parsing Telemetry Events - This tutorial walks you through how to collect/ingest events into Metron and then parse them.
Use Case 2: Enriching Telemetry Data - Describes how to enrich elements of telemetry events with Apache Metron.
Use Case 3: Adding/Enriching/Validating with Threat Intel Feeds - Describes how to add new threat intel feeds to the system and how those feeds can be used to cross-reference every telemetry event that comes in. When a hit occurs, an alert will be generated and displayed on the Metron UI.
Setting up the Use Case Scenario
Customer Foo has installed Metron TP1 and they are using the out-of-the-box data sources (PCAP, YAF/Netflow, Snort, and Bro). They love Metron! But now they want to add a new data source to the platform: Squid proxy logs.
Customer Foo's Requirements
The following are the customer's requirements for Metron with respect to this new data source:
The proxy events from Squid logs need to be ingested in real-time.
The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
In real-time, the Squid proxy event needs to be enriched so that the domain names are enriched with the IP information.
In real-time, the IP within the proxy event must be checked against threat intel feeds.
If there is a threat intel hit, an alert needs to be raised.
The end user must be able to see the new telemetry events and the alerts from the new data source.
All of these requirements will need to be implemented easily without writing any new Java code.
What is Squid?
Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages. For more information on Squid, see Squid-cache.org.
How Metron Enriches a Squid Telemetry Event
When you make an outbound http connection to https://www.cnn.com from a given host, an entry is added to a Squid file called access.log. Metron parses and enriches this telemetry event as it streams through the platform in real-time.
Key Points
Some key points to highlight as you go through this multi-part article series:
We will be adding a net new data source without writing any code. Metron strives for easy extensibility, and this is a good example of it. This is a repeatable pattern for a majority of telemetry data sources. Read the next article on how to collect and push data into Metron and then parse it in the Metron platform: Collecting and Parsing Telemetry Data.
04-27-2016
09:22 AM
Good question @Matt McKnight. We will have support for Solr indexing services in Metron TP2, which is slated for the end of May. However, in TP2 we will still only support the Metron UI that is based on Kibana (which is based on Elastic). This will change in subsequent releases. So net net: by the middle/end of May we will support Solr indexing, but you would have to write the UI that calls the Solr APIs for search queries. Farther down the line, we will provide a custom UI (away from Kibana) that uses Solr to do search. Make sense?
04-13-2016
02:07 PM
Good feedback @Hakan Akansel. I updated the article to be more clear on where the event gets persisted.
04-12-2016
01:16 PM
I ran into the following error when following these instructions: 2016-04-12 05:42:59,328 p=2472 u=gvetticaden | fatal: [obfuscated_ip]: UNREACHABLE! => {"changed": false, "msg": "SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue", "unreachable": true} To fix this issue, see the following thread: https://community.hortonworks.com/questions/24344/aws-unreachable-error-when-executing-metron-instal.html
04-06-2016
01:33 AM
Platform Theme Key Features
Fully Automated Scripted Install of Metron on AWS
One of the largest hurdles we have heard about from the community and customers working with the original OpenSoc code base was that it was nearly impossible to get the application up and running. Hence, our engineering team collaborated with the community to provide a scripted, automated install of Metron on AWS. The install only requires the user's AWS credentials, and uses a set of Ansible scripts/playbooks, Ambari Blueprints/APIs, and AWS APIs to deploy the full end-to-end Metron application. The following summarizes the steps that occur during the automated install.
Step 1: Spin up EC2 instances where HDP and Metron will be installed and deployed. Components deployed: 10 m4.xlarge instances.
Step 2: Spin up an AWS VPC. Components deployed: 1 AWS VPC.
Step 3: Install the Ambari server and agents via Ansible scripts. Components deployed: Ambari Server 2.1.2.1 on the master node; Ambari agents on the slave nodes.
Step 4: Using Ambari Blueprints and APIs, install a 7-node HDP 2.3 cluster with the following services: HDFS, YARN, Zookeeper, Storm, HBase, and Kafka. The blueprint used to deploy the HDP cluster can be found here: Metron Small Cluster Ambari Blueprint. Components deployed: 7-node HDP cluster with HDFS, YARN, Zookeeper, Storm, HBase, and Kafka.
Step 5: Install a 2-node Elasticsearch cluster. Components deployed: 2-node ES 1.7 cluster.
Step 6: Install and start the following data source probes: Bro, Snort, the PCAP probe, and YAF (netflow). This entails installing and starting the C++ PCAP probe that captures PCAP data and pushes it into a Kafka topic; installing and starting the YAF probe to capture netflow data; installing Bro and the Kafka Bro plugin and starting these services; and installing and starting Snort with community Snort rules configured. Components deployed: C++ PCAP probe; YAF/netflow probe; Bro server and Bro Kafka plugin; Snort server.
Step 7: Deploy 5 Metron Storm topologies: 4 parser topologies, one for each supported data source (PCAP, Bro, YAF, Snort), and 1 common enrichment topology. Components deployed: 5 Storm topologies.
Step 8: Configure Kafka topics and HBase tables.
Step 9: Install MySQL to store GeoIP enrichment data. The MySQL DB will be populated with GeoIP information from MaxMind GeoLite. Components deployed: MySQL with GeoIP information.
Step 10: Install a Metron UI for the SOC analyst and investigator personas. Components deployed: Metron UI (Kibana dashboard).
Deployment Architecture After Install
The installer will take about 60-90 minutes to execute fully; however, this can vary drastically based on how AWS is feeling during the execution. After the installer finishes, the deployment architecture of the app will look like the following.
Metron Storm Topology Refactor / Re-Architecture
Another area of focus for Metron TP1 was to address the following challenges with the old OpenSoc topology architecture:
Code was extremely brittle Storm topologies were designed without taking advantage of full parallelism Numerous “redundant” topologies Management of the app was difficult due to the number of complex topologies Very complex to add new data sources to the platform Very little unit and integration testing Key re-architecture and refactoring work done in TP1 to address these challenges included the following:
Made the Metron code base simpler and easier to maintain by converting all Storm topologies to use Flux configuration (a declarative way to wire topologies together). Ability to add new data source parsers without writing code using the Grok framework parser. Enrichment, model, and threat intel cross-referencing are now done in parallel, as opposed to sequentially, in the Storm topology Minimized the incremental cost of adding new topologies by having one common enrichment topology for all data sources All app configuration is stored in Zookeeper, allowing one to manage app config at runtime without stopping the topology Improved code quality with new unit and integration test harness utilities Old OpenSoc Architecture In the old OpenSoc architecture, some key limitations were the following:
For every new data source, a new complex Storm topology had to be added Each enrichment, threat intel reference, and model execution was done sequentially No in-memory caching for enrichments or threat intel checks No loader frameworks to load the enrichment or threat intel stores The below diagram illustrates the old architecture. New Metron Architecture With the new Metron architecture, the key changes are:
Adding a new data source means simply adding a new normalizing/parser topology 1 common enrichment topology can be used for all data sources Using the splitter/joiner pattern, enrichment/model/threat intel execution is done in parallel Loader frameworks have been added to load the enrichment and threat intel stores A fast cache has been added for enrichment and threat intel lookups The below diagram illustrates the new architecture. Telemetry Data Source Theme Key Features PCAP - Packet Capture PCAP represents the most granular data collected in Metron, consisting of individual packets and frames. Metron uses DPDK, which provides a set of libraries and drivers for fast packet collection and processing. See the following for more details: Metron Packet Capture Probe Design YAF/Netflow Netflow data represents PCAP data rolled up to the flow/session level: a summary of the sequence of packets between two machines up to the layer 4 protocol. If one doesn’t want to ingest PCAP due to space constraints and the load exerted on infrastructure, then netflow is recommended. Metron uses YAF (Yet Another Flowmeter) to generate IPFIX (Netflow) data from Metron’s PCAP probe. Hence the output of the YAF probe is IPFIX instead of the raw packets. See the following for more details: Metron YAF Capture Design Bro Bro is an IDS (Intrusion Detection System), but Metron uses Bro primarily as a Deep Packet Inspection (DPI) metadata generator. The metadata consists of network activity details up to layer 7, the application-level protocols (DNS, HTTP, FTP, SSH, SSL). Extracting DPI metadata (layer 7 visibility) is expensive and thus is performed only on selected protocols. Hence, the recommendation is to turn on DPI for the HTTP and DNS protocols. So, while the PCAP probe records every single packet it sees on the wire, the DPI metadata is extracted only for a subset of these packets. This metadata is some of the most valuable network data for analytics. 
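The splitter/joiner pattern described above can be sketched as fanning one telemetry event out to independent enrichment tasks and joining the partial results back into the message. This is a minimal Python illustration of the idea, not Metron's Storm implementation; the three lookup functions and their return values are hypothetical stand-ins for the real GeoIP, host, and threat intel bolts:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real enrichment/threat-intel lookups.
def geo_lookup(event):
    return {"geo.city": "Palo Alto"}

def host_lookup(event):
    return {"host.type": "webserver"}

def threat_intel_lookup(event):
    return {"threatintel.hit": False}

def enrich(event, enrichments):
    """Splitter/joiner: run all enrichments in parallel, then merge (join)
    the partial results back into a copy of the original event."""
    enriched = dict(event)
    with ThreadPoolExecutor(max_workers=len(enrichments)) as pool:
        # "Split": each enrichment runs concurrently on the same event.
        for partial in pool.map(lambda fn: fn(event), enrichments):
            # "Join": merge each partial result into the outgoing message.
            enriched.update(partial)
    return enriched

event = {"ip_src_addr": "10.0.0.5"}
enriched = enrich(event, [geo_lookup, host_lookup, threat_intel_lookup])
```

The design point is that total latency is bounded by the slowest single enrichment rather than the sum of all of them, which is why the refactor replaced the old sequential chain.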
See the following for more details: Metron Bro Capture Design Snort Snort is a popular Network Intrusion Prevention System (NIPS). Snort monitors network traffic and produces alerts based on signatures from community rules. Metron plays the output of the packet capture probe to Snort, and whenever Snort alerts are triggered, Metron uses Apache Flume to pipe these alerts to a Kafka topic. See the following for more details: Metron Snort Capture Design Why are these Network Telemetry Sources Important? A common question is why we focused first on this initial set of network telemetry data sources. Keep in mind that the end vision of Apache Metron is to be an analytics platform. These 4 network telemetry data sources are some of the key data sources required for the next generation ML, MLP, and statistical models that we are planning to build in future releases. The below table describes some of these models and their data input requirements. Analytics Pack Analytics Pack Description Telemetry Data Source Required Domain Pack A collection of machine learning models that identify anomalies for incoming and outgoing connections made to a specific domain that appear to be malicious Bro UEBA Pack A collection of machine learning models that monitor assets and users known to be legitimate to identify anomalies from their normal behavior. Bro User Enrichment Asset Enrichment User Auth Logs Asset Inventory Logs Relevancy/Correlation Engine Pack A collection of machine learning models that identify alerts that are related within the massive volumes of alerts being processed by the cyber solutions. Snort Suricata Third Party Alerts Protocol Anomaly Pack A collection of machine learning models that identify whether there is anything unusual about network traffic monitored via deep packet inspection (PCAP) PCAP YAF/Netflow Bro The system is configurable so that one can enable only the data sources of interest. 
In future Metron tech previews, we will be adding support for these types of security data sources:
FireEye Palo Alto Network Active Directory BlueCoat SourceFire Bit9 CarbonBlack Lancope Cisco ISE Real-time Data Processing Theme Key Features Enrichment Services The below diagram illustrates the Enrichment framework that was built in Metron TP1. The key components of the framework are:
Enrichment Loader Framework - A framework that bulk loads or polls data from an enrichment source. The framework supports plugging in any enrichment source Enrichment Store - The store where all enrichment data is stored. HBase will be the primary store. The store will also provide services to de-dup and age data. Enrichment Bolt - A Storm bolt that enriches Metron telemetry events Enrichment Cache - A cache used by the bolt so that lookups to the enrichment store are cached The specific enrichments supported in Metron TP1 are below.
Enrichment Description Enrichment Source, Store, Loader Type, Refresh Rate Metron Message Field Name that will be Enriched GeoIP Tags GeoIP data (lat-lon coordinates + City/State/Country) onto any external IP address. This can be applied both to alerts as well as metadata telemetries to be able to map them to a geo location. Enrich Source: Maxmind Geolite Metron Store: MySQL (Will use HBase in next TP) Loader Type: Bulk load from HDFS Refresh Rate: Every 3 months Src_ip, dest_ip Host Enriches IP with host details Enrich Source: Enterprise Inventory/Asset Store Metron Store: HDFS Loader Type: Bulk load from HDFS dest_ip More details can be found here: Metron Enrichment Services Threat Intel Services The Threat Intel framework is very similar to the Enrichment framework. See the below architecture diagram. The specific threat intel services supported in TP1 are below.
Threat Feed Feed Description Feed Format Refresh Rate Soltra Threat intel aggregator Stix/Taxii Poll every 5 minutes Hail a Taxii Repository of open source cyber threat intelligence feeds in STIX format. Stix/Taxii Poll every 5 minutes More details can be found here: Metron Threat Intel Services
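Conceptually, the threat intel check described in the head of this series is a key-value lookup: the indicator (here an IP) is the key, and a hit attaches the feed's JSON description to the event and raises an alert flag. The sketch below uses an in-memory dict as a stand-in for the HBase-backed store, and the store contents and field names are invented for illustration:

```python
# Hypothetical in-memory stand-in for the HBase-backed threat intel store:
# key = indicator, value = JSON-style description of the indicator.
THREAT_INTEL_STORE = {
    "198.51.100.7": {"source": "example-feed", "type": "malicious_ip"},
}

def check_threat_intel(event: dict, fields=("src_ip", "dest_ip")) -> dict:
    """Look up each indicator field against the store; on any hit,
    flag the event as an alert and attach the feed's description."""
    tagged = dict(event)
    for field in fields:
        hit = THREAT_INTEL_STORE.get(event.get(field, ""))
        if hit is not None:
            tagged["is_alert"] = True
            tagged[f"threatintel.{field}"] = hit
    return tagged

alerted = check_threat_intel({"src_ip": "198.51.100.7", "dest_ip": "10.0.0.1"})
clean = check_threat_intel({"src_ip": "10.0.0.2", "dest_ip": "10.0.0.1"})
```

Events with no matching indicator pass through untouched, which is why the lookup must be cheap (cached, key-value) relative to the full event rate.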
04-06-2016
12:48 AM
9 Kudos
Metron TP1 Features The following are the key capabilities available in Metron TP1, broken up across its four key functional themes. How do I get Started? You can spin up Metron TP1 in two ways:
Ansible based Vagrant Single Node VM Install
This is the best place to play with Metron first. Detailed instructions on how to do the install can be found in the following HCC article: Apache Metron TP 1 Install Instructions- Single Node Vagrant Deployment
Fully Automated 10 Node Ansible Based Install on AWS using Ambari Blueprints and AWS APIs
If you want a more realistic setup of the Metron app, use this approach. Keep in mind that this install will spin up 10 m4.xlarge EC2 instances by default Detailed instructions on how to do the install can be found in the following HCC article: Apache Metron - First Steps in the Cloud Where do I get Help? Hortonworks has created a new track called CyberSecurity in the Hortonworks Community Connection (HCC). The link to this new track in HCC is the following: HCC CyberSecurity Track. Apache Metron committers are subscribed to this track and are constantly monitoring it for any questions the community has on TP1. When asking a question about Metron TP1, please select the “CyberSecurity” track and add the following tags: “Metron” and “tech-preview”. Platform Theme Features of Metron TP1 The below is a summary of the key platform features added in TP1: Feature Related Apache Metron JIRAS Support for HDP 2.3 Refactor Metron Topologies for Performance, Easier Manageability & Supportability METRON-56 METRON-33 Fully Automated Install of Metron on AWS on a multi-node HDP cluster via Ansible scripts, Ambari blueprints, and APIs. METRON-59 METRON-77 METRON-76 METRON-69 METRON-63 METRON-61 METRON-43 METRON-2 Single Node Vagrant Support for Metron for Development METRON-21 Unit and Integration Testing Frameworks, Code Test Coverage METRON-82 METRON-58 METRON-37 METRON-28 Telemetry Data Source Theme Features of Metron TP1 Metron TP1 focuses on network telemetry data sources, as described below. They represent the most valuable granular data one can collect and perform next generation analytics on. 
The key data collection features for Metron TP1 are the following: Feature Related Apache Metron JIRAS PCAP Ingest Data Services - Performant C++ probe that captures network packets, streams them into Kafka, and bulk loads them into Metron METRON-79 METRON-79 METRON-73 METRON-55 METRON-39 YAF/Netflow Ingest Data Services - Ingests netflow data into Metron METRON-67 METRON-60 Bro Ingest Data Services - Custom BRO plugin that pushes DPI (Deep Packet Inspection) metadata into Metron METRON-25 METRON-73 METRON-64 Snort Ingest Data Services - Streams Snort-generated alerts via Flume into Metron METRON-57 Grok Framework - Ability to add new data sources to Metron without writing new parsing topologies. For each new data source, a Grok expression file can be provided to normalize events into the Metron event format. METRON-66 Real-time Data Processing Theme Features of Metron TP1 For this theme, the key features in Metron TP1 are the following: Feature Related Apache Metron JIRAS Enrichment Services - Out-of-the-box support for GeoIP and Host enrichments, an extensible framework to plug in new enrichments, & management utilities for enrichment data METRON-32 METRON-43 Threat Intel Services - Integration with Soltra (threat intel aggregator) and Hail a Taxii, management utilities for threat intel (streaming and bulk load, aging out of data) METRON-35 METRON-50 Alerting Services - Alerts can be fired via a Snort event or a threat intel feed hit Indexing Services - Support for indexing via ElasticSearch METRON-36 METRON-56 METRON-66 Storage Services - Persisting all enriched telemetry data in HDFS and/or HBase METRON-62 METRON-22 UI Theme Features of Metron TP1 There was less focus on the UI theme, but Metron TP1 does provide the following new UI features: Feature Related Apache Metron JIRAS Metron Investigator IO Dashboard for the SOC Analyst and Investigator personas, built on top of Kibana METRON-72 METRON-77 METRON-81 Histogram panels for each of the data sources (YAF, Bro, Snort, PCAP) METRON-60 
METRON-52 PCAP panel allows you to search for and download PCAP files METRON-72 METRON-77 METRON-81 Ability to customize the Metron UI with different data sources and different panel types. METRON-72 METRON-77 METRON-81
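The Grok Framework feature listed above means a new data source needs only a pattern file, not a new parsing topology. The idea can be illustrated with a small sketch: a Grok-style named-group pattern turns a raw Squid proxy log line (the example data source from the head of this series) into a normalized JSON event. The pattern and field names here are illustrative stand-ins, not Metron's actual Grok patterns:

```python
import json
import re

# Hypothetical Grok-style pattern for a Squid access log line of the form:
# "<timestamp> <elapsed> <src_ip> <action>/<code> <bytes> <method> <url>"
SQUID_PATTERN = re.compile(
    r"(?P<timestamp>\d+\.\d+)\s+(?P<elapsed>\d+)\s+(?P<ip_src_addr>\S+)\s+"
    r"(?P<action>\S+)/(?P<code>\d+)\s+(?P<bytes>\d+)\s+(?P<method>\S+)\s+(?P<url>\S+)"
)

def parse_squid(line: str) -> dict:
    """Normalize a raw Squid log line into a flat JSON-style event."""
    match = SQUID_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match Squid pattern")
    event = match.groupdict()
    event["original_string"] = line  # keep the raw message as an evidentiary field
    return event

line = "1461576382.642 161 127.0.0.1 TCP_MISS/200 103701 GET http://www.cnn.com/"
event = parse_squid(line)
print(json.dumps(event, indent=2))
```

Adding another data source then amounts to supplying another pattern, which is exactly the cost reduction the Grok Framework targets.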
04-05-2016
11:04 PM
3 Kudos
Metron User Personas There are six user personas for Metron: Persona Name Description SOC Analyst Profile: Beginner, Junior-level analyst Tools Used: SIEM tools/dashboards, Security endpoint UIs, Email/Ticketing/Workflow Systems
Responsibilities: Monitor security SIEM tools; search/investigate breaches and malware; review alerts and determine whether to escalate them as tickets or filter them out; follow security playbooks; investigate script-kiddie attacks. SOC Investigator Profile: More advanced SME in cybersecurity, experienced security analyst, understands more advanced features of security tools, thorough understanding of networking and platform architecture (routers, switches, firewalls, security), ability to dig through and understand various logs (network, firewall, proxy, app, etc.)
Tools Used: SIEM/Security tools, Scripting languages, SQL, command line
Responsibilities: Investigate more complicated/escalated alerts; investigate breaches; take the necessary steps to remove/quarantine the malware, breach, or infected system; hunt for malware attacks; investigate more complicated attacks like APTs (Advanced Persistent Threats) SOC Manager Profile: Experience managing teams, security practitioner that has moved into management.
Tools Used: Workflow Systems (e.g: Remedy, JIRA), Ticket/Alerting Systems
Responsibilities: Assigns Metron cases to analysts. Verifies “completed” Metron cases. Forensic Investigator Profile: E-discovery experience with a security background.
Tools Used: SIEM and e-discovery tools
Responsibilities: Collect evidence on a breach/attack incident, prepare the lawyer’s response to the breach. Security Platform Operations Engineer Profile: Computer Science, developer, and/or DevOps background. Experience with Big Data technologies and supporting distributed applications/systems
Tools Used: Security Tools (SIEM, endpoint solutions, UEBA solutions), provisioning, management and monitoring tooling, various programming languages, Big Data and distributing computing platforms.
Responsibilities: Helps vet different security tools before bringing them into the enterprise. Establishes best practices and reference architectures with respect to provisioning, management, and use of the security tools; configures the system with respect to deployment, monitoring, etc. Maintains the probes to collect data, enrichment services, loading enrichment data, managing threat feeds, etc. Provides care and feeding of one or more point security solutions. Does capacity planning, system maintenance, and upgrades. Security Data Scientist Profile: Computer Science / Math background, security domain experience; digs through as much data as is available, looks for patterns, and builds models
Tools Used: Python (scikit-learn, Python Notebook), R, RStudio, SAS, Jupyter, Spark (SparkML)
Responsibilities: Works with security data, performing data munging, visualization, plotting, exploration, and feature engineering and generation; trains, evaluates, and scores models Why Metron? SOC Analyst & Investigator Perspective The above diagram illustrates the key steps in a typical analyst/investigator workflow. For certain steps in this workflow, Apache Metron provides key capabilities not found in traditional security tools: Looking through Alerts
Centralized Alerts Console - Having a centralized dashboard for alerts and the telemetry events associated with the alert across all security data sources in your enterprise is a powerful feature within Metron that prevents the Analyst from jumping from one console to another. Meta Alerts - The long term vision of Metron is to provide a suite of analytical models and packs including Alerts Relevancy Engine and Meta-Alerts. Meta Alerts are generated by groupings or analytics models and provide a mechanism to shield the end user from being inundated with 1000s of granular alerts. Alerts labeled with threat intel data - Viewing alerts labeled with threat intel from third party feeds allows the analyst to decipher more quickly which alerts are legitimate vs false positives. Collecting Contextual data
Fully enriched messages - Analysts spend a lot of time manually enriching the raw alerts or events. With Metron, analysts work with the fully enriched message. Single Pane of Glass UI - A single pane of glass that not only has all alerts across different security data sources but also provides the same view of the enriched data Centralized real-time search - All alerts and telemetry events are indexed in real-time. Hence, the analyst has immediate access to search for all events. All logs in one place - All events, with their enrichments and labels, are stored in a single repository. Investigate
Granular access to PCAP - After identifying a legitimate threat, more advanced SOC investigators want the ability to download the raw packet data that caused the alert. Metron provides this capability. Replay old PCAP against new signatures - Metron can be configured to store raw pcap data in Hadoop for a configurable period of time. This corpus of pcap data can then be replayed to test new analytical models and new signatures. Tag Behavior for modeling by data scientists Raw messages used as evidentiary store Asset inventory and User Identity as enrichment sources. Note that the above 3 steps in the analyst workflow make up approximately 70% of the time. Metron will drastically decrease the analyst workflow time spent because everything the SOC analyst needs to know is in a single place. Why Metron? Data Scientist Perspective The above diagram illustrates the key steps in a typical data science workflow. For certain steps in this workflow, Apache Metron provides key capabilities not found in traditional security tools: Finding the data
All my data is in the same place - One of the biggest challenges faced by security data scientists is finding the data required to train, evaluate, and score models. Metron provides a single repository where the enterprise’s security telemetry data are stored. Data exposed through a variety of APIs - The Metron security vault/repository provides different engines to access and work with the data, including SQL, scripting languages, in-memory, Java, Scala, key-value columnar, REST APIs, user portals, etc. Standard Access Control Policies - All data stored in the Metron security vault is secured via Apache Ranger through access policies at the file system level (HDFS) and at the processing engine level (Spark, Hive, HBase, Solr, etc.) Cleaning the data Metron normalizes telemetry events - As discussed in the first blog, where we traced an event being processed by the platform, Metron normalizes all telemetry data into at least a standard 7-tuple JSON structure, allowing data scientists to find and correlate data more easily. Partial schema validation on ingest - The Metron framework will validate data on ingest and will filter out bad data automatically, which is something that data scientists traditionally spend a lot of time doing. Munging Data Automatic data enrichment - Typically, data scientists have to manually enrich data to create and test features, or have to work with the data/platform team to do so. With Metron, events are enriched in real-time as they come in, and the enriched events are stored in the Metron security vault. Automatic application of class labels - Different types of metadata (threat intel information, etc.) are tagged onto the event, which allows data scientists to create feature matrices for models more easily. Massively parallel computation framework - All the cleaning and munging of the data is done using distributed technologies that allow the processing of these high-velocity, large-volume feeds to be performant and scalable. 
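The standard 7-tuple normalization mentioned above can be sketched as projecting a parser's raw output onto a fixed set of fields while keeping source-specific extras alongside. The field names below are the ones commonly associated with Metron's standard tuple, but treat them as an assumption of this sketch rather than a specification:

```python
# The seven standard fields (names shown here are an assumption of this sketch):
SEVEN_TUPLE = ("ip_src_addr", "ip_dst_addr", "ip_src_port",
               "ip_dst_port", "protocol", "timestamp", "original_string")

def normalize(raw: dict, original_line: str) -> dict:
    """Project a parser's raw output onto the standard 7-tuple, keeping extras."""
    # Every event carries all seven standard fields (None if absent).
    event = {field: raw.get(field) for field in SEVEN_TUPLE}
    event["original_string"] = original_line  # raw message kept for evidence
    # Source-specific fields ride along next to the standard tuple.
    event.update({k: v for k, v in raw.items() if k not in SEVEN_TUPLE})
    return event

raw = {"ip_src_addr": "10.0.0.5", "ip_dst_addr": "93.184.216.34",
       "ip_src_port": 51234, "ip_dst_port": 80, "protocol": "TCP",
       "timestamp": 1461576382, "method": "GET"}
event = normalize(raw, "raw log line here")
```

Because every source lands in the same shape, a data scientist can join Bro, YAF, and Snort events on the shared tuple fields without per-source glue code.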
Visualizing Data Real-time search + UI - Metron indexes all events and alerts and provides a UI dashboard to perform real-time search. Apache Zeppelin Dashboards - Out-of-the-box Zeppelin dashboards will be available that can be used by SOC analysts. With Zeppelin you can share dashboards, substitute variables, and quickly change graph types. An example of a dashboard would be to show all HTTP calls that resulted in 404 errors, visualized as a bar graph ordered by the number of failures. Integration with Jupyter - Jupyter notebooks will be provided to data scientists for common tasks such as exploration, visualization, plotting, evaluating features, etc. Note that the above 4 steps in the data science workflow make up approximately 80% of the time. Metron will drastically reduce the time from hypothesis to model for the data scientist. Apache Metron Core Functional Themes Now that we have an understanding of Metron’s user personas, we will describe the four core functional themes that Metron will focus on. As the community around Metron continues to grow, new features and enhancements will be prioritized across these four themes. The 4 core functional themes are the following: Apache Metron Release 0.1 and its Target Personas and Themes Over the last 4 months, the community, led by Hortonworks, has been hard at work on Apache Metron’s first release (Metron 0.1). Now that we have described the user personas and core themes for Metron, the following depicts where the engineering focus has been for Metron 0.1. As the diagram above illustrates, the key focus areas for Metron 0.1 are the following:
The Platform theme was the primary focus. Before we can focus on the UI and on supporting more telemetry data sources, we need to ensure that the platform is rock solid. This means ensuring an easy way to provision this very complex app; refactoring/re-architecting to make the code simpler and easier to maintain; adding new data sources in a declarative manner; performance and extensibility improvements; and improving the quality of the code. The persona of focus is the Security Platform Engineer. Metron 0.1 offers dashboard views for the SOC Analyst and SOC Investigator.