Member since
08-31-2015
81
Posts
115
Kudos Received
17
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3042 | 03-22-2017 03:51 PM
 | 1860 | 05-04-2016 09:34 AM
 | 1440 | 03-24-2016 03:07 PM
 | 1603 | 03-24-2016 02:54 PM
 | 1535 | 03-24-2016 02:47 PM
03-22-2017
04:19 PM
@Satish Duggana can you please help answer this question?
03-22-2017
03:51 PM
One other tip: if you want to see which jars/classes are used for each of the processors in SAM, select Settings --> Component Definition, then select Edit under Actions for the processor you are interested in. You will then see the details of the processor...
03-22-2017
03:28 PM
1 Kudo
Hi @Eric Brosch, SAM (formerly known as StreamLine) uses the Storm PMML bolt integration (https://github.com/apache/storm/tree/d5acec9e3b9473a0e8cf39c7e12393626a3ca426/external/storm-pmml), which in turn uses the JPMML evaluator (https://github.com/jpmml/jpmml). @Sriharsha Chintalapani
01-16-2017
03:23 PM
@Satish Duggana --> Thoughts?
05-05-2016
11:48 PM
2 Kudos
In the previous article of the series, Enriching Telemetry Events, we walked through how to enrich the domain element of a telemetry event with WhoIs data such as the home country and the company associated with the domain. In this article, we will enrich with a special type of data called threat intel feeds. When a telemetry event matches data in a threat intel feed, an alert is generated. Again, the customer's requirements are the following:
1. The proxy events from Squid logs need to be ingested in real-time.
2. The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
3. In real-time, the Squid proxy event needs to be enriched so that the domain names are enriched with the IP information.
4. In real-time, the IP within the proxy event must be checked against threat intel feeds.
5. If there is a threat intel hit, an alert needs to be raised.
6. The end user must be able to see the new telemetry events and the alerts from the new data source.
7. All of these requirements will need to be implemented easily without writing any new Java code.
In this article, we will walk you through how to do 4 and 5.
Threat Intel Framework Explained
Metron currently provides an extensible framework to plug in threat intel sources. Each threat intel source has two components: an enrichment data source and an enrichment bolt. The threat intelligence feeds are bulk loaded and streamed into a threat intelligence store, similarly to how the enrichment feeds are loaded. The data is loaded in a key-value format: the key is the indicator and the value is the JSON-formatted description of what the indicator is. It is recommended to use a threat feed aggregator such as Soltra to dedup and normalize the feeds via Stix/Taxii. Metron provides an adapter that is able to read Soltra-produced Stix/Taxii feeds and stream them into HBase, which is the data store of choice to back Metron's high-speed threat intel lookups. Metron additionally provides a flat file and Stix bulk loader that can normalize, dedup, and bulk load or stream threat intel data into HBase even without the use of a threat feed aggregator. The below diagram illustrates the architecture:
Step 1: Threat Intel Feed Source
Metron is designed to work with Stix/Taxii threat feeds, but can also be bulk loaded with threat data from a CSV file. In this example we will explore the CSV option.
The same loader framework that is used for enrichment is used here for threat intelligence. As with enrichments, we need to set up a CSV data file, the extractor config JSON, and the enrichment config JSON. For this example we will be using a Zeus malware tracker list located here: https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist.
Copy the data from the above link into a file called domainblocklist.txt on your VM. Run the following command to parse that file into a CSV file called domainblocklist.csv:
cat domainblocklist.txt | grep -v "^#" | grep -v "^$" | grep -v "^https" | awk '{print $1",abuse.ch"}' > domainblocklist.csv
Now that we have the "Threat Intel Feed Source", we need to configure an extractor config file that describes the source. Create a file called extractor_config_temp.json and put the following contents in it.
{
"config" : {
"columns" : {
"domain" : 0
,"source" : 1
}
,"indicator_column" : "domain"
,"type" : "zeusList"
,"separator" : ","
}
,"extractor" : "CSV"
}
To remove the non-ASCII characters, run the following:
iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
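For readers who prefer to avoid the grep/awk pipeline from Step 1, the same filtering can be sketched in Python. This is only an illustration: the function name is hypothetical, and unlike the shell pipeline it also skips whitespace-only lines.

```python
def blocklist_to_csv_lines(lines, source="abuse.ch"):
    """Turn raw blocklist lines into 'domain,source' CSV rows.

    Mirrors the shell pipeline: drop comment lines (starting with '#'),
    empty lines, and lines starting with 'https', then keep the first
    whitespace-separated token and append the source name.
    """
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#") or line.startswith("https"):
            continue
        tokens = line.split()
        if not tokens:
            # Empty or whitespace-only line (the shell version only drops empty ones).
            continue
        rows.append(f"{tokens[0]},{source}")
    return rows


if __name__ == "__main__":
    # File names follow the article; adjust the paths as needed.
    with open("domainblocklist.txt", encoding="utf-8") as src:
        csv_rows = blocklist_to_csv_lines(src)
    with open("domainblocklist.csv", "w", encoding="ascii") as dst:
        dst.write("\n".join(csv_rows) + "\n")
```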
Step 2: Configure Element to Threat Intel Feed Mapping
We now have to configure which element of a tuple to cross-reference with which threat intel feed. This configuration will be stored in Zookeeper. The config looks like the following:
{
"zkQuorum" : "node1:2181"
,"sensorToFieldList" : {
"bro" : {
"type" : "THREAT_INTEL"
,"fieldToEnrichmentTypes" : {
"url" : [ "zeusList" ]
}
}
}
}
Cut and paste this file into a file called "enrichment_config_temp.json" on the virtual machine. Because copying and pasting from this blog will include some invisible non-ASCII characters, strip them out by running:
iconv -c -f utf-8 -t ascii enrichment_config_temp.json -o enrichment_config.json
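If iconv is not available, the same clean-up can be approximated in Python. The sketch below drops every non-ASCII character (similar in spirit to iconv with -c) and then parses the result as JSON, so a config that is still broken fails loudly. The function names are hypothetical.

```python
import json


def strip_non_ascii(text):
    """Drop every character that cannot be represented in ASCII."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def clean_config(text):
    """Strip non-ASCII characters, then parse the result as JSON.

    json.loads raises ValueError if the cleaned text is still not valid
    JSON (for example when a curly quote replaced a straight quote and
    was dropped), which is a useful signal before running the loader.
    """
    return json.loads(strip_non_ascii(text))


if __name__ == "__main__":
    with open("enrichment_config_temp.json", encoding="utf-8") as src:
        config = clean_config(src.read())
    with open("enrichment_config.json", "w", encoding="ascii") as dst:
        json.dump(config, dst, indent=2)
```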
Step 3: Run the Threat Intel Loader
Now that we have the threat intel source and threat intel config defined, we can run the loader to move the data from the threat intel source to the Metron threat intel store and store the enrichment config in Zookeeper.
/usr/metron/0.1BETA/bin/flatfile_loader.sh -n enrichment_config.json -i domainblocklist.csv -t threatintel -c t -e extractor_config.json
After this, the threat intel data will be loaded in HBase and a Zookeeper mapping will be established. The data will be populated into an HBase table called threatintel. To verify that the data was properly ingested into HBase, run the following commands:
hbase shell
scan 'threatintel'
You should see the table bulk loaded with data from the CSV file. Now check if the Zookeeper enrichment tag was properly populated:
/usr/metron/0.1BETA/bin/zk_load_configs.sh -z localhost:2181
Generate some data by using the squid client to execute HTTP requests (do this about 20 times):
squidclient http://www.alamman.com
squidclient http://www.atmape.ru
View the Threat Alerts in Metron UI
When the logs are ingested, we get messages that have a hit against threat intel:
Notice a key characteristic of this message: it has is_alert=true, which designates it as an alert message.
Now that we have alerts coming through, we need to visualize them in Kibana. First, we need to set up a pinned query to look for messages where is_alert=true:
Then, once we point the alerts table to this pinned query, it looks like this:
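The cross-referencing that this article configures can be pictured with a small Python sketch. It is not Metron's implementation: the in-memory dictionary stands in for the HBase threatintel table, the function name and the threat_intel_hits field are hypothetical, and the sample indicator is just one of the domains used above.

```python
def apply_threat_intel(message, sensor, sensor_to_fields, intel_store):
    """Return a copy of the message, flagged when a configured field hits.

    sensor_to_fields maps a sensor name to {field: [threat intel types]},
    like the fieldToEnrichmentTypes section of the config above.
    intel_store maps (type, indicator) to a description of the indicator.
    """
    flagged = dict(message)
    hits = []
    for field, intel_types in sensor_to_fields.get(sensor, {}).items():
        indicator = message.get(field)
        for intel_type in intel_types:
            if (intel_type, indicator) in intel_store:
                hits.append(f"{intel_type}:{indicator}")
    if hits:
        flagged["is_alert"] = "true"
        flagged["threat_intel_hits"] = hits  # hypothetical field name
    return flagged


# Sample data for illustration only.
store = {("zeusList", "atmape.ru"): {"source": "abuse.ch"}}
mapping = {"squid": {"url": ["zeusList"]}}
print(apply_threat_intel({"url": "atmape.ru"}, "squid", mapping, store))
```

A message whose url is not in the store comes back unchanged, without an is_alert field.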
05-04-2016
09:34 AM
1 Kudo
@Ryan Cicak good question. @drussell is correct. For Metron TP1, to avoid using another m4.xlarge EC2 instance and to give more resources to the other services, we chose to use only 1 Zookeeper. In production, we would have at least 3 for the quorum. Given that we are going to use Zookeeper for managing Metron's own configs (enrichment config, threat intel config, etc.), and that future support for SOLR will also require Zookeeper, possibly more than 3 will be required.
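As a quick illustration of why 3 is the usual minimum: a Zookeeper ensemble stays available only while a strict majority of its servers is up. The arithmetic can be sketched as follows (function names are hypothetical).

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

A single node tolerates no failures, 3 nodes tolerate 1, and going from 3 to 4 adds nothing, which is why odd ensemble sizes are preferred.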
05-02-2016
05:22 PM
1 Kudo
In the previous article of the series, Adding a New Telemetry Data Source to Apache Metron, we walked through how to add a new data source, Squid, to Apache Metron. The inevitable next question is how to enrich the telemetry events in real-time as they flow through the platform. Enrichment is critical when identifying threats, or as we like to call it, "finding the needle in the haystack". The customer's requirements are the following:
1. The proxy events from Squid logs need to be ingested in real-time.
2. The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
3. In real-time, the Squid proxy event needs to be enriched so that the domain names are enriched with the IP information.
4. In real-time, the IP within the proxy event must be checked against threat intel feeds.
5. If there is a threat intel hit, an alert needs to be raised.
6. The end user must be able to see the new telemetry events and the alerts from the new data source.
7. All of these requirements will need to be implemented easily without writing any new Java code.
In this article, we will walk you through how to do 3.
Metron Enrichment Framework Explained
Step 1: Enrichment Source
Whois data is expensive so we will not be providing it. Instead we wrote a basic whois scraper (out of scope for this exercise) that produces whois data in CSV format as follows:
google.com, "Google Inc.", "US", "Dns Admin",874306800000
work.net, "", "US", "PERFECT PRIVACY, LLC",788706000000
capitalone.com, "Capital One Services, Inc.", "US", "Domain Manager",795081600000
cisco.com, "Cisco Technology Inc.", "US", "Info Sec",547988400000
cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000
news.com, "CBS Interactive Inc.", "US", "Domain Admin",833353200000
nba.com, "NBA Media Ventures, LLC", "US", "C/O Domain Administrator",786027600000
espn.com, "ESPN, Inc.", "US", "ESPN, Inc.",781268400000
pravda.com, "Internet Invest, Ltd. dba Imena.ua", "UA", "Whois privacy protection service",806583600000
hortonworks.com, "Hortonworks, Inc.", "US", "Domain Administrator",1303427404000
microsoft.com, "Microsoft Corporation", "US", "Domain Administrator",673156800000
yahoo.com, "Yahoo! Inc.", "US", "Domain Administrator",790416000000
rackspace.com, "Rackspace US, Inc.", "US", "Domain Admin",903092400000
Cut and paste this data into a file called "whois_ref.csv" on your virtual machine. This CSV file represents our enrichment source. The schema of this enrichment source is domain|owner|registeredCountry|registrar|registeredTimestamp. Make sure you don't have an empty line as the last line of the CSV file, as that will result in a null pointer exception.
We now need to configure an extractor config file that describes the enrichment source.
{
"config" : {
"columns" : {
"domain" : 0
,"owner" : 1
,"home_country" : 2
,"registrar": 3
,"domain_created_timestamp": 4
}
,"indicator_column" : "domain"
,"type" : "whois"
,"separator" : ","
}
,"extractor" : "CSV"
}
Please cut and paste this file into a file called "extractor_config_temp.json" on the virtual machine. Because copying and pasting from this blog will include some invisible non-ASCII characters, strip them out by running:
iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
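To make the role of the extractor config more concrete, here is a rough Python sketch of what a CSV extractor driven by that config could do: map column positions to names, use the indicator column as the key, and keep the rest as a JSON value. It only illustrates the idea and is not Metron's loader; the function name and the (type, indicator) key layout are assumptions.

```python
import csv
import io
import json

# Same structure as the extractor config shown above.
EXTRACTOR_CONFIG = {
    "config": {
        "columns": {
            "domain": 0,
            "owner": 1,
            "home_country": 2,
            "registrar": 3,
            "domain_created_timestamp": 4,
        },
        "indicator_column": "domain",
        "type": "whois",
        "separator": ",",
    },
    "extractor": "CSV",
}


def extract(csv_text, extractor_config):
    """Build {(type, indicator): JSON string} from CSV text."""
    cfg = extractor_config["config"]
    reader = csv.reader(
        io.StringIO(csv_text),
        delimiter=cfg["separator"],
        skipinitialspace=True,  # the sample rows have a space after each comma
    )
    store = {}
    for row in reader:
        if not row:
            # This sketch skips empty lines instead of failing on them.
            continue
        record = {name: row[index] for name, index in cfg["columns"].items()}
        indicator = record.pop(cfg["indicator_column"])
        store[(cfg["type"], indicator)] = json.dumps(record, sort_keys=True)
    return store


sample = 'cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000\n'
print(extract(sample, EXTRACTOR_CONFIG))
```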
Step 2: Configure Element to Enrichment Mapping
We now have to configure which element of a tuple should be enriched with which enrichment type. This configuration will be stored in Zookeeper. The config looks like the following:
{
"zkQuorum" : "node1:2181"
,"sensorToFieldList" : {
"squid" : {
"type" : "ENRICHMENT"
,"fieldToEnrichmentTypes" : {
"url" : [ "whois" ]
}
}
}
}
Cut and paste this file into a file called "enrichment_config_temp.json" on the virtual machine. Because copying and pasting from this blog will include some invisible non-ASCII characters, strip them out by running:
iconv -c -f utf-8 -t ascii enrichment_config_temp.json -o enrichment_config.json
Step 3: Run the Enrichment Loader
Now that we have the enrichment source and enrichment config defined, we can run the loader to move the data from the enrichment source to the Metron enrichment store and store the enrichment config in Zookeeper.
/usr/metron/0.1BETA/bin/flatfile_loader.sh -n enrichment_config.json -i whois_ref.csv -t enrichment -c t -e extractor_config.json
After this, your enrichment data will be loaded in HBase and a Zookeeper mapping will be established. The data will be populated into an HBase table called enrichment. To verify that the data was properly ingested into HBase, run the following commands:
hbase shell
scan 'enrichment'
You should see the table bulk loaded with data from the CSV file. Now check if the Zookeeper enrichment tag was properly populated:
/usr/metron/0.1BETA/bin/zk_load_configs.sh -z localhost:2181
Generate some data by using the squid client to execute HTTP requests (do this about 20 times):
squidclient http://www.cnn.com
View the Enrichment Telemetry Events in Metron UI
In order to demonstrate the enrichment capabilities of Metron, you need to drop all existing indexes for Squid where the data was ingested prior to enrichments being enabled. To do so, go back to the head plugin and delete the indexes like so:
Make sure you delete all Squid indexes.
Re-ingest the data (see previous blog post) and the messages should be automatically enriched.
In the Metron UI, refresh the dashboard and view the data in the Squid panel in the dashboard:
Notice the enrichments here (whois.owner, whois.domain_created_timestamp, whois.registrar, whois.home_country).
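The effect of the enrichment on a single message can be sketched in Python as follows. This is a simplified picture, not Metron's enrichment bolt: the dictionary stands in for the HBase enrichment table, the function name is hypothetical, and the output field names simply follow the whois.* names listed above.

```python
def enrich(message, sensor, sensor_to_fields, enrichment_store):
    """Return a copy of the message with '<type>.<key>' fields added.

    sensor_to_fields maps a sensor name to {field: [enrichment types]},
    like the fieldToEnrichmentTypes section of the enrichment config.
    enrichment_store maps (type, indicator) to a dict of values.
    """
    enriched = dict(message)
    for field, enrichment_types in sensor_to_fields.get(sensor, {}).items():
        indicator = message.get(field)
        for enrichment_type in enrichment_types:
            data = enrichment_store.get((enrichment_type, indicator))
            if data is None:
                continue
            for key, value in data.items():
                enriched[f"{enrichment_type}.{key}"] = value
    return enriched


# Values taken from the sample whois CSV in this article.
store = {
    ("whois", "cnn.com"): {
        "owner": "Turner Broadcasting System, Inc.",
        "home_country": "US",
        "registrar": "Domain Name Manager",
        "domain_created_timestamp": "748695600000",
    }
}
mapping = {"squid": {"url": ["whois"]}}
print(enrich({"url": "cnn.com", "method": "GET"}, "squid", mapping, store))
```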
05-02-2016
05:22 PM
3 Kudos
When adding a net new data source to Metron, the first step is to decide how to push the events from the new telemetry data source into Metron. You can use a number of data collection tools and that decision is decoupled from Metron. However, we recommend evaluating Apache Nifi as it is an excellent tool to do just that (this article uses Nifi to push data into Metron). The second step is to configure Metron to parse the telemetry data source so that downstream processing can be done on it. In this article we will walk you through how to perform both of these steps.
In the previous article of this blog series, we described the following set of requirements for Customer Foo, who wanted to add the Squid telemetry data source into Metron.
1. The proxy events from Squid logs need to be ingested in real-time.
2. The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
3. In real-time, the Squid proxy event must be enriched so that the domain names are enriched with the IP information.
4. In real-time, the IP within the proxy event must be checked against threat intel feeds.
5. If there is a threat intel hit, an alert needs to be raised.
6. The end user must be able to see the new telemetry events and the alerts from the new data source.
7. All of these requirements will need to be implemented easily without writing any new Java code.
In this article, we will walk you through how to perform steps 1, 2, and 6.
How to Parse the Squid Telemetry Data Source to Metron
The following steps guide you through how to add this new telemetry.
Step 1: Spin Up Single Node Vagrant VM
Download the code from https://github.com/apache/incubator-metron/archive/codelab-v1.0.tar.gz.
Untar the file (tar -zxvf incubator-metron-codelab-v1.0.tar.gz).
Navigate to the metron-platform directory (incubator-metron-codelab-v1.0/metron-platform) and build the package (mvn clean package -DskipTests=true).
Navigate to the codelab-platform directory: incubator-metron-codelab-v1.0/metron-deployment/vagrant/codelab-platform/
Follow the instructions here: https://github.com/apache/incubator-metron/tree/codelab-v1.0/metron-deployment/vagrant/codelab-platform. Note: The Metron Development Image is named launch_image.sh not launch_dev_image.sh.
Step 2: Create a Kafka Topic for the New Data Source
Every data source whose events you are streaming into Metron must have its own Kafka topic. The ingestion tool of choice (for example, Apache Nifi) will push events into this Kafka topic.
ssh to your VM
vagrant ssh
Create a Kafka topic called "squid" in the directory /usr/hdp/current/kafka-broker/bin/:
cd /usr/hdp/current/kafka-broker/bin/
./kafka-topics.sh --zookeeper localhost:2181 --create --topic squid --partitions 1 --replication-factor 1
List all of the Kafka topics to ensure that the new topic exists:
./kafka-topics.sh --zookeeper localhost:2181 --list
You should see the following list of Kafka topics:
bro
enrichment
pcap
snort
squid
yaf
Step 3: Install Squid
Install and start Squid:
sudo yum install squid
sudo service squid start
With Squid started, look at the different log files that get created:
sudo su -
cd /var/log/squid
ls
You see that there are three types of logs available: access.log, cache.log, and squid.out. We are interested in access.log because that is the log that records the proxy usage.
Initially the access.log is empty. Let's generate a few entries for the log, then list the new contents of the access.log. The "-h 127.0.0.1" indicates that the squidclient will only use the IPV4 interface.
squidclient -h 127.0.0.1 http://www.hostsite.com
squidclient -h 127.0.0.1 http://www.hostsite.com
cat /var/log/squid/access.log
In production environments you would configure your users' web browsers to point to the proxy server, but for the sake of simplicity in this tutorial we will use the client that is packaged with the Squid installation. After we use the client to simulate proxy requests, the Squid log entries should look as follows:
1461576382.642 161 127.0.0.1 TCP_MISS/200 103701 GET http://www.hostsite.com/ - DIRECT/199.27.79.73 text/html
1461576442.228 159 127.0.0.1 TCP_MISS/200 137183 GET http://www.hostsite.com/ - DIRECT/66.210.41.9 text/html
Using the Squid log entries, we can determine the format of the log entries, which is:
timestamp | time elapsed | remotehost | code/status | bytes | method | URL | rfc931 | peerstatus/peerhost | type
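As a quick sanity check of that format, the ten whitespace-separated fields can be labeled with a few lines of Python. The field names below are shorthand chosen for this sketch, not Metron field names.

```python
SQUID_FIELDS = [
    "timestamp", "elapsed", "remotehost", "code_status", "bytes",
    "method", "url", "rfc931", "peerstatus_peerhost", "type",
]


def split_squid_line(line):
    """Label the whitespace-separated fields of one access.log entry."""
    parts = line.split()
    if len(parts) != len(SQUID_FIELDS):
        raise ValueError(f"expected {len(SQUID_FIELDS)} fields, got {len(parts)}")
    return dict(zip(SQUID_FIELDS, parts))


entry = ("1461576382.642 161 127.0.0.1 TCP_MISS/200 103701 GET "
         "http://www.hostsite.com/ - DIRECT/199.27.79.73 text/html")
for name, value in split_squid_line(entry).items():
    print(f"{name}: {value}")
```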
Step 4: Create a Grok Statement to Parse the Squid Telemetry Event
Now we are ready to tackle the Metron parsing topology setup.
The first thing we need to do is decide if we will be using the Java-based parser or the Grok-based parser for the new telemetry. In this example we will be using the Grok parser. Grok parser is perfect for structured or semi-structured logs that are well understood (check) and telemetries with lower volumes of traffic (check).
Next we need to define the Grok expression for our log. Refer to Grok documentation for additional details. In our case the pattern is:
WDOM [^(?:http:\/\/|www\.|https:\/\/)]([^\/]+)
SQUID_DELIMITED %{NUMBER:timestamp} %{SPACE:UNWANTED} %{INT:elapsed} %{IPV4:ip_src_addr} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} http:\/\/\www.%{WDOM:url}\/ - %{WORD:UNWANTED}\/%{IPV4:ip_dst_addr} %{WORD:UNWANTED}\/%{WORD:UNWANTED}
Notice the WDOM pattern (that is more tailored to Squid instead of using the generic Grok URL pattern) before defining the Squid log pattern. This is optional and is done for ease of use. Also, notice that we apply the UNWANTED tag for any part of the message that we don't want included in our resulting JSON structure. Finally, notice that we applied the naming convention to the IPV4 field by referencing the following list of field conventions.
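For readers unfamiliar with Grok, the following Python regular expression is a rough approximation of what the SQUID_DELIMITED pattern extracts. It is not a faithful translation of the Grok pattern (in particular it does not reproduce WDOM exactly), but it uses the same field names and skips the parts tagged UNWANTED.

```python
import json
import re

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

# Approximation of SQUID_DELIMITED using named groups.
SQUID_RE = re.compile(
    r"(?P<timestamp>\d+(?:\.\d+)?)\s+"
    r"(?P<elapsed>\d+)\s+"
    rf"(?P<ip_src_addr>{IPV4})\s+"
    r"(?P<action>\w+)/(?P<code>\d+)\s+"
    r"(?P<bytes>\d+)\s+"
    r"(?P<method>\w+)\s+"
    r"http://www\.(?P<url>[^/]+)/\s+"
    r"-\s+"
    rf"\w+/(?P<ip_dst_addr>{IPV4})\s+"
    r"\w+/\w+"
)


def parse_squid_event(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = SQUID_RE.match(line)
    return match.groupdict() if match else None


entry = ("1461576382.642 161 127.0.0.1 TCP_MISS/200 103701 GET "
         "http://www.hostsite.com/ - DIRECT/199.27.79.73 text/html")
print(json.dumps(parse_squid_event(entry), indent=2))
```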
The last thing we need to do is to validate the Grok pattern to make sure it's valid. For our test we will be using a free Grok validator called Grok Constructor. A validated Grok expression should look like this:
Now that the Grok pattern has been defined, we need to save it and move it to HDFS. Create a file called "squid" in the /tmp directory and copy the Grok pattern into the file.
touch /tmp/squid
vi /tmp/squid
//copy the grok pattern above to the squid file
Now put the squid file into the directory where Metron stores its Grok parsers. Existing Grok parsers that ship with Metron are staged under /apps/metron/patterns/.
su - hdfs
hdfs dfs -put /tmp/squid /apps/metron/patterns/
exit
Step 5: Create a Flux configuration for the new Squid Storm Parser Topology
Now that the Grok pattern is staged in HDFS, we need to define the Storm Flux configuration for the Metron parsing topology. The configs are staged under /usr/metron/0.1BETA/config/topologies/ and each parsing topology has its own set of configs. Each directory for a topology has a remote.yaml, which is designed to be run on AWS, and a local/test.yaml, designed to run locally on a single-node VM. Since we are going to be running locally on a VM, we need to define a test.yaml for Squid. The easiest way to do this is to copy one of the existing Grok-based configs (YAF) and tailor it for Squid.
mkdir /usr/metron/0.1BETA/flux/squid
cp /usr/metron/0.1BETA/flux/yaf/remote.yaml /usr/metron/0.1BETA/flux/squid/remote.yaml
vi /usr/metron/0.1BETA/flux/squid/remote.yaml
Edit your config to look like this (replace yaf with squid and replace the constructorArgs section):
name: "squid"
config:
  topology.workers: 1
components:
  - id: "parser"
    className: "org.apache.metron.parsers.GrokParser"
    constructorArgs:
      - "/apps/metron/patterns/squid"
      - "SQUID_DELIMITED"
    configMethods:
      - name: "withTimestampField"
        args:
          - "timestamp"
  - id: "writer"
    className: "org.apache.metron.parsers.writer.KafkaWriter"
    constructorArgs:
      - "${kafka.broker}"
  - id: "zkHosts"
    className: "storm.kafka.ZkHosts"
    constructorArgs:
      - "${kafka.zk}"
  - id: "kafkaConfig"
    className: "storm.kafka.SpoutConfig"
    constructorArgs:
      # zookeeper hosts
      - ref: "zkHosts"
      # topic name
      - "squid"
      # zk root
      - ""
      # id
      - "squid"
    properties:
      - name: "ignoreZkOffsets"
        value: true
      - name: "startOffsetTime"
        value: -1
      - name: "socketTimeoutMs"
        value: 1000000
spouts:
  - id: "kafkaSpout"
    className: "storm.kafka.KafkaSpout"
    constructorArgs:
      - ref: "kafkaConfig"
bolts:
  - id: "parserBolt"
    className: "org.apache.metron.parsers.bolt.ParserBolt"
    constructorArgs:
      - "${kafka.zk}"
      - "squid"
      - ref: "parser"
      - ref: "writer"
streams:
  - name: "spout -> bolt"
    from: "kafkaSpout"
    to: "parserBolt"
    grouping:
      type: SHUFFLE
Step 6: Deploy the new Parser Topology
Now that we have the Squid parser topology defined, let's deploy it to our cluster.
Deploy the new squid parser topology:
sudo storm jar /usr/metron/0.1BETA/lib/metron-parsers-0.1BETA.jar org.apache.storm.flux.Flux --filter /usr/metron/0.1BETA/config/elasticsearch.properties --remote /usr/metron/0.1BETA/flux/squid/remote.yaml
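The --filter option tells Flux to substitute the ${...} placeholders in the YAML (such as ${kafka.zk} and ${kafka.broker}) with values from the given properties file. The Python sketch below mimics that substitution so you can preview the result before deploying. The property values shown are assumptions based on the host and ports used elsewhere in this article, and the function names are hypothetical.

```python
import re


def load_properties(text):
    """Parse simple 'key=value' lines, ignoring blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def substitute(template, props):
    """Replace ${name} with props[name]; unknown names are left untouched."""
    def replace(match):
        return props.get(match.group(1), match.group(0))
    return re.sub(r"\$\{([^}]+)\}", replace, template)


# Assumed values for a single-node VM.
properties = load_properties("kafka.zk=node1:2181\nkafka.broker=node1:6667\n")
print(substitute('constructorArgs:\n  - "${kafka.zk}"', properties))
```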
If you currently have four topologies in Storm, you need to kill one to make a worker available for Squid. To do this, from the Storm UI, click the name of the topology you want to kill in the Topology Summary section, then click Kill under Topology Actions. Storm will kill the topology and make a worker available for Squid.
Go to the Storm UI. You should now see the new "squid" topology; check that it has no errors.
This squid processor topology will ingest from the squid Kafka topic we created earlier and then parse the event with Metron's Grok framework using the grok pattern we defined earlier. The result of the parsing is a standard JSON Metron structure that then gets put on the "enrichment" Kafka topic for further processing.
But how do the squid events in the access.log get put into the "squid" Kafka topic so that the parser topology can parse them? We will do that using Apache Nifi.
Using Apache Nifi to Stream data into Metron
Put simply, NiFi was built to automate the flow of data between systems. Hence it is a fantastic tool to collect, ingest and push data to Metron. The instructions below show how to install and configure Nifi and create the flow that pushes squid events into Metron.
Install, Configure and Start Apache Nifi
The following shows how to install Nifi on the VM. Do the following as root:
Download Nifi:
cd /usr/lib
wget http://public-repo-1.hortonworks.com/HDF/centos6/1.x/updates/1.2.0.0/HDF-1.2.0.0-91.tar.gz
tar -zxvf HDF-1.2.0.0-91.tar.gz
Edit Nifi Configuration to update the port of the nifi web app: nifi.web.http.port=8089
cd HDF-1.2.0.0/nifi
vi conf/nifi.properties
//update nifi.web.http.port to 8089
Install Nifi as service
bin/nifi.sh install nifi
Start the Nifi Service
service nifi start
Go to the Nifi Web: http://node1:8089/nifi/
Create a Nifi Flow to stream events to Metron
Now we will create a flow to capture events from squid and push them into Metron.
Drag a processor to the canvas (do this by dragging the processor icon, which is the first icon).
Search for the TailFile processor and select Add. Right click on the processor and configure. In the Settings tab, change the name to "Ingest Squid Events".
In Properties, configure the following:
Drag another processor to the canvas.
Search for PutKafka and select Add
Right click on the processor and configure. In Settings, change the name to "Stream to Metron" and click the checkboxes for the failure and success relationships.
Under Properties, set 3 properties:
Known Brokers: node1:6667
Topic Name: squid
Client Name: nifi-squid
Create a connection by dragging the arrow from Ingest Squid Events to Stream to Metron
Select the entire flow and click the play button. You should see all processors green like the below:
Generate some data using squidclient (do this for about 20+ sites)
squidclient http://www.hostsite.com
You should see metrics on the processor of data being pushed into Metron.
Look at the Storm UI for the parser topology and you should see tuples coming in
After about 5 minutes, you should see a new Elasticsearch index called squid_index* in the Elastic Admin UI
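Conceptually, the flow above tails access.log and forwards each new line to the "squid" Kafka topic. The Python sketch below illustrates only the tailing half, with a pluggable send callback standing in for the Kafka producer. It is not how Nifi's TailFile is implemented, and the function names are hypothetical.

```python
def read_new_lines(path, offset):
    """Read complete lines appended to the file since the given byte offset.

    Returns (lines, new_offset). A trailing partial line (no newline yet)
    is left for the next call.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return [], offset
    complete = data[: last_newline + 1]
    lines = complete.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(complete)


def forward_new_lines(path, offset, send):
    """Pass every new complete line to send() and return the new offset."""
    lines, new_offset = read_new_lines(path, offset)
    for line in lines:
        send(line)
    return new_offset
```

In a real deployment, send would publish to Kafka and the offset would be persisted between runs; here it can be any callable, for example print.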
Verify Events are Indexed
By convention the index where the new messages will be indexed is called squid_index_[timestamp] and the document type is squid_doc.
In order to verify that the messages were indexed correctly, we can use the Elasticsearch Head plugin.
1. Install the head plugin:
/usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head/1.x
You should see the message: Installed mobz/elasticsearch-head/1.x into /usr/share/elasticsearch/plugins/head
2. Navigate to elastic head UI: http://node1:9200/_plugin/head/
3. Click on Browser tab and select squid doc on the left panel and then select one of the sample docs. You should see something like the following:
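As an alternative to browsing in the head plugin, the same check can be sketched with a plain HTTP request against Elasticsearch's URI search. The host, port, index prefix, and document type come from this article; the helper names and the size parameter are choices made for this sketch, and fetch_squid_docs requires a running cluster.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def squid_search_url(host="node1", port=9200, size=5):
    """Build a URI-search URL for squid_doc documents in squid_index* indexes."""
    query = urlencode({"q": "_type:squid_doc", "size": size})
    return f"http://{host}:{port}/squid_index*/_search?{query}"


def fetch_squid_docs(host="node1", port=9200, size=5):
    """Return the list of matching hits (needs a reachable Elasticsearch)."""
    with urlopen(squid_search_url(host, port, size)) as response:
        body = json.load(response)
    return body["hits"]["hits"]


print(squid_search_url())
```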
Configure Metron UI to view the Squid Telemetry Events
Now that we have Metron configured to parse, index and persist telemetry events, and Nifi pushing data to Metron, let's visualize this streaming telemetry data in the Metron UI.
Go to the Metron UI.
Add a New Pinned query
Click the + to add new pinned query
Create a query: _type: squid_doc
Click the colored circle icon, name the saved query and click Pin. See below
Add a new histogram panel for the Squid events
Click the add panel + icon
Select histogram panel type
Set title as “Squid Events”
Change Time Field to: timestamp
Configure span to 12
In the queries dropdown select “Selected” and only select the “Squid Events” pinned query
Click Save and you should see data in the histogram
You should now see the new Squid events
What Next?
The next article in the series covers Enriching Telemetry Data.
05-02-2016
05:22 PM
3 Kudos
One of the key design principles of Apache Metron is that it should be easily extensible. We envision many users using Metron as a platform and building custom capabilities on top of it, one of which will be adding new telemetry data sources. In this multi-part article series, we will walk you through how to add a new telemetry data source: Squid proxy logs. This multi-part article series consists of the following:
This Article: Sets up the use case for this multi-part article series.
Use Case 1: Collecting and Parsing Telemetry Events - This tutorial walks you through how to collect/ingest events into Metron and then parse them.
Use Case 2: Enriching Telemetry Data - Describes how to enrich elements of telemetry events with Apache Metron.
Use Case 3: Adding/Enriching/Validating with Threat Intel Feeds - Describes how to add new threat intel feeds to the system and how those feeds can be used to cross-reference every telemetry event that comes in. When a hit occurs, an alert will be generated and displayed on the Metron UI.
Setting up the Use Case Scenario
Customer Foo has installed Metron TP1 and they are using the out-of-the-box data sources (PCAP, YAF/Netflow, Snort, and Bro). They love Metron! But now they want to add a new data source to the platform: Squid proxy logs.
Customer Foo's Requirements
The following are the customer's requirements for Metron with respect to this new data source:
1. The proxy events from Squid logs need to be ingested in real-time.
2. The proxy logs must be parsed into a standardized JSON structure that Metron can understand.
3. In real-time, the Squid proxy event needs to be enriched so that the domain names are enriched with the IP information.
4. In real-time, the IP within the proxy event must be checked against threat intel feeds.
5. If there is a threat intel hit, an alert needs to be raised.
6. The end user must be able to see the new telemetry events and the alerts from the new data source.
7. All of these requirements will need to be implemented easily without writing any new Java code.
What is Squid?
Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages. For more information on Squid see Squid-cache.org.
How Metron Enriches a Squid Telemetry Event
When you make an outbound HTTP connection to https://www.cnn.com from a given host, the following entry is added to a Squid file called access.log.
The following represents the magic that Metron will do to this telemetry event as it is streamed through the platform in real-time:
Key Points
Some key points to highlight as you go through this multi-part article series:
We will be adding a net new data source without writing any code. Metron strives for easy extensibility and this is a good example of it.
This is a repeatable pattern for a majority of telemetry data sources.
Read the next article, on how to collect and push data into Metron and then parse data in the Metron platform: Collecting and Parsing Telemetry Data.
04-28-2016
06:29 PM
On my new MacBook Pro, I saw the same issue this morning. I followed Casey's directions and that fixed it for me. Until we have a better fix, I have updated the documentation to instruct that Ansible 2.0.0.2 be installed: https://community.hortonworks.com/articles/24818/metron-tech-preview-1-install-instructions-on-sing.html