Introduction

Getting Apache Metron (Incubating) installed on an Ubuntu cluster has not received much attention so far. Building on Michael Miklavcic's work setting up Elasticsearch and Kibana on Ubuntu, we can tackle installing Metron on a working HDP Ubuntu cluster with Elasticsearch and Kibana. For maximum transparency, we will do this manually.

This article assumes the following to be installed and running:

  • HDP 2.5.0+ with
    • HDFS
    • HBase
    • Storm
    • Zookeeper
    • Kafka
  • Elasticsearch 2.4
  • Kibana 4.5.3

We also assume that the access node that we are on has Oracle Java 8 on the classpath.

Install Metron

Preliminaries

The following steps should be done on an access node. This node should have the following installed:

  • The Hadoop client
  • The Storm client

I will assume that the user executing the following steps:

  • Has suitable permissions to write to /apps in HDFS
  • Has suitable permissions to start Storm topologies on the cluster
  • Has sudo access

In order to build Metron, we will need Maven. We will install Maven manually here in ~:

wget https://archive.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip
unzip -qq apache-maven-3.3.9-bin.zip
export M2_HOME=$PWD/apache-maven-3.3.9
export PATH=$M2_HOME/bin:$PATH

After this, you should see something like the following when running mvn -version:

root@u1401:~# mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00)
Maven home: /root/apache-maven-3.3.9
Java version: 1.8.0_121, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-oracle/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-24-generic", arch: "amd64", family: "unix"

Build Metron from Source and Install

I will build Metron from git master here, but an official Apache Metron release (post-0.3.1) from http://metron.apache.org/documentation/#releases would work as well.

Install git and clone the Metron repository via

sudo apt-get install -y git
git clone https://github.com/apache/incubator-metron.git

This will create a directory called incubator-metron. From within that directory we can build Metron:

cd incubator-metron
mvn -q -T 2C -DskipTests -PHDP-2.5.0.0 install

This will take some time; please ignore warnings (but not errors). The output will be a set of tar.gz files that we can extract to form the core of our Metron installation:

  • Profiler
    • Client : ./metron-analytics/metron-profiler-client/target/metron-profiler-client-0.3.1-archive.tar.gz
    • Topology: ./metron-analytics/metron-profiler/target/metron-profiler-0.3.1-archive.tar.gz
  • Model as a Service: ./metron-analytics/metron-maas-service/target/metron-maas-service-0.3.1-archive.tar.gz
  • Metron Management Stellar Functions: ./metron-platform/metron-management/target/metron-management-0.3.1-archive.tar.gz
  • Stellar REPL : ./metron-platform/metron-common/target/metron-common-0.3.1-archive.tar.gz
  • Indexing
    • Indexing Configuration: ./metron-platform/metron-indexing/target/metron-indexing-0.3.1-archive.tar.gz
    • Elasticsearch Topology: ./metron-platform/metron-elasticsearch/target/metron-elasticsearch-0.3.1-archive.tar.gz
  • PCAP Ingest Topology: ./metron-platform/metron-pcap-backend/target/metron-pcap-backend-0.3.1-archive.tar.gz
  • Data and Configuration Management: ./metron-platform/metron-data-management/target/metron-data-management-0.3.1-archive.tar.gz
    • Enrichment Loader
    • Zookeeper Configuration Manager
  • Enrichment : ./metron-platform/metron-enrichment/target/metron-enrichment-0.3.1-archive.tar.gz
  • Parsers: ./metron-platform/metron-parsers/target/metron-parsers-0.3.1-archive.tar.gz

We will create a $METRON_HOME directory and untar these files in it. For this purpose, we will set $METRON_HOME to /usr/metron/0.3.1:

sudo mkdir -p /usr/metron/0.3.1
export METRON_HOME=/usr/metron/0.3.1

Now we can extract the tarballs above into $METRON_HOME:

tar xf ./metron-analytics/metron-profiler-client/target/metron-profiler-client-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-analytics/metron-profiler/target/metron-profiler-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-analytics/metron-maas-service/target/metron-maas-service-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-management/target/metron-management-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-common/target/metron-common-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-indexing/target/metron-indexing-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-elasticsearch/target/metron-elasticsearch-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-pcap-backend/target/metron-pcap-backend-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-data-management/target/metron-data-management-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-enrichment/target/metron-enrichment-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-parsers/target/metron-parsers-0.3.1-archive.tar.gz -C $METRON_HOME

Configure Metron

From here, we will need to configure the relevant bits of Metron to use the existing infrastructure. We will assume the following:

  • You are executing these commands on the access node
  • METRON_ZK is a zookeeper node (e.g. u1401:2181)
  • METRON_KAFKA is a kafka broker (e.g. u1401:6667)
  • METRON_HOME is set as in the previous step.

Setup HDFS

We will need a few directories created with the correct permissions in order for Metron to operate properly:

  • The user which will be used to start topologies should have a home directory in HDFS.
  • The /apps/metron directory should be created.

Obviously substitute the appropriate user in the following commands:

sudo su - hdfs
# setup variables
export METRON_USER=...
export METRON_HOME=...
export METRON_ZK=...
# Add a user directory
hdfs dfs -mkdir -p /user/$METRON_USER
hdfs dfs -chown $METRON_USER:$METRON_USER /user/$METRON_USER
# Create /apps/metron
hdfs dfs -mkdir -p /apps/metron
hdfs dfs -chown hdfs:hadoop /apps/metron
hdfs dfs -chmod 775 /apps/metron
# Create the HDFS Patterns directory for Grok parsers
hdfs dfs -mkdir -p /apps/metron/patterns
hdfs dfs -chown hdfs:hadoop /apps/metron/patterns
hdfs dfs -chmod 775 /apps/metron/patterns
hdfs dfs -put $METRON_HOME/patterns/* /apps/metron/patterns
# Create the HDFS Index directory
hdfs dfs -mkdir -p /apps/metron/indexing/indexed
hdfs dfs -chown hdfs:hadoop /apps/metron/indexing/indexed
hdfs dfs -chmod 775 /apps/metron/indexing/indexed
# Create the geo IP directory
hdfs dfs -mkdir -p /apps/metron/geo/default
hdfs dfs -chown hdfs:hadoop /apps/metron/geo/default
hdfs dfs -chmod 775 /apps/metron/geo/default
# Grab geo IP data and put it in HDFS
wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
hdfs dfs -put GeoLite2-City.mmdb.gz /apps/metron/geo/default
exit

Adjust Storm Configs

We need to ensure that the HDFS and HBase configurations are on the Storm classpath. This way Metron can interact with them for enrichments and the like. In order to do this:

  • Open the Ambari Storm configuration
  • Navigate to “Custom storm-site”
  • Add a property and select the Single property add mode
    • Key: topology.classpath
    • Value: /etc/hbase/conf:/etc/hadoop/conf
  • Click “Add”
  • Save the Change
  • Restart Storm

Global Config

The global configuration is a configuration held in zookeeper which spans multiple topologies in Metron. A more detailed discussion of it can be found here.

Edit $METRON_HOME/config/zookeeper/global.json and add or modify the following entries into the JSON Map, substituting the following variables:

  • ES_NODE - The Elasticsearch node
  • ES_PORT - The Elasticsearch port
  • ES_DATE_FORMAT - The Elasticsearch date format to use in index naming (e.g. yyyy.MM.dd.HH cuts indices at hour granularity)

{
  "es.clustername" : "metron",
  "es.ip" : "$ES_NODE",
  "es.port" : "$ES_PORT",
  "es.date.format" : "$ES_DATE_FORMAT"
}

Just for example, my configuration looks as follows:

{
  "es.clustername" : "metron",
  "es.ip" : "u1401",
  "es.port" : "9200",
  "es.date.format" : "yyyy.MM.dd.HH"
}

A couple of things to note here: we are setting up Elasticsearch, so a Solr configuration would look different. Also, this is a minimal configuration; see the documentation linked above for more options.
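For reference, the Java date pattern yyyy.MM.dd.HH used above cuts one index per hour. A rough Python equivalent of that index-name suffix (my own illustration; the actual naming is done by the indexing topology):

```python
from datetime import datetime

# Java's yyyy.MM.dd.HH roughly corresponds to strftime's "%Y.%m.%d.%H".
# Metron appends this suffix to index names, so an hourly pattern
# yields one Elasticsearch index per hour.
def index_suffix(ts):
    return ts.strftime("%Y.%m.%d.%H")

print(index_suffix(datetime(2017, 3, 1, 13, 5)))  # -> 2017.03.01.13
```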

Now that these are set, you can push the configurations using the following command: $METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z $METRON_ZK

Enrichment Topology

Individual sensor configurations for enrichment are stored in zookeeper and may be found at $METRON_HOME/config/zookeeper/enrichments/ but the topology itself must be configured as well, which involves setting up a couple of things.

HBase Tables for Enrichment and Threat Intel

Enrichments and threat intel data can be loaded easily and referred to via Stellar or through the HBase enrichment adapters. In order to use these, we must create the appropriate HBase tables. For these, we will assume the following environment variables:

  • METRON_ENRICHMENT_TABLE - usually enrichment
  • METRON_ENRICHMENT_CF - usually t
  • METRON_THREATINTEL_TABLE - usually threatintel
  • METRON_THREATINTEL_CF - usually t

Now you can create the tables via:

export METRON_ENRICHMENT_TABLE=...
export METRON_ENRICHMENT_CF=...
export METRON_THREATINTEL_TABLE=...
export METRON_THREATINTEL_CF=...
echo "create '$METRON_ENRICHMENT_TABLE', '$METRON_ENRICHMENT_CF'" | hbase shell
echo "create '$METRON_THREATINTEL_TABLE', '$METRON_THREATINTEL_CF'" | hbase shell

Kafka Topic

Create the enrichments kafka topic

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic enrichments --partitions 1 --replication-factor 1

Note, this creates a topic with one partition and a replication factor of 1. This is likely not suitable for production and may need to be adjusted according to load.

Topology Configuration

Edit $METRON_HOME/config/enrichment.properties and make the following modifications:

  • kafka.zk should be $METRON_ZK (e.g. u1401:2181)
  • kafka.broker should be $METRON_KAFKA (e.g. u1401:6667)
  • threat.intel.tracker.table should be threatintel
  • threat.intel.simple.hbase.table should be $METRON_THREATINTEL_TABLE (usually threatintel)
  • threat.intel.simple.hbase.cf should be $METRON_THREATINTEL_CF (usually t)
  • enrichment.simple.hbase.table should be $METRON_ENRICHMENT_TABLE (usually enrichment)
  • enrichment.simple.hbase.cf should be $METRON_ENRICHMENT_CF (usually t)

Start the Enrichment Topology

Now, we can start the enrichment topology and have it function by running: $METRON_HOME/bin/start_enrichment_topology.sh

Indexing Topology

Individual sensor configurations for writing indices are stored in zookeeper and may be found at $METRON_HOME/config/zookeeper/indexing/ but the topology itself must be configured as well, which involves setting up a couple of things.

Kafka Topic

Create the indexing kafka topic

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic indexing --partitions 1 --replication-factor 1

Note, this creates a topic with one partition and a replication factor of 1. This is likely not suitable for production and may need to be adjusted according to load.

Topology Configuration

Edit $METRON_HOME/config/elasticsearch.properties and make the following modifications:

  • kafka.zk should be $METRON_ZK (e.g. u1401:2181)
  • kafka.broker should be $METRON_KAFKA (e.g. u1401:6667)
  • bolt.hdfs.file.system.url should be the output of hdfs getconf -confKey fs.default.name
  • index.hdfs.output should be /apps/metron/indexing/indexed

Elasticsearch Configuration

We have a set of elasticsearch templates which can be used and are located in incubator-metron/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/files.

You can install them from this directory via:

export ES_IP=...
export ES_PORT=...
export TEMPLATE_DIR=incubator-metron/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/files
export METRON_VERSION=master
# If we don't have a template directory, then we'll create one and pull the templates from the specified version of Metron
if [ ! -d $TEMPLATE_DIR ];then
  echo "Template directory is not there, so we're going to pull from github."
  mkdir -p $TEMPLATE_DIR
  pushd $TEMPLATE_DIR
  for template in bro error snort yaf;do
    wget https://raw.githubusercontent.com/apache/incubator-metron/${METRON_VERSION}/metron-deployment/packag...
  done
  popd
fi
curl http://$ES_IP:$ES_PORT/_template/bro_index --upload-file $TEMPLATE_DIR/bro_index.template
curl http://$ES_IP:$ES_PORT/_template/error_index --upload-file $TEMPLATE_DIR/error_index.template
curl http://$ES_IP:$ES_PORT/_template/snort_index --upload-file $TEMPLATE_DIR/snort_index.template
curl http://$ES_IP:$ES_PORT/_template/yaf_index --upload-file $TEMPLATE_DIR/yaf_index.template

Where $ES_PORT is the port Elasticsearch is bound to (generally 9200) and $ES_IP is the hostname for Elasticsearch.

HBase Table for the Profiler

In order to use the profiler, we must create the appropriate HBase table. For this, we will assume the following environment variables:

  • METRON_PROFILER_TABLE - usually profiler
  • METRON_PROFILER_CF - usually P

Now you can create the table via:

export METRON_PROFILER_TABLE=...
export METRON_PROFILER_CF=...
echo "create '$METRON_PROFILER_TABLE', '$METRON_PROFILER_CF'" | hbase shell

Topology Configuration

Edit $METRON_HOME/config/profiler.properties and make the following modifications:

  • kafka.zk should be $METRON_ZK (e.g. u1401:2181)
  • kafka.broker should be $METRON_KAFKA (e.g. u1401:6667)
  • profiler.period.duration is the time between snapshots in the profiler. By default this is 15
  • profiler.period.duration.units is the time unit of the time between snapshots in the profiler. By default this is MINUTES

Start the Profiler

Now, we can start the profiler topology and have it function by running: $METRON_HOME/bin/start_profiler_topology.sh

Smoke-test Metron

If this were a real Metron installation, you would ingest data by tapping into one of the sensors that we support:

  • Yaf
  • Bro
  • Snort

Or via another mechanism that could transport your data into a kafka queue for the parser to use. We can test Metron by sending some synthetic data into a parser and tracking the data all the way through to the indices and the profiler.

Sensor Source

We are going to call this data source dummy and pipe it directly into kafka. The format of the messages will be a JSON map with one field called value, which is a float.

Generator

Create a file called rand_gen.py in your home directory with the following content:

#!/usr/bin/python
import random
import sys
import time
def main():
  mu = float(sys.argv[1])
  sigma = float(sys.argv[2])
  freq_s = int(sys.argv[3])
  while True:
    out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
    print(out)
    sys.stdout.flush()
    time.sleep(freq_s)

if __name__ == '__main__':
  main()

This will generate random data at a certain frequency.
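Each line emitted is a complete JSON document; you can sanity-check the format locally before wiring it into Kafka (a quick illustration, not part of Metron):

```python
import json
import random

# Build one message exactly the way rand_gen.py does and confirm it
# parses as a JSON map with a single float field named "value".
line = '{ "value" : ' + str(random.gauss(0, 1)) + ' }'
msg = json.loads(line)
assert list(msg) == ["value"] and isinstance(msg["value"], float)
print(msg)
```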

Kafka Queue

Create the dummy kafka topic

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic dummy --partitions 1 --replication-factor 1

Parser Configuration

We must set up a parser to parse the JSON blobs coming out of our synthetic data generator. Since this is just a JSON Map with a single value, we can set up a simple JSONMap parser. Create a new parser called dummy by editing $METRON_HOME/config/zookeeper/parsers/dummy.json:

{
  "parserClassName":"org.apache.metron.parsers.json.JSONMapParser",
  "sensorTopic":"dummy",
  "fieldTransformations" : [ ]
}
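The JSONMapParser lifts the top-level fields of each JSON blob directly into the Metron message; in spirit it does something like the following (a simplified sketch of my own, not the parser's actual code — the real parser also adds timestamps and other metadata):

```python
import json

def json_map_parse(raw, sensor_topic):
    # Parse the blob and tag the message with its source type,
    # loosely mimicking what JSONMapParser produces.
    msg = json.loads(raw)
    msg["source.type"] = sensor_topic
    return msg

print(json_map_parse('{ "value" : 1.5 }', "dummy"))
```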

Profiler Configuration

We want to be able to enrich these messages by looking back and capturing some statistical information about the last 5 minutes of values coming through the topology. In order to do that, we need to track a statistical summary of the values:

  • One profile called stat that stores a statistical summary every 1 minute of the value field of the messages from the dummy sensor type.

Edit the value profiler config at $METRON_HOME/config/zookeeper/profiler.json:

{
  "profiles": [
    {
      "profile": "stat",
      "foreach": "'global'",
      "onlyif": "sensor.type == 'dummy'",
      "init" : {
               },
      "update": {
        "s": "STATS_ADD(s, value)"
                },
      "result": "s"
    }
  ]
}

For the purposes of our example, we should adjust the profile to capture the data every 1 minute, rather than 15 minutes.

  • Edit $METRON_HOME/config/profiler.properties to adjust the capture duration by changing profiler.period.duration=15 to profiler.period.duration=1
  • Edit $METRON_HOME/config/zookeeper/global.json and add the following properties:
"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES"
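The reason for shrinking the period: a "from 5 minutes ago" profile window only contains snapshots whose whole period fits inside it. A back-of-the-envelope check (my own arithmetic, not Metron code):

```python
def snapshots_in_window(window_minutes, period_minutes):
    # Number of complete profiler periods that fit in the lookback window.
    return window_minutes // period_minutes

# With 1-minute periods a 5-minute window sees 5 snapshots...
print(snapshots_in_window(5, 1))
# ...while the default 15-minute period would yield none.
print(snapshots_in_window(5, 15))
```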

Enrichment Configuration

We want to be able to enrich these messages by looking back and capturing some statistical information about the last 5 minutes of values coming through the topology. Edit the value enrichment config at $METRON_HOME/config/zookeeper/enrichments/dummy.json:

{
  "enrichment" : {
   "fieldMap": {
      "stellar" : {
        "config" : {
"median" : "STATS_PERCENTILE(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago'))), 50)",
"stddev" : "STATS_SD(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago'))))"
                  }
      }
    }
  },
  "threatIntel" : { }
}

This creates two new fields on every message, median and stddev, which represent the median and standard deviation of the values from the last 5 minutes. Although we take a snapshot every minute, we can merge those snapshots to get a longer-range view.
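Conceptually, STATS_MERGE folds the per-minute summaries into one, and STATS_PERCENTILE / STATS_SD read the statistics off the merged result. A rough pure-Python analogue (illustrative only; Metron's summaries are sketch-based, so its percentiles may be approximate):

```python
import statistics

# Five per-minute "snapshots" of values, as the profiler might capture them.
snapshots = [[1.0, 2.0], [3.0], [4.0, 5.0], [2.0], [3.0, 3.0]]

# STATS_MERGE: combine the snapshots into one summary over the window.
merged = [v for snap in snapshots for v in snap]

median = statistics.median(merged)  # ~ STATS_PERCENTILE(..., 50)
stddev = statistics.stdev(merged)   # ~ STATS_SD(...)
print(median, round(stddev, 3))
```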

Start the Topologies

  • Push zookeeper configs:
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z $METRON_ZK
  • Restart the profiler:
storm kill profiler
$METRON_HOME/bin/start_profiler_topology.sh
  • Start the dummy parser topology
$METRON_HOME/bin/start_parser_topology.sh -k $METRON_KAFKA -z $METRON_ZK -s dummy
  • Send some synthetic data directly to the profiler (in another terminal):
python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $METRON_KAFKA --topic dummy

Validate that Data is Flowing

  • Wait for at least 5 minutes and execute the following in the Stellar REPL started as $METRON_HOME/bin/stellar -z $METRON_ZK:
# Grab the mean of the values from 3 minutes ago til now
STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 3 minutes ago'))))
  • Inspect the data going into the indices by running the following:
curl -XPOST "http://$ES_IP:$ES_PORT/dummy*/_search?pretty" -d '
{
  "_source" : [ "median", "stddev", "value" ]
}
'