Created on 03-14-2017 08:34 PM
Getting Apache Metron (Incubating) installed on an Ubuntu cluster has not yet received much attention. Building on the work of Michael Miklavcic in setting up Elasticsearch and Kibana on Ubuntu, we can tackle how to install Metron on a working HDP Ubuntu cluster with Elasticsearch and Kibana. For maximum transparency, we will do this manually.
This article assumes that a working HDP cluster (including HDFS, ZooKeeper, Kafka, Storm, and HBase), Elasticsearch, and Kibana are installed and running. We also assume that the access node we are working from has Oracle Java 8 on the classpath.
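As a quick sanity check, you can confirm which Java the access node will use:

# Verify that Java 8 is the default on the path
java -version
echo $JAVA_HOME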
The following steps should be done on an access node with Oracle Java 8 installed. I will assume that the user executing the following section has sudo privileges.
In order to build Metron, we will need Maven. We will install Maven manually in ~:
wget https://archive.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip
unzip -qq apache-maven-3.3.9-bin.zip
export M2_HOME=$PWD/apache-maven-3.3.9
export PATH=$M2_HOME/bin:$PATH
After this, you should see something like the following when running mvn -version:
root@u1401:~# mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00)
Maven home: /root/apache-maven-3.3.9
Java version: 1.8.0_121, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-oracle/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-24-generic", arch: "amd64", family: "unix"
I will be building Metron from git master here, but an official Apache Metron release post-0.3.1 from http://metron.apache.org/documentation/#releases would work as well.
Install git and clone the Metron repository via
sudo apt-get install -y git
git clone https://github.com/apache/incubator-metron.git
This will create a directory called incubator-metron. From within that directory we can build Metron:
cd incubator-metron
mvn -q -T 2C -DskipTests -PHDP-2.5.0.0 install
This will take some time; please ignore warnings (but not errors). The output will be a set of tar.gz files that we can extract into the core of our Metron installation:
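If you want to see the archives the build produced before extracting them, from the incubator-metron directory you can run:

# List the Metron tarballs produced by the build
find . -name "*-archive.tar.gz"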
We will create a $METRON_HOME directory and untar these files in it. For this purpose, we will set $METRON_HOME to /usr/metron/0.3.1:
sudo mkdir -p /usr/metron/0.3.1
export METRON_HOME=/usr/metron/0.3.1
Now we can extract the tarballs above into $METRON_HOME:
tar xf ./metron-analytics/metron-profiler-client/target/metron-profiler-client-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-analytics/metron-profiler/target/metron-profiler-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-analytics/metron-maas-service/target/metron-maas-service-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-management/target/metron-management-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-common/target/metron-common-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-indexing/target/metron-indexing-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-elasticsearch/target/metron-elasticsearch-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-pcap-backend/target/metron-pcap-backend-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-data-management/target/metron-data-management-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-enrichment/target/metron-enrichment-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-parsers/target/metron-parsers-0.3.1-archive.tar.gz -C $METRON_HOME
From here, we will need to configure the relevant bits of Metron to use the existing infrastructure. Throughout the rest of this article we will refer to a few environment variables: $METRON_USER (the user Metron runs as), $METRON_HOME (/usr/metron/0.3.1 from above), $METRON_ZK (the ZooKeeper quorum as host:port), and $METRON_KAFKA (the Kafka broker list as host:port).
We will need a few directories created with the correct permissions in order for Metron to operate properly:
Obviously substitute the appropriate user in the following commands:
sudo su - hdfs
# setup variables
export METRON_USER=...
export METRON_HOME=...
export METRON_ZK=...
# Add a user directory
hdfs dfs -mkdir -p /user/$METRON_USER
hdfs dfs -chown $METRON_USER:$METRON_USER /user/$METRON_USER
# Create /apps/metron
hdfs dfs -mkdir -p /apps/metron
hdfs dfs -chown hdfs:hadoop /apps/metron
hdfs dfs -chmod 775 /apps/metron
# Create the HDFS Patterns directory for Grok parsers
hdfs dfs -mkdir -p /apps/metron/patterns
hdfs dfs -chown hdfs:hadoop /apps/metron/patterns
hdfs dfs -chmod 775 /apps/metron/patterns
hdfs dfs -put $METRON_HOME/patterns/* /apps/metron/patterns
# Create the HDFS Index directory
hdfs dfs -mkdir -p /apps/metron/indexing/indexed
hdfs dfs -chown hdfs:hadoop /apps/metron/indexing/indexed
hdfs dfs -chmod 775 /apps/metron/indexing/indexed
# Create the geo IP directory
hdfs dfs -mkdir -p /apps/metron/geo/default
hdfs dfs -chown hdfs:hadoop /apps/metron/geo/default
hdfs dfs -chmod 775 /apps/metron/geo/default
# Grab geo IP data and put it in HDFS
wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
hdfs dfs -put GeoLite2-City.mmdb.gz /apps/metron/geo/default
exit
We need to ensure that the HDFS and HBase configurations are on the Storm classpath. This way Metron can interact with them for enrichments and the like. In order to do this:
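The exact steps depend on how Storm is managed; one common approach (an assumption here, not the only way) is to add the Hadoop and HBase configuration directories to Storm's topology.classpath setting, for example as a custom storm-site property in Ambari, and then restart Storm:

# Example storm.yaml / custom storm-site entry (paths assume a standard HDP layout)
topology.classpath: "/etc/hbase/conf:/etc/hadoop/conf"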
The global configuration is held in ZooKeeper and spans multiple topologies in Metron. A more detailed discussion of it can be found in the Metron documentation.
Edit $METRON_HOME/config/zookeeper/global.json and add or modify the following entries into the JSON Map, substituting the following variables:
{ "es.clustername" : "metron", "es.ip" : "$ES_NODE", "es.port" : "$ES_PORT", "es.date.format" : "$ES_DATE_FORMAT" }
Just as an example, my configuration looks as follows:
{ "es.clustername" : "metron", "es.ip" : "u1401", "es.port" : "9200", "es.date.format" : "yyyy.MM.dd.HH" }
A couple of things to note here: we are setting up Elasticsearch, so a Solr configuration would look different. Also, this is a minimal configuration; see the documentation linked above for more options.
Now that these are set, you can push the configurations using the following command:
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z $METRON_ZK
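To sanity-check what actually landed in ZooKeeper, you can dump the configurations back out with the same script:

# Print the configs currently stored in zookeeper
$METRON_HOME/bin/zk_load_configs.sh -m DUMP -z $METRON_ZK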
Individual sensor configurations for enrichment are stored in zookeeper and may be found at $METRON_HOME/config/zookeeper/enrichments/ but the topology itself must be configured as well, which involves setting up a couple of things.
Enrichments and threat intel data can be loaded easily and referred to via Stellar or through the HBase enrichment adapters. In order to use these, we must create the appropriate HBase tables. For these, we will assume the following environment variables:
Now you can create the tables via:
export METRON_ENRICHMENT_TABLE=...
export METRON_ENRICHMENT_CF=...
export METRON_THREATINTEL_TABLE=...
export METRON_THREATINTEL_CF=...
echo "create '$METRON_ENRICHMENT_TABLE', '$METRON_ENRICHMENT_CF'" | hbase shell
echo "create '$METRON_THREATINTEL_TABLE', '$METRON_THREATINTEL_CF'" | hbase shell
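The Ambari/full-dev installs typically use enrichment and threatintel as the table names with a column family of t, but any names work as long as they match your enrichment configuration. Afterwards, you can confirm the tables exist:

# Confirm that the new tables show up
echo "list" | hbase shell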
Create the enrichments kafka topic
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic enrichments --partitions 1 --replication-factor 1
Note, this creates a topic with one partition and a replication factor of 1. This is likely not suitable for production and may need to be adjusted according to load.
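You can confirm the topic was created (and do the same for the other topics we create below) via:

# List the topics registered in zookeeper
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --list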
Edit $METRON_HOME/config/enrichment.properties and make the following modifications:
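The exact property names should be verified against the enrichment.properties shipped with your build, but at a minimum the Kafka and ZooKeeper endpoints typically need to point at your cluster. A minimal sketch, with placeholder values:

# Substitute real host:port values; verify the property names against your enrichment.properties
kafka.zk=zk-host1:2181,zk-host2:2181
kafka.broker=kafka-host1:6667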
Now we can start the enrichment topology by running:
$METRON_HOME/bin/start_enrichment_topology.sh
Individual sensor configurations for writing indices are stored in zookeeper and may be found at $METRON_HOME/config/zookeeper/indexing/ but the topology itself must be configured as well, which involves setting up a couple of things.
Create the indexing kafka topic
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic indexing --partitions 1 --replication-factor 1
Note, this creates a topic with one partition and a replication factor of 1. This is likely not suitable for production and may need to be adjusted according to load.
Edit $METRON_HOME/config/elasticsearch.properties and make the following modifications:
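As with the enrichment properties, verify the names against your elasticsearch.properties; the pieces that usually need editing are the Kafka/ZooKeeper endpoints and the Elasticsearch connection details, which should agree with what you put in global.json. A minimal sketch, with placeholder values:

# Substitute real values; verify the property names against your elasticsearch.properties
kafka.zk=zk-host1:2181,zk-host2:2181
kafka.broker=kafka-host1:6667
# The Elasticsearch connection settings (cluster name, host, port) should match the global config above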
We have a set of elasticsearch templates which can be used and are located in incubator-metron/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/files.
You can install them from this directory via:
export ES_IP=...
export ES_PORT=...
export TEMPLATE_DIR=incubator-metron/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/files
export METRON_VERSION=master
# If we don't have a template directory, then we'll create one and pull the templates from the specified version of Metron
if [ ! -d $TEMPLATE_DIR ];then
  echo "Template directory is not there, so we're going to pull from github."
  mkdir -p $TEMPLATE_DIR
  pushd $TEMPLATE_DIR
  for template in bro error snort yaf;do
    wget https://raw.githubusercontent.com/apache/incubator-metron/${METRON_VERSION}/metron-deployment/packag...
  done
  popd
fi
curl http://$ES_IP:$ES_PORT/_template/bro_index --upload-file $TEMPLATE_DIR/bro_index.template
curl http://$ES_IP:$ES_PORT/_template/error_index --upload-file $TEMPLATE_DIR/error_index.template
curl http://$ES_IP:$ES_PORT/_template/snort_index --upload-file $TEMPLATE_DIR/snort_index.template
curl http://$ES_IP:$ES_PORT/_template/yaf_index --upload-file $TEMPLATE_DIR/yaf_index.template
Here $ES_PORT is the port Elasticsearch is bound to (generally 9200) and $ES_IP is the hostname Elasticsearch runs on.
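You can check that the templates registered by asking Elasticsearch for them:

# List the index templates Elasticsearch currently knows about
curl "http://$ES_IP:$ES_PORT/_template?pretty"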
The profiler stores its data in HBase, so we must create the appropriate table there as well. For this, we will assume the following environment variables:
Now you can create the tables via:
export METRON_PROFILER_TABLE=...
export METRON_PROFILER_CF=...
echo "create '$METRON_PROFILER_TABLE', '$METRON_PROFILER_CF'" | hbase shell
Edit $METRON_HOME/config/profiler.properties and make the following modifications:
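Again, verify the names against the profiler.properties in your build; the items that generally need attention are the Kafka/ZooKeeper endpoints, the HBase table and column family created above, and the capture period (which we revisit below). A minimal sketch, with placeholder values:

# Substitute real values; verify the property names against your profiler.properties
kafka.zk=zk-host1:2181
kafka.broker=kafka-host1:6667
profiler.period.duration=15
profiler.period.duration.units=MINUTES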
Now we can start the profiler topology by running:
$METRON_HOME/bin/start_profiler_topology.sh
If this were a real Metron installation, you would ingest data by tapping into one of the sensors that we support (e.g. Bro, Snort, or YAF) or via another mechanism that transports your data into a Kafka topic for the parser to consume. We can test Metron by sending some synthetic data into a parser and tracking the data all the way through to the indices and the profiler.
We are going to call this data source dummy and pipe it directly into Kafka. The format of the messages will be a JSON map with one field called value, which is a float.
Create a file called rand_gen.py in your home directory with the following content:
#!/usr/bin/python
import random
import sys
import time

def main():
    # Mean, standard deviation, and sleep interval (seconds) from the command line
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
        # Emit one JSON message per interval with a normally distributed value
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print out
        sys.stdout.flush()
        time.sleep(freq_s)

if __name__ == '__main__':
    main()
This will generate random data at a certain frequency.
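The three arguments are the mean, the standard deviation, and the number of seconds to sleep between messages. For example, to emit one standard-normal value per second:

# mean 0, standard deviation 1, one message per second
python ~/rand_gen.py 0 1 1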
Create the dummy kafka topic
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic dummy --partitions 1 --replication-factor 1
We must set up a parser to parse the JSON blobs coming out of our synthetic data generator. Since this is just a JSON Map with a single value, we can set up a simple JSONMap parser. Create a new parser called dummy by editing $METRON_HOME/config/zookeeper/parsers/dummy.json:
{ "parserClassName":"org.apache.metron.parsers.json.JSONMapParser", "sensorTopic":"dummy", "fieldTransformations" : [ ] }
We want to be able to enrich these messages by looking back and capturing some statistical information about the last 5 minutes of values coming through the topology. In order to do that, we need to track a statistical summary of the values:
Edit the value profiler config at $METRON_HOME/config/zookeeper/profiler.json:
{ "profiles": [ { "profile": "stat", "foreach": "'global'", "onlyif": "sensor.type == 'dummy'", "init" : { }, "update": { "s": "STATS_ADD(s, value)" }, "result": "s" } ] }
For the purposes of our example, we should adjust the profile to capture the data every 1 minute, rather than 15 minutes.
"profiler.client.period.duration" : "1", "profiler.client.period.duration.units" : "MINUTES"
We want to be able to enrich these messages by looking back and capturing some statistical information about the last 5 minutes of values coming through the topology. Edit the value enrichment config at $METRON_HOME/config/zookeeper/enrichments/dummy.json:
{ "enrichment" : { "fieldMap": { "stellar" : { "config" : { "median" : "STATS_PERCENTILE(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago'))), 50)", "stddev" : "STATS_SD(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago'))))" } } } }, "threatIntel" : { } } }
This creates two new fields on every message, median and stddev, which represent the median and standard deviation of the values over the last 5 minutes. Even though the profiler takes a snapshot every minute, we can merge those snapshots and get a longer-range view.
Push the updated configurations to ZooKeeper:
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z $METRON_ZK
Restart the profiler topology so that it picks up the new profile:
storm kill profiler
$METRON_HOME/bin/start_profiler_topology.sh
Start the parser topology for the dummy sensor:
$METRON_HOME/bin/start_parser_topology.sh -k $METRON_KAFKA -z $METRON_ZK -s dummy
Start sending the synthetic data into the dummy topic:
python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $METRON_KAFKA --topic dummy
Once data has been flowing for a few minutes, you can inspect the profile with a Stellar expression such as:
# Grab the mean of the values from 3 minutes ago til now
STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 3 minutes ago'))))
You can also query Elasticsearch directly to confirm that the indexed messages contain the new median and stddev fields alongside the original value:
curl -XPOST "http://$ES_IP:$ES_PORT/dummy*/_search?pretty" -d '
{
  "_source" : [ "median", "stddev", "value" ]
}
'