Member since
11-05-2015
23
Posts
15
Kudos Received
7
Solutions
03-14-2017
08:34 PM
3 Kudos
Introduction

Getting Apache Metron (Incubating) installed on an Ubuntu cluster has not yet received much attention. Building on Michael Miklavcic's work setting up Elasticsearch and Kibana on Ubuntu, we can tackle installing Metron on a working HDP Ubuntu cluster with Elasticsearch and Kibana. For maximum transparency, we will do this manually. This article assumes the following to be installed and running:
HDP 2.5.0+ with:

- HDFS
- HBase
- Storm
- Zookeeper
- Kafka
- Elasticsearch 2.4
- Kibana 4.5.3

We also assume that the access node that we are on has Oracle Java 8 on the classpath.

Install Metron

Preliminaries

The following steps should be done on an access node. This node should have the following installed:

- The Hadoop client
- The Storm client

I will assume that the user executing the following section:

- Has suitable permissions to write to /apps in HDFS
- Has suitable permissions to start Storm topologies on the cluster
- Has sudo access

In order to build Metron, we will need Maven. We will go ahead and install Maven manually here in ~:

wget https://archive.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip
unzip -qq apache-maven-3.3.9-bin.zip
export M2_HOME=$PWD/apache-maven-3.3.9
export PATH=$M2_HOME/bin:$PATH
After this, you should see something like the following when running mvn -version:

root@u1401:~# mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00)
Maven home: /root/apache-maven-3.3.9
Java version: 1.8.0_121, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-oracle/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-24-generic", arch: "amd64", family: "unix"
Build Metron from Source and Install

I will be building Metron from git master here, but an official Metron Apache release post 0.3.1 from http://metron.apache.org/documentation/#releases would work as well. Install git and clone the Metron repository via:

sudo apt-get install -y git
git clone https://github.com/apache/incubator-metron.git
This will create a directory called incubator-metron. From within that directory we can build Metron:

cd incubator-metron
mvn -q -T 2C -DskipTests -PHDP-2.5.0.0 install
This will take some time; please ignore the warnings (but not errors). The output will be a set of tar.gz files that we can decompress and expand into the core of our Metron installation:

- Profiler
  - Client: ./metron-analytics/metron-profiler-client/target/metron-profiler-client-0.3.1-archive.tar.gz
  - Topology: ./metron-analytics/metron-profiler/target/metron-profiler-0.3.1-archive.tar.gz
- Model as a Service: ./metron-analytics/metron-maas-service/target/metron-maas-service-0.3.1-archive.tar.gz
- Metron Management Stellar Functions: ./metron-platform/metron-management/target/metron-management-0.3.1-archive.tar.gz
- Stellar REPL: ./metron-platform/metron-common/target/metron-common-0.3.1-archive.tar.gz
- Indexing
  - Indexing Configuration: ./metron-platform/metron-indexing/target/metron-indexing-0.3.1-archive.tar.gz
  - Elasticsearch Topology: ./metron-platform/metron-elasticsearch/target/metron-elasticsearch-0.3.1-archive.tar.gz
- PCAP Ingest Topology: ./metron-platform/metron-pcap-backend/target/metron-pcap-backend-0.3.1-archive.tar.gz
- Data and Configuration Management (Enrichment Loader, Zookeeper Configuration Manager): ./metron-platform/metron-data-management/target/metron-data-management-0.3.1-archive.tar.gz
- Enrichment: ./metron-platform/metron-enrichment/target/metron-enrichment-0.3.1-archive.tar.gz
- Parsers: ./metron-platform/metron-parsers/target/metron-parsers-0.3.1-archive.tar.gz

We will create a $METRON_HOME directory and untar these files into it. For this purpose, we will set $METRON_HOME to /usr/metron/0.3.1:

sudo mkdir -p /usr/metron/0.3.1
export METRON_HOME=/usr/metron/0.3.1
Now we can extract the tarballs above into $METRON_HOME:

tar xf ./metron-analytics/metron-profiler-client/target/metron-profiler-client-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-analytics/metron-profiler/target/metron-profiler-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-analytics/metron-maas-service/target/metron-maas-service-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-management/target/metron-management-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-common/target/metron-common-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-indexing/target/metron-indexing-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-elasticsearch/target/metron-elasticsearch-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-pcap-backend/target/metron-pcap-backend-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-data-management/target/metron-data-management-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-enrichment/target/metron-enrichment-0.3.1-archive.tar.gz -C $METRON_HOME
tar xf ./metron-platform/metron-parsers/target/metron-parsers-0.3.1-archive.tar.gz -C $METRON_HOME
Configure Metron

From here, we will need to configure the relevant bits of Metron to use the existing infrastructure. We will assume the following:

- You are executing these commands on the access node
- METRON_ZK is a zookeeper node (e.g. u1401:2181)
- METRON_KAFKA is a kafka broker (e.g. u1401:6667)
- METRON_HOME is set as in the previous step

Setup HDFS

We will need a few directories created with the correct permissions in order for Metron to operate properly:

- The user which will be used to start topologies should have a home directory in HDFS
- The /apps/metron directory should be created

Obviously, substitute the appropriate user in the following commands:

sudo su - hdfs
# setup variables
export METRON_USER=...
export METRON_HOME=...
export METRON_ZK=...
# Add a user directory
hdfs dfs -mkdir -p /user/$METRON_USER
hdfs dfs -chown $METRON_USER:$METRON_USER /user/$METRON_USER
# Create /apps/metron
hdfs dfs -mkdir -p /apps/metron
hdfs dfs -chown hdfs:hadoop /apps/metron
hdfs dfs -chmod 775 /apps/metron
# Create the HDFS Patterns directory for Grok parsers
hdfs dfs -mkdir -p /apps/metron/patterns
hdfs dfs -chown hdfs:hadoop /apps/metron/patterns
hdfs dfs -chmod 775 /apps/metron/patterns
hdfs dfs -put $METRON_HOME/patterns/* /apps/metron/patterns
# Create the HDFS Index directory
hdfs dfs -mkdir -p /apps/metron/indexing/indexed
hdfs dfs -chown hdfs:hadoop /apps/metron/indexing/indexed
hdfs dfs -chmod 775 /apps/metron/indexing/indexed
# Create the geo IP directory
hdfs dfs -mkdir -p /apps/metron/geo/default
hdfs dfs -chown hdfs:hadoop /apps/metron/geo/default
hdfs dfs -chmod 775 /apps/metron/geo/default
# Grab geo IP data and put it in HDFS
wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
hdfs dfs -put GeoLite2-City.mmdb.gz /apps/metron/geo/default
exit
Adjust Storm Configs

We need to ensure that the HDFS and HBase configurations are on the Storm classpath so that Metron can interact with them for enrichments and the like. In order to do this:

- Open the Ambari Storm configuration
- Navigate to "Custom storm-site"
- Add a property and select the Single property add mode
  - Key: topology.classpath
  - Value: /etc/hbase/conf:/etc/hadoop/conf
- Click "Add"
- Save the change
- Restart Storm

Global Config

The global configuration is a configuration held in zookeeper which spans multiple topologies in Metron. A more detailed discussion of it can be found here. Edit $METRON_HOME/config/zookeeper/global.json and add or modify the following entries in the JSON map, substituting the following variables:

- ES_NODE - The Elasticsearch node
- ES_PORT - The Elasticsearch port
- ES_DATE_FORMAT - The Elasticsearch date format to use in index naming (e.g. yyyy.MM.dd.HH cuts indices at hour granularity)

{
"es.clustername" : "metron",
"es.ip" : "$ES_NODE",
"es.port" : "$ES_PORT",
"es.date.format" : "$ES_DATE_FORMAT"
}
Just for example, my configuration looks as follows:

{
"es.clustername" : "metron",
"es.ip" : "u1401",
"es.port" : "9200",
"es.date.format" : "yyyy.MM.dd.HH"
}
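Since zk_load_configs.sh will push whatever is in global.json to zookeeper as-is, it can be worth confirming the file is well-formed JSON first. A minimal Python sketch (the values below mirror my example configuration and are placeholders, not requirements):

```python
import json

# Example values only; substitute your own Elasticsearch node, port,
# and date format before writing global.json.
es_node = "u1401"
es_port = "9200"
es_date_format = "yyyy.MM.dd.HH"

global_config = {
    "es.clustername": "metron",
    "es.ip": es_node,
    "es.port": es_port,
    "es.date.format": es_date_format,
}

# Serialize and re-parse to confirm what we would write to global.json
# is well-formed JSON before pushing it to zookeeper.
rendered = json.dumps(global_config, indent=2)
assert json.loads(rendered) == global_config
print(rendered)
```

The same round-trip check applies to any of the zookeeper configs edited later in this article.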
A couple of things to note here: we are setting up Elasticsearch, so a Solr configuration would look different. Also, this is a minimal configuration; see the documentation linked above for more options. Now that these are set, you can push the configurations using the following command:

$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z $METRON_ZK

Enrichment Topology

Individual sensor configurations for enrichment are stored in zookeeper and may be found at $METRON_HOME/config/zookeeper/enrichments/, but the topology itself must be configured as well, which involves setting up a couple of things.

HBase Tables for Enrichment and Threat Intel

Enrichments and threat intel data can be loaded easily and referred to via Stellar or through the HBase enrichment adapters. In order to use these, we must create the appropriate HBase tables. For these, we will assume the following environment variables:

- METRON_ENRICHMENT_TABLE - usually enrichment
- METRON_ENRICHMENT_CF - usually t
- METRON_THREATINTEL_TABLE - usually threatintel
- METRON_THREATINTEL_CF - usually t

Now you can create the tables via:

export METRON_ENRICHMENT_TABLE=...
export METRON_ENRICHMENT_CF=...
export METRON_THREATINTEL_TABLE=...
export METRON_THREATINTEL_CF=...
echo "create '$METRON_ENRICHMENT_TABLE', '$METRON_ENRICHMENT_CF'" | hbase shell
echo "create '$METRON_THREATINTEL_TABLE', '$METRON_THREATINTEL_CF'" | hbase shell
Kafka Topic

Create the enrichments kafka topic:

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic enrichments --partitions 1 --replication-factor 1

Note, this creates a topic with one partition and a replication factor of 1. This is likely not suitable for production and may need to be adjusted according to load.

Topology Configuration

Edit $METRON_HOME/config/enrichment.properties and make the following modifications:

- kafka.zk should be $METRON_ZK (e.g. u1401:2181)
- kafka.broker should be $METRON_KAFKA (e.g. u1401:6667)
- threat.intel.tracker.table should be threatintel
- threat.intel.simple.hbase.table should be $METRON_THREATINTEL_TABLE (usually threatintel)
- threat.intel.simple.hbase.cf should be $METRON_THREATINTEL_CF (usually t)
- enrichment.simple.hbase.table should be $METRON_ENRICHMENT_TABLE (usually enrichment)
- enrichment.simple.hbase.cf should be $METRON_ENRICHMENT_CF (usually t)

Start the Enrichment Topology

Now, we can start the enrichment topology by running:

$METRON_HOME/bin/start_enrichment_topology.sh

Indexing Topology

Individual sensor configurations for writing indices are stored in zookeeper and may be found at $METRON_HOME/config/zookeeper/indexing/, but the topology itself must be configured as well, which involves setting up a couple of things.

Kafka Topic

Create the indexing kafka topic:

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic indexing --partitions 1 --replication-factor 1
Note, this creates a topic with one partition and a replication factor of 1. This is likely not suitable for production and may need to be adjusted according to load.

Topology Configuration

Edit $METRON_HOME/config/elasticsearch.properties and make the following modifications:

- kafka.zk should be $METRON_ZK (e.g. u1401:2181)
- kafka.broker should be $METRON_KAFKA (e.g. u1401:6667)
- bolt.hdfs.file.system.url should be the output of hdfs getconf -confKey fs.default.name
- index.hdfs.output should be /apps/metron/indexing/indexed

Elasticsearch Configuration

We have a set of elasticsearch templates which can be used and are located in incubator-metron/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/files. You can install them from this directory via:

export ES_IP=...
export ES_PORT=...
export TEMPLATE_DIR=incubator-metron/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/files
export METRON_VERSION=master
# If we don't have a template directory, then we'll create one and pull the templates from the specified version of Metron
if [ ! -d "$TEMPLATE_DIR" ]; then
echo "Template directory is not there, so we're going to pull from github."
mkdir -p $TEMPLATE_DIR
pushd $TEMPLATE_DIR
for template in bro error snort yaf;do
wget https://raw.githubusercontent.com/apache/incubator-metron/${METRON_VERSION}/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/files/${template}_index.template
done
popd
fi
curl http://$ES_IP:$ES_PORT/_template/bro_index --upload-file $TEMPLATE_DIR/bro_index.template
curl http://$ES_IP:$ES_PORT/_template/error_index --upload-file $TEMPLATE_DIR/error_index.template
curl http://$ES_IP:$ES_PORT/_template/snort_index --upload-file $TEMPLATE_DIR/snort_index.template
curl http://$ES_IP:$ES_PORT/_template/yaf_index --upload-file $TEMPLATE_DIR/yaf_index.template
Where $ES_PORT is the port elasticsearch is bound to (generally 9200) and $ES_IP is the hostname for elasticsearch.

HBase Table for the Profiler

In order to use the profiler, we must create the appropriate HBase table. For this, we will assume the following environment variables:

- METRON_PROFILER_TABLE - usually profiler
- METRON_PROFILER_CF - usually P

Now you can create the table via:

export METRON_PROFILER_TABLE=...
export METRON_PROFILER_CF=...
echo "create '$METRON_PROFILER_TABLE', '$METRON_PROFILER_CF'" | hbase shell
Topology Configuration

Edit $METRON_HOME/config/profiler.properties and make the following modifications:

- kafka.zk should be $METRON_ZK (e.g. u1401:2181)
- kafka.broker should be $METRON_KAFKA (e.g. u1401:6667)
- profiler.period.duration is the time between snapshots in the profiler. By default this is 15.
- profiler.period.duration.units is the time unit of the time between snapshots in the profiler. By default this is MINUTES.

Start the Profiler

Now, we can start the profiler topology by running:

$METRON_HOME/bin/start_profiler_topology.sh

Smoke-test Metron

If this were a real Metron installation, you would ingest data by tapping into one of the sensors that we support:

- Yaf
- Bro
- Snort

Or via another mechanism that could transport your data into a kafka queue for the parser to use. We can test Metron by sending some synthetic data into a parser and tracking the data all the way through to the indices and the profiler.

Sensor Source

We are going to call this data source dummy and pipe it directly into kafka. The format of the messages will be a JSON map with one field called value, which is a float.

Generator

Create a file called rand_gen.py in your home directory with the following content:

#!/usr/bin/python
import random
import sys
import time

def main():
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print(out)
        sys.stdout.flush()
        time.sleep(freq_s)

if __name__ == '__main__':
    main()
This will generate random data at a certain frequency.

Kafka Queue

Create the dummy kafka topic:

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $METRON_ZK --create --topic dummy --partitions 1 --replication-factor 1

Parser Configuration

We must set up a parser to parse the JSON blobs coming out of our synthetic data generator. Since this is just a JSON map with a single value, we can set up a simple JSONMap parser. Create a new parser called dummy by editing $METRON_HOME/config/zookeeper/parsers/dummy.json:

{
"parserClassName":"org.apache.metron.parsers.json.JSONMapParser",
"sensorTopic":"dummy",
"fieldTransformations" : [ ]
}
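The JSONMapParser simply lifts each top-level JSON field into the Metron message, so it is easy to sanity-check the generator's output shape before wiring anything up. A quick sketch (not part of Metron itself):

```python
import json
import random

# Mirror one message emitted by rand_gen.py and confirm it is the flat
# JSON map the JSONMapParser expects: a single float field named "value".
msg = '{ "value" : ' + str(random.gauss(0, 1)) + ' }'
parsed = json.loads(msg)

assert list(parsed.keys()) == ["value"]
assert isinstance(parsed["value"], float)
print("message parses cleanly:", parsed)
```

If this round-trip fails for your own data source, the parser topology will emit the message to the error index rather than passing it downstream.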
Profiler Configuration

We want to be able to enrich these messages by looking back and capturing some statistical information about the last 5 minutes of values coming through the topology. In order to do that, we need to track a statistical summary of the values:

- One profile called stat that stores a statistical summary every 1 minute of the value field of the messages from the dummy sensor type.

Edit the profiler config at $METRON_HOME/config/zookeeper/profiler.json:

{
"profiles": [
{
"profile": "stat",
"foreach": "'global'",
"onlyif": "sensor.type == 'dummy'",
"init" : {
},
"update": {
"s": "STATS_ADD(s, value)"
},
"result": "s"
}
]
}
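Conceptually, the update expression STATS_ADD(s, value) folds each matching message's value into a running summary object, and result emits that summary when the period closes. Metron's actual summary object lives in Stellar and is backed by Commons Math; the Python below is only an illustrative stand-in for the mechanics:

```python
import math

class StatSummary:
    """Rough stand-in for the summary STATS_ADD accumulates; Metron's
    real implementation is a Commons Math-backed object, not this."""
    def __init__(self):
        self.values = []

    def add(self, value):
        self.values.append(value)

    def mean(self):
        return sum(self.values) / len(self.values)

    def sd(self):
        # Sample standard deviation over the accumulated values.
        mu = self.mean()
        return math.sqrt(sum((v - mu) ** 2 for v in self.values) / (len(self.values) - 1))

# One profile period: each 'dummy' message's value is folded in via the
# "update" expression; "result" then emits the summary object itself.
s = StatSummary()
for v in [1.0, 2.0, 3.0]:
    s.add(v)
print("mean=%.1f sd=%.1f" % (s.mean(), s.sd()))
```

The important point for what follows is that the profiler stores the summary object per period, not raw values, which is what makes merging windows cheap.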
For the purposes of our example, we should adjust the profile to capture the data every 1 minute, rather than every 15 minutes.

Edit $METRON_HOME/config/profiler.properties to adjust the capture duration by changing profiler.period.duration=15 to profiler.period.duration=1.

Edit $METRON_HOME/config/zookeeper/global.json and add the following properties:

"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES"
Enrichment Configuration

We want to be able to enrich these messages by looking back and capturing some statistical information about the last 5 minutes of values coming through the topology. Edit the enrichment config at $METRON_HOME/config/zookeeper/enrichments/dummy.json:

{
"enrichment" : {
"fieldMap": {
"stellar" : {
"config" : {
"median" : "STATS_PERCENTILE(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago'))), 50)",
"stddev" : "STATS_SD(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago'))))"
}
}
}
},
"threatIntel" : { }
}
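The PROFILE_GET/STATS_MERGE chain in those Stellar expressions can be pictured in plain Python. In this sketch the per-minute snapshot values are made up for illustration; PROFILE_GET returns one summary per period in the window, and STATS_MERGE combines them before the statistic is computed:

```python
import statistics

# Five hypothetical per-minute snapshots, each reduced here to its raw values.
# PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago')) returns
# one summary per period; STATS_MERGE combines them into a single summary.
snapshots = [[0.9, 1.1], [1.0, 1.2], [0.8], [0.95, 1.05], [1.0]]
merged = [v for snapshot in snapshots for v in snapshot]

median = statistics.median(merged)  # analog of STATS_PERCENTILE(..., 50)
stddev = statistics.stdev(merged)   # analog of STATS_SD(...)
print("median=%.2f stddev=%.3f" % (median, stddev))
```

In the topology the merge happens inside Stellar at enrichment time; this sketch is only to make the windowing semantics concrete.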
This creates two new fields on every message, median and stddev, which represent the median and standard deviation of the values in the last 5 minutes. Despite taking a snapshot every minute, we can merge those snapshots and get a longer-range view.

Start the Topologies

Push the zookeeper configs:

$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z $METRON_ZK

Restart the profiler:

storm kill profiler
$METRON_HOME/bin/start_profiler_topology.sh

Start the dummy parser topology:

$METRON_HOME/bin/start_parser_topology.sh -k $METRON_KAFKA -z $METRON_ZK -s dummy

Send some synthetic data into the dummy kafka topic (in another terminal):

python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $METRON_KAFKA --topic dummy
Validate that Data is Flowing
Wait for at least 5 minutes and execute the following in the Stellar REPL, started as $METRON_HOME/bin/stellar -z $METRON_ZK:

# Grab the mean of the values from 3 minutes ago til now
STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 3 minutes ago'))))
Inspect the data going into the indices by running the following:

curl -XPOST "http://$ES_IP:$ES_PORT/dummy*/_search?pretty" -d '
{
"_source" : [ "median", "stddev", "value" ]
}
'
11-10-2015
11:27 PM
1 Kudo
An older version of JPMML is BSD-3 licensed. It supports PMML 3.0, 3.1, 3.2, 4.0 and 4.1.
11-05-2015
11:27 PM
1 Kudo
@Simon Elliston Ball is right, there's a huge variety of options for NLP, as there are many niches for natural language processing. Keep in mind that NLP libraries rarely solve business problems directly. Rather, they give you the tools to build a solution. Often this is segmenting free text into chunks suitable for analysis (e.g. sentence disambiguation), annotating free text (e.g. part of speech tagging), or converting free text to a more structured form (e.g. vectorization). All of these are tools that are useful in processing text, but are insufficient by themselves. These tools help you convert free, unstructured text into a form suitable as input into a normal machine learning or analysis pipeline (i.e. classification, etc.). I suppose the one exception to this that I can think of is sentiment analysis: that is a properly valuable analytic in and of itself. Also, keep in mind the licenses for some of these libraries are not as permissive as Apache's (e.g. CoreNLP is GPL with the option to purchase a license for commercial use).