Support Questions
Find answers, ask questions, and share your expertise

Metron - Error enriching squid data in Storm


I am running through the tutorial for adding a new telemetry source to Metron and have hit a problem with the EnrichmentJoinBolt in Storm. It fails to process any of the messages that the Squid topology has parsed, with the error below:

2016-10-11 14:32:09 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Unable to retrieve a sensor enrichment config of squid
2016-10-11 14:32:09 o.a.m.e.b.EnrichmentJoinBolt [ERROR] Unable to retrieve a sensor enrichment config of squid
2016-10-11 14:32:09 o.a.m.e.b.JoinBolt [ERROR] [Metron] Unable to join messages: {"code":0,"method":"GET","enrichmentsplitterbolt.splitter.end.ts":"1476196329341","enrichmentsplitterbolt.splitter.begin.ts":"1476196329341","url":"https:\/\/\/plan-a-journey\/","source.type":"squid","elapsed":31271,"ip_dst_addr":null,"original_string":"1476113538.772  31271 TCP_MISS\/000 0 GET https:\/\/\/plan-a-journey\/ - DIRECT\/ -","bytes":0,"action":"TCP_MISS","ip_src_addr":"","timestamp":1476113538772}
java.lang.NullPointerException: null
	at org.apache.metron.enrichment.bolt.EnrichmentJoinBolt.joinMessages( ~[stormjar.jar:na]
	at org.apache.metron.enrichment.bolt.EnrichmentJoinBolt.joinMessages( ~[stormjar.jar:na]
	at org.apache.metron.enrichment.bolt.JoinBolt.execute( ~[stormjar.jar:na]
	at backtype.storm.daemon.executor$fn__7014$tuple_action_fn__7016.invoke(executor.clj:670) [storm-core-]
	at backtype.storm.daemon.executor$mk_task_receiver$fn__6937.invoke(executor.clj:426) [storm-core-]
	at backtype.storm.disruptor$clojure_handler$reify__6513.onEvent(disruptor.clj:58) [storm-core-]
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor( [storm-core-]
	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable( [storm-core-]
	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) [storm-core-]
	at backtype.storm.daemon.executor$fn__7014$fn__7027$fn__7078.invoke(executor.clj:808) [storm-core-]
	at backtype.storm.util$async_loop$fn__545.invoke(util.clj:475) [storm-core-]
	at [clojure-1.6.0.jar:na]
	at [na:1.8.0_40]

I am using the full-dev environment with Metron 0.2.0BETA, following the guide.

I can see data in the Kibana dashboard from Bro and YAF, which both have indexes created in Elasticsearch; however, there is no index for the squid data.

I tried killing the Storm topologies and re-running ./, then restarting the squid parser topology.

Any help would be greatly appreciated.


@Aaron Harris

First, check HBase in Ambari to make sure it is green. The threat intelligence enrichments use HBase.

Another thing to check is the squid log that is sent to Kafka. One thing I found with squid is that if you aren't constantly sending HTTP requests through it, the logs roll over and there are no messages in the latest log. In a production system where squid is routing user HTTP requests, the log won't be empty. I think you may be running into this problem:

Check the messages going to the squid topic. It looks like they might be missing some information, such as the source and destination IPs. An easy way to fix this is to issue the squid requests again to populate the most recent log.

The squid messages should look something like this:

[vagrant@node1 ~]$ /usr/hdp/ --zookeeper localhost:2181 --topic squid --from-beginning

{,,, security.protocol=PLAINTEXT}

1476285641.838 1439 TCP_MISS/200 457194 GET - DIRECT/ text/html

1476285642.545 704 TCP_MISS/200 40385 GET - DIRECT/ text/html

1476285644.617 2068 TCP_MISS/200 177264 GET - DIRECT/ text/html

Then check the squid messages going to the enrichments topic. They should look something like this:

[vagrant@node1 ~]$ /usr/hdp/ --zookeeper localhost:2181 --topic enrichments --from-beginning | grep squid

{"full_hostname":"","code":200,"method":"GET","url":"http:\/\/\/af\/shoes.html?","source.type":"squid","elapsed":1439,"ip_dst_addr":"","original_string":"1476285641.838 1439 TCP_MISS\/200 457194 GET http:\/\/\/af\/shoes.html? - DIRECT\/ text\/html","bytes":457194,"domain_without_subdomains":"","action":"TCP_MISS","ip_src_addr":"","timestamp":1476285641838}

{"full_hostname":"","code":200,"method":"GET","url":"http:\/\/\/domains-c40986\/transfer-domains-c79878","source.type":"squid","elapsed":704,"ip_dst_addr":"","original_string":"1476285642.545 704 TCP_MISS\/200 40385 GET http:\/\/\/domains-c40986\/transfer-domains-c79878 - DIRECT\/ text\/html","bytes":40385,"domain_without_subdomains":"","action":"TCP_MISS","ip_src_addr":"","timestamp":1476285642545}



Thanks for all your help along the way. I think I am finally up and running now.

Found the issue with the enrichments: the squid logs I had generated were missing the destination IP address. Once I regenerated them, cleared the Kafka queues, and restarted the topologies, the data started flowing through into the Elasticsearch index.

Then, to get around the timestamp issue, I had to curl a template into Elasticsearch for the squid data with the timestamp field mapped as a date, as below:

curl -XPUT http://node1:9200/_template/squid -d '{"template":"squid*","mappings": {"squid*": {"properties": {"timestamp": { "type": "date" }}}}}'
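The reason the mapping matters: Metron writes the timestamp as epoch milliseconds, and without an explicit "date" mapping Elasticsearch may dynamically index it as a plain long, which breaks time-based filtering in Kibana. A quick sanity check of the value from the failing record above (plain Python, for illustration):

```python
from datetime import datetime, timezone

# The timestamp from the failing record above is epoch *milliseconds*
ts_millis = 1476113538772
dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
print(dt.isoformat())  # an October 2016 datetime, consistent with the squid log
```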

@Aaron Harris Glad you are up and running!