Metron - Error enriching squid data in Storm


I am running through the tutorial to add a new telemetry source to Metron and have run into a problem with the EnrichmentJoinBolt in Storm: it fails to process any of the messages that the Squid topology has processed, with the error below:

2016-10-11 14:32:09 o.a.m.e.b.EnrichmentSplitterBolt [ERROR] Unable to retrieve a sensor enrichment config of squid
2016-10-11 14:32:09 o.a.m.e.b.EnrichmentJoinBolt [ERROR] Unable to retrieve a sensor enrichment config of squid
2016-10-11 14:32:09 o.a.m.e.b.JoinBolt [ERROR] [Metron] Unable to join messages: {"code":0,"method":"GET","enrichmentsplitterbolt.splitter.end.ts":"1476196329341","enrichmentsplitterbolt.splitter.begin.ts":"1476196329341","url":"https:\/\/tfl.gov.uk\/plan-a-journey\/","source.type":"squid","elapsed":31271,"ip_dst_addr":null,"original_string":"1476113538.772  31271 127.0.0.1 TCP_MISS\/000 0 GET https:\/\/tfl.gov.uk\/plan-a-journey\/ - DIRECT\/tfl.gov.uk -","bytes":0,"action":"TCP_MISS","ip_src_addr":"127.0.0.1","timestamp":1476113538772}
java.lang.NullPointerException: null
	at org.apache.metron.enrichment.bolt.EnrichmentJoinBolt.joinMessages(EnrichmentJoinBolt.java:76) ~[stormjar.jar:na]
	at org.apache.metron.enrichment.bolt.EnrichmentJoinBolt.joinMessages(EnrichmentJoinBolt.java:33) ~[stormjar.jar:na]
	at org.apache.metron.enrichment.bolt.JoinBolt.execute(JoinBolt.java:111) ~[stormjar.jar:na]
	at backtype.storm.daemon.executor$fn__7014$tuple_action_fn__7016.invoke(executor.clj:670) [storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.daemon.executor$mk_task_receiver$fn__6937.invoke(executor.clj:426) [storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.disruptor$clojure_handler$reify__6513.onEvent(disruptor.clj:58) [storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) [storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) [storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) [storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.daemon.executor$fn__7014$fn__7027$fn__7078.invoke(executor.clj:808) [storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at backtype.storm.util$async_loop$fn__545.invoke(util.clj:475) [storm-core-0.10.0.2.3.0.0-2557.jar:0.10.0.2.3.0.0-2557]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

I am using the full-dev environment with Metron 0.2.0BETA and the guide, https://cwiki.apache.org/confluence/display/METRON/2016/04/25/Metron+Tutorial+-+Fundamentals+Part+1%...

I can see data in the Kibana dashboard from Bro and YAF, both of which also have indexes created in Elasticsearch; however, there is no index for the squid data.

I tried killing the Storm topologies and re-running ./run_enrichment_role.sh, and then restarting the squid parser topology.

Any help would be greatly appreciated.

1 ACCEPTED SOLUTION

@Aaron Harris

Check that the enrichment and parser configs for squid are installed by running zk_load_configs.sh with the -m DUMP mode.

For example, on quick dev run the command below. The lines to look for are "PARSER Config: squid" and "ENRICHMENT Config: squid":

[vagrant@node1 ~]$ /usr/metron/0.2.0BETA/bin/zk_load_configs.sh -i /usr/metron/0.2.0BETA/config/zookeeper/ -m DUMP -z localhost:2181 | grep -i squid | grep Config

log4j:WARN No appenders could be found for logger (org.apache.curator.framework.imps.CuratorFrameworkImpl).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

PARSER Config: squid

ENRICHMENT Config: squid

If either one is missing, check the ZooKeeper config directory:

[vagrant@node1 ~]$ ls /usr/metron/0.2.0BETA/config/zookeeper/enrichments/

bro.json snort.json squid.json websphere.json yaf.json

Then push the configs to ZooKeeper:

/usr/metron/0.2.0BETA/bin/zk_load_configs.sh -i /usr/metron/0.2.0BETA/config/zookeeper/ -m PUSH -z localhost:2181

Then you will probably need to restart the enrichment topology. From Ambari, go to the Storm UI, click into the enrichment topology, and then click the Kill button. If you are using quick dev, monit should restart it automatically.
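
If you want to script the check from the DUMP step, here is a minimal sketch. The function name check_sensor_configs is hypothetical (it is not part of Metron), and it assumes the DUMP output contains header lines of the form "PARSER Config: squid" and "ENRICHMENT Config: squid" as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Metron: reads the output of
# `zk_load_configs.sh ... -m DUMP` on stdin and reports whether both the
# PARSER and ENRICHMENT config headers are present for the given sensor.
check_sensor_configs() {
  local sensor="$1" dump missing=0 kind
  dump="$(cat)"
  for kind in PARSER ENRICHMENT; do
    # Assumed header format, taken from the sample output: "<KIND> Config: <sensor>"
    if printf '%s\n' "$dump" | grep -q "^${kind} Config: ${sensor}[[:space:]]*$"; then
      echo "${kind} config for ${sensor}: present"
    else
      echo "${kind} config for ${sensor}: MISSING"
      missing=1
    fi
  done
  # Exit status is 0 only when both configs were found
  return "$missing"
}
```

On quick dev it could be fed from the same command as above, for example: /usr/metron/0.2.0BETA/bin/zk_load_configs.sh -i /usr/metron/0.2.0BETA/config/zookeeper/ -m DUMP -z localhost:2181 | check_sensor_configs squid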


@Aaron Harris

First, check HBase in Ambari to make sure it is green. The threat intelligence enrichments use HBase.

Another thing to check is the squid log that is sent to Kafka. One thing I found with squid is that if you aren't constantly sending HTTP requests through it, the logs roll over and the latest log contains no messages. In a production system where squid is routing user HTTP requests, the log won't be empty. I think you may be running into this problem:

Check the messages going to the squid topic. It looks like they might be missing some information, such as the source and destination IPs. An easy way to fix this is to make the squid requests again so that the most recent log is populated.

The squid messages should look something like this:

[vagrant@node1 ~]$ /usr/hdp/2.4.2.0-258/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic squid --from-beginning

{metadata.broker.list=node1:6667, request.timeout.ms=30000, client.id=console-consumer-31722, security.protocol=PLAINTEXT}

1476285641.838 1439 127.0.0.1 TCP_MISS/200 457194 GET http://www.aliexpress.com/af/shoes.html? - DIRECT/104.81.164.40 text/html

1476285642.545 704 127.0.0.1 TCP_MISS/200 40385 GET http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878 - DIRECT/212.227.34.3 text/html

1476285644.617 2068 127.0.0.1 TCP_MISS/200 177264 GET http://www.pravda.ru/science/ - DIRECT/185.103.135.90 text/html

Then check the squid messages going to the enrichments topic. They should look something like this:

[vagrant@node1 ~]$ /usr/hdp/2.4.2.0-258/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic enrichments --from-beginning | grep squid

{"full_hostname":"www.aliexpress.com","code":200,"method":"GET","url":"http:\/\/www.aliexpress.com\/af\/shoes.html?","source.type":"squid","elapsed":1439,"ip_dst_addr":"104.81.164.40","original_string":"1476285641.838 1439 127.0.0.1 TCP_MISS\/200 457194 GET http:\/\/www.aliexpress.com\/af\/shoes.html? - DIRECT\/104.81.164.40 text\/html","bytes":457194,"domain_without_subdomains":"aliexpress.com","action":"TCP_MISS","ip_src_addr":"127.0.0.1","timestamp":1476285641838}

{"full_hostname":"www.help.1and1.co.uk","code":200,"method":"GET","url":"http:\/\/www.help.1and1.co.uk\/domains-c40986\/transfer-domains-c79878","source.type":"squid","elapsed":704,"ip_dst_addr":"212.227.34.3","original_string":"1476285642.545 704 127.0.0.1 TCP_MISS\/200 40385 GET http:\/\/www.help.1and1.co.uk\/domains-c40986\/transfer-domains-c79878 - DIRECT\/212.227.34.3 text\/html","bytes":40385,"domain_without_subdomains":"1and1.co.uk","action":"TCP_MISS","ip_src_addr":"127.0.0.1","timestamp":1476285642545}
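
A quick way to spot log lines that lack a destination IP is sketched below. It assumes, based only on the samples in this thread, that the destination address is the part after the slash in the ninth whitespace-separated field (for example DIRECT/104.81.164.40), and that a hostname or "-" there means no IP was recorded. The function name find_lines_without_dst_ip is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads squid access-log lines on stdin and prints the
# ones whose 9th field is not of the form <HIERARCHY>/<IPv4 address>.
# Assumption from the samples above: field 9 looks like DIRECT/104.81.164.40
# when the destination IP is known and like DIRECT/tfl.gov.uk when it is not.
find_lines_without_dst_ip() {
  awk '$9 !~ /^[A-Z_]+\/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print }'
}
```

For instance, piping the log file that feeds the squid topic into find_lines_without_dst_ip would list the entries to regenerate; the line from the original error message (ending in DIRECT/tfl.gov.uk -) would be printed, while the three sample lines above would not.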


@cduby

Thanks for all your help along the way. I think I am finally up and running now.

I found the issue with the enrichments: the squid logs I had generated were missing the destination IP address. Once I regenerated them, cleared the Kafka queues, and restarted the topologies, the data started flowing through into the Elasticsearch index.

Then, to get around the timestamp issue, I had to curl a template into Elasticsearch for the squid data, with the timestamp field specified as a date, as below:

curl -XPUT http://node1:9200/_template/squid -d '{"template":"squid*","mappings": {"squid*": {"properties": {"timestamp": { "type": "date" }}}}}'
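
For readability, the same template body as in the one-line command can be laid out as in the sketch below. The function name squid_template_body is hypothetical, and the commented curl lines assume the node1:9200 host from this post; they are shown only as a possible way to apply and then read back the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the template body from the command above,
# formatted for reading. The content is unchanged: indexes matching squid*
# get a "timestamp" field mapped as a date.
squid_template_body() {
  cat <<'EOF'
{
  "template": "squid*",
  "mappings": {
    "squid*": {
      "properties": {
        "timestamp": { "type": "date" }
      }
    }
  }
}
EOF
}

# Possible usage (assumes the node1:9200 host from this post; not run here):
# squid_template_body | curl -XPUT http://node1:9200/_template/squid -d @-
# curl -XGET 'http://node1:9200/_template/squid?pretty'
```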

@Aaron Harris Glad you are up and running!