Metron PCAP data stored in HDFS sequence file format

Expert Contributor

Hi all,

I have a 3-node Metron cluster running, along with pycapa, pcap-service, and pcap-replay. I'm trying to understand the role of these three, and it's a bit fuzzy to me. I'm using Metron 0.4. Pycapa acts as the Metron probe: it captures network traffic and sends it to Kafka. The PCAP Storm topology picks it up and stores it in HDFS as sequence files. What's next? How does this data become available in Metron? Who uses it, how, and when?

pcap-replay: we put previously captured pcap files in, for example, /opt/pcap-replay/*.pcap, and pcap-replay replays these packets so that Snort, Yaf, and Bro pick up the data and send it to the Storm topology?

pcap-service: was this designed to work only with OpenSOC, and is it no longer supported in Metron 0.4?

Thank you in advance for any feedback.

1 ACCEPTED SOLUTION

Guru

The pycapa probe is used to ingest PCAP, which is then pushed to Kafka. Pycapa (http://metron.apache.org/current-book/metron-sensors/pycapa/index.html) is really intended for test use and is probably good up to around 1 Gbps. In production you want to use fastcapa (http://metron.apache.org/current-book/metron-sensors/fastcapa/index.html), which does the same job but in an accelerated way.

The PCAP Metron topology then takes this data and stores it in HDFS as sequence files (run $METRON_HOME/bin/start_pcap_topology.sh to start it).

pcap-replay is a testing tool used to feed sample pcap data to an interface which you can then listen to with pycapa or fastcapa.
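
To check what a sample file contains before replaying it, something like the following could be used. This is only a minimal sketch: it assumes the file is in the classic libpcap format (not pcapng), and `read_pcap` is a hypothetical helper name, not part of Metron, pycapa, or pcap-replay.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic libpcap, microsecond timestamps
PCAP_MAGIC_NSEC = 0xA1B23C4D  # same layout, nanosecond timestamps


def read_pcap(stream: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp_in_seconds, raw_packet_bytes) from a classic pcap stream.

    Hypothetical helper for illustration only; pcapng files are not handled.
    """
    header = stream.read(24)  # the global header is always 24 bytes
    if len(header) < 24:
        raise ValueError("truncated pcap global header")

    # The magic number reveals the byte order the file was written in.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", header[:4])[0]
        if magic in (PCAP_MAGIC_USEC, PCAP_MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file")

    divisor = 1e9 if magic == PCAP_MAGIC_NSEC else 1e6

    while True:
        record = stream.read(16)  # per-packet header: ts_sec, ts_frac, incl_len, orig_len
        if len(record) < 16:
            return  # clean end of file, or a truncated trailing record
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated packet body
        yield ts_sec + ts_frac / divisor, data
```

For example, `sum(1 for _ in read_pcap(open(path, "rb")))` gives the packet count, where `path` is whatever sample file you plan to replay.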

pcap-service was the backend for an older interface panel, which is no longer really supported. We'll look at getting that functionality into the new metron-rest service somewhere on the roadmap. In the meantime, your best bet is to use the query and inspector tools. The CLI tools for querying the PCAP data are documented here: http://metron.apache.org/current-book/metron-platform/metron-pcap-backend/index.html, which also has a lot of other useful information about the way PCAP is collected in Metron.

It sounds like you are using monit, from your description of services. This is deprecated; please use Ambari to manage services in Metron. I would also recommend using the HCP deployment if you can, rather than a direct Apache build. 3 nodes is also a tiny Metron cluster, so you're unlikely to get the levels of performance needed for anything like full-scale PCAP, but it should be OK for a PoC or test-grade environment.

4 REPLIES

Expert Contributor

Thank you all for the feedback.

Simon, thank you for pointing out many useful things.

It would be helpful if the Metron documentation marked what's deprecated and what's no longer supported.

I would like to understand why Metron doesn't support importing existing PCAP data into the Metron cluster. I thought the whole idea was to have the data in one place. My workaround would be to use tshark to extract PCAP metadata and push it to Metron manually. What's your take on that?
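
A rough sketch of what the tshark workaround might look like is below. It only builds the tshark command line and turns its tab-separated output into JSON lines. The field selection and the helper names `tshark_command` and `rows_to_json` are assumptions for illustration, and getting the JSON into Metron (a Kafka topic plus a parser configured for it) is not shown.

```python
import json
from typing import Iterable, Iterator, List, Sequence

# Assumed field selection; any valid tshark field names could be used instead.
FIELDS = ("frame.time_epoch", "ip.src", "ip.dst", "tcp.srcport", "tcp.dstport")


def tshark_command(pcap_path: str, fields: Sequence[str] = FIELDS) -> List[str]:
    """Build a tshark invocation that prints one tab-separated row per packet."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields"]
    for name in fields:
        cmd += ["-e", name]  # one -e option per requested field
    return cmd


def rows_to_json(lines: Iterable[str], fields: Sequence[str] = FIELDS) -> Iterator[str]:
    """Convert tab-separated tshark rows into JSON strings, one per packet."""
    for line in lines:
        values = line.rstrip("\n").split("\t")
        if len(values) != len(fields):
            continue  # skip rows that do not match the expected column count
        yield json.dumps(dict(zip(fields, values)))
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)`, and the resulting `stdout.splitlines()` fed to `rows_to_json`. Packets without a given field (for example, non-TCP traffic) come back with empty strings for those columns.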

I'll spend more time with the query and inspector tools for PCAP. However, we would like to visualize this data on a dashboard along with the other network traffic data we collect.

I do use Ambari to manage services, but it doesn't show status for sensors like yaf, bro, and snort the way Monit does, unless I'm missing a configuration or installation step to add the sensor services to Ambari.

Thank you for your time and feedback.

New Contributor

I would look into Moloch from AOL. They have an index you might be interested in.

Guru

Note, however, that Moloch will not give you any compatibility with Metron or Hadoop, so you'll need a separate Moloch cluster.