About sball

sball · 02-05-2018

The intention behind this is very much to move towards PCAP query within Zeppelin. This script is effectively a backend that provides access to PCAP query via a Zeppelin interpreter. If you install the sample Zeppelin notebooks, you will find one demonstrating the PCAP capabilities. The notebook is used like this:

sball · 10-16-2017

Note, however, that Moloch will not give you any compatibility with Metron or Hadoop, so you'll need a separate Moloch cluster.

sball · 10-16-2017

The pycapa probe is used to ingest PCAP, which is then pushed to Kafka. Pycapa (http://metron.apache.org/current-book/metron-sensors/pycapa/index.html) is really intended for test use and is probably good up to around 1Gbps. In a real production deployment you want to use fastcapa (http://metron.apache.org/current-book/metron-sensors/fastcapa/index.html), which does the same job in an accelerated way. The Metron PCAP topology then takes this data and stores it in HDFS as sequence files (run $METRON_HOME/bin/start_pcap_topology.sh to start it).

pcap-replay is a testing tool used to feed sample PCAP data to an interface, which you can then listen to with pycapa or fastcapa. pcap-service was the backend for an older interface panel, which is no longer really supported. We'll look at getting that functionality pushed into the new metron-rest service somewhere on the roadmap; in the meantime, your best bet is to use the query and inspector tools. There are various ways of querying the PCAP data through the CLI tools documented at http://metron.apache.org/current-book/metron-platform/metron-pcap-backend/index.html, which also has a lot of other useful information about the way PCAP is collected in Metron.

It sounds like you are using monit from your description of services. This is deprecated; please use Ambari to manage services in Metron. I would also recommend using the HCP deployment if you can, rather than a direct Apache build. Three nodes is also a tiny Metron cluster, so you're unlikely to get anything like the performance needed for full-scale PCAP, but it should be OK for a PoC or test-grade environment.
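A common way to query stored PCAP is to select packets by connection fields (addresses, ports, protocol) within a time range. As a rough, plain-Python illustration of that idea only, and not Metron's implementation (which works over the HDFS sequence files), the sketch below filters already-parsed packet metadata. The `Packet` record, the `filter_packets` function and its `include_reverse` option are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed-packet record; stored PCAP itself is raw bytes."""
    ts_us: int      # capture time, microseconds since the epoch
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP


def _matches(value, wanted) -> bool:
    """A criterion left as None acts as a wildcard."""
    return wanted is None or value == wanted


def filter_packets(
    packets: Iterable[Packet],
    start_us: int,
    end_us: int,
    src_ip: Optional[str] = None,
    dst_ip: Optional[str] = None,
    src_port: Optional[int] = None,
    dst_port: Optional[int] = None,
    protocol: Optional[int] = None,
    include_reverse: bool = False,
) -> Iterator[Packet]:
    """Yield packets with start_us <= ts_us < end_us matching the given fields.

    With include_reverse=True, packets flowing the other way (source and
    destination swapped) also match, so both halves of a conversation appear.
    """
    for p in packets:
        if not start_us <= p.ts_us < end_us:
            continue
        if not _matches(p.protocol, protocol):
            continue
        forward = (_matches(p.src_ip, src_ip) and _matches(p.dst_ip, dst_ip)
                   and _matches(p.src_port, src_port) and _matches(p.dst_port, dst_port))
        reverse = include_reverse and (
            _matches(p.src_ip, dst_ip) and _matches(p.dst_ip, src_ip)
            and _matches(p.src_port, dst_port) and _matches(p.dst_port, src_port))
        if forward or reverse:
            yield p


# Made-up sample data: a TCP exchange, an unrelated DNS packet, and a late packet.
SAMPLE_PACKETS = [
    Packet(1_000, "10.0.0.1", "10.0.0.2", 5555, 80, 6),
    Packet(2_000, "10.0.0.2", "10.0.0.1", 80, 5555, 6),
    Packet(3_000, "10.0.0.3", "10.0.0.2", 5556, 53, 17),
    Packet(9_000, "10.0.0.1", "10.0.0.2", 5555, 80, 6),
]

if __name__ == "__main__":
    for pkt in filter_packets(SAMPLE_PACKETS, 0, 5_000, src_ip="10.0.0.1",
                              dst_port=80, protocol=6, include_reverse=True):
        print(pkt)
```

Using `None` as a wildcard lets one function express both narrow and broad queries, and the half-open time window avoids counting a packet twice when two consecutive queries share a boundary.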

sball · 07-04-2017

Metron primarily uses Storm, Kafka, HDFS, Spark, Zeppelin, ZooKeeper and HBase, which are available in other distributions. However, the primary deployment method is through Ambari, so it tends to work a lot better on HDP. All Hortonworks testing of the platform is done on HDP, so it will certainly be a lot easier to use. For AWS-type deployments, it may also be worth considering HDC, which is essentially HDP packaged up to run on Amazon in the same way as EMR. S3 could also make sense as a long-term storage platform for Metron, replacing the default HDFS.

sball · 04-26-2017

Updated to include HDP 2.5 (required for Metron > 0.3.0)

sball · 04-18-2017

In theory, yes; however, you may want to back up and change the repo locations in Ambari as per the docs, to prevent Ambari overwriting the repo file. Note there is little real difference here in terms of install performance: any packages already present will just be skipped as already installed, even if you start an install completely from scratch again against the local repos.

sball · 04-17-2017

It sounds like you may have some connectivity issues to the public repo. One way to solve this is to download the packages separately and use a local repo. Check out the docs at http://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-installation/content/using_a_local_repository.html, which show you how to download all the relevant tarballs to set up a local repo server.
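Once the tarballs are unpacked on a web server, RHEL/CentOS hosts find the local mirror through a yum repository definition (for example, the repo file used to install Ambari itself). A minimal sketch that renders such a `.repo` stanza with Python's standard `configparser`; the section name `ambari-local`, the host `repo.example.internal` and the URL path are hypothetical placeholders, so take the real base URLs for your Ambari/HDP and OS versions from the linked docs.

```python
import configparser
import io


def make_repo_file(section: str, name: str, baseurl: str, gpgcheck: bool = False) -> str:
    """Render a yum .repo stanza pointing at a local repository (illustrative only)."""
    # interpolation=None so URLs containing '%' are written verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser[section] = {
        "name": name,
        "baseurl": baseurl,
        # gpgcheck=0 disables package signature checks; enabling it also
        # needs a gpgkey= entry pointing at the repository's public key
        "gpgcheck": "1" if gpgcheck else "0",
        "enabled": "1",
    }
    out = io.StringIO()
    # write plain key=value lines instead of configparser's default "key = value"
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical host and path; substitute your own web server and versions.
    print(make_repo_file("ambari-local", "Ambari (local mirror)",
                         "http://repo.example.internal/ambari/centos7/"))
```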

sball · 02-20-2017

@Sebastian Carroll These options will work in both yarn-client and yarn-cluster mode. You will need to ensure you have an appropriate file appender in the log4j configuration. That said, if you have a job which runs for multiple days, you are far better off using yarn-cluster mode to ensure the driver is safely located on the cluster, rather than relying on a single node with a yarn-client hooked to it.
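As an illustration of a file-appender setup for a long-running job in yarn-cluster mode, here is a sketch that holds a log4j 1.x properties file (the format used by Spark 1.x/2.x) and builds a matching `spark-submit` command. The file name `log4j-file.properties`, the appender name `file`, the jar `my-job.jar` and the class `com.example.LongRunningJob` are hypothetical. `spark.yarn.app.container.log.dir` is the property the Spark on YARN docs point to for placing log files where YARN can display and aggregate them; it is set inside YARN containers, which is where the driver runs in cluster mode.

```python
import shlex
from typing import List

# log4j 1.x configuration with a rolling file appender (appender name "file" is
# hypothetical). Inside YARN containers, ${spark.yarn.app.container.log.dir}
# resolves to the container's log directory, so spark.log is aggregated by YARN.
LOG4J_PROPERTIES = """\
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
"""


def build_spark_submit(app_jar: str, main_class: str,
                       props_file: str = "log4j-file.properties") -> List[str]:
    """Return spark-submit arguments for a cluster-mode job using props_file."""
    # --files ships the properties file into each container's working directory
    # (assumed on the classpath, as Spark on YARN normally arranges); log4j 1.x
    # treats a value that is not a URL as a classpath resource name.
    java_opt = "-Dlog4j.configuration=" + props_file
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",  # driver runs on the cluster, not the submitting node
        "--class", main_class,
        "--files", props_file,
        "--conf", "spark.driver.extraJavaOptions=" + java_opt,
        "--conf", "spark.executor.extraJavaOptions=" + java_opt,
        app_jar,
    ]


if __name__ == "__main__":
    # Save LOG4J_PROPERTIES as log4j-file.properties next to the jar, then run:
    argv = build_spark_submit("my-job.jar", "com.example.LongRunningJob")
    print(" ".join(shlex.quote(arg) for arg in argv))
```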

sball · 12-30-2016

Could you please confirm your Ansible version? This is likely a version conflict with Ansible. We recommend 2.0.0.2.
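A small helper for answering the version question: it parses the output of `ansible --version` and compares it with the 2.0.0.2 recommended above. The function names are hypothetical, and the two output shapes it recognises ("ansible 2.0.0.2" from Ansible 2.x, "ansible [core 2.11.6]" from much later releases) are assumptions about how Ansible prints its version.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

RECOMMENDED = (2, 0, 0, 2)  # version recommended in the reply above


def parse_ansible_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the version tuple from `ansible --version` output, or None."""
    match = re.search(r"^ansible\s+(?:\[core\s+)?(\d+(?:\.\d+)*)", output, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def compare_to_recommended(version: Optional[Tuple[int, ...]]) -> str:
    """Describe how a parsed version relates to RECOMMENDED."""
    want = ".".join(map(str, RECOMMENDED))
    if version is None:
        return "could not determine the Ansible version"
    have = ".".join(map(str, version))
    if version == RECOMMENDED:
        return f"ansible {have} matches the recommended {want}"
    return f"ansible {have} differs from the recommended {want}"


if __name__ == "__main__":
    if shutil.which("ansible"):
        out = subprocess.run(["ansible", "--version"],
                             capture_output=True, text=True).stdout
        print(compare_to_recommended(parse_ansible_version(out)))
    else:
        print("ansible is not on PATH")
```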

sball · 12-30-2016

Metron uses a number of Hadoop ecosystem components, and so tends to require separate master nodes for these for performance. These can also be used for resilience, though this diagram does not show full master HA. To expand the abbreviations:
NN = Name Node (the Hadoop HDFS name node stores file system metadata)
SN = Secondary Name Node (not very well named, but provides compaction and optimisation services for the NN)
RM = Resource Manager (the container coordinator which manages YARN resources and allocates them to running jobs)
ZS = ZooKeeper Server (ZooKeeper is used extensively in Metron for storage and coordination of configuration. It is also used for similar purposes by many other Hadoop components)
DN = Data Node (an HDFS Data Node, responsible for storing the actual blocks in HDFS)
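To make the roles concrete, here is a sketch that checks a proposed host layout against them. The host names and the layout mapping are hypothetical, and the two placement checks encode general Hadoop and ZooKeeper guidance rather than anything Metron-specific: the Secondary NameNode is usually run on a different machine from the NameNode because its memory requirements are of the same order, and ZooKeeper ensembles are normally odd-sized because a quorum is a strict majority.

```python
from typing import Dict, List

ROLES = {
    "NN": "HDFS NameNode (file system metadata)",
    "SN": "Secondary NameNode (checkpoints NameNode metadata)",
    "RM": "YARN ResourceManager",
    "ZS": "ZooKeeper server",
    "DN": "HDFS DataNode (stores blocks)",
}
REQUIRED = ("NN", "RM", "ZS", "DN")  # SN is treated as optional here


def check_layout(layout: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems for a host -> role abbreviations mapping."""
    problems: List[str] = []
    hosts_by_role: Dict[str, List[str]] = {}
    for host, roles in layout.items():
        for role in roles:
            if role not in ROLES:
                problems.append(f"{host}: unknown role {role!r}")
            hosts_by_role.setdefault(role, []).append(host)

    for role in REQUIRED:
        if role not in hosts_by_role:
            problems.append(f"no host runs {role} ({ROLES[role]})")

    # NN and SN need similar amounts of memory, so avoid co-locating them.
    shared = set(hosts_by_role.get("NN", [])) & set(hosts_by_role.get("SN", []))
    for host in sorted(shared):
        problems.append(f"{host}: NN and SN on the same host")

    # 2n servers tolerate n-1 failures, the same as 2n-1 servers.
    zk = len(hosts_by_role.get("ZS", []))
    if zk and zk % 2 == 0:
        problems.append(f"{zk} ZooKeeper servers: an odd count tolerates "
                        "the same failures with one fewer node")
    return problems


if __name__ == "__main__":
    example = {
        "master1": ["NN", "ZS"],
        "master2": ["SN", "RM", "ZS"],
        "master3": ["ZS"],
        "worker1": ["DN"],
        "worker2": ["DN"],
    }
    print(check_layout(example) or "no problems found")
```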

Last Visited	10-19-2020 01:00 PM

Member Since	09-15-2015 10:07 PM
Posts	116
Kudos received	121

Cloudera Community

Re: metron pcap query

Re: metron pcap data stored in HDFS sequence forma...

Re: Can Apache Metron be installed using CDH or EM...

Re: Installation failed with ambari, Can I retry t...

Re: metron installation on existed ambari managed ...

Re: metron pcap query

Re: metron pcap data stored in HDFS sequence forma...

Re: metron pcap data stored in HDFS sequence forma...

Re: Can Apache Metron be installed using CDH or EM...

Re: Deploying a fresh Metron cluster via Ambari Ma...

Re: Installation failed with ambari, Can I retry t...

Re: Installation failed with ambari, Can I retry t...

Re: Spark job submit log messages on console

Re: Fastcapa Test Environment not work in Metron 0...

Re: metron installation on existed ambari managed ...