
Apache Metron vs. OpenSoc

Apache Metron inherits the advantages of OpenSoc, which enables fast processing of events from a variety of sources. One of its goals is to overcome the shortcomings of OpenSoc. The main challenges of the OpenSoc architecture are:

  • Does not take full advantage of parallelism. For each topic, the enrichment topology contains a set of bolts that run serially. This architecture does not exploit the parallelism that Storm can provide, as can be seen in the OpenSoc architecture in Figure 1.
  • Hard to extend. Adding a new data source requires creating a new topology.
  • Hard to maintain. As the number of data sources increases, there are many “redundant” topologies. Much of the bolt logic in the enrichment topologies is similar; however, due to the architecture of OpenSoc, each topic has to maintain its own line of code. Any change to logic shared across topics has to be made in multiple lines of code, which increases the cost and complexity of maintenance.
  • Lack of testing. There are no unit or integration tests in OpenSoc, so the quality of the code has not been fully validated.
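
The first limitation in the list can be illustrated with a minimal Python sketch. It is not OpenSoc or Storm code: the three enrichment functions, their field names, and the simulated latency are hypothetical stand-ins for bolts. The sketch contrasts a serial chain, where the latencies add up, with a fan-out and join, where independent steps run concurrently.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical enrichment steps; each stands in for a bolt doing a slow lookup.
def geo_enrich(event):
    time.sleep(0.05)  # simulated lookup latency
    return {"geo": "US" if event["ip"].startswith("10.") else "unknown"}

def host_enrich(event):
    time.sleep(0.05)
    return {"host": "host-" + event["ip"].split(".")[-1]}

def threat_intel(event):
    time.sleep(0.05)
    return {"is_alert": event["ip"] in {"10.0.0.66"}}

ENRICHMENTS = [geo_enrich, host_enrich, threat_intel]

def enrich_serial(event):
    """Run each step one after another, like a serial chain of bolts."""
    result = dict(event)
    for step in ENRICHMENTS:
        result.update(step(event))
    return result

def enrich_split_join(event):
    """Send the event to every step at once, then join the partial results."""
    result = dict(event)
    with ThreadPoolExecutor(max_workers=len(ENRICHMENTS)) as pool:
        for partial in pool.map(lambda step: step(event), ENRICHMENTS):
            result.update(partial)
    return result

if __name__ == "__main__":
    event = {"ip": "10.0.0.66"}
    for fn in (enrich_serial, enrich_split_join):
        start = time.perf_counter()
        enriched = fn(event)
        elapsed = time.perf_counter() - start
        print(fn.__name__, enriched, f"{elapsed:.2f}s")
```

Both functions produce the same enriched event. The serial version waits for each lookup in turn, while the split/join version is bounded roughly by the slowest single step, which only works because the steps here do not depend on each other's output.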

3317-opensoc-architecture.png

Figure 1.

3319-metron-architecture.png

Figure 2.

The new Metron architecture, shown in Figure 2, is intended to achieve better extensibility, better maintainability, and better performance.

  • Better Extensibility
    • Metron can add new data source parsers without writing code by using the Grok framework parser. A new data source is described in a Grok pattern file, so Metron can easily be extended to accept new data sources.
    • By introducing an additional normalizing topology layer, as shown in Figure 2, Metron abstracts the common features of data source ingestion and thus removes the need to implement an individual topology for each data source.
  • Better Maintainability
    • Metron converts all Storm topologies to use Flux configuration (a declarative way to wire topologies together).
    • Metron introduces unit and integration testing frameworks, which reduce the cost of maintenance, improve code quality, and reduce runtime errors.
    • By moving most configuration information into ZooKeeper, Metron removes the need to stop the topologies in most cases. A configuration change can be applied without restarting the topology, so the production environment is not affected.
  • Better Performance
    • As shown in Figure 2, by introducing the splitter/joiner pattern in Storm, Metron can run multiple enrichment bolts and threat intel cross-reference bolts in parallel, which is expected to improve performance.
    • By introducing a caching mechanism, Metron keeps the most frequently used enrichment and threat intel cross-reference data in memory, which in turn reduces the cost of accessing data in HBase and MySQL.

The new architecture that Metron has adopted significantly reduces the cost of adding new sources while maintaining the high performance of OpenSoc. With the support of the open source community, Metron is evolving quickly and adding features at a fast pace.
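
The idea behind the Grok-based extensibility described above, where a new source is described by a pattern rather than by new code, can be sketched in Python. This is an illustration under stated assumptions, not Metron's parser: the base patterns are simplified versions of Grok-style patterns, and the firewall pattern, field names, and sample log line are hypothetical.

```python
import re

# A few base patterns, loosely modeled on Grok's built-in ones (simplified).
BASE_PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "INT": r"\d+",
    "WORD": r"\w+",
    "GREEDYDATA": r".*",
}

# Hypothetical pattern-file content for a new source: data only, no new code.
FIREWALL_PATTERN = (
    "%{IP:src_ip} %{IP:dst_ip} %{INT:dst_port} %{WORD:action} %{GREEDYDATA:message}"
)

# Matches tokens of the form %{PATTERN_NAME:field_name}.
TOKEN = re.compile(r"%\{(\w+):(\w+)\}")

def compile_pattern(pattern):
    """Expand each %{NAME:field} token into a named regex group."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{BASE_PATTERNS[name]})"
    return re.compile("^" + TOKEN.sub(expand, pattern) + "$")

def parse(line, compiled):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = compiled.match(line)
    return match.groupdict() if match else None

if __name__ == "__main__":
    firewall = compile_pattern(FIREWALL_PATTERN)
    print(parse("10.0.0.5 192.168.1.9 443 deny blocked by rule 12", firewall))
```

Supporting another source in this sketch means supplying another pattern string; `compile_pattern` and `parse` stay unchanged, which is the property the pattern-file approach is meant to provide.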
    Version history
    Revision #:
    2 of 2
    Last update:
    ‎08-17-2019 12:52 PM
    Updated by:
    Expert Contributor ylu