"Druid is fast column-oriented distributed data store".

Druid is an open source data store designed for OLAP queries on event data.

Architecture

  • Historical nodes are the workhorses that handle storage and querying on "historical" data (non-realtime). Historical nodes download segments from deep storage, respond to the queries from broker nodes about these segments, and return results to the broker nodes. They announce themselves and the segments they are serving in Zookeeper, and also use Zookeeper to monitor for signals to load or drop new segments.
  • Coordinator nodes monitor the grouping of historical nodes to ensure that data is available, replicated and in a generally "optimal" configuration. They do this by reading segment metadata information from metadata storage to determine what segments should be loaded in the cluster, using Zookeeper to determine what Historical nodes exist, and creating Zookeeper entries to tell Historical nodes to load and drop new segments.
  • Broker nodes receive queries from external clients and forward those queries to Realtime and Historical nodes. When Broker nodes receive results, they merge them and return them to the caller. To learn the cluster topology, Broker nodes use Zookeeper to determine which Realtime and Historical nodes exist (see the ZooKeeper sketch after this list).
  • Indexing Service nodes form a cluster of workers to load batch and real-time data into the system as well as allow for alterations to the data stored in the system.
  • Realtime nodes also load real-time data into the system. They are simpler to set up than the indexing service, at the cost of several limitations for production use.
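
To make the ZooKeeper-based discovery above concrete, here is a minimal sketch that lists the nodes that have announced themselves in ZooKeeper. The connection string and the /druid/announcements path are assumptions (the path follows Druid's default druid.zk.paths.base of /druid), so adjust them to your cluster.

```python
from kazoo.client import KazooClient

# Assumed ZooKeeper connection string; replace with your quorum.
zk = KazooClient(hosts="zk-host:2181")
zk.start()

# Each Druid node creates an ephemeral entry here when it announces itself;
# this is how Brokers and Coordinators learn the current topology.
# /druid/announcements assumes the default druid.zk.paths.base of /druid.
for node in zk.get_children("/druid/announcements"):
    print(node)

zk.stop()
```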

Segments are stored in deep storage. You can use S3, HDFS, or a local mount.
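
For example, with HDFS as deep storage you can inspect the segment files directly; the sketch below simply shells out to the HDFS CLI. The /druid/segments path is a placeholder and should match whatever druid.storage.storageDirectory is set to.

```python
import subprocess

# Recursively list the segment files Druid has pushed to deep storage.
# The path is hypothetical; use your configured druid.storage.storageDirectory.
subprocess.run(["hdfs", "dfs", "-ls", "-R", "/druid/segments"], check=True)
```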

Queries flow from the client to a Broker node, which forwards them to Realtime or Historical nodes.
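
As a minimal sketch of that query path, the example below posts a native timeseries query to a Broker's /druid/v2 endpoint. The Broker host and port, the datasource name, and the interval are assumptions for illustration.

```python
import json
import requests

# Assumed Broker address; 8082 is a common default Broker port.
BROKER_URL = "http://broker-host:8082/druid/v2/"

# A native timeseries query; "events" and the interval are placeholder values.
query = {
    "queryType": "timeseries",
    "dataSource": "events",
    "granularity": "day",
    "intervals": ["2016-01-01/2016-02-01"],
    "aggregations": [{"type": "count", "name": "rows"}],
}

# The Broker fans the query out to the Realtime/Historical nodes serving the
# relevant segments, merges their partial results, and returns the merged set.
response = requests.post(BROKER_URL, json=query, timeout=30)
print(json.dumps(response.json(), indent=2))
```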

Lambda Architecture

Dependencies

Indexing Service

ZK, Storage and Metadata

  1. A running ZooKeeper cluster for cluster service discovery and maintenance of current data topology
  2. A metadata storage instance for maintenance of metadata about the data segments that should be served by the system (see the metadata sketch after this list)
  3. A "deep storage" LOB store/file system to hold the stored segments
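
As a rough illustration of what the metadata storage holds, the sketch below reads a few rows from the segment table of a MySQL-backed metadata store. The connection details are placeholders, and the druid_segments table and column names follow Druid's default metadata schema, so treat them as assumptions.

```python
import pymysql

# Placeholder connection details for the metadata storage instance.
conn = pymysql.connect(host="metadata-host", user="druid",
                       password="druid", database="druid")
try:
    with conn.cursor() as cur:
        # druid_segments tracks the segments the Coordinator may ask
        # Historical nodes to load; "used" marks segments that should be served.
        cur.execute("SELECT id, dataSource, used FROM druid_segments LIMIT 5")
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()
```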


Part 2 will demo Druid with HDFS as deep storage.