Ozone is an object store for Hadoop. It is a redundant, distributed object store built on primitives already present in HDFS. Some key features of Ozone:
A Hadoop-compatible file system, called the Ozone File System, that allows programs like Hive or Spark to run against Ozone without any modifications.
Ozone supports both RPC and REST APIs for accessing the store.
Ozone is built to support billions of keys in a distributed environment.
Ozone can run concurrently with HDFS.
Like many other object stores, Ozone has a notion of volumes. Only administrators can create volumes. Users create buckets in volumes. To store data inside a bucket, users create keys.
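The volume, bucket, and key hierarchy can be pictured with a small in-memory model. This is only an illustrative sketch in Python, not the Ozone API: the class `ToyNamespace` and all of its method names are hypothetical, and the only rules it encodes are the ones stated above (administrators create volumes, users create buckets in volumes, keys hold data inside buckets).

```python
class ToyNamespace:
    """Toy model of the volume/bucket/key hierarchy (hypothetical, not the Ozone API)."""

    def __init__(self, admins):
        self.admins = set(admins)
        # volume name -> {bucket name -> {key name -> data}}
        self.volumes = {}

    def create_volume(self, user, volume):
        # Only administrators may create volumes.
        if user not in self.admins:
            raise PermissionError(f"{user} is not an administrator")
        self.volumes.setdefault(volume, {})

    def create_bucket(self, volume, bucket):
        # Users create buckets inside an existing volume.
        self.volumes[volume].setdefault(bucket, {})

    def put_key(self, volume, bucket, key, data):
        # Data is stored under a key inside a bucket.
        self.volumes[volume][bucket][key] = data

    def get_key(self, volume, bucket, key):
        return self.volumes[volume][bucket][key]
```

A key is therefore always addressed by three names (volume, bucket, key), and a request to create a volume from a non-administrator is rejected in this model.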
The Ozone file system allows other Hadoop ecosystem applications like Hive and Spark to use Ozone. Once a bucket is created, creating an Ozone file system on top of it is straightforward.
A 10,000-foot view of Ozone
OzoneManager (OM) acts as the namespace manager. All Ozone entities, such as volumes, buckets, and keys, are managed by OM. OM talks to an independent block manager (Storage Container Manager, SCM) to get blocks and passes them on to the Ozone client.
SCM: Storage Container Manager is the block and cluster manager for Ozone.
Block: Blocks are similar to blocks in HDFS. They are replicated blocks of data.
These components map very closely to the existing HDFS NameNode and DataNodes. The most significant difference is the presence of a block manager, SCM.
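The division of labor between the namespace manager and the block manager can be sketched as follows. This is a toy model under stated assumptions, not Ozone's actual protocol: the classes `ToyScm` and `ToyOzoneManager`, their methods, and the use of sequential integers as block IDs are all hypothetical. It only mirrors the flow described above, in which the namespace manager asks the block manager for blocks and hands them to the client.

```python
import itertools


class ToyScm:
    """Stand-in for the block manager (hypothetical): hands out block IDs."""

    def __init__(self):
        self._next_id = itertools.count(1)

    def allocate_block(self):
        return next(self._next_id)


class ToyOzoneManager:
    """Stand-in for the namespace manager (hypothetical): maps keys to blocks."""

    def __init__(self, scm):
        self.scm = scm
        self.key_to_blocks = {}

    def allocate_key(self, volume, bucket, key, num_blocks=1):
        # Ask the block manager for blocks, record them against the key,
        # and return them so the client knows where to write.
        blocks = [self.scm.allocate_block() for _ in range(num_blocks)]
        self.key_to_blocks[(volume, bucket, key)] = blocks
        return blocks
```

In this sketch the namespace manager never creates block IDs itself; it only records which blocks belong to which key, which is the separation that distinguishes this design from a single NameNode.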
The easiest way to run Ozone is to try it out using Docker. To build Ozone from source, check out the Hadoop sources from GitHub, then check out the Ozone branch, HDFS-7240, and build it.
git checkout HDFS-7240
You can build Ozone by running the following command.