
Ozone is an object store for Hadoop. It is a redundant, distributed object store built on primitives already present in HDFS. Its key features are:

  1. A Hadoop-compatible file system, the Ozone File System, which allows programs such as Hive or Spark to run against Ozone without modification.
  2. Ozone supports both RPC and REST APIs for accessing the store.
  3. Ozone is built to support billions of keys in a distributed environment.
  4. Ozone can run concurrently with HDFS.

Like many other object stores, Ozone has a notion of volumes. Only administrators can create volumes. Users create buckets inside volumes, and to store data inside a bucket, they create keys.
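
To make the volume, bucket and key hierarchy concrete, here is a small sketch that splits a path of the form /volume/bucket/key into its three parts. The helper name split_ozone_path and the sample path are hypothetical, and the /volume/bucket/key notation is assumed here for illustration. The sketch also assumes that a key name may itself contain slashes.

```shell
# Illustrative only: split_ozone_path is a hypothetical helper, not an Ozone command.
# It expects a path with at least three components: /volume/bucket/key
split_ozone_path() {
  local path="${1#/}"          # drop the leading slash
  local volume="${path%%/*}"   # everything before the first slash
  local rest="${path#*/}"      # everything after the first slash
  local bucket="${rest%%/*}"   # next component
  local key="${rest#*/}"       # remainder, which may contain further slashes
  printf 'volume=%s bucket=%s key=%s\n' "$volume" "$bucket" "$key"
}

split_ozone_path /hive/logs/2018/08/app.log
```

The function does no validation, so a path with fewer than three components gives meaningless output.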

The Ozone file system allows other Hadoop ecosystem applications, such as Hive and Spark, to use Ozone. Once a bucket has been created, creating an Ozone file system on top of it is straightforward.

A 10,000-foot view of Ozone

  1. OzoneManager (OM) acts as the namespace manager. All Ozone entities such as volumes, buckets and keys are managed by OM. OM talks to an independent block manager, the Storage Container Manager (SCM), to get blocks and passes them on to the Ozone client.
  2. SCM: The Storage Container Manager is the block and cluster manager for Ozone.
  3. Block: Blocks are similar to blocks in HDFS. They are replicated blocks of data.

These components map very closely to the existing HDFS NameNode and DataNodes. The most significant difference is that block management is handled by a separate service, SCM, whereas in HDFS the NameNode manages both the namespace and the blocks.

Using Ozone

The easiest way to try Ozone is with Docker. To build Ozone from source, check out the Hadoop sources from GitHub, then check out the Ozone branch, HDFS-7240, and build it.

git checkout HDFS-7240

Build Ozone with the following command.

mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Pdist -Phdsl -Dtar -DskipShade

The -DskipShade flag only makes the build faster; it is not required.

Running Ozone via Docker

This assumes that you have a working Docker setup on the machine. Run the following commands to see Ozone in action.

  • Go to the directory where the docker compose files exist.
cd hadoop-dist/target/compose/ozone
  • Start Ozone.
docker-compose up -d
  • Log in to the datanode container.
docker exec -it ozone_datanode_1 bash
  • Run the ozone load generator
./bin/oz freon
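
The steps above can also be sketched as one non-interactive script. The helper names run and start_ozone_and_run_freon and the DRY_RUN variable are hypothetical; the directory, container name and commands are the ones from the list. The sketch drops the -it flags on the assumption that the load generator needs no interactive terminal.

```shell
# run executes its arguments, or only prints them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the steps listed above; run it from the
# root of the Hadoop source tree after a successful build.
start_ozone_and_run_freon() {
  run cd hadoop-dist/target/compose/ozone || return 1
  run docker-compose up -d || return 1
  # No -it here: assumed unnecessary for a non-interactive Freon run.
  run docker exec ozone_datanode_1 ./bin/oz freon
}
```

Calling DRY_RUN=1 start_ozone_and_run_freon prints the three commands without executing them, which is a cheap way to check the sequence before touching Docker.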

Take a look at the OzoneManager UI at http://localhost:9874/ to see all the requests made by Freon.
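
The containers may need a little time before the UI answers. A minimal sketch of a polling helper follows, assuming curl is installed on the host; the function name wait_for_http, the default of 30 attempts and the 2-second delay are illustrative choices rather than anything Ozone prescribes.

```shell
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it answers successfully or the attempts run out.
wait_for_http() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -sf -o /dev/null "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no answer from $url after $attempts attempt(s)" >&2
  return 1
}
```

For example, wait_for_http http://localhost:9874/ returns success as soon as the OzoneManager UI responds.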

Congratulations on your first Ozone deployment! The next part of this tutorial covers the oz command shell and shows how to use Ozone to store files.

Comments

@Ajay - Thank you for this article! Can you please re-label this to Apache Hadoop HDFS Ozone, rather than Apache Ozone? The latter is not the proper use of Apache branding. Thanks. Tom

Contributor

@Ajay, I checked out the HDFS-7240 branch and ran the build command (on Mac OS X). That seemed to work and downloaded a bunch of files, but then failed saying "hdsl" does not exist:

[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.615 s
[INFO] Finished at: 2018-08-30T07:09:47-04:00
[INFO] Final Memory: 127M/1258M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hdsl" could not be activated because it does not exist.
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.2.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-common

@David Hoyle

The code structure has changed since this article was written.

1) Check out trunk.

2) brew install protobuf250 (protobuf is needed to build Hadoop)

3) Build using: mvn clean package -Phdds -Pdist -Dtar -DskipShade -DskipTests -Dmaven.javadoc.skip=true

edit: updated the proto version

Contributor

@Sandeep Nemuri helped me work through this. Here are updated steps:

Clone Hadoop and check out the trunk branch. Then run the following command to install protoc 2.5:

brew install protobuf250

Run the following command to create symlinks for protoc 2.5:

brew link --overwrite --force protobuf250

You can use the following command to verify that protoc 2.5 has been installed:

protoc --version
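
If you want to script that check before starting a long Maven build, a minimal sketch could look like the one below. It assumes that protoc 2.5 reports itself as "libprotoc 2.5.x"; the function name is_protoc_25 is made up for this example.

```shell
# is_protoc_25 succeeds only when the given `protoc --version` output
# reports a 2.5.x release. The name and the pattern are illustrative.
is_protoc_25() {
  case "$1" in
    "libprotoc 2.5."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use before building:
#   is_protoc_25 "$(protoc --version 2>/dev/null)" || echo "protoc 2.5 not found on PATH" >&2
```

An empty string, which is what you get when protoc is missing from the PATH, also fails the check.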

Use the following command to build ozone:

mvn clean package -Phdds -Pdist -Dtar -DskipShade -DskipTests -Dmaven.javadoc.skip=true

Go to the directory that contains the Docker compose files:

cd <path_to_local_github>/hadoop/hadoop-dist/target/ozone-0.2.1-SNAPSHOT/compose/ozone

Start ozone:

docker-compose up -d

Log in to the DataNode container:

docker exec -it ozone_datanode_1 bash

Run the ozone load generator:

bin/ozone freon -validateWrites -numOfVolumes 5 -numOfBuckets 10 -numOfKeys 10
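
As a rough guide to how much data that run writes: assuming Freon creates the given number of buckets in every volume and the given number of keys in every bucket (an assumption about the flags, not something stated in this thread), the total key count is a simple product. The helper name freon_total_keys is hypothetical.

```shell
# total keys = volumes * buckets per volume * keys per bucket
# (assumes the per-volume and per-bucket reading of the Freon flags)
freon_total_keys() {
  echo $(( $1 * $2 * $3 ))
}

freon_total_keys 5 10 10   # prints 500
```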

Now you should be able to see the OzoneManager UI at http://localhost:9874/

Expert Contributor

Thanks for the update. Glad you were able to make it work. Thanks for the comments and sharing it with the community.