In this tutorial, we will learn how to create Apache Ozone volumes, buckets, and keys, then create an Apache Hive table backed by Apache Ozone, and finally insert and read that data from Apache Spark.

Ozone

  1. Create the volume with the name vol1.
    # ozone sh volume create /vol1
    21/08/25 06:23:27 INFO rpc.RpcClient: Creating Volume: vol1, with root as owner.
  2. Create the bucket with the name bucket1 under vol1.
    # ozone sh bucket create /vol1/bucket1
    21/08/25 06:24:09 INFO rpc.RpcClient: Creating Bucket: vol1/bucket1, with Versioning false and Storage Type set to DISK and Encryption set to false
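The addresses passed to `ozone sh` follow the fixed volume/bucket/key hierarchy shown above. As a small illustration, the slash-separated paths can be composed like this (the helper name and the `mykey` key are hypothetical, for illustration only):

```python
# Minimal sketch: compose the slash-separated addresses that `ozone sh`
# commands take, following the volume -> bucket -> key hierarchy.
# The helper name and "mykey" are hypothetical examples.

def ozone_cli_path(volume, bucket=None, key=None):
    """Build an `ozone sh` address such as /vol1/bucket1/mykey."""
    parts = [volume]
    if bucket is not None:
        parts.append(bucket)
        if key is not None:
            parts.append(key)
    return "/" + "/".join(parts)

print(ozone_cli_path("vol1"))                      # /vol1
print(ozone_cli_path("vol1", "bucket1"))           # /vol1/bucket1
print(ozone_cli_path("vol1", "bucket1", "mykey"))  # /vol1/bucket1/mykey
```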
Hive

  • Launch the beeline shell.
  • Create the employee table in Hive.

Note: Replace the om.host.example.com value with your Ozone Manager host.

CREATE DATABASE IF NOT EXISTS ozone_db;
USE ozone_db;

CREATE EXTERNAL TABLE IF NOT EXISTS `employee`(                  
   `id` bigint,                                     
   `name` string,                                   
   `age` smallint)
STORED AS parquet 
LOCATION 'o3fs://bucket1.vol1.om.host.example.com/employee';
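Note that the o3fs LOCATION reverses the CLI order: the authority is bucket, then volume, then the Ozone Manager host. A minimal sketch of how such a URI is assembled (the helper name is hypothetical; the values are the article's placeholders):

```python
# Hypothetical helper: build an o3fs:// URI for a Hive table LOCATION.
# The authority order is bucket.volume.om-host -- the reverse of the
# /volume/bucket order used by the `ozone sh` CLI.

def o3fs_location(bucket, volume, om_host, path="", port=None):
    authority = f"{bucket}.{volume}.{om_host}"
    if port is not None:
        authority += f":{port}"
    return f"o3fs://{authority}/{path.lstrip('/')}"

print(o3fs_location("bucket1", "vol1", "om.host.example.com", "employee"))
# o3fs://bucket1.vol1.om.host.example.com/employee
```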

Spark

Spark2:

  1. Launch spark-shell
    spark-shell
  2. Run the following queries to insert and read data from the Hive employee table.
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
    
    spark.sql("SELECT * FROM ozone_db.employee").show()
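The same three INSERT statements recur in every shell variant below. If you script the load instead of typing it, a small helper can render them from row tuples (the helper name is hypothetical; the rows mirror the article's values):

```python
# Hypothetical helper: render the INSERT statements used in the
# spark.sql(...) calls above from (id, name, age) tuples.

ROWS = [(1, "Ranga", 33), (2, "Nishanth", 3), (3, "Raja", 59)]

def insert_stmt(table, row):
    id_, name, age = row
    return f'INSERT INTO TABLE {table} VALUES ({id_}, "{name}", {age})'

stmts = [insert_stmt("ozone_db.employee", r) for r in ROWS]
for s in stmts:
    print(s)
# First line printed:
# INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)
```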

Spark3:

  1. Launch spark3-shell
    spark3-shell --jars /opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar
  2. Run the following queries to insert and read data from the Hive employee table.
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
    
    spark.sql("SELECT * FROM ozone_db.employee").show()

Kerberized environment

Pre-requisites:

  1. Create a user and provide proper Ranger permissions to create Ozone volume and buckets, etc.
  2. kinit with the user.

Spark2:

  1. Launch spark-shell
    Note: Before launching spark-shell, replace the om.host.example.com value with your Ozone Manager host.
    spark-shell \
    	--conf spark.yarn.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862
  2. Run the following queries to insert and read data from the Hive employee table.
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
    
    spark.sql("SELECT * FROM ozone_db.employee").show()

Spark3:

  1. Launch spark3-shell
    Note: Before launching spark3-shell, replace the om.host.example.com value with your Ozone Manager host.
    spark3-shell \
    --conf spark.kerberos.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862 \
    --jars /opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar
  2. Run the following queries to insert and read data from the Hive employee table.
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
    spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
    
    spark.sql("SELECT * FROM ozone_db.employee").show()
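The only differences between the Kerberized Spark 2 and Spark 3 launches above are the shell binary, the filesystem-access configuration key, and the extra Ozone jar. A sketch assembling the launch command for either version (the function name is hypothetical; the host and jar path are the article's placeholders):

```python
# Sketch: assemble the Kerberized launch command, picking the config
# key that matches the Spark major version (Spark 2 uses
# spark.yarn.access.hadoopFileSystems, Spark 3 uses
# spark.kerberos.access.hadoopFileSystems), per the commands above.

OZONE_FS = "o3fs://bucket1.vol1.om.host.example.com:9862"
OZONE_JAR = "/opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar"

def launch_command(spark_major):
    if spark_major >= 3:
        shell = "spark3-shell"
        conf_key = "spark.kerberos.access.hadoopFileSystems"
    else:
        shell = "spark-shell"
        conf_key = "spark.yarn.access.hadoopFileSystems"
    cmd = [shell, "--conf", f"{conf_key}={OZONE_FS}"]
    if spark_major >= 3:
        cmd += ["--jars", OZONE_JAR]  # Spark 3 also needs the Ozone filesystem jar
    return cmd

print(" ".join(launch_command(2)))
print(" ".join(launch_command(3)))
```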

Notes:

  • If you get a java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.ozone.OzoneFileSystem not found error, then add the /opt/cloudera/parcels/CDH/jars/hadoop-ozone-filesystem-hadoop3-*.jar to the Spark classpath using the --jars option.
  • In a Kerberized environment, you must specify the spark.yarn.access.hadoopFileSystems (Spark 2) or spark.kerberos.access.hadoopFileSystems (Spark 3) configuration; otherwise, the following error is displayed.
    java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

Thanks for reading this article. If you found it helpful, please give kudos.
