Created on 09-08-2021 03:50 AM, edited on 09-08-2021 03:59 AM by subratadas
In this tutorial, we will learn how to create Apache Ozone volumes, buckets, and keys. After that, we will see how to create an Apache Hive table backed by Apache Ozone, and finally how to insert and read the data from Apache Spark.
Ozone
- Create the volume with the name vol1.
# ozone sh volume create /vol1
21/08/25 06:23:27 INFO rpc.RpcClient: Creating Volume: vol1, with root as owner.
- Create the bucket with the name bucket1 under vol1.
# ozone sh bucket create /vol1/bucket1
21/08/25 06:24:09 INFO rpc.RpcClient: Creating Bucket: vol1/bucket1, with Versioning false and Storage Type set to DISK and Encryption set to false
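The introduction also mentions keys. As an optional sanity check (the local file name below is just an example), you can put a key into the new bucket and list it:
# ozone sh key put /vol1/bucket1/test.txt /tmp/test.txt
# ozone sh key list /vol1/bucket1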
Hive
- Launch the beeline shell.
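For example, a minimal JDBC connection (the HiveServer2 host and port are placeholders for your environment):
beeline -u "jdbc:hive2://hs2.host.example.com:10000/default"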
- Create the employee table in Hive.
Note: Update the om.host.example.com value.
CREATE DATABASE IF NOT EXISTS ozone_db;
USE ozone_db;
CREATE EXTERNAL TABLE IF NOT EXISTS `employee`(
`id` bigint,
`name` string,
`age` smallint)
STORED AS parquet
LOCATION 'o3fs://bucket1.vol1.om.host.example.com/employee';
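Optionally, you can confirm that the table points at the Ozone location before loading any data; a minimal check, assuming the same placeholder HiveServer2 host, is:
beeline -u "jdbc:hive2://hs2.host.example.com:10000/ozone_db" -e "SHOW CREATE TABLE employee;"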
Spark
Spark2:
- Launch spark-shell
spark-shell
- Run the following query to insert/read the data from the Hive employee table.
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""") spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""") spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""") spark.sql("SELECT * FROM ozone_db.employee").show()
Spark3:
- Launch spark3-shell
spark3-shell --jars /opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar
- Run the following query to insert/read the data from the Hive employee table.
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""") spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""") spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""") spark.sql("SELECT * FROM ozone_db.employee").show()
Kerberized environment
Pre-requisites:
- Create a user and grant it the required Ranger permissions to create Ozone volumes, buckets, etc.
- Run kinit as that user.
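A minimal kinit sketch, assuming a keytab-based login (the principal and keytab path are placeholders):
kinit -kt /path/to/user.keytab user@EXAMPLE.COM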
Spark2:
- Launch spark-shell
Note: Before launching spark-shell, update the om.host.example.com value.
spark-shell \
  --conf spark.yarn.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862
- Run the following query to insert/read the data from Hive employee table.
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""") spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""") spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""") spark.sql("SELECT * FROM ozone_db.employee").show()
Spark3:
- Launch spark3-shell
Note: Before launching spark3-shell, update the om.host.example.com value.
spark3-shell \
  --conf spark.kerberos.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862 \
  --jars /opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar
- Run the following query to insert/read the data from the Hive employee table.
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""") spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""") spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""") spark.sql("SELECT * FROM ozone_db.employee").show()
Notes:
- If you get java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.ozone.OzoneFileSystem not found, add /opt/cloudera/parcels/CDH/jars/hadoop-ozone-filesystem-hadoop3-*.jar to the Spark classpath using the --jars option.
- In a Kerberized environment, the spark.yarn.access.hadoopFileSystems configuration (spark.kerberos.access.hadoopFileSystems on Spark 3) must be specified; otherwise, the following error is displayed.
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
Thanks for reading this article. If you liked this article, you can give kudos.