In this tutorial, we will learn how to create Apache Ozone volumes, buckets, and keys. After that, we will see how to create an Apache Hive table backed by Apache Ozone, and finally how to insert and read data using Apache Spark.
Ozone
Create a volume named vol1.
# ozone sh volume create /vol1
21/08/25 06:23:27 INFO rpc.RpcClient: Creating Volume: vol1, with root as owner.
Create a bucket named bucket1 under vol1.
# ozone sh bucket create /vol1/bucket1
21/08/25 06:24:09 INFO rpc.RpcClient: Creating Bucket: vol1/bucket1, with Versioning false and Storage Type set to DISK and Encryption set to false
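The introduction also mentions keys. As a quick check, you can write a key into the new bucket and list it; the key name key1 and the local file /tmp/test.txt below are placeholders for illustration:
# ozone sh key put /vol1/bucket1/key1 /tmp/test.txt
# ozone sh key list /vol1/bucket1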
Hive
Launch the beeline shell.
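The exact connection string depends on your cluster; a minimal sketch, assuming a HiveServer2 host of hiveserver2.example.com (a placeholder) on the default port:
# beeline -u "jdbc:hive2://hiveserver2.example.com:10000"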
Create the employee table in Hive.
Note: Replace om.host.example.com with your Ozone Manager host.
CREATE DATABASE IF NOT EXISTS ozone_db;
USE ozone_db;
CREATE EXTERNAL TABLE IF NOT EXISTS `employee`(
`id` bigint,
`name` string,
`age` smallint)
STORED AS parquet
LOCATION 'o3fs://bucket1.vol1.om.host.example.com/employee';
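To optionally confirm that the bucket is reachable at the LOCATION used above, the Hadoop-compatible ozone fs command can list it (same placeholder OM host as in the DDL):
# ozone fs -ls o3fs://bucket1.vol1.om.host.example.com/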
Spark
Spark2:
Launch the spark-shell.
spark-shell
Run the following queries to insert and read data from the Hive employee table.
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Spark3:
Launch the spark3-shell.
spark3-shell
Run the following queries to insert and read data from the Hive employee table.
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Notes:
If you get java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.ozone.OzoneFileSystem not found, add the /opt/cloudera/parcels/CDH/jars/hadoop-ozone-filesystem-hadoop3-*.jar to the Spark classpath using the --jars option (see the example after these notes).
In a Kerberized environment, we must specify the spark.yarn.access.hadoopFileSystems configuration; otherwise, Spark cannot obtain a delegation token for Ozone and the job fails with an authentication error (see the example after these notes).
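Putting both notes together, a minimal sketch of a launch command (the JAR wildcard and the OM host are placeholders from the steps above; on Spark 3, the equivalent setting is named spark.kerberos.access.hadoopFileSystems):
spark-shell \
  --jars /opt/cloudera/parcels/CDH/jars/hadoop-ozone-filesystem-hadoop3-*.jar \
  --conf spark.yarn.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com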