Created on 09-08-2021 03:50 AM - edited on 09-08-2021 03:59 AM by subratadas
In this tutorial, we will learn how to create Apache Ozone volumes, buckets, and keys. We will then create an Apache Hive table whose data is stored in Apache Ozone and, finally, insert data into that table and read it back from Apache Spark.
# ozone sh volume create /vol1
21/08/25 06:23:27 INFO rpc.RpcClient: Creating Volume: vol1, with root as owner.
# ozone sh bucket create /vol1/bucket1
21/08/25 06:24:09 INFO rpc.RpcClient: Creating Bucket: vol1/bucket1, with Versioning false and Storage Type set to DISK and Encryption set to false
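The introduction also mentions keys. As a minimal sketch, assuming the volume and bucket above already exist and the ozone CLI is on the PATH, a key could be written and inspected as follows. The key name key1 and the temporary local file are illustrative assumptions, not part of the original steps.

```shell
#!/usr/bin/env bash
# Sketch: upload a small local file as a key into /vol1/bucket1.
# KEY_PATH and the temporary file are hypothetical names chosen for illustration.
KEY_PATH="/vol1/bucket1/key1"
LOCAL_FILE="$(mktemp)"
echo "hello ozone" > "$LOCAL_FILE"

if command -v ozone >/dev/null 2>&1; then
  ozone sh key put "$KEY_PATH" "$LOCAL_FILE"   # upload the file as key1
  ozone sh key list /vol1/bucket1              # list the keys in the bucket
  ozone sh key info "$KEY_PATH"                # show metadata for key1
else
  echo "ozone CLI not found on PATH; skipping key commands" >&2
fi
```

The guard around the ozone calls only keeps the sketch from failing on a machine without the Ozone client; on a cluster node the three commands run as written.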
Note: In the statements and commands that follow, replace om.host.example.com with the host name of your Ozone Manager (OM).
CREATE DATABASE IF NOT EXISTS ozone_db;
USE ozone_db;
CREATE EXTERNAL TABLE IF NOT EXISTS `employee`(
`id` bigint,
`name` string,
`age` smallint)
STORED AS parquet
LOCATION 'o3fs://bucket1.vol1.om.host.example.com/employee';
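The LOCATION above follows the o3fs pattern o3fs://<bucket>.<volume>.<OM host>/<path>. The helper below is a small sketch that assembles such a URI from its parts, which may help avoid typos when the Ozone Manager host changes. The function name o3fs_uri is hypothetical and is not part of the Ozone tooling.

```shell
#!/usr/bin/env bash
# o3fs_uri BUCKET VOLUME OM_HOST [PATH]
# Prints an o3fs URI of the form o3fs://<bucket>.<volume>.<om-host>/<path>.
# o3fs_uri is an illustrative helper, not an Ozone command.
o3fs_uri() {
  local bucket="$1" volume="$2" om_host="$3" path="${4:-}"
  path="${path#/}"   # drop one leading slash, if present
  printf 'o3fs://%s.%s.%s/%s\n' "$bucket" "$volume" "$om_host" "$path"
}

# Rebuild the table location used in this article.
o3fs_uri bucket1 vol1 om.host.example.com employee
# prints: o3fs://bucket1.vol1.om.host.example.com/employee
```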
Spark 2 (non-Kerberized cluster):
spark-shell
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Spark 3 (non-Kerberized cluster):
spark3-shell --jars /opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Spark 2 (Kerberized cluster):
spark-shell \
--conf spark.yarn.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
Spark 3 (Kerberized cluster):
spark3-shell \
--conf spark.kerberos.access.hadoopFileSystems=o3fs://bucket1.vol1.om.host.example.com:9862 \
--jars /opt/cloudera/parcels/CDH/lib/hadoop-ozone/hadoop-ozone-filesystem-hadoop3-*.jar
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (1, "Ranga", 33)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (2, "Nishanth", 3)""")
spark.sql("""INSERT INTO TABLE ozone_db.employee VALUES (3, "Raja", 59)""")
spark.sql("SELECT * FROM ozone_db.employee").show()
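Notes: On a Kerberized cluster, if the access.hadoopFileSystems configuration shown above is not specified, the job can fail with the following error: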
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
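The two launch commands in this article that pass an access.hadoopFileSystems setting differ in the configuration key: spark-shell uses spark.yarn.access.hadoopFileSystems, while spark3-shell uses spark.kerberos.access.hadoopFileSystems. The sketch below picks the key for a given shell. The function name access_fs_conf is hypothetical, and the URI is the example value from this article.

```shell
#!/usr/bin/env bash
# access_fs_conf SHELL_NAME O3FS_URI
# Prints the --conf value that this article uses for the given shell.
# access_fs_conf is an illustrative helper, not a Spark or Ozone command.
access_fs_conf() {
  local shell_name="$1" uri="$2"
  case "$shell_name" in
    spark-shell)  printf 'spark.yarn.access.hadoopFileSystems=%s\n' "$uri" ;;
    spark3-shell) printf 'spark.kerberos.access.hadoopFileSystems=%s\n' "$uri" ;;
    *) echo "unknown shell: $shell_name" >&2; return 1 ;;
  esac
}

URI="o3fs://bucket1.vol1.om.host.example.com:9862"
# Print the launch command (the --jars option used for spark3-shell is omitted for brevity).
echo "spark3-shell --conf $(access_fs_conf spark3-shell "$URI")"
```

Printing the command instead of running it keeps the sketch safe to try on a machine without Spark; on a cluster node the printed line can be reviewed and then run with the remaining options added.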