Created on 09-21-2020 01:50 PM - edited on 09-25-2020 01:04 AM by VidyaSargur
The Cloudera Data Warehouse architecture separates compute from storage, which differs from the standard Hadoop architecture.
Figure 1: Cloudera Modern Data Warehouse Architecture
The objective of this post is to show how to bring the wide range of functions and code available in the open-source community into Cloudera Data Warehouse using its object storage architecture. We'll use the ESRI Spatial Framework as an example.
We'll use GitHub to download the ESRI project, and Java and Maven to build the necessary JAR files.
Download the necessary files from the ESRI Spatial Framework GitHub repository with the following command:
$ git clone https://github.com/Esri/spatial-framework-for-hadoop.git
Figure 2: Cloning ESRI project
This creates a directory called "spatial-framework-for-hadoop". Enter this directory to build the project and generate the JAR files that will be used for the functions.
To build the project using Apache Maven, run:
$ mvn package
Figure 3: Building ESRI project
The JAR file needed for the functions is spatial-sdk-hive-2.1.1-SNAPSHOT.jar, located in <path/to/githubproject>/spatial-framework-for-hadoop/hive/target. Upload this file to the object storage.
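Before uploading, it can help to confirm that the build actually produced the JAR. The following is a minimal sketch, not part of the ESRI project: `hive_jar_path` is a hypothetical helper, and the default version string is simply the one named in this article, so it may need adjusting if the project version has changed.

```shell
#!/usr/bin/env bash
# Sketch: check that `mvn package` produced the Hive JAR.
# hive_jar_path is a hypothetical helper, not part of the ESRI project.
hive_jar_path() {
  local project_dir="$1"
  # Default version is the one named in this article (an assumption for newer builds).
  local version="${2:-2.1.1-SNAPSHOT}"
  local jar="$project_dir/hive/target/spatial-sdk-hive-$version.jar"
  if [ -f "$jar" ]; then
    echo "$jar"
  else
    echo "JAR not found: $jar (did 'mvn package' succeed?)" >&2
    return 1
  fi
}

# Example: run from the directory that contains the cloned project.
hive_jar_path "${PROJECT_DIR:-spatial-framework-for-hadoop}" \
  || echo "Build the project first."
```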
Figure 4: Upload JAR File into the object storage.
File uploaded:
Figure 5: JAR uploaded in the object storage.
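The upload shown in the figures can also be done from the command line. The sketch below assumes the object storage is AWS S3 and that the AWS CLI is installed and authenticated; neither is stated in this article beyond the `s3a://` path used later. The `jar_destination` helper and the `DRY_RUN` switch are hypothetical conveniences, and `<BucketName>` remains a placeholder for your own bucket.

```shell
#!/usr/bin/env bash
# Sketch: copy the built JAR to object storage with the AWS CLI.
# Assumptions: AWS S3 as the object store, AWS CLI installed and authenticated.
BUCKET="${BUCKET:-<BucketName>}"   # placeholder, replace with your bucket
JAR="spatial-framework-for-hadoop/hive/target/spatial-sdk-hive-2.1.1-SNAPSHOT.jar"

# Build the destination URI. The AWS CLI uses the s3:// scheme,
# while Hive refers to the same object with s3a://.
jar_destination() {
  echo "s3://$1/warehouse/tablespace/external/jars/$(basename "$2")"
}

# Print the command by default; set DRY_RUN=0 to actually upload.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo aws s3 cp "$JAR" "$(jar_destination "$BUCKET" "$JAR")"
else
  aws s3 cp "$JAR" "$(jar_destination "$BUCKET" "$JAR")"
fi
```

Running it with `BUCKET=my-bucket DRY_RUN=0` would perform the copy; the default dry run only prints the command so the destination path can be checked first.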
Now that the JAR file is in the object storage, you just need to create the functions inside Cloudera Data Warehouse, pointing to the uploaded JAR.
CREATE FUNCTION ST_Geometry AS 'com.esri.hadoop.hive.ST_Geometry' USING JAR 's3a://<BucketName>/warehouse/tablespace/external/jars/spatial-sdk-hive-2.1.1-SNAPSHOT.jar';
For more CREATE FUNCTION statements for ESRI, you can visit my GitHub link.
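Since each function needs its own statement, a small script can generate them. This is a rough sketch under one assumption: that each function's class lives under `com.esri.hadoop.hive` and shares the function's name, as `ST_Geometry` does above. The `create_function_sql` helper is hypothetical, and the function list is only an illustrative subset.

```shell
#!/usr/bin/env bash
# Sketch: print CREATE FUNCTION statements for several ESRI functions.
# <BucketName> is a placeholder, as in the statement above.
JAR_URI="s3a://${BUCKET:-<BucketName>}/warehouse/tablespace/external/jars/spatial-sdk-hive-2.1.1-SNAPSHOT.jar"

# Assumes the class name matches the function name under com.esri.hadoop.hive.
create_function_sql() {
  local name="$1" jar_uri="$2"
  printf "CREATE FUNCTION %s AS 'com.esri.hadoop.hive.%s' USING JAR '%s';\n" \
    "$name" "$name" "$jar_uri"
}

# Illustrative subset; extend the list with the functions you need.
for fn in ST_Geometry ST_Point ST_AsText; do
  create_function_sql "$fn" "$JAR_URI"
done
```

The printed statements can then be pasted into the SQL editor and run one by one.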
SELECT ST_AsText(ST_Point(1, 2));
Figure 6: Functions working
In this article, we saw how easy it is to bring the vast ecosystem of open-source functions into Cloudera Data Warehouse, using the ESRI Spatial functions as a specific example.
For more information on how to use ESRI functions in Cloudera Data Platform, see Geo-spatial Queries with Hive using ESRI Geometry and Spatial Framework for Hadoop or Esri/gis-tools-for-hadoop.