Introduction

This article is a continuation of the article "Geo-spatial Queries with Hive Using ESRI Geometry Libraries", published a few months ago.

Objective

Demonstrate how to use a Hive context to invoke the built-in ESRI UDFs for Hive from Spark SQL.

Pre-requisites

Steps

1. Launch spark-shell with --jars as its parameter:

spark-shell --jars /home/spark/esri/esri-geometry-api.jar,/home/spark/esri/spatial-sdk-hive-1.1.1-SNAPSHOT.jar

I placed the dependency jars in /home/spark/esri, but you can store them in HDFS or on the local filesystem, as long as your spark user has the privileges needed to read them.
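The value passed to --jars is a single comma-separated list with no spaces between entries. As a minimal sketch, assuming the two jar paths used in this article, the launch command could be assembled like this (the names jars and command are illustrative only):

```scala
// Jar locations used in this article; adjust to wherever you store them.
val jars = Seq(
  "/home/spark/esri/esri-geometry-api.jar",
  "/home/spark/esri/spatial-sdk-hive-1.1.1-SNAPSHOT.jar"
)

// --jars takes one comma-separated argument, so join without spaces.
val command = s"spark-shell --jars ${jars.mkString(",")}"

println(command)
```

Building the list this way makes it harder to introduce a stray space, which would split the list into separate command-line arguments.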

2. Instantiate sqlContext:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);

3. From spark-shell, define temporary functions:

sqlContext.sql("""create temporary function st_point as 'com.esri.hadoop.hive.ST_Point'""");

sqlContext.sql("""create temporary function st_x as 'com.esri.hadoop.hive.ST_X'""");
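If you need more than a couple of UDFs, the registration statements can be generated from a list instead of being typed one by one. The following is only a sketch: esriUdfs and registrationStatements are hypothetical names, and the list contains just the two functions registered above.

```scala
// SQL function name -> implementing class, as used in step 3.
// Only the two UDFs from this article are listed; extend as needed.
val esriUdfs = Seq(
  "st_point" -> "com.esri.hadoop.hive.ST_Point",
  "st_x"     -> "com.esri.hadoop.hive.ST_X"
)

// Hypothetical helper: builds one "create temporary function" statement per UDF.
def registrationStatements(udfs: Seq[(String, String)]): Seq[String] =
  udfs.map { case (name, cls) => s"create temporary function $name as '$cls'" }

val statements = registrationStatements(esriUdfs)
statements.foreach(println)

// In spark-shell, with the sqlContext from step 2, each statement could be run with:
// statements.foreach(stmt => sqlContext.sql(stmt))
```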

4. From spark-shell, invoke your UDF:

sqlContext.sql("""from geospatial.demo_shape_point select st_x(st_point(shape))""").show;

Note: geospatial is the Hive database where the demo_shape_point table was created.
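The query in step 4 uses Hive's FROM-first syntax. For readers more used to the conventional SELECT ... FROM order, the same query can be written as below. This is a sketch only: pointXQuery is a hypothetical helper, and the database, table and column names are the ones from this article.

```scala
// Hypothetical helper: builds the step 4 query in SELECT ... FROM order.
def pointXQuery(database: String, table: String, column: String): String =
  s"select st_x(st_point($column)) from $database.$table"

val query = pointXQuery("geospatial", "demo_shape_point", "shape")
println(query)

// In spark-shell, once the functions from step 3 are registered:
// sqlContext.sql(query).show
```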

Conclusion

Developers building geometry functions for geo-spatial applications can use the Esri Geometry API for Java and the Spatial Framework for Hadoop from Spark as well as from Hive.
