This article continues "Geo-spatial Queries with Hive Using ESRI Geometry Libraries", published a few months ago.


The goal is to demonstrate how to use a Hive context and invoke the built-in ESRI UDFs for Hive from Spark SQL.



1. Launch spark-shell with --jars as its parameter:

spark-shell --jars /home/spark/esri/esri-geometry-api.jar,/home/spark/esri/spatial-sdk-hive-1.1.1-SNAPSHOT.jar

I placed the dependency jars in the /home/spark/esri directory, but you can store them elsewhere in HDFS or on the local filesystem, as long as you grant your spark user the proper privileges.

2. Instantiate sqlContext:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);

3. From spark-shell, define temporary functions:

sqlContext.sql("""create temporary function st_point as 'com.esri.hadoop.hive.ST_Point'""");

sqlContext.sql("""create temporary function st_x as 'com.esri.hadoop.hive.ST_X'""");

4. From spark-shell, invoke your UDF:

sqlContext.sql("""from geospatial.demo_shape_point select st_x(st_point(shape))""").show;

Note: geospatial is the Hive database where the demo_shape_point table was created.
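
When more than a couple of ESRI functions are needed, step 3 can be scripted instead of typed one statement at a time. The following is a minimal sketch, not part of the original walkthrough: the helper name registrationStatements is hypothetical, and the st_y entry (class com.esri.hadoop.hive.ST_Y) is an assumed extra that the steps above do not use. It only builds the statement strings, so it can be pasted into spark-shell or a plain Scala REPL.

```scala
// Function name -> implementing class.
// ST_Point and ST_X come from step 3; ST_Y is an assumed addition for illustration.
val esriUdfs = Seq(
  "st_point" -> "com.esri.hadoop.hive.ST_Point",
  "st_x"     -> "com.esri.hadoop.hive.ST_X",
  "st_y"     -> "com.esri.hadoop.hive.ST_Y"
)

// Hypothetical helper: builds one "create temporary function" statement per entry.
def registrationStatements(udfs: Seq[(String, String)]): Seq[String] =
  udfs.map { case (name, cls) => s"create temporary function $name as '$cls'" }

val statements = registrationStatements(esriUdfs)

// Print the statements so they can be reviewed before running them.
statements.foreach(println)

// In spark-shell, with sqlContext instantiated as in step 2, run them with:
// statements.foreach(stmt => sqlContext.sql(stmt))
```

Keeping the name-to-class mapping in one sequence makes it easier to see which functions a session depends on, and the same list can be reused each time spark-shell is restarted, since temporary functions do not survive the session.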


Developers can use the Esri Geometry API for Java and the Spatial Framework for Hadoop to build geometry functions for geo-spatial applications on Spark as well, not only on Hive.

