Created on 11-16-2016 09:59 PM
Introduction
This article is a continuation of the Geo-spatial Queries with Hive Using ESRI Geometry Libraries article published a few months ago.
Objective
Demonstrate how to use Hive context and invoke built-in ESRI UDFs for Hive from Spark SQL.
Pre-requisites
- HDP 2.4.2
- Steps documented on Geo-spatial Queries with Hive Using ESRI Geometry Libraries
Steps
1. Launch spark-shell with --jars as its parameter:
spark-shell --jars /home/spark/esri/esri-geometry-api.jar,/home/spark/esri/spatial-sdk-hive-1.1.1-SNAPSHOT.jar
I placed the dependency JARs in /home/spark/esri, but you can store them in HDFS or on the local filesystem, as long as your spark user has the proper privileges on them.
2. Instantiate sqlContext:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
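On HDP 2.4.2 (Spark 1.6) the HiveContext above is the correct entry point. If you are on Spark 2.x or later, HiveContext is deprecated in favor of a SparkSession built with Hive support; a minimal sketch of the equivalent setup:

```scala
// Spark 2.x+ equivalent of HiveContext (sketch; on Spark 1.6 / HDP 2.4.2,
// use the HiveContext line above instead)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport() // enables access to the Hive metastore and HiveQL
  .getOrCreate()
```

With a SparkSession, the subsequent `sqlContext.sql(...)` calls in this article become `spark.sql(...)`.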
3. From spark-shell, define temporary functions:
sqlContext.sql("""create temporary function st_point as 'com.esri.hadoop.hive.ST_Point'""");
sqlContext.sql("""create temporary function st_x as 'com.esri.hadoop.hive.ST_X'""");
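The same pattern works for any other UDF shipped in the Spatial Framework for Hadoop. For example, as a sketch (verify these class names exist in the spatial-sdk-hive jar version you built):

```scala
// Register additional ESRI UDFs the same way (sketch; check the class names
// against the jar you built, as the set of UDFs can vary by version)
sqlContext.sql("""create temporary function st_geomfromtext as 'com.esri.hadoop.hive.ST_GeomFromText'""")
sqlContext.sql("""create temporary function st_astext as 'com.esri.hadoop.hive.ST_AsText'""")
```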
4. From spark-shell, invoke your UDF:
sqlContext.sql("""from geospatial.demo_shape_point select st_x(st_point(shape))""").show;
Note: geospatial is the Hive database where the demo_shape_point table was created.
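The `from ... select` form above is Hive syntax; the conventional `select ... from` order works as well. Since `sqlContext.sql` returns a DataFrame, you can also keep the result for further processing rather than just printing it, as in this sketch:

```scala
// Same query in the conventional select-from order; the returned DataFrame
// can be cached, filtered, or joined before calling show()
val xs = sqlContext.sql("""select st_x(st_point(shape)) as x from geospatial.demo_shape_point""")
xs.show()
```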
Conclusion
The Esri Geometry API for Java and the Spatial Framework for Hadoop can be used by developers building geometry functions for geo-spatial applications not only in Hive but also in Spark.