Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (1)
avatar
Super Guru

Introduction

This article is a complement to Geo-spatial Queries with Hive using ESRI Geometry and Spatial Framework for Hadoop and includes a few more findings, mainly documenting the differences between ST_Geometry functions supported in Hive and those in commercial spatial packages for Oracle, SQL Server or Netezza.

The Hive UDF's are modeled after existing implementations of ST_Geometry. Some functions exist only in Hive’s implementation, a few behave different or don’t exist.

Additional ST_Geometry Functions in Hive

ST_Geometry Functions with Different Behavior

Overloaded constructors - These overloaded constructors differ from other ST_Geometry implementations in how the caller can specify the spatial-reference ID. Default SRID is plane, when the SRID is not specified. Hive does not accept SRID in second argument - wrap with ST_SetSRID or use ST_GeomFromText. Applies to ST_Point, ST_LineString, ST_Polygon, ST_MultiPoint, ST_MultiLineString, ST_MultiPolygon.

ST_PointN - Return type varies in the case of index out of range - Hive: null;

ST_AsText - The OGC WKT standard dictates that a MultiPoint is represented as MULTIPOINT ((1 2),(3 4)); however some existing WKT parsers accept only MULTIPOINT (1 2, 3 4). ST_AsText outputs the former, compliant format, with the nested parentheses.

ST_Envelope - In the case of a point or a vertical or horizontal line, ST_Envelope may either apply a tolerance or return an empty envelope.

ST_Intersection - In the case where the two geometries intersect in a lower dimension, ST_Intersection may drop the lower-dimension intersections, or output a closed linestring.

ST_Intersection(ST_Polygon(2,0,3,1, 2,1), ST_Polygon(1,1, 4,1, 4,4, 1,4))) -- MULTIPOLYGON EMPTY 
or 
LINESTRING(2 1, 3 1, 2 1)

ST_Union - may drop lower-dimension members of the union

ST_Union(ST_LineString(2,3,4,5), ST_Point(1,1)) -- MULTILINESTRING ((2 3, 4 5))

ST_SymmetricDiff - Hive-spatial follows the naming in the Esri implemention of ST_Geometry. For the OGC naming, use an alias:

create temporary function ST_SymDifference as 'com.esri.hadoop.hive.ST_SymmetricDiff';

ST_Geometry Functions Not Supported in Hive

ST_GeomCollection, ST_NumGeometries, ST_GeometryN - collection of varying geometry types is not supported; hive supports arrays

ST_Geometry - no constructor of this name - use one of the other constructors, e.g. ST_GeomFromText

ST_Curve, ST_Surface, ST_MultiCurve, ST_MultiSurface - Curve and Surface constructors not supported

ST_PointOnSurface - ST_PointOnSurface is not supported on Hive

ST_GeoSize - ST_GeoSize is not supported on Hive

ST_Transform - ST_Transform is not supported on Hive

ST_Equalsrs - ST_Equalsrs is not supported on Hive

Key Resources

Tips

  • To share functions globally across sessions create them without “temporary” option. This has the advantage that you do not need to declare the functions for every session.
create function ST_AsBinary as'com.esri.hadoop.hive.ST_AsBinary'
  • You can also include the jar file in the create function statement which makes it easier to create a permanent declaration. For example, for the definition of the ST_Point function you would write the following SQL statement:
create function ST_Point as ‘com.esri.hadoop.hive.ST_Point‘ using jar ‘hdfs://YourHDFSClientNode:8020/esri/spatial-sdk-hive-1.1.1-SNAPSHOT.jar’;

Final Notes

  • As discussed with ESRI recently, there are no plans to open source all spatial functions currently available for traditional RDBMS like Oracle, SQL Server, or Netezza, as those are commercially licensed packages. The best option to compensate for the 5-10% missing functions is to contribute to ESRI’s open source repository: https://github.com/Esri/spatial-framework-for-hadoop. ESRI does not provide a commercial library for Hive including all spatial functions.
  • Thanks to Mansour Raad, Big Data Expert at ESRI for his candide and valuable input. Check Mansour's last article: Geospatial Big Data - the next big trend in analytics
3,394 Views
Comments
avatar

Great article, thank you!