- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
Created on 07-29-2016 09:19 AM
Introduction
This article is a complement to Geo-spatial Queries with Hive using ESRI Geometry and Spatial Framework for Hadoop and includes a few more findings, mainly documenting the differences between ST_Geometry functions supported in Hive and those in commercial spatial packages for Oracle, SQL Server or Netezza.
The Hive UDF's are modeled after existing implementations of ST_Geometry. Some functions exist only in Hive’s implementation, a few behave different or don’t exist.
Additional ST_Geometry Functions in Hive
- ST_GeomFromJSON - Create a geometry from Esri JSON
- ST_AsJSON - Return Esri JSON representation of geometry
- ST_GeomFromGeoJSON - Create a geometry from GeoJSON
- ST_AsGeoJSON - Return GeoJSON representation of geometry (requires geometry-api-java version 1.1)
- ST_PointZ - constructor for 3D points from direct position
- ST_SetSRID - set the spatial-reference ID, for a geometry constructed without it
- ST_GeodesicLengthWGS84 - geodesic length in meters rather than in angles
- ST_Bin, ST_BinEnvelope - aggregate into regular grid, tutorial here
ST_Geometry Functions with Different Behavior
Overloaded constructors - These overloaded constructors differ from other ST_Geometry implementations in how the caller can specify the spatial-reference ID. Default SRID is plane, when the SRID is not specified. Hive does not accept SRID in second argument - wrap with ST_SetSRID or use ST_GeomFromText. Applies to ST_Point, ST_LineString, ST_Polygon, ST_MultiPoint, ST_MultiLineString, ST_MultiPolygon.
ST_PointN - Return type varies in the case of index out of range - Hive: null;
ST_AsText - The OGC WKT standard dictates that a MultiPoint is represented as MULTIPOINT ((1 2),(3 4)); however some existing WKT parsers accept only MULTIPOINT (1 2, 3 4). ST_AsText outputs the former, compliant format, with the nested parentheses.
ST_Envelope - In the case of a point or a vertical or horizontal line, ST_Envelope may either apply a tolerance or return an empty envelope.
ST_Intersection - In the case where the two geometries intersect in a lower dimension, ST_Intersection may drop the lower-dimension intersections, or output a closed linestring.
ST_Intersection(ST_Polygon(2,0,3,1, 2,1), ST_Polygon(1,1, 4,1, 4,4, 1,4))) -- MULTIPOLYGON EMPTY or LINESTRING(2 1, 3 1, 2 1)
ST_Union - may drop lower-dimension members of the union
ST_Union(ST_LineString(2,3,4,5), ST_Point(1,1)) -- MULTILINESTRING ((2 3, 4 5))
ST_SymmetricDiff - Hive-spatial follows the naming in the Esri implemention of ST_Geometry. For the OGC naming, use an alias:
create temporary function ST_SymDifference as 'com.esri.hadoop.hive.ST_SymmetricDiff';
ST_Geometry Functions Not Supported in Hive
ST_GeomCollection, ST_NumGeometries, ST_GeometryN - collection of varying geometry types is not supported; hive supports arrays
ST_Geometry - no constructor of this name - use one of the other constructors, e.g. ST_GeomFromText
ST_Curve, ST_Surface, ST_MultiCurve, ST_MultiSurface - Curve and Surface constructors not supported
ST_PointOnSurface - ST_PointOnSurface is not supported on Hive
ST_GeoSize - ST_GeoSize is not supported on Hive
ST_Transform - ST_Transform is not supported on Hive
ST_Equalsrs - ST_Equalsrs is not supported on Hive
Key Resources
- Complete list of functions supported in Hive: https://github.com/Esri/spatial-framework-for-hadoop/blob/master/hive/function-ddl.sql
- UDF documentation: https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Documentation
Tips
- To share functions globally across sessions create them without “temporary” option. This has the advantage that you do not need to declare the functions for every session.
create function ST_AsBinary as'com.esri.hadoop.hive.ST_AsBinary'
- You can also include the jar file in the create function statement which makes it easier to create a permanent declaration. For example, for the definition of the ST_Point function you would write the following SQL statement:
create function ST_Point as ‘com.esri.hadoop.hive.ST_Point‘ using jar ‘hdfs://YourHDFSClientNode:8020/esri/spatial-sdk-hive-1.1.1-SNAPSHOT.jar’;
Final Notes
- As discussed with ESRI recently, there are no plans to open source all spatial functions currently available for traditional RDBMS like Oracle, SQL Server, or Netezza, as those are commercially licensed packages. The best option to compensate for the 5-10% missing functions is to contribute to ESRI’s open source repository: https://github.com/Esri/spatial-framework-for-hadoop. ESRI does not provide a commercial library for Hive including all spatial functions.
- Thanks to Mansour Raad, Big Data Expert at ESRI for his candide and valuable input. Check Mansour's last article: Geospatial Big Data - the next big trend in analytics
Created on 07-29-2016 09:21 AM
- Mark as Read
- Mark as New
- Bookmark
- Permalink
- Report Inappropriate Content
Great article, thank you!