Created on 07-29-2016 09:19 AM
This article is a complement to Geo-spatial Queries with Hive using ESRI Geometry and Spatial Framework for Hadoop and includes a few more findings, mainly documenting the differences between ST_Geometry functions supported in Hive and those in commercial spatial packages for Oracle, SQL Server or Netezza.
The Hive UDF's are modeled after existing implementations of ST_Geometry. Some functions exist only in Hive’s implementation, a few behave different or don’t exist.
Overloaded constructors - These overloaded constructors differ from other ST_Geometry implementations in how the caller can specify the spatial-reference ID. Default SRID is plane, when the SRID is not specified. Hive does not accept SRID in second argument - wrap with ST_SetSRID or use ST_GeomFromText. Applies to ST_Point, ST_LineString, ST_Polygon, ST_MultiPoint, ST_MultiLineString, ST_MultiPolygon.
ST_PointN - Return type varies in the case of index out of range - Hive: null;
ST_AsText - The OGC WKT standard dictates that a MultiPoint is represented as MULTIPOINT ((1 2),(3 4)); however some existing WKT parsers accept only MULTIPOINT (1 2, 3 4). ST_AsText outputs the former, compliant format, with the nested parentheses.
ST_Envelope - In the case of a point or a vertical or horizontal line, ST_Envelope may either apply a tolerance or return an empty envelope.
ST_Intersection - In the case where the two geometries intersect in a lower dimension, ST_Intersection may drop the lower-dimension intersections, or output a closed linestring.
ST_Intersection(ST_Polygon(2,0,3,1, 2,1), ST_Polygon(1,1, 4,1, 4,4, 1,4))) -- MULTIPOLYGON EMPTY or LINESTRING(2 1, 3 1, 2 1)
ST_Union - may drop lower-dimension members of the union
ST_Union(ST_LineString(2,3,4,5), ST_Point(1,1)) -- MULTILINESTRING ((2 3, 4 5))
ST_SymmetricDiff - Hive-spatial follows the naming in the Esri implemention of ST_Geometry. For the OGC naming, use an alias:
create temporary function ST_SymDifference as 'com.esri.hadoop.hive.ST_SymmetricDiff';
ST_GeomCollection, ST_NumGeometries, ST_GeometryN - collection of varying geometry types is not supported; hive supports arrays
ST_Geometry - no constructor of this name - use one of the other constructors, e.g. ST_GeomFromText
ST_Curve, ST_Surface, ST_MultiCurve, ST_MultiSurface - Curve and Surface constructors not supported
ST_PointOnSurface - ST_PointOnSurface is not supported on Hive
ST_GeoSize - ST_GeoSize is not supported on Hive
ST_Transform - ST_Transform is not supported on Hive
ST_Equalsrs - ST_Equalsrs is not supported on Hive
Tips
create function ST_AsBinary as'com.esri.hadoop.hive.ST_AsBinary'
create function ST_Point as ‘com.esri.hadoop.hive.ST_Point‘ using jar ‘hdfs://YourHDFSClientNode:8020/esri/spatial-sdk-hive-1.1.1-SNAPSHOT.jar’;
Created on 07-29-2016 09:21 AM
Great article, thank you!