To calculate distances on Hortonworks HDP using Hive (and Tez),
let's first generate some sample data containing pairs
of geolocation coordinates. Each row holds a source latitude and longitude
followed by a destination latitude and longitude.
Data file: distance1.csv
42.28,-71.87,42.28,-71.86
42.00,-71.87,42.28,-71.11
42.28,-71.87,42.28,-71.86
42.28,-71.87,42.28,-71.22
42.00,-71.87,42.28,-72.33
42.28,-71.87,42.28,-70.44
42.00,-71.87,42.28,-71.55
42.28,-71.87,41.28,-71.66
42.00,-71.87,43.28,-71.77
42.28,-71.87,44.28,-71.88
42.00,-71.87,45.28,-71.99
42.28,-71.87,46.11,-71.00
42.00,-71.87,47.22,-71.00
42.4428,-71.2317,37.405990600586,-122.07851409912
….
Next, we create a schema so that Hive can read
the content.
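A minimal sketch of what such a schema could look like is shown here. The column names are taken from the distance query in this post, but the table name distance1, the DOUBLE column types, and the choice of an external table pointing at /tmp/dist2/ (treated as an HDFS directory) are assumptions for illustration, not necessarily the definition originally used.

```sql
-- Hypothetical table name (distance1); column names match the distance query.
-- Assumes a comma-delimited text file with no header row.
CREATE EXTERNAL TABLE IF NOT EXISTS distance1 (
  src_lat   DOUBLE,  -- source latitude in decimal degrees
  src_long  DOUBLE,  -- source longitude in decimal degrees
  dest_lat  DOUBLE,  -- destination latitude in decimal degrees
  dest_long DOUBLE   -- destination longitude in decimal degrees
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/tmp/dist2/';  -- assumed to be the directory holding distance1.csv
```

With an external table, Hive reads whatever files sit in the given directory, so the CSV can be put in place before or after the table is created.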
After placing distance1.csv in the /tmp/dist2/ directory, I can query the
content in Hive.
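A quick sanity check might look like the following sketch. The table name distance1 is a hypothetical placeholder for whatever name the schema was created under.

```sql
-- Hypothetical table name; replace with the name used when creating the schema.
-- Returns a handful of rows to confirm the four columns parse as numbers.
SELECT src_lat, src_long, dest_lat, dest_long
FROM distance1
LIMIT 5;
```

If any column comes back as NULL, the delimiter or column types in the schema probably do not match the file.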
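The calculation uses the spherical law of cosines: the angle between the two
points, converted to degrees, is multiplied by 60 (nautical miles per degree)
and by 1.1515 (statute miles per nautical mile), so the result is in miles.
The following query calculates the distance
for every given pair of geolocation coordinates: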
select src_lat, src_long, dest_lat, dest_long,
  60 * 1.1515 * (180 * acos(
      sin(radians(src_lat)) * sin(radians(dest_lat))
    + cos(radians(src_lat)) * cos(radians(dest_lat)) * cos(radians(src_long - dest_long))
  ) / PI())
  as distancecalc