Community Articles

Find and share helpful community-sourced technical articles.
avatar
Rising Star

Geo Distance calculations in Hive and Java

March 23, 2016

Eran Orgad

Geographical distance is the distance measured along the surface of the earth.

The distances between points are defined by geographical coordinates (Geo Location cords) in terms of latitude and longitude.

Each pair represent a location as:

Lexington MA:

Latitude : 42.4428, Longitude: -71.2317

Mountain View CA:

Latitude: 37.405990600586, Longitude: -122.07851409912

Distance: 2676.09225228497 Miles

To be able to calculate the distance on Hortonworks HDP using Hive (and Tez) let’s generate some sample data that includes some pairs of Geolocation coordinates.

Data file called: distance1.csv

42.28,-71.87,42.28,-71.86

42.00,-71.87,42.28,-71.11

42.28,-71.87,42.28,-71.86

42.28,-71.87,42.28,-71.22

42.00,-71.87,42.28,-72.33

42.28,-71.87,42.28,-70.44

42.00,-71.87,42.28,-71.55

42.28,-71.87,41.28,-71.66

42.00,-71.87,43.28,-71.77

42.28,-71.87,44.28,-71.88

42.00,-71.87,45.28,-71.99

42.28,-71.87,46.11,-71.00

42.00,-71.87,47.22,-71.00

42.4428,-71.2317,37.405990600586,-122.07851409912

….

We can create a schema to be able to read the content in Hive

Here is the table I created:

hive> show create table distancecalc;

OK

CREATE EXTERNAL TABLE distancecalc(

src_lat double,

src_long double,

dest_lat double,

dest_long double)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://sandbox.hortonworks.com:8020/tmp/dist2'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='false',

'numFiles'='1',

'numRows'='-1',

'rawDataSize'='-1',

'totalSize'='338',

'transient_lastDdlTime'='1458866694')

After placing my distance1.csv in the /tmp/dist2/ directory I can query the content in hive:

The following query will produce a calculation of distance with every given pair of geo location coordinates:

select src_lat, src_long, dest_lat, dest_long,

60*1.1515*(180*(acos(((sin(radians(src_lat))*sin(radians(dest_lat))) + (cos(radians(src_lat))*cos(radians(dest_lat))*cos(radians(src_long-dest_long))))))/PI()) as distancecalc

from distancecalc;

screen-shot-2016-03-25-at-122643-pm.png

3753-screenshothive.png

Last entry is the distance between Lexington MA and Mountain View CA.

Java code to calculate the distance:

/*

* Class to calculate geo-location distance

* eorgad - Hortonworks.com 9/30/2014

*/

package geodistance;

import java.sql.SQLException;

public class calcGeoDistance {

public static double distance(double lat1, double lon1, double lat2, double lon2, String string) {

double theta = lon1 - lon2;

double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));

dist = Math.acos(dist);

dist = rad2deg(dist);

dist = dist * 60 * 1.1515;

if (string == "K") {

dist = dist * 1.609344;

} else if (string == "N") {

dist = dist * 0.8684;

}

return (dist);

}

/*###################################################################*/

/*# This function converts decimal degrees to radians #*/

/*###################################################################*/

privatestaticdouble deg2rad(double deg) {

return (deg * Math.PI / 180.0);

}

/*###################################################################*/

/*# This function converts radians to decimal degrees #*/

/*###################################################################*/

privatestaticdouble rad2deg(double rad) {

return (rad * 180 / Math.PI);

}

publicstaticvoid main(String[] args) throws SQLException {

System.out.println(distance(42.4428, -71.2317, 37.405990600586, -122.07851409912, "M") + " Miles\n");

// lexington to mountain view

// 42.4428 -71.2317 37.405990600586 -122.07851409912

}

}

11,079 Views
Comments
avatar

@eorgadn You should wrap the geoDistance functions as hive UDF’s it will be a lot friendlier for most people that will want to use it in hive.