Community Articles
Find and share helpful community-sourced technical articles
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.
Cloudera Employee

Geo Distance calculations in Hive and Java

March 23, 2016

Eran Orgad

Geographical distance is the distance measured along the surface of the earth.

The distances between points are defined by geographical coordinates (Geo Location cords) in terms of latitude and longitude.

Each pair represent a location as:

Lexington MA:

Latitude : 42.4428, Longitude: -71.2317

Mountain View CA:

Latitude: 37.405990600586, Longitude: -122.07851409912

Distance: 2676.09225228497 Miles

To be able to calculate the distance on Hortonworks HDP using Hive (and Tez) let’s generate some sample data that includes some pairs of Geolocation coordinates.

Data file called: distance1.csv

42.28,-71.87,42.28,-71.86

42.00,-71.87,42.28,-71.11

42.28,-71.87,42.28,-71.86

42.28,-71.87,42.28,-71.22

42.00,-71.87,42.28,-72.33

42.28,-71.87,42.28,-70.44

42.00,-71.87,42.28,-71.55

42.28,-71.87,41.28,-71.66

42.00,-71.87,43.28,-71.77

42.28,-71.87,44.28,-71.88

42.00,-71.87,45.28,-71.99

42.28,-71.87,46.11,-71.00

42.00,-71.87,47.22,-71.00

42.4428,-71.2317,37.405990600586,-122.07851409912

….

We can create a schema to be able to read the content in Hive

Here is the table I created:

hive> show create table distancecalc;

OK

CREATE EXTERNAL TABLE distancecalc(

src_lat double,

src_long double,

dest_lat double,

dest_long double)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://sandbox.hortonworks.com:8020/tmp/dist2'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='false',

'numFiles'='1',

'numRows'='-1',

'rawDataSize'='-1',

'totalSize'='338',

'transient_lastDdlTime'='1458866694')

After placing my distance1.csv in the /tmp/dist2/ directory I can query the content in hive:

The following query will produce a calculation of distance with every given pair of geo location coordinates:

select src_lat, src_long, dest_lat, dest_long,

60*1.1515*(180*(acos(((sin(radians(src_lat))*sin(radians(dest_lat))) + (cos(radians(src_lat))*cos(radians(dest_lat))*cos(radians(src_long-dest_long))))))/PI()) as distancecalc

from distancecalc;

screen-shot-2016-03-25-at-122643-pm.png

3753-screenshothive.png

Last entry is the distance between Lexington MA and Mountain View CA.

Java code to calculate the distance:

/*

* Class to calculate geo-location distance

* eorgad - Hortonworks.com 9/30/2014

*/

package geodistance;

import java.sql.SQLException;

public class calcGeoDistance {

public static double distance(double lat1, double lon1, double lat2, double lon2, String string) {

double theta = lon1 - lon2;

double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));

dist = Math.acos(dist);

dist = rad2deg(dist);

dist = dist * 60 * 1.1515;

if (string == "K") {

dist = dist * 1.609344;

} else if (string == "N") {

dist = dist * 0.8684;

}

return (dist);

}

/*###################################################################*/

/*# This function converts decimal degrees to radians #*/

/*###################################################################*/

privatestaticdouble deg2rad(double deg) {

return (deg * Math.PI / 180.0);

}

/*###################################################################*/

/*# This function converts radians to decimal degrees #*/

/*###################################################################*/

privatestaticdouble rad2deg(double rad) {

return (rad * 180 / Math.PI);

}

publicstaticvoid main(String[] args) throws SQLException {

System.out.println(distance(42.4428, -71.2317, 37.405990600586, -122.07851409912, "M") + " Miles\n");

// lexington to mountain view

// 42.4428 -71.2317 37.405990600586 -122.07851409912

}

}

6,715 Views
Comments

@eorgadn You should wrap the geoDistance functions as hive UDF’s it will be a lot friendlier for most people that will want to use it in hive.

Don't have an account?
Coming from Hortonworks? Activate your account here
Version history
Revision #:
2 of 2
Last update:
‎08-17-2019 12:57 PM
Updated by:
 
Contributors
Top Kudoed Authors