Community Articles
Find and share helpful community-sourced technical articles

## Geo Distance calculations in Hive and Java

Labels (2)

• ### Hortonworks Data Platform (HDP)

Cloudera Employee

# Geo Distance calculations in Hive and Java

March 23, 2016

Geographical distance is the distance measured along the surface of the earth.

The distances between points are defined by geographical coordinates (Geo Location cords) in terms of latitude and longitude.

Each pair represent a location as:

Lexington MA:

Latitude : 42.4428, Longitude: -71.2317

Mountain View CA:

Latitude: 37.405990600586, Longitude: -122.07851409912

Distance: 2676.09225228497 Miles

To be able to calculate the distance on Hortonworks HDP using Hive (and Tez) let’s generate some sample data that includes some pairs of Geolocation coordinates.

Data file called: distance1.csv

42.28,-71.87,42.28,-71.86

42.00,-71.87,42.28,-71.11

42.28,-71.87,42.28,-71.86

42.28,-71.87,42.28,-71.22

42.00,-71.87,42.28,-72.33

42.28,-71.87,42.28,-70.44

42.00,-71.87,42.28,-71.55

42.28,-71.87,41.28,-71.66

42.00,-71.87,43.28,-71.77

42.28,-71.87,44.28,-71.88

42.00,-71.87,45.28,-71.99

42.28,-71.87,46.11,-71.00

42.00,-71.87,47.22,-71.00

42.4428,-71.2317,37.405990600586,-122.07851409912

….

We can create a schema to be able to read the content in Hive

Here is the table I created:

hive> show create table distancecalc;

OK

CREATE EXTERNAL TABLE distancecalc(

src_lat double,

src_long double,

dest_lat double,

dest_long double)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

STORED AS INPUTFORMAT

OUTPUTFORMAT

LOCATION

'hdfs://sandbox.hortonworks.com:8020/tmp/dist2'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='false',

'numFiles'='1',

'numRows'='-1',

'rawDataSize'='-1',

'totalSize'='338',

'transient_lastDdlTime'='1458866694')

After placing my distance1.csv in the /tmp/dist2/ directory I can query the content in hive:

The following query will produce a calculation of distance with every given pair of geo location coordinates:

select src_lat, src_long, dest_lat, dest_long,

from distancecalc;

Last entry is the distance between Lexington MA and Mountain View CA.

Java code to calculate the distance:

/*

* Class to calculate geo-location distance

*/

package geodistance;

import java.sql.SQLException;

public class calcGeoDistance {

public static double distance(double lat1, double lon1, double lat2, double lon2, String string) {

double theta = lon1 - lon2;

dist = Math.acos(dist);

dist = dist * 60 * 1.1515;

if (string == "K") {

dist = dist * 1.609344;

} else if (string == "N") {

dist = dist * 0.8684;

}

return (dist);

}

/*###################################################################*/

/*# This function converts decimal degrees to radians #*/

/*###################################################################*/

return (deg * Math.PI / 180.0);

}

/*###################################################################*/

/*# This function converts radians to decimal degrees #*/

/*###################################################################*/

return (rad * 180 / Math.PI);

}

publicstaticvoid main(String[] args) throws SQLException {

System.out.println(distance(42.4428, -71.2317, 37.405990600586, -122.07851409912, "M") + " Miles\n");

// lexington to mountain view

// 42.4428 -71.2317 37.405990600586 -122.07851409912

}

}

8,905 Views

## Re: Geo Distance calculations in Hive and Java

Guru

@eorgadn You should wrap the geoDistance functions as hive UDF’s it will be a lot friendlier for most people that will want to use it in hive.

Don't have an account?
Announcements
What's New @ Cloudera
What's New @ Cloudera
Product Announcements
What's New @ Cloudera
What's New @ Cloudera
Version history
Last update:
‎08-17-2019 12:57 PM
Updated by:
Contributors
Top Kudoed Authors