Support Questions

Precision of DoubleVal calculations in udf

sirorezka · 09-30-2016

I am working with lat/lon coordinates and intersections of lines on a sphere. My task is to find GPS routes that intersect some small triangular areas (80 m by 80 m). For simplicity, I implemented a function that computes the intersection point of two lines given by lon/lat coordinates in Euclidean space. I then tested it in pure C++ (inputs as double) and in Impala (changing double to DoubleVal). What I found is that the same inputs give different results in Impala and in C++. After several tests, I concluded that the problem is floating-point precision (very small differences in lat and lon). This causes false-positive intersections between GPS routes and areas. Has anyone encountered the same problem? Would switching to decimal solve it? Why does the difference occur in the first place?
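For illustration, a planar segment intersection of the kind described in the question could look roughly like this. It is a minimal sketch and not the poster's actual code: the `Point` struct and the `segment_intersection` function are hypothetical names, and longitude/latitude are simply treated as x/y, which is only reasonable over short distances.

```cpp
#include <cassert>

// Hypothetical point type: x = longitude, y = latitude, in degrees.
struct Point {
    double x;
    double y;
};

// Writes the intersection of segments p1-p2 and q1-q2 to *out and returns
// true if they intersect; returns false if they are parallel or do not meet.
bool segment_intersection(const Point& p1, const Point& p2,
                          const Point& q1, const Point& q2, Point* out) {
    assert(out != nullptr);
    const double rx = p2.x - p1.x, ry = p2.y - p1.y;  // direction of p
    const double sx = q2.x - q1.x, sy = q2.y - q1.y;  // direction of q
    const double denom = rx * sy - ry * sx;           // cross(r, s)
    if (denom == 0.0) return false;                   // parallel or collinear
    const double dx = q1.x - p1.x, dy = q1.y - p1.y;
    const double t = (dx * sy - dy * sx) / denom;     // position along p
    const double u = (dx * ry - dy * rx) / denom;     // position along q
    if (t < 0.0 || t > 1.0 || u < 0.0 || u > 1.0) return false;
    out->x = p1.x + t * rx;
    out->y = p1.y + t * ry;
    return true;
}
```

The comparisons of `t` and `u` against 0 and 1 are the places where a tiny rounding difference can flip the result, which is one way false-positive intersections can appear near the edge of a small area.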

Tim Armstrong · 10-03-2016

Some examples of the calculations and numbers would be helpful.

We use a C++ double as the underlying type, so the precision is the same. There are a lot of subtleties with floating-point numbers: calculations that are mathematically equivalent over the real numbers can give different results in floating point. E.g., floating-point addition is not associative, so it's not guaranteed that (a + b) + c == (a + c) + b.
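A small sketch of the grouping effect, assuming IEEE 754 doubles and default compiler settings (no `-ffast-math`). The function names are made up for this example:

```cpp
#include <cassert>

// The same three values summed with two different groupings.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// Self-check with values chosen so that the small term is lost in one grouping.
void check_grouping() {
    const double big = 1e20;
    assert(sum_left(big, -big, 1.0) == 1.0);   // (big - big) + 1
    assert(sum_right(big, -big, 1.0) == 0.0);  // 1 is absorbed into -big first
}
```

Both results are correct roundings of their own sequence of operations; they simply round at different points.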

On x86 there's also some additional weirdness where intermediate results of calculations are represented with 80 bits if they're kept in floating-point registers but reduced in precision to 64 bits if they're written to memory: https://en.wikipedia.org/wiki/Extended_precision. At the C++ or SQL levels you have very little control over which precision is used.
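You can't easily choose the precision, but standard C++ does let you inspect what a given compiler and target report. A minimal sketch using `std::numeric_limits` and the `FLT_EVAL_METHOD` macro from `<cfloat>`; the constant names are just for this example, and the values depend on the platform and compiler flags:

```cpp
#include <cfloat>
#include <limits>

// Significand bits of each type on the current platform.
constexpr int kDoubleBits = std::numeric_limits<double>::digits;
constexpr int kLongDoubleBits = std::numeric_limits<long double>::digits;

// FLT_EVAL_METHOD: 0 means expressions are evaluated in the operand type,
// 2 means intermediates use long double, -1 means indeterminable.
constexpr int kEvalMethod = FLT_EVAL_METHOD;
```

If `kEvalMethod` is 2 and `kLongDoubleBits` is 64, intermediates may carry the 80-bit extended format mentioned above.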

Fixed-precision decimal will give you more predictable results if your application isn't tolerant to rounding errors.
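To show the idea in C++ terms, here is a sketch that stores coordinates as scaled 64-bit integers. This is not how Impala's DECIMAL type is implemented; the scale of 1e7 units per degree and the helper names are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed scale: 1e7 units per degree (about 1 cm of latitude per unit).
constexpr std::int64_t kScale = 10000000;

// Converts degrees to scaled integer units, rounding to the nearest unit.
std::int64_t to_fixed(double degrees) {
    assert(std::isfinite(degrees));
    return std::llround(degrees * static_cast<double>(kScale));
}

// Converts scaled integer units back to degrees for display.
double to_degrees(std::int64_t fixed) {
    return static_cast<double>(fixed) / static_cast<double>(kScale);
}
```

Once values are in integer units, addition and subtraction are exact and independent of grouping, as long as they stay within the range of `std::int64_t`. Multiplication and division still need care with scale and overflow.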

sirorezka · 10-13-2016

Thanks, Tim,

It appears that you were right. The cause was the use of the "fmax" function in a comparison inside an "if" statement, which apparently uses floats and not doubles. It seems that by including the line 'if(fmax(a,b)>0)' I got both 'a' and 'b' accidentally converted to floats. This happens inside Impala and not with the g++ compiler. So everything was correct with g++, while in Impala I got a precision error because of the floats.

I decided to implicitly convert all values from 'DoubleVal' to 'long double' and to fully qualify every function from std that I use (like std::max()). That solved my issue.
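The effect of an accidental narrowing to float can be reproduced in plain C++. The sketch below does not reproduce what Impala does internally; it only compares a maximum computed in double with one computed after an explicit cast to float, and the function names are made up. On IEEE 754 platforms float has 24 significand bits, so near 55 degrees adjacent float values are 2^-18 degrees apart (roughly 0.4 m of latitude).

```cpp
#include <cassert>
#include <cmath>

// Maximum computed in double precision.
double max_double(double a, double b) {
    assert(!std::isnan(a) && !std::isnan(b));
    return std::fmax(a, b);
}

// Maximum computed after narrowing both inputs to float, to mimic what
// happens when a float overload or float-only function is selected.
double max_via_float(double a, double b) {
    assert(!std::isnan(a) && !std::isnan(b));
    const float fa = static_cast<float>(a);
    const float fb = static_cast<float>(b);
    return static_cast<double>(std::fmax(fa, fb));
}

// True if the two doubles are still distinct after narrowing to float.
bool distinct_as_float(double a, double b) {
    return static_cast<float>(a) != static_cast<float>(b);
}
```

For example, 55.7558261 and 55.7558262 are different doubles but narrow to the same float, so any comparison made after the narrowing can no longer tell them apart.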

Tim Armstrong · 10-13-2016

Good to hear! Please feel free to mark it as solved to make it easier for others to find.
