
Precision of DoubleVal calculations in udf

New Contributor

I am working with lat/lon coordinates and intersections of lines on a sphere. My task is to find GPS routes that intersect some small triangular areas (80 m by 80 m). For simplicity, I implemented a function that computes the intersection point of two lines given by lon/lat coordinates in Euclidean space. I then tested it in pure C++ (inputs as double) and in Impala (changing double to DoubleVal). I found that the same inputs give different results in Impala and in C++. After several tests, I concluded that the problem is floating-point precision (very small differences in lat and lon). This causes false-positive intersections between GPS routes and areas. Has anyone encountered the same problem? Would switching to decimal solve it? Why does the difference occur in the first place?

Re: Precision of DoubleVal calculations in udf

Master Collaborator

Some examples of the calculations and numbers would be helpful.

We use a C++ double as the underlying type, so the precision is the same. There are a lot of subtleties with floating-point numbers: calculations that are mathematically equivalent with real numbers can give different results in floating point. E.g., floating-point arithmetic is not associative, so it's not guaranteed that a + b + c == a + c + b.
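
As a small illustration of the ordering effect described above, here is a minimal C++ sketch. The function names sum_abc and sum_acb are made up for this example, and the behaviour noted in the comments assumes IEEE 754 doubles evaluated in 64-bit precision (for example a typical x86-64 build without fast-math options).

```cpp
// Adds the three values in the order a + b + c, i.e. (a + b) + c.
double sum_abc(double a, double b, double c) {
    return (a + b) + c;
}

// Adds the same three values in the order a + c + b, i.e. (a + c) + b.
// With a = 1e16, b = -1e16, c = 1.0 the intermediate a + c is rounded
// back to 1e16, because doubles near 1e16 are spaced 2 apart, so the
// contribution of c is lost before b is added.
double sum_acb(double a, double b, double c) {
    return (a + c) + b;
}
```

Calling both functions with (1e16, -1e16, 1.0) should give 1.0 for the first ordering and 0.0 for the second under the assumptions above, even though the two expressions are equal over the real numbers.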

On x86 there's also some additional weirdness where intermediate results of calculations are represented with 80 bits if they're kept in floating-point registers but reduced in precision to 64 bits if they're written to memory: https://en.wikipedia.org/wiki/Extended_precision. At the C++ or SQL levels you have very little control over which precision is used.

Fixed-precision decimal will give you more predictable results if your application isn't tolerant of rounding errors.
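
To make the difference concrete, here is a rough C++ sketch that mimics a fixed-precision decimal with scaled 64-bit integers. This is only an illustration of the idea, not how Impala's DECIMAL type is implemented; the names ScaledDegrees, kScale, add_scaled and doubles_sum_matches are invented for the example.

```cpp
#include <cstdint>

// Hypothetical fixed-point representation: a value in degrees stored as
// an integer number of 1e-7 degree steps.
using ScaledDegrees = std::int64_t;
constexpr std::int64_t kScale = 10000000;  // 10^7 steps per degree

// Adds two scaled values. This is plain integer addition, so no rounding
// takes place (as long as the result does not overflow int64).
ScaledDegrees add_scaled(ScaledDegrees a, ScaledDegrees b) {
    return a + b;
}

// Checks whether a + b compares equal to `expected` in double arithmetic.
// For decimal fractions such as 0.1, 0.2 and 0.3 this can be false,
// because none of them is exactly representable in binary.
bool doubles_sum_matches(double a, double b, double expected) {
    return a + b == expected;
}
```

In this sketch 0.1, 0.2 and 0.3 degrees are stored as 1000000, 2000000 and 3000000, and the integer sum matches exactly, whereas the double comparison for the same three values fails.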

Re: Precision of DoubleVal calculations in udf

New Contributor

Thanks, Tim,

It appears that you were right. The cause was the use of the "fmax" function in a comparison inside an "if" statement, which apparently uses floats and not doubles. It seems that by including the line 'if(fmax(a,b)>0)' I got both 'a' and 'b' accidentally converted to floats. This seems to happen inside Impala and not with the g++ compiler. So everything was correct with g++, while in Impala I got a precision error because of floats.
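
For anyone wondering how much a float round trip can matter for coordinates, here is a minimal C++ sketch. The helper names are made up for this example, and the meters-per-degree figure is the usual rough approximation for latitude, not an exact value.

```cpp
#include <cmath>

// Narrows a coordinate in degrees to float and back, and returns the
// absolute error that the round trip introduces (in degrees).
double float_roundtrip_error(double degrees) {
    const float narrowed = static_cast<float>(degrees);
    return std::fabs(degrees - static_cast<double>(narrowed));
}

// Converts a latitude difference in degrees to meters, assuming roughly
// 111,320 m per degree of latitude (an approximation).
double latitude_degrees_to_meters(double degrees) {
    return degrees * 111320.0;
}
```

A float carries about 7 significant decimal digits, so for a coordinate such as 55.7558263 the round-trip error is on the order of a millionth of a degree. That is small compared with an 80 m area, but it is enough to make a double-based and a float-based run disagree in the low-order digits and flip a comparison that sits close to a boundary.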

I decided to implicitly convert all values from 'DoubleVal' to 'long double' and to fully qualify every function from std that I use (like std::max()). That solved my issues.
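
Roughly, the change looks like the sketch below. It is not my actual UDF: DoubleValLike is a stand-in struct I wrote so the snippet compiles without the Impala headers (the real code takes Impala's DoubleVal), and max_is_positive is a made-up name for the check that used to be 'if(fmax(a,b)>0)'.

```cpp
#include <algorithm>

// Hypothetical stand-in for Impala's DoubleVal so this compiles on its
// own; it only models a null flag and a double payload.
struct DoubleValLike {
    bool is_null;
    double val;
};

// Returns true when the larger of the two inputs is positive.
// The values are widened to long double on assignment, and std::max is
// fully qualified so the long double overload is the one that is called.
bool max_is_positive(const DoubleValLike& a, const DoubleValLike& b) {
    if (a.is_null || b.is_null) {
        return false;  // treat NULL inputs as "no match" in this sketch
    }
    const long double x = a.val;
    const long double y = b.val;
    return std::max(x, y) > 0.0L;
}
```

With std::max both arguments must have the same type, so an accidental mix of float and double does not compile silently; that is part of why I switched to the qualified std functions.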

Re: Precision of DoubleVal calculations in udf

Master Collaborator

Good to hear! Please feel free to mark it as solved to make it easier for others to find.