Precision of DoubleVal calculations in udf

I am working with lat/lon coordinates and intersections of lines on a sphere. My task is to find GPS routes that intersect some small triangular areas (80m-80m). For simplicity I've implemented a function that computes the intersection point of two lines given by lon/lat coordinates in Euclidean space. I then tested it in pure C++ (inputs as double) and in Impala (changing double to DoubleVal). What I found is that the same inputs give different results in Impala and in C++. After several tests I concluded that the problem is floating-point precision (a very small difference in lat and lon). This causes false-positive intersections between GPS routes and areas. Has anyone encountered the same problem? Will switching to decimal solve it? Why does the difference occur in the first place?
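A function of the kind described in the question might look like the sketch below. It is not the poster's code, which was not shared: the `Point` struct and the `intersect` name are hypothetical, and it treats lon/lat as plain planar x/y, as the question says it does for simplicity.

```cpp
#include <cmath>

// Hypothetical planar point: x = longitude, y = latitude, both in degrees.
struct Point {
    double x;
    double y;
};

// Intersection of the infinite lines p1->p2 and p3->p4.
// Returns false when the lines are parallel (zero determinant);
// otherwise writes the intersection point to *out and returns true.
bool intersect(Point p1, Point p2, Point p3, Point p4, Point* out) {
    const double d = (p1.x - p2.x) * (p3.y - p4.y) - (p1.y - p2.y) * (p3.x - p4.x);
    if (d == 0.0) {
        return false;
    }
    const double a = p1.x * p2.y - p1.y * p2.x;  // cross term of the first line
    const double b = p3.x * p4.y - p3.y * p4.x;  // cross term of the second line
    out->x = (a * (p3.x - p4.x) - (p1.x - p2.x) * b) / d;
    out->y = (a * (p3.y - p4.y) - (p1.y - p2.y) * b) / d;
    return true;
}
```

Every term here is a difference or product of nearly equal coordinates when the lines are short, so small input perturbations can show up noticeably in the result.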

New Contributor

Created 09-30-2016 08:13 AM

1 ACCEPTED SOLUTION

New Contributor

Created on 10-13-2016 08:04 AM - edited 10-13-2016 08:05 AM

Thanks, Tim,

It appears that you were right. The cause was the use of the "fmax" function in a comparison inside an "if" statement, which apparently uses floats and not doubles. It seems that by including the line 'if(fmax(a,b)>0)' I got both 'a' and 'b' accidentally converted to floats. This happens inside Impala and not with the g++ compiler, so everything was correct under g++, while in Impala I got a precision error because of the floats.

I decided to implicitly convert all values from 'DoubleVal' to 'long double' and to fully qualify every function from std that I use (like std::max()). That solved my issue.
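For anyone who hits the same symptom, here is a minimal sketch of what an accidental narrowing to float does to a coordinate. It does not reproduce Impala's behaviour or the poster's UDF; the helper names `through_float`, `narrowing_error_metres`, and `max_double`, and the rough factor of 111,320 metres per degree of latitude, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Round a double through float and back, as an accidental narrowing would.
double through_float(double x) {
    return static_cast<double>(static_cast<float>(x));
}

// Approximate size, in metres, of the error introduced by the narrowing
// for a latitude given in degrees. The conversion factor is an approximation.
double narrowing_error_metres(double degrees) {
    const double kMetresPerDegree = 111320.0;
    return std::fabs(degrees - through_float(degrees)) * kMetresPerDegree;
}

// Maximum of two doubles with the template argument spelled out,
// so that no other overload or conversion can be picked by accident.
double max_double(double a, double b) {
    return std::max<double>(a, b);
}
```

For a latitude such as 55.7558261, the round trip through float shifts the value by a fraction of a metre. That is small next to an 80 m area, but formulas that subtract nearly equal coordinates can amplify such a shift.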

3 REPLIES

Re: Precision of DoubleVal calculations in udf

Master Collaborator

Created 10-03-2016 02:25 PM

Some examples of the calculations and numbers would be helpful.

We use a C++ double as the underlying type, so it has the same precision. There are a lot of subtleties with floating-point numbers: calculations that are mathematically equivalent with real numbers can give different results with floating-point numbers. E.g., floating-point arithmetic is not associative, so it's not guaranteed that a + b + c == a + c + b.
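The ordering effect can be seen with a small sketch. The function names `sum_abc` and `sum_acb` are made up for this example, and it assumes IEEE 754 doubles with default compiler settings (no `-ffast-math` style reassociation).

```cpp
#include <cstdio>

// Evaluate a + b + c strictly left to right.
double sum_abc(double a, double b, double c) {
    return (a + b) + c;
}

// Evaluate a + c + b strictly left to right.
double sum_acb(double a, double b, double c) {
    return (a + c) + b;
}

// Print both orderings for one set of inputs.
void print_both(double a, double b, double c) {
    std::printf("a+b+c = %.17g, a+c+b = %.17g\n", sum_abc(a, b, c), sum_acb(a, b, c));
}
```

With a = 1e20, b = -1e20, c = 1.0, the first ordering returns 1 and the second returns 0, because 1e20 + 1.0 rounds back to 1e20 before b is added.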

On x86 there's also some additional weirdness where intermediate results of calculations are represented with 80 bits if they're kept in floating-point registers but reduced in precision to 64 bits if they're written to memory: https://en.wikipedia.org/wiki/Extended_precision. At the C++ or SQL levels you have very little control over which precision is used.

Fixed-precision decimal will give you more predictable results if your application isn't tolerant to rounding errors.
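To illustrate the idea behind fixed precision, here is a sketch that uses scaled 64-bit integers in C++ rather than Impala's DECIMAL type, so it is a stand-in technique and not the Impala API. The scale of one million units per degree (roughly 0.1 m of latitude per unit) and the names `to_fixed` and `cross` are assumptions chosen for this example.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scale: 1,000,000 units per degree (micro-degrees).
constexpr std::int64_t kUnitsPerDegree = 1000000;

// Convert degrees to fixed-point units, rounding to the nearest unit.
std::int64_t to_fixed(double degrees) {
    return std::llround(degrees * static_cast<double>(kUnitsPerDegree));
}

// 2D cross product of (b - a) and (c - a), computed in exact integer arithmetic.
// Positive: c is to the left of the directed line a->b; negative: to the right;
// zero: c lies on the line. At this scale the products stay well inside int64
// for any pair of lon/lat values.
std::int64_t cross(std::int64_t ax, std::int64_t ay,
                   std::int64_t bx, std::int64_t by,
                   std::int64_t cx, std::int64_t cy) {
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}
```

Once the inputs are rounded to integers, the side-of-line test gives the same answer on every platform and in every evaluation order, at the cost of the resolution fixed by the chosen scale.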

Re: Precision of DoubleVal calculations in udf

Master Collaborator

Created 10-13-2016 03:20 PM

Good to hear! Please feel free to mark it as solved to make it easier for others to find.
