# Precision of DoubleVal calculations in udf

09-30-2016 08:13 AM

I am working with lat/lon coordinates and intersections of lines on a sphere. My task is to find GPS routes that intersect some small triangular areas (80 m x 80 m). For simplicity, I implemented a function that returns the intersection point of two lines given by lon/lat coordinates, treating them as points in Euclidean space. I then tested it in pure C++ (inputs as double) and in Impala (changing double to DoubleVal). I found that the same inputs give different results in Impala and in C++. After several tests, I concluded that the problem is floating-point precision (very small differences in lat and lon). This causes false-positive intersections between GPS routes and areas. Has anyone encountered the same problem? Will changing to decimal solve it? Why does the difference occur in the first place?
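
For readers who want something concrete to experiment with, here is a minimal sketch of the kind of planar line intersection described above. It is not the poster's code: the `Point` struct and the `IntersectLines` name are hypothetical, and it treats lon/lat as plain x/y coordinates, as the post says.

```cpp
// Hypothetical 2D point: x = longitude, y = latitude, both in degrees.
struct Point {
  double x;
  double y;
};

// Intersects the infinite line through p1 and p2 with the infinite line
// through p3 and p4. Returns false when the lines are parallel (zero
// determinant); otherwise stores the intersection in *out and returns true.
bool IntersectLines(Point p1, Point p2, Point p3, Point p4, Point* out) {
  const double d =
      (p1.x - p2.x) * (p3.y - p4.y) - (p1.y - p2.y) * (p3.x - p4.x);
  if (d == 0.0) return false;
  const double a = p1.x * p2.y - p1.y * p2.x;  // cross product of p1, p2
  const double b = p3.x * p4.y - p3.y * p4.x;  // cross product of p3, p4
  out->x = (a * (p3.x - p4.x) - (p1.x - p2.x) * b) / d;
  out->y = (a * (p3.y - p4.y) - (p1.y - p2.y) * b) / d;
  return true;
}
```

The subtractions and the division by a possibly tiny determinant are where small input differences get amplified, which is why the result is sensitive to whether the inputs are float or double.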

10-03-2016 02:25 PM

Some examples of the calculations and numbers would be helpful.

We use a C++ double as the underlying type, so DoubleVal has the same precision. There are a lot of subtleties with floating-point numbers: calculations that are mathematically equivalent with real numbers can give different results with floating-point numbers. For example, floating-point arithmetic is not associative, so it's not guaranteed that a + b + c == a + c + b.
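
A small sketch of that non-associativity, assuming IEEE 754 doubles evaluated at double precision (the usual case on x86-64). The function names and the input values are illustrative only.

```cpp
// Adds in the order (a + b) + c.
double SumAbc(double a, double b, double c) { return (a + b) + c; }

// Adds in the order (a + c) + b. Mathematically the same sum as SumAbc,
// but the intermediate result is rounded at a different point.
double SumAcb(double a, double b, double c) { return (a + c) + b; }
```

With a = 1e16, b = -1e16 and c = 1.0, `SumAbc` cancels the large terms first and keeps the 1.0, while `SumAcb` first computes 1e16 + 1.0, which rounds back to 1e16 because adjacent doubles at that magnitude are 2 apart, so the 1.0 is lost.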

On x86 there's also some additional weirdness where intermediate results of calculations are represented with 80 bits if they're kept in floating-point registers but reduced in precision to 64 bits if they're written to memory: https://en.wikipedia.org/wiki/Extended_precision. At the C++ or SQL levels you have very little control over which precision is used.
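
Whether extended precision is in play depends on the target and the compiler flags. As a rough way to check a given build, the standard headers expose the significand widths and the `FLT_EVAL_METHOD` macro; the helper names below are hypothetical.

```cpp
#include <cfloat>
#include <limits>

// Significand bits of double (53 for IEEE 754 binary64).
constexpr int kDoubleBits = std::numeric_limits<double>::digits;

// Significand bits of long double (64 for the x87 80-bit format; some
// platforms make long double the same as double).
constexpr int kLongDoubleBits = std::numeric_limits<long double>::digits;

// FLT_EVAL_METHOD: 0 means intermediates use the operand type, 1 means
// at least double, 2 means long double, -1 means indeterminable.
int EvalMethod() { return FLT_EVAL_METHOD; }
```

A value of 2 from `EvalMethod()` suggests that intermediates may be carried at long double precision, which is the situation described above.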

Fixed-precision decimal will give you more predictable results if your application isn't tolerant to rounding errors.
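
DECIMAL is a SQL type; inside C++ code a comparable effect can be sketched with scaled integers. The scale of 1e7 units per degree (about 1 cm of latitude) and the names below are assumptions for illustration, not anything Impala provides.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scale: 10^7 integer units per degree.
constexpr std::int64_t kUnitsPerDegree = 10000000;

// Converts degrees to a scaled integer, rounding to the nearest unit.
// All rounding happens once, here; later integer additions are exact.
std::int64_t ToFixed(double degrees) {
  return std::llround(degrees * static_cast<double>(kUnitsPerDegree));
}
```

Once the coordinates are integers, sums and differences no longer depend on evaluation order, as long as they stay within the int64 range. Multiplication and division still need care about scale and overflow.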

10-13-2016 08:04 AM - edited 10-13-2016 08:05 AM

Thanks, Tim,

It appears that you were right. The cause was the "fmax" function, which I used in a comparison inside an "if" statement and which apparently uses floats and not doubles. It seems that by including the line 'if(fmax(a,b)>0)' I got both 'a' and 'b' accidentally converted to floats. This is something done inside Impala and not by the g++ compiler. So everything was correct with g++, while in Impala I got a precision error because of the floats.
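
To get a feel for how much a trip through float costs at coordinate magnitudes, here is a minimal sketch; `FloatRoundTripError` is a hypothetical helper. Note that in standard C and C++ `fmax` is declared for double and `fmaxf` is the float variant, so the narrowing described above may depend on how the UDF was compiled. The sketch only measures the effect of such a narrowing, whatever its cause.

```cpp
#include <cmath>

// Returns the absolute error introduced by converting x to float and back.
double FloatRoundTripError(double x) {
  const float narrowed = static_cast<float>(x);  // keeps 24 significand bits
  return std::fabs(static_cast<double>(narrowed) - x);
}
```

For values between 32 and 64, adjacent floats are 2^-18 (about 3.8e-6) apart. In degrees of latitude that is roughly 0.4 m, which is enough to flip an intersection test near the edge of an 80 m area.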

I decided to explicitly convert all values from 'DoubleVal' to 'long double' and to fully qualify every function from std that I use (like std::max()). That solved my issues.
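
A sketch of what that fix might look like. `DoubleValStub` is a stand-in defined here so the example compiles without the Impala headers; it assumes the real DoubleVal exposes `is_null` and `val` fields. `MaxIsPositive` is a hypothetical helper, not code from the thread.

```cpp
#include <algorithm>

// Stand-in for Impala's DoubleVal (assumed fields: is_null, val).
struct DoubleValStub {
  bool is_null;
  double val;
};

// Returns true when the larger of a and b is positive. NULL inputs are
// treated as "not positive" here, which is an arbitrary choice.
bool MaxIsPositive(const DoubleValStub& a, const DoubleValStub& b) {
  if (a.is_null || b.is_null) return false;
  // Widen explicitly so no overload can silently narrow the operands.
  const long double x = a.val;
  const long double y = b.val;
  // Fully qualified std::max on long double; both arguments share one type.
  return std::max(x, y) > 0.0L;
}
```

Because both arguments have the same explicit type, template deduction for `std::max` is unambiguous and the comparison is done at long double precision.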

10-13-2016 03:20 PM

Good to hear! Please feel free to mark it as solved to make it easier for others to find.
