Reply
Explorer
Posts: 8
Registered: ‎09-28-2016
Accepted Solution

Impala round function does not return expected result

Hi,

 

I'm using impala-2.7.0-cdh5.10.0, and I find execute round() in impala-shell may give unexpected result, while hive can give correct result.

 

e.g.

 

impala> select round(269586/334026 * 100, 2);
80.70999999999999

hive > select round(269586/334026 * 100, 2);
80.71

Is this a known issue?

Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: Impala round function does not return expected result

Looks like a client display issue. Both Impala and Hive return a DOUBLE for round(). A DOUBLE cannot precisely represent 80.71 so I'm guessing that technically the results are the same, it's just that the Hive client "pretty prints" the results in a different way than Impala's.

 

If you require precise rouding results it's recommended you use decimal:

 

select round(cast(269586 / 334026 * 100 as decimal(17, 15)), 2);

80.71

New Contributor
Posts: 2
Registered: ‎09-21-2018

Re: Impala round function does not return expected result

Hi Alex,

 

it works using the solution you provided, however if i use variable/column to indicate the number of digits after decimal point then it throws an error, which is :

 

"AnalysisException: round() must be called with a constant second argument."

 

e.g Below query works perfectly fine

 

select round(cast(test1 as decimal(17,15)),2) as test3
from (

select round(269586/334026 * 100,2) as test1,
1 as test2
)A

 

 

However this query does not work:

 

select round(cast(test1 as decimal(17,15)),test2) as test3
from (

select round(269586/334026 * 100,2) as test1,
1 as test2
)A

 

the problem lies with the decimal typecast, if i remove it then issue will disappaear however the original issue will appear.

 

Please help, i have been struggling for quite some time with this issue. for my requirement i need to use column name to indicate numbers of digits to round, because the same field(column) may have numbers rounded off to various decimal digits.

 

Thanks in advance.

 

 

Regards

Nitin

Cloudera Employee
Posts: 357
Registered: ‎07-29-2015

Re: Impala round function does not return expected result

Hi @nitinagr. You pretty much have two options here right now: * Use a constant second argument * Cast the first input to DOUBLE (which means you may lose some precision if the values can't be exactly represented in a DOUBLE type). I think supporting this seems like a valid ask so I went ahead and filed a JIRA: https://issues.apache.org/jira/browse/IMPALA-7613 . I remember a while back debating with someone about whether users were likely to do data-driven rounding, so I think you just settled that argument!
New Contributor
Posts: 2
Registered: ‎09-21-2018

Re: Impala round function does not return expected result

Thank you Tim.

 

Double is not working in my case, currently i am using case statement to get the desired result (i need maximum 2 digits after decimal, so i have only 3 scenarios in case statement).

 

Looking forward to the fix.

 

 

Thanks,

Nitin

New Contributor
Posts: 1
Registered: ‎11-03-2018

Re: Impala round function does not return expected result

Hi Nitin

 

I was working on similar problem of dynamic rounding and found a workaround. In my case I do average price statistics and I want to round average prices to 3 significant digits, in order to group similar observations. Now every price has different number of digits, so e.g. I must round 1.495 to 1.50, but also 1099 should be rounded to 1100.

 

Following expression did the trick for me:

 

SELECT ROUND(`price` / POWER(10.0, (FLOOR(LOG10(`price`)) - 2)), 0) * POWER(10.0, (FLOOR(LOG10(`price`)) - 2)) AS `price_rounded`

The -2 constant is related to my specific problem, i.e. 3 significant digits, so that could be adjusted or omitted in other use case.

Conceptually what I do is that I reflect number of digits to round on by transforming the first argument, then I can always round to zero digits no matter what, and finally transform data back.

 

May work better than CASE WHEN in situation when you don't know dimensions of your data upfront.

 

BR,

Mirek

 

Announcements