Impala Use Hive UDF With Group By Gives Wrong Result.

I find that Impala gives a wrong result when the output of a Hive UDF is used in a GROUP BY clause. The Impala version is 2.7.0-cdh5-IMPALA_KUDU-cdh5 RELEASE. Here are the steps to reproduce the error:
impala> create table test_escape_group_by (s string);
impala> insert into table test_escape_group_by values("longstring"), ("short");
impala> select my_escape_string(s) as es from test_escape_group_by;
longstring
short
impala> select my_escape_string(s) as es from test_escape_group_by group by es;
shorttring
short
We can see that the beginning of 'longstring' has been overwritten by 'short'. Here is the definition of my_escape_string:
public class MyEscapeString extends UDF
{
  public Text evaluate(Text para) throws ParseException {
    if ((null == para) || ("".equals(para.toString()))) {
      return new Text("");
    }
    return new Text(para.toString().replace("\\", "\\\\").replace("\"", "\\\""));
  }
}
My question: is this a bug in Impala, or how can I rewrite the Java UDF to avoid such errors?
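For readers trying to interpret the output above: the value 'shorttring' is what you would get if the 5 bytes of 'short' were copied over the start of a buffer that still held the 10 bytes of 'longstring'. The sketch below only reproduces that pattern in plain Java. The class and method names are hypothetical, and it is an illustration of the symptom, not a statement about how Impala handles UDF results internally.

```java
import java.nio.charset.StandardCharsets;

public class BufferReuseIllustration {
    // Hypothetical helper: writes newValue over the start of a buffer that
    // holds oldValue, without clearing or truncating the remaining bytes.
    static String overwritePrefix(String oldValue, String newValue) {
        byte[] buffer = oldValue.getBytes(StandardCharsets.UTF_8);
        byte[] incoming = newValue.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(incoming.length, buffer.length);
        System.arraycopy(incoming, 0, buffer, 0, n);
        // The old length is still used, so the stale tail remains visible.
        return new String(buffer, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints "shorttring", matching the corrupted GROUP BY row
        System.out.println(overwritePrefix("longstring", "short"));
    }
}
```

If this pattern is indeed what is happening, the corruption would come from how the returned value is stored after the UDF call rather than from the escaping logic itself.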
1 ACCEPTED SOLUTION

Accepted Solutions

Re: Impala Use Hive UDF With Group By Gives Wrong Result.

Cloudera Employee

Hey,

This looks like a bug, and it can be reproduced even on the latest versions of Impala. Thanks for sharing the repro steps with us. I created a JIRA, https://issues.cloudera.org/browse/IMPALA-4266, with a simpler UDF so it's easy to follow. Your UDF implementation looks fine and is likely not causing this issue.

- Bharath
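
As a quick sanity check of the point that the UDF logic itself is fine, the escaping can be exercised outside Impala. This is a minimal sketch that assumes a plain String stand-in for the Hive Text type so it runs without Hive or Hadoop on the classpath. The class name EscapeLogicCheck is hypothetical, and the replace calls are copied from the UDF in the question.

```java
public class EscapeLogicCheck {
    // Same escaping as MyEscapeString.evaluate in the question, but on
    // String instead of org.apache.hadoop.io.Text (assumption for portability).
    static String escape(String para) {
        if (para == null || para.isEmpty()) {
            return "";
        }
        // Escape backslashes first, then double quotes.
        return para.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Neither repro value contains a backslash or a quote,
        // so both come back unchanged.
        System.out.println(escape("longstring")); // prints "longstring"
        System.out.println(escape("short"));      // prints "short"
    }
}
```

Each call depends only on its own argument, so the escaping step on its own cannot mix 'short' into 'longstring'.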