Impala Gives Wrong Result When a Hive UDF Is Used With GROUP BY

I find that Impala gives a wrong answer when the result of a Hive UDF is used in a GROUP BY clause. The Impala version is 2.7.0-cdh5-IMPALA_KUDU-cdh5 RELEASE. Here are the steps to reproduce the error:
impala> create table test_escape_group_by (s string);
impala> insert into table test_escape_group_by values("longstring"), ("short");
impala> select my_escape_string(s) as es from test_escape_group_by;
longstring
short
impala> select my_escape_string(s) as es from test_escape_group_by group by es;
shorttring
short
We can see that the beginning part of 'longstring' is replaced by 'short'. Here is the definition of my_escape_string:
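import java.text.ParseException; // assumed; the post does not show which ParseException is imported
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MyEscapeString extends UDF {
  // Escapes backslashes and double quotes; null or empty input yields "".
  public Text evaluate(Text para) throws ParseException {
    if ((null == para) || ("".equals(para.toString()))) {
      return new Text("");
    }
    return new Text(para.toString().replace("\\", "\\\\").replace("\"", "\\\""));
  }
}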
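As a sanity check of the escaping logic on its own, outside Impala and Hive, here is a minimal sketch that applies the same replace chain to a plain String instead of Hadoop's Text. The class name EscapeCheck and the third test input are hypothetical, added only for illustration.

```java
public class EscapeCheck {
    // Same replace chain as the UDF above, on String instead of Text
    // (an assumption made so the sketch runs without Hadoop on the classpath).
    static String escape(String para) {
        if (para == null || para.isEmpty()) {
            return "";
        }
        // Escape backslashes first, then double quotes.
        return para.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(escape("longstring")); // prints longstring
        System.out.println(escape("short"));      // prints short
        System.out.println(escape("a\"b\\c"));    // prints a\"b\\c
    }
}
```

Neither "longstring" nor "short" contains a backslash or a double quote, so the function returns both unchanged. The truncated value in the GROUP BY result therefore does not come from the escaping itself.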
My question: is this a bug in Impala, or how can I rewrite the Java UDF to avoid such errors?
1 ACCEPTED SOLUTION


Hey,

This looks like a bug, and it can be reproduced even on the latest versions of Impala. Thanks for sharing the repro steps with us. I created a JIRA, https://issues.cloudera.org/browse/IMPALA-4266, with a simpler UDF so it's easy to follow. Your UDF implementation looks fine and is likely not causing this issue.

- Bharath
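The 'shorttring' value in the question has the shape you would get if a 10-byte result were read back from a buffer whose first 5 bytes had since been overwritten by a later result. The thread does not state the root cause, so the following is only a sketch of that general failure mode, not a description of Impala's internals. The class SharedBufferDemo and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SharedBufferDemo {
    // Hypothetical reusable output buffer shared by every call.
    static final byte[] buffer = new byte[64];

    // Writes s at the start of the shared buffer and returns its byte length.
    static int write(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(bytes, 0, buffer, 0, bytes.length);
        return bytes.length;
    }

    // Reads the first `length` bytes of the shared buffer as a string.
    static String read(int length) {
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int firstLen = write("longstring"); // caller remembers only the length (10)
        int secondLen = write("short");     // overwrites just the first 5 bytes
        System.out.println(read(firstLen));  // prints shorttring
        System.out.println(read(secondLen)); // prints short
    }
}
```

A consumer that keeps only a reference into such a buffer, rather than copying the bytes, sees the stale tail of the earlier value, which matches the output in the repro.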

 

 
