Impala: Using a Hive UDF With GROUP BY Gives Wrong Results

I found that Impala returns wrong results when the output of a Hive UDF is used in a GROUP BY clause. The Impala version is 2.7.0-cdh5-IMPALA_KUDU-cdh5 RELEASE. Here are the steps to reproduce the error:
impala> create table test_escape_group_by (s string);
impala> insert into table test_escape_group_by values("longstring"), ("short");
impala> select my_escape_string(s) as es from test_escape_group_by;
longstring
short
impala> select my_escape_string(s) as es from test_escape_group_by group by es;
shorttring
short
As the second query shows, the first five characters of 'longstring' have been overwritten by 'short', producing 'shorttring'. Here is the definition of my_escape_string:
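import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MyEscapeString extends UDF
{
  // Escapes backslashes first, then double quotes; null or empty input yields "".
  public Text evaluate(Text para) {
    if ((null == para) || ("".equals(para.toString()))) {
      return new Text("");
    }
    return new Text(para.toString().replace("\\", "\\\\").replace("\"", "\\\""));
  }
}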
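
The escaping logic can be checked on its own, outside Hive and Impala. The sketch below assumes java.lang.String as a stand-in for Hadoop's Text so it runs with no extra jars; the class name EscapeCheck is hypothetical. It applies the same replace chain as the UDF and shows that 'longstring' and 'short' pass through unchanged, which matches the first (non-grouped) query.

```java
// Hypothetical standalone check of the escaping rule used by MyEscapeString.
// Assumption: String stands in for org.apache.hadoop.io.Text.
public class EscapeCheck {

    // Same rule as the UDF: null or empty -> "", otherwise escape
    // backslashes first (so the backslashes added for quotes are not
    // doubled again), then escape double quotes.
    static String escape(String s) {
        if (s == null || s.isEmpty()) {
            return "";
        }
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String[] inputs = {"longstring", "short", "a\"b", "c\\d"};
        for (String in : inputs) {
            System.out.println(in + " -> " + escape(in));
        }
    }
}
```

Running main prints longstring -> longstring, short -> short, a"b -> a\"b and c\d -> c\\d. Strings without quotes or backslashes come back unchanged, so the corrupted value in the GROUP BY query does not come from the escaping itself.
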
My question: is this a bug in Impala, or how can I rewrite the Java UDF to avoid such errors?
Accepted solution

Hey,

This looks like a bug, and it can be reproduced even on the latest versions of Impala. Thanks for sharing the repro steps with us. I created a JIRA, https://issues.cloudera.org/browse/IMPALA-4266, with a simpler UDF so it's easy to follow. Your UDF implementation looks fine and is likely not the cause of this issue.

- Bharath
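
The 'shorttring' output is exactly what appears when 'short' is written over the first five bytes of a buffer that still holds 'longstring'. The toy model below illustrates one mechanism consistent with that symptom: a grouping key that keeps a reference to a shared result buffer instead of copying it. This is an assumption for illustration only, not Impala's actual code; AliasedKeyDemo, writeResult and view are hypothetical names, and the real root cause is whatever IMPALA-4266 records.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model (hypothetical, not Impala's code): a stored key that aliases
// a shared output buffer gets corrupted by the next result written into it.
public class AliasedKeyDemo {

    // Hypothetical shared buffer that every "UDF call" writes its result into.
    static final byte[] buffer = new byte[16];

    // Copies the value's bytes into the shared buffer and returns the length.
    static int writeResult(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(bytes, 0, buffer, 0, bytes.length);
        return bytes.length;
    }

    // Decodes the first len bytes of buf as a string.
    static String view(byte[] buf, int len) {
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Row 1: the key for "longstring" keeps a reference, not a copy.
        int len1 = writeResult("longstring");
        byte[] aliasedKey = buffer;
        // Row 2: writing "short" overwrites bytes 0..4 of the same array.
        int len2 = writeResult("short");
        System.out.println(view(aliasedKey, len1)); // shorttring
        System.out.println(view(buffer, len2));     // short

        // Copying the bytes before keeping them avoids the corruption.
        int len3 = writeResult("longstring");
        byte[] copiedKey = Arrays.copyOf(buffer, len3);
        writeResult("short");
        System.out.println(view(copiedKey, len3)); // longstring
    }
}
```

In this model the value returned for each row is correct when it is produced; only the consumer that holds on to it without copying sees the overwrite. That is in line with the reply's point that the UDF itself is likely not at fault.
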
