New Contributor
Posts: 1
Registered: ‎10-16-2015

Mapred Context not available in Generic UDF while using join in Sentry Enabled setup.

Hello,

I have created a generic UDF that applies a custom, pre-defined operation to the rows of a table.

The problem we are facing is that we cannot retrieve the MapredContext during the configure() stage inside the generic UDF, but only when the UDF is used in a join query.

We use the MapredContext to read the username from the job configuration.

 

public void configure(MapredContext context) {
    Configuration c = context.getJobConf();
    ...
}

 

 

Generic UDFs that were created:

MyUDF1(<data1>, <data2>);

MyUDF2(<data1>, <data2>);

 

Regular queries, where MapredContext is available:

SELECT MyUDF1(name, 'Data102') from emp3;

SELECT MyUDF2(name, 'Data101') from emp1;

 

Join query, where MapredContext is not available:

SELECT a.* FROM emp1 a JOIN emp2 b ON (a.id=b.id) WHERE MyUDF2(b.id,'Data101')='928725';

 

 

The setup is: CDH 5.3 sandbox (also tried on a cluster), Sentry enabled, impersonation off, using Beeline.

(P.S.: The same query works in the Hive shell.)

 

The error does not occur under the same conditions, with the same query, when this property is set:

hive.auto.convert.join = false;

 

Thus, the suspected reason could be related to Map Join in this specific setup.

Posts: 1,903
Kudos: 435
Solutions: 307
Registered: ‎07-31-2013

Re: Mapred Context not available in Generic UDF while using join in Sentry Enabled setup.

> Thus, the suspected reason could be related to Map Join in this specific setup.

Yes, that is indeed so.

Per the API documentation of GenericUDFs, this is expected behaviour: https://hive.apache.org/javadocs/r0.12.0/api/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.html#c...

"""
> public void configure(MapredContext context)

Additionally setup GenericUDF with MapredContext before initializing. This is only called in runtime of MapRedTask.
"""

Since map-join optimisations run as LocalTasks instead of MapRedTasks, Hive does not call configure() for them. This explains why it works when you disable the join conversion via hive.auto.convert.join=false.
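
As a rough sketch of that workaround in a Beeline session, the property can be switched off for the session before running the join from the question. The table and function names are the ones from the original post. The EXPLAIN step is an optional check added here for illustration, not something that was run in this thread.

```sql
-- Disable automatic map-join conversion for this session only.
SET hive.auto.convert.join=false;

-- Optional check (an assumption of this sketch): inspect the plan and confirm
-- that the join is no longer planned as a map join with local work.
EXPLAIN
SELECT a.* FROM emp1 a JOIN emp2 b ON (a.id = b.id)
WHERE MyUDF2(b.id, 'Data101') = '928725';

-- With the conversion off, the join is expected to run as a regular
-- MapRedTask, so configure(MapredContext) should be invoked on the UDF.
SELECT a.* FROM emp1 a JOIN emp2 b ON (a.id = b.id)
WHERE MyUDF2(b.id, 'Data101') = '928725';
```

Note that this gives up the map-join optimisation for the session, so the join may run slower than before on tables where the conversion would otherwise have applied.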