Hive reloadable udf: random 'Unable to find class' error

Contributor

Hi! Some time ago we decided to move a few of our UDF packages to the reloadable auxlib directory so that we could update some functions without restarting the Hive servers. Since then, however, we have been seeing random errors like the one in the following example:

2019-04-04 22:33:18,103 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Handler-Pool: Thread-117]: Completed compiling command(queryId=hive_20190404223333_ed9b3085-fc91-42b1-9ca4-5224cd838aec); Time taken: 0.481 seconds
2019-04-04 22:33:18,103 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Handler-Pool: Thread-117]: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2019-04-04 22:33:18,103 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Handler-Pool: Thread-117]: </PERFLOG method=releaseLocks start=1554409998103 end=1554409998103 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2019-04-04 22:33:18,105 INFO org.apache.hive.service.cli.operation.OperationManager: [HiveServer2-Handler-Pool: Thread-117]: Closing operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=38b7d0d3-d925-48b1-bdb7-b7c3334cc7d8]
2019-04-04 22:33:18,109 WARN org.apache.hive.service.cli.thrift.ThriftCLIService: [HiveServer2-Handler-Pool: Thread-117]: Error executing statement:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Generate Map Join Task Error: Unable to find class: XXXXX
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.JoinOperator)
reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
reduceWork (org.apache.hadoop.hive.ql.plan.MapredWork)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:187)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:271)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:337)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:439)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:416)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:282)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:501)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:763)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Generate Map Join Task Error: Unable to find class: XXXXX
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.JoinOperator)
reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
reduceWork (org.apache.hadoop.hive.ql.plan.MapredWork)
at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:516)
at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:179)
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79)
at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:273)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:225)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10315)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:223)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:558)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1356)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1343)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:185)
... 15 more

XXXXX here stands for the class name. Have you seen such erratic behaviour before? The error occurs only for functions in one package, although there are 4 packages in the reloadable directory. The only difference between them is that the malfunctioning one is significantly bigger than the others (it's a shaded fat jar).

There are some online resources suggesting that size may be a problem, but they don't describe the same problem as ours:

https://stackoverflow.com/questions/54572121/how-do-i-fix-this-kryo-exception-when-using-a-udf-on-hi...

https://stackoverflow.com/questions/32448575/why-does-kryo-throw-classnotfoundexception-for-class-in...

Is there anything we could do to investigate this case further and get more information about the problem?

6 REPLIES

Super Guru
Hi,

How big is the JAR file? Since it is running a Map Join, the work happens on HS2 itself, which will copy the JAR file locally on the HS2 host. I think the copy might end up under /tmp, but I'm not 100% sure.

So one thing I think you should check is the disk space. If the disk was full at the time, it might cause this kind of issue.

Maybe check this first before anything else.

Eric
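
The disk space check suggested above could be scripted roughly as follows. This is only a sketch: the helper name `free_space_report` and the 1 GiB default threshold are illustrative assumptions, and which directory HS2 actually copies the JAR into is not confirmed in this thread, so the paths to check have to be supplied by whoever runs it.

```python
import shutil

def free_space_report(paths, min_free_bytes=1 * 1024**3):
    """Map each path to (free_bytes, ok), where ok means free_bytes >= min_free_bytes.

    The default threshold of 1 GiB is an arbitrary illustrative value.
    """
    report = {}
    for path in paths:
        # disk_usage reports on the filesystem that contains `path`
        free = shutil.disk_usage(path).free
        report[path] = (free, free >= min_free_bytes)
    return report
```

Calling `free_space_report(["/tmp"])` on each HS2 host (for example from a cron job, so that the numbers can later be lined up with the error timestamps) would show whether free space ever gets low on the partition in question.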

Contributor

Hi Eric,

The jar in question was around 20 MB, and the /tmp partition on all Hive servers had plenty of free space (gigabytes) whenever we saw the problem. In the end we decided to remove this jar from the 'reloadable' directory, so the case is still unsolved. Maybe someone else will have a neat idea too. Thanks for the help, cheers!

Super Guru
Hi,

In that case, another thing I can think of: how often do you update this JAR file, and how often do you see the error? I'm not sure, but it could be that the JAR was replaced/updated while a reload was happening, so a corrupted file (read in the middle of the update) might have caused some classes not to load properly.

Contributor

Hi,

This could perhaps serve as an explanation, but in our case we didn't make any changes after the initial deployment of the jars, and the problem still persisted for a week or so. One additional detail: we used soft links instead of the actual files (which were stored in a different folder). I'm wondering whether that could cause some brief 'unavailability' of these files to Cloudera.
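
Both ideas raised so far (a JAR read while it was only partly written, or a soft link that does not resolve at some moment) can be probed from outside Hive. The sketch below is an assumption-laden illustration, not something Hive provides: `check_jar` is a hypothetical helper that follows the link, verifies that the target is a readable archive, and looks for the class that the error message names.

```python
import os
import zipfile

def check_jar(jar_path, class_name):
    """Describe whether jar_path resolves to a readable archive containing class_name."""
    real_path = os.path.realpath(jar_path)  # follows soft links to the actual file
    result = {
        "is_symlink": os.path.islink(jar_path),
        "real_path": real_path,
        "exists": os.path.isfile(real_path),
        "valid_zip": False,
        "has_class": False,
    }
    if not result["exists"]:
        return result
    # A class a.b.C is stored in the JAR as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    try:
        with zipfile.ZipFile(real_path) as jar:
            # testzip() returns the first member with a bad CRC, or None if all are fine
            result["valid_zip"] = jar.testzip() is None
            result["has_class"] = entry in jar.namelist()
    except zipfile.BadZipFile:
        # e.g. a truncated file caught in the middle of a copy
        pass
    return result
```

Running this periodically against each file in the reloadable directory, with the class from the "Unable to find class" message as `class_name` (for instance a hypothetical `com.example.MyUdf`), would show whether the file on disk is ever missing, unreadable, or lacking the class. If it always passes, the file itself is less likely to be the culprit.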

Super Guru
Hi,

In that case, replacing the soft link with the actual file might be a good test to confirm whether the soft link could be the cause here.

Do you know how often it happens? Can you scan through the HS2 logs and look at the timing? The pattern might also help tell the story.

Currently running out of ideas.
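
The log scan suggested above could look something like the sketch below. It assumes the HS2 log uses the same layout as the excerpt in the question (a `yyyy-MM-dd HH:mm:ss,SSS` timestamp at the start of each entry, with stack traces on untimestamped continuation lines); the function name `errors_per_hour` is made up for illustration. Each log entry is counted once, even though the message is repeated in the "Caused by" part of the trace.

```python
import re
from collections import Counter

# Start of a log entry, e.g. "2019-04-04 22:33:18,109 WARN ..."; group 1 is date plus hour
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} ")
ERROR_MARKER = "Unable to find class"

def errors_per_hour(lines):
    """Count log entries containing ERROR_MARKER, bucketed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    hour = None       # hour bucket of the entry currently being read
    counted = False   # whether the current entry has already been counted
    for line in lines:
        match = TS_RE.match(line)
        if match:
            # A timestamped line starts a new entry
            hour = match.group(1)
            counted = False
        if hour is not None and not counted and ERROR_MARKER in line:
            counts[hour] += 1
            counted = True
    return counts
```

Feeding it an open log file (opened with `errors="replace"` in case of stray bytes) and printing the sorted buckets would show whether the failures cluster around particular times, such as scheduled jobs or deployments, or are spread evenly.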

New Contributor

Just wondering if you found a workaround for this? I think this is a known bug in Hive 1.1, but unfortunately upgrading Hive is not an option for us right now.

https://issues.apache.org/jira/browse/HIVE-14555