A Spark job fails with the error below when the bytecode generated for a single method grows beyond 64 KB, which is the JVM's size limit for one method.
Whole-stage code generation (spark.sql.codegen.wholeStage) is enabled by default in Spark2 as an internal optimization, and in some corner cases it can generate a method that exceeds this limit.
The detailed stack trace is below for reference:
org.codehaus.janino.JaninoRuntimeException: Code of method "processNext()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" grows beyond 64 KB
at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949)
at org.codehaus.janino.CodeContext.write(CodeContext.java:857)
at org.codehaus.janino.UnitCompiler.writeShort(UnitCompiler.java:11072)
at org.codehaus.janino.UnitCompiler.load(UnitCompiler.java:10744)
at org.codehaus.janino.UnitCompiler.load(UnitCompiler.java:10729)
at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3824)
at org.codehaus.janino.UnitCompiler.access$9100(UnitCompiler.java:206)
at org.codehaus.janino.UnitCompiler$12.visitLocalVariableAccess(UnitCompiler.java:3796)
at org.codehaus.janino.UnitCompiler$12.visitLocalVariableAccess(UnitCompiler.java:3762)
at org.codehaus.janino.Java$LocalVariableAccess.accept(Java.java:3675)
at org.codehaus.janino.Java$Lvalue.accept(Java.java:3563)
at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3820)
[....] Output truncated
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
How to fix this?
To fix this, disable whole-stage code generation by setting spark.sql.codegen.wholeStage=false. You can do this in either of two ways: add the property to the custom spark2-defaults configuration in Ambari and restart the required services, or pass --conf spark.sql.codegen.wholeStage=false to the spark-shell or spark-submit command.
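As a rough illustration of the two options, the fragment below shows what the setting might look like in practice. The class name com.example.MyApp and the jar my-app.jar are hypothetical placeholders, not taken from the failing job; substitute your own application.

```shell
# Option 1: property to add under "Custom spark2-defaults" in Ambari
# (restart the required services afterwards):
#   spark.sql.codegen.wholeStage=false

# Option 2: disable whole-stage codegen for a single job at submit time.
# com.example.MyApp and my-app.jar are placeholders for your application.
spark-submit \
  --conf spark.sql.codegen.wholeStage=false \
  --class com.example.MyApp \
  my-app.jar

# The same flag works for an interactive session:
spark-shell --conf spark.sql.codegen.wholeStage=false
```

The --conf form affects only the job or shell it is passed to, so it is a convenient way to confirm the fix before changing the cluster-wide default in Ambari.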
Please comment if you have any feedback/questions/suggestions. Happy Hadooping!!