Created on 06-12-2018 12:15 AM
A Spark job fails with the error below when the bytecode of any single generated method grows beyond 64 KB.
spark.sql.codegen.wholeStage is enabled by default in Spark 2 as an internal optimization, and in some corner cases it can cause this kind of issue.
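A quick sketch to verify the current value from spark-shell (Spark 2.x; this config has a built-in default, so the call returns "true" unless it has been overridden):

// check whether whole-stage codegen is enabled in the current session
spark.conf.get("spark.sql.codegen.wholeStage")  // "true" by default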
Below is the detailed stack trace for your reference:
org.codehaus.janino.JaninoRuntimeException: Code of method "processNext()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" grows beyond 64 KB
    at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949)
    at org.codehaus.janino.CodeContext.write(CodeContext.java:857)
    at org.codehaus.janino.UnitCompiler.writeShort(UnitCompiler.java:11072)
    at org.codehaus.janino.UnitCompiler.load(UnitCompiler.java:10744)
    at org.codehaus.janino.UnitCompiler.load(UnitCompiler.java:10729)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3824)
    at org.codehaus.janino.UnitCompiler.access$9100(UnitCompiler.java:206)
    at org.codehaus.janino.UnitCompiler$12.visitLocalVariableAccess(UnitCompiler.java:3796)
    at org.codehaus.janino.UnitCompiler$12.visitLocalVariableAccess(UnitCompiler.java:3762)
    at org.codehaus.janino.Java$LocalVariableAccess.accept(Java.java:3675)
    at org.codehaus.janino.Java$Lvalue.accept(Java.java:3563)
    at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3820)
    [....] Output truncated
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:782)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
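For context, here is a purely illustrative sketch of the kind of query that can trigger this: chaining a very large number of expressions can make the generated processNext() method exceed the JVM's 64 KB per-method bytecode limit. The column names and the count of 2000 below are arbitrary placeholders, not taken from any real workload:

// illustrative only: chaining many derived columns can blow up the generated code
import org.apache.spark.sql.functions.col

val base = spark.range(10).toDF("c0")
val wide = (1 to 2000).foldLeft(base) { (df, i) =>
  df.withColumn(s"c$i", col("c0") + i)  // each iteration adds another expression
}
wide.count()  // may fail with the JaninoRuntimeException shown above on Spark 2.x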
How to fix this?
This can be fixed by setting spark.sql.codegen.wholeStage=false in the custom spark2-defaults configuration via Ambari and restarting the required services, or by adding --conf spark.sql.codegen.wholeStage=false to the spark-shell or spark-submit command.
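The same setting can also be applied programmatically. A minimal sketch, assuming a Spark 2.x application (the application name is hypothetical):

// set it when building the SparkSession ...
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("disable-wholestage-codegen")            // hypothetical app name
  .config("spark.sql.codegen.wholeStage", "false")  // disable whole-stage codegen
  .getOrCreate()

// ... or toggle it at runtime on an existing session:
spark.conf.set("spark.sql.codegen.wholeStage", "false")

Note that disabling whole-stage code generation trades away a performance optimization in exchange for avoiding the 64 KB limit, so it is best applied only to the affected jobs.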
Please comment if you have any feedback, questions, or suggestions. Happy Hadooping!!
Created on 07-29-2018 06:44 AM
This configuration is applicable to Spark 2.2.x and above.
Created on 04-07-2020 10:53 AM
Hi Team,
I have upgraded to Spark 2.2.1, but spark.sql.codegen.wholeStage=false doesn't give any improvement in performance.