I've developed a Spark job that calculates statistics over an aggregation table in HBase, and it runs successfully from the command line on a server when the artifact sits in a folder on a mounted home directory (/net/home/...). The problem appears when I copy the same artifact and related files into a directory that is not on the mounted partition (/opt/...). When I run from there, I receive this exception:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2.0 failed 4 times, most recent failure: Lost task 3.3 in stage 2.0 (TID 41, c2d003.in.wellcentive.com): java.io.IOException: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 104, Size: 5
The underlying data has not changed, and the command line and job parameters are identical; the only difference is the path of the artifact. I'm hoping someone out there has seen this before or has an idea of which direction to look in.
This is on a cluster running CDH 5.5 and Spark 1.3.1, submitted via yarn-client. I'm not sure which other versions might be relevant, but I can likely get them if that helps.
Based on the exception in the title, you may have duplicate dependencies of a custom KryoSerializer or KryoRegistrator. Double-check your dependencies and make sure you don't have multiple versions of the same library. It's possible that switching locations changes the classpath order, letting the JVM pick up an older version; or perhaps the other folder is adding extra dependencies.
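To see why a version mismatch produces an IndexOutOfBoundsException like "Index: 104, Size: 5", note that Kryo assigns each registered class a numeric ID based on registration order. Here is a hypothetical sketch (not Kryo itself, just the lookup pattern) of what happens when the writer and reader JVMs end up with different registration tables:

```python
# Hypothetical sketch: Kryo maps each registered class to a numeric ID in
# registration order. If the serializing JVM registers more (or differently
# ordered) classes than the deserializing JVM -- e.g. because an older jar is
# picked up first on one classpath -- the reader's lookup table is too small
# for the ID on the wire, and the read fails with an index error.

def write_class_id(registry, cls):
    """Serializer side: emit the numeric ID of a registered class."""
    return registry.index(cls)

def read_class(registry, class_id):
    """Deserializer side: resolve a numeric ID back to a class name."""
    if class_id >= len(registry):
        raise IndexError(f"Index: {class_id}, Size: {len(registry)}")
    return registry[class_id]

# Driver/serializer JVM: newer dependency set with many registrations.
writer_registry = [f"Class{i}" for i in range(110)]
# Executor/deserializer JVM: stale jar with only a few registrations.
reader_registry = [f"Class{i}" for i in range(5)]

class_id = write_class_id(writer_registry, "Class104")  # writes ID 104
try:
    read_class(reader_registry, class_id)
except IndexError as e:
    print(e)  # Index: 104, Size: 5
```

The numbers in your stack trace (104 vs. 5) fit this pattern: the task tried to resolve a class ID that only exists in the other JVM's registration table, which is why comparing the effective classpaths of the two launch directories is the first thing to check.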
If you can't find a dependency mismatch, the full stack trace from the failed task would be useful as well.