New Contributor
Posts: 2
Registered: ‎10-07-2016

Correctly building the application with SBT to run on the cluster

Hi everyone,


I have a large Java/Scala application which uses Spark (only in the Scala part) to perform calculations. I am able to run the application locally without a problem, but I cannot run it on the Cloudera cluster because I keep getting "java.lang.IllegalStateException: unread block data" errors. From searching on Google, I suspect this may be due to an incorrectly configured build of my application, but I just can't make it work.

On the cluster I have CDH 5.9.0 with Spark 1.6.0, Scala 2.10.5 and Java 1.7. I am building the fat jar of my application using sbt-assembly with Scala 2.10.6 and Java 1.7.

Here are the relevant parts of the build.sbt file:

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", xs @ _*)        => MergeStrategy.last
  case PathList("org", "spark-project", xs @ _*) => MergeStrategy.last
  // I also tried discarding them but kept getting the same errors
  // other rules here
}

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "1.6.0" % "provided" excludeAll ExclusionRule(organization = "org.slf4j"),
  "org.apache.spark" %% "spark-graphx" % "1.6.0" % "provided"
  // other dependencies here
)

I run the application from the master node with: sudo -u hdfs spark-submit --deploy-mode cluster --master yarn myApp.jar
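For completeness, the full command looks roughly like this (com.example.MyApp is a stand-in for my actual main class, and the executor settings are just what I have been experimenting with):

    sudo -u hdfs spark-submit \
      --class com.example.MyApp \
      --master yarn \
      --deploy-mode cluster \
      myApp.jar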


What could be the problem here? How do I correctly build/run an application like this? Thank you!

Cloudera Employee
Posts: 97
Registered: ‎05-10-2016

Re: Correctly building the application with SBT to run on the cluster

Unfortunately, this isn't enough information to help. It may be helpful to provide the code snippet where the problem is occurring as well.


Just one suggestion though: if possible, use Cloudera's Maven repository[1] to bring in artifacts, which helps avoid version mismatches between your build and what is deployed on the cluster.
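For example, something along these lines in build.sbt (the CDH-suffixed version string is indicative of the pattern Cloudera uses; check the repository for the exact artifact version that matches your cluster):

    // Add Cloudera's public Maven repository as a resolver
    resolvers += "Cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

    // Depend on the CDH build of Spark, still marked "provided"
    // so the cluster's own Spark jars are used at runtime
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0-cdh5.9.0" % "provided"

This keeps your compile-time classpath aligned with the exact Spark build running on the cluster, which is a common cause of serialization errors like "unread block data".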