We have repeatedly encountered a deadlock in the BlockManager, described in https://issues.apache.org/jira/browse/SPARK-13566. It should be fixed in Spark 1.6.2. Is this fix already included (or planned) in CDH? We couldn't find this issue in the CDH release notes. We're using 'cdh5.9.0.p0.23'.
Caused by: org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [1200 seconds]. This timeout is controlled by spark.network.timeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
    at org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:110)
    at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1623)
    at org.apache.spark.rdd.RDD.unpersist(RDD.scala:203)
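As a side note, raising spark.network.timeout (the property named in the exception) is sometimes suggested as a mitigation. If the hang really is the SPARK-13566 deadlock, a longer timeout only delays the failure, but it can help rule out ordinary slowness. A sketch of how that might look; the timeout value, application class, and jar name below are placeholders, not anything from this thread:

```shell
# Not a fix for the deadlock itself -- it only rules out transient slowness.
# spark.network.timeout is the property named in the stack trace above;
# the class name and jar are hypothetical placeholders.
spark-submit \
  --conf spark.network.timeout=2400s \
  --class com.example.MyApp \
  my-app.jar
```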
Good question. You can always find exactly what's in a release by looking at the branches in github.com/cloudera/spark, or by consulting, for example, ...
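To make the branch check concrete, something like the following works against the Cloudera fork. The exact branch name is an assumption here -- list the remote branches first and pick the one matching your CDH release:

```shell
# Sketch: check whether the SPARK-13566 fix was cherry-picked into the
# CDH release branch you're running.
git clone https://github.com/cloudera/spark.git
cd spark

# Find the branch corresponding to your CDH version (5.9.0 in this case).
git branch -r | grep 5.9.0

# Search that branch's history for the JIRA id. The branch name below is
# an assumed example -- substitute the one found above. No output means
# the fix is not on that branch.
git log --oneline origin/cdh5-1.6.0_5.9.0 --grep=SPARK-13566
```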
You are right, it isn't in the 5.9 release, and I would generally have expected it to be. If you're able to, open a support case asking for it to be cherry-picked.