
Can I upgrade Spark from 1.3.0 to 1.3.1 in CDH 5.4.0?

Rising Star

Hi,

I upgraded my Cloudera version to 5.4.0, and it has Spark 1.3.0.

But I want the latest Spark version, 1.3.1.

Is there any tutorial so I can upgrade Spark myself, or do I need to wait for a Cloudera upgrade?

Regards

Prateek

1 ACCEPTED SOLUTION

Master Collaborator

You can run what you like on your cluster, including Spark 1.3.1. However, it would not be supported, and you should not modify any of the CDH installation. You may need to take care with creating and maintaining your own config. CDH already includes critical fixes after 1.3.0 that went into 1.3.1, so I don't see much value in this.
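If you do run your own Apache build alongside CDH, a quick sanity check from spark-shell (a minimal sketch; sc is the SparkContext the shell creates) is to confirm which build is actually running:

// Prints the version of the Spark build the shell is actually using, e.g. "1.3.1"
println(sc.version)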


7 REPLIES

When will Cloudera ship Spark 1.3.1? In CDH 5.4.1/2/3?

Master Collaborator

The base version of Spark in a CDH release will always be a ".0" release. CDH tracks upstream maintenance releases, though not exactly: some upstream fixes may land in a CDH maintenance release earlier or later than upstream. So, 5.4.1 already has some of 1.3.1.


I am asking this because we have the following problem using Spark SQL in Spark 1.3.0 on CDH 5.4.1:

val sqlContext2 = new org.apache.spark.sql.SQLContext(sc)
val transactions = sqlContext2.parquetFile("/data/production/blabla.../daily/2015/01/01/transaction.parquet")
transactions.registerTempTable("tnx2")

// Either of the following two statements triggers the problem:
sqlContext2.sql("SELECT platform_id AS platform_id_alias, amount_euro, reference_id FROM tnx2 WHERE group_id > 1").registerTempTable("test")

// or

sqlContext2.sql("SELECT IF(amount_euro IS NULL, amount_euro, amount_euro), amount_euro, reference_id FROM tnx2 WHERE group_id > 1").registerTempTable("test")

sqlContext2.sql("SELECT * FROM test WHERE amount_euro < 0").registerTempTable("bets")
sqlContext2.sql("SELECT * FROM test WHERE amount_euro > 0").registerTempTable("won")

// The exception below is thrown when this join executes
val wonAndBetsJoined = sqlContext2.sql("SELECT * FROM bets INNER JOIN won ON bets.reference_id = won.reference_id")

 

The statements marked above generate an exception at the join execution in Spark 1.3.0 on CDH 5.4.0:


java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:64)
at scala.collection.IterableLike$class.head(IterableLike.scala:91)
at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:47)
at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:120)
at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:47)

 

The problem is not solved by rewriting this using the DataFrame API: https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAAswR-7XDEg4WoiKxmzdjg9kG9jkqL7...
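For reference, that DataFrame-API rewrite looks roughly like this (a sketch only, assuming the same column names as the SQL version above; per the report it fails the same way on 1.3.0):

import org.apache.spark.sql.SQLContext

val sqlContext2 = new SQLContext(sc)
import sqlContext2.implicits._ // enables the $"column" syntax

val transactions = sqlContext2.parquetFile("/data/production/blabla.../daily/2015/01/01/transaction.parquet")

// Same projection and filter as the "test" temp table above
val test = transactions
  .filter($"group_id" > 1)
  .select($"platform_id".as("platform_id_alias"), $"amount_euro", $"reference_id")

val bets = test.filter($"amount_euro" < 0)
val won = test.filter($"amount_euro" > 0)

// The join that hits the same "next on empty iterator" exception on 1.3.0
val wonAndBetsJoined = bets.join(won, bets("reference_id") === won("reference_id"))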

 

These statements also generate an error in Apache Spark 1.3.0; this is fixed in Apache Spark 1.3.1.

 

Hence my question: when will Spark 1.3.1 be supported (or this bug fixed)? We need this before going into production.

 

Kind regards
Geert


Master Collaborator
Backports are driven by support customer demand and importance. However, Spark SQL is not supported at this time, so I don't know whether this would be backported. You can, however, run whatever Spark you like on CDH; it is just that only the CDH build is supported.


Hello Sean,

 

Perhaps I am missing something, but my question was about when Cloudera will release Spark 1.3.1 in CDH. Our problem exists in Apache Spark 1.3.0 and in the CDH 5.4.0 Spark (1.3.0). Since it is solved in Apache Spark 1.3.1, I asked when we may expect Cloudera to bundle Spark 1.3.1 in one of its upcoming releases.

 

Kind regards

Geert

Master Collaborator

I don't think maintenance releases are shipped as such in CDH for any component, since the release cycle and customer demand for maintenance releases differ from upstream's. Important fixes are backported, though, so you already have some of 1.3.1 and beyond in the 1.3.x branch in CDH. The changes aren't different; they come from upstream. Minor releases rebase on upstream minor releases and so "sync" at that point (i.e., CDH 5.5 should have the latest minor release, whether that's 1.4.x or 1.5.x).