Member since 07-22-2014 · 6 Posts · 0 Kudos Received · 0 Solutions
05-25-2015 11:00 AM
Hello Sean,

Perhaps I am missing something, but my question was about when Cloudera will release Spark 1.3.1 in CDH. Our problem exists in Apache Spark 1.3.0 and in CDH 5.4.0's Spark (1.3.0). Since it is solved in Apache Spark 1.3.1, I asked when we may expect Cloudera to bundle Spark 1.3.1 in one of its upcoming releases.

Kind regards,
Geert
05-22-2015 05:52 AM
I am asking this because we have the following problem using Spark SQL in Spark 1.3.0 on CDH 5.4.1:

val sqlContext2 = new org.apache.spark.sql.SQLContext(sc)
val transactions = sqlContext2.parquetFile("/data/production/blabla.../daily/2015/01/01/transaction.parquet")
transactions.registerTempTable("tnx2")

sqlContext2.sql("SELECT platform_id AS platform_id_alias, amount_euro, reference_id FROM tnx2 WHERE group_id > 1").registerTempTable("test")
// or
sqlContext2.sql("SELECT IF(amount_euro IS NULL, amount_euro, amount_euro), amount_euro, reference_id FROM tnx2 WHERE group_id > 1").registerTempTable("test")

sqlContext2.sql("SELECT * FROM test WHERE amount_euro < 0").registerTempTable("bets")
sqlContext2.sql("SELECT * FROM test WHERE amount_euro > 0").registerTempTable("won")
val wonAndBetsJoined = sqlContext2.sql("SELECT * FROM bets INNER JOIN won ON bets.reference_id = won.reference_id")

The two SELECT statements that create the "test" table (shown in bold in the original post) generate an exception at join execution time in Spark 1.3.0 on CDH 5.4.0:

java.util.NoSuchElementException: next on empty iterator
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
    at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:64)
    at scala.collection.IterableLike$class.head(IterableLike.scala:91)
    at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:47)
    at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:120)
    at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:47)

Rewriting this using the DataFrame API does not solve the problem either; see https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAAswR-7XDEg4WoiKxmzdjg9kG9jkqL7mve=g0hd8Y8M_8RqBOw@mail.gmail.com%3E

These statements also generate an error in Apache Spark 1.3.0; the bug is fixed in Apache Spark 1.3.1. Hence my question: when will Spark 1.3.1 be supported (or this bug fixed)? We need this to go into production.

Kind regards,
Geert
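For reference, the post mentions a DataFrame-API rewrite without showing it. Below is a minimal sketch of what such a rewrite could look like in Spark 1.3.x. The column names (platform_id, amount_euro, reference_id, group_id) and the truncated parquet path come from the SQL above; the use of DataFrame.as(...) aliases and functions.col to express the self-join is my assumption about how one would phrase it, not the author's actual code. Per the mailing-list thread linked above, variants of this also hit the bug on 1.3.0, so this illustrates the shape of the rewrite rather than a verified fix:

// Sketch of a DataFrame-API equivalent of the SQL pipeline above.
// Assumes a spark-shell style SparkContext `sc` and Spark 1.3.x APIs;
// the parquet path is the truncated one from the post.
import org.apache.spark.sql.functions.col

val sqlContext2 = new org.apache.spark.sql.SQLContext(sc)
val transactions = sqlContext2.parquetFile("/data/production/blabla.../daily/2015/01/01/transaction.parquet")

// Equivalent of: SELECT platform_id AS platform_id_alias, amount_euro, reference_id
//                FROM tnx2 WHERE group_id > 1
val test = transactions
  .filter(transactions("group_id") > 1)
  .select(
    transactions("platform_id").as("platform_id_alias"),
    transactions("amount_euro"),
    transactions("reference_id"))

// Two filtered views of the same derived DataFrame; aliasing each side
// with as(...) lets the join condition reference columns unambiguously.
val bets = test.filter(test("amount_euro") < 0).as("bets")
val won  = test.filter(test("amount_euro") > 0).as("won")

val wonAndBetsJoined = bets.join(won, col("bets.reference_id") === col("won.reference_id"))

The as(...) aliasing is the usual way to disambiguate a join of two views of the same underlying DataFrame; without it, both sides expose identically named columns and the join condition becomes ambiguous.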
05-21-2015 02:16 AM
When will Cloudera ship Spark 1.3.1? In CDH 5.4.1/2/3?