Member since 07-22-2014 · 6 Posts · 0 Kudos Received · 0 Solutions
05-25-2015 11:00 AM
Hello Sean,

Perhaps I am missing something, but my question was about when Cloudera will release Spark 1.3.1 in CDH. Our problem exists in Apache Spark 1.3.0 and in CDH 5.4.0's Spark (1.3.0). Since it is solved in Apache Spark 1.3.1, I asked when we may expect Cloudera to bundle Spark 1.3.1 in one of its upcoming releases.

Kind regards,
Geert
05-22-2015 05:52 AM
I am asking this because we have the following problem using Spark SQL in Spark 1.3.0 on CDH 5.4.1:

val sqlContext2 = new org.apache.spark.sql.SQLContext(sc)
val transactions = sqlContext2.parquetFile("/data/production/blabla.../daily/2015/01/01/transaction.parquet")
transactions.registerTempTable("tnx2")

sqlContext2.sql("SELECT platform_id AS platform_id_alias, amount_euro, reference_id FROM tnx2 WHERE group_id > 1").registerTempTable("test")
// or
sqlContext2.sql("SELECT IF(amount_euro IS NULL, amount_euro, amount_euro), amount_euro, reference_id FROM tnx2 WHERE group_id > 1").registerTempTable("test")

sqlContext2.sql("SELECT * FROM test WHERE amount_euro < 0").registerTempTable("bets")
sqlContext2.sql("SELECT * FROM test WHERE amount_euro > 0").registerTempTable("won")
val wonAndBetsJoined = sqlContext2.sql("SELECT * FROM bets INNER JOIN won ON bets.reference_id = won.reference_id")

The two SELECT statements that create the "test" table (shown in bold in the original post) generate an exception at join execution time in Spark 1.3.0 on CDH 5.4.0:

java.util.NoSuchElementException: next on empty iterator
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
    at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:64)
    at scala.collection.IterableLike$class.head(IterableLike.scala:91)
    at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:47)
    at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:120)
    at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:47)

Rewriting this using the DataFrame API does not solve the problem either; see https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAAswR-7XDEg4WoiKxmzdjg9kG9jkqL7mve=g0hd8Y8M_8RqBOw@mail.gmail.com%3E

These statements also generate an error in Apache Spark 1.3.0; the bug is fixed in Apache Spark 1.3.1. Hence my question: when will Spark 1.3.1 be supported (or this bug fixed)? We need this to go into production.

Kind regards,
Geert
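For reference, the post mentions a DataFrame-API rewrite without showing it. Below is a minimal sketch of what such a rewrite could look like in Spark 1.3.x. The column names (platform_id, amount_euro, reference_id, group_id) and the truncated parquet path come from the SQL above; the use of DataFrame.as(...) aliases and functions.col to express the self-join is my assumption about how one would phrase it, not the author's actual code. Per the mailing-list thread linked above, variants of this also hit the bug on 1.3.0, so this illustrates the shape of the rewrite rather than a verified fix:

// Sketch of a DataFrame-API equivalent of the SQL pipeline above.
// Assumes a spark-shell style SparkContext `sc` and Spark 1.3.x APIs;
// the parquet path is the truncated one from the post.
import org.apache.spark.sql.functions.col

val sqlContext2 = new org.apache.spark.sql.SQLContext(sc)
val transactions = sqlContext2.parquetFile("/data/production/blabla.../daily/2015/01/01/transaction.parquet")

// Equivalent of: SELECT platform_id AS platform_id_alias, amount_euro, reference_id
//                FROM tnx2 WHERE group_id > 1
val test = transactions
  .filter(transactions("group_id") > 1)
  .select(
    transactions("platform_id").as("platform_id_alias"),
    transactions("amount_euro"),
    transactions("reference_id"))

// Two filtered views of the same derived DataFrame; aliasing each side
// with as(...) lets the join condition reference columns unambiguously.
val bets = test.filter(test("amount_euro") < 0).as("bets")
val won  = test.filter(test("amount_euro") > 0).as("won")

val wonAndBetsJoined = bets.join(won, col("bets.reference_id") === col("won.reference_id"))

The as(...) aliasing is the usual way to disambiguate a join of two views of the same underlying DataFrame; without it, both sides expose identically named columns and the join condition becomes ambiguous.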
05-21-2015 02:16 AM
When will Cloudera ship Spark 1.3.1? In CDH 5.4.1/2/3?