We are planning to upgrade our Hadoop cluster from CDH 5.5.4 to CDH 5.13.0. The new CDH ships with Spark 1.6, while the previous one had Spark 1.5.0.
Our Spark jobs use the Spark Cassandra Connector to write to Cassandra. Unfortunately, our Cassandra version is old (2.0.14) and we only plan to upgrade it next year.
I checked and found that there is no Spark Cassandra Connector release that lets Spark 1.6 connect to this Cassandra version; it seems we would have to upgrade Cassandra to get a compatible connector.
My questions are:
1- Can I have 3 Spark versions in the same CDH: Spark 1.6, which is part of the CDH; Spark 2; and Spark 1.5 as a standalone install? Currently on my test cluster I have Spark 1.6.0 as part of the CDH and Spark 2, which I added as parcels.
2- From what I see, a standalone install is not integrated with YARN resources. Does this mean I can still get Spark history for Spark 1.5.0?
3- Will I have any issues with Oozie scheduling jobs that use different Spark versions?
4- Is there an old parcel for Spark 1.5.0, or do I need to install it manually? If manually, is there a clear procedure for doing this, and what are its prerequisites?
5- And most important: is anyone aware of a patched Spark Cassandra Connector that I can use with Spark 1.6 and Cassandra 2.0, which would resolve all of the concerns above?
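Regarding questions 1 and 4: in my understanding, separate Spark installs can generally coexist on the same nodes as long as each job picks its own install via SPARK_HOME. A minimal sketch (the install path /opt/spark-1.5.0 is an assumption for illustration, not a CDH convention):

```shell
# Sketch: each job selects its Spark version through SPARK_HOME.
# /opt/spark-1.5.0 is an assumed location for a manual standalone install.
SPARK_HOME=/opt/spark-1.5.0
PATH="$SPARK_HOME/bin:$PATH"

# The job would then resolve this install's spark-submit:
echo "using: $SPARK_HOME/bin/spark-submit"
# prints: using: /opt/spark-1.5.0/bin/spark-submit
```

The same pattern is how CDH's own Spark 2 parcel coexists with the bundled Spark 1.6 (via the separate spark2-submit entry point), so a third version selected this way seems plausible.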
The error we got:
When we try version 1.6.8 I get an RDD error, and with lower versions the following exception occurs:
Exception in thread "main" java.lang.AbstractMethodError
    at org.apache.spark.Logging$class.log(Logging.scala:50)
    at com.datastax.spark.connector.cql.CassandraConnector$.log(CassandraConnector.scala:143)
    at org.apache.spark.Logging$class.logDebug(Logging.scala:62)
    at com.datastax.spark.connector.cql.CassandraConnector$.logDebug(CassandraConnector.scala:143)
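For context, an AbstractMethodError in org.apache.spark.Logging usually indicates a binary mismatch: the connector jar was compiled against a Spark version whose Logging trait differs from the one on the runtime classpath. A build.sbt sketch of keeping the two aligned (the version numbers here are illustrative assumptions, not a combination we have verified against Cassandra 2.0):

```scala
// build.sbt -- sketch only. The point is that the connector's minor
// line should match the Spark runtime's minor version, e.g.
// spark-cassandra-connector 1.6.x against Spark 1.6.x.
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Spark itself is provided by the cluster (CDH parcel), so it must
  // not be bundled into the job jar:
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  // Connector line chosen to match the Spark minor version above:
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0"
)
```

If the error appears only on the cluster and not locally, it is worth checking that an older connector or Spark jar is not leaking onto the executor classpath from the CDH install.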