
CDH 5.7.0 HBase Spark Module

Re: CDH 5.7.0 HBase Spark Module

Sure, but altering /etc/spark/conf/classpath.txt (or creating it) also means altering the Spark client configuration.

Re: CDH 5.7.0 HBase Spark Module


How about finding the HBaseTableCatalog class? We have a single CDH 5.7.1 cluster in production, and the class is definitely not in the installed hbase-spark-1.2.0-cdh5.7.1.jar, nor in spark-sql_2.10-1.6.0-cdh5.7.1.jar.


The Git repo for this class suggests that it (and the whole data source) was not introduced until branch-2. Is that why the CDH version of the HBase-Spark module doesn't include it? When, then, will the module be brought up to date?
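One quick way to confirm whether a given build ships the class is to probe for it at runtime from spark-shell. A minimal sketch; the fully qualified name below is the one used in the upstream branch-2 source tree:

    import scala.util.Try

    // Probe the classpath for the upstream catalog class.
    // Fully qualified name taken from the upstream branch-2 hbase-spark source.
    val present = Try(Class.forName(
      "org.apache.hadoop.hbase.spark.datasources.HBaseTableCatalog")).isSuccess
    println(s"HBaseTableCatalog on classpath: $present")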





Re: CDH 5.7.0 HBase Spark Module


My goal is to get the timestamp-based DataFrame query from the module's documentation to work:


val df = sqlContext.read
      .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.MIN_TIMESTAMP -> "0",
        HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString))
      .format("org.apache.hadoop.hbase.spark")
      .load()


It looks like I can substitute the constant HBaseTableCatalog.tableCatalog with its string value "catalog" without needing the class. However, Cloudera's version of HBaseSparkConf is also divergent: it lacks the official version's MIN_TIMESTAMP / MAX_TIMESTAMP constants, as well as their current incarnations TIMERANGE_START / TIMERANGE_END.
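To keep the literals out of the query code, one option is a small local shim object holding the string values. A sketch; the values are copied from the upstream branch-2 hbase-spark source, so verify them against whichever upstream revision you are following:

    // Local shim for constants missing from the CDH build of hbase-spark.
    // String values taken from the upstream (branch-2) source.
    object HBaseSparkShim {
      val TableCatalog   = "catalog"                              // HBaseTableCatalog.tableCatalog
      val TimerangeStart = "hbase.spark.query.timerange.start"    // HBaseSparkConf.TIMERANGE_START
      val TimerangeEnd   = "hbase.spark.query.timerange.end"      // HBaseSparkConf.TIMERANGE_END
    }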


Substituting those with string literals as well compiled, but failed at run time:


    val getRdd = sqlContext.read
      .options(Map("catalog" -> cat, "hbase.spark.query.timerange.start" -> startMs.toString,
         "hbase.spark.query.timerange.end" -> currMs.toString))
      .format("org.apache.hadoop.hbase.spark")
      .load()
Exception in thread "main" java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:313)
        at scala.None$.get(Option.scala:311)
        at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
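For what it's worth, a None.get thrown from DefaultSource.createRelation usually means a required option was absent: the CDH 5.7 build of the data source predates the JSON catalog API and appears to expect the older option names instead. A sketch of a read in that older style, assuming option keys "hbase.table" and "hbase.columns.mapping" from that era of the module; the table name "t1", column family "c", and the column mapping are hypothetical placeholders:

    // Sketch of a read using the pre-catalog option names the CDH 5.7
    // hbase-spark build appears to expect. Table/column names are placeholders.
    val df = sqlContext.read
      .options(Map(
        "hbase.table" -> "t1",
        "hbase.columns.mapping" ->
          "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b"))
      .format("org.apache.hadoop.hbase.spark")
      .load()

If that style loads the relation, the None.get above is the missing-option path rather than a classpath problem.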


