
CDH 5.7.0 HBase Spark Module

Re: CDH 5.7.0 HBase Spark Module

Expert Contributor
Sure, but altering /etc/spark/conf/classpath.txt (or creating it) also alters the Spark client configs.
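If the goal is just to avoid touching the client configs, a per-application alternative is to hand the jar to Spark yourself (equivalently, --jars on spark-submit); a minimal sketch, where the parcel path is an assumption:

// Minimal sketch, not Cloudera guidance: ship the hbase-spark jar with the
// application instead of editing /etc/spark/conf/classpath.txt.
// The parcel path below is an assumption; adjust it to where the jar lives.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("hbase-spark-classpath-sketch")
  // spark.jars adds the listed jars to the driver and executor classpaths.
  .set("spark.jars", "/opt/cloudera/parcels/CDH/jars/hbase-spark-1.2.0-cdh5.7.1.jar")

val sc = new SparkContext(conf)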

Re: CDH 5.7.0 HBase Spark Module

Contributor

What about the HBaseTableCatalog class?  We have a single CDH 5.7.1 cluster for production, and the class is definitely not in the installed hbase-spark-1.2.0-cdh5.7.1.jar, nor in spark-sql_2.10-1.6.0-cdh5.7.1.jar.

 

The Git repo for this class suggests that the class (and the whole data source) was not introduced until branch-2.  Is this why the CDH version of the HBase-Spark module doesn't include it?  When, then, would the module be brought up to date?

 

Thanks,

Miles

 

Re: CDH 5.7.0 HBase Spark Module

Contributor

My goal is to get the timestamp-based DataFrame query from the module's documentation to work:

 

val df = sqlContext.read
  .options(Map(
    HBaseTableCatalog.tableCatalog -> writeCatalog,
    HBaseSparkConf.MIN_TIMESTAMP -> "0",
    HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString))
  .format("org.apache.hadoop.hbase.spark")
  .load()
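
(For reference, the writeCatalog above is a JSON table-mapping string; a sketch of what one looks like, where the table and column names are made up:)

// Illustrative only: writeCatalog maps DataFrame columns to HBase
// column family/qualifier pairs. Names here are assumptions.
val writeCatalog = """{
  |"table":{"namespace":"default", "name":"table1"},
  |"rowkey":"key",
  |"columns":{
  |  "col0":{"cf":"rowkey", "col":"key", "type":"string"},
  |  "col1":{"cf":"cf1", "col":"col1", "type":"string"}
  |}
  |}""".stripMargin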

 

It looks like I can substitute the constant HBaseTableCatalog.tableCatalog with its string value "catalog" without needing the class.  However, Cloudera's version of HBaseSparkConf is also divergent: it has neither the official version's constants MIN_TIMESTAMP / MAX_TIMESTAMP nor their current incarnations TIMERANGE_START / TIMERANGE_END.
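
(To keep the literals readable, the missing keys can be pinned down locally; the values below just mirror the upstream sources, so treat them as assumptions for the CDH build:)

// Local stand-ins for the constants missing from the CDH jars.
// Values mirror the upstream HBaseTableCatalog / HBaseSparkConf sources.
object UpstreamKeys {
  val TableCatalog   = "catalog"
  val TimerangeStart = "hbase.spark.query.timerange.start"
  val TimerangeEnd   = "hbase.spark.query.timerange.end"
}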

 

Further substituting them with string literals compiled, but failed at run time:

 

val getRdd = sqlContext.read
  .options(Map(
    "catalog" -> cat,
    "hbase.spark.query.timerange.start" -> startMs.toString,
    "hbase.spark.query.timerange.end" -> currMs.toString))
  .format("org.apache.hadoop.hbase.spark")
  .load()

Exception in thread "main" java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:313)
        at scala.None$.get(Option.scala:311)
        at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
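
My best guess is that Cloudera's DefaultSource still expects the older, pre-catalog options; a hedged sketch using the option keys from the Apache branch-1 sources (the keys and the mapping string are assumptions, so check DefaultSource.scala in the installed jar for the exact names):

// Hedged sketch: the pre-catalog read path from the Apache branch-1 hbase-spark
// module. Option keys and the mapping string are assumptions for this CDH build.
val legacyDf = sqlContext.read
  .options(Map(
    "hbase.table" -> "table1",
    "hbase.columns.mapping" -> "KEY_FIELD STRING :key, COL1_FIELD STRING cf1:col1"))
  .format("org.apache.hadoop.hbase.spark")
  .load()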

 

 
