Expert Contributor
Posts: 61
Registered: 02-03-2016

Re: CDH 5.7.0 HBase Spark Module

Yes, I agree. I do not want to do this either. I just want to give the other Spark developers the steps to start using it without having to make that alteration. If this is still a work in progress on Cloudera's side to fully integrate it into the CDH parcels, I would like to know that as well, so I can tell them to wait for a future release.

Expert Contributor
Posts: 63
Registered: 03-04-2015

Re: CDH 5.7.0 HBase Spark Module

Where can the HBaseTableCatalog class be found? We have a single CDH 5.7.1 cluster in production, and the class is definitely not in the installed hbase-spark-1.2.0-cdh5.7.1.jar, nor in spark-sql_2.10-1.6.0-cdh5.7.1.jar.

 

The Git history for this class suggests that it (and the whole data source) was not introduced until branch-2. Is that why the CDH version of the HBase-Spark module doesn't include it? When, then, will the module be brought up to date?

 

Thanks,

Miles

 

Expert Contributor
Posts: 63
Registered: 03-04-2015

Re: CDH 5.7.0 HBase Spark Module

My goal is to get the timestamp-based DataFrame query from the module documentation to work:

 

val df = sqlContext.read
      .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.MIN_TIMESTAMP -> "0",
        HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString))
      .format("org.apache.hadoop.hbase.spark")
      .load()

 

It looks like I can substitute the constant HBaseTableCatalog.tableCatalog with its string value "catalog" and avoid needing the class. However, Cloudera's version of HBaseSparkConf is also divergent: it has neither the official version's constants MIN_TIMESTAMP / MAX_TIMESTAMP nor their current incarnations TIMERANGE_START / TIMERANGE_END.

 

Substituting those with string literals as well compiles, but it fails at run time:

 

    val getRdd = sqlContext.read
      .options(Map("catalog" -> cat, "hbase.spark.query.timerange.start" -> startMs.toString,
         "hbase.spark.query.timerange.end" -> currMs.toString))
      .format("org.apache.hadoop.hbase.spark")
      .load()

Exception in thread "main" java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:313)
        at scala.None$.get(Option.scala:311)
        at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
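
For comparison, the option style I have seen documented for the CDH / branch-1 flavor of this data source uses hbase.table and hbase.columns.mapping rather than a catalog, so my guess is that createRelation looks for hbase.table and hits None.get when it is absent. A minimal sketch of that form (the table name and column mapping below are made up purely for illustration):

    // Hypothetical table "t1" with a single column family "c"; names are for
    // illustration only. The CDH/branch-1 DefaultSource is documented with
    // hbase.table / hbase.columns.mapping options instead of a JSON catalog.
    val cdhStyleDf = sqlContext.read
      .options(Map(
        "hbase.table" -> "t1",
        "hbase.columns.mapping" ->
          "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b"))
      .format("org.apache.hadoop.hbase.spark")
      .load()

Even if that form loads, though, I still don't see how to express the timerange restriction in this version, since the timestamp options aren't there.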

 

 
