08-01-2016 07:11 PM
Yes, I agree. I do not want to do this either. But I want to convey to the other Spark developers the steps for getting started with it without having to make that alteration. If fully integrating this into the CDH parcels is still a work in progress on Cloudera's side, I would like to know that too, so I can tell them to wait for a future release.
08-08-2017 02:00 PM
Where is the HBaseTableCatalog class supposed to be found? We have a single CDH 5.7.1 cluster in production, and the class is definitely not in the installed hbase-spark-1.2.0-cdh5.7.1.jar, nor in spark-sql_2.10-1.6.0-cdh5.7.1.jar.
The Git history for this class suggests that it (and the whole data source around it) was not introduced until branch-2. Is that why the CDH version of the HBase-Spark module doesn't include it? If so, when will the module be brought up to date?
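For what it's worth, a quick way to confirm whether a class ships in the jars on a given cluster is to probe the classpath at runtime from spark-shell. A minimal sketch; the fully qualified names below are my reading of the upstream branch-2 layout and may differ in other builds:

```scala
import scala.util.Try

// Probe the classpath for a class without linking against it at compile time.
def classAvailable(fqcn: String): Boolean =
  Try(Class.forName(fqcn)).isSuccess

// Run from spark-shell so the cluster's installed jars are on the classpath.
val candidates = Seq(
  "org.apache.hadoop.hbase.spark.datasources.HBaseTableCatalog", // upstream branch-2 location (assumed)
  "org.apache.hadoop.hbase.spark.DefaultSource"                  // present in the CDH module
)
candidates.foreach { c =>
  println(s"$c -> ${if (classAvailable(c)) "found" else "missing"}")
}
```

If the first line prints "missing" while the second prints "found", the installed module predates the catalog-based data source.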
08-09-2017 11:42 AM
My goal is to get the timestamp-based DataFrame query from the module's documentation to work:
val df = sqlContext.read
  .options(Map(
    HBaseTableCatalog.tableCatalog -> writeCatalog,
    HBaseSparkConf.MIN_TIMESTAMP -> "0",
    HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString))
  .format("org.apache.hadoop.hbase.spark")
  .load()
It looks like I can substitute the constant HBaseTableCatalog.tableCatalog with its string value "catalog" without needing the class. However, Cloudera's version of HBaseSparkConf also diverges: it has neither the official version's MIN_TIMESTAMP / MAX_TIMESTAMP constants nor their current incarnations, TIMERANGE_START / TIMERANGE_END.
Substituting those with string literals as well compiled, but failed at run time:
val getRdd = sqlContext.read
  .options(Map(
    "catalog" -> cat,
    "hbase.spark.query.timerange.start" -> startMs.toString,
    "hbase.spark.query.timerange.end" -> currMs.toString))
  .format("org.apache.hadoop.hbase.spark")
  .load()
Exception in thread "main" java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:313)
    at scala.None$.get(Option.scala:311)
    at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
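The None.get inside DefaultSource.createRelation suggests the older CDH data source is calling .get on a required option that the catalog-style call never supplies. If I read the pre-branch-2 module correctly, it was configured with a table name and a columns mapping rather than a JSON catalog. A hedged sketch of that older option style follows; the key names "hbase.table" and "hbase.columns.mapping", and the mapping syntax, are my best reading of the old source and should be verified against the DefaultSource in the jar you actually have installed:

```scala
// Sketch only, assuming the pre-branch-2 (CDH) hbase-spark option keys.
// "my_table" and the column mapping are placeholder values.
val df = sqlContext.read
  .options(Map(
    "hbase.table" -> "my_table",                    // assumed key for the HBase table name
    "hbase.columns.mapping" ->
      "KEY_FIELD STRING :key, A_FIELD STRING c:a")  // assumed schema-to-column mapping syntax
  .format("org.apache.hadoop.hbase.spark")
  .load()
```

Note this older data source would also not understand the hbase.spark.query.timerange.* keys, so the timestamp filtering from the upstream docs may simply not be available in this CDH release.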