Member since: 09-19-2016
Posts: 4
Kudos Received: 0
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 3308 | 12-08-2016 08:13 PM |
02-19-2020 10:49 PM
With newer versions of Spark, the sqlContext is not loaded by default; you have to create it explicitly:

```
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6179af64

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> sqlContext.sql("describe mytable")
res2: org.apache.spark.sql.DataFrame = [col_name: string, data_type: string ... 1 more field]
```

I'm working with Spark 2.3.2.
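For what it's worth, the deprecation warning above is because SQLContext is superseded in Spark 2.x; the spark-shell also provides a SparkSession as `spark`, which avoids it entirely. A minimal sketch of the equivalent call, reusing the "mytable" table from the session above:

```
scala> // SparkSession is the Spark 2.x entry point; no explicit SQLContext needed
scala> import spark.implicits._
scala> spark.sql("describe mytable").show()
```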
12-08-2016 08:13 PM
So ... after a long hiatus. It turns out this is actually https://issues.apache.org/jira/browse/HBASE-13262

I was using hbase-client 0.96 with HBase 1.0.0 (CDH 5.5), and we had tables housing large XML payloads, which forced the bug to manifest when hbase.client.scanner.caching was set to a high value. There are multiple ways to fix this:

- Use hbase-client 0.98+, if you can afford to upgrade without impact
- Lower the value of hbase.client.scanner.caching in CM (this is what I ended up doing)
- Programmatically, use Scan.setCaching(int) and/or Scan.setMaxResultSize() to avoid the region skipping (see the sketch below)
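A minimal sketch of the programmatic route, assuming an HBase 1.0-era client API (ConnectionFactory/Table); the table name "mytable" and the caching/size values are just illustrative:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import scala.collection.JavaConverters._

val conf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(conf)
val table = connection.getTable(TableName.valueOf("mytable")) // hypothetical table

val scan = new Scan()
scan.setCaching(100)                     // fetch fewer rows per RPC than a high cluster-wide default
scan.setMaxResultSize(2L * 1024 * 1024)  // cap the bytes returned per RPC (illustrative value)

val scanner = table.getScanner(scan)
try {
  scanner.asScala.foreach(result => println(result))
} finally {
  scanner.close()
  connection.close()
}
```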