About jastang

jastang · 12-08-2016

So... after a long hiatus: it turns out this is actually https://issues.apache.org/jira/browse/HBASE-13262. I was using hbase-client 0.96 against HBase 1.0.0 (CDH 5.5), and we had tables housing large XML payloads, which made the bug manifest whenever hbase.client.scanner.caching was set to a high value. There are multiple ways to fix this: (1) use hbase-client 0.98+, if you can afford to upgrade without impact; (2) lower the value of hbase.client.scanner.caching in Cloudera Manager (this is what I ended up doing); or (3) programmatically, use Scan.setCaching(int) and/or Scan.setMaxResultSize(long) to avoid the region skipping.
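The region skipping described above can be illustrated with a toy model. This is a hypothetical sketch, not HBase code, and it rests on two assumptions: that the server ends a batch early once it reaches its result-size limit, and that the older client treats any batch shorter than the caching value as the end of the region. The class and method names, the 100 KB row size, and the 2 MB limit are all illustrative values, not HBase defaults.

```java
/**
 * Hypothetical toy model (not HBase code) of a scanner that skips rows:
 * the server caps each batch by caching AND by a byte budget, while an
 * older client assumes a short batch means "region exhausted".
 */
class ScannerSkipModel {

    /** Rows the server returns in one call (assumes rowBytes > 0). */
    static int batchSize(int remaining, int caching, long rowBytes, long serverMaxResultBytes) {
        // The server always returns at least one row, then stops when the byte budget is used.
        long bySize = Math.max(1, serverMaxResultBytes / rowBytes);
        return (int) Math.min(remaining, Math.min(caching, bySize));
    }

    /** Counts the rows the "old" client actually sees across all regions. */
    static int rowsSeen(int[] rowsPerRegion, int caching, long rowBytes, long serverMaxResultBytes) {
        int seen = 0;
        for (int regionRows : rowsPerRegion) {
            int remaining = regionRows;
            while (remaining > 0) {
                int n = batchSize(remaining, caching, rowBytes, serverMaxResultBytes);
                seen += n;
                remaining -= n;
                // Old-client heuristic: a short batch means the region is done,
                // so any rows still left in this region are silently skipped.
                if (n < caching) {
                    break;
                }
            }
        }
        return seen;
    }

    /** Largest caching value whose full batch still fits in the server's byte budget. */
    static int safeCaching(long rowBytes, long serverMaxResultBytes) {
        return (int) Math.max(1, serverMaxResultBytes / rowBytes);
    }

    public static void main(String[] args) {
        int[] regions = {5000, 5000, 5000}; // 15000 rows spread over three regions (assumed)
        long rowBytes = 100_000;            // a large XML payload per row (assumed)
        long serverMax = 2_000_000;         // server-side result-size limit (assumed)

        System.out.println(rowsSeen(regions, 1000, rowBytes, serverMax)); // 60
        int caching = safeCaching(rowBytes, serverMax);
        System.out.println(caching + " -> " + rowsSeen(regions, caching, rowBytes, serverMax)); // 20 -> 15000
    }
}
```

With a high caching value the model returns only 20 rows per region (60 of 15000); lowering caching so that a full batch fits within the size limit means a short batch only occurs at the real end of a region. Since real rows vary in size, `rowBytes` here stands for the largest expected row.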

jastang · 12-02-2016

Thanks for this; it works for Parquet, but how does one do this for a table backed by CSV? Say the CSV schema changes: I want to be able to use Avro schema evolution to create the table. I tried the same CREATE statement, but with STORED AS TEXTFILE, ROW FORMAT DELIMITED, etc., and I end up getting null values.

jastang · 09-19-2016

Hello,

I have several HBase tables defined using Avro schemas, and I am trying to write a simple Java function that returns the entire dataset (all records) for a given table. I'm doing something like this (assume the "Customer" Avro schema has been defined):

    import org.kitesdk.data.DatasetReader;
    import org.kitesdk.data.Datasets;
    import org.kitesdk.data.RandomAccessDataset;

    RandomAccessDataset<Customer> customers = Datasets.load(
        PropertyManager.getDatasetURI(HBaseHelper.CUSTOMER), Customer.class);
    DatasetReader<Customer> reader = customers.newReader();

According to the API docs, this should return the entire unfiltered dataset. The URI method also uses the "dataset:" scheme, so it is not getting a View. What I'm seeing is that only a very small subset of the table is actually returned when I iterate over the reader: ~20 out of the 15000 records that are actually in the table, roughly 0.1%.

Please advise on how to get all records, and whether this is a defect in Kite. Using the native HBase API is not an option because of the Kite encoding, which is challenging to work with outside of Kite.

EDIT: we do not seem to see this issue on a single-node HBase, only on an HBase cluster with Kerberos authentication.
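One way to check how many records the reader actually yields, and to make sure it is closed afterwards, is to drain it completely and count. The sketch below assumes that Kite's DatasetReader implements both java.util.Iterator and java.io.Closeable (as in the Kite 1.x API, to the best of my knowledge). The helper name readAll and the in-memory ListReader stand-in are hypothetical, used only so the example runs without a Kite or HBase dependency.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReadAllSketch {

    /**
     * Hypothetical helper: drains every element from a reader that is both an
     * Iterator and Closeable, closing it even if iteration throws.
     */
    static <E, R extends Iterator<E> & Closeable> List<E> readAll(R reader) {
        List<E> out = new ArrayList<>();
        try (R r = reader) {
            while (r.hasNext()) {
                out.add(r.next());
            }
        } catch (IOException e) {
            // close() failed; surface it as an unchecked exception
            throw new UncheckedIOException(e);
        }
        return out;
    }

    /** Hypothetical in-memory stand-in for a dataset reader, used only for the demo. */
    static final class ListReader<E> implements Iterator<E>, Closeable {
        private final Iterator<E> it;
        boolean closed = false;

        ListReader(List<E> items) {
            this.it = items.iterator();
        }

        @Override public boolean hasNext() { return it.hasNext(); }
        @Override public E next() { return it.next(); }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        ListReader<String> reader = new ListReader<>(Arrays.asList("a", "b", "c"));
        List<String> all = readAll(reader);
        System.out.println(all.size() + " records, closed=" + reader.closed); // 3 records, closed=true
    }
}
```

Against the dataset in the question this would be something like `readAll(customers.newReader()).size()`, which can then be compared with a row count taken outside Kite, for example with the HBase shell's `count` command.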

Last Visited	‎01-31-2017 06:26 PM

Member Since	‎09-19-2016 12:56 PM
Posts	4

Cloudera Community

Re: Kite Datasets SDK (HBase) - Datasets.load() an...

Re: Kite Datasets SDK (HBase) - Datasets.load() an...

Re: Create Hive table to read parquet files from p...

Kite Datasets SDK (HBase) - Datasets.load() and Da...