Support Questions

koffitse · ‎07-31-2016

Hi all,

I am using the Java API to fetch data from an HBase table that contains 40 million rows. I use a PrefixFilter on a Scan to retrieve the data. My Java application runs on a Windows PC, while HBase runs on a Hortonworks cluster with 4 region servers.

When I use ROWPREFIXFILTER in the HBase shell, I get the data within 2 seconds. But my Java client application takes around 10 minutes to return the same result. Can someone explain this difference?

mqureshi · ‎08-01-2016

@Samie WALA

I am assuming your PC is remote to the cluster: you are working from home, the cluster is your work cluster, and you are connected through a VPN over your home internet connection. When you log in to the shell, it runs on the same machine as HBase, is that right? And your Java application, unlike the shell, runs on your home PC?

When you run your query in the shell, it doesn't have to stream the result over the network. The result stays right there and is displayed right away. The shell is also lightweight and adds very little overhead, so it tends to be the fastest.

Your application running on your PC has to go over the network to make each request, which seems pretty slow in this case. You didn't mention how big the result streamed over the network to your PC is. If it's big, network issues become more pronounced. You have not shared your code, but there could be some room for optimization there too.

One way to check is to run your code on an edge node, or on some other machine on the same network as the cluster, and compare the timing.

koffitse · ‎08-01-2016

@mqureshi

Thank you for this answer.

I'm using a VPN to access the cluster, but the query returns only 312 rows. I ran the same application this morning while my PC was on our LAN and got the same duration. So I think there must be something else I'm missing.

tyu · ‎08-01-2016

One thing to note about PrefixFilter is that it does not set the start row automatically.

You need to pass the correct start row along with the PrefixFilter. Otherwise the scan starts at the beginning of the table, and the number of rows scanned may be quite high.
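
As a rough illustration of the point above, here is a minimal sketch that derives a start row and an exclusive stop row from a prefix, so the scan only touches the matching key range. The class name `PrefixScanBounds`, its method names, and the sample prefix `user123|` are made up for this example. The commented-out `Scan` calls assume an HBase 1.x client on the classpath; newer clients may prefer `withStartRow`/`withStopRow`, and clients that provide `Scan.setRowPrefixFilter(byte[])` can set both bounds in one call.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: computes the [startRow, stopRow) range covering all keys with a given prefix. */
final class PrefixScanBounds {

    /** The start row is the prefix itself: no matching key sorts before it. */
    static byte[] startRowForPrefix(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    /**
     * Exclusive stop row: drop trailing 0xFF bytes, then increment the last remaining byte.
     * Returns an empty array when no upper bound exists (empty or all-0xFF prefix),
     * which HBase treats as "scan to the end of the table".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // cannot overflow: the byte is known not to be 0xFF
        return stop;
    }

    public static void main(String[] args) {
        // Sample prefix, chosen only for illustration.
        byte[] prefix = "user123|".getBytes(StandardCharsets.UTF_8);
        byte[] start = startRowForPrefix(prefix);
        byte[] stop = stopRowForPrefix(prefix);

        System.out.println("start=" + new String(start, StandardCharsets.UTF_8)
                + " stop=" + new String(stop, StandardCharsets.UTF_8));

        // Assumed usage with the HBase 1.x client (not compiled here):
        //   Scan scan = new Scan();
        //   scan.setStartRow(start);                  // skip everything before the prefix
        //   scan.setStopRow(stop);                    // stop right after the last matching key
        //   scan.setFilter(new PrefixFilter(prefix)); // optional once both bounds are set
    }
}
```

With only the start row set, the PrefixFilter still ends the scan once keys stop matching the prefix. Setting the stop row as well makes the range explicit and does not depend on the filter.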

koffitse · ‎08-01-2016

@Ted Yu

Hi Ted. Thank you for your reply. Once I set the StartRow to the suitable value, I could retrieve the data in less than 2 seconds. Thank you very much.
