HBase performance with storage in the cloud versus on HDFS

Contributor

I am wondering what difference in I/O can be expected for HBase with storage in the cloud versus storage on HDFS.
I would expect data retrieval from HDFS to be a lot faster than from cloud storage (such as ADLS; in my specific case, ADLS Gen2 accessed via abfs).

Is there somewhere where I can test this?

Or is there a previous study on this?

If so, wouldn't one expect HBase read performance today to be somewhat lower than it was some years ago, when everything was on premises and HDFS was typically used?

Maybe I am missing something obvious here, so any insight is appreciated!
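On the question of how to test this: a rough first step, before benchmarking HBase itself, is to compare raw random-read latency on the two storage paths. The sketch below assumes both storages are reachable as ordinary file paths (for example a local DataNode disk versus an ADLS Gen2 container mounted through a FUSE driver such as blobfuse2). That mount only approximates what HBase sees through the abfs connector. The helper name `measure_random_reads` and the 64 KB default, chosen to match HBase's default HFile block size, are assumptions for illustration. It measures storage reads only, not HBase, which adds its own block cache, RPC and region-server overhead. For HBase-level numbers, HBase ships a PerformanceEvaluation tool (`hbase pe`), and YCSB is another common option.

```python
import os
import random
import statistics
import time


def measure_random_reads(path, block_size=64 * 1024, n_reads=200, seed=0):
    """Time n_reads random, block-aligned reads from the file at `path`.

    Hypothetical helper for illustration: it measures storage reads only.
    Returns the median and (approximate) 99th-percentile latency in ms.
    """
    if n_reads < 1:
        raise ValueError("n_reads must be at least 1")
    size = os.path.getsize(path)
    n_blocks = size // block_size
    if n_blocks == 0:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)  # fixed seed so repeated runs read the same offsets
    latencies_ms = []
    with open(path, "rb") as f:
        for _ in range(n_reads):
            offset = rng.randrange(n_blocks) * block_size
            start = time.perf_counter()
            f.seek(offset)
            data = f.read(block_size)  # buffered read loops until block_size bytes or EOF
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
            if len(data) != block_size:
                raise IOError(f"short read at offset {offset}")
    latencies_ms.sort()
    p99_index = min(len(latencies_ms) - 1, int(0.99 * len(latencies_ms)))
    return {
        "reads": n_reads,
        "median_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
    }


if __name__ == "__main__":
    import sys

    # Usage: python read_latency.py /local/disk/file /cloud/mount/file
    for p in sys.argv[1:]:
        print(p, measure_random_reads(p))
```

Note that the OS page cache (and some FUSE drivers) will serve repeated reads from memory, so cold-read numbers need a test file much larger than RAM or freshly written data.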

1 ACCEPTED SOLUTION

Super Collaborator
4 REPLIES

Super Collaborator

Hello @JB0000000000001 

We wish to follow up with you on this post and confirm whether you have any additional observations to share regarding studying or implementing HBase on cloud storage, or whether our response to your post helped you understand the latencies to expect from cloud storage.

- Smarak

Super Collaborator

Hello @JB0000000000001 

As we haven't heard from you, we assume the questions you posted have been addressed and are marking the post as solved. When you have time, feel free to share your observations regarding studying or implementing HBase on cloud storage.

Thanks again for sharing your thoughts on Cloudera Community.

- Smarak

Contributor

Thank you for this very valuable input!
(I had somehow missed the response.)
I do indeed see increased latencies, but I understand they should be negligible for hot data.
I have observed this, but I think there is a limit to how much data you can keep 'hot'. This depends on a combination of settings at the level of the HBase catalog properties and the HBase cluster. We have also discussed this in the following thread: https://community.cloudera.com/t5/Support-Questions/simplest-method-to-read-a-full-hbase-table-so-it...
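The trade-off between hot data and storage latency can be made concrete with a simple weighted-average model: expected read latency is the cache-hit latency weighted by the hit ratio plus the storage-miss latency weighted by the miss ratio. The sketch below is a back-of-the-envelope model, not an HBase API. The function names are hypothetical, the uniform-access hit-ratio estimate is a simplifying assumption, and the cache capacity simply stands in for whatever the cluster and table settings allow to stay hot. All numbers in the usage section are placeholders, not measurements.

```python
def uniform_hit_ratio(cache_gb, working_set_gb):
    """Approximate block-cache hit ratio for uniformly random reads.

    Simplifying assumption: every block of the working set is equally likely
    to be read, so the hit ratio is roughly the cached fraction (capped at 1).
    """
    if working_set_gb <= 0:
        raise ValueError("working_set_gb must be positive")
    return min(1.0, cache_gb / working_set_gb)


def expected_read_latency_ms(hit_ratio, cache_hit_ms, storage_miss_ms):
    """Weighted average of cache-hit latency and storage-miss latency."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_hit_ms + (1.0 - hit_ratio) * storage_miss_ms


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measurements.
    h = uniform_hit_ratio(cache_gb=100, working_set_gb=400)
    for name, miss_ms in [("HDFS", 5.0), ("cloud", 30.0)]:
        print(f"{name}: {expected_read_latency_ms(h, 0.1, miss_ms):.1f} ms")
```

With these placeholder values (hit ratio 0.25), the model gives about 3.8 ms for the HDFS case versus about 22.5 ms for the cloud case. With the whole working set cached (hit ratio 1.0), both collapse to the cache-hit latency. This is why the extra storage latency matters little for hot data but dominates once the working set outgrows what can be kept hot.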

It would be very interesting if a more in-depth study were ever conducted and published, as this is very relevant for applications with HBase as the back end that require more advanced querying of the data (in my case, aggregations over a high volume of data points to compute a heatmap).