We recently moved our HDP cluster to Google Cloud. We are not using any of the Google Cloud services, but we are considering our options. We could continue to use HBase or migrate to BigTable. Our challenge is that we may have to change the applications that use HBase as a data store. Could you advise on a safe migration path, if there is one?
One way to do it is to use the BigTable HBase client for Java, a customized version of the Apache HBase client that is compatible with the HBase 1.3.x API. The source code is on GitHub, along with a couple of examples.
However, BigTable and HBase differ, and you may miss some HBase features, e.g. coprocessors or distributed processors. Some properties that you control in HBase, e.g. block size and compression, are handled automatically by BigTable. You cannot define ACLs for a specific row or visibility for a given cell, and tags are not supported. Append operations are atomic in BigTable. You cannot mass-delete all versions that share a timestamp, but you can delete an individual version of a cell with a given timestamp, so you could probably build your own mass action on top of that. Other things that don't work: reverse scans and querying versions of a column family within a timestamp range. You can limit the number of values per row per column family. Not all HBase filters are supported in BigTable.
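To make the "build your own mass action" idea concrete, here is a minimal sketch in plain Java. It assumes nothing about the real client: `VersionedRow` and `MassDeleteSketch` are hypothetical names, and the row is an in-memory stand-in, not the HBase or BigTable API. It only shows the shape of the workaround, which is to loop over the cells and apply the one primitive that is supported (deleting a single version of a single cell). With the real HBase 1.x client, the per-cell step would presumably be a `Delete` carrying an explicit column and timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one row: column -> (timestamp -> value).
// This is NOT the HBase or BigTable API; it only models the supported
// primitive of deleting one version of one cell.
class VersionedRow {
    private final Map<String, TreeMap<Long, String>> cells = new TreeMap<>();

    void put(String column, long timestamp, String value) {
        cells.computeIfAbsent(column, c -> new TreeMap<>()).put(timestamp, value);
    }

    // Supported primitive: remove a single version of a single column.
    // Returns true only if that version existed.
    boolean deleteVersion(String column, long timestamp) {
        TreeMap<Long, String> versions = cells.get(column);
        return versions != null && versions.remove(timestamp) != null;
    }

    // Returns a copy so callers can delete while iterating.
    List<String> columns() {
        return new ArrayList<>(cells.keySet());
    }

    int versionCount(String column) {
        TreeMap<Long, String> versions = cells.get(column);
        return versions == null ? 0 : versions.size();
    }
}

class MassDeleteSketch {
    // The "mass action": visit every column and delete the version with the
    // given timestamp, one cell at a time. Returns how many were deleted.
    static int deleteAllVersionsAt(VersionedRow row, long timestamp) {
        int deleted = 0;
        for (String column : row.columns()) {
            if (row.deleteVersion(column, timestamp)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        VersionedRow row = new VersionedRow();
        row.put("cf:a", 100L, "a-old");
        row.put("cf:a", 200L, "a-new");
        row.put("cf:b", 100L, "b-old");
        row.put("cf:c", 300L, "c-only");
        System.out.println("deleted versions: " + deleteAllVersionsAt(row, 100L));
    }
}
```

Against a real table the loop costs one read plus one delete per matching cell and is not atomic across cells, so it is probably best suited to batch cleanup rather than the request path.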
As a benefit, BigTable administration is probably easier because most of it is done automatically.
There are other differences:
- BigTable does not support namespaces.
- BigTable uses tablets instead of regions and manages them automatically.
- BigTable does not support some of the snapshot methods.
- BigTable compacts tables automatically, and some of the methods available in HBase are not yet available in BigTable.
Overall, you should be able to migrate to BigTable for the most part, but it will obviously require some redesign.
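One way to scope that redesign is to list the HBase features each application actually uses and check them against the gaps mentioned in this thread. The sketch below is only an illustration: `MigrationAudit` and the feature labels are hypothetical names made up for this example, and the list restates only the outright gaps from this answer. Partially supported areas (filters, snapshot methods, compaction methods) still need a case-by-case check against the current BigTable documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class MigrationAudit {
    // Hypothetical labels for the HBase features this thread lists as
    // unavailable in BigTable. Not an official or complete list.
    static final Set<String> NOT_AVAILABLE_IN_BIGTABLE = new LinkedHashSet<>(Arrays.asList(
            "coprocessors",
            "row-acls",
            "cell-visibility",
            "tags",
            "mass-delete-by-timestamp",
            "reverse-scans",
            "timestamp-range-per-column-family",
            "namespaces"));

    // Returns the features used by the application that would need a
    // redesign, in the order given and without duplicates.
    static List<String> featuresNeedingRedesign(List<String> usedByApplication) {
        List<String> result = new ArrayList<>();
        for (String feature : usedByApplication) {
            if (NOT_AVAILABLE_IN_BIGTABLE.contains(feature) && !result.contains(feature)) {
                result.add(feature);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> used = Arrays.asList("puts", "reverse-scans", "namespaces", "appends");
        // prints [reverse-scans, namespaces]
        System.out.println(featuresNeedingRedesign(used));
    }
}
```

An empty result does not prove the migration is safe; it only means the application avoids the gaps named here.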
One thing I would keep in mind: Apache HBase is backed by the open source community, which keeps adding new features all the time, and Apache HBase is supported. I would be cautious about migrating any critical application to BigTable without first addressing the support aspects.
SQL queries may be challenging and will require a lot of thought; see https://cloud.google.com/bigquery/external-data-bigtable
Phoenix on HBase makes for great SQL access, everything is Apache open source, and it has a rich community. That is something to think about when things go wrong, when a product stagnates, or when a single vendor abandons it.