How to use HBase in Oozie

Contributor

Hi all,

As the title suggests, I'm trying to work out how to use HBase within an Oozie workflow. I've found very few resources online on this subject. Do I need to use a Java action within my Oozie workflow in order to use HBase?

I need to create a table to monitor whether the data pulls for a specific table are running, completed, or failed. We refer to this as the process status table. Currently I insert a new row into a Hive table before the relevant actions start, and once the data pull has finished I insert another row that records whether or not all the actions were successful. I believe that using HBase instead of Hive would be a much more efficient way to add to this table, as only one row is added at a time.

Any help would be much appreciated.

Super Collaborator

Contributor

This helps me with authentication when using HBase, but I'm asking from a more general "how exactly is HBase used within Oozie" perspective. Would I need to use a shell or Java action, and would I need any scripts, as I would with Hive? I'm fairly new to HBase, so I'd be grateful for any help clarifying this.

Super Collaborator

Contributor

Unfortunately, I will not be using HBase with Sqoop.

Super Collaborator

@Daniel Perry

Well, it depends on your use case. If you are trying to connect to HBase and call its API through Oozie, then it probably won't be a low-latency application. We have seen many people use Hive or Pig on HBase for loading and reporting. An Oozie Java action is also a good choice, but again, it depends on the use case.

Contributor

@Jitendra Yadav,

I have updated the question with details of what I would be using HBase for. Please let me know if you have any questions.

The easiest way is to write a shell action that creates a temp file holding the status of the job during its phases. Alternatively, if you are comfortable with Java and already have an HBase cluster, write a Java action that puts the job status into an HBase table, then read the most recent cell value to get the latest progress of the job.
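To make the "put a status, then read the most recent cell value" idea a bit more concrete, here is a rough sketch of the logic. It does not talk to HBase: `StatusTable` is a hypothetical in-memory stand-in that only mimics how HBase keeps several timestamped versions of a cell and returns the newest one by default. The class names, the `customer_pull` row key, and the status strings are all made up for illustration. In a real Java action you would presumably replace the stand-in with the HBase client (a `Put` sent through `Table.put`, and a `Get` to read the value back).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class ProcessStatusSketch {

    /** Hypothetical in-memory stand-in for a single status column of an HBase table. */
    static class StatusTable {
        // row key -> (timestamp -> status), loosely mirroring HBase's versioned cells
        private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();

        /** Record a status for a row at the given timestamp. */
        void put(String rowKey, long timestamp, String status) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(timestamp, status);
        }

        /** Return the status with the highest timestamp, if the row exists. */
        Optional<String> latest(String rowKey) {
            TreeMap<Long, String> versions = rows.get(rowKey);
            if (versions == null || versions.isEmpty()) {
                return Optional.empty();
            }
            return Optional.of(versions.lastEntry().getValue());
        }
    }

    public static void main(String[] args) {
        StatusTable table = new StatusTable();
        // Assumed convention: one row per data pull, keyed by a job name.
        String rowKey = args.length > 0 ? args[0] : "customer_pull";

        // Written by an action at the start of the workflow.
        table.put(rowKey, 1L, "RUNNING");
        // Written by a later action once the pull has finished.
        table.put(rowKey, 2L, "COMPLETED");

        System.out.println(rowKey + " -> " + table.latest(rowKey).orElse("UNKNOWN"));
    }
}
```

With this layout each data pull keeps a single row, and anyone querying the table only has to read that row to see the current state, instead of scanning for the newest of many inserted rows.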

Contributor

The reason for keeping it in a table is so that it can be observed by people who might not have access to the cluster; they can then query the table through Ambari views etc. With that in mind, would you suggest that the best solution is to use Oozie Java actions?