
Can Flume be used with HBase? How?


Hi,

Can anyone please explain whether Flume can be used with HBase, and how? An example would help me understand, if possible.

Accepted solution


I found the following answer:

Apache Flume can be used with HBase through either of its two HBase sinks:

  • HBaseSink (org.apache.flume.sink.hbase.HBaseSink) supports secure HBase clusters as well as the new HBase IPC introduced in HBase 0.96.
  • AsyncHBaseSink (org.apache.flume.sink.hbase.AsyncHBaseSink) can perform better than HBaseSink because it makes non-blocking calls to HBase.

How HBaseSink works:

HBaseSink converts each Flume Event into HBase Puts and/or Increments using a serializer, which is a class that implements HBaseEventSerializer. The serializer is instantiated once, when the sink starts. For every event, the sink calls the serializer's initialize method, and the serializer then translates that Flume Event into the Puts and Increments that are sent to the HBase cluster.
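To make that call sequence concrete, here is a minimal, self-contained Java sketch of the HBaseSink flow. It does not use the real Flume or HBase classes: Event, Mutation, EventSerializer, and PayloadSerializer are hypothetical stand-ins whose methods mirror the shape of HBaseEventSerializer (initialize, getActions, getIncrements, close) with simplified types. What it illustrates is that one serializer instance is reused and initialize is called again for every event.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HBaseSinkSketch {

    // Hypothetical stand-in for org.apache.flume.Event (body only, no headers).
    static final class Event {
        final byte[] body;
        Event(String body) { this.body = body.getBytes(StandardCharsets.UTF_8); }
    }

    // Hypothetical stand-in for an HBase mutation (a Put or an Increment).
    static final class Mutation {
        final String kind, row, family, qualifier, value;
        Mutation(String kind, String row, String family, String qualifier, String value) {
            this.kind = kind; this.row = row; this.family = family;
            this.qualifier = qualifier; this.value = value;
        }
        @Override public String toString() {
            return kind + " " + row + " " + family + ":" + qualifier + "=" + value;
        }
    }

    // Mirrors the method shape of HBaseEventSerializer, with simplified types.
    interface EventSerializer {
        void initialize(Event event, byte[] columnFamily); // called once per event
        List<Mutation> getActions();                       // Puts for the current event
        List<Mutation> getIncrements();                    // Increments for the current event
        void close();                                      // called when the sink stops
    }

    // Hypothetical serializer: one Put of the event body plus one counter Increment.
    static final class PayloadSerializer implements EventSerializer {
        private Event event;
        private String family;
        private long nextRow = 0;

        @Override public void initialize(Event event, byte[] columnFamily) {
            this.event = event;
            this.family = new String(columnFamily, StandardCharsets.UTF_8);
        }
        @Override public List<Mutation> getActions() {
            String value = new String(event.body, StandardCharsets.UTF_8);
            return List.of(new Mutation("PUT", "row-" + nextRow++, family, "payload", value));
        }
        @Override public List<Mutation> getIncrements() {
            return List.of(new Mutation("INCR", "counters", family, "events", "1"));
        }
        @Override public void close() { }
    }

    // Simulates the sink's loop over a batch: initialize() runs for EVERY event.
    static List<Mutation> processBatch(EventSerializer s, List<Event> events, byte[] cf) {
        List<Mutation> out = new ArrayList<>();
        for (Event e : events) {
            s.initialize(e, cf);
            out.addAll(s.getActions());
            out.addAll(s.getIncrements());
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] cf = "cf".getBytes(StandardCharsets.UTF_8);
        EventSerializer s = new PayloadSerializer(); // created once, at "sink start"
        List<Mutation> batch = processBatch(s, List.of(new Event("hello"), new Event("world")), cf);
        batch.forEach(System.out::println);
        // PUT row-0 cf:payload=hello
        // INCR counters cf:events=1
        // PUT row-1 cf:payload=world
        // INCR counters cf:events=1
        s.close(); // "sink stop"
    }
}
```

In a real Flume agent, the serializer class is selected through the sink's configuration, and the sink itself sends the returned Puts and Increments to HBase.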

How AsyncHBaseSink works:

AsyncHBaseSink uses a serializer that implements AsyncHBaseEventSerializer. The sink calls the serializer's initialize method only once, when the sink starts. For each event, the sink calls setEvent and then calls getIncrements and getActions, similar to HBaseSink. When the sink stops, it calls the serializer's cleanUp method.
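A similar sketch for AsyncHBaseSink follows, again with hypothetical stand-in types: Strings take the place of the asynchbase request objects the real interface returns, and RecordingSerializer is an illustrative class that logs which lifecycle methods ran. It highlights the difference from HBaseSink: initialize runs once at start, setEvent runs once per event, and cleanUp runs once at stop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class AsyncHBaseSinkSketch {

    // Hypothetical stand-in for org.apache.flume.Event (body only).
    static final class Event {
        final String body;
        Event(String body) { this.body = body; }
    }

    // Mirrors the method shape of AsyncHBaseEventSerializer, with simplified types.
    interface AsyncSerializer {
        void initialize(byte[] table, byte[] columnFamily); // once, at sink start
        void setEvent(Event event);                         // once per event
        List<String> getActions();                          // put requests for the current event
        List<String> getIncrements();                       // increment requests for the current event
        void cleanUp();                                     // once, at sink stop
    }

    // Hypothetical serializer that also records the order of lifecycle calls.
    static final class RecordingSerializer implements AsyncSerializer {
        final List<String> calls = new ArrayList<>();
        private String table, family;
        private Event current;

        @Override public void initialize(byte[] table, byte[] cf) {
            this.table = new String(table, StandardCharsets.UTF_8);
            this.family = new String(cf, StandardCharsets.UTF_8);
            calls.add("initialize");
        }
        @Override public void setEvent(Event event) {
            current = event;
            calls.add("setEvent");
        }
        @Override public List<String> getActions() {
            calls.add("getActions");
            return List.of("put " + table + "/" + family + ":payload=" + current.body);
        }
        @Override public List<String> getIncrements() {
            calls.add("getIncrements");
            return List.of(); // this sketch emits no increments
        }
        @Override public void cleanUp() { calls.add("cleanUp"); }
    }

    // Simulates the sink's lifecycle: start, process each event, stop.
    static List<String> runSink(AsyncSerializer s, List<Event> events) {
        List<String> requests = new ArrayList<>();
        s.initialize("t1".getBytes(StandardCharsets.UTF_8), "cf".getBytes(StandardCharsets.UTF_8));
        for (Event e : events) {
            s.setEvent(e);
            requests.addAll(s.getActions());
            requests.addAll(s.getIncrements());
        }
        s.cleanUp();
        return requests;
    }

    public static void main(String[] args) {
        RecordingSerializer s = new RecordingSerializer();
        List<String> requests = runSink(s, List.of(new Event("a"), new Event("b")));
        System.out.println(requests); // [put t1/cf:payload=a, put t1/cf:payload=b]
        System.out.println(s.calls);
        // [initialize, setEvent, getActions, getIncrements, setEvent, getActions, getIncrements, cleanUp]
    }
}
```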


This Apache document is a good resource on streaming data into Apache HBase using Apache Flume.

@Rohan Pednekar, thanks for sharing this link.

Here's information on both HBase sinks in Flume, with examples: https://flume.apache.org/FlumeUserGuide.html#hbasesinks

Alternatively, if you're using Phoenix, there's a Flume connector for it: https://phoenix.apache.org/flume.html

@Artem Ervits, thanks for sharing this link.