Short Description:

This tutorial describes how to use the DeleteHBaseRow processor to delete one or more row keys from an HBase table.

Thanks a lot to @Ed Berezitsky for helping to write this article.
Introduction:

NiFi 1.6 introduced the DeleteHBaseRow processor, which deletes rows from an HBase table based on the row key values provided to it.

For each row key we intend to delete, the processor performs a deleteall operation, removing the whole row from the HBase table.

The processor accepts row key values presented either as FlowFile content or as FlowFile attributes.

The Row ID Location property selects where the row keys come from, using one of the options below; by default, it is set to FlowFile content.

1. FlowFile content - Get the row key(s) from the FlowFile content.

2. FlowFile attributes - Get the row key from an Expression Language statement.

2.1. One row key value associated with the FlowFile.

2.2. More than one row key value associated with the FlowFile.

DeleteHBaseRow Processor Properties:

Client Controller: Configure and enable the HBase_1_1_2_ClientService controller service.

Table Name: The HBase table name; supports Expression Language.

Row Identifier: The row key to delete when Row ID Location is set to ‘FlowFile attributes’. The value is ignored if Row ID Location is set to ‘FlowFile content’. Note that it doesn’t support a list of values.

Row ID Location: The source of the row ID(s) to delete, either FlowFile content or FlowFile attributes.

Flow File Fetch Count: The number of FlowFiles to consume from the incoming connection(s) and combine in a single run.

Batch Size: The maximum number of deletes to send per batch. The actual batch size won’t exceed the number of row keys in the content of each FlowFile.

Delete Row Key Separator: The delimiter between row keys; supports regular expressions and Expression Language.

Character Set: Character set used to encode the row key for HBase.
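To make the interplay between Batch Size and the row keys in each FlowFile concrete, here is a minimal Python model. It is a sketch of the behavior described above, not NiFi's implementation; the helper name `batch_deletes` is hypothetical.

```python
from typing import Iterator, List


def batch_deletes(row_keys: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size row keys taken from one FlowFile."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(row_keys), batch_size):
        # Slicing never runs past the end, so the last batch may be smaller
        yield row_keys[start:start + batch_size]


# Hypothetical FlowFile holding six row keys, with Batch Size = 4
print(list(batch_deletes(["1", "2", "3", "4", "5", "6"], 4)))  # [['1', '2', '3', '4'], ['5', '6']]

# Fewer row keys than Batch Size: one batch, no larger than the FlowFile's key count
print(list(batch_deletes(["7", "8"], 50)))  # [['7', '8']]
```

In this model the batches never mix keys from different FlowFiles, which is one way to read the note that the actual batch size won’t exceed the number of row keys in each FlowFile.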

1. Delete HBase Row(s) Based on FlowFile Content:

Delete Row Key Separator specifies the delimiter for the list of row keys. It can be any value, including a newline character.
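Because the separator is treated as a regular expression, a single pattern can match more than one delimiter. The following Python sketch models the splitting step with `re.split`. Python's `re` and Java's regex engine agree for simple patterns like these, but the helper `split_row_keys` is hypothetical and not NiFi code.

```python
import re
from typing import List


def split_row_keys(content: str, separator: str) -> List[str]:
    """Split FlowFile content into row keys, treating the separator as a regex."""
    return re.split(separator, content)


print(split_row_keys("1,2,3,4,5,6", ","))  # ['1', '2', '3', '4', '5', '6']
# A character class accepts either a comma or a semicolon as the delimiter
print(split_row_keys("1,2;3", "[,;]"))  # ['1', '2', '3']
```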

Example:

Flow:

72568-delete-hbase-rows-ff-content.png

Explanation:
Generate the row key(s) to delete using the GenerateFlowFile processor:

I used the GenerateFlowFile processor with Custom Text containing all the row keys as a comma-separated list: 1,2,3,4,5,6.

72569-1gnf.png

Then feed the FlowFile to the DeleteHBaseRow processor.
DeleteHBaseRow Processor:

Because the FlowFile content is a comma-separated list of row key values, configure and enable the HBase client service, then configure the DeleteHBaseRow processor as follows:

72570-12deletehbaserow-configs.png

Once the deletion is done, the processor routes the FlowFile(s) to the success relationship and adds two new attributes to each FlowFile (these attributes are written only when Row ID Location is set to FlowFile content).

Write Attributes:

rowkey.start: The first row key in the FlowFile. Only written when using the FlowFile's content for the row IDs.

rowkey.end: The last row key in the FlowFile. Only written when using the FlowFile's content for the row IDs.

72571-output-ff-attributes.png

The rowkey.start and rowkey.end attributes hold the first and last row key values of the FlowFile content. Our FlowFile content is 1,2,3,4,5,6, so rowkey.start is 1 and rowkey.end is 6.
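A minimal Python model of how these two attributes relate to the content, following the attribute descriptions above (first and last key by position in the content, not sorted). The helper `rowkey_range_attributes` is hypothetical, and the empty-content case is an assumption of this sketch.

```python
from typing import Dict, List


def rowkey_range_attributes(row_keys: List[str]) -> Dict[str, str]:
    """Model rowkey.start / rowkey.end: first and last keys in content order (not sorted)."""
    if not row_keys:
        return {}  # assumption: nothing to report when there are no keys
    return {"rowkey.start": row_keys[0], "rowkey.end": row_keys[-1]}


print(rowkey_range_attributes("1,2,3,4,5,6".split(",")))
# {'rowkey.start': '1', 'rowkey.end': '6'}
print(rowkey_range_attributes("5,1,3".split(",")))
# {'rowkey.start': '5', 'rowkey.end': '3'}
```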

**Note**

If we try to delete a row key that doesn’t exist in the HBase table, this processor doesn’t throw any error. For example, if our FlowFile content includes 99, which is not a row key in the table, the processor still doesn’t report an error.

Reference flow.xml for DeleteHBaseRow from FlowFile content:
1delete-hbase-row-s-based-on-flow-file-content.xml

How to Configure the DeleteHBaseRow Processor for Other Separators/Delimiters?

With a Multi-Character Separator/Delimiter:

In this file, the separator is a colon followed by a comma (:,):

1:,2:,3

DeleteHBaseRow Configs:

Set the Delete Row Key Separator value to :,

72573-multiple-delimiters.png
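Since the separator is described above as a regular expression, a multi-character separator such as :, matches literally because neither character is a regex metacharacter. The Python sketch below illustrates this; the pipe example is a hypothetical extension, assuming the property behaves like a Java regex, in which case a separator such as | would need to be entered escaped (\|).

```python
import re

content = "1:,2:,3"

# ":" and "," are not regex metacharacters, so ":," matches literally
print(re.split(":,", content))  # ['1', '2', '3']

# A metacharacter separator such as "|" must be escaped to match literally
print(re.split(re.escape("|"), "1|2|3"))  # ['1', '2', '3']
```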

With a Newline Separator:

Configure Delete Row Key Separator as Shift+Enter (a literal newline) or \n.

72574-newline-delimiter.png
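A short Python sketch of a newline separator treated as a regex. The Windows line-ending case is an assumption added for illustration: if the content ends lines with \r\n, splitting on \n alone leaves a trailing \r on each key, and a pattern such as \r?\n avoids that.

```python
import re

# Content with one row key per line
print(re.split(r"\n", "1\n2\n3"))  # ['1', '2', '3']

# Assumption: with Windows line endings, \n alone leaves a stray \r on each key
print(re.split(r"\n", "1\r\n2\r\n3"))  # ['1\r', '2\r', '3']
print(re.split(r"\r?\n", "1\r\n2\r\n3"))  # ['1', '2', '3']
```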

2. Row ID Location FlowFile Attributes:

2.1. One row key value as a FlowFile attribute:

If the row key to delete from the HBase table is stored in a FlowFile attribute, we can reference it with Expression Language.

72575-2delete-row-key-ff-attributes.png

Explanation:

GenerateFlowFile Configs:

72576-21-gnf.png

Add two new properties, tab_name and row_key, with values delete_hbase_demo and 1; GenerateFlowFile adds them as attributes to the FlowFile.

DeleteHBaseRow Configs:

Now configure the DeleteHBaseRow processor with Expression Language so that it reads the tab_name and row_key values from the FlowFile attributes and performs deletions dynamically.

72577-2deletehbaserow-configs.png
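As a rough illustration of what the processor resolves at runtime, here is a Python model of plain ${attribute} substitution. It assumes Table Name is set to ${tab_name} and Row Identifier to ${row_key}; the helper `resolve_attribute_refs` is hypothetical, handles only simple references (no Expression Language functions), and in this model a missing attribute resolves to an empty string.

```python
import re
from typing import Dict

# Matches simple ${attribute} references only (no functions such as :substringBefore)
_ATTR_REF = re.compile(r"\$\{([^:}]+)\}")


def resolve_attribute_refs(template: str, attributes: Dict[str, str]) -> str:
    """Substitute ${name} with the FlowFile attribute value; missing attributes become ''."""
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), template)


flowfile_attributes = {"tab_name": "delete_hbase_demo", "row_key": "1"}
print(resolve_attribute_refs("${tab_name}", flowfile_attributes))  # delete_hbase_demo
print(resolve_attribute_refs("${row_key}", flowfile_attributes))  # 1
```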

Reference flow.xml for DeleteHBaseRow from a FlowFile attribute:

2delete-hbase-row-from-flow-file-attribute.xml

2.2. One or more row key values as a FlowFile attribute:

The DeleteHBaseRow processor doesn’t support a comma-separated list of values presented as a FlowFile attribute. Here is a workaround example that deletes the row keys without changing the FlowFile content.

- Use Expression Language with the indexOf and ifElse functions to loop through all the values in the row_keys list.

Flow:

72579-22-multiple-rowkeys-ff-attribute.png

Explanation:
GenerateFlowFile configs:

72580-22-gnf.png

Add two new properties, tab_name and row_keys, with values delete_hbase_demo and 1,2,3,4,5; GenerateFlowFile adds them as attributes to the FlowFile.

RouteOnAttribute Configs:

72581-22-roa.png

Add a new property that checks whether the row_keys attribute value is null or empty, and auto-terminate this empty relationship.

Feed the unmatched relationship from the RouteOnAttribute processor to the DeleteHBaseRow processor.

DeleteHBaseRow Configs:

Configure and enable the controller service.

Configure the DeleteHBaseRow processor as follows, with Row ID Location set to FlowFile attributes and the Row Identifier property value set to:

${row_keys:indexOf(','):equals(-1):ifElse('${row_keys}','${row_keys:substringBefore(",")}')}

This expression checks the index of the ',' delimiter. If it equals -1, row_keys holds a single value, which is used as-is; otherwise the value before the first ',' is used. That row key is then deleted from the HBase table.

72582-22-delete-hbaserow-configs.png
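To check the Row Identifier expression by hand, here is a Python model of the same logic. It is a sketch for reasoning about the expression, not something NiFi runs; the function name `first_row_key` is hypothetical. Python's `str.find` returns -1 when the substring is absent, like indexOf.

```python
def first_row_key(row_keys: str) -> str:
    """Python model of the Row Identifier expression:
    ${row_keys:indexOf(','):equals(-1):ifElse('${row_keys}','${row_keys:substringBefore(",")}')}
    """
    if row_keys.find(",") == -1:
        # indexOf(',') equals -1: a single key remains, use it as-is
        return row_keys
    # substringBefore(","): everything before the first comma
    return row_keys.split(",", 1)[0]


print(first_row_key("1,2,3,4,5"))  # 1
print(first_row_key("5"))  # 5
```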

Fork the success relationship from the DeleteHBaseRow processor:

72583-deletehbase-processor-forks.png

Fork 1 of the success relationship:

UpdateAttribute Configs:

Configure the processor with a row_keys property whose value is:

${row_keys:indexOf(','):equals(-1):ifElse('','${row_keys:substringAfter(",")}')}

If the index of ',' equals -1 (the last row key has just been deleted), row_keys is set to an empty value; otherwise, row_keys is updated to the substring after the first ','.

72584-22-fork1.png
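A matching Python model of the UpdateAttribute expression, useful for tracing how row_keys shrinks on each pass. The function name `remaining_row_keys` is hypothetical; this is a sketch of the expression's logic, not NiFi code.

```python
def remaining_row_keys(row_keys: str) -> str:
    """Python model of the UpdateAttribute expression:
    ${row_keys:indexOf(','):equals(-1):ifElse('','${row_keys:substringAfter(",")}')}
    """
    if row_keys.find(",") == -1:
        # The last key was just deleted: clear the attribute so the loop can stop
        return ""
    # substringAfter(","): everything after the first comma
    return row_keys.split(",", 1)[1]


print(remaining_row_keys("1,2,3,4,5"))  # 2,3,4,5
print(repr(remaining_row_keys("5")))  # ''
```

The ifElse matters here: according to the NiFi Expression Language Guide, substringAfter returns the entire subject when the argument is not present, so without the -1 check the last key would never be cleared and the flow would keep looping.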

The updated FlowFile goes back to RouteOnAttribute, so this loop continues until all the row_keys values have been deleted from the HBase table and row_keys becomes empty.
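The whole RouteOnAttribute, DeleteHBaseRow, and UpdateAttribute cycle can be simulated in a few lines of Python to confirm that it visits every key exactly once and then stops. This sketch assumes the RouteOnAttribute property behaves like `${row_keys:isEmpty()}` (true for null, empty, or whitespace-only values); all function names here are hypothetical.

```python
from typing import List, Optional


def is_empty(value: Optional[str]) -> bool:
    """Model of the RouteOnAttribute 'empty' check: null, empty, or whitespace only."""
    return value is None or value.strip() == ""


def first_row_key(row_keys: str) -> str:
    """Row Identifier expression: the key before the first comma, or the only key."""
    return row_keys if row_keys.find(",") == -1 else row_keys.split(",", 1)[0]


def remaining_row_keys(row_keys: str) -> str:
    """UpdateAttribute expression: the keys after the first comma, or '' when none remain."""
    return "" if row_keys.find(",") == -1 else row_keys.split(",", 1)[1]


def simulate_delete_loop(row_keys: Optional[str]) -> List[str]:
    """Return the row keys the flow would delete, in order."""
    deleted: List[str] = []
    while not is_empty(row_keys):  # RouteOnAttribute: unmatched -> DeleteHBaseRow
        deleted.append(first_row_key(row_keys))  # DeleteHBaseRow deletes this row key
        row_keys = remaining_row_keys(row_keys)  # Fork 1: UpdateAttribute rewrites row_keys
    return deleted  # row_keys is now empty -> auto-terminated


print(simulate_delete_loop("1,2,3,4,5"))  # ['1', '2', '3', '4', '5']
```

Because the success relationship is connected twice, each pass also sends a copy of the FlowFile to Fork 2, so in this model downstream processing would receive one FlowFile per deleted row key.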

Fork 2 of the success relationship:

Use this relationship for further processing.

Reference flow.xml for DeleteHBaseRow with a list of row_keys as a FlowFile attribute:

22delete-list-of-row-keys-as-attribute-values.xml

Create the HBase table and put data into it:

bash$ hbase shell
hbase> create 'delete_hbase_demo','cf'
hbase> put 'delete_hbase_demo','1','cf:name','foo'
hbase> put 'delete_hbase_demo','2','cf:name','bar'
hbase> put 'delete_hbase_demo','3','cf:name','foo'
hbase> put 'delete_hbase_demo','4','cf:name','bar'
hbase> put 'delete_hbase_demo','5','cf:name','foo'
hbase> put 'delete_hbase_demo','6','cf:name','bar'
hbase> scan 'delete_hbase_demo'