How to store JSON data in JSON format in HBase

Please tell me how to store multi-line JSON data in HBase from NiFi.

1 ACCEPTED SOLUTION

Super Guru

@umang s

You can use the PutHBaseCell processor for this use case. Set the Row Identifier to a UUID, and the JSON message is then stored as the cell value for that UUID.

Example:

My input JSON document is:

{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

PutHBaseCell configs:

65430-hbasecell.png

As you can see in the screenshot above, the Row Identifier is set to ${UUID()}. This UUID is unique for each flowfile in NiFi, so we do not overwrite any existing data in the HBase table.

Output:

hbase(main):008:0> scan 'test'
ROW                                             COLUMN+CELL
 c7ca74ad-4933-4340-a9c7-e55370a4501b           column=category:category:details, timestamp=1521711352302, value={"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewin
                                                g/Fabric/Apparel Fabric"}
1 row(s) in 0.1130 seconds

65431-hbasecell.png
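
To make the effect of the UUID row key concrete, here is a minimal Python sketch. It only imitates what setting the Row Identifier to ${UUID()} does; `to_hbase_cell` is a hypothetical helper and the code does not talk to NiFi or HBase.

```python
import json
import uuid


def to_hbase_cell(document: str) -> tuple:
    """Pair a JSON document with a fresh random row key.

    Hypothetical helper: it mimics the effect of Row Identifier = ${UUID()}
    and returns (row_key, cell_value) without contacting NiFi or HBase.
    """
    json.loads(document)         # fail early if the content is not valid JSON
    row_key = str(uuid.uuid4())  # random UUID, analogous to ${UUID()}
    return row_key, document     # the document is stored unchanged


doc = '{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}'
row_key, value = to_hbase_cell(doc)
print(row_key, value)
```

Each call produces a different row key, which is why two flowfiles never land on the same row.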

Case 2:

If your input JSON document is

{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

then in HBase the document looks like this:

65432-hbasecell-case2.png

13 REPLIES

@umang s

Can you please share some more details about your use case?

Here is the sample JSON data, @Shu:

{"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
{"id" : "412","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

Super Guru
@umang s

Could you please describe how you expect the above records to appear in HBase? For example, should both JSON documents share the same row key?

{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"604","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

I'm expecting this type of output for the above JSON data:

65444-65433-temp.jpg

My input is like case 2, and I want the output to be:

65433-temp.jpg

I have used the PutHBaseCell processor, but it stores both ids in one row. I want to store each one in a different row.

Super Guru

@umang s

I think your input JSON messages are enclosed in an array [], like this:

[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]

In this case, use the SplitJson processor before the PutHBaseCell processor, with the configs below:

65436-splitjson.png

Connect the split relationship of the SplitJson processor to the PutHBaseCell processor. In this case, SplitJson splits the array of JSON messages into individual messages.

Input:

[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]

Output:

Flowfile 1:

{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

Flowfile 2:

{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
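
The split step can be sketched in plain Python to show what ends up in each flowfile. This is only an illustration of the idea, not how SplitJson is implemented; `split_json_array` is a hypothetical name, and the UUID keys stand in for ${UUID()} on the downstream PutHBaseCell processor.

```python
import json
import uuid


def split_json_array(content: str) -> list:
    """Split a top-level JSON array into one serialized document per element."""
    records = json.loads(content)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep each document in the same form as the input.
    return [json.dumps(record, separators=(",", ":")) for record in records]


content = (
    '[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},'
    '{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]'
)

documents = split_json_array(content)

# One row per document, each under its own random key (stand-in for ${UUID()}).
rows = {str(uuid.uuid4()): document for document in documents}
for row_key, document in rows.items():
    print(row_key, document)
```

With the two-element array above, this yields two separate rows, one per JSON object.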

I have tried this solution, but it inserts only the last record, i.e. the record with "id":"412".

Super Guru
@umang s

Are you using a UUID as the Row Identifier?

Could you please share your PutHBaseCell processor configs?

Super Guru

@umang s

That is expected HBase behavior. Every record now has a unique row key, but the data you are writing to the category1 table is duplicated. HBase overwrites existing data only when the same row key already exists in the table. Because the Row Identifier is a UUID, which is unique for each flowfile in NiFi, each flowfile gets its own row even when its content is identical to an earlier one.
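
A toy model may help here. The sketch below uses a Python dict as a stand-in for the table, keeps only the latest value per row key, and ignores HBase cell versions, so treat it as an illustration of the row key behavior only. The `put` helper and the `row1` key are made up for the example.

```python
import uuid


def put(table: dict, row_key: str, value: str) -> None:
    """Toy stand-in for an HBase put: one value per row key, last write wins."""
    table[row_key] = value


docs = [
    '{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}',
    '{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}',
]

# Same row key for every record: only the last record survives.
fixed_key_table = {}
for doc in docs:
    put(fixed_key_table, "row1", doc)

# UUID row key, same data sent twice: every flowfile becomes its own row.
uuid_table = {}
for doc in docs + docs:
    put(uuid_table, str(uuid.uuid4()), doc)

print(len(fixed_key_table))  # 1
print(len(uuid_table))       # 4
```

The first table shows why a shared row key leaves only the record with id 412, and the second shows why re-sending the same data with UUID keys produces repeated rows.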

65437-puthbasecell.jpg

I have tried the UUID() function, but it repeats records like this:

65438-temp.png

New Contributor

Hi @umang_instantwe,

Could you please suggest how the same record can be put into HBase via the HBase shell?

{"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

For example, the query below does not work:

put 'table_name','row_key','cf:n1','{"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}', timestamp

Thanks,

Priyanshu

Super Collaborator

Hello @priyanshu_soni,

You can skip the timestamp argument, because HBase sets it implicitly. I ran the same query as yours without the timestamp, and it succeeded:

hbase(main):018:0> put 'Table_X1','125','Cf1:CheckItem','{"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}'
Took 0.0077 seconds
hbase(main):019:0> scan 'Table_X1'
ROW    COLUMN+CELL
 125   column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
1 row(s)
Took 0.0057 seconds

As you can see above, the timestamp field holds the epoch time, in milliseconds, at which the put was performed. If you wish to specify the timestamp explicitly, you can pass an epoch time as shown below:

hbase(main):020:0> put 'Table_X1','126','Cf1:CheckItem','{"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}',1621593487680
Took 0.0202 seconds
hbase(main):021:0> scan 'Table_X1'
ROW    COLUMN+CELL
 125   column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
 126   column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
2 row(s)
Took 0.0071 seconds
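
If it helps to read or produce such timestamp values, here is a small Python sketch that converts between an epoch-milliseconds value like the one in the scan output and a UTC datetime. The function names are my own and are not part of HBase.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hbase_timestamp_to_utc(ts_ms: int) -> datetime:
    """Convert epoch milliseconds (as shown by scan) to an aware UTC datetime."""
    seconds, millis = divmod(ts_ms, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=millis * 1000)


def utc_to_hbase_timestamp(moment: datetime) -> int:
    """Convert an aware datetime to epoch milliseconds using integer arithmetic."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


moment = hbase_timestamp_to_utc(1621593487680)
print(moment.isoformat())
print(utc_to_hbase_timestamp(moment))
```

The second function gives a value that can be passed as the last argument of the put command.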

Let us know if you have any issues with the put operation.

- Smarak