How to put Json data as a Json format in HBase

Please tell me how to store multi-line JSON data in HBase from NiFi.

1 ACCEPTED SOLUTION

Master Guru

@umang s

You can use the PutHBaseCell processor for this use case and set the Row Identifier to a UUID, so that the whole JSON message is stored as the cell value for that UUID.

Example:-

My input JSON document is:

{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

PutHBaseCell configuration:-

65430-hbasecell.png

As you can see in the screenshot above, the Row Identifier is set to ${UUID()}. Because a new UUID is generated for each flowfile in NiFi, we do not overwrite any existing data in the HBase table.

Output:-

hbase(main):008:0> scan 'test'
ROW                                             COLUMN+CELL
 c7ca74ad-4933-4340-a9c7-e55370a4501b           column=category:category:details, timestamp=1521711352302, value={"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewin
                                                g/Fabric/Apparel Fabric"}
1 row(s) in 0.1130 seconds

65431-hbasecell.png
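
The effect of the row key choice can be sketched outside NiFi. This is only an illustration, not the processor's implementation: the dictionary standing in for the HBase table, the `put_cell` helper, and the `category:details` column name are all assumptions made for the example, and Python's `uuid.uuid4()` plays the role of `${UUID()}`.

```python
import json
import uuid


def put_cell(table, row_key, column, value):
    """Store one cell in the stand-in table.

    In this simplified model, writing to an existing row key and column
    replaces the value that was there before.
    """
    table.setdefault(row_key, {})[column] = value


docs = [
    {"id": "1334134", "name": "Apparel Fabric",
     "path": "Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
    {"id": "412", "name": "Apparel Fabric",
     "path": "Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
]

# Hypothetical stand-in for an HBase table: row key -> {column: value}.
table = {}

# A fresh random UUID per document gives each document its own row.
for doc in docs:
    put_cell(table, str(uuid.uuid4()), "category:details", json.dumps(doc))

# With a fixed row key, only the last document written remains visible.
fixed = {}
for doc in docs:
    put_cell(fixed, "row1", "category:details", json.dumps(doc))

print(len(table), "rows with UUID keys;", len(fixed), "row with a fixed key")
```

If only the last record shows up in the table, a row key that is the same for every flowfile is one likely cause worth checking.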

Case 2:-

If your input JSON document is:

{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

then in HBase the document looks like this:

65432-hbasecell-case2.png

13 REPLIES

@umang s

Can you please share some more details about your use case?

Here is the sample json data @Shu

{"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
{"id" : "412","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

Master Guru
@umang s

Could you please describe how you expect the above records to appear in HBase?
For example, should both JSON objects share the same row key?

{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"604","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

I'm expecting this type of output for the above JSON data:

65444-65433-temp.jpg

My input is like Case 2, and I want the output to be:

65433-temp.jpg

I have used the PutHBaseCell processor, but it stores both ids in one row. I want to store each one in a different row.

Master Guru

@umang s

I think your input JSON messages are enclosed in an array [], like this:

[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]

In this case, use the SplitJson processor before the PutHBaseCell processor, with the configuration below:

65436-splitjson.png

Connect the split relationship of the SplitJson processor to the PutHBaseCell processor. The SplitJson processor then splits the array of JSON messages into individual messages, one per flowfile.

Input:-

[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]

Output:-

Flowfile 1:-
{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

Flowfile 2:-
{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
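
The split step can be mimicked in a few lines of Python to check what each resulting message should look like. This is a rough sketch of the idea only, not how the SplitJson processor is implemented; the function name `split_json_array` is made up for the example.

```python
import json


def split_json_array(text):
    """Parse a JSON array and return one compact JSON string per element."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array at the top level")
    # Compact separators keep each message in the same form as the input.
    return [json.dumps(item, separators=(",", ":")) for item in items]


payload = (
    '[{"id":"1334134","name":"Apparel Fabric",'
    '"path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},'
    '{"id":"412","name":"Apparel Fabric",'
    '"path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]'
)

flowfiles = split_json_array(payload)
for number, content in enumerate(flowfiles, start=1):
    print(f"Flowfile {number}: {content}")
```

Each element of the array becomes its own message, so each one can then be written to HBase under its own row key.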

I have tried this solution, but it inserts only the last record, i.e. the record with "id":"412".

Master Guru
@umang s

Are you using a UUID as the Row Identifier?

Could you please share your PutHBaseCell processor configuration?