Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to put Json data as a Json format in HBase

Please tell me how to store multi-line JSON data in HBase from NiFi.

1 ACCEPTED SOLUTION

Master Guru

@umang s

You can use the PutHBaseCell processor for this use case. Set the Row Identifier to a UUID, and the whole JSON message is stored as the cell value under that row key.

Example:-

My input JSON document is:

{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

PutHBaseCell configs:-

65430-hbasecell.png

As you can see in the screenshot above, I set the Row Identifier to ${UUID()}. This UUID is unique for each flowfile in NiFi, so we do not overwrite any existing data in the HBase table.

Output:-

hbase(main):008:0> scan 'test'
ROW                                             COLUMN+CELL
 c7ca74ad-4933-4340-a9c7-e55370a4501b           column=category:category:details, timestamp=1521711352302, value={"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewin
                                                g/Fabric/Apparel Fabric"}
1 row(s) in 0.1130 seconds

65431-hbasecell.png
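
To illustrate why a unique Row Identifier matters, here is a minimal Python sketch. It is not NiFi or HBase code: a plain dict stands in for the table, and put_cell, fixed_key_table and uuid_key_table are hypothetical names used only for this illustration. It assumes that a later write to the same row and column replaces what a scan shows.

```python
import uuid

def put_cell(table, row_key, column, value):
    """Toy stand-in for a single-cell put: same row + column means the value is replaced."""
    table.setdefault(row_key, {})[column] = value

docs = [
    '{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}',
    '{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}',
]

fixed_key_table = {}  # every document written under the same row key
uuid_key_table = {}   # every document written under a fresh UUID row key

for doc in docs:
    put_cell(fixed_key_table, "row1", "category:details", doc)
    put_cell(uuid_key_table, str(uuid.uuid4()), "category:details", doc)

print(len(fixed_key_table))  # 1
print(len(uuid_key_table))   # 2
```

With the fixed key only the last document stays visible, while the UUID keys keep one row per document.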

Case2:-

If your input JSON document is:

{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

then in HBase the document looks like this:

65432-hbasecell-case2.png
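
PutHBaseCell writes the flowfile content as a single cell value, so both objects end up together. Note also that the Case 2 input, as written above, is two objects separated by a comma and is not a single valid JSON document. The following is a small Python sketch (plain standard library, not NiFi code; is_valid_json, case2 and wrapped are hypothetical names) showing that wrapping the content in [ ] turns it into a JSON array of two records:

```python
import json

# Case 2 content exactly as posted: two objects separated by a comma.
case2 = (
    '{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},\n'
    '{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}'
)

def is_valid_json(text):
    """Return True if text parses as one JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

wrapped = "[" + case2 + "]"  # enclose the objects in a JSON array

print(is_valid_json(case2))      # False
print(is_valid_json(wrapped))    # True
print(len(json.loads(wrapped)))  # 2
```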


13 REPLIES

@umang s

Can you please share some more details about your use case?

Here is the sample JSON data, @Shu:

{"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
{"id" : "412","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

Master Guru
@umang s

Could you please describe how you expect the above records to appear in HBase?
For example, should both JSON documents share the same row key?


{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"604","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

I'm expecting this type of output for the above JSON data:

65444-65433-temp.jpg

My input is like Case 2, and I want the output to be:

65433-temp.jpg

I have used the PutHBaseCell processor, but it stores both ids in one row. I want to store them in different rows.

Master Guru

@umang s

I think your input JSON messages are enclosed in an array [], like this:

[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]

In this case, use a SplitJson processor before the PutHBaseCell processor, with the configs below:

65436-splitjson.png

Connect the split relationship of the SplitJson processor to the PutHBaseCell processor. In this case the SplitJson processor splits the array of JSON messages into individual messages, one per flowfile.

Input:-

[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]

Output:-

Flowfile1:-
{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}

Flowfile2:-
{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
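
The split step above can be sketched outside NiFi in a few lines of Python. This only mimics the effect described here (one JSON array in, one message per element out) and is not how the SplitJson processor is implemented. split_json_array, flowfiles and rows are hypothetical names, and the dict of UUID keys is just a stand-in for writing each split to its own HBase row.

```python
import json
import uuid

def split_json_array(content):
    """Return each element of a top-level JSON array as its own compact JSON string."""
    records = json.loads(content)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    return [json.dumps(record, separators=(",", ":")) for record in records]

content = (
    '[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},'
    '{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]'
)

flowfiles = split_json_array(content)

# One fresh UUID row key per split message, so neither record replaces the other.
rows = {str(uuid.uuid4()): message for message in flowfiles}

for message in flowfiles:
    print(message)
print(len(rows))  # 2
```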

I have tried this solution, but it inserts only the last record, i.e. the record with "id":"412".

Master Guru
@umang s

Are you using a UUID as the Row Identifier?

Could you please share your PutHBaseCell processor configs?