How to capture both the key and the value of JSON data with a dynamic key in NiFi?

I have JSON data received through JettyWebsocketServer, like this:

{
	"project_1": {
		"host": "V001",
		"data": [{
			"sensor_code": "temp",
			"value": {
				"max": 26.2,
				"avg": 26.1,
				"min": 26.1
			}
		},
		{
			"sensor_code": "humid",
			"value": {
				"max": 48.8,
				"avg": 48.7,
				"min": 48.6
			}
		}]
	}
}

My data comes from multiple projects, so the top-level key is dynamic: project_1, project_2, and so on. I have to route each payload to its corresponding destination.

I found EvaluateJsonPath for parsing JSON data, but it needs a static key.

Is there any way to capture both the key and its value so that I can route the data by key?

@Kiem Nguyen

You can use the RouteOnContent processor. Change the Match Requirement property to "content must contain match" and add the properties below:

1. project_1 as .*project_1.*
2. project_2 as .*project_2.*

39835-routeoncontent.png

This processor checks the content of each flowfile: if the content contains project_1, the flowfile is transferred to the project_1 relationship; if it contains project_2, it is transferred to the project_2 relationship.

Flow Screenshot:-

39836-flow.png

As shown in the screenshot above, you can use the project_1 and project_2 relationships to store or process the content as your requirements dictate.
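To make the matching rule concrete, here is a minimal Python sketch of what the two properties do. It is not NiFi code: `RULES` and `route` are hypothetical names, and the sketch assumes a flowfile goes to every relationship whose regular expression is found in the content, and to `unmatched` otherwise.

```python
import re

# Hypothetical stand-in for the two dynamic properties on RouteOnContent.
RULES = {
    "project_1": r".*project_1.*",
    "project_2": r".*project_2.*",
}


def route(content):
    """Return the relationships whose regex is found anywhere in the content."""
    matched = [name for name, pattern in RULES.items() if re.search(pattern, content)]
    return matched or ["unmatched"]


print(route('{"project_1": {"host": "V001"}}'))  # ['project_1']
print(route('{"project_3": {"host": "V003"}}'))  # ['unmatched']
```

Note that a key with no matching rule falls through to `unmatched`, so this approach only covers project names that are known when the flow is built.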

Thanks @Shu

Let me clarify my problem. I don't know the keys in advance, so I need a flow that handles dynamic keys without stopping the job when data with a new key arrives.

If we knew the full list of keys, I think we could use EvaluateJsonPath to pull the key into an attribute and use RouteOnAttribute as an alternative.

Again, my problem is that I don't know when a new key will arrive, so I can't stop the job and add another key to RouteOnContent as you suggest.

Hi @Kiem Nguyen

What will the rest of the flow be after routing? Can you give more details?

Hi @Abdelkrim Hadjidj

After routing, the rest of the flow carries a JSON string like:

{
	"host": "V001",
	"data": [{
		"sensor_code": "temp",
		"value": {
			"max": 26.2,
			"avg": 26.1,
			"min": 26.1
		}
	},
	{
		"sensor_code": "humid",
		"value": {
			"max": 48.8,
			"avg": 48.7,
			"min": 48.6
		}
	}]
}

It is the value of the key "project_1", "project_2", and so on.

In my case, I will use the key as the database name in MongoDB. For example:

The value of the data from project_1 will be inserted into database project_1.

The value of the data from project_2 will be inserted into database project_2.

The PutMongo processor in NiFi 1.4.0 supports Expression Language, so if I can capture the key of the data, I can insert into the corresponding database using ${key}.
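As a rough illustration of the idea (not of NiFi itself), the sketch below uses Python's `string.Template` to show how a `${key}` placeholder could resolve against flowfile attributes. The `attributes` dict and the `database_property` name are hypothetical, and NiFi's Expression Language is a separate, richer syntax than what `Template` supports.

```python
from string import Template

# Hypothetical flowfile attributes; "key" would be set by an upstream processor.
attributes = {"key": "project_1"}

# Stand-in for a processor property that holds the database name.
database_property = Template("${key}")

print(database_property.substitute(attributes))  # project_1
```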

Please help me if you have any suggestions.

Thanks,

Kiem

Hi @Kiem Nguyen

In this case you can use ExtractText to scan your flowfile content and extract the name of the project. With the right regular expression you can extract your key and store it in an attribute named projectname.

Once you have this, use it in PutMongo to update the required collection, using ${projectname} for the Mongo Collection Name.

Can you try this and let me know if it works?

Thanks

@Kiem Nguyen

I am not a regex expert, but you can test this and work on it to get a better implementation.

Use ExtractText with the following regex: (\w+)

40888-screen-shot-2017-10-18-at-121754-pm.png

You can see that there's an attribute projectname which contains project_1

40889-screen-shot-2017-10-18-at-121814-pm.png
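For readers without the screenshots, here is a small Python sketch of what that regex captures on the sample payload. It uses Python's `re` module rather than ExtractText, and the shortened `content` string is only an illustration.

```python
import re

# Shortened version of the payload from the question.
content = '{\n\t"project_1": {\n\t\t"host": "V001",\n\t\t"data": []\n\t}\n}'

# (\w+) captures the first run of letters, digits, or underscores,
# which for this payload is the top-level key.
match = re.search(r"(\w+)", content)
projectname = match.group(1) if match else None

print(projectname)  # project_1
```

This relies on the key being the first word-like token in the content. A key containing a hyphen, for example, would be cut off at the hyphen, so the pattern may need tightening for other naming schemes.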

Thanks @Abdelkrim Hadjidj,

Using a regex with a dynamic key seems difficult for me, but I will try this.

However, as you can see, the data to insert into MongoDB is the value of the key, not the whole incoming payload (it does not include "project_1").

So we need to capture this data too, and it will be very difficult to obtain the value of the key using regex as well.

I need to capture both the key and the value separately. 😞

@Kiem Nguyen Since you have the key with this regex, you can use it to get the value with EvaluateJsonPath.

@Abdelkrim Hadjidj,

Following your suggestion, I now have project_1, project_2, and so on in an attribute named projectname on the flowfile going into EvaluateJsonPath.

But when I try to get the value of that key, I add a property content with the value: $.${projectname}

The content attribute on the output flowfile of EvaluateJsonPath is an empty string. I think EvaluateJsonPath treats the property as a plain JsonPath expression and does not resolve the projectname attribute of the incoming flowfile.

@Abdelkrim Hadjidj

Today I have a solution that works for my case.
Because the key is a unique string within our JSON data, here are my steps to capture both the key and the value:
Step 1: Use ExtractText to obtain the key (project_1, project_2, ...).
Step 2: Use ReplaceText to replace the key in the JSON data with a static key (the project_1, project_2, ... string is replaced by the string content).

Step 3: Use EvaluateJsonPath to obtain the data under content and write it to the flowfile content for the next processor.

Step 4: Use PutMongo to insert that data into the corresponding database using the Expression Language ${projectname}.
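The first three steps can be sketched in plain Python as follows. This is only an illustration of the idea, not NiFi configuration: the shortened `raw` payload is made up, and reading the value with a path like `$.content` in Step 3 is an assumption based on the static key name.

```python
import json
import re

# Shortened version of the incoming payload.
raw = '{"project_1": {"host": "V001", "data": []}}'

# Step 1: capture the key (stand-in for ExtractText).
projectname = re.search(r"(\w+)", raw).group(1)

# Step 2: replace the dynamic key with the static key "content"
# (stand-in for ReplaceText). Only the first occurrence is replaced.
renamed = raw.replace('"%s"' % projectname, '"content"', 1)

# Step 3: read the value under the now-static key
# (stand-in for EvaluateJsonPath, assuming a path like $.content).
value = json.loads(renamed)["content"]

print(projectname)  # project_1
print(value)        # {'host': 'V001', 'data': []}
```

The replacement is purely textual, so it depends on the quoted key being the first occurrence of that string in the payload.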

Alternatively, we can use ExecuteScript to convert the key and value of the JSON data to attributes and then insert into MongoDB. In my runs, though, throughput with the first approach was higher than with ExecuteScript. Here is my code for ExecuteScript:

# Jython script for the ExecuteScript processor (Script Engine: python).
# session, REL_SUCCESS and REL_FAILURE are provided by ExecuteScript.
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback


class ReadText(InputStreamCallback):
    """Reads the whole flowfile content into a string."""
    def __init__(self):
        self.text = None

    def process(self, inputStream):
        self.text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)


flowFile = session.get()
if flowFile is not None:
    reader = ReadText()
    session.read(flowFile, reader)
    try:
        data = json.loads(reader.text)
        # The payload is expected to hold a single top-level key (the project name).
        first = next(iter(data))
        contentString = json.dumps(data[first])
        # Store the key and its value as attributes on the same flowfile.
        flowFile = session.putAllAttributes(
            flowFile, {'project': str(first), 'content': contentString})
        session.transfer(flowFile, REL_SUCCESS)
    except (ValueError, TypeError, StopIteration):
        # Content is not valid JSON or has no top-level key.
        session.transfer(flowFile, REL_FAILURE)

@Kiem Nguyen

You don't need to do all this. I think it's a bad idea to change your JSON data just to be able to parse it. You may get surprises when your application evolves.

So, as a summary:

  • Use my solution above to extract your key
  • Use the solution below to extract the content
  • With the key and value in hand, use PutMongo to store your data in the right collection

To extract your JSON content, use EvaluateJsonPath with the following configuration: add a dynamic property my_content and use the JSON path $.*

41436-screen-shot-2017-10-20-at-72515-am.png

You get all the data that you are looking for:

41437-screen-shot-2017-10-20-at-72530-am.png
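As a rough Python analogy for what $.* selects (every value directly under the root, whatever the key is called), the sketch below uses only the standard `json` module. The shortened `raw` payload is made up for illustration, and the sketch says nothing about how NiFi itself formats the result.

```python
import json

# Shortened version of the incoming payload.
raw = '{"project_1": {"host": "V001", "data": []}}'
data = json.loads(raw)

# Rough equivalent of the JsonPath $.* : all values directly under the root,
# without needing to know the key names.
values = list(data.values())

print(values[0])  # {'host': 'V001', 'data': []}
```

Because each payload has a single top-level key, the wildcard yields exactly one value, which is the project's data.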

Can you try this and let me know if it works?

Thanks

Thanks @Abdelkrim Hadjidj,

It works well. It is interesting that just $.* is enough to capture the content.

I understand more about JsonPath expressions now.

Thank you again, :d
