Question about the NiFi Lookup Service

Dear all, 

I want to ask about the difference between `Insert Entire Record` and `Insert Record Fields` for the Record Result Contents property.

When I use `Insert Record Fields`, I get an error:

jarviszzzz_0-1709716821644.png

but the returned ID is what I want:

jarviszzzz_2-1709716929517.png

 

 

When I use `Insert Entire Record`, I do not get any errors, but instead I get this output:

jarviszzzz_1-1709716893106.png

 

How can I get the correct output that I want without any errors?

 

1 ACCEPTED SOLUTION

Hi,

Sorry for the delay. I think the confusion arises from how much this processor is trying to do. To understand it better, let's go through the different scenarios. Assume my initial input is the following:

 

<?xml version="1.0" encoding="UTF-8"?>
<lookup>
   <table>
      <tbody>
         <Id>1</Id>
      </tbody>
      <enrich>
      </enrich>
   </table>
</lookup>

 

The DatabaseRecordLookupService is configured as follows:

SAMSAL_4-1709895224092.png

Here are the scenarios:

 

1- Insert Record Fields. Path doesn't exist in the input:

SAMSAL_6-1709895763533.png

Result:

 

Failed to process FlowFile ...: java.lang.NullPointerException

 

 

2- Insert Record Fields. Path exists, field exists:

...
<enrich>
   <EnrichCol/>
</enrich>
...

 

SAMSAL_13-1709899152741.png

Result: Success. The value is placed in the EnrichCol field:

<enrich>
   <EnrichCol>SomeValue</EnrichCol>
</enrich>

 

So when using Insert Record Fields, the field has to exist in the input XML under the specified Result RecordPath to be populated correctly; otherwise you will get an error (scenario 1) or an unexpected MapRecord value (scenario 3).

Note: If you include the field name in the path [/table/enrich/EnrichCol], nothing happens and the field stays empty. Not sure if this is a bug.
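
The requirement above (the field must already exist under the Result RecordPath) could be checked before the data reaches the lookup. The following is only a minimal Python sketch that runs outside NiFi and is not part of the processor. The function name `ensure_placeholder` is hypothetical, and it assumes the record root is `<lookup>`, so that the RecordPath `/table/enrich` corresponds to the ElementTree path `table/enrich`.

```python
import xml.etree.ElementTree as ET


def ensure_placeholder(xml_text: str, parent_path: str, field: str) -> str:
    """Return the XML with an empty <field/> under parent_path if it was missing.

    parent_path is an ElementTree path relative to the root element
    (assumption: "table/enrich" stands in for the RecordPath /table/enrich).
    """
    root = ET.fromstring(xml_text)
    parent = root.find(parent_path)
    if parent is None:
        # Mirrors scenario 1: the path itself is missing from the input.
        raise ValueError(f"path {parent_path!r} does not exist in the input")
    if parent.find(field) is None:
        # Add the empty placeholder so the lookup has a field to populate.
        ET.SubElement(parent, field)
    return ET.tostring(root, encoding="unicode")


sample = "<lookup><table><tbody><Id>1</Id></tbody><enrich></enrich></table></lookup>"
patched = ensure_placeholder(sample, "table/enrich", "EnrichCol")
```

After the call, `patched` contains an empty `EnrichCol` element under `enrich`, which is the input shape used in scenario 2.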

3- Insert Record Fields. Path exists, field doesn't exist:

Input:

 

 

<enrich>
</enrich>

 

 

Config:

SAMSAL_7-1709897433703.png

Result: A MapRecord is placed at the specified path. The reason is that the field doesn't exist and the result might contain multiple returned columns. If the input lists the fields explicitly, each returned value is inserted into its matching field.

 

 <enrich>MapRecord[{EnrichCol=SomeValue}]</enrich>

 

4- Insert Entire Record. Path doesn't exist in the input:

Input: see the original input

Config:

SAMSAL_10-1709898083586.png

Result: The record is inserted regardless (remember that with Insert Record Fields this gives an error):

 

<?xml version="1.0" encoding="UTF-8"?>
<lookup>
   <table>
      <tbody>
         <Id>1</Id>
      </tbody>
      <enrich/>
      <newPath>
         <EnrichCol>SomeValue</EnrichCol>
      </newPath>
   </table>
</lookup>

 

5- Insert Entire Record. Path exists, field exists:

Input:

 

<enrich>
   <EnrichCol/>
</enrich>

 

Config:

SAMSAL_11-1709898270087.png

Result (similar to scenario #2):

<enrich>
   <EnrichCol>SomeValue</EnrichCol>
</enrich>

 

6- Insert Entire Record. Path exists, field doesn't exist:

 

<enrich>
</enrich>

 

SAMSAL_12-1709898485104.png

Result (similar to scenario #3):

 

<enrich>MapRecord[{EnrichCol=SomeValue}]</enrich>

 

 

Conclusion: There is definitely an overlap between Insert Record Fields and Insert Entire Record. The only difference is that with Insert Entire Record, a path that doesn't exist is created regardless, while Insert Record Fields throws a NullPointerException. I can see that it's a bit confusing and misleading, with a possible bug when the full path to the field is specified and exists. I think this processor needs to be broken up into multiple processors (at least two) to simplify it and avoid this kind of ambiguity and overlap. The Single Responsibility Principle really needs to be applied here.

If that helps, please accept the solution.
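
The six scenarios can be condensed into a small decision function. This is only a Python summary of the behavior observed in this thread, not NiFi code; the names `Scenario` and `observed_outcome` are made up for illustration, and the outcome strings are shorthand for the results shown above.

```python
from typing import NamedTuple

STRATEGIES = ("Insert Record Fields", "Insert Entire Record")


class Scenario(NamedTuple):
    strategy: str       # one of STRATEGIES
    path_exists: bool   # does the Result RecordPath exist in the input?
    field_exists: bool  # does the returned column already exist under it?


def observed_outcome(s: Scenario) -> str:
    """Return the outcome reported in this thread for the given scenario."""
    if s.strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {s.strategy!r}")
    if not s.path_exists:
        # Scenarios 1 and 4: the only case where the two strategies differ.
        if s.strategy == "Insert Record Fields":
            return "NullPointerException"
        return "path created, record inserted"
    if s.field_exists:
        # Scenarios 2 and 5.
        return "field populated"
    # Scenarios 3 and 6.
    return "MapRecord string placed at path"


for strategy in STRATEGIES:
    for path_exists, field_exists in ((False, False), (True, True), (True, False)):
        s = Scenario(strategy, path_exists, field_exists)
        print(s, "->", observed_outcome(s))
```

Written this way, it is easy to see that the strategy only matters when the path is missing.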

Thanks

 

 

 


8 REPLIES


Hi,

Can you provide screenshots of the LookupRecord processor and the lookup service configurations?

Dear @SAMSAL, sure, please refer to the screenshots:

jarviszzzz_0-1709780515676.png

jarviszzzz_1-1709780526906.png

 

 

Dear @SAMSAL,

Do you need additional details? Please do let me know. Thank you so much in advance.


Dear @SAMSAL,

Thanks for your detailed explanation.

But I do have an additional question about this, as there is something I do not understand.

Still using 'Insert Record Fields':

Suppose I have 10,000 records in a flowfile.

CASE 1:

I split the flowfile records into multiple flowfiles, each containing only one record. Therefore, there are a total of 10,000 flowfiles entering the Lookup service.

Output: No errors were encountered.

 

CASE 2:

If my flowfile contains more than one record, specifically 10,000 records, I encounter the following errors.

jarviszzzz_0-1710140868127.png

So I attempted to address this error by splitting my flowfile into one record per flowfile. However, this significantly increased my processing time.

Please also note that all records are identical in both cases. May I know why this is happening? This really confuses me.

Hi,

Can you provide me with sample input so I can test against it and verify that it's not a Result RecordPath issue?

Dear @SAMSAL,

Since the data is strictly confidential, I will share it with you in a private message. Thanks.

@SAMSAL Hi Samsal, problem solved by using a regex to extract the value from the MapRecord[{xxxxx}] output. Thanks for your detailed explanation of the lookup service.
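
The exact regex was not posted, so the following is only a hedged Python sketch of what such an extraction could look like. It assumes the string has the shape shown in scenarios 3 and 6, `MapRecord[{EnrichCol=SomeValue}]`, that multiple fields (if any) are separated by ", ", and that values contain neither "=" nor ", ". The function name `parse_map_record` is hypothetical.

```python
import re
from typing import Dict

# Assumed shape, taken from the outputs shown in this thread:
#   MapRecord[{EnrichCol=SomeValue}]
MAP_RECORD = re.compile(r"MapRecord\[\{(.*)\}\]")


def parse_map_record(text: str) -> Dict[str, str]:
    """Extract key/value pairs from a MapRecord[{...}] string, if present."""
    match = MAP_RECORD.search(text)
    if match is None:
        return {}
    fields: Dict[str, str] = {}
    # Assumption: pairs are rendered as key=value and joined with ", ".
    for pair in match.group(1).split(", "):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key.strip()] = value
    return fields


fields = parse_map_record("<enrich>MapRecord[{EnrichCol=SomeValue}]</enrich>")
print(fields.get("EnrichCol"))
```

This treats the symptom rather than the cause: providing the placeholder field in the input, as in scenario 2, avoids the MapRecord string in the first place.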