Created on 03-06-2024 01:22 AM - edited 03-06-2024 01:32 AM
Dear all,
I want to ask about the difference between `insert entire record` and `insert record fields` for the Record Result Contents property.
When I use `insert record fields`, I get an error:
but the returned ID is what I want:
When I use `insert entire record`, I do not get any errors, but instead I get this output:
How can I get the output that I want without any errors?
Created 03-08-2024 04:17 AM
Hi,
Sorry for the delay. I think the confusion arises from the fact that this processor is trying to do a lot. To understand it better, let's go through the different scenarios. Let's assume my initial input is the following:
<?xml version="1.0" encoding="UTF-8"?>
<lookup>
<table>
<tbody>
<Id>1</Id>
</tbody>
<enrich>
</enrich>
</table>
</lookup>
The DatabaseRecordLookupService is configured as follows:
Here are the scenarios:
1- Insert Record Fields. Path doesn't exist in the input:
Result:
Failed to process FlowFile ...: java.lang.NullPointerException
2- Insert Record Fields. Path exists, field exists:
...
<enrich>
<EnrichCol/>
</enrich>
...
Result: Success, and the value is placed in the EnrichCol field:
<enrich>
<EnrichCol>SomeValue</EnrichCol>
</enrich>
So when using Insert Record Fields, the field has to exist in the input XML under the specified Result RecordPath to be populated correctly. If the path itself is missing you get the error from scenario 1, and if only the field is missing you get the MapRecord output from scenario 3.
Note: If you specify the field name in the path [/table/enrich/EnrichCol], it does nothing and the field stays empty. I am not sure whether this is a bug.
3- Insert Record Fields. Path exists, field doesn't exist:
input
<enrich>
</enrich>
config
Result: A MapRecord is placed at the specified path. The reason is that the field doesn't exist and the lookup result might contain multiple returned columns. If the fields are listed in the input, the values are inserted into them accordingly.
<enrich>MapRecord[{EnrichCol=SomeValue}]</enrich>
4- Insert Entire Record. Path doesn't exist in the input:
input: see original input
config:
Result: The record is inserted regardless (remember that with Insert Record Fields this gives an error):
<?xml version="1.0" encoding="UTF-8"?>
<lookup>
<table>
<tbody>
<Id>1</Id>
</tbody>
<enrich/>
<newPath>
<EnrichCol>SomeValue</EnrichCol>
</newPath>
</table>
</lookup>
5- Insert Entire Record. Path exists, field exists:
Input:
<enrich>
<EnrichCol/>
</enrich>
config:
Result (similar to scenario 2):
<enrich>
<EnrichCol>SomeValue</EnrichCol>
</enrich>
6- Insert Entire Record. Path exists, field doesn't exist:
<enrich>
</enrich>
Result (similar to scenario 3):
<enrich>MapRecord[{EnrichCol=SomeValue}]</enrich>
Conclusion: There is definitely an overlap between Insert Record Fields and Insert Entire Record. The only difference is that with Insert Entire Record the path is created if it doesn't exist, while Insert Record Fields throws a NullPointerException. I can see that it is a bit confusing and misleading, with a possible bug when the full path to the field is specified and exists. I think this processor needs to be broken up into multiple processors (at least 2) to simplify it and avoid this kind of ambiguity and overlap. The Single Responsibility Principle really needs to be applied here.
If that helps, please accept the solution.
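To make the six scenarios easier to compare, here is a rough Python sketch that only restates the outcomes observed above. It is not NiFi code and says nothing about how LookupRecord is implemented; the function name `observed_result` and the result strings are made up for illustration.

```python
# Summary of the outcomes observed in scenarios 1-6 above.
# This is NOT NiFi's implementation, only a restatement of the test results.
# All names here are hypothetical and exist only for this illustration.

INSERT_RECORD_FIELDS = "Insert Record Fields"
INSERT_ENTIRE_RECORD = "Insert Entire Record"


def observed_result(strategy, path_exists, field_exists=False):
    """Return the outcome seen in the scenarios for a given combination."""
    if strategy not in (INSERT_RECORD_FIELDS, INSERT_ENTIRE_RECORD):
        raise ValueError(f"unknown strategy: {strategy}")
    if not path_exists:
        if strategy == INSERT_RECORD_FIELDS:
            # Scenario 1
            return "error: java.lang.NullPointerException"
        # Scenario 4
        return "path created and record inserted under it"
    if field_exists:
        # Scenarios 2 and 5
        return "value written into the existing field"
    # Scenarios 3 and 6
    return "MapRecord[{...}] text placed at the path"


if __name__ == "__main__":
    for strategy in (INSERT_RECORD_FIELDS, INSERT_ENTIRE_RECORD):
        for path_exists, field_exists in ((False, False), (True, True), (True, False)):
            outcome = observed_result(strategy, path_exists, field_exists)
            print(f"{strategy} | path={path_exists} field={field_exists} -> {outcome}")
```

The two strategies only differ in the first branch, where the path is missing from the input.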
Thanks
Created 03-06-2024 04:07 AM
Hi,
Can you provide screenshots of the LookupRecord processor and the lookup service configurations?
Created 03-06-2024 07:02 PM
Created 03-08-2024 01:38 AM
Dear @SAMSAL,
Do you need additional details? Please do let me know, thank you so much in advance
Created on 03-11-2024 12:18 AM - edited 03-11-2024 12:21 AM
Dear @SAMSAL ,
Thanks for your detailed explanation.
But I have an additional question about this, because there is something I do not understand.
I am still using `Insert Record Fields`.
Suppose I have 10,000 records in a flowfile.
CASE 1:
I split the flowfile into multiple flowfiles, each containing only one record. Therefore, a total of 10,000 flowfiles enter the lookup service.
Output: No errors were encountered.
CASE 2:
If my flowfile contains more than one record, specifically 10,000 records, I encounter the following errors.
So I attempted to work around this error by splitting my flowfile into one record per flowfile. However, this significantly increased my processing time.
Please also note that the records are identical in both cases. May I know why this is happening? It really confuses me.
Created 03-11-2024 03:27 AM
Hi,
Can you provide me with sample input that I can test against, to verify that it's not a Result RecordPath issue?
Created 03-11-2024 11:09 PM
Dear @SAMSAL ,
Since the data is strictly confidential, I will share it with you in a private message. Thanks
Created 04-08-2024 08:04 PM
@SAMSAL Hi Samsal, the problem was solved by using a regex to extract the value from MapRecord[{xxxxx}]. Thanks for your detailed explanation of the lookup service.
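For anyone landing here later, a minimal sketch of what such a regex extraction could look like, written in Python for illustration. The exact regex used in the flow was not posted, so the pattern and the helper name `parse_maprecord` are assumptions. It also assumes the text looks like the `MapRecord[{EnrichCol=SomeValue}]` output shown earlier in the thread, with multiple fields separated by a comma and a space, and with values that do not themselves contain `, ` or `}]`.

```python
import re

# Assumed text form, based on the output shown earlier in the thread:
#   MapRecord[{EnrichCol=SomeValue}]
# The pattern and helper below are illustrative, not the poster's actual regex.
MAPRECORD_RE = re.compile(r"MapRecord\[\{(.*?)\}\]")


def parse_maprecord(text):
    """Return {field: value} for the first MapRecord[{...}] in text, or None."""
    match = MAPRECORD_RE.search(text)
    if match is None:
        return None
    fields = {}
    # Assumption: fields are separated by ", " and each one is name=value.
    for pair in match.group(1).split(", "):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        fields[name] = value
    return fields


if __name__ == "__main__":
    sample = "<enrich>MapRecord[{EnrichCol=SomeValue}]</enrich>"
    print(parse_maprecord(sample))
```

Inside NiFi the same kind of pattern could be applied with a text-replacement step, but the processor and settings used for that were not shared in this thread.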