Created on 10-16-2022 01:40 AM - edited 10-16-2022 01:59 AM
Hi Team,
Using QueryRecord to filter the incoming array of objects where each object has an array field.
The field can be an empty array or it contains some objects in it but it is not null. Tried a lot of SQL queries but didn't succeed. Also, I am trying to consider Record based approach as I want to work on multiple flow files instead of single ones
In Actual flow wanted to use ForkRecord on this array but it outputs an empty array if any object contains an empty array in the path so had to use QueryRecord before ForkRecord to filter those empty arrays. Not sure if it's the bug/expected behavior in ForkRecord
Flow:
GenerateFlowFile:
Custom Text:
[
{
"fields": [
{
"field": "field1"
},
{
"field": "field1"
}
]
},
{
"fields": []
},
{
"fields": [
{
"field": "field2"
},
{
"field": "field2"
}
]
},
{
"fields": [
{
"field": "field3"
},
{
"field": "field3"
}
]
}
]
Expected output: To filter out the 2nd object whose field is an empty array
QueryRecord:
Record Reader and Writer services are set to infer the schema
Solutions Tried:
1.
2.
3.
No Error but didn't filter out
4.
5.
No Error but didn't filter out
Using RPATH I was not able to pick specifically the first element to check if it exists or not
Got stuck because of this particular processor. Please help me out with the problem and let me know if any better way to do this with other processors. I'm avoiding working with single-flow files as am aware of the EvaluateJsonPath -> RouteOnAttribute solution and want to work specifically on Record based.
Created 10-16-2022 08:24 AM
Hi,
I think you were close with the last option but did not use the correct syntax with null values where instead of '<> null', use 'is not null' as follows:
select * from FLOWFILE where RPATH(fields, '/field') is not null
However when I run that I got the correct result but for some reason a MapRecord Syntax is added to each array element :
[ {
"fields" : [ "MapRecord[{field=field1}]", "MapRecord[{field=field1}]" ]
}, {
"fields" : [ "MapRecord[{field=field2}]", "MapRecord[{field=field2}]" ]
}, {
"fields" : [ "MapRecord[{field=field3}]", "MapRecord[{field=field3}]" ]
} ]
Not sure why this happens but that should get you close to what you are trying to accomplish.
Hope that helps. If it does, please accept solution.
Thanks
Created 10-16-2022 07:38 AM
I have resolved the issue using ScriptedFilterRecord. I thought it is similar to the ExecuteScript processor and might include a long script with boiler code but in the former processor a single-line code did the job
Would still like to know if QueryRecord can solve this problem or not..
Created 10-16-2022 08:24 AM
Hi,
I think you were close with the last option but did not use the correct syntax with null values where instead of '<> null', use 'is not null' as follows:
select * from FLOWFILE where RPATH(fields, '/field') is not null
However when I run that I got the correct result but for some reason a MapRecord Syntax is added to each array element :
[ {
"fields" : [ "MapRecord[{field=field1}]", "MapRecord[{field=field1}]" ]
}, {
"fields" : [ "MapRecord[{field=field2}]", "MapRecord[{field=field2}]" ]
}, {
"fields" : [ "MapRecord[{field=field3}]", "MapRecord[{field=field3}]" ]
} ]
Not sure why this happens but that should get you close to what you are trying to accomplish.
Hope that helps. If it does, please accept solution.
Thanks
Created 10-27-2022 06:20 AM
Thanks for the reply, it worked 🙂
Also, I got the expected output and didn't see the `MapRecord` as displayed in your output
Thanks!