Created 08-01-2023 03:56 AM
hi i have such a problem, i have a flow that downloads a csv file with data and there are columns like:,system name, surname, phone, mail, card name, proffesion. The file looks like this only data is more of course
system | name | surname | phone | proffesion | department | |
ABA | John | Wane | 77777777 | j.wane@gmail.com | accountant | accountancy |
BCA | Lilly | Black | 89898989 | l.black@gmail.com | recruiter | hr |
I am reading this file record by record and for each record
I need to query another system via REST using first name, last name, profession and department, the query looks something like this:
https://system address/data/?name=${name}&surname=${surname}&proffesion=${proffesion}&department=${department}
I use Invoke HTTP for this
and now this system should return me json with data:
{
"date": [
{
"name": "John",
"surname": "Wane",
"profession": "ACC",
"department":"ACC01"
}
}
As you can see in the answer I get a different profession and department. How can I now create a json for each record that will contain data from the csv file, i.e. system, name, surname, phone mail, but proffesion and department will take the response from the second system that came with jason? So that I have the following result:
{
"system": "ABA",(data from csv)
"phone": 7777777,(data from csv)
"mail": "j.wane@gmail.com" (data from csv)
"name": "John", (data from csv)
"surname": "Wane", (data from csv)
"profession": "ACC", (data from json response from other system)
"department":"ACC01" (data from json from other system)
}
In a word, how can I combine the data from the input csv file with the data coming from the response from another system?
Created 08-01-2023 06:47 AM
@MWM,
What I would try is the following:
- Once you extracted your CSV File, I would split that CSV File into smaller chunks (1Record per FlowFile) using SplitRecord. I would write the output in JSON Format to further help you in your use case.
- Having 1 record per flow file and in JSON Format, I would go ahead and use EvaluateJsonPath, to extract the values from within the FlowFile and save them as attributes in my FlowFile.
- Next, I would call your API using InvokeHTTP, as you are already doing.
- The output of the InvokeHTTP will be a JSON, containing the response. From here, as you have the file as JSON, you can use use another EvaluateJsonPath and extract the newly added content as attributes to your FlowFile, while keeping the old attributes as well.
- With the new data extracted as attributes, I would add an AttributesToJson Processor and build my new data using all the extracted attributes.
Created 08-01-2023 06:54 AM
Hi @MWM ,
What you are describing is a classical data enrichment pattern that can be achieved using ForkEnrichment & JoinEnrichment processors. For more information on this please refer to :
Based on your scenario the "SQL" strategy of the JoinEnrichment will work the best for you since you can select the main fields (system, name, surname, phone, mail) from the original flowfile data and select proffesion and department from enrichment result:
SELECT o.system, o.name, e.surname,o.phone, o.mail, e.profession, e.department
FROM original o
LEFT OUTER JOIN enrichment e ON o.name= e.name
Since you are splitting the CSV and enrich per record then you can just join by name. If you have an API where you can get a collection of user information then you dont have to split and you can do the enrichment on multiple records from the CSV vs. returned records from the API json output , however be aware that if you have large data set this strategy "... can lead to heap exhaustion and cause stability problems or OutOfMemoryErrors to occur". Please review the link above to see how this can be mitigated.
If you find this is helpful please accept solution.
Thanks