Created 09-27-2017 06:35 AM
I am trying to incrementally import data to S3 using NiFi.
Workflow:
1. GenerateTableFetch
2. ExecuteSQL
3. SplitAvro
4. ConvertAvroToJSON
5. PutS3Object
Now I need to know: when a row is updated, its updated_at column changes, and I am using that column as the incremental check column. How would the updated record get merged with the older record already in S3? Is that something that can be handled at the NiFi level, or do I need to do it on my own?
How do people usually handle this with NiFi? Coming from a Sqoop background, there was always a --merge-key argument that merged rows sharing the same value in that column. A rough sketch of the fetch side of my flow is below.
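For context, the incremental part of the flow is configured roughly like this (a minimal sketch; the connection service, table name, and partition size are placeholders, not my exact setup):

    GenerateTableFetch
      Database Connection Pooling Service : MyDBCPService    (placeholder)
      Table Name                          : orders           (placeholder)
      Maximum-value Columns               : updated_at       (the incremental check column)
      Partition Size                      : 10000            (placeholder)

Each run only generates queries for rows whose updated_at is greater than the last value seen, which is why an updated row comes through again as a new record.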
Created 09-27-2017 04:28 PM
S3 is an object store, so based on the key you set up, when you push a new value to S3 it automatically overwrites the data with the new value. After you get the JSON, maybe do an EvaluateJsonPath to extract the key into an attribute, then use that attribute as your object key.
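For example, assuming the JSON records carry an id field you want to key the objects on (the field, attribute, bucket, and prefix names below are placeholders), the relevant properties would look roughly like this:

    EvaluateJsonPath
      Destination : flowfile-attribute
      entity_id   : $.id                 (user-defined property; pulls the record id into an attribute)

    PutS3Object
      Bucket      : my-bucket
      Object Key  : incremental/${entity_id}.json

Every time a record with the same id comes through, the same object key is written again, so S3 keeps only the latest version of that row.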
Created 10-03-2017 11:03 AM
@Karthik Narayanan Alright, in that case I get the key. What about the rest of the flowfile content that I intend to store as-is? Could you share an example?
Created 10-03-2017 02:05 PM
When you call PutS3Object, the flowfile content gets saved as the object content for the extracted key. You do not need to do anything beyond extracting the key as a flowfile attribute.
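So with a key like incremental/${entity_id}.json (bucket and prefix names are placeholders), the resulting layout is one object per row, and an update simply rewrites the same object:

    s3://my-bucket/incremental/41.json    <- latest version of row 41
    s3://my-bucket/incremental/42.json    <- overwritten each time row 42's updated_at changes

That is effectively the per-row merge you were getting from Sqoop's --merge-key, just at whole-object granularity.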
Created 10-03-2017 12:15 PM
@Karthik Narayanan: In PutS3Object, I used ${entity_id} as the key, which was added as an attribute in the EvaluateJsonPath processor, but I keep getting this error:
Key is not expected for the GET method ?uploads subresource (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: A77D158DEC004A8A
Created 10-03-2017 02:04 PM
@Simran Kaur take a look at this thread; it may help you resolve the issue: https://community.hortonworks.com/questions/72628/error-in-puts3object.html. But it looks like the data still gets saved to S3 even with this error.
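In case it helps: a common cause of this particular 400 is a path segment in the Bucket property (e.g. bucket/prefix). PutS3Object makes a bucket-level call to list multipart uploads, and S3 rejects that call when a key is attached to it, while the object write itself can still succeed, which would match the data showing up in S3 despite the error. A hedged before/after sketch (bucket and prefix names are placeholders, not your actual configuration):

    PutS3Object (problematic)
      Bucket      : my-bucket/incremental             <- path in the bucket name can trigger the ?uploads error
      Object Key  : ${entity_id}.json

    PutS3Object (suggested)
      Bucket      : my-bucket                         <- bucket name only
      Object Key  : incremental/${entity_id}.json     <- prefix moved into the key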