
Merge key during incremental import through NiFi

Expert Contributor

I am trying to incrementally import data to S3 using NiFi.


1. GenerateTableFetch





Now I need to know the following: when a row is updated, its updated_at column changes, and I am using that column as the incremental check column. How would the updated record be merged with the older record in S3? Is that something that can be handled at the NiFi level, or do I need to do it on my own?

How do people usually handle this with NiFi? Coming from a Sqoop background, I could always rely on the --merge-key option, which merges rows that have the same value in that column.


Expert Contributor

S3 is an object store, so based on the key you set when you push a new value to S3, it will automatically replace the stored data with the new value. After you get JSON, maybe use EvaluateJsonPath to extract the key into an attribute. Then use that attribute as your object key.

Expert Contributor

@Karthik Narayanan Alright. In that case, I get the key. What about the rest of the flowfile content, which I intend to store as it is? Could you share an example?

Expert Contributor

When you call PutS3Object, the flowfile content gets saved as the content for the extracted key. You do not need to do anything beyond extracting the key as a flowfile attribute.
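To make the overwrite behavior concrete, here is a minimal sketch in Python rather than NiFi itself. It assumes each flowfile holds a single JSON record with an entity_id field. The InMemoryObjectStore class, the helper names, and the sample records are hypothetical stand-ins for the bucket and the flow, not a real S3 or NiFi API.

```python
import json


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3 bucket: one stored body per key."""

    def __init__(self):
        self._objects = {}

    def put_object(self, key, body):
        # Writing to an existing key replaces the previous body.
        self._objects[key] = body

    def get_object(self, key):
        return self._objects[key]

    def keys(self):
        return sorted(self._objects)


def extract_key(flowfile_content, field="entity_id"):
    # Plays the role of the JSON path extraction: pull one field out of
    # the record so it can serve as the object key.
    record = json.loads(flowfile_content)
    return str(record[field])


def store_record(store, flowfile_content):
    # Plays the role of the put step: the whole content is stored
    # unchanged under the extracted key.
    key = extract_key(flowfile_content)
    store.put_object(key, flowfile_content)
    return key


store = InMemoryObjectStore()

# Two versions of the same row (made-up sample data), fetched in
# separate incremental runs.
old_version = json.dumps(
    {"entity_id": 42, "status": "new", "updated_at": "2018-01-01T00:00:00"}
)
new_version = json.dumps(
    {"entity_id": 42, "status": "shipped", "updated_at": "2018-01-02T00:00:00"}
)

store_record(store, old_version)
store_record(store, new_version)

# Only one object remains for this entity, holding the latest content.
latest = json.loads(store.get_object("42"))
```

The point of the sketch is that no explicit merge step is involved: because the key is derived from the record's identifier, a later version of the same row simply replaces the earlier object. If a flowfile carried several rows at once, they would need to be split into one record per flowfile first for this to hold.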

Expert Contributor

@Karthik Narayanan: In PutS3Object, I used ${entity_id} as the Object Key; that attribute was added in the EvaluateJsonPath processor. But I keep getting this error:

Key is not expected for the GET method ?uploads subresource (Service: Amazon S3; Status Code: 400; Error Code: InvalidRequest; Request ID: A77D158DEC004A8A

Expert Contributor

@Simran Kaur Look at this thread; it may help you resolve this issue. It looks like the data is getting saved in S3 even with this error.
