
Check if URL has already been Invoked within the Nifi dataflow


[Attached screenshot of the NiFi flow: 86627-nifi.png]

Hi

I have a question about checking whether a URL has already been called by the InvokeHTTP processor.

Below is the NiFi dataflow:

Step 1: GetHTTP
Step 2: SplitJson on $.checks
Step 3: EvaluateJsonPath on $.link
Step 4: InvokeHTTP based on $.link
Step 5: PutHDFS and PublishKafkaRecord
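
For reference, the relevant processor properties look roughly like this (a simplified sketch, not the exact config; the JsonPath values are the ones from the steps above, and writing the link to a flow file attribute and using GET are assumptions about the setup):

SplitJson
    JsonPath Expression: $.checks

EvaluateJsonPath
    Destination: flowfile-attribute
    link: $.link            (dynamic property: attribute name = JsonPath)

InvokeHTTP
    HTTP Method: GET
    Remote URL: ${link}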

The issue is that when I first call the GetHTTP processor, the JSON file looks like this:

First Call Returns

{"storeId": "136678",

"dob": "20180122",

"checks": [

{

"id": "20971531",

"printableId": "80001",

"marker": 636704886835750777,

"link": "https://abcd.com/136678/20180122/20971531"

} ]}

The next time I call it, new information is appended to the bottom of the file.

Second Call Returns

{"storeId": "136678",

"dob": "20180122",

"checks": [

{

"id": "20971531",

"printableId": "80001",

"marker": 636704886835750777,

"link": "https://abcd.com/136678/20180122/20971531"

},

{

"id": "20971535",

"printableId": "10001",

"marker": 636704886835789652,

"link": "https://abcd.com/136678/20180122/20971535"

}

]}

Issue:

The second call invokes the link from the first call again, even though it has already been called and its result placed in HDFS and the PublishKafkaRecord queue.

Question:

Is there any way I can check that the link from the first call was successfully invoked, so that it is not invoked again in the second call?

Hope that makes sense.

Many thanks

Tim


Re: Check if URL has already been Invoked within the Nifi dataflow


Hi All

Being new to NiFi, it has taken me a while to come up with a solution to this issue. Seeing as there have been a few views on this question, I thought I would update members on what I have come up with, and see if there is any feedback on how I could improve the design and robustness of the solution.

There are two parts to the design

Part one: Create a logging solution

Part two: Invoke the URL from the new logging solution.

Part One: Create a logging solution

When I first call the master URL, I get the JSON file below:

{"storeId": "136678",
"dob":"20180122",
"checks":[{"id": "20971531", 
           "printableId": "80001",
           “IsClosed”= “true”,
           "marker": 636704886835750777, 
           link": "https://abcd.com/136678/20180122/20971531"}
]}

Part One Steps

[Attached screenshot of the Part One flow: 91533-part-one.png]

  1. InvokeHTTP: Call the master URL.
  2. EvaluateJsonPath: Populate the StoreID and DOB attributes.
  3. SplitJson: Split on the $.checks array.
  4. EvaluateJsonPath: Populate the attributes printableId, link and IsClosed.
  5. UpdateAttribute: Build the filename StoreID_DOB_printableId.json, create the attribute ProcessedFlag and set it to "N".
  6. RouteOnAttribute: Only process where IsClosed = "true".
  7. PutHDFS: Write the resulting JSON file to the directory /apps/LinksProcessed with the file name 136678_20180122_80001.json. Note: Conflict Resolution Strategy = ignore.
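
As a rough sketch, the property settings behind steps 2 to 7 look something like the following (standard NiFi properties; the route name "closed" is only an example, not an exact setting):

EvaluateJsonPath (step 2, before the split)
    Destination: flowfile-attribute
    StoreID: $.storeId
    DOB: $.dob

EvaluateJsonPath (step 4, after the split)
    Destination: flowfile-attribute
    printableId: $.printableId
    link: $.link
    IsClosed: $.IsClosed

UpdateAttribute (step 5)
    filename: ${StoreID}_${DOB}_${printableId}.json
    ProcessedFlag: N

RouteOnAttribute (step 6)
    closed: ${IsClosed:equals('true')}        (only the "closed" relationship is routed on to PutHDFS)

PutHDFS (step 7)
    Directory: /apps/LinksProcessed
    Conflict Resolution Strategy: ignore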

Resulting JSON file:

{"Link":"https://abcd.com/136678/20180122/20971531”,
 "PrintableId":"80001",
  "StoreId":"136678",
  "DOB":"2010122",
 "ProcessedFlag":“N"}

Part Two Steps

[Attached screenshot of the Part Two flow: 91534-part-two.png]

  1. ListHDFS: Returns the filenames of the latest files added to the /apps/LinksProcessed directory, in this example 136678_20180122_80001.json.
  2. FetchHDFS: Fetches the JSON files for the filenames returned from step 1.
  3. EvaluateJsonPath: Populate the attributes DOB, StoreId, PrintableId, Link and ProcessedFlag.
  4. RouteOnAttribute: Only process when ProcessedFlag = "N".
  5. InvokeHTTP: Invoke the URL in the Link attribute, in this example https://abcd.com/136678/20180122/20971531.
  6. PutHDFS: Place the transaction detail JSON file in the directory /apps/SalesTransactions with the file name 136678_20180122_80001.json.
  7. UpdateAttribute: Set ProcessedFlag = "Y".
  8. AttributesToJSON: Write the attributes DOB, StoreId, PrintableId, Link and ProcessedFlag to the flow file content.
  9. PutHDFS: Replace the file 136678_20180122_80001.json in /apps/LinksProcessed with the data from the previous step.
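
Again as a rough sketch, the properties doing the work in steps 3 to 9 are approximately as follows (the route name "unprocessed" and the GET method are examples, not exact settings):

EvaluateJsonPath (step 3)
    Destination: flowfile-attribute
    DOB: $.DOB
    StoreId: $.StoreId
    PrintableId: $.PrintableId
    Link: $.Link
    ProcessedFlag: $.ProcessedFlag

RouteOnAttribute (step 4)
    unprocessed: ${ProcessedFlag:equals('N')}

InvokeHTTP (step 5)
    HTTP Method: GET
    Remote URL: ${Link}

UpdateAttribute (step 7)
    ProcessedFlag: Y

AttributesToJSON (step 8)
    Attributes List: DOB,StoreId,PrintableId,Link,ProcessedFlag
    Destination: flowfile-content

PutHDFS (step 9)
    Directory: /apps/LinksProcessed
    Conflict Resolution Strategy: replace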

The file 136678_20180122_80001.json in /apps/LinksProcessed now contains the data below, and the detail transaction JSON file has been placed in the HDFS directory /apps/SalesTransactions.

{"Link":"https://abcd.com/136678/20180122/20971531”,
 "PrintableId":"80001",
  "StoreId":"136678",
  "DOB":"2010122",
 "ProcessedFlag":“Y"}

What happens when we call the master URL again?

We get the following data:

{"storeId": "136678",
"dob":"20180122",
"checks":[{"id": "20971531", 
           "printableId": "80001",
           “IsClosed”= “true”,
           "marker": 636704886835750777, 
           link": "https://abcd.com/136678/20180122/20971531"},
           {"id": "20971531", 
           "printableId": "80001",
           “IsClosed”= “true”,
           "marker": 636704886835750777, 
           link": "https://abcd.com/136678/20180122/20971531"},
           {"id": "20971531", 
           "printableId": "80001",
           “IsClosed”= “true”,
           "marker": 636704886835750777, 
           link": "https://abcd.com/136678/20180122/20971531"}
]}

Two new transactions have been appended to the file, so we now have three transactions, one of which we already processed in the first pass.

Part one

This results in two new JSON files being placed in the /apps/LinksProcessed directory. The first file is not updated, nor does PutHDFS raise an error, because the Conflict Resolution Strategy is ignore. Therefore /apps/LinksProcessed will look like this:

/apps/LinksProcessed/136678_20180122_80001.json
/apps/LinksProcessed/136678_20180122_10001.json
/apps/LinksProcessed/136678_20180122_10002.json

Part two

The ListHDFS processor only picks up files added to the directory since its last run; in this case only 136678_20180122_10001.json and 136678_20180122_10002.json (the two new ones). This means that only the URLs for these transactions are invoked, and not the one that was already processed in the first pass.

Resulting HDFS

Invoke URL JSON files:

/apps/LinksProcessed/136678_20180122_80001.json
/apps/LinksProcessed/136678_20180122_10001.json
/apps/LinksProcessed/136678_20180122_10002.json

Detail sales transaction JSON files:

/apps/SalesTransactions/136678_20180122_80001.json
/apps/SalesTransactions/136678_20180122_10001.json
/apps/SalesTransactions/136678_20180122_10002.json

I hope that is of use to people. Like I said, I am new to NiFi, so I am happy to receive any feedback that will improve the solution.

Tim