Newbie: Connecting to RESTful API to collect files

New Contributor

I am completely new here and even newer to NiFi.

I am trying to connect to a RESTful API in order to retrieve files from outside my org. So far I have created about 7 flows, and only 2-3 of them seem to have gotten off to a good start:

1st InvokeHTTP

2nd GetHTTP - LogAttribute

3rd GetHTTPs - ExtractText (although at present this is doing nothing).


Would anybody happen to have an ideal flow diagram showing how I should go about doing the following:

Connect to client (https) - list files - pull files - output files to mount point (this part I think I already have).
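
As a rough illustration of the four steps above, here is a minimal Python sketch. It is not a NiFi flow, only the same logic written out. `collect_files`, `list_urls`, and `fetch` are hypothetical names; the thread does not say how the API lists its files, so listing and downloading are passed in as functions rather than guessed at.

```python
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def collect_files(
    list_urls: Callable[[], Iterable[str]],  # hypothetical: returns the remote file URLs
    fetch: Callable[[str], bytes],           # hypothetical: downloads one URL over HTTPS
    out_dir: Path,
) -> List[Path]:
    """List remote files, pull each one, and write it under out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url in list_urls():                    # list files
        body = fetch(url)                      # pull one file
        name = Path(urlparse(url).path).name   # last URL path segment as the file name
        target = out_dir / name
        target.write_bytes(body)               # output to the mount point
        written.append(target)
    return written
```

In NiFi terms, the fetch step would roughly correspond to InvokeHTTP and the write step to PutFile; how the listing step maps depends on what the API offers.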

I have looked through all the different processors available and I am struggling to find the right one that fits.

Thanks.

Re: Newbie: Connecting to RESTful API to collect files

Super Guru

@Donna Leoni

Once you are able to get the files from the InvokeHTTP processor, you can list and view the content of the flowfiles in NiFi.

If you want to store the files, try one of these:

Store the files on the local file system using the PutFile processor

(or)

Store them in HDFS using the PutHDFS processor.

The flow design will depend on your use case.

Let us know if you need any further help.


Re: Newbie: Connecting to RESTful API to collect files

New Contributor

Many thanks for this @Shu. I am still struggling a little here; I am really new, as I mentioned before.

Here is what I already have in place (the only flow that is doing anything meaningful for me):

[Image: 107683-nifi-flow.png]

The first InvokeHTTP is going to 3 different URLs:

https://api.xxxxxx.com/reports/v1/scheduledReports/download/xxxxx/xxxxx/xxxxx/2019-04-02T01:50:00Z.csv

https://api.xxxxxx.com/reports/v1/scheduledReports/download/xxxxx/xxxxx/xxxxx/2019-02-12T02:51:00Z.x....csv

https://api.xxxxxx.com/reports/v1/scheduledReports/download/xxxxx/xxxxx/xxxxx/2019-03-04T17:36:00Z.csv

The second is:

https://api.xxxxx.com/reports/v1/scheduledReports/download/xxxxx/xxxxx/xxxxxxxxxx/2019-04-02T17:39:0....pdf

I have split them purely so that the filenames get the correct extension applied in UpdateAttribute, as I cannot set two extensions in the one processor.

I still have issues with this flow.

1. When the files land in the directory that I have set, I get the following: [Image: 107703-1554804107895.png]

These are not the filenames that I see when I look at the files directly using Postman. How can I ensure that the files on our server get the same filenames as they already have at rest?
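
One way to reason about the naming problem, sketched in Python rather than NiFi: a client can take the name either from a Content-Disposition response header, if the server sends one, or from the last segment of the request URL. Whether this API sends that header is an assumption (the thread does not say), and `remote_filename` is a hypothetical helper.

```python
from email.message import Message
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlparse


def remote_filename(url: str, content_disposition: Optional[str] = None) -> str:
    """Pick a file name from Content-Disposition if present, else from the URL path."""
    if content_disposition:
        # Reuse the standard library's header parser to read the filename parameter.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            return PurePosixPath(name).name  # drop any directory part
    # Fall back to the last segment of the URL path.
    return PurePosixPath(unquote(urlparse(url).path)).name
```

With a URL shaped like the ones above, the fallback yields the timestamped name including its extension, so a single rule could cover both the .csv and the .pdf downloads.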

1a. Furthermore, how do I make it so that only one copy of each of these files comes down, rather than the same file repeatedly? I tried to set the schedule via cron, but even then I get more than I need.
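
The general idea behind "only one of each" is to remember which files have already been handled and skip them on later runs. A minimal Python sketch of that idea follows; `SeenStore` and its JSON file are hypothetical, and NiFi's own DetectDuplicate processor plays a similar role inside a flow.

```python
import json
from pathlib import Path
from typing import Iterable, List


class SeenStore:
    """Remembers which keys were already processed, persisted as a JSON list."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load earlier state if a previous run left any behind.
        self.seen = set(json.loads(path.read_text())) if path.exists() else set()

    def filter_new(self, keys: Iterable[str]) -> List[str]:
        """Return only unseen keys, in order, and record them as seen."""
        fresh = []
        for key in keys:
            if key not in self.seen:
                self.seen.add(key)
                fresh.append(key)
        self.path.write_text(json.dumps(sorted(self.seen)))
        return fresh
```

The key could be the download URL or the filename; either way, a file that was fetched on one run is filtered out on the next.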

2. Then I need to work out how to pass the date in as a variable, because once this is up and running I won't be able to hard-code a path such as:

xxxxx/xxxxx/xxxxxxxxxx/2019-04-02T17:39:00Z.pdf

As a general rule going forwards, I will be looking for the previous day's data.
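
To make the "previous day" rule concrete, here is a small Python sketch of the date arithmetic. The function names are hypothetical, and using UTC is an assumption based on the trailing Z in the sample timestamps. Because the time of day in the filename varies, the sketch matches on the date part only.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def previous_day_prefix(today: Optional[date] = None) -> str:
    """Return yesterday's date (relative to UTC today by default) as YYYY-MM-DD."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    return (today - timedelta(days=1)).isoformat()


def matches_previous_day(filename: str, today: Optional[date] = None) -> bool:
    """True if a name like '2019-04-02T17:39:00Z.pdf' starts with yesterday's date."""
    return filename.startswith(previous_day_prefix(today) + "T")
```

For example, with `today` set to 3 April 2019, the name `2019-04-02T17:39:00Z.pdf` matches. Inside NiFi, the Expression Language has date functions that could build the same prefix.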

All help gratefully appreciated.

Thanks.
