
Newbie: Connecting to RESTful API to collect files


New Contributor

I'm completely new here and even newer to NiFi.

I am trying to connect to a RESTful API to retrieve files from outside my organization. So far I have created about seven flows, and only two or three of them seem to have gotten off to a good start:

1st flow: InvokeHTTP
2nd flow: GetHTTP - LogAttribute
3rd flow: GetHTTP - ExtractText (although at present this is doing nothing)


Would anybody happen to have an idea of the ideal flow for doing the following?

Connect to the client over HTTPS - list the files - pull the files - write the files to a mount point (I think I already have this last part).
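To make the four steps concrete, here is a minimal Python sketch of the same list, pull, write sequence outside NiFi. It is only an illustration under stated assumptions: `list_files` and `fetch` are hypothetical stand-ins for whatever listing and download requests the API actually offers, and `collect_files` is a hypothetical name. In NiFi the stages would roughly correspond to an InvokeHTTP call for the listing, one flowfile per file, an InvokeHTTP call per download, and PutFile.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable


def collect_files(
    list_files: Callable[[], Iterable[str]],
    fetch: Callable[[str], bytes],
    out_dir: Path,
) -> list[Path]:
    """List the available files, pull each one and write it to out_dir.

    list_files and fetch are hypothetical stand-ins for the API's listing
    and download requests (step 1, the HTTPS connection, happens inside them).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in list_files():                  # step 2: list files
        body = fetch(name)                     # step 3: pull one file
        target = out_dir / Path(name).name     # step 4: write to the mount point,
        target.write_bytes(body)               # keeping only the last path segment
        written.append(target)
    return written
```

Usage with in-memory stand-ins: `collect_files(lambda: ["a.csv"], lambda name: b"x,y\n", Path("/mnt/reports"))`, where `/mnt/reports` is a hypothetical mount point.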

I have looked through all the different processors available and I am struggling to find the right one that fits.

Thanks.




Re: Newbie: Connecting to RESTful API to collect files

Super Guru

@Donna Leoni

Once the InvokeHTTP processor is retrieving the files, you can list the queue and view the content of each flowfile in NiFi.

-

If you want to store the files, try one of these:

Store the files on the local file system using the PutFile processor

(or)

Store them in HDFS using the PutHDFS processor.
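To make the PutFile option concrete: PutFile writes each flowfile's content into a configured directory, naming the file after the flowfile's `filename` attribute, and its Conflict Resolution Strategy property (replace, ignore or fail) decides what happens when a file with that name already exists. The function below is a hypothetical Python illustration of that behaviour, not NiFi code. Here "fail" is modelled as an exception, whereas NiFi routes the flowfile to its failure relationship.

```python
from pathlib import Path


def put_file(directory: Path, filename: str, content: bytes,
             conflict: str = "fail") -> str:
    """Write content to directory/filename, mimicking PutFile's conflict options.

    Returns "success" when the file is written, "ignored" when an existing
    file is left alone; raises FileExistsError when conflict == "fail".
    """
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    if target.exists():
        if conflict == "ignore":
            return "ignored"                    # keep the existing file untouched
        if conflict == "fail":
            raise FileExistsError(str(target))
        if conflict != "replace":
            raise ValueError(f"unknown conflict strategy: {conflict}")
    target.write_bytes(content)                 # new file, or overwrite on "replace"
    return "success"
```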

-

The flow design will depend on your use case.

Let us know if you need any further help!


Re: Newbie: Connecting to RESTful API to collect files

New Contributor

Many thanks for this, @Shu. I am still struggling a little here; as I mentioned before, I am really new to this.

Here is what I already have in place (the only flow that is doing anything meaningful for me):

[Screenshot: the NiFi flow]

The first InvokeHTTP processor calls three different URLs:

https://api.xxxxxx.com/reports/v1/scheduledReports/download/xxxxx/xxxxx/xxxxx/2019-04-02T01:50:00Z.csv

https://api.xxxxxx.com/reports/v1/scheduledReports/download/xxxxx/xxxxx/xxxxx/2019-02-12T02:51:00Z.x....csv

https://api.xxxxxx.com/reports/v1/scheduledReports/download/xxxxx/xxxxx/xxxxx/2019-03-04T17:36:00Z.csv

The second InvokeHTTP processor calls:

https://api.xxxxx.com/reports/v1/scheduledReports/download/xxxxx/xxxxx/xxxxxxxxxx/2019-04-02T17:39:0....pdf

I split them up purely so that I can set each filename with the correct extension in an UpdateAttribute processor, as I cannot set two different extensions in one processor.
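One observation that may remove the need for two processors: the name the source uses, extension included, is the last segment of each download URL. In NiFi, a single UpdateAttribute could set `filename` from the `invokehttp.request.url` attribute that InvokeHTTP adds to its response flowfiles, for example `${invokehttp.request.url:substringAfterLast('/')}` (an untested suggestion to verify against your NiFi version). The Python sketch below shows only the string handling; the function name `filename_from_url` and the `api.example.com` host are hypothetical.

```python
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a download URL, URL-decoded.

    urlparse separates the path from any ?query or #fragment, so tokens in
    the query string do not end up in the file name.
    """
    path = urlparse(url).path
    return unquote(path.rstrip("/").rsplit("/", 1)[-1])
```

For example, `filename_from_url("https://api.example.com/reports/v1/scheduledReports/download/a/b/c/2019-04-02T01:50:00Z.csv")` gives `2019-04-02T01:50:00Z.csv`, and a `.pdf` URL keeps its `.pdf` extension the same way.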

I still have issues with this flow.

1. When the files land in the directory I have configured, I get the following file names: [screenshot of the resulting file names]

These are not the file names I see when I request the files directly in Postman. How can I make sure the files on our server get the same names they already have at rest on the source?

1a. Furthermore, how do I make sure that only one copy of each file comes down, rather than the same file repeatedly? I tried setting the schedule via cron, but even then I get more copies than I need.

2. Next, I need to work out how to set a variable for the date, so that the flow requests files by date. Once this is up and running I won't be able to hard-code a path such as:

xxxxx/xxxxx/xxxxxxxxxx/2019-04-02T17:39:00Z.pdf

because, as a general rule going forward, I will be fetching the previous day's data.
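Points 1a and 2 fit together: pick out the names dated yesterday, then skip any that earlier runs already collected. Because the time of day differs between files (01:50, 17:36 and 17:39 in the URLs above), matching on the date part of the name is likely more robust than building a full path. In NiFi, yesterday's date could come from Expression Language, for example `${now():toNumber():minus(86400000):format('yyyy-MM-dd', 'GMT')}`, and the DetectDuplicate processor is one option to look at for skipping repeats; both are suggestions to check, not a tested flow. The Python sketch below shows the same logic. It assumes the API can list report names that start with an ISO-8601 UTC timestamp, as the example URLs do; the function names and the state file `fetched.txt` are hypothetical.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone
from pathlib import Path
from typing import Iterable, Optional


def previous_day(today: Optional[date] = None) -> date:
    """Return yesterday's date; 'today' defaults to the current UTC date."""
    if today is None:
        today = datetime.now(timezone.utc).date()   # names end in 'Z', i.e. UTC
    return today - timedelta(days=1)


def select_new_reports(names: Iterable[str], day: date, state_file: Path) -> list[str]:
    """Return names dated `day` that are not yet recorded in state_file.

    Assumes each name starts with a timestamp like 2019-04-02T01:50:00Z.
    """
    prefix = day.isoformat() + "T"                  # e.g. "2019-04-02T"
    seen = set(state_file.read_text().splitlines()) if state_file.exists() else set()
    # dict.fromkeys drops repeats within this listing while keeping order
    return [n for n in dict.fromkeys(names) if n.startswith(prefix) and n not in seen]


def mark_fetched(state_file: Path, names: Iterable[str]) -> None:
    """Append names to the state file; call this only after the downloads succeed."""
    with state_file.open("a") as f:
        for name in names:
            f.write(name + "\n")
```

Recording a name only after its download succeeds (the separate `mark_fetched` step) means a failed download is retried on the next run instead of being silently skipped.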

All help graciously appreciated.

Thanks.
