Member since: 07-28-2017
Posts: 47
Kudos Received: 6
Solutions: 2

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 15109 | 03-13-2018 12:04 PM |
|  | 8537 | 10-12-2017 08:52 AM |
10-12-2017
02:18 PM
In the flow below, the goal is to take some zip archives from an FTP server, unzip them, and send them to HDFS.
The catch is that, due to rerouting issues at this specific FTP location, I cannot use the built-in Get/List/FetchFTP processors, as they fail to follow the rerouting properly. What I can do is use a command-line utility from the NiFi server that can handle the rerouting, and ncftp does the trick in this case.
So my plan is to use ExecuteProcess to run the bash script:
ncftpget -f login.txt /home/user/test /path/to/remote/*.zip > /dev/null 2>&1
ls /home/user/test | grep ".zip"
The first line gets the wanted zip archives from the FTP server while redirecting all output/error streams, since we only want to parse the output of the second line, which lists the contents of the specified directory and keeps the entries with the 'zip' extension.
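For reference, a minimal sketch of how those two commands might be saved as a single script for ExecuteProcess (the paths and login file are the examples from above, so adjust them to your environment):

```bash
#!/usr/bin/env bash
# get_zips.sh - sketch of the script ExecuteProcess would run.

# Fetch all remote zip archives into the local staging directory;
# ncftpget handles the FTP rerouting the built-in FTP processors cannot.
# Its own output is discarded so it does not end up in the flowfile content.
ncftpget -f login.txt /home/user/test '/path/to/remote/*.zip' > /dev/null 2>&1

# Emit only the downloaded zip filenames; this listing becomes the
# ExecuteProcess flowfile content that ExtractText parses downstream.
ls /home/user/test | grep ".zip"
```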
What I am trying to do is reconstruct the proper filenames between ExtractText --> UpdateAttribute and pass them on to FetchFile --> UnpackContent --> PutHDFS. The output of ExecuteProcess is something like:
file1_timestamp.zip
file2_timestamp.zip
file3_timestamp.zip
.....
file100_timestamp.zip
Next, an ExtractText processor with an added property 'filename': '\w+.zip' matches this regex against the flowfile content and outputs a flowfile with a new attribute for each match. Then UpdateAttribute sets the local path where the bash script placed the zip archives ('/home/user/test' in this case) as well as the proper filename, so that ${path}/${filename} is passed to the rest of the flow for fetching, unpacking, and finally putting to HDFS.
The problem I have is that only the first match is passed to the rest of the flow, since only that match ends up in the 'filename' attribute; ExtractText writes the other filenames to the attributes 'filename.2', 'filename.3' ... 'filename.100'.
I would like to find a way to update the attribute passed to FetchFile with some kind of incremental counter. I tried configuring the FetchFile processor with the File to Fetch property as ${path}/${filename:nextInt()}, but this just looks for 'file_timestamp.zip#' filenames in the specified path, which of course are not there.
Labels:
- Apache NiFi
10-12-2017
09:05 AM
CompressContent works fine for gzip archives. Thanks a lot, I am still exploring the possibilities of the NiFi processors.
10-12-2017
08:52 AM
I found a way: create empty flowfiles, update their attributes with the proper authorization headers, and send the request with InvokeHTTP.
10-10-2017
03:13 PM
Hi @Shu, thanks for all the help, but I am not sure I follow your logic here. The flow starts with the ListHDFS processor, where the HDFS directory is specified, e.g. /user/foivos. The FetchHDFS processor follows with the HDFS Filename specification; let's say we want to take only csv files, so the HDFS Filename property is ${path}/${filename:endsWith('csv')}. The final processor is ExecuteStreamCommand, where:
Command path: /path/to/script.sh, Command arguments: ${filename}. But where does the processor take this value from? Is it the ${filename} attribute produced by the FetchHDFS processor, where the filtering for csv files is also done?
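For context, a hypothetical /path/to/script.sh that consumes that argument could be as simple as the sketch below (the script body is a placeholder, not something from the thread):

```bash
#!/usr/bin/env bash
# script.sh - receives the evaluated value of ${filename} as its first argument.
# ExecuteStreamCommand evaluates NiFi Expression Language in Command Arguments
# against the incoming flowfile, so $1 holds that flowfile's filename attribute.

filename="$1"
echo "Received filename argument: ${filename}" >&2

# ... real processing of the file would go here ...
```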
10-10-2017
10:19 AM
I have a directory with zip archives on the local filesystem of the NiFi server, and I would like to create a flow that unzips these archives with a bash script and then puts them in HDFS. The problem is that I cannot direct the output of the bash script to the PutHDFS processor in a way that lets it pick up the unzipped files.
1) With the ExecuteStreamCommand processor I have two options for the outgoing flow: the original relationship, which contains the initial zipped archive, and the output stream relationship, which should be what I am looking for but only transfers an empty file with the same name as the original. How should this processor be configured, when it runs a bash script/command, so that the outgoing flowfiles correctly contain the files produced by that script/command?
2) With the ExecuteProcess processor there is only a success/failure relationship, and this also does not help to pass the outgoing flow as input to the PutHDFS processor to move the unzipped files to HDFS.
Any help would be greatly appreciated!
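For reference, a minimal sketch of the kind of unzip script being described (directory paths are placeholders):

```bash
#!/usr/bin/env bash
# unzip_all.sh - unpack every zip archive found in a staging directory,
# so the extracted files can then be picked up and put to HDFS.
# SRC_DIR and OUT_DIR are hypothetical paths.

SRC_DIR=/data/staging/zips
OUT_DIR=/data/staging/unzipped

mkdir -p "$OUT_DIR"
for archive in "$SRC_DIR"/*.zip; do
    unzip -o "$archive" -d "$OUT_DIR"
done
```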
Labels:
- Apache NiFi
10-06-2017
03:26 PM
I want to create a NiFi flow with GET requests to a REST API that supports authorization via a Bearer token, but I am having trouble building the proper flow because of the several available HTTP-related processors in NiFi. My question is: which is the proper flow for this? I tried GetHTTP --> UpdateAttribute --> PutHDFS, to store the results in HDFS after the filename is appended with a timestamp, but I could not find a way to properly configure the GetHTTP processor for the token authorization. The flow InvokeHTTP --> UpdateAttribute --> PutHDFS also did not work for me: setting Send Attributes = Authorization and adding the token as the value of the newly created Authorization attribute fails with a "Yielding processor due to exception encountered as source processor" error. Any ideas on how I could proceed with this? Thanks in advance!
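For reference, the request the flow ultimately has to produce is the equivalent of this curl call (the URL and token are placeholders):

```bash
# A plain GET with a Bearer token in the Authorization header; whichever
# HTTP processor is used needs to end up sending exactly this header.
curl -H "Authorization: Bearer <token>" \
     "https://api.example.com/resource"
```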
Labels:
- Apache NiFi
10-06-2017
08:06 AM
Hi @Shu, thanks for the response. What do you mean by the flowfile attributes part, i.e. how do you pass the values to the arguments? What I want to do is run an ncftpput command where the last argument, in {}, is the filename(s) coming from the FetchHDFS processor and passed automatically to the ExecuteStreamCommand processor. The other arguments are fixed and do not have to be automated, only the filename(s) to be transferred to the FTP server. Command: ncftpput -f /path/to/login.txt /path/to/ftp/remote {HDFS_files}. So if the ListHDFS --> FetchHDFS flow outputs, say, 2 files in the specified HDFS directory, /HDFS/path/file1 and /HDFS/path/file2, how can we pass these as arguments to the ncftpput command in the ExecuteStreamCommand processor? The goal is to transfer the files in the HDFS directory to the FTP server with the ExecuteStreamCommand processor.
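As a sketch, with the two example files the intended invocation would expand to something like the following (note that ncftpput itself expects local paths, so the HDFS paths here only illustrate the desired argument substitution):

```bash
# Desired expansion of the ExecuteStreamCommand arguments for the two
# example files listed by ListHDFS/FetchHDFS:
ncftpput -f /path/to/login.txt /path/to/ftp/remote /HDFS/path/file1 /HDFS/path/file2
```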
10-05-2017
03:06 PM
@Shu, your answer is really awesome and exactly what I would normally need, however there is one problem. Due to connectivity and rerouting issues with the FTP server I am trying to transfer the files to, and because the PutFTP processor cannot be configured to handle these issues, I decided to use a command-line FTP utility (ncftp in this case) that makes the FTP connection and transfers the files.
At first I thought of creating a ListHDFS --> FetchHDFS --> ExecuteProcess flow, but the ExecuteProcess processor does not accept incoming connections. Note that the ExecuteProcess processor on its own works like a charm if I configure it to execute either a bash script file with the ncftp commands in it, or the ncftp utility directly with the correct arguments. So I replaced it with the ExecuteStreamCommand processor, which does accept incoming connections. The command that works from the command line and from the ExecuteProcess processor is: ncftpput -f /path/to/login.txt /path/to/ftp/remote /path/to/localdir/filename
Unfortunately the flow does not work, and I get no error messages whatsoever to debug and find out what is going wrong. I have also tried setting Ignore STDIN: true and specifying the local path to the filenames I want to transfer explicitly, but this does not work either.
What I want to do is list the HDFS contents, fetch the filenames, and pass those filenames to the ExecuteStreamCommand processor as the last argument. In the FetchHDFS processor there is the ${path}/${filename} HDFS Filename attribute, and in the ExecuteStreamCommand processor I have set ${filename} as the last argument, but apparently the files are not stored anywhere locally on the NiFi server and get lost somewhere along the flow after the FetchHDFS processor. My question is, how would you approach this issue? Any other ideas on how we could achieve the same result?
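One way to reason about the missing local files (an assumption, not something stated in the thread): ExecuteStreamCommand hands the fetched flowfile content to the command's stdin rather than writing it to disk, so a hypothetical wrapper script would first have to materialize that content locally before calling ncftpput:

```bash
#!/usr/bin/env bash
# put_to_ftp.sh - hypothetical wrapper for ExecuteStreamCommand.
# The content fetched by FetchHDFS arrives on stdin; ncftpput needs a real
# local file, so stage it in a temporary location first.

filename="$1"                        # passed as ${filename} in Command Arguments
local_copy="/tmp/${filename}"

cat > "$local_copy"                  # write the flowfile content to a local file
ncftpput -f /path/to/login.txt /path/to/ftp/remote "$local_copy"
rm -f "$local_copy"
```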
09-27-2017
08:35 PM
Interesting. It is definitely the latter: I would like to handle it as a retry. The files should be transferred only in pairs; how would you approach that?
09-27-2017
02:54 PM
I figured the GenerateFlowFile processor could be a solution here, so I would have two flows with the same start and end point:
- GetHDFS --> PutFTP for the original HDFS files
- GetHDFS --> GenerateFlowFile --> PutFTP for the newly created files based on the original HDFS ones
However, the GUI won't let me connect the GetHDFS processor to the GenerateFlowFile one; some input here would be really appreciated!