
fetchFtp is it mandatory to have upstream connections

Contributor

Hi All,

Thanks a lot to this awesome community

I want to fetch some files from FTP. I do not intend to run this across the cluster; I am using the primary node only, because it runs only once a day.

Right now FetchFTP says it needs an upstream connection; it is basically looking for a corresponding ListFTP.

Can we just fetch the file using FTP? The use case is to fetch from a mainframe, which does not have a directory structure for listing the files.

Accepted Solutions
Re: fetchFtp is it mandatory to have upstream connections

Super Guru

Are you saying your FTP server does not support listing the top level directory? It shouldn't matter if you don't have a directory structure, as long as the FTP server itself can respond to the list commands (MLSD and/or NLST I think).

Alternatively, if you know the filename(s) you want to fetch, you can do one of two things:

1) Use GetFTP rather than ListFTP -> FetchFTP, setting the File Filter Regex to include the files you want (perhaps .* for all)

2) Use GenerateFlowFile in place of ListFTP, setting the "filename" attribute to the file you want to fetch. This runs at the rate scheduled for GenerateFlowFile, and will generate the same filenames over and over, unless you are using Expression Language to set the filename at each execution.

Basically, FetchFTP needs an incoming connection to provide the "filename" attribute so it knows which file to fetch. GetFTP is kind of a combination of ListFTP -> FetchFTP.
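
To illustrate the point that a fetch only needs a known filename and not a listing, here is a minimal Python sketch using the standard library's ftplib, outside NiFi. The host, credentials, and the daily_YYYYMMDD.dat naming pattern are hypothetical; the date-stamped name plays roughly the role that Expression Language would play on the "filename" attribute in GenerateFlowFile.

```python
from datetime import date
from ftplib import FTP
from io import BytesIO


def build_filename(day: date, pattern: str = "daily_%Y%m%d.dat") -> str:
    """Build the name of the file to fetch for a given day (pattern is hypothetical)."""
    return day.strftime(pattern)


def fetch_by_name(ftp, filename: str) -> bytes:
    """Retrieve one file by name with RETR; no LIST/NLST/MLSD command is issued."""
    buffer = BytesIO()
    ftp.retrbinary(f"RETR {filename}", buffer.write)
    return buffer.getvalue()


def run_once(host: str, user: str, password: str) -> bytes:
    """Connect, fetch today's file by its known name, and disconnect."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return fetch_by_name(ftp, build_filename(date.today()))
```

Calling run_once("ftp.example.com", "user", "password") with real values would fetch a single file per call, which is the same "known name in, content out" contract that FetchFTP expects from its upstream connection.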

Re: fetchFtp is it mandatory to have upstream connections

Contributor

@Matt Burgess I tried the GenerateFlowFile approach, scheduling it with cron and setting a custom property named "filename"; the image is attached.

The problem is that, as you mentioned, it runs at the rate scheduled for GenerateFlowFile. How can I control the speed? I scheduled it to run only once, but it still runs very fast.

41527-generate.png

Re: fetchFtp is it mandatory to have upstream connections

Super Guru

On the Scheduling tab, set Run Schedule to something like 30 seconds; then you can start the processor and stop it immediately, and it will run only once. I think GetFTP might be the better of the two options, unless there's some reason it doesn't work with your system.

Re: fetchFtp is it mandatory to have upstream connections

Contributor

@Matt Burgess It worked. My bad: it was creating a flow file on each of the nodes. Appreciate it, thanks.

Re: fetchFtp is it mandatory to have upstream connections

@dhieru singh

For this, use GetFTP instead of FetchFTP.

Re: fetchFtp is it mandatory to have upstream connections

Contributor

@Abdelkrim Hadjidj @Matt Burgess The use case here is getting files from a mainframe once each day. However, I learned that there is no concept of a directory structure on mainframes (I have no idea how mainframes work), so NiFi is not able to list the files (this applies to GetFTP as well as ListFTP and FetchFTP).

Is there any other way around this? I read some blogs and answers that suggested using Syncsort or Informatica PowerCenter.

We tried our current approach of running a shell script that goes and fetches the files. We can run the script using ExecuteProcess and save the files on one of the nodes (the primary node); however, the primary node keeps changing. Yesterday it was one node; today it is a different one.

In addition, mounting a shared directory across the nodes would be against our policies (too much admin work). Any help or thoughts would be appreciated.
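
One way to sidestep both the listing problem and the changing primary node might be to have the script write the retrieved bytes to standard output instead of to local disk, since ExecuteProcess writes the command's output to a FlowFile. The following is a minimal sketch, not a tested solution. It assumes the mainframe is z/OS and that its FTP server accepts a fully qualified data set name wrapped in single quotes; the host, credentials, and data set name are hypothetical.

```python
import sys
from ftplib import FTP


def quote_dataset(name: str) -> str:
    """Wrap a data set name in single quotes (assumed z/OS FTP convention)."""
    return "'" + name.strip("'") + "'"


def stream_dataset(ftp, dataset: str, out, binary: bool = False) -> None:
    """Send RETR for one named data set and write it to `out`; no listing is issued."""
    command = f"RETR {quote_dataset(dataset)}"
    if binary:
        # Binary mode: bytes are passed through unchanged.
        ftp.retrbinary(command, out.write)
    else:
        # Text mode: ftplib hands over one line at a time without the line ending.
        ftp.retrlines(command, lambda line: out.write((line + "\n").encode("utf-8")))


def main(host: str, user: str, password: str, dataset: str) -> None:
    """Connect, stream the data set to stdout, and disconnect."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        stream_dataset(ftp, dataset, sys.stdout.buffer)


# Example invocation (all values hypothetical):
# main("mainframe.example.com", "user", "password", "PROD.DAILY.EXTRACT")
```

Because nothing is saved on the node, it should not matter which node happens to be primary on a given day; whether text or binary mode is appropriate depends on the data set and would need to be checked against the mainframe.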
