How to determine if files exist in HDFS directory

Contributor

As the last step in my process, I need to check whether any more files exist in an HDFS directory. I tried FetchHDFS, which accepts an incoming flow file (unlike ListHDFS, which does not), but I discovered the hard way that FetchHDFS does not support wildcards; it takes only an HDFS path and filename. I looked for information on calling the existing Java HDFS methods from ExecuteScript with Groovy but could not find anything. I was hoping to avoid building a custom processor. The only option I have come up with so far is to write a small standalone Java app and call it from ExecuteStreamCommand, but that (presumably) loads a JVM on every call. Any other ideas?

1 REPLY

Re: How to determine if files exist in HDFS directory

Super Guru

@Jeff Watson

Could you try the GetHDFSFileInfo processor? It accepts incoming connections and supports regular expressions, so you can match only the directories and files you need and exclude the rest.
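
Since the original attempt relied on wildcards, it may help to see how a wildcard such as *.csv translates into a regular expression for the processor's file filter. The following is only a sketch in plain Java, not NiFi or Hadoop code: it assumes the filter is a Java regex that must match the whole file name, and the class name, helper method, and sample file names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for a file-name filter; it does not touch HDFS.
class FileFilterSketch {

    // Returns the names that the regex matches in full (assumed matching mode).
    static List<String> matching(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up directory listing for illustration.
        List<String> names = Arrays.asList(
                "part-0001.csv", "part-0002.csv", "_SUCCESS", "notes.txt");

        // The shell wildcard *.csv becomes the regex .*\.csv
        List<String> csvFiles = matching(".*\\.csv", names);

        System.out.println(csvFiles);            // prints [part-0001.csv, part-0002.csv]
        System.out.println(!csvFiles.isEmpty()); // prints true: matching files remain
    }
}
```

If the processor turns out to use partial matching rather than full matching, the leading .* is harmless but the pattern should be anchored explicitly, for example with ^ and $.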