Copying files within HDFS using wildcard in NiFi

New Contributor

Hello,


In my flow I need to copy HDFS files, selected by a dynamic wildcard, to another HDFS location within the same cluster.


I have Process Group Variables:

  • source_path = 'hdfs:///source/'
  • file_prefix = 'myflow_'

And a Flowfile Attribute:

  • file_timestamp = '20190520'


The source directory contains 4 files, and I need to copy two of them: the ones with timestamp 20190520. The filenames to be copied match "${source_path}${file_prefix}${file_timestamp}.part*".


hdfs:///source/myflow_20190412.part000

hdfs:///source/myflow_20190520.part000

hdfs:///source/myflow_20190520.part001

hdfs:///source/otherflow_20190625.part000
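
To make the intended selection concrete, here is a minimal Python sketch (plain Python, not NiFi configuration) that applies the wildcard to the four paths listed above. The variable names mirror the Process Group Variables and the flowfile attribute from the question; the `select_files` helper is hypothetical and only illustrates which files the pattern should pick.

```python
from fnmatch import fnmatchcase

# Values taken from the question: two Process Group Variables
# and one flowfile attribute.
source_path = "hdfs:///source/"
file_prefix = "myflow_"
file_timestamp = "20190520"

# The four files in the source directory.
listing = [
    "hdfs:///source/myflow_20190412.part000",
    "hdfs:///source/myflow_20190520.part000",
    "hdfs:///source/myflow_20190520.part001",
    "hdfs:///source/otherflow_20190625.part000",
]


def select_files(paths, source_path, file_prefix, file_timestamp):
    """Return the paths matching <source_path><file_prefix><file_timestamp>.part*"""
    pattern = f"{source_path}{file_prefix}{file_timestamp}.part*"
    # fnmatchcase does case-sensitive shell-style matching; '*' matches any suffix.
    return [p for p in paths if fnmatchcase(p, pattern)]


selected = select_files(listing, source_path, file_prefix, file_timestamp)
for path in selected:
    print(path)
```

Only the two `myflow_20190520` parts are selected; the older `myflow` file and the `otherflow` file are left alone.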


The MoveHDFS processor in NiFi v1.8.0 does not support Expression Language in the File Filter Regex field. How can I achieve this without resorting to ExecuteStreamCommand with "hdfs dfs -cp"?


Thank you for your help,

Piotr

2 REPLIES

Re: Copying files within HDFS using wildcard in NiFi

Super Guru

@Piotr Grzegorski

Try using ListHDFS + FetchHDFS processors.

You can simulate the MoveHDFS processor with the flow below:

ListHDFS // list all the files in the HDFS source directory
RouteOnAttribute // use NiFi Expression Language to keep only the required files
FetchHDFS // fetch the content of the selected files from HDFS
PutHDFS // write the files into the target HDFS directory
DeleteHDFS // delete the source files that FetchHDFS pulled (skip this step for a copy rather than a move)
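
The filtering step is the part that replaces the wildcard, so here is a rough Python sketch of the decision RouteOnAttribute would make for each listed file. It is an illustration of the logic only, not NiFi Expression Language: flowfiles are modelled as plain dicts carrying the `filename` and `path` attributes that ListHDFS writes, and the `route_on_filename` helper and the "matched"/"unmatched" relationship names are assumptions made for this example.

```python
import re


def route_on_filename(flowfiles, file_prefix, file_timestamp):
    """Split listed flowfiles into 'matched' and 'unmatched' by filename."""
    # Regex equivalent of the wildcard <file_prefix><file_timestamp>.part*
    pattern = re.compile(re.escape(file_prefix + file_timestamp) + r"\.part.*")
    routed = {"matched": [], "unmatched": []}
    for flowfile in flowfiles:
        # fullmatch requires the whole filename to fit the pattern.
        key = "matched" if pattern.fullmatch(flowfile["filename"]) else "unmatched"
        routed[key].append(flowfile)
    return routed


# One dict per file, as a stand-in for the flowfiles ListHDFS would emit.
listed = [
    {"path": "/source", "filename": "myflow_20190412.part000"},
    {"path": "/source", "filename": "myflow_20190520.part000"},
    {"path": "/source", "filename": "myflow_20190520.part001"},
    {"path": "/source", "filename": "otherflow_20190625.part000"},
]

routed = route_on_filename(listed, "myflow_", "20190520")
print([flowfile["filename"] for flowfile in routed["matched"]])
```

Only the "matched" flowfiles would continue to FetchHDFS and PutHDFS; the rest would be dropped.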


Re: Copying files within HDFS using wildcard in NiFi

New Contributor

Thank you for your answer.


The problem is that ListHDFS is a source processor: it does not accept incoming connections, so I can't supply the changing input directory from a flowfile.
