Copying files within HDFS using wildcard in NiFi

New Contributor

Hello,


In my flow I need to copy HDFS files based on dynamic wildcard to another HDFS location within the same cluster.


I have Process Group Variables:

  • source_path = 'hdfs:///source/'
  • file_prefix = 'myflow_'

And a Flowfile Attribute:

  • file_timestamp = '20190520'


The source directory contains 4 files, and I need to copy two of them (the two myflow_20190520 files). The filenames to be copied match "${source_path}${file_prefix}${file_timestamp}.part*".


hdfs:///source/myflow_20190412.part000

hdfs:///source/myflow_20190520.part000

hdfs:///source/myflow_20190520.part001

hdfs:///source/otherflow_20190625.part000


The MoveHDFS processor in NiFi v1.8.0 does not support Expression Language in the File Filter Regex field. How can I achieve this without resorting to ExecuteStreamCommand with "hdfs dfs -cp"?


Thank you for your help,

Piotr

2 REPLIES

Master Guru

@Piotr Grzegorski

Try using ListHDFS + FetchHDFS processors.

You can simulate the MoveHDFS processor with the flow below:

ListHDFS //list all the files in the HDFS directory
RouteOnAttribute //use NiFi Expression Language to keep only the required files
FetchHDFS //fetch the files from HDFS
PutHDFS //put the files into the target HDFS directory
DeleteHDFS //delete the fetched files from the source HDFS directory (skip this step for a copy rather than a move)
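
To make the RouteOnAttribute step concrete, here is a minimal Python sketch of the matching logic only, run outside NiFi. It is not NiFi configuration: the function name select_files is hypothetical, and the sketch assumes the quotes around the variable values in the question are just notation, so the values are used without them.

```python
from fnmatch import fnmatchcase
import posixpath


def select_files(listed_paths, file_prefix, file_timestamp):
    """Return the listed paths whose base name matches <prefix><timestamp>.part*.

    Hypothetical helper: it mirrors the filter a routing step would apply
    to the filename of each listed file.
    """
    pattern = f"{file_prefix}{file_timestamp}.part*"
    return [
        path
        for path in listed_paths
        if fnmatchcase(posixpath.basename(path), pattern)
    ]


# The four files from the question's source directory.
listed = [
    "hdfs:///source/myflow_20190412.part000",
    "hdfs:///source/myflow_20190520.part000",
    "hdfs:///source/myflow_20190520.part001",
    "hdfs:///source/otherflow_20190625.part000",
]

selected = select_files(listed, "myflow_", "20190520")
for path in selected:
    print(path)
```

With the values from the question, only the two myflow_20190520 part files are selected. In the actual flow the same comparison would be written as an Expression Language condition on the filename attribute.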

-

If this answer helps resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂

New Contributor

Thank you for your answer.


The problem is ListHDFS is a starting processor - it does not accept incoming connections, so I can't provide the changing input directory using a flowfile.