Copying files within HDFS using wildcard in NiFi

New Contributor

Hello,


In my flow I need to copy HDFS files, selected by a dynamic wildcard, to another HDFS location within the same cluster.


I have Process Group Variables:

  • source_path = 'hdfs:///source/'
  • file_prefix = 'myflow_'

And a Flowfile Attribute:

  • file_timestamp = '20190520'


The source directory contains 4 files, and I need to copy two of them: the ones whose names contain the file_timestamp value (the two myflow_20190520 files below). The filenames to be copied match "${source_path}${file_prefix}${file_timestamp}.part*".


hdfs:///source/myflow_20190412.part000

hdfs:///source/myflow_20190520.part000

hdfs:///source/myflow_20190520.part001

hdfs:///source/otherflow_20190625.part000
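To make the intended selection concrete, here is a small Python sketch (plain Python, not NiFi code) that expands the pattern with the example values above and checks it against the four listed paths. The variable names simply mirror the Process Group Variables and the Flowfile Attribute from the question, and `fnmatchcase` is used only to illustrate the wildcard semantics.

```python
from fnmatch import fnmatchcase

# Example values taken from the question.
source_path = "hdfs:///source/"
file_prefix = "myflow_"
file_timestamp = "20190520"

# Equivalent of "${source_path}${file_prefix}${file_timestamp}.part*".
pattern = f"{source_path}{file_prefix}{file_timestamp}.part*"

files = [
    "hdfs:///source/myflow_20190412.part000",
    "hdfs:///source/myflow_20190520.part000",
    "hdfs:///source/myflow_20190520.part001",
    "hdfs:///source/otherflow_20190625.part000",
]

# Keep only the paths that match the expanded wildcard.
matched = [f for f in files if fnmatchcase(f, pattern)]
print(matched)
```

Only the two myflow_20190520 paths are selected.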


The MoveHDFS processor in NiFi v1.8.0 does not support Expression Language in the File Filter Regex field. How can I achieve this without using ExecuteStreamCommand with "hdfs dfs -cp"?


Thank you for your help,

Piotr

2 REPLIES

Master Guru

@Piotr Grzegorski

Try using ListHDFS + FetchHDFS processors.

You can simulate the MoveHDFS processor with the flow below:

ListHDFS //list all the files in the HDFS directory
RouteOnAttribute //use NiFi Expression Language to keep only the required files
FetchHDFS //fetch the files from HDFS
PutHDFS //put the files into the target HDFS directory
DeleteHDFS //delete from the source HDFS directory the files that FetchHDFS pulled
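As a rough illustration of the filtering step, the Python sketch below models the decision RouteOnAttribute would make for each listed file. It is not NiFi configuration: the helper `matches_wanted_file` and the sample `listed` data are hypothetical, and the only assumption about NiFi is that ListHDFS writes `filename` and `path` attributes on the flowfiles it emits.

```python
def matches_wanted_file(attributes: dict, file_prefix: str, file_timestamp: str) -> bool:
    """Hypothetical stand-in for the RouteOnAttribute condition."""
    wanted = f"{file_prefix}{file_timestamp}.part"
    return attributes.get("filename", "").startswith(wanted)

# Stand-in for the flowfile attributes produced by listing the source directory.
listed = [
    {"path": "/source", "filename": "myflow_20190412.part000"},
    {"path": "/source", "filename": "myflow_20190520.part000"},
    {"path": "/source", "filename": "myflow_20190520.part001"},
    {"path": "/source", "filename": "otherflow_20190625.part000"},
]

# Only files passing the filter would continue to FetchHDFS, PutHDFS and DeleteHDFS.
to_fetch = [a["filename"] for a in listed if matches_wanted_file(a, "myflow_", "20190520")]
print(to_fetch)
```

Note that the prefix and timestamp are passed in explicitly here; in the real flow they would have to be available to the expression that RouteOnAttribute evaluates.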

-

If this answer helped resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂

New Contributor

Thank you for your answer.


The problem is that ListHDFS is a source processor: it does not accept incoming connections, so I can't provide the changing input directory using a flowfile.