How to filter files with multiple patterns

Currently I have a simple dataflow: ListFile-->FetchFile-->PutS3. This flow reads data from the source and uploads it to S3.

Now, I have a source directory, /users/data/, which contains abc_mmddyyyy.pdf, xyz_mmddyyyy.pdf and hij_mmddyyyy.pdf. These files are uploaded into the S3 folder \aws\data\.

We are planning to source new files, which will be published as /users/data/demo/www_mmddyyyy.csv and have to be uploaded to \aws\data\demo.
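If a single ListFile should pick up only these files, the name patterns above can be combined into one regular expression for its File Filter property. The sketch below checks such a pattern in Python. The pattern is an assumption built from the example names (a prefix plus an 8-digit mmddyyyy date), it uses syntax that Java regex also accepts, and it assumes the filter has to match the whole file name rather than part of it.

```python
import re

# Hypothetical combined filter: the three existing PDF prefixes plus the new
# CSV prefix, each followed by an 8-digit date (mmddyyyy).
FILE_FILTER = r"(?:abc|xyz|hij)_\d{8}\.pdf|www_\d{8}\.csv"

PATTERN = re.compile(FILE_FILTER)


def is_wanted(filename: str) -> bool:
    """Return True if the entire file name matches one of the expected patterns."""
    # fullmatch requires the whole name to match, similar to Java's Matcher.matches().
    return PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["abc_01312024.pdf", "www_01312024.csv", "notes.txt", "www_0131.csv"]:
        print(name, is_wanted(name))
```

Note that the filter only looks at file names, so picking up /users/data/demo/ from a ListFile pointed at /users/data/ also depends on the listing recursing into subdirectories.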


How can I achieve this? @stevenmatison


Re: How to filter files with multiple patterns


@Gubbi The simplest method would be to create a separate ListFile-->FetchFile-->PutS3 flow for the new files. However, you should be able to pick up everything with the same ListFile processor, as long as its Recurse Subdirectories property is enabled so that files in /users/data/demo/ are listed too. For a single flow, you would add an UpdateAttribute processor to set the path you want to use in PutS3. For example, if "demo" appears in the source path of the original file, set the path to append "demo\"; if not, leave it as "\aws\data\". You could also use RouteOnAttribute to send each data type to a different PutS3.
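The UpdateAttribute decision described above can be sketched as a small Python function. It assumes ListFile's `path` attribute holds the file's directory relative to the Input Directory (for example "./" for /users/data/ and "demo/" for /users/data/demo/). It writes forward slashes because S3 uses "/" as its folder delimiter; the backslash paths in the thread are assumed to mean the same folders. The names BASE_PREFIX and s3_object_key are hypothetical. Inside NiFi the same idea would be an Expression Language value, for example something along the lines of `aws/data/${path:contains('demo'):ifElse('demo/', '')}${filename}` in PutS3Object's Object Key (untested).

```python
# Hypothetical base prefix; written as \aws\data\ in the thread, but S3
# keys use "/" as the folder delimiter.
BASE_PREFIX = "aws/data/"


def s3_object_key(path_attr: str, filename: str) -> str:
    """Return the S3 object key for a file, given ListFile's `path` attribute.

    Files listed from a "demo" subdirectory go under aws/data/demo/;
    everything else goes directly under aws/data/.
    """
    # Normalise separators, then keep only real directory names
    # (drops "." and the empty string left by a trailing slash).
    parts = [p for p in path_attr.replace("\\", "/").split("/") if p not in ("", ".")]
    if "demo" in parts:
        return BASE_PREFIX + "demo/" + filename
    return BASE_PREFIX + filename


if __name__ == "__main__":
    print(s3_object_key("./", "abc_01312024.pdf"))     # aws/data/abc_01312024.pdf
    print(s3_object_key("demo/", "www_01312024.csv"))  # aws/data/demo/www_01312024.csv
```

Matching whole directory names is a deliberate choice here: a plain substring test, like the contains('demo') expression above, would also match a folder such as demographics/.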


If I were doing this myself, I would start with separate flows that I clearly know are operating as expected. Then I would build a third, dynamic single flow that achieves the results of both.








