
Extracting a path value from a column in an Excel file and using that extracted path value in a new GetFile processor

New Contributor

Hello friends,


I have a series of Excel files that I am retrieving from a local drive. One of the columns in these files contains local paths to additional files (in CSV format).


Does anyone know a way to extract the path from this column for each row, and then use the extracted value to automatically configure a new GetFile processor that retrieves the CSV file located at that path?

I understand that I may have worded this poorly, so if any clarification is needed, please let me know.


1 REPLY

Super Guru

@Noah Brace

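Use the ConvertExcelToCSVProcessor to convert each Excel file to CSV, then use a SplitRecord processor to write only the required column, with Records Per Split set to 1 so that each row becomes its own flowfile.

Use an ExtractText processor to copy the flowfile content into a flowfile attribute, then reference that attribute in a FetchFile processor. GetFile is a source processor and does not accept incoming flowfiles, so FetchFile is used to pick up the file named by the attribute.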

Flow:

1. ConvertExcelToCSVProcessor
2. SplitRecord // configure it to read the CSV and write only the path column, with Records Per Split set to 1
3. ExtractText // add a new property named fn with the value (.*); each flowfile then carries an attribute named fn
4. FetchFile/FetchSFTP/FetchFTP // set the file path property (File to Fetch in FetchFile) to ${fn}
5. Other processors.
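
For anyone who wants to check the logic outside NiFi first, here is a rough Python sketch of what steps 2 to 4 do to the data. It is not NiFi code and not how the processors are implemented. The function names `extract_paths` and `fetch_files` and the column name `path` are made up for illustration, and it assumes the CSV has a header row and one path per row.

```python
import csv
import io
import re
from pathlib import Path
from typing import Dict, List


def extract_paths(csv_text: str, path_column: str) -> List[str]:
    """Rough analogue of steps 2 and 3: one record per row, path column only."""
    reader = csv.DictReader(io.StringIO(csv_text))
    paths = []
    for row in reader:
        value = (row.get(path_column) or "").strip()
        # Same pattern as the fn property in step 3: capture the whole line.
        match = re.match(r"(.*)", value)
        if match and match.group(1):
            paths.append(match.group(1))
    return paths


def fetch_files(paths: List[str]) -> Dict[str, bytes]:
    """Rough analogue of step 4: read each file named by an extracted path."""
    fetched = {}
    for p in paths:
        file_path = Path(p)
        # Skip rows whose path does not point at an existing file.
        if file_path.is_file():
            fetched[p] = file_path.read_bytes()
    return fetched
```

Rows with an empty path cell are skipped here. In the NiFi flow, a path that cannot be found would be routed to the FetchFile not.found relationship instead.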


