
Extracting a path value from a column in an Excel file and using that extracted path value in a new GetFile processor


New Contributor

Hello friends,


I have a series of Excel files that I am retrieving from a local drive and one of the columns in these Excel files contains specific local path locations for additional files (in CSV format).


I am wondering if anyone knows a way to extract the path from this column for each row, and then use the extracted value to automatically configure a new GetFile processor that retrieves the additional CSV files located at that path.

I understand that I may have worded this poorly, so if any clarification is needed, please let me know.


1 REPLY

Re: Extracting a path value from a column in an Excel file and using that extracted path value in a new GetFile processor

Super Guru

@Noah Brace

Use the ConvertExcelToCSVProcessor to convert the Excel file into CSV, then use a SplitRecord processor to write only the required column, with Records Per Split set to 1.

Use an ExtractText processor to extract the content as a flowfile attribute, then reference that attribute in the FetchFile processor.

Flow:

1. ConvertExcelToCSVProcessor
2. SplitRecord // configure it to read the CSV and write only the path column, with Records Per Split set to 1
3. ExtractText // add a new property named fn with the value (.*); each flowfile then carries an attribute named fn
4. FetchFile/FetchSFTP/FetchFTP // set the file-path property (File to Fetch in FetchFile) to ${fn}
5. Other processors.
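
For anyone who wants to see the logic of steps 2 to 4 outside NiFi, here is a minimal Python sketch of the same idea. It is not NiFi code and not how the processors are implemented. The function names, the column name `path`, and the assumption that each split holds only the path value with no header line are all made up for illustration.

```python
import csv
import io
import re
from pathlib import Path


def split_path_column(csv_text, column):
    """Roughly what SplitRecord does here: one path value per output record.

    Assumption: the CSV has a header row and `column` is the path column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield row[column]


def extract_fn(content):
    """Roughly what ExtractText does with fn = (.*): capture the first line.

    The strip() is an extra safety step in this sketch, not NiFi behaviour.
    """
    match = re.search(r"(.*)", content)
    return match.group(1).strip() if match else ""


def fetch_files(csv_text, column="path"):
    """Roughly what FetchFile does with ${fn}: read each file the path names.

    Returns a dict of {path: file bytes}; paths that do not exist are skipped.
    """
    fetched = {}
    for content in split_path_column(csv_text, column):
        fn = extract_fn(content)
        path = Path(fn)
        if fn and path.is_file():
            fetched[fn] = path.read_bytes()
    return fetched
```

Calling `fetch_files(csv_text)` on the converted CSV text returns the contents of every referenced file that exists. In the real flow, a path that cannot be read would be routed to one of FetchFile's failure relationships instead of being skipped silently.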


