Created 03-07-2018 07:44 PM
I have a process that fetches files from an FTP location and processes them HDFS -> Hive. At the end of the day I would like to reconcile that every file on the FTP server was loaded into the Hive table. The Hive table has a field for the filename, so I can get the distinct list of files loaded for that day using the SelectHiveQL processor. I tried getting the list of files on FTP from the ListFTP processor, but it just queues up 20 zero-byte flowfiles. I envisioned using ListFTP -> MergeContent to build a text file of all filenames in the FTP directory and then somehow comparing the results of SelectHiveQL and ListFTP/MergeContent, but MergeContent doesn't even run with the zero-byte flowfiles as input. Any suggestions on how to do this correctly?
Created on 03-07-2018 08:17 PM - edited 08-18-2019 03:03 AM
Use a ReplaceText processor after ListFTP and before MergeContent to replace the contents of each flowfile with its filename.
ReplaceText processor configs:
With these configs, the filename becomes the contents of each flowfile (for example, by setting Replacement Value to ${filename} with Replacement Strategy set to Always Replace).
Then use the MergeContent processor with the config below:
Configure the MergeContent processor as per your requirements, set Delimiter Strategy to Text, and set Demarcator to a comma followed by a new line.
Output:
940630588913985,
940634934689001
In my test I had 2 flowfiles from a GenerateFlowFile processor, so I used ReplaceText to change the contents of each flowfile to its own filename.
After the MergeContent processor, the single merged flowfile contains both filenames, separated by a comma and a new line as the demarcator.
To compare, store the merged file of filenames in Hive, then get the distinct filenames that were loaded into the Hive table.
Then compare the two sets of filenames.
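If you would rather do the final comparison in a script than in Hive, a minimal sketch could look like the following. This is an assumption about how you might reconcile the two lists, not part of the NiFi flow above; the function names parse_merged_listing and reconcile are hypothetical, and the Hive list stands in for whatever your SelectHiveQL query returns.

```python
def parse_merged_listing(text):
    """Split MergeContent output (filenames separated by a comma and a new line) into a set."""
    # Treat new lines like commas so either separator works, then drop blanks.
    parts = text.replace("\n", ",").split(",")
    return {name.strip() for name in parts if name.strip()}


def reconcile(ftp_names, hive_names):
    """Return the filenames present on one side but not the other."""
    ftp, hive = set(ftp_names), set(hive_names)
    return {
        "missing_in_hive": sorted(ftp - hive),     # listed on FTP, never loaded
        "unexpected_in_hive": sorted(hive - ftp),  # loaded, but not in the FTP listing
    }


# Sample data: the merged output shown above, plus a made-up Hive result.
merged = "940630588913985,\n940634934689001"
hive_rows = ["940630588913985"]

report = reconcile(parse_merged_listing(merged), hive_rows)
print(report)
```

An empty missing_in_hive list means every file listed on FTP made it into the Hive table for that day.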
Created 03-08-2018 03:19 PM
This worked great, thank you!