05-21-2018
06:45 AM
@Shu Thanks for the detailed answer. I have a few follow-up questions related to the above.

You mentioned routing based on the file name. However, I have a folder with nested subfolders containing many CSV files, and I don't know all of the file names in advance. Is there a way to create routes dynamically from the file names found during the fetch, instead of entering them manually?

Also, the schema sometimes changes over time: for example, a CSV file had 3 columns in 2011 and 4 columns in 2017. This raises two questions:

a. How can I detect the schema of all files that share the same file name? Say there are 9 files with 3 columns and 1 file with 4 columns. In that case the table should be created with 4 columns and all 10 files (9 + 1) loaded into it, with the 3-column files leaving empty the extra column that only exists in the 4-column file.

b. When appending new files, how can I make sure they match the existing table schema while tolerating small differences? For example, a column might be named "Username" in Hive but "user-name" in the file. I'd like a kind of dictionary-based mapping for this advanced loading, so that inserting the new files does not fail.

Regarding the Excel file, here is the file I am using as an example.

1. How do we ingest files that contain multiple tables with different headers in the same sheet?
2. Is there a way to create a table-detection processor that automatically detects the tables in an Excel sheet and outputs each one as an independent table for Hive input?
3. Is there a way to provide a configuration like the one below, to extract all tables in all Excel files and sheets that match the criteria and load each one as a table in a database?
A4::C16
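For questions about unknown file names and (a), the grouping and schema-merging logic can be sketched in plain Python outside NiFi. This is a minimal sketch, assuming every CSV has a header row and that files with the same name belong to the same table. In NiFi the name of each fetched file is carried in the `filename` attribute, so grouping by name here plays the same role as routing on `${filename}`. The function names are hypothetical.

```python
import csv
import os
from collections import defaultdict


def group_csv_by_name(root):
    """Walk `root` recursively and group CSV paths by file name (e.g. sales.csv)."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".csv"):
                groups[name].append(os.path.join(dirpath, name))
    return groups


def union_header(paths):
    """Return the ordered union of column names across all files in one group."""
    columns = []
    for path in paths:
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])
        for col in header:
            if col not in columns:
                columns.append(col)
    return columns


def read_aligned(paths, columns):
    """Yield rows as dicts keyed by the union schema; missing columns become None."""
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                yield {col: row.get(col) for col in columns}
```

With 9 three-column files and 1 four-column file named `sales.csv`, `union_header` returns 4 columns, and every row from the older files comes back with the fourth column set to `None`, which maps to NULL in the target table.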
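For (b), one way to tolerate "user-name" versus "Username" is to normalize header names and consult a mapping dictionary before inserting. This is a sketch under assumptions: `COLUMN_MAP` and its entries are hypothetical, and normalization simply lower-cases names and drops non-alphanumeric characters.

```python
import re

# Hypothetical mapping from a normalized file header to the Hive column name.
COLUMN_MAP = {"username": "Username", "emailaddress": "Email"}


def normalize(name):
    """Lower-case and keep only letters and digits: 'user-name' -> 'username'."""
    return re.sub(r"[^0-9a-z]", "", name.lower())


def map_header(header, table_columns, column_map=COLUMN_MAP):
    """Map each file header to a table column; raise if any header cannot be matched."""
    by_norm = {normalize(c): c for c in table_columns}
    mapped, unknown = [], []
    for col in header:
        key = normalize(col)
        # Explicit mapping wins; otherwise fall back to a normalized-name match.
        target = column_map.get(key) or by_norm.get(key)
        if target is None:
            unknown.append(col)
        else:
            mapped.append(target)
    if unknown:
        raise ValueError(f"columns not in table schema: {unknown}")
    return mapped
```

Raising on genuinely unknown columns is a design choice: the file can then be sent to a failure path for review instead of silently dropping data. An alternative would be to add the new column to the table first.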
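For questions 1 and 2, a simple table-detection heuristic is to split a sheet at fully blank rows and treat the first row of each block as its header. This is only a sketch: it assumes the sheet has already been read into a list of row lists (for example with openpyxl, `[list(r) for r in ws.iter_rows(values_only=True)]`) and that tables are stacked vertically with at least one empty row between them. It will not handle side-by-side tables or merged title cells.

```python
def is_blank(row):
    """A row counts as blank if every cell is None or whitespace-only text."""
    return all(c is None or (isinstance(c, str) and not c.strip()) for c in row)


def detect_tables(grid):
    """Split a sheet into tables at fully blank rows; the first row of each block is the header."""
    tables, block = [], []
    for row in grid + [[]]:  # trailing empty sentinel row flushes the last block
        if is_blank(row):
            if block:
                tables.append({"header": block[0], "rows": block[1:]})
                block = []
        else:
            block.append(row)
    return tables
```

Each detected table could then be written out as its own CSV or record set for a separate Hive table.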
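For question 3, a range such as `A4::C16` can be turned into row and column indexes and used to slice a sheet. This is a minimal sketch, assuming the `::` separator from the example above (a single `:` is accepted too), a sheet already loaded as a list of row lists, and that the first row of the range is the header.

```python
import re


def col_to_index(letters):
    """Convert Excel column letters to a 0-based index: 'A' -> 0, 'Z' -> 25, 'AA' -> 26."""
    idx = 0
    for ch in letters.upper():
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx - 1


def parse_range(spec):
    """Parse a range such as 'A4::C16' (or 'A4:C16') into 0-based (r0, c0, r1, c1)."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)::?([A-Za-z]+)(\d+)", spec.strip())
    if m is None:
        raise ValueError(f"bad range: {spec!r}")
    c0, r0, c1, r1 = m.groups()
    return int(r0) - 1, col_to_index(c0), int(r1) - 1, col_to_index(c1)


def extract(grid, spec):
    """Slice a sheet (list of row lists) to the range; the first row becomes the header."""
    r0, c0, r1, c1 = parse_range(spec)
    rows = [row[c0:c1 + 1] for row in grid[r0:r1 + 1]]
    return rows[0], rows[1:]
```

A configuration listing one range per file and sheet could then drive `extract` in a loop, with each result loaded as its own table.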