Member since 05-19-2018 | 2 Posts | 0 Kudos Received | 0 Solutions
05-21-2018 06:45 AM
@Shu Thanks for the detailed answer. I have a few more questions related to it.

You suggested routing based on file name. However, I have a folder with nested subfolders containing multiple CSV files, and I do not know all the file names in advance. Is there a way to create routes dynamically from the file names found during the fetch, instead of entering them manually?

Also, the schema sometimes changes over time. For example, in 2011 the CSV file had 3 columns and in 2017 it had 4. Two questions about this:

a. How can I detect the schema of all files that share the same file name? Say there are 9 files with 3 columns and 1 file with 4 columns. In that case we need to create the table with 4 columns and load all 10 files, with the extra column left empty for rows that come from the 3-column files.

b. When appending new files, how can I make sure they match the existing schema of the table while tolerating small naming differences? For example, a column might be named "Username" in Hive but "user-name" in the file. I am looking for a kind of dictionary mapping, so that inserting the new files does not fail.

With regard to Excel, here is the file that is taken as an example.

1. How do we ingest files that have multiple tables with different headers in the same sheet?
2. Is there a way to create a table-detection processor that automatically detects the tables in an Excel file and outputs each one as an independent table for Hive input?
3. Is there a way to provide a configuration like the one below to extract all tables, in all Excel files and sheets, that match the criteria, and load each as a table in some DB?
A4::C16
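
To make points a and b above concrete, here is a minimal sketch of the behavior I am after, written in plain Python rather than as a NiFi flow. The `ALIASES` table, the function names, and the sample data are all made up for illustration. It maps header variants such as "user-name" onto one column name, builds the union of columns across files, and fills the missing column with empty values.

```python
import csv
import io
import re

# Hypothetical alias table: normalized file header -> column name in Hive.
ALIASES = {"user_name": "username"}

def normalize(header):
    """Lower-case a header, replace non-alphanumerics with '_', then apply ALIASES."""
    key = re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")
    return ALIASES.get(key, key)

def read_table(text):
    """Parse CSV text into (normalized header, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text))
    header = [normalize(h) for h in reader.fieldnames]
    rows = [{normalize(k): v for k, v in row.items()} for row in reader]
    return header, rows

def union_columns(headers):
    """Return every column seen in any header, in first-seen order."""
    columns = []
    for header in headers:
        for name in header:
            if name not in columns:
                columns.append(name)
    return columns

def align(rows, columns):
    """Project rows onto the full column list; missing columns become None."""
    return [[row.get(name) for name in columns] for row in rows]

# Made-up sample data: an older 3-column file and a newer 4-column file.
old_file = "id,user-name,score\n1,ann,10\n"
new_file = "id,Username,score,country\n2,bob,20,IN\n"

tables = [read_table(old_file), read_table(new_file)]
columns = union_columns(header for header, _ in tables)
aligned = [r for _, rows in tables for r in align(rows, columns)]

print(columns)
print(aligned)
```

The open question is how to get the same effect inside NiFi, ideally without hard-coding the alias table per file.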
05-19-2018 01:22 PM
Infer Schema and Create Table in Hive from NiFi Node Based on Input File (e.g. CSV or MySQL)

Case 1: I have a CSV file, IRIS.csv, with a header row. The folder contains 100 such files (IRIS<NUMBER>.csv), and I need to ingest all of them (append) into one table in Hive. Currently I create the table in Hive manually. However, I need to create the table from the NiFi flow itself, so that I can parameterize the flow and later ingest data with a variety of schemas.

Case 2: Similarly, how can I ingest Excel data, infer the schema, and create the table in Hive dynamically?

Case 3: I have an Excel file that contains more than one table, each with a different schema. How can I use NiFi to detect the tables, identify their schemas, and create the tables in Hive?

Case 4: A folder contains multiple CSV files, some sharing a schema and some not. I want to run a flow that groups the CSV files with the same schema, creates a Hive table for each group, and appends the data to it.

Apologies for the long post. Let me know if I should post each item as a separate post.
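
To illustrate Case 1 and Case 4, here is a rough sketch of the logic in plain Python, outside NiFi. The file names, sample data, table names, and the simple type-guessing rule are all assumptions for illustration. It groups CSV files by their header row, guesses a Hive type per column, and builds one CREATE TABLE statement per group.

```python
import csv
import io
from collections import defaultdict

def infer_type(values):
    """Guess a Hive type: BIGINT if all values parse as int, DOUBLE if float, else STRING."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    for cast, hive_type in ((int, "BIGINT"), (float, "DOUBLE")):
        try:
            for v in non_empty:
                cast(v)
            return hive_type
        except ValueError:
            continue
    return "STRING"

def group_by_header(files):
    """Map each distinct header tuple to the (file name, data rows) pairs that share it."""
    groups = defaultdict(list)
    for name, text in files.items():
        rows = list(csv.reader(io.StringIO(text)))
        groups[tuple(rows[0])].append((name, rows[1:]))
    return groups

def infer_schema(header, members):
    """Infer one (column, type) list from the data rows of every file in a group."""
    data = [row for _, rows in members for row in rows]
    return [(col, infer_type([row[i] for row in data])) for i, col in enumerate(header)]

def create_table_ddl(table, schema):
    """Build a Hive CREATE TABLE statement for comma-delimited text files."""
    cols = ", ".join(f"`{col}` {hive_type}" for col, hive_type in schema)
    return (f"CREATE TABLE IF NOT EXISTS `{table}` ({cols}) "
            "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE")

# Made-up sample files standing in for the folder contents.
files = {
    "IRIS1.csv": "sepal_length,species\n5.1,setosa\n",
    "IRIS2.csv": "sepal_length,species\n4,versicolor\n",
    "other.csv": "id,name\n1,x\n",
}

groups = group_by_header(files)
ddl_by_header = {}
for n, (header, members) in enumerate(groups.items(), start=1):
    schema = infer_schema(header, members)
    ddl_by_header[header] = create_table_ddl(f"table_{n}", schema)  # hypothetical table names
    print([name for name, _ in members], ddl_by_header[header])
```

What I am looking for is the NiFi equivalent: the DDL generated and executed from within the flow, and each group of files appended to its own table.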
Labels: Apache NiFi