I have 5 source systems: 2 are RDBMS systems and 3 are files that need to be picked up from some server X. We do not have Kafka. I cannot use Flume, since it doesn't have DB connectors, and using it for file movement is not a good choice.
I can use NiFi, but my question is: do I need to create 5 workflows, one for each source system, and schedule them? Or does NiFi have something similar to a Kafka topic, where I can create 5 topics in a directory and the respective target systems can consume from those topics?
You need to create at least 2 workflows to pull data from the two kinds of source systems (one for the RDBMS sources and one for the files).
-> For the RDBMS sources, you can use two QueryDatabaseTable processors, one per database. Alternatively, with a connection pool lookup service (such as DBCPConnectionPoolLookup) you can use a single ExecuteSQL or GenerateTableFetch processor to get data from both RDBMS systems.
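-> For the files, use ListFile processors (one per server) and feed each connection into a FetchFile processor to fetch the files. Note that ListFile/FetchFile read from a local or network-mounted directory; if the files are only reachable on the remote server over SFTP, ListSFTP and FetchSFTP are the equivalent pair.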
-> When you fetch data from the RDBMS sources, the output flowfiles are in Avro format by default, unless you use the QueryDatabaseTableRecord processor, which lets you choose the output format through a record writer.
-> If you are processing text files, you can use a ConvertRecord processor and supply an Avro schema (for example, through the avro.schema attribute) to convert the text files into Avro or some other unified format in NiFi.
-> Once the data from the RDBMS sources and the files is in the same format, create a flow that processes it and stores it in HDFS, Hive, HBase, etc.
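To make the "same format" step a bit more concrete, here is a rough Python sketch of the idea, outside NiFi. It is not NiFi code and not how ConvertRecord is implemented; in NiFi you would get this by configuring a record reader and writer on the processor. The schema, the field names, and the helper functions `csv_to_records` and `rows_to_records` are all made up for illustration.

```python
import csv
import io
import json

# Hypothetical Avro-style schema; the record and field names are assumptions.
SCHEMA = {
    "type": "record",
    "name": "customer",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "balance", "type": "double"},
    ],
}

# Map Avro primitive type names to Python casts.
CASTS = {"int": int, "long": int, "float": float, "double": float, "string": str}


def csv_to_records(text, schema):
    """Parse CSV text with a header row and coerce each column to the schema type."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        record = {}
        for field in schema["fields"]:
            cast = CASTS[field["type"]]
            record[field["name"]] = cast(row[field["name"]])
        records.append(record)
    return records


def rows_to_records(rows, schema):
    """Coerce database-style row tuples (in schema field order) to the same shape."""
    names = [f["name"] for f in schema["fields"]]
    casts = [CASTS[f["type"]] for f in schema["fields"]]
    return [{n: c(v) for n, c, v in zip(names, casts, row)} for row in rows]


# Sample inputs standing in for a fetched file and a database result set.
file_text = "id,name,balance\n1,alice,10.5\n2,bob,3\n"
db_rows = [(3, "carol", 7.25)]

# Both sources end up as one list of records with identical fields and types.
unified = csv_to_records(file_text, SCHEMA) + rows_to_records(db_rows, SCHEMA)
print(json.dumps(unified))
```

The point is only that once both kinds of sources share one schema, everything downstream (the part of the flow that writes to HDFS/Hive/HBase) can be built once and reused.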
Thanks Shu, that makes sense. Let me check the feasibility, and if I run into any issues I'll let you know. Thanks again for taking the time to reply to my question.