
Data pipeline options

Hello Team,

I have 5 source systems: 2 are RDBMS systems and 3 are files that need to be picked up from some server X. We do not have Kafka. I cannot use Flume, since it doesn't have DB connectors, and using it for file movement is not a good choice.

I can use NiFi, but my question is: do I need to create 5 workflows, one for each source system, and schedule them? Or does NiFi have something similar to a Kafka topic, where I can create 5 topics in a directory and the respective target systems can consume from those topics?


Re: Data pipeline options

@Mr Anticipation

You need to create at least 2 workflows, one for each kind of source system (RDBMS and files).

-> For RDBMS, you can use two QueryDatabaseTable processors, one per RDBMS system. Alternatively, with a dynamic connection pool service, a single ExecuteSQL/GenerateTableFetch processor can pull data from both RDBMS systems.

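-> For files, use a ListFile processor (one per server) and feed its output connection into a FetchFile processor to fetch the files from the remote server. Note that ListFile/FetchFile read from a local or mounted file system; if server X is only reachable over SFTP, the ListSFTP/FetchSFTP pair works the same way.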

-> When you fetch data from RDBMS sources, the output flowfile format is Avro by default, unless you use the QueryDatabaseTableRecord processor, which lets you choose the output format through its record writer.

-> If you are processing text files, you can use the ConvertRecord processor and add an Avro schema attribute (for example avro.schema) to convert the text files into Avro (or some other unified format) in NiFi.

-> Once the data from the RDBMS and file sources is in the same format, create a flow that processes it and stores it in HDFS/Hive/HBase, etc.
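
To illustrate the "unified format" idea outside NiFi, here is a minimal Python sketch, not a NiFi flow. It assumes a hypothetical customer schema with two fields, uses an in-memory sqlite3 database as a stand-in for the real RDBMS, and treats the file source as delimited text with a header row. The function names are made up for this example. Both sources are normalized against the same Avro-style schema, which is roughly what the record reader/writer pair does for you in ConvertRecord.

```python
import csv
import io
import json
import sqlite3

# Hypothetical Avro-style schema (an assumption for this example). In NiFi a
# schema like this could be handed to a record reader/writer as JSON text.
SCHEMA = {
    "type": "record",
    "name": "customer",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
}

# Maps a few primitive Avro type names to Python casts.
CASTS = {"int": int, "long": int, "float": float, "double": float, "string": str}


def to_record(raw, schema=SCHEMA):
    """Keep only the schema's fields and cast each value to its declared type."""
    return {f["name"]: CASTS[f["type"]](raw[f["name"]]) for f in schema["fields"]}


def records_from_csv(text):
    """File source: parse delimited text that starts with a header row."""
    return [to_record(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_db(conn, table):
    """RDBMS source: sqlite3 stands in for the real database here.

    The table name is interpolated directly, so it must come from trusted code.
    """
    conn.row_factory = sqlite3.Row
    columns = ", ".join(f["name"] for f in SCHEMA["fields"])
    rows = conn.execute(f"SELECT {columns} FROM {table}")
    return [to_record(dict(row)) for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
    file_text = "id,name\n2,Grace\n"

    # Both sources now share one record shape and can feed one downstream flow.
    unified = records_from_db(conn, "customer") + records_from_csv(file_text)
    print(json.dumps(unified))
```

The point of the sketch is only that once every source is mapped to the same schema, the downstream part of the flow (processing and writing to HDFS/Hive/HBase) does not need to know where a record came from.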

Re: Data pipeline options

Thanks Shu, that makes sense. Let me check the feasibility, and if I run into any issues, I'll let you know. Thanks again for taking the time to reply to my question.