Data pipeline options

Hello Team,

I have 5 source systems: 2 are RDBMS systems and 3 are files that need to be picked up from some server X. We do not have Kafka. I cannot use Flume, since it doesn't have DB connectors, and using it for file movement is not a good choice.

I can use NiFi, but my question is: do I need to create 5 workflows, one for each source system, and schedule them? Or does NiFi have something similar to a Kafka topic, where I can create 5 topics in a directory and the respective target systems can consume from those topics?


Super Guru

@Mr Anticipation

You need to create at least 2 workflows to pull data from the two kinds of source systems (RDBMS and files).

-> For RDBMS, you can use the QueryDatabaseTable processor (2 QDT processors, one for each of the 2 RDBMS systems), or, by using a dynamic connection pool service, you can use a single ExecuteSQL/GenerateTableFetch processor to get data from both RDBMS systems.

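-> For files, use ListFile processors (one for each server), then feed the connection to a FetchFile processor to fetch the files. ListFile/FetchFile read from a filesystem the NiFi node can see (local or mounted); if the files sit on a remote server reachable over SFTP, the ListSFTP/FetchSFTP pair plays the same role.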

-> Once you fetch data from the RDBMS sources, Avro will be the output flowfile data format by default, unless you use the QueryDatabaseTableRecord processor, which lets you choose the output format through its record writer.

-> If you are processing text files, you can use the ConvertRecord processor and add an Avro schema attribute to convert the text files into Avro format (or some other unified format) in NiFi.

-> Once your data from the RDBMS and file sources is in the same format, create a flow that processes the data and stores it in HDFS/Hive/HBase, etc.
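
To make the "same format" idea in the steps above concrete, here is a minimal sketch outside NiFi, in plain Python. It is not NiFi configuration and not how the processors are implemented; it only shows database rows and CSV file rows being normalized into one shared record shape before any common downstream processing. The field names (id, name, source), the customers table, and the source labels are all hypothetical, and an in-memory SQLite database stands in for the RDBMS.

```python
import csv
import io
import sqlite3

# Hypothetical shared schema: every source is normalized to these fields.
FIELDS = ("id", "name", "source")


def records_from_db(conn, table, source):
    """Read rows from a database table and emit records in the shared shape."""
    # The table name is interpolated only because it is a fixed demo value.
    cursor = conn.execute(f"SELECT id, name FROM {table} ORDER BY id")
    return [{"id": int(r[0]), "name": str(r[1]), "source": source} for r in cursor]


def records_from_csv(text, source):
    """Parse CSV text with an id,name header into the same record shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(row["id"]), "name": row["name"].strip(), "source": source}
        for row in reader
    ]


def build_demo_records():
    """Combine one stand-in database source and one stand-in file source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    db_records = records_from_db(conn, "customers", "rdbms_1")
    conn.close()
    file_records = records_from_csv("id,name\n3, carol\n4,dave\n", "file_1")
    return db_records + file_records


if __name__ == "__main__":
    for record in build_demo_records():
        print(record)
```

Once every source produces the same record shape, a single downstream step can handle all of them, which is the reason for converting everything to Avro (or another unified format) before the final flow.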

Thanks Shu, that makes sense. Let me check out the feasibility, and in case I run into any issues, I'll let you know. Thanks again for taking the time to reply to my question.
