Data pipeline options

Hello Team,

I have 5 source systems: 2 are RDBMS systems and 3 are files that need to be picked up from a server X. We do not have Kafka. I cannot use Flume, since it has no DB connectors, and using it for file movement is not a good choice.

I can use NiFi, but my question is: do I need to create 5 workflows, one per source system, and schedule them? Or does NiFi have something similar to a Kafka topic, where I can create 5 topics in a directory and the respective target systems can consume from those topics?

2 REPLIES

Super Guru

@Mr Anticipation

You need to create at least 2 workflows to pull data from the different kinds of source systems (RDBMS and files).

-> For the RDBMS sources you can use the QueryDatabaseTable processor (one QDT processor per RDBMS system), or, by using a dynamic connection pool service (DBCPConnectionPoolLookup), a single GenerateTableFetch/ExecuteSQL flow to pull from both RDBMS systems.
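As a rough sketch, a QueryDatabaseTable processor for one RDBMS source could be configured like this (the connection pool name, table, and max-value column below are placeholders for your environment):

    QueryDatabaseTable
      Database Connection Pooling Service : MySQL-ConnectionPool   (your DBCPConnectionPool service)
      Database Type                       : MySQL                  (pick your vendor)
      Table Name                          : orders                 (hypothetical source table)
      Maximum-value Columns               : last_updated           (enables incremental pulls)
      Max Rows Per Flow File              : 10000

The processor keeps state on the highest last_updated value it has seen, so each scheduled run only fetches new or changed rows.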

-> For the files, use one ListFile processor per server (or ListSFTP if the directory is on a remote server), then feed its success connection to a FetchFile (or FetchSFTP) processor to fetch the files.
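For a remote server, a ListSFTP -> FetchSFTP pair could look like this (the hostname, user, and path are assumptions, not your real values):

    ListSFTP
      Hostname    : serverX.example.com     (hypothetical host)
      Port        : 22
      Username    : ingest
      Remote Path : /data/incoming          (hypothetical pickup directory)

    FetchSFTP
      Hostname            : serverX.example.com
      Remote File         : ${path}/${filename}   (attributes written by ListSFTP)
      Completion Strategy : Move File             (or Delete File, once fetched)

ListSFTP keeps listing state, so each file is picked up only once.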

-> Once you fetch data from the RDBMS sources, Avro is the output flowfile format by default, unless you use the QueryDatabaseTableRecord processor (which lets you pick the writer format).

-> If you are processing text files, you can use the ConvertRecord processor with an Avro schema to convert the text files into Avro (or some other unified format) in NiFi.
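A ConvertRecord setup for CSV-style text files might be (the reader/writer service names and the schema body are assumptions):

    ConvertRecord
      Record Reader : CSVReader              (controller service that parses the text files)
      Record Writer : AvroRecordSetWriter    (writes Avro, matching the RDBMS branch)

    CSVReader
      Schema Access Strategy : Use 'Schema Text' Property
      Schema Text            : {"type":"record","name":"orders", ...}   (your Avro schema)

With both branches emitting Avro, everything downstream can share one flow.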

-> Once the data from the RDBMS and file sources is in the same format, create a flow that processes the data and stores it in HDFS/Hive/HBase etc.
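For instance, landing everything in HDFS with a PutHDFS processor (the config file paths, target directory, and the source.system attribute are placeholders; you would set such an attribute yourself, e.g. with UpdateAttribute):

    PutHDFS
      Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
      Directory                      : /data/landing/${source.system}   (hypothetical per-source layout)
      Conflict Resolution Strategy   : replace

All five branches can feed this one processor, with the attribute keeping each source's data in its own directory.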

Thanks Shu, that makes sense. Let me check out the feasibility, and if I run into any issues, I'll let you know. Thanks again for taking the time to reply to my question.
