About malogest

malogest · ‎07-28-2016

Hello, I am currently using an Oozie workflow on my cluster, but would like to migrate to NiFi. My current workflow is as follows: Sqoop queries a DB2 server every 15 minutes, and the result is placed in a directory on HDFS. A Hive table points to that directory, and analysts query that table.

I was thinking of running NiFi on the namenode of the cluster, using the QueryDatabaseTable processor to get the data and PutHDFS to write it to HDFS. But what will happen if the QueryDatabaseTable processor gets a huge batch that uses up the CPU, memory, or disk of the namenode? Will this cause unexpected behavior because the datanodes can no longer communicate with the namenode, or their communication is delayed, which in turn stops or delays the whole cluster?

I have been experimenting with this setup in my sandbox, fetching a 1.1 GB batch. Ambari Metrics tells me that this uses 50% of the CPU. I'm aware of a new processor in the making, GenerateTableFetch. Would a good solution be to fetch the data in small portions using GenerateTableFetch, followed by ExecuteSQL and PutHDFS (on the namenode)?
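To make the "small portions" idea concrete, here is a minimal Python sketch of splitting one large fetch into fixed-size pages, each of which could be run as a separate query. This is only an illustration of the paging concept, not how GenerateTableFetch is actually implemented. The function name `page_queries`, the table and column names, and the generic `LIMIT ... OFFSET` syntax are all assumptions; the paging clause DB2 accepts may differ.

```python
def page_queries(table, order_column, total_rows, page_size):
    """Return one SELECT statement per page of at most page_size rows.

    Hypothetical helper: the SQL dialect here is generic and may need
    adjusting for the target database.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    queries = []
    # Step through the row range in page_size increments.
    for offset in range(0, total_rows, page_size):
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_column} "
            f"LIMIT {page_size} OFFSET {offset}"
        )
    return queries


if __name__ == "__main__":
    # Example: 10 rows split into pages of 4 gives three queries.
    for q in page_queries("orders", "id", 10, 4):
        print(q)
```

Each generated statement returns a bounded number of rows, so the size of any single result set stays limited regardless of how large the source table is.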

Last Visited	‎09-12-2016 09:05 AM

Member Since	‎07-28-2016 12:46 PM
Posts	3
Kudos received	4

Cloudera Community

Using NiFi to query RDBMS