Hello, I am currently using an Oozie workflow on my cluster, but
would like to migrate to NiFi. My current workflow is as follows: Sqoop
queries a DB2 server every 15 minutes, and the result is placed in a directory
on HDFS. A Hive table points to that directory, and analysts will make queries
to that table. I was thinking of running NiFi on the NameNode of the cluster,
using the QueryDatabaseTable processor to fetch the data and PutHDFS to write it out.
But what happens if the QueryDatabaseTable processor pulls a huge batch that
exhausts CPU, memory, or disk on the NameNode? Could that cause unexpected behavior,
with the DataNodes unable to reach the NameNode (or their communication delayed),
stalling or slowing the whole cluster? I have been experimenting with this setup in my sandbox, pulling a batch of 1.1 GB; Ambari Metrics shows that this uses 50% of the CPU. I am also aware of a new processor in the making: GenerateTableFetch.
Would a good solution be to fetch the data in small portions using GenerateTableFetch,
then ExecuteSQL and PutHDFS (still on the NameNode)?
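As I understand it, the idea behind GenerateTableFetch is to split one large result set into many small, bounded queries that ExecuteSQL then runs one at a time, so memory use stays roughly constant per batch. A rough sketch of that paging idea, assuming a generic LIMIT/OFFSET-style dialect (the table and column names here are made up, not from my actual setup):

```python
def page_queries(table, order_col, row_count, page_size):
    """Return one SQL statement per page of at most page_size rows.

    Instead of a single unbounded SELECT, each generated query fetches
    a small, fixed-size slice, keeping per-batch memory bounded.
    """
    queries = []
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_col} "
            f"LIMIT {page_size} OFFSET {offset}"
        )
    return queries

# Illustrative usage: a 1,000,000-row table fetched in 10,000-row pages
# yields 100 small queries rather than one 1.1 GB pull.
pages = page_queries("transactions", "id", 1_000_000, 10_000)
```

This is only meant to illustrate the partitioning concept; the exact SQL GenerateTableFetch emits (and whether DB2 needs FETCH FIRST syntax instead of LIMIT/OFFSET) would need checking against the processor's documentation.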