I am currently using an Oozie workflow on my cluster, but
would like to migrate to NiFi. My current workflow is as follows: Sqoop
queries a DB2 server every 15 minutes, and the result is placed in a directory
on HDFS. A Hive table points to that directory, and analysts will make queries
to that table.
I was thinking of running NiFi on the namenode of the cluster,
using the QueryDatabaseTable processor to get the data and PutHDFS to write it out.
But what will happen if the QueryDatabaseTable processor gets a huge batch that
uses up the CPU/memory/disk of the namenode? Will this result in unexpected behavior
because the datanodes won't be able to communicate with the namenode (or their
communication will be delayed), stalling or delaying the whole cluster? I have been experimenting with this setup in my sandbox, pulling a batch of 1.1 GB. Ambari Metrics tells me that this uses 50% of the CPU.
I'm aware of the new processor in the making, GenerateTableFetch.
Would a better solution be to fetch the data in small portions using GenerateTableFetch,
followed by ExecuteSQL and PutHDFS (on the namenode)?
If you are ingesting a lot of data, I would recommend running NiFi on a dedicated host, or at least on an edge node.
Also, if a single NiFi instance will ingest a lot of data, you can use GenerateTableFetch (coming in NiFi 1.0) to divide your import into several chunks and distribute them across several NiFi nodes. This processor generates several FlowFiles based on its Partition Size property, where each FlowFile contains a query that fetches one part of the data.
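To illustrate the idea behind that partitioning, here is a minimal Python sketch of how a table fetch can be split into per-partition queries by a partition size, similar in spirit to what GenerateTableFetch does. The table name, order column, and row count are illustrative assumptions, and the LIMIT/OFFSET syntax is just one pagination style (NiFi actually adapts the paging SQL to the target database):

```python
import math

def generate_fetch_queries(table, order_column, row_count, partition_size):
    """Return one SELECT statement per partition of the table.

    Each query pages through the table in partition_size chunks,
    ordered by order_column so the pages do not overlap.
    """
    num_partitions = math.ceil(row_count / partition_size)
    queries = []
    for i in range(num_partitions):
        offset = i * partition_size
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_column} "
            f"LIMIT {partition_size} OFFSET {offset}"
        )
    return queries

# Hypothetical example: 25,000 rows split into 10,000-row partitions
# yields three queries, which could each be run on a different node.
for q in generate_fetch_queries("mydb.mytable", "id", 25000, 10000):
    print(q)
```

Each generated query corresponds to one FlowFile's worth of work, so the heavy fetches can be spread across nodes instead of landing on a single host in one large batch.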