Support Questions


PutHiveQL and PutHiveStreaming processors in Apache NiFi are very slow

Explorer

I am using PutHiveQL to insert data into a Hive table, but it is very slow: roughly 2 to 3 seconds per row.

Is there a way to increase the insertion speed?

It took around 3 days to insert 15,000 rows!

Please find the PutHiveQL processor configuration below:

[Image: 17459-puthiveql.png]

Complete flow:

[Image: 17460-complete-flow.png]


1 ACCEPTED SOLUTION

Master Guru

What version of NiFi/HDF are you using? As of NiFi 1.2.0 / HDF 3.0.0, PutHiveQL can accept multiple statements in one flow file, so if you are currently dealing with one INSERT statement per flow file, try MergeContent to batch them up into a single flow file. This should increase performance, but since Hive is an auto-commit database, PutHiveQL is probably not the best choice for large/fast ingest needs. You may be better off putting the data into HDFS and creating/loading a table from it.
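As a sketch of what a merged flow file's content might look like for PutHiveQL, here are several statements in one flow file, separated by the processor's statement delimiter (`;` by default); the table and values are placeholders, not from the original thread:

```sql
-- One flow file, many statements: PutHiveQL (NiFi 1.2.0+) splits the content
-- on the Statement Delimiter property (';' by default) and runs each in turn.
-- 'my_table' and the values below are hypothetical examples.
INSERT INTO my_table VALUES (1, 'alpha');
INSERT INTO my_table VALUES (2, 'beta');
INSERT INTO my_table VALUES (3, 'gamma');
```

A MergeContent processor upstream can build such a file, e.g. with its Demarcator set to a `;` plus newline, so hundreds of single-statement flow files become one batch.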

For PutHiveStreaming, there is a known issue that can reduce performance; it was also fixed in NiFi 1.2.0 / HDF 3.0.0.


3 REPLIES


New Contributor

What is the most straightforward way to load data into Hive tables using NiFi? We use Hive 1.1 and have already ingested the data into HDFS as Avro files. @Matt Burgess

Master Guru

In an upcoming release you'll be able to use Hive 1.1 processors, so in your case you'd want to keep what you have (Avro in HDFS) and use PutHive_1_1QL to issue a LOAD DATA or CREATE EXTERNAL TABLE statement so Hive can see your data.
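As a rough sketch of the statements PutHive_1_1QL could issue for Avro files that are already in HDFS (the table name, HDFS paths, and schema URL below are placeholders, not from the original thread):

```sql
-- Sketch: expose Avro files already sitting in HDFS to Hive without
-- re-writing the data. Path, table name, and schema location are
-- hypothetical; adjust to your environment.
CREATE EXTERNAL TABLE my_avro_table
STORED AS AVRO
LOCATION '/data/landing/avro/'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/my_avro_table.avsc');

-- Alternatively, move the files into an existing Avro-backed table:
-- LOAD DATA INPATH '/data/landing/avro/' INTO TABLE my_avro_table;
```

With an external table, Hive reads the files in place, so there is no per-row insert cost at all; dropping the table later leaves the underlying HDFS files intact.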