
Apache NiFi 1.2 includes two processors for working with Hive through HiveQL: SelectHiveQL reads data from Hive, and PutHiveQL writes to it.


Configuring a HiveQL processor is simple: enter your query and pick either Avro or CSV as the output format. Avro is the better fit; I am waiting for ORC support. Most importantly, you need to set a connection pool to connect to your cluster.



You can enter the same SQL that you would normally run in Hive.



For Hive to work, you must set up a HiveConnectionPool Controller Service. After configuring it, enable the service, and then you can enable your processor(s). To connect to Hive on the Sandbox, set the Database Connection URL to jdbc:hive2://localhost:10000/default. For Hive Configuration Resources, list your Hive configuration files (such as hive-site.xml). Set the Database User and Password to those of a user that has the Hive access you require. See the HiveConnectionPool documentation for the remaining properties.
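As a quick sanity check on the Database Connection URL, here is a minimal Python sketch that builds and validates the simple host:port/database form. The helper names `hive_jdbc_url` and `is_valid_hive_url` are made up for this illustration and are not part of NiFi or Hive, and the pattern deliberately ignores longer URLs that carry extra session parameters.

```python
import re

# Matches only the simple form: jdbc:hive2://<host>:<port>/<database>
# (hypothetical helper for illustration; real URLs may carry extra parameters)
_HIVE_URL = re.compile(r"^jdbc:hive2://(?P<host>[^:/]+):(?P<port>\d+)/(?P<database>\w+)$")


def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL in the simple host:port/database form."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def is_valid_hive_url(url):
    """Return True if the URL matches the simple form above."""
    return _HIVE_URL.match(url) is not None


url = hive_jdbc_url("localhost")
print(url)                     # the Sandbox URL used in this article
print(is_valid_hive_url(url))  # True
# A stray "//" after "jdbc:" is an easy typo to make; the check rejects it.
print(is_valid_hive_url("jdbc://hive2://localhost:10000/default"))  # False
```

The resulting string is what goes into the Database Connection URL property of the HiveConnectionPool.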



For PutHiveQL, you just need to set a connection pool, a batch size for updates, and a character set. The defaults for these are fine.



CAVEAT: Once you have it set up, make sure every relationship is terminated somewhere, either in a sink or with auto-terminate.


Thanks for this helpful article. I'm a little bit confused about how PutHiveQL can receive a HQL DDL/DML command. Do you have an example of a flow in which a PutHiveQL process receives a command from another process?

Super Guru

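If your flow converts the data to ORC with ConvertAvroToORC and then does a PutHDFS, the flow file carries a hive.ddl attribute (set by ConvertAvroToORC) and an absolute.hdfs.path attribute (set by PutHDFS) that together can be used to create a Hive table.

You can also generate hive.ddl yourself with UpdateAttribute, using your own code. The statement to send to PutHiveQL is then: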

${hive.ddl} LOCATION '${absolute.hdfs.path}'
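To make the expression above concrete, here is a small Python sketch that mimics how the two attribute references expand into one statement. It is only a simulation for illustration: NiFi evaluates the expression itself, the `expand` helper is hypothetical, and the attribute values (table name, columns, path) are made-up examples rather than output from a real flow.

```python
import re


def expand(expression, attributes):
    """Replace each ${name} reference with the matching attribute value."""
    def lookup(match):
        # This sketch substitutes an empty string for a missing attribute.
        return attributes.get(match.group(1), "")
    return re.sub(r"\$\{([^}]+)\}", lookup, expression)


# Hypothetical flow file attributes, invented for this example.
attributes = {
    "hive.ddl": "CREATE EXTERNAL TABLE IF NOT EXISTS demo (id INT, name STRING) STORED AS ORC",
    "absolute.hdfs.path": "/tmp/demo",
}

statement = expand("${hive.ddl} LOCATION '${absolute.hdfs.path}'", attributes)
print(statement)
```

The printed string is the complete CREATE TABLE statement, with the LOCATION clause appended to the partial DDL.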

Last update: 08-17-2019 11:18 AM