Created on 07-19-2016 12:26 AM - edited 08-17-2019 11:18 AM
In Apache NiFi 1.2, there are processors for reading Hive data via HiveQL and storing data to Hive via HiveQL: SelectHiveQL and PutHiveQL.
Configuring a SelectHiveQL processor is simple: enter your query and pick either Avro or CSV as the output format. Avro is the better fit; I am waiting for ORC support. Most importantly, you need to set a Connection Pool to connect to your cluster.
You can enter any regular SQL query that you would run in Hive.
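As a sketch, the processor's HiveQL Select Query property just holds an ordinary HiveQL statement. The table and column names below are hypothetical, not from the article:

```sql
-- Example query for the SelectHiveQL processor's "HiveQL Select Query" property.
-- Table web_logs and its columns are illustrative only.
SELECT ip_address, request_url, status_code
FROM web_logs
WHERE status_code >= 400
LIMIT 1000;
```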
For Hive to work, you must set up a HiveConnectionPool controller service. After configuring it, you need to enable it, and then you can enable your processor(s). For connecting to Hive on the Sandbox, set the Database Connection URL to jdbc:hive2://localhost:10000/default. For Hive Configuration Resources, point to your Hive configuration files. You can set the Database User and Password of a user with the access you require for Hive. See the HiveConnectionPool documentation for more details.
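Putting the Sandbox settings above together, the HiveConnectionPool properties might look like the following. The user name and configuration file path are example values, not from the article:

```
Database Connection URL       jdbc:hive2://localhost:10000/default
Hive Configuration Resources  /etc/hive/conf/hive-site.xml
Database User                 maria_dev
Password                      ********
```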
For PutHiveQL, you just need to set a connection pool, a batch size for updates, and a character set. The defaults are fine.
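PutHiveQL executes the HiveQL statement it finds in each incoming FlowFile's content. As an illustrative sketch (the table name and values are hypothetical), a FlowFile whose body is the statement below would be run as DML against the configured connection pool:

```sql
-- Example FlowFile content for PutHiveQL: the processor reads the
-- statement from the FlowFile body and executes it against Hive.
INSERT INTO TABLE web_logs
VALUES ('10.0.0.1', '/index.html', 200);
```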
CAVEAT: Once the processor is configured, make sure all of its relationships are terminated somewhere, either routed to a downstream processor or auto-terminated.
Thanks for this helpful article. I'm a little confused about how PutHiveQL can receive an HQL DDL/DML command. Do you have an example of a flow in which a PutHiveQL processor receives a command from another processor?