Master Guru

Apache NiFi 1.2 has processors for reading Hive data via HiveQL and for storing data to Hive via HiveQL: SelectHiveQL and PutHiveQL.

5847-hiveql.png

Configuring a HiveQL processor is simple: for SelectHiveQL, enter your query and pick either Avro or CSV as the output format. Avro is the better fit; I am waiting for ORC. Most importantly, you need to set a connection pool to connect to your cluster.

5841-selecthiveql.png

5846-hiveql5.png

You can just enter the same SQL that you would run in Hive.

5844-hiveql3.png

5845-hiveql4.png

For Hive to work, you must set up a HiveConnectionPool Controller Service. After configuring it, enable the service, and then you can start your processor(s). To connect to Hive on the Sandbox, set the Database Connection URL to jdbc:hive2://localhost:10000/default. For Hive Configuration Resources, list your Hive configuration files (for example, hive-site.xml). Set the Database User and Password to a user that has the access you require in Hive. See the HiveConnectionPool documentation for details.
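The HiveServer2 JDBC scheme is `jdbc:hive2://`, followed by host, port, and database. As a minimal sketch, the Python below assembles the Sandbox URL and sanity-checks its shape. The helper names `hive_jdbc_url` and `looks_like_hive_url` are hypothetical and not part of NiFi or Hive. The check is deliberately simplistic: it ignores session parameters and multi-host URLs.

```python
import re

# Hypothetical helper (not a NiFi or Hive API): assemble a HiveServer2 JDBC URL.
def hive_jdbc_url(host="localhost", port=10000, database="default"):
    return f"jdbc:hive2://{host}:{port}/{database}"

# Simplistic shape check for the basic single-host form only.
HIVE_URL_PATTERN = re.compile(r"^jdbc:hive2://[^/:]+:\d+/\w+$")

def looks_like_hive_url(url):
    return HIVE_URL_PATTERN.match(url) is not None

url = hive_jdbc_url()
print(url)
print(looks_like_hive_url(url))
```

A URL with an extra `//` after `jdbc:` fails this check, which is an easy typo to make when filling in the Database Connection URL by hand.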

5843-hiveconnectionpool2.png

5842-hiveconnectionpool.png

For PutHiveQL, you just need to set a connection pool, a batch size for updates, and a character set. The defaults are fine.
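To illustrate what a batch size means in general, here is a rough Python sketch that groups statements into fixed-size chunks. This is not how NiFi is implemented. The `batches` helper, the sample statements, and the value 100 are assumptions chosen only for illustration.

```python
# Hypothetical illustration of batching; not NiFi code.
def batches(statements, batch_size=100):
    """Yield consecutive slices of at most batch_size statements."""
    for start in range(0, len(statements), batch_size):
        yield statements[start:start + batch_size]

# Made-up statements standing in for incoming FlowFiles.
statements = [f"INSERT INTO example VALUES ({i})" for i in range(250)]
sizes = [len(batch) for batch in batches(statements)]
print(sizes)
```

With 250 statements and a batch size of 100, the last batch is smaller than the others.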

5840-puthiveql2.png

5839-puthiveql.png

CAVEAT: Once you have it set up, make sure every relationship is terminated somewhere, either in a sink or with auto-terminate.

Comments

Thanks for this helpful article. I'm a little bit confused about how PutHiveQL can receive a HQL DDL/DML command. Do you have an example of a flow in which a PutHiveQL process receives a command from another process?

Master Guru

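PutHiveQL executes the HiveQL statement held in the content of the incoming FlowFile. If you convert your data to ORC with ConvertAvroToORC and then do a PutHDFS, the FlowFile carries a hive.ddl attribute (written by ConvertAvroToORC) that can be used to create a Hive table, plus an absolute.hdfs.path attribute from PutHDFS.

You can also set hive.ddl yourself with UpdateAttribute, using your own DDL.

Either way, the statement to send to PutHiveQL (placed in the FlowFile content, for example with ReplaceText) is:

${hive.ddl} LOCATION '${absolute.hdfs.path}'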
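To show what that expression turns into, here is a small Python sketch that imitates plain `${attribute}` substitution. It is only a stand-in for NiFi Expression Language, which supports far more than this. The `expand` helper and both attribute values (table definition and HDFS path) are made up for illustration.

```python
import re

# Hypothetical stand-in for simple ${name} attribute substitution; not NiFi code.
def expand(template, attributes):
    """Replace each ${name} in template with the matching attribute value."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes[m.group(1)], template)

# Made-up FlowFile attributes for illustration only.
attributes = {
    "hive.ddl": "CREATE EXTERNAL TABLE IF NOT EXISTS example (id INT, name STRING) STORED AS ORC",
    "absolute.hdfs.path": "/tmp/example",
}

statement = expand("${hive.ddl} LOCATION '${absolute.hdfs.path}'", attributes)
print(statement)
```

The resulting string is a complete DDL statement, which is the kind of content PutHiveQL expects to receive.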