Created on 07-19-2016 12:26 AM - edited 08-17-2019 11:18 AM
In Apache NiFi 1.2, there are processors for reading Hive data via HiveQL and writing to Hive via HiveQL: SelectHiveQL and PutHiveQL.
Configuring a SelectHiveQL processor is simple: enter your query and pick either Avro or CSV as the output format. Avro is usually the better fit; ORC output is not yet available. Most importantly, you need to set a connection pool so the processor can connect to your cluster.
You can enter any regular SQL query that you would normally run in Hive.
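For example, a query you might put in SelectHiveQL could look like the following (the table and column names here are hypothetical):

```sql
-- Hypothetical example query for the SelectHiveQL processor;
-- replace the table and column names with your own.
SELECT customer_id, total_amount
FROM sales
WHERE sale_date >= '2016-01-01'
LIMIT 100;
```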
For either processor to work, you must set up a HiveConnectionPool controller service. After configuring it, enable the controller service, and then you can enable your processor(s). To connect to Hive on the Sandbox, set the Database Connection URL to jdbc:hive2://localhost:10000/default. For Hive Configuration Resources, point to your Hive configuration files. Set the Database User and Password to a user that has the access you require in Hive. See the HiveConnectionPool documentation for more details.
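As a sketch, a HiveConnectionPool for the Sandbox might end up with property values like these (the configuration file path and user are assumptions; adjust for your environment):

```
Database Connection URL        jdbc:hive2://localhost:10000/default
Hive Configuration Resources   /etc/hive/conf/hive-site.xml
Database User                  hive
Password                       ********
```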
For PutHiveQL, you just need to set a connection pool, a batch size for updates, and a character set; the defaults are fine.
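PutHiveQL executes the HiveQL statement carried in the incoming flowfile's content, so a flowfile routed to it might contain DDL or DML such as the following (table and column names are hypothetical):

```sql
-- Hypothetical statement arriving as flowfile content for PutHiveQL
CREATE TABLE IF NOT EXISTS sales (customer_id INT, total_amount DOUBLE)
STORED AS ORC;
```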
CAVEAT: once everything is set up, make sure all relationships are terminated somewhere, either routed to a downstream processor (sink) or auto-terminated.
Created on 09-01-2016 03:03 AM
Thanks for this helpful article. I'm a little bit confused about how PutHiveQL can receive a HQL DDL/DML command. Do you have an example of a flow in which a PutHiveQL process receives a command from another process?
Created on 01-24-2018 07:56 PM
If you do a PutHDFS, it generates an attribute hive.ddl that can be used to create a Hive table. You can also generate hive.ddl yourself with UpdateAttribute, then build the full statement with:

${hive.ddl} LOCATION '${absolute.hdfs.path}'
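To illustrate with purely hypothetical attribute values: if hive.ddl held a CREATE EXTERNAL TABLE statement and absolute.hdfs.path were /user/nifi/sales, the expression above would expand to something like:

```sql
-- Hypothetical expansion of ${hive.ddl} LOCATION '${absolute.hdfs.path}'
CREATE EXTERNAL TABLE IF NOT EXISTS sales (customer_id INT, total_amount DOUBLE)
STORED AS ORC
LOCATION '/user/nifi/sales'
```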