Created on 07-19-2016 12:26 AM - edited 08-17-2019 11:18 AM
In Apache NiFi 1.2, there are processors for reading Hive data via HiveQL and writing to Hive via HiveQL: SelectHiveQL and PutHiveQL.
Configuring a SelectHiveQL processor is simple: enter your query and pick either Avro or CSV as the output format. Avro is usually the better fit; ORC output is not yet available. Most importantly, you need to set a connection pool so the processor can connect to your cluster.
You can enter any regular SQL query that you would normally run in Hive.
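For example, a query you might put in SelectHiveQL could look like the following (the table and column names here are hypothetical):

```sql
-- Hypothetical example query for the SelectHiveQL processor;
-- replace the table and column names with your own.
SELECT customer_id, total_amount
FROM sales
WHERE sale_date >= '2016-01-01'
LIMIT 100;
```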
For either processor to work, you must set up a HiveConnectionPool controller service. After configuring it, enable the controller service, and then you can enable your processor(s). To connect to Hive on the Sandbox, set the Database Connection URL to jdbc:hive2://localhost:10000/default. For Hive Configuration Resources, point to your Hive configuration files. Set the Database User and Password to a user that has the access you require in Hive. See the HiveConnectionPool documentation for more details.
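As a sketch, a HiveConnectionPool for the Sandbox might end up with property values like these (the configuration file path and user are assumptions; adjust for your environment):

```
Database Connection URL        jdbc:hive2://localhost:10000/default
Hive Configuration Resources   /etc/hive/conf/hive-site.xml
Database User                  hive
Password                       ********
```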
For PutHiveQL, you just need to set a connection pool, a batch size for updates, and a character set; the defaults are fine.
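PutHiveQL executes the HiveQL statement carried in the incoming flowfile's content, so a flowfile routed to it might contain DDL or DML such as the following (table and column names are hypothetical):

```sql
-- Hypothetical statement arriving as flowfile content for PutHiveQL
CREATE TABLE IF NOT EXISTS sales (customer_id INT, total_amount DOUBLE)
STORED AS ORC;
```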
CAVEAT: once everything is set up, make sure all relationships are terminated somewhere, either routed to a downstream processor (sink) or auto-terminated.
Created on 09-01-2016 03:03 AM
Thanks for this helpful article. I'm a little bit confused about how PutHiveQL can receive a HQL DDL/DML command. Do you have an example of a flow in which a PutHiveQL process receives a command from another process?
Created on 01-24-2018 07:56 PM
If you do a PutHDFS, it generates an attribute hive.ddl that can be used to create a Hive table. You can also generate hive.ddl yourself with UpdateAttribute, then build the full statement with:

${hive.ddl} LOCATION '${absolute.hdfs.path}'
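To illustrate with purely hypothetical attribute values: if hive.ddl held a CREATE EXTERNAL TABLE statement and absolute.hdfs.path were /user/nifi/sales, the expression above would expand to something like:

```sql
-- Hypothetical expansion of ${hive.ddl} LOCATION '${absolute.hdfs.path}'
CREATE EXTERNAL TABLE IF NOT EXISTS sales (customer_id INT, total_amount DOUBLE)
STORED AS ORC
LOCATION '/user/nifi/sales'
```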