Created 09-01-2017 06:22 PM
I'm using NiFi 1.2.0 and I'm trying to load CSV data into a Hive table.
My flow looks like:
GetHDFS (get CSV files from HDFS) -> UpdateAttribute (set the schema.name attribute) -> QueryRecord (select all columns from the CSV and add an additional column "loaded_ts" - the Hive table is partitioned on this field; a sketch of the query is below) -> ConvertCSVToAvro (required by the next HiveStreaming processor) -> PutHiveStreaming
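For context, the QueryRecord query is roughly the following (a sketch only - the column names come from the DDL below, and the literal date is just an illustration of how loaded_ts gets filled in; QueryRecord runs the SQL against the incoming flowfile, exposed as the FLOWFILE table):

SELECT id, company, city, state, country, '2017-09-01' AS loaded_ts FROM FLOWFILE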
1. When I create a non-partitioned table in Hive, everything works and the data is loaded into the Hive table:
CREATE TABLE `default.nifi_stream_table`( `id` string, `company` string, `city` string, `state` string, `country` string, `loaded_ts` string) CLUSTERED BY (id) INTO 16 BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');
2. When I create a partitioned table in Hive, the data seems to stream through the PutHiveStreaming processor without any errors, and I can see in the Hive warehouse on HDFS that buckets have been created with data, but "select * from default.nifi_stream_table" fetches nothing (the on-disk layout I see is sketched after the DDL):
CREATE TABLE `default.nifi_stream_table`( `id` string, `company` string, `city` string, `state` string, `country` string) PARTITIONED BY (`loaded_ts` string) CLUSTERED BY (id) INTO 16 BUCKETS STORED AS ORC TBLPROPERTIES('transactional'='true');
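To make the symptom concrete, the partition directory on HDFS looks roughly like this (the warehouse root, transaction ids and partition value are assumptions for illustration, not copied from my cluster):

/apps/hive/warehouse/nifi_stream_table/loaded_ts=2017-09-01/delta_0000001_0000001/bucket_00000
/apps/hive/warehouse/nifi_stream_table/loaded_ts=2017-09-01/delta_0000001_0000001/bucket_00001
...

yet the select above still returns zero rows and no error.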
In the NiFi PutHiveStreaming processor I've tried all combinations of these two properties (an example configuration is sketched after the list):
Partition Columns: no value set / loaded_ts
Auto-Create Partitions: false / true
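For completeness, one of the PutHiveStreaming configurations I tried looks roughly like this (the metastore host is a placeholder, and everything except the last two properties reflects a typical setup rather than an exact copy of my processor):

Hive Metastore URI: thrift://<metastore-host>:9083
Database Name: default
Table Name: nifi_stream_table
Partition Columns: loaded_ts
Auto-Create Partitions: true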
Any ideas what I'm doing wrong?