NIFI HiveStreaming to partitioned bucketed table

I'm using NiFi 1.2.0 and trying to load CSV data into a Hive table.

My flow looks like this:

GetHDFS (gets CSV files from HDFS) -> UpdateAttribute (sets the schema.name attribute) -> QueryRecord (selects all columns from the CSV and adds an extra column, "loaded_ts"; the partitioned Hive table is partitioned on this field) -> ConvertCSVToAvro (required, because the next processor, PutHiveStreaming, expects Avro input) -> PutHiveStreaming

1. When I create a non-partitioned table in Hive, everything works and the data is loaded into the table:

CREATE TABLE `default.nifi_stream_table`(
  `id` string,  
  `company` string, 
  `city` string, 
  `state` string, 
  `country` string,  
  `loaded_ts` string)
CLUSTERED BY (id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES('transactional'='true');

2. When I create a partitioned table in Hive, the data appears to pass through the PutHiveStreaming processor without any errors, and in the Hive warehouse on HDFS I can see that bucket files containing data have been created. However, "select * from default.nifi_stream_table" returns nothing.

CREATE TABLE `default.nifi_stream_table`(
  `id` string,  
  `company` string, 
  `city` string, 
  `state` string, 
  `country` string)  
PARTITIONED BY (`loaded_ts` string)
CLUSTERED BY (id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES('transactional'='true');

In the PutHiveStreaming processor I've tried every combination of these two properties:

Partition Columns: no value set / loaded_ts

Auto-Create Partitions: false / true

Any ideas what I'm doing wrong?
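For anyone reproducing this, a few standard HiveQL statements can help narrow down whether the streamed rows ever became visible to Hive. This is a hedged diagnostic sketch, not a fix, and assumes a Hive version with ACID support (0.13 or later) and a session already configured for transactional tables. SHOW PARTITIONS shows whether the metastore registered any loaded_ts partitions, DESCRIBE FORMATTED shows the table location to compare against the HDFS directories, SHOW TRANSACTIONS lists open and aborted transactions, and SHOW COMPACTIONS lists compactions that are running or scheduled.

```sql
-- Diagnostic sketch only: none of these statements change the table.

-- Has the metastore registered any loaded_ts partitions for the table?
SHOW PARTITIONS default.nifi_stream_table;

-- Table location, bucketing and TBLPROPERTIES, to compare with the HDFS directories seen
DESCRIBE FORMATTED default.nifi_stream_table;

-- Open and aborted transactions (rows written by an aborted transaction are not visible to readers)
SHOW TRANSACTIONS;

-- Compactions currently running or scheduled
SHOW COMPACTIONS;
```

If SHOW PARTITIONS returns nothing while data files exist under the table directory, the files were written without a matching partition entry in the metastore, which would be consistent with the empty SELECT.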
