
Best Practices for Hive Insert using Nifi



Hi,

Using the ConvertCSVToAvro processor, I was able to convert CSV to Avro successfully. Now I need to insert this Avro data into an existing Hive table using NiFi, but I am stuck here. How can I do this?

Currently I am using the statement below to create an external Hive table over the Avro files:

CREATE EXTERNAL TABLE avroTEST
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/test/csvData/AVRO'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/newavro.schema');

Also, please let me know if there is a more optimal way of doing this.


Re: Best Practices for Hive Insert using Nifi

You can use MergeContent with a Merge Strategy of "Avro" and a Max Bin Size equal to (some multiple of) your HDFS block size, then PutHDFS to place the merged Avro file(s) into the location above (/user/test/csvData/AVRO). Then you should be able to query the table from Hive.
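Since the table above is external and non-partitioned, new files landing in its LOCATION become visible on the next query with no extra DDL; a quick sanity check (using the table name from the DDL above):

```sql
-- After PutHDFS writes the merged Avro files into /user/test/csvData/AVRO,
-- confirm Hive can read them through the external table
SELECT COUNT(*) FROM avroTEST;
```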

Alternatively, if you can configure your Hive server according to these requirements, create your table backed by ORC instead of Avro, and set

TBLPROPERTIES ('transactional'='true')

(see the link for more info), then you can use PutHiveStreaming to send your Avro flow files to Hive.
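A minimal sketch of what a streaming-ready table might look like (the table name, columns, and bucket count here are illustrative, not from the original post; Hive streaming ingest requires the table to be bucketed, stored as ORC, and transactional):

```sql
-- Illustrative DDL for use with PutHiveStreaming.
-- Column names and the bucket count are assumptions; adapt to your schema.
CREATE TABLE avro_stream_test (
  id INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS   -- streaming requires a bucketed table
STORED AS ORC                      -- streaming requires ORC storage
TBLPROPERTIES ('transactional'='true');
```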