How to use PutHiveStreaming

Expert Contributor

I want to move data from HDFS to Hive using NiFi's PutHiveStreaming processor. Does anyone have an example?

1 ACCEPTED SOLUTION

Contributor

Hi

Hive Streaming is supported only for tables that meet the following requirements:

  1. ORC is currently the only supported format, so the table must be declared "stored as orc".
  2. The table must be transactional: set "transactional" = "true" in the table properties of the CREATE TABLE statement.
  3. The table must be bucketed but not sorted, so it needs a "clustered by (colName) into (n) buckets" clause.
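As a sketch of what such a table definition looks like, the helper below assembles a CREATE TABLE statement that satisfies all three requirements. The function name, table name, columns, bucket column and bucket count are hypothetical placeholders, not anything PutHiveStreaming or Hive provides; adjust them for your data and run the resulting DDL in Hive yourself.

```python
def build_streaming_ddl(table, columns, bucket_col, num_buckets):
    """Return Hive DDL for a table usable by Hive Streaming (hypothetical helper).

    columns: list of (name, hive_type) pairs, e.g. [("id", "INT")].
    """
    if bucket_col not in {name for name, _ in columns}:
        raise ValueError(f"bucket column {bucket_col!r} is not a table column")
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        # Bucketed but NOT sorted: deliberately no SORTED BY clause.
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS\n"
        # ORC is the only storage format Hive Streaming supports.
        "STORED AS ORC\n"
        # Streaming ingest writes into an ACID (transactional) table.
        "TBLPROPERTIES ('transactional'='true')"
    )


if __name__ == "__main__":
    # "events" and its columns are made-up examples.
    print(build_streaming_ddl(
        "events",
        [("id", "INT"), ("name", "STRING"), ("ts", "STRING")],
        bucket_col="id",
        num_buckets=4,
    ))
```

If you also need partitions, Hive expects a PARTITIONED BY clause before CLUSTERED BY; the sketch leaves that out to keep it minimal.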

Hive itself must also have the following properties set:

  1. hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
  2. hive.compactor.initiator.on = true
  3. hive.compactor.worker.threads > 0
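To sanity-check a cluster against the three settings above, a small script can read hive-site.xml and report anything missing. This is a sketch that assumes you have a copy of hive-site.xml in the standard Hadoop `<configuration>`/`<property>` layout; the function names are hypothetical, and the script only inspects the file, it does not connect to Hive or see settings applied elsewhere.

```python
import xml.etree.ElementTree as ET

REQUIRED_TXN_MANAGER = "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"


def parse_hadoop_config(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


def check_streaming_settings(props):
    """Return a list of problems with the streaming-related settings (empty if OK)."""
    problems = []
    if props.get("hive.txn.manager") != REQUIRED_TXN_MANAGER:
        problems.append(f"hive.txn.manager must be {REQUIRED_TXN_MANAGER}")
    if props.get("hive.compactor.initiator.on", "").lower() != "true":
        problems.append("hive.compactor.initiator.on must be true")
    try:
        threads = int(props.get("hive.compactor.worker.threads", "0"))
    except ValueError:
        threads = 0  # treat a non-numeric value as not configured
    if threads <= 0:
        problems.append("hive.compactor.worker.threads must be greater than 0")
    return problems


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python check_hive.py /etc/hive/conf/hive-site.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        for problem in check_streaming_settings(parse_hadoop_config(f.read())):
            print(problem)
```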

For details, see the Hive Streaming Data Ingest documentation: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

PutHiveStreaming requires the incoming data to be in Avro format, as described in the processor documentation:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hive.PutHiveStreaming/

Assuming your data is JSON, you can build a NiFi flow as follows:

ListHDFS --> FetchHDFS --> ConvertJSONToAvro --> PutHiveStreaming
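ConvertJSONToAvro needs an Avro record schema describing the incoming JSON. If you don't have one yet, a rough starting point can be derived from a sample record. The sketch below is a hypothetical helper, not part of NiFi: it handles only flat JSON objects with primitive values, makes every field nullable, and guesses "string" for fields whose sample value is null. Review the output before pasting it into the processor, keep the field names consistent with your Hive column names, and adjust types (for example long vs. int) to match the table.

```python
import json
import re

# Avro names must start with a letter or underscore, then letters, digits or underscores.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def avro_type_for(value):
    """Map a value produced by json.loads to an Avro primitive type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported JSON value for this sketch: {value!r}")


def infer_avro_schema(sample_json, record_name="HiveRecord"):
    """Build a flat Avro record schema (as a dict) from one sample JSON object."""
    record = json.loads(sample_json)
    if not isinstance(record, dict):
        raise TypeError("expected a JSON object")
    fields = []
    for name, value in record.items():
        if not AVRO_NAME.match(name):
            raise ValueError(f"{name!r} is not a valid Avro field name")
        avro_type = avro_type_for(value)
        # Nullable union with "null" first, so a default of null is valid Avro.
        base = "string" if avro_type == "null" else avro_type
        fields.append({"name": name, "type": ["null", base], "default": None})
    return {"type": "record", "name": record_name, "fields": fields}


if __name__ == "__main__":
    sample = '{"id": 1, "name": "alice", "active": true}'
    print(json.dumps(infer_avro_schema(sample), indent=2))
```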

Let me know if this helps.

Master Guru

Great answer! One caveat: if you are using HDF 2.0 with HDP 2.5, please see the following: https://community.hortonworks.com/questions/59681/puthivestreaming-nifi-processor-various-errors.htm...