how to use puthivestreaming

Contributor

I want to move data from HDFS to Hive using NiFi's PutHiveStreaming processor. Does anyone have an example?

Accepted solution

Re: how to use puthivestreaming

Explorer

Hi

Hive streaming is supported only for tables that meet the following requirements:

  1. ORC is currently the only supported storage format, so the table must be created with "stored as orc".
  2. The table must be transactional: set 'transactional'='true' in the TBLPROPERTIES of the CREATE TABLE statement.
  3. The table must be bucketed but not sorted, so it must include "clustered by (colName) into (n) buckets" and no "sorted by" clause.
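As a minimal sketch, the three table requirements can be captured in a small Python helper that emits a matching CREATE TABLE statement. The function name, the table name `events` and its columns are hypothetical; check the generated HiveQL against your own Hive version before running it.

```python
def build_streaming_table_ddl(table, columns, bucket_column, num_buckets):
    """Return a CREATE TABLE statement for a Hive streaming target table.

    Hypothetical helper: `columns` is a list of (name, hive_type) pairs.
    The result is stored as ORC, transactional, and bucketed but not sorted.
    """
    if bucket_column not in dict(columns):
        raise ValueError(f"bucket column {bucket_column!r} is not a table column")
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        # Bucketed, deliberately without a SORTED BY clause.
        f"CLUSTERED BY ({bucket_column}) INTO {num_buckets} BUCKETS\n"
        # ORC is the only storage format Hive streaming accepts.
        "STORED AS ORC\n"
        # Streaming writes go through Hive ACID transactions.
        "TBLPROPERTIES ('transactional'='true')"
    )


if __name__ == "__main__":
    print(build_streaming_table_ddl("events", [("id", "INT"), ("name", "STRING")], "id", 4))
```

The helper refuses a bucket column that is not part of the table, since Hive would reject such a CLUSTERED BY clause anyway.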

Hive itself must also have the following properties set:

  1. hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
  2. hive.compactor.initiator.on = true
  3. hive.compactor.worker.threads > 0
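A quick way to sanity-check these settings is to compare them with the values the cluster actually uses. The sketch below is hypothetical: it assumes the properties have already been read into a dict of strings (for example from hive-site.xml). The Hive transactions documentation also lists hive.support.concurrency = true as required when DbTxnManager is used, so the sketch checks that too.

```python
REQUIRED_TXN_MANAGER = "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"


def missing_streaming_settings(conf):
    """Return a list of problems with the Hive settings needed for streaming.

    `conf` maps property names to string values, as they would appear in
    hive-site.xml. An empty list means every checked setting looks right.
    """
    problems = []
    if conf.get("hive.txn.manager") != REQUIRED_TXN_MANAGER:
        problems.append("hive.txn.manager must be " + REQUIRED_TXN_MANAGER)
    if conf.get("hive.support.concurrency", "").lower() != "true":
        problems.append("hive.support.concurrency must be true")
    if conf.get("hive.compactor.initiator.on", "").lower() != "true":
        problems.append("hive.compactor.initiator.on must be true")
    try:
        threads = int(conf.get("hive.compactor.worker.threads", "0"))
    except ValueError:
        threads = 0  # A non-numeric value counts as "not configured".
    if threads <= 0:
        problems.append("hive.compactor.worker.threads must be > 0")
    return problems
```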

For details, see the Hive Streaming Data Ingest documentation: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

PutHiveStreaming expects its incoming data to be in Avro format, as described in the processor documentation:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hive.PutHiveStreaming/
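Because the input must be Avro, you need an Avro record schema that lines up with the target table. A minimal sketch, assuming the Avro field names should match the Hive column names and that every column may be null, derives such a schema from the same kind of hypothetical column list used for the table. The type mapping covers only a few primitive types and is an assumption, not an exhaustive Hive-to-Avro mapping.

```python
import json

# Assumed mapping for a few primitive Hive types; extend it for your table.
HIVE_TO_AVRO = {
    "BOOLEAN": "boolean",
    "INT": "int",
    "BIGINT": "long",
    "FLOAT": "float",
    "DOUBLE": "double",
    "STRING": "string",
}


def avro_schema_for(table, columns):
    """Return an Avro record schema (as JSON text) for (name, hive_type) pairs."""
    fields = []
    for name, hive_type in columns:
        avro_type = HIVE_TO_AVRO[hive_type.upper()]  # KeyError for unmapped types
        # Nullable field: a union with "null" listed first so the default can be null.
        fields.append({"name": name, "type": ["null", avro_type], "default": None})
    return json.dumps({"type": "record", "name": table, "fields": fields}, indent=2)


if __name__ == "__main__":
    print(avro_schema_for("events", [("id", "INT"), ("name", "STRING")]))
```

The resulting JSON text is the kind of schema the JSON-to-Avro conversion step can be configured with.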

Assuming your data is JSON, you can build a NiFi flow like this:

ListHDFS --> FetchHDFS --> ConvertJSONToAvro --> PutHiveStreaming
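The flow can also be written down as data, which makes the wiring and the properties you have to fill in explicit. This is a hypothetical description, not a NiFi API: the paths, metastore URI, database and table names are placeholders, and the property names are the ones shown in the NiFi processor documentation as I recall them, so verify them against your NiFi version.

```python
# Hypothetical flow description; all values are placeholders.
FLOW = [
    ("ListHDFS", {
        "Hadoop Configuration Resources": "/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
        "Directory": "/data/incoming",
    }),
    ("FetchHDFS", {"HDFS Filename": "${path}/${filename}"}),
    ("ConvertJSONToAvro", {"Record schema": "<Avro schema JSON for the target table>"}),
    ("PutHiveStreaming", {
        "Hive Metastore URI": "thrift://metastore-host:9083",
        "Database Name": "default",
        "Table Name": "events",
    }),
]


def connections(flow):
    """Each processor's 'success' relationship feeds the next processor."""
    return [(src[0], "success", dst[0]) for src, dst in zip(flow, flow[1:])]


def unset_properties(flow):
    """List (processor, property) pairs whose value is still empty."""
    return [(proc, key) for proc, props in flow for key, value in props.items()
            if not value.strip()]


if __name__ == "__main__":
    for src, rel, dst in connections(FLOW):
        print(f"{src} --[{rel}]--> {dst}")
```

In the real flow, the other relationships (for example failure) still need to be routed somewhere or auto-terminated.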

Let me know if this helps.

Re: how to use puthivestreaming

Super Guru

Great answer! One caveat: if you are using HDF 2.0 with HDP 2.5, please see the following: https://community.hortonworks.com/questions/59681/puthivestreaming-nifi-processor-various-errors.htm...