how to use puthivestreaming

Contributor

I want to move data from HDFS to Hive using NiFi's PutHiveStreaming processor. Does anyone have an example?

1 ACCEPTED SOLUTION

Re: how to use puthivestreaming

Explorer

Hi

Hive streaming is supported only for tables that meet the following requirements:

  1. ORC is the only format currently supported, so the table must have "stored as orc".
  2. transactional = "true" must be set in the table properties (TBLPROPERTIES) of the create statement.
  3. The table must be bucketed but not sorted, so it must have "clustered by (colName) into (n) buckets".
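
As a rough illustration of these three requirements, here is a minimal Python sketch that renders a matching create statement. The helper name, the table name `events`, its columns, and the bucket count are all hypothetical and only there for the example; adapt them to your own table.

```python
def streaming_table_ddl(table, columns, bucket_col, num_buckets):
    """Render a CREATE TABLE statement that meets the three requirements above.

    `columns` is a list of (name, hive_type) pairs; all names are hypothetical.
    """
    if bucket_col not in [name for name, _ in columns]:
        raise ValueError(f"bucket column {bucket_col!r} is not a table column")
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    col_defs = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {col_defs}\n)\n"
        # Requirement 3: bucketed, with no SORTED BY clause.
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS\n"
        # Requirement 1: ORC storage.
        "STORED AS ORC\n"
        # Requirement 2: transactional table.
        "TBLPROPERTIES ('transactional'='true');"
    )


# Hypothetical two-column table, bucketed on `id` into 4 buckets.
ddl = streaming_table_ddl("events", [("id", "INT"), ("name", "STRING")], "id", 4)
print(ddl)
```

The printed statement can be pasted into a Hive client such as beeline; the script itself does not connect to Hive.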

Also, Hive must have the following properties set:

  1. hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
  2. hive.compactor.initiator.on = true
  3. hive.compactor.worker.threads > 0
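
One way to double-check these settings is to read them out of hive-site.xml. The sketch below is only an illustration, assuming the usual `<configuration><property><name>/<value>` layout of that file; the function names and the inline sample are made up for the example, and settings applied elsewhere (for instance through a management console) would not be seen by it.

```python
import xml.etree.ElementTree as ET

REQUIRED_TXN_MANAGER = "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"


def read_hive_site(xml_text):
    """Parse hive-site.xml content into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def streaming_config_problems(props):
    """Return a list of problems; an empty list means all three settings look right."""
    problems = []
    if props.get("hive.txn.manager") != REQUIRED_TXN_MANAGER:
        problems.append("hive.txn.manager must be " + REQUIRED_TXN_MANAGER)
    if props.get("hive.compactor.initiator.on", "").lower() != "true":
        problems.append("hive.compactor.initiator.on must be true")
    try:
        threads = int(props.get("hive.compactor.worker.threads", "0"))
    except ValueError:
        threads = 0  # treat a non-numeric value as unset
    if threads <= 0:
        problems.append("hive.compactor.worker.threads must be greater than 0")
    return problems


# Hypothetical hive-site.xml content with all three properties set.
sample = """<configuration>
  <property><name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value></property>
  <property><name>hive.compactor.initiator.on</name><value>true</value></property>
  <property><name>hive.compactor.worker.threads</name><value>1</value></property>
</configuration>"""
print(streaming_config_problems(read_hive_site(sample)))
```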

Please follow the documentation here: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

PutHiveStreaming requires your input data to be in Avro format, as described in the documentation here:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hive.PutHiveStreaming/

Assuming your data is JSON, you can build a NiFi flow as follows:

ListHDFS --> FetchHDFS --> ConvertJSONToAvro --> PutHiveStreaming
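
The JSON-to-Avro step needs an Avro schema describing the records. As a hedged sketch of how one might draft such a schema, the Python below infers a flat Avro record schema from a single sample JSON object. The sample record, the record name `events`, and the type mapping are assumptions for illustration; nested objects, arrays, and nulls are deliberately rejected, and the result should be reviewed against the Hive table's columns before use.

```python
import json

# Mapping from Python types (as produced by json.loads) to Avro primitive types.
_AVRO_TYPES = {bool: "boolean", int: "long", float: "double", str: "string"}


def infer_avro_schema(sample_json, record_name):
    """Build an Avro record schema (as a dict) from one flat JSON object."""
    record = json.loads(sample_json)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    fields = []
    for key, value in record.items():
        avro_type = _AVRO_TYPES.get(type(value))
        if avro_type is None:
            # Nested objects, arrays and nulls are out of scope for this sketch.
            raise ValueError(f"unsupported value for field {key!r}: {value!r}")
        fields.append({"name": key, "type": avro_type})
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample record; replace with one line from your own data.
schema = infer_avro_schema('{"id": 1, "name": "alice", "score": 9.5}', "events")
print(json.dumps(schema, indent=2))
```

The printed JSON is the kind of schema text that can be supplied to the conversion processor's schema property.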

Let me know if this helps.

Re: how to use puthivestreaming

Super Guru

Great answer! Just to add a caveat: if you are using HDF 2.0 and HDP 2.5, please see the following: https://community.hortonworks.com/questions/59681/puthivestreaming-nifi-processor-various-errors.htm...