Cloudera Community

Solved

Expert Contributor

I want to move data from HDFS to Hive using NiFi's PutHiveStreaming processor. Does anyone have an example?

1 ACCEPTED SOLUTION

New Member

Hi

Hive streaming is supported only for tables that meet the following requirements:

ORC is the only format currently supported, so the table must be declared "stored as orc".
transactional = "true" must be set in the table properties of the create statement.
The table must be bucketed but not sorted, so it must be declared "clustered by (colName) into (n) buckets".
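
To make the three table requirements above concrete, here is a minimal Python sketch that assembles a matching CREATE TABLE statement. The function name build_streaming_table_ddl, the table name web_events, and its columns are made up for illustration; only the ORC, bucketing, and transactional clauses come from the requirements listed above.

```python
def build_streaming_table_ddl(table, columns, bucket_column, num_buckets):
    """Return a Hive CREATE TABLE statement for a streaming-ready table.

    columns is a list of (name, hive_type) pairs. All names here are
    hypothetical; adapt them to your own table.
    """
    if bucket_column not in dict(columns):
        raise ValueError(f"bucket column {bucket_column!r} is not a table column")
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")

    column_list = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} ({column_list})\n"
        # Bucketed, with no SORTED BY clause.
        f"CLUSTERED BY ({bucket_column}) INTO {num_buckets} BUCKETS\n"
        # ORC is the only supported storage format.
        "STORED AS ORC\n"
        # The table must be transactional.
        "TBLPROPERTIES ('transactional'='true')"
    )


# Hypothetical example table with two columns, bucketed on id.
ddl = build_streaming_table_ddl(
    "web_events", [("id", "INT"), ("msg", "STRING")], "id", 4
)
print(ddl)
```

The printed statement can be pasted into beeline or the Hive CLI. The bucket count of 4 is arbitrary here; pick one that suits your data volume.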

Also, Hive must have the following properties set:

hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on = true
hive.compactor.worker.threads > 0
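
If you want to sanity-check a configuration against the three properties above, a small Python sketch like the following may help. It assumes you have already collected the property values into a dict (for example, by copying them from hive-site.xml or Ambari); the helper name check_streaming_properties and the sample values are hypothetical, and the sketch does not read any Hive files itself.

```python
REQUIRED_TXN_MANAGER = "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"


def check_streaming_properties(props):
    """Return a list of problems found in a dict of Hive properties.

    An empty list means all three streaming-related settings look right.
    """
    problems = []

    if props.get("hive.txn.manager") != REQUIRED_TXN_MANAGER:
        problems.append("hive.txn.manager must be " + REQUIRED_TXN_MANAGER)

    initiator = str(props.get("hive.compactor.initiator.on", "")).strip().lower()
    if initiator != "true":
        problems.append("hive.compactor.initiator.on must be true")

    try:
        threads = int(props.get("hive.compactor.worker.threads", 0))
    except (TypeError, ValueError):
        threads = 0  # treat unparsable values as "not set"
    if threads <= 0:
        problems.append("hive.compactor.worker.threads must be greater than 0")

    return problems


# Hypothetical sample: the worker thread count is still 0, so one problem is reported.
sample = {
    "hive.txn.manager": REQUIRED_TXN_MANAGER,
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "0",
}
for problem in check_streaming_properties(sample):
    print(problem)
```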

Please follow the documentation here: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

PutHiveStreaming requires the input data to be in Avro format, as described in the processor documentation:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hive.PutHiveStreaming/

Considering your data is JSON, you can build a NiFi flow as follows:

ListHDFS --> FetchHDFS --> ConvertJSONToAvro --> PutHiveStreaming
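
The JSON-to-Avro conversion step in the flow above needs an Avro schema that describes your records. As a rough starting point, the Python sketch below drafts a flat record schema from one sample JSON object. The function name draft_avro_schema, the record name, and the sample record are hypothetical, and the type mapping (int to long, float to double) is an assumption you should review against your Hive column types before using the result.

```python
import json

# Python type -> Avro primitive. bool is listed before int because
# bool is a subclass of int in Python.
_AVRO_TYPES = [
    (bool, "boolean"),
    (int, "long"),
    (float, "double"),
    (str, "string"),
]


def draft_avro_schema(sample_json, record_name):
    """Draft an Avro record schema (as a dict) from one flat JSON object.

    Nested objects, arrays, and nulls are rejected to keep the sketch simple.
    """
    record = json.loads(sample_json)
    if not isinstance(record, dict):
        raise ValueError("sample must be a JSON object")

    fields = []
    for name, value in record.items():
        for py_type, avro_type in _AVRO_TYPES:
            if isinstance(value, py_type):
                fields.append({"name": name, "type": avro_type})
                break
        else:
            raise ValueError(f"unsupported value for field {name!r}")

    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample record.
schema = draft_avro_schema('{"id": 1, "msg": "hello"}', "web_event")
print(json.dumps(schema, indent=2))
```

The printed JSON is only a draft to paste into the conversion processor's schema setting and then adjust by hand, for instance to make fields nullable.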

Let me know if this helps.

Master Guru

Great answer! One caveat: if you are using HDF 2.0 with HDP 2.5, please see the following: https://community.hortonworks.com/questions/59681/puthivestreaming-nifi-processor-various-errors.htm...
