How to use PutHiveStreaming
- Labels: Apache Hadoop, Apache Hive, Apache NiFi
Created ‎10-02-2016 02:09 PM
I want to move data from HDFS to Hive using NiFi's PutHiveStreaming processor. Does anyone have an example?
Created on ‎10-03-2016 06:51 AM - edited ‎08-19-2019 04:39 AM
Hi,
Hive streaming is supported only for tables that meet the following requirements:
- ORC is currently the only supported format, so the table must be created with "stored as orc".
- transactional = "true" must be set in the table properties of the create statement.
- The table must be bucketed but not sorted, so it must have "clustered by (colName) into (n) buckets".
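The three table requirements above can be sketched as a small Python helper that assembles a matching CREATE TABLE statement. This is only an illustration: the helper name `build_streaming_table_ddl`, the table name `web_events`, and the columns `id` and `msg` are hypothetical and should be replaced with your own.

```python
def build_streaming_table_ddl(table, columns, bucket_column, num_buckets):
    """Build a CREATE TABLE statement for a Hive streaming target table.

    The statement is bucketed (with no SORTED BY clause), stored as ORC,
    and marked transactional, matching the three requirements listed above.
    """
    column_names = [name for name, _ in columns]
    if bucket_column not in column_names:
        raise ValueError(f"bucket column {bucket_column!r} is not a table column")
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")

    column_list = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE {table} ({column_list})\n"
        f"CLUSTERED BY ({bucket_column}) INTO {num_buckets} BUCKETS\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true')"
    )


# Hypothetical table and columns, used only for illustration.
ddl = build_streaming_table_ddl(
    "web_events", [("id", "INT"), ("msg", "STRING")], "id", 4
)
print(ddl)
```

The generated statement can then be run in Hive (for example through beeline) before starting the NiFi flow.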
Also, Hive must have the following properties set:
- hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
- hive.compactor.initiator.on = true
- hive.compactor.worker.threads > 0
For details, see the Hive documentation: https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
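As a rough sanity check, the three Hive properties above can be validated from a plain dictionary of settings, for instance after copying the values out of your hive-site.xml. The function name `check_streaming_properties` and the sample dictionary are hypothetical; how you load the real values is up to you.

```python
REQUIRED_TXN_MANAGER = "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"


def check_streaming_properties(props):
    """Return a list of problems; an empty list means all three settings look right."""
    problems = []

    if props.get("hive.txn.manager") != REQUIRED_TXN_MANAGER:
        problems.append("hive.txn.manager must be " + REQUIRED_TXN_MANAGER)

    if str(props.get("hive.compactor.initiator.on", "")).lower() != "true":
        problems.append("hive.compactor.initiator.on must be true")

    # Treat a missing or non-numeric value as zero worker threads.
    try:
        workers = int(props.get("hive.compactor.worker.threads", 0))
    except (TypeError, ValueError):
        workers = 0
    if workers <= 0:
        problems.append("hive.compactor.worker.threads must be greater than 0")

    return problems


# Hypothetical settings, as they might be read from hive-site.xml.
sample_props = {
    "hive.txn.manager": REQUIRED_TXN_MANAGER,
    "hive.compactor.initiator.on": "true",
    "hive.compactor.worker.threads": "1",
}
for problem in check_streaming_properties(sample_props):
    print(problem)
```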
PutHiveStreaming requires its input data to be in Avro format, as described in the processor documentation:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hive.PutHiveStreaming/
Assuming your data is JSON, you can build a NiFi flow as follows:
ListHDFS --> FetchHDFS --> ConvertJSONToAvro --> PutHiveStreaming
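The JSON-to-Avro step needs an Avro record schema that describes your JSON records. Below is a minimal sketch of generating such a schema and checking a sample JSON line against it. It assumes flat records, and that you keep the Avro field names the same as the Hive column names. The record name `web_event`, the fields `id` and `msg`, and both helper functions are hypothetical.

```python
import json


def avro_schema_for(record_name, fields):
    """Build an Avro record schema (as a dict) from (name, avro_type) pairs."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": name, "type": avro_type} for name, avro_type in fields],
    }


def missing_fields(schema, json_line):
    """Return the schema field names that are absent from one JSON record."""
    record = json.loads(json_line)
    return [field["name"] for field in schema["fields"] if field["name"] not in record]


# Hypothetical record layout, used only for illustration.
schema = avro_schema_for("web_event", [("id", "int"), ("msg", "string")])
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

The printed schema text is what you would paste into the conversion processor's schema property. Running `missing_fields` over a few lines of your HDFS files is a quick way to spot records that do not match the schema before they reach PutHiveStreaming.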
Let me know if this helps.
Created ‎10-04-2016 01:08 PM
Great answer! Just to add a caveat: if you are using HDF 2.0 and HDP 2.5, please see the following: https://community.hortonworks.com/questions/59681/puthivestreaming-nifi-processor-various-errors.htm...
