
NiFi - Using PutHiveStreaming processor for the table located on S3


Hi. I'm trying to use the PutHiveStreaming processor to insert data into a Hive table located on S3.

I'm sharing the relevant information below.

1. Table creation query

CREATE TABLE table_s3 (
  col1 STRING,
  col2 STRING
)
CLUSTERED BY (col2) INTO 3 BUCKETS
STORED AS ORC
LOCATION 's3://bucket/path'
TBLPROPERTIES ('transactional'='true', 'NO_AUTO_COMPACTION'='true');
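As far as I understand, Hive streaming requires the target table to be ORC, bucketed, and transactional, which the DDL above should satisfy. In case it's relevant, here is the query I can use to double-check what the metastore actually registered for table_s3:

-- Verify storage format, bucketing, and TBLPROPERTIES as seen by the metastore
DESCRIBE FORMATTED table_s3;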

2. PutHiveStreaming error log

2017-03-09 01:55:50,154 ERROR [Timer-Driven Process Thread-6] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=34983a9d-f092-144e-ba19-7ac561303cf7] Error connecting to Hive endpoint: table bixby at thrift://thriftURI:9083
2017-03-09 01:55:50,154 ERROR [Timer-Driven Process Thread-6] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=34983a9d-f092-144e-ba19-7ac561303cf7] Hive Streaming connect/write error, flow file will be penalized and routed to retry
2017-03-09 01:55:50,155 ERROR [Timer-Driven Process Thread-6] o.a.n.processors.hive.PutHiveStreaming
org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://thriftURI:9083', database='default', table='table_s3', partitionVals=[] }
    at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:80) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:45) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:829) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:740) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$7(PutHiveStreaming.java:464) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2082) ~[na:na]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2053) ~[na:na]
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:391) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-1.1.0.jar:1.1.0]
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.jar:1.1.0]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.jar:1.1.0]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.jar:1.1.0]
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.jar:1.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: org.apache.nifi.util.hive.HiveWriter$TxnBatchFailure: Failed acquiring Transaction Batch from EndPoint: {metaStoreUri='thrift://thriftURI:9083', database='default', table='table_s3', partitionVals=[] }
    at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:255) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:74) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    ... 19 common frames omitted
Caused by: org.apache.hive.hcatalog.streaming.StreamingIOFailure: Unable to get new record Updater
    at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.newBatch(AbstractRecordWriter.java:124) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
    at org.apache.hive.hcatalog.streaming.StrictJsonWriter.newBatch(StrictJsonWriter.java:37) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:509) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:461) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:345) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:325) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
    at org.apache.nifi.util.hive.HiveWriter.lambda$nextTxnBatch$2(HiveWriter.java:250) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
    ... 3 common frames omitted
Caused by: java.io.IOException: No FileSystem for scheme: s3
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) ~[hadoop-common-2.6.2.jar:na]
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) ~[hadoop-common-2.6.2.jar:na]
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) ~[hadoop-common-2.6.2.jar:na]
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) ~[hadoop-common-2.6.2.jar:na]
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) ~[hadoop-common-2.6.2.jar:na]
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) ~[hadoop-common-2.6.2.jar:na]
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) ~[hadoop-common-2.6.2.jar:na]
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:221) ~[hive-exec-1.2.1.jar:1.2.1]
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:292) ~[hive-exec-1.2.1.jar:1.2.1]
    at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.createRecordUpdater(AbstractRecordWriter.java:141) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
    at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.newBatch(AbstractRecordWriter.java:121) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
    ... 10 common frames omitted

It looks like NiFi cannot resolve the s3:// filesystem scheme for table_s3 (the root cause at the bottom of the trace is java.io.IOException: No FileSystem for scheme: s3).
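From what I've read, this error usually means the Hadoop client used by NiFi's Hive bundle has no FileSystem implementation registered for the s3 scheme. Below is a sketch of the kind of core-site.xml entries I believe would register one, assuming the hadoop-aws JAR (and its matching AWS SDK dependency) can be made available on the processor's classpath and the file is referenced from the processor's Hive Configuration Resources property; YOUR_ACCESS_KEY and YOUR_SECRET_KEY are placeholders, not real values:

<!-- core-site.xml (sketch): register FileSystem implementations for S3 schemes.
     Assumes hadoop-aws and its AWS SDK dependency are on the classpath. -->
<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>
<!-- s3a is the more current scheme, if the table LOCATION can use s3a:// instead -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value> <!-- placeholder -->
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value> <!-- placeholder -->
</property>

I'm not certain this is the right fix for the NiFi side, so corrections are welcome.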

If you need any more information to help resolve this issue, I'll be happy to share it.

Thank you.

P.S. The PutHiveStreaming processor works well for the same kind of table located on HDFS.
