Member since 04-29-2016 | 192 Posts | 20 Kudos Received | 2 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 861 | 07-14-2017 05:01 PM |
 | 1448 | 06-28-2017 05:20 PM |
09-25-2019
10:23 AM
The question posted is not hypothetical; it is a real use case. FYI, there is another thread about partial file consumption (https://stackoverflow.com/questions/45379729/nifi-how-to-avoid-copying-file-that-are-partially-written), and it does not suggest that the OS takes care of this automatically. The solution proposed there is to add a wait between ListFile and FetchFile, but in our case the requirement is to wait for an indicator file before we start file ingestion.
09-25-2019
09:21 AM
Hello All, We're using Apache NiFi 1.0.1; I know we're way behind on upgrading. Our use case is to pick up files from a local mount on the NiFi server and write them to HDFS, using ListFile and FetchFile. Some files are huge, so the concern is that NiFi might start fetching a file before it has been completely written to the mount, which would load a partial file into HDFS. The proposed solution is that the source system sends us an indicator file with a specific name (in a different directory); only once we receive that file should we start fetching the files with the FetchFile processor. So the question is: how do we build the NiFi dataflow so that FetchFile starts only after the indicator file has been received? Do you have any suggestions on how to achieve this? Thanks in advance.
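To make the requirement concrete, here is a minimal sketch of the gating logic outside NiFi. It is not a NiFi flow and does not use any NiFi API; the function name `files_ready_to_fetch` and its two parameters are hypothetical, chosen only for illustration. It assumes the indicator file is written only after all data files are complete.

```python
from pathlib import Path
from typing import List


def files_ready_to_fetch(data_dir: Path, indicator: Path) -> List[Path]:
    """Return the data files to ingest, or an empty list while the indicator is absent.

    Assumption: the source system creates the indicator file only after
    every file in data_dir has been fully written.
    """
    if not indicator.is_file():
        # Indicator not received yet: fetch nothing, try again on the next poll.
        return []
    # Indicator present: every regular file in the data directory is eligible.
    return sorted(p for p in data_dir.iterdir() if p.is_file())
```

The behaviour we want from the dataflow is the same: nothing reaches FetchFile until the check on the indicator file passes.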
Labels: Apache NiFi
03-14-2018
02:10 AM
@Pranay Vyas The Hive Export/Import worked well for us. Thanks.
02-21-2018
04:00 PM
Hello, We have the following Hive migration scenario, which involves several variables/changes; we need to migrate Hive data from Source to Target:

Source | Target |
---|---|
Cluster A | Cluster B |
HDP 2.5.3 | HDP 2.6.2 |
Hive metastore DB - MySQL | Hive metastore DB - Oracle |
Has 7 databases to migrate | No existing data to preserve |

Both clusters are on the same network and both have HDP running. What's the most efficient way to migrate the existing Hive data to the new cluster? Thanks.
Labels: Apache Hive
11-03-2017
07:50 PM
@Matt Burgess I tried testing PutHiveStreaming on HDF 3.0 (with HDP 2.5) and I'm still getting an error.
10-25-2017
07:47 PM
@Matt Burgess thank you. I also tried connecting to HDP 2.6 (from NiFi 1.2.0) and I'm still getting an error with PutHiveStreaming, though a different one. I'm posting both the HDP 2.5 and HDP 2.6 error traces below.
HDP 2.6 error trace:
2017-10-25 14:34:52,391 ERROR [Timer-Driven Process Thread-9] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=015a1006-2600-129d-8dc3-e4a194564d35] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: Error writing [org.apache.nifi.processors.hive.PutHiveStreaming$HiveStreamingRecord@96dd112] to Hive Streaming transaction due to java.lang.reflect.UndeclaredThrowableException: {}
org.apache.nifi.processor.exception.ProcessException: Error writing [org.apache.nifi.processors.hive.PutHiveStreaming$HiveStreamingRecord@96dd112] to Hive Streaming transaction due to java.lang.reflect.UndeclaredThrowableException
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordsError$1(PutHiveStreaming.java:535)
at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordError$2(PutHiveStreaming.java:542)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:148)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$12(PutHiveStreaming.java:674)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2125)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2095)
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:628)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:552)
at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:552)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1118)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:144)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.UndeclaredThrowableException: null
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransaction(HiveEndPoint.java:551)
at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:261)
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:73)
at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:965)
at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:876)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$null$8(PutHiveStreaming.java:677)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
... 19 common frames omitted
Caused by: org.apache.hive.hcatalog.streaming.TransactionError: Unable to acquire lock on {metaStoreUri='thrift://server.domain.com:9083', database='default', table='hive_streaming_test_10', partitionVals=[] }
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:578)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.access$700(HiveEndPoint.java:461)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$2.run(HiveEndPoint.java:555)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$2.run(HiveEndPoint.java:552)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
... 27 common frames omitted
Caused by: org.apache.thrift.transport.TTransportException: null
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:3906)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:3893)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:1863)
at sun.reflect.GeneratedMethodAccessor578.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy243.lock(Unknown Source)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:573)
... 33 common frames omitted
HDP 2.5 error trace:
2017-10-25 14:41:00,537 ERROR [Timer-Driven Process Thread-4] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=015a1004-2600-129d-0dc7-5cee73298001] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: Error writing [org.apache.nifi.processors.hive.PutHiveStreaming$HiveStreamingRecord@35253e7] to Hive Streaming transaction due to java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.thrift.TApplicationException: Internal error processing open_txns: {}
org.apache.nifi.processor.exception.ProcessException: Error writing [org.apache.nifi.processors.hive.PutHiveStreaming$HiveStreamingRecord@35253e7] to Hive Streaming transaction due to java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.thrift.TApplicationException: Internal error processing open_txns
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordsError$1(PutHiveStreaming.java:535)
at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordError$2(PutHiveStreaming.java:542)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:148)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$12(PutHiveStreaming.java:674)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2125)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2095)
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:628)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:552)
at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:552)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1118)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:144)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.thrift.TApplicationException: Internal error processing open_txns
at org.apache.nifi.util.hive.HiveWriter.callWithTimeout(HiveWriter.java:400)
at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:258)
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:73)
at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:965)
at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:876)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$null$8(PutHiveStreaming.java:677)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
... 19 common frames omitted
Caused by: java.util.concurrent.ExecutionException: org.apache.thrift.TApplicationException: Internal error processing open_txns
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.nifi.util.hive.HiveWriter.callWithTimeout(HiveWriter.java:382)
... 26 common frames omitted
Caused by: org.apache.thrift.TApplicationException: Internal error processing open_txns
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_open_txns(ThriftHiveMetastore.java:3834)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.open_txns(ThriftHiveMetastore.java:3821)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openTxns(HiveMetaStoreClient.java:1841)
at sun.reflect.GeneratedMethodAccessor557.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy231.openTxns(Unknown Source)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$1.run(HiveEndPoint.java:525)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.openTxnImpl(HiveEndPoint.java:522)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:504)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:461)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:345)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.access$500(HiveEndPoint.java:243)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl$2.run(HiveEndPoint.java:332)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl$2.run(HiveEndPoint.java:329)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:328)
at org.apache.nifi.util.hive.HiveWriter.lambda$nextTxnBatch$2(HiveWriter.java:259)
at org.apache.nifi.util.hive.HiveWriter.lambda$null$3(HiveWriter.java:368)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.nifi.util.hive.HiveWriter.lambda$callWithTimeout$4(HiveWriter.java:368)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 common frames omitted
10-25-2017
01:36 PM
@Wynner We're on NiFi 1.2.0 and tried connecting to both HDP 2.5 and 2.6; neither works.
10-24-2017
06:28 PM
I have the same issue: connecting to Hive (on Kerberized HDP) from NiFi does not work; both PutHiveQL and PutHiveStreaming give me errors. I posted two questions, but so far no answer has resolved the issue:
https://community.hortonworks.com/questions/142302/nifi-puthivestreaming-connection-issue-when-connec.html
https://community.hortonworks.com/questions/142110/nifi-processor-puthiveql-cannot-connect-to-kerberi.html
10-23-2017
06:48 PM
Hi guys, We cannot connect to Kerberized Hive (on HDP 2.5 and HDP 2.6) from our NiFi instance (NiFi 1.2.0); both PutHiveQL and PutHiveStreaming fail with errors. Using the same properties (principal and keytab values) and the same XML files (hive-site.xml, core-site.xml, hdfs-site.xml, hbase-site.xml), NiFi can read from and write to HDFS and HBase, so we're not sure why the problem occurs only with Hive. Also, from the same NiFi server, a simple Java program can connect to Hive; the connection fails only through NiFi. klist shows a valid ticket; we also ran kinit manually with the principal and keytab and restarted NiFi, but still get the same error. In nifi.properties, both nifi.kerberos.service.principal and nifi.kerberos.service.keytab.location are commented out; we're not sure whether they should be uncommented, because those values are already set in the processor properties. The properties file also contains this entry: nifi.kerberos.krb5.file=/etc/krb5.conf. Below is the error trace from the NiFi log for PutHiveStreaming:
2017-10-23 11:32:23,841 INFO [put-hive-streaming-0] hive.metastore Trying to connect to metastore with URI thrift://server.domain.com:9083
2017-10-23 11:32:23,856 INFO [put-hive-streaming-0] hive.metastore Connected to metastore.
2017-10-23 11:32:23,885 INFO [Timer-Driven Process Thread-7] hive.metastore Trying to connect to metastore with URI thrift://server.domain.com:9083
2017-10-23 11:32:23,895 INFO [Timer-Driven Process Thread-7] hive.metastore Connected to metastore.
2017-10-23 11:32:24,730 WARN [put-hive-streaming-0] o.a.h.h.m.RetryingMetaStoreClient MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.TApplicationException: Internal error processing open_txns
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_open_txns(ThriftHiveMetastore.java:3834)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.open_txns(ThriftHiveMetastore.java:3821)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openTxns(HiveMetaStoreClient.java:1841)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy231.openTxns(Unknown Source)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$1.run(HiveEndPoint.java:525)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.openTxnImpl(HiveEndPoint.java:522)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:504)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:461)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:345)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.access$500(HiveEndPoint.java:243)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl$2.run(HiveEndPoint.java:332)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl$2.run(HiveEndPoint.java:329)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:328)
at org.apache.nifi.util.hive.HiveWriter.lambda$nextTxnBatch$2(HiveWriter.java:259)
at org.apache.nifi.util.hive.HiveWriter.lambda$null$3(HiveWriter.java:368)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.nifi.util.hive.HiveWriter.lambda$callWithTimeout$4(HiveWriter.java:368)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Thanks in advance.
Labels: Apache Hive, Apache NiFi
10-23-2017
04:35 PM
@mkalyanpur how would this be different for a Kerberized HDP environment? I'm having a lot of trouble connecting to Kerberized HDP 2.5 and 2.6 from NiFi 1.2.0; neither PutHiveStreaming nor PutHiveQL works. For PutHiveQL, the details of the error I get are here: https://community.hortonworks.com/questions/142110/nifi-processor-puthiveql-cannot-connect-to-kerberi.html
For PutHiveStreaming, I get the error below:
2017-10-23 11:32:23,841 INFO [put-hive-streaming-0] hive.metastore Trying to connect to metastore with URI thrift://server.domain.com:9083
2017-10-23 11:32:23,856 INFO [put-hive-streaming-0] hive.metastore Connected to metastore.
2017-10-23 11:32:23,885 INFO [Timer-Driven Process Thread-7] hive.metastore Trying to connect to metastore with URI thrift://server.domain.com:9083
2017-10-23 11:32:23,895 INFO [Timer-Driven Process Thread-7] hive.metastore Connected to metastore.
2017-10-23 11:32:24,730 WARN [put-hive-streaming-0] o.a.h.h.m.RetryingMetaStoreClient MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.TApplicationException: Internal error processing open_txns
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_open_txns(ThriftHiveMetastore.java:3834)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.open_txns(ThriftHiveMetastore.java:3821)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openTxns(HiveMetaStoreClient.java:1841)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy231.openTxns(Unknown Source)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$1.run(HiveEndPoint.java:525)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.openTxnImpl(HiveEndPoint.java:522)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:504)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:461)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:345)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.access$500(HiveEndPoint.java:243)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl$2.run(HiveEndPoint.java:332)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl$2.run(HiveEndPoint.java:329)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:328)
at org.apache.nifi.util.hive.HiveWriter.lambda$nextTxnBatch$2(HiveWriter.java:259)
at org.apache.nifi.util.hive.HiveWriter.lambda$null$3(HiveWriter.java:368)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.nifi.util.hive.HiveWriter.lambda$callWithTimeout$4(HiveWriter.java:368)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The strange thing is that, using the same core-site, hdfs-site, and hive-site config files and the same principal and keytab, NiFi can connect to HDFS and HBase without any issues; only the Hive connection fails. Even a sample Java program that connects to Hive with the Kerberos principal and keytab works fine. Thanks for your time.
10-20-2017
02:54 PM
@Andrew Lim thanks for clarifying further.
10-20-2017
02:52 PM
@Abdelkrim Hadjidj Thanks for clarifying. It would have been nicer to let the controller services be accessible throughout the UI, regardless of where they were created.
10-20-2017
02:44 PM
Hello, In our NiFi instance (1.2.0) we're finding that controller services created from the Controller Settings menu (top right corner of the UI) are not visible or accessible when you look for them from a processor. For example, after a HiveConnectionPool controller service is created through the Controller Settings menu, it does not show up in PutHiveQL's "Hive Database Connection Pooling Service" dropdown. Conversely, when a new controller service is created by selecting the "Create new service..." option from that dropdown in the PutHiveQL processor, it does not show up in the controller services listing (accessed through the Controller Settings menu). It seems to have something to do with user access permissions in the UI; if so, how can this be corrected? I'm not familiar with the user access permission settings; our admin handles those. Thanks.
Labels: Apache NiFi
10-20-2017
01:42 PM
Also, uncommenting nifi.kerberos.service.principal and nifi.kerberos.service.keytab.location in nifi.properties didn't help either.
10-20-2017
01:10 PM
@njayakumar, adding hdfs-site.xml didn't help.
10-19-2017
05:50 PM
@njayakumar I already have those files included in the Hive Connection Pooling Service.
10-19-2017
04:25 PM
Hello, We cannot connect to Kerberized Hive (on HDP 2.5) from our NiFi instance (NiFi 1.2.0); both PutHiveQL and PutHiveStreaming fail with errors. Using the same properties (principal and keytab values) and the same XML files (hive-site.xml, core-site.xml), NiFi can read from and write to HDFS and HBase, so we're not sure why the problem occurs only with Hive. Also, from the same NiFi server, a simple Java program can connect to Hive; the connection fails only through NiFi. klist shows a valid ticket; we also ran kinit manually with the principal and keytab and restarted NiFi, but still get the same error. In nifi.properties, both nifi.kerberos.service.principal and nifi.kerberos.service.keytab.location are commented out; we're not sure whether they should be uncommented, because those values are already set in the processor properties. The properties file also contains this entry: nifi.kerberos.krb5.file=/etc/krb5.conf. Below is the error trace from the NiFi log for PutHiveQL:
2017-10-19 10:45:49,922 ERROR [Timer-Driven Process Thread-6] o.apache.nifi.processors.hive.PutHiveQL PutHiveQL[id=015a100e-2600-129d-7130-830324d7a86b] Failed to update Hive for StandardFlowFileRecord[uuid=320bc13f-b355-4b15-9518-2b3a03046d8f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1508427927791-1, container=default, section=1], offset=0, length=87],offset=0,name=157442040917404,size=87] due to java.sql.SQLException: org.apache.thrift.transport.TTransportException: org.apache.http.client.ClientProtocolException; it is possible that retrying the operation will succeed, so routing to retry: java.sql.SQLException: org.apache.thrift.transport.TTransportException: org.apache.http.client.ClientProtocolException
java.sql.SQLException: org.apache.thrift.transport.TTransportException: org.apache.http.client.ClientProtocolException
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:262)
at org.apache.hive.jdbc.HivePreparedStatement.execute(HivePreparedStatement.java:98)
at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172)
at org.apache.nifi.processors.hive.PutHiveQL.lambda$null$3(PutHiveQL.java:218)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
at org.apache.nifi.processors.hive.PutHiveQL.lambda$new$4(PutHiveQL.java:199)
at org.apache.nifi.processor.util.pattern.Put.putFlowFiles(Put.java:59)
at org.apache.nifi.processor.util.pattern.Put.onTrigger(Put.java:101)
at org.apache.nifi.processors.hive.PutHiveQL.lambda$onTrigger$6(PutHiveQL.java:255)
at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
at org.apache.nifi.processors.hive.PutHiveQL.onTrigger(PutHiveQL.java:255)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1118)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:144)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: org.apache.http.client.ClientProtocolException
at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:297)
at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:313)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at org.apache.hive.service.cli.thrift.TCLIService$Client.send_ExecuteStatement(TCLIService.java:219)
at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:211)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:253)
... 23 common frames omitted
Caused by: org.apache.http.client.ClientProtocolException: null
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:117)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:251)
... 28 common frames omitted
Caused by: org.apache.http.HttpException: null
at org.apache.hive.jdbc.HttpRequestInterceptorBase.process(HttpRequestInterceptorBase.java:86)
at org.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:132)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:182)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:84)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
... 31 common frames omitted
Caused by: org.apache.http.HttpException: null
at org.apache.hive.jdbc.HttpKerberosRequestInterceptor.addHttpAuthHeader(HttpKerberosRequestInterceptor.java:68)
at org.apache.hive.jdbc.HttpRequestInterceptorBase.process(HttpRequestInterceptorBase.java:74)
... 37 common frames omitted
Caused by: java.lang.reflect.UndeclaredThrowableException: null
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.service.auth.HttpAuthUtils.getKerberosServiceTicket(HttpAuthUtils.java:83)
at org.apache.hive.jdbc.HttpKerberosRequestInterceptor.addHttpAuthHeader(HttpKerberosRequestInterceptor.java:62)
... 38 common frames omitted
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at org.apache.hive.service.auth.HttpAuthUtils$HttpKerberosClientAction.run(HttpAuthUtils.java:183)
at org.apache.hive.service.auth.HttpAuthUtils$HttpKerberosClientAction.run(HttpAuthUtils.java:151)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
... 40 common frames omitted
Any suggestions on what to check, or what could be causing the issue? Thanks.
Labels: Apache Hive, Apache NiFi
10-16-2017
04:33 PM
Hello, My requirement is to overwrite (or delete prior to the import) the existing data in an HCatalog table during a Sqoop import. It appears the --hive-overwrite and --delete-target-dir arguments don't work for this purpose. Any suggestions on how to do this? Thanks.
Labels: Apache HCatalog, Apache Sqoop
08-09-2017
04:46 PM
@mel mendoza, in my case, after splitting the files, I was doing further processing on the split files; but if your requirement is to store the split files, you could use PutFile or PutHDFS to write them to the local file system or HDFS.
08-05-2017
01:37 PM
@rich @William Gonzalez any updates on the Certified Professional Data Engineer (HCPDE)?
07-14-2017
05:01 PM
This is a known issue with GetHDFS (https://issues.apache.org/jira/browse/NIFI-2956), which is resolved in NiFi 1.1.0.
07-14-2017
01:36 PM
Once I stop and start the GetHDFS processor, the expression for 'Directory' appears to be re-evaluated: it then correctly points to the previous day's directory and processes the files from that directory. This behavior further confirms that the expression is evaluated only for the first scheduled run and not for subsequent runs. So, is there a workaround to force the expression to be evaluated for each run?
07-12-2017
05:57 PM
Hello, When a NiFi processor property includes expression language and the processor is scheduled to run at certain intervals, does the expression in the property get evaluated for each scheduled run, or only once for the first run? The reason I'm asking is that I have a GetHDFS processor scheduled to run once daily, and its 'Directory' property includes expression language. Since I want the processor to point to the previous day's directory, I have set the 'Directory' property as follows:
/user/nifitest/${now():toNumber():minus(86400000):format('yyyy')}/${now():toNumber():minus(86400000):format('MM')}/${now():toNumber():minus(86400000):format('yyyy_MM_dd')}
The above expression evaluates correctly to the directory that was created the previous day; for example, today's run (7-12-2017) would point to /user/nifitest/2017/07/2017_07_11. After it is scheduled, the GetHDFS processor starts at the scheduled time for the first run and works perfectly: it processes all the files in the previous day's directory. But it does not find any files on subsequent scheduled runs. In the NiFi log, I was not able to find the exact directory path the processor points to, but below is what the log shows:
2017-06-30 08:18:00,000 ERROR [NiFi logging handler] org.apache.nifi.StdErr [Timer-Driven Process Thread-10] INFO org.apache.nifi.processors.hadoop.GetHDFS - GetHDFS[id=b0d21ab8-1001-1159-15dd-4d380d420cab] Kerberos ticket age exceeds threshold [14400 seconds] attempting to renew ticket for user nifitest/dcdrlhadoop1a.mdanderson.edu@MDANDERSON.EDU
2017-06-30 08:18:00,057 ERROR [NiFi logging handler] org.apache.nifi.StdErr [Timer-Driven Process Thread-10] INFO org.apache.nifi.processors.hadoop.GetHDFS - GetHDFS[id=b0d21ab8-1001-1159-15dd-4d380d420cab] Kerberos relogin successful or ticket still valid
2017-06-30 08:18:00,154 ERROR [NiFi logging handler] org.apache.nifi.StdErr [Timer-Driven Process Thread-6] INFO org.apache.nifi.processors.standard.GetHTTP - GetHTTP[id=19a2140b-1178-102e-de2f-9e978bc6b90a] content not retrieved because server returned HTTP Status Code 304: Not Modified
2017-06-30 08:18:00,182 ERROR [NiFi logging handler] org.apache.nifi.StdErr [Timer-Driven Process Thread-10] INFO org.apache.nifi.processors.hadoop.GetHDFS - GetHDFS[id=b0d21ab8-1001-1159-15dd-4d380d420cab] Obtained file listing in 181 milliseconds; listing had 0 items, 0 of which were new
The fact that the first run of the processor after it was scheduled works perfectly (it processes all the files in the previous day's directory), but the subsequent runs do not, makes me suspect that the 'Directory' property is evaluated once and that the same value is used for each subsequent scheduled run, so the processor points to the same directory on every run. The log says "Obtained file listing in 181 milliseconds; listing had 0 items, 0 of which were new", which is what makes me think it's pointing to the same directory as the first run. I was expecting the processor to evaluate the 'Directory' property for each scheduled run; does it do that? If not, how do I make this work? Since Get* processors do not accept any inbound connections, I'm not able to calculate the 'Directory' property first in an UpdateAttribute processor and pass the correct value to GetHDFS. Thanks in advance.
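To make the expected behavior concrete, here is a small Python sketch of what I'd expect the 'Directory' expression to produce if it were re-evaluated on each scheduled run. This is not NiFi code; the function name previous_day_directory and the run times are made up for illustration, and it ignores time zone details.

```python
from datetime import datetime, timedelta


def previous_day_directory(now, base="/user/nifitest"):
    """Build base/yyyy/MM/yyyy_MM_dd for the day before `now`.

    Mirrors the intent of now():toNumber():minus(86400000):format(...),
    i.e. subtract 86,400,000 ms (one day) and format the result.
    """
    previous = now - timedelta(milliseconds=86_400_000)
    return f"{base}/{previous:%Y}/{previous:%m}/{previous:%Y_%m_%d}"


# Hypothetical run times for two consecutive daily runs.
run_1 = datetime(2017, 7, 12, 0, 30)
run_2 = datetime(2017, 7, 13, 0, 30)

# Evaluated on every run: each run gets its own previous-day directory.
print(previous_day_directory(run_1))  # /user/nifitest/2017/07/2017_07_11
print(previous_day_directory(run_2))  # /user/nifitest/2017/07/2017_07_12

# Evaluated once and reused: the second run still looks at 2017_07_11,
# which would match the "listing had 0 items" symptom described above.
cached = previous_day_directory(run_1)
print(cached)  # /user/nifitest/2017/07/2017_07_11
```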
Labels: Apache NiFi
06-28-2017
05:20 PM
Thanks to @Bryan Bende: I needed to change the batch size property in GetHDFS to read all files in the directory. https://community.hortonworks.com/questions/108547/need-clarification-on-how-nifi-processors-run-with.html#answer-109798
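A toy Python model of why the batch size mattered in my case, assuming (as the linked answer suggests) that each scheduled trigger picks up at most one batch of files and then finishes. The file counts and batch sizes below are made-up numbers, and simulate_triggers is just an illustrative name, not anything from NiFi.

```python
def simulate_triggers(file_count, batch_size, trigger_count):
    """Return the number of files left in the directory after each trigger.

    Assumption: one trigger handles at most `batch_size` files.
    """
    remaining = file_count
    history = []
    for _ in range(trigger_count):
        picked_up = min(batch_size, remaining)
        remaining -= picked_up
        history.append(remaining)
    return history


# One daily trigger with a batch smaller than the directory leaves files behind.
print(simulate_triggers(250, 100, 1))  # [150]

# Triggering repeatedly eventually drains the directory.
print(simulate_triggers(250, 100, 3))  # [150, 50, 0]

# A batch at least as large as the directory drains it in one trigger.
print(simulate_triggers(250, 500, 1))  # [0]
```

Under that assumption there are two ways out: raise the batch size, or trigger the processor more often.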
06-28-2017
05:13 PM
Thank you, it's the batch schedule that needed to be changed in my case.
06-28-2017
04:50 PM
@Bryan Bende thank you. This is my use case: GetHDFS is on a CRON schedule to run daily at 12:30 am, to process files that were inserted into an HDFS directory; these files would have been created the previous day. The GetHDFS processor does start at 12:30 am, as expected, but not all files from the directory are processed. So it seems the processor is not staying in the running state until all the files are processed. Is that the expected behavior, since you are saying "the processor does not remain running"? So, 1) at what time (how long after starting) does the processor stop? 2) How do you control how long the processor should stay running after it was triggered to start (in my case, to let all files be processed)? 3) When the processor is not running, does the icon on the processor (in the NiFi UI) change to the stopped state? Thanks.
06-28-2017
02:45 PM
Hello, I'm trying to understand how NiFi processor "runs" work with the CRON scheduler. I understand that by default the processor is running all the time (the 0 sec "Timer driven" schedule that all processors have by default). When a processor is scheduled to run on a CRON driven schedule, I understand that the schedule dictates when the processor is triggered to run. But once the processor is triggered to run, how long does it stay running? Does it stop after a certain amount of time? The CRON run schedule only specifies when and how often the processor should be triggered to start, but where do you specify how long it should run before it stops? For example, let's say I set a Get* processor to run daily at 1 am. Once the current system time is 1 am, the processor starts running, but does it ever stop once it is started by the scheduler, or does it stay running? If it stays running, then it doesn't need to be triggered by the scheduler again the next day at 1 am, because it would already be running, right? If it does stop after the scheduler triggers it to start, how long after starting does the processor stop, and where do you specify how long the processor should run for? Thank you.
Labels: Apache NiFi
06-26-2017
03:09 PM
Not sure why I need to schedule the GetHDFS processor to run continuously (I set it to run every 15 seconds), but this schedule exhausts all files from the directory: 0/15 * * * * ? In my case, since I'm loading files the next day (the GetHDFS directory path points to the previous day's directory), this resolves the issue I was facing.
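For anyone reading the schedule strings, here is a rough Python sketch of how the first three fields (seconds, minutes, hours) of these Quartz-style CRON expressions expand. It is a simplified reading aid, not the scheduler's actual parser: expand_field is a made-up helper that only understands '*', single values, comma lists, ranges, and 'start/step'.

```python
def expand_field(field, low, high):
    """Expand one cron field into the sorted values it matches in [low, high].

    Simplified: supports '*', 'a', 'a,b', 'a-b', 'a/step' and '*/step' only.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = int(base)
            # 'a/step' runs from a to the end of the range; plain 'a' is just a.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


# "0/15 * * * * ?" fires at seconds 0, 15, 30 and 45 of every minute.
seconds, minutes, hours = "0/15 * * * * ?".split()[:3]
print(expand_field(seconds, 0, 59))  # [0, 15, 30, 45]

# "0,30 30 0 * * ?" fires only twice a day, at 00:30:00 and 00:30:30.
seconds, minutes, hours = "0,30 30 0 * * ?".split()[:3]
print(expand_field(seconds, 0, 59), expand_field(minutes, 0, 59), expand_field(hours, 0, 23))
# [0, 30] [30] [0]
```

That difference in trigger count (four per minute all day versus two per day) would explain why the first schedule drains the directory and the second only reads a few more files.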
06-26-2017
03:06 PM
@Shashank Chandhok actually, the files I'm trying to process are from the day before. In the directory path of the GetHDFS processor, I'm using expression language to point to the directory that was created yesterday, and the files in that directory are from yesterday. So when the CRON schedule fires at 12:30 am, all the files that need to be processed should already be in that directory.
06-26-2017
01:41 PM
@Shashank Chandhok the schedule change to "0,30 30 0 * * ?" helped to read a few additional files, but many files still remain in the directory.