Problem with PutHiveStreaming in HDP 3.x

Contributor

Hello everybody,

I am currently testing a flow that uses the PutHiveStreaming processor in NiFi 1.7.x, and I'm getting the error below. The same flow runs fine in another (lower) environment, but in the Prod environment it fails and writes this error to nifi-app.log.

Kindly help: is this a problem with the latest HiveServer2 in HDP, or is it NiFi-specific? Do I need to use PutHive3Streaming instead of PutHiveStreaming?

2018-11-14 06:26:23,553 ERROR [Timer-Driven Process Thread-5] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=e34c533f-04c8-37af-b523-02eee3925269] Error connecting to Hive endpoint: table performance_metrics at thrift://prodcluster:9083
2018-11-14 06:26:23,553 ERROR [Timer-Driven Process Thread-5] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=e34c533f-04c8-37af-b523-02eee3925269] Hive Streaming connect/write error, flow file will be penalized and routed to retry. org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://prodcluster:9083', database='renamed.db', table='renamed', partitionVals=[santaclara] }: org.apache.nifi.processors.hive.PutHiveStreaming$ShouldRetryException: Hive Streaming connect/write error, flow file will be penalized and routed to retry. org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://prodcluster:9083', database='renamed.db', table='renamed', partitionVals=[LLO1Y] }
org.apache.nifi.processors.hive.PutHiveStreaming$ShouldRetryException: Hive Streaming connect/write error, flow file will be penalized and routed to retry. org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://prodcluster:9083', database='renamed.db', table='renamed, partitionVals=[santaclara] }
        at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordsError$1(PutHiveStreaming.java:630)
        at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
        at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordError$2(PutHiveStreaming.java:648)
        at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:148)
        at org.apache.nifi.processors.hive.PutHiveStreaming$1.process(PutHiveStreaming.java:839)
        at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2211)
        at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2179)
        at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:792)
        at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:658)
        at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
        at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
        at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:658)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
        at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://prodcluster:9083', database='renamed.db', table='renamed', partitionVals=[] }
        at org.apache.nifi.util.hive.HiveWriter.newConnection(HiveWriter.java:249)
        at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:70)
        at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
        at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:1138)
        at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:1049)
        at org.apache.nifi.processors.hive.PutHiveStreaming.access$1000(PutHiveStreaming.java:113)
        at org.apache.nifi.processors.hive.PutHiveStreaming$1.lambda$process$2(PutHiveStreaming.java:842)
        at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
        ... 18 common frames omitted



1 ACCEPTED SOLUTION

Master Guru

If you are using HDP 3.x, then you are using Hive 3. Apache NiFi 1.7+ has Hive 3 versions of the Hive processors, so you will want to use PutHive3Streaming instead of PutHiveStreaming.

Having said that, the Apache NiFi distribution does not include the Hive 3 NAR by default (due to its size). HDF 3.2 NiFi does include the Hive 3 NAR, so perhaps try that instead of the Apache NiFi distribution. Otherwise you can download the Apache NiFi 1.7.1 Hive 3 NAR here.

I recommend using HDF NiFi with HDP rather than Apache NiFi, as HDF has HDP-specific Hive JARs, where Apache NiFi uses Apache Hive JARs. Often the two can work interchangeably, but there are some times where HDP Hive is not compatible with a corresponding Apache Hive version, and for that you'll want HDP Hive JARs, which are bundled with HDF NiFi.

Contributor

Sure. Thanks for the reply, @Matt Burgess. As of now I'm using HDP Hive along with HDF NiFi. However, I can see that PutHive3Streaming includes a Record Reader property. Which option do I need to use for it? This seems a little new to me. Do you have any sample document for configuring PutHive3Streaming?

Current version is HDF 3.2

Master Guru

If your incoming data is in Avro format with an embedded schema (i.e. what you would need to use the Hive 1 version PutHiveStreaming), then you can add a Controller Service of type "AvroReader". Configure that to Use Embedded Schema (as the Schema Access Strategy), then Apply and Enable it. Then go back to PutHive3Streaming and select that as your reader.

If you have data in other formats (JSON, XML, CSV, etc.) you'd need to specify the schema for that data somehow (it is not embedded in the file as Avro is). See this blog for more details on the Record Reader/Writer stuff in NiFi. It may take a little bit to get used to it (and to configure it), but it is very flexible and very powerful and worth getting familiar with.
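
To make the "embedded schema" point concrete, here is a minimal sketch that reads the schema out of an Avro object container file header using only the Python standard library. It is an illustration of the file layout, not part of NiFi or of the AvroReader service; the function names are hypothetical. If it raises ValueError, the content is not an Avro container file and the reader would need a schema supplied some other way.

```python
import io
import json

AVRO_MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def _read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    accum = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("unexpected end of Avro header")
        byte = raw[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def _read_bytes(stream):
    """Read a length-prefixed byte string."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_embedded_schema(stream):
    """Return the writer schema stored in an Avro container file header."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # a zero count ends the metadata map
            break
        if count < 0:  # negative count: a block byte size follows
            _read_long(stream)
            count = -count
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            metadata[key] = _read_bytes(stream)
    return json.loads(metadata["avro.schema"])
```

Calling read_embedded_schema(open(path, "rb")) on a sample file taken from the flow shows the schema the reader will see; JSON or CSV content fails the magic-byte check immediately.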

Contributor

Thanks, @Matt Burgess. I will definitely have a look and get used to it. I'm wondering how to identify the output of the processor. My flow, which works on 2.x, is ListFile -> FetchFile -> custom processor -> PartitionRecord -> MergeContent -> PutHiveStreaming.

As you suggested, I have added the 1.7.1 NAR file. I apologize, it is HDF 3.1, not HDF 3.2. So basically you are asking me to use this path, right? ListFile -> FetchFile -> custom processor -> PartitionRecord -> MergeContent -> PutHiveStreaming (AvroReader, embedded schema) -> PutHive3Streaming (AvroReader). I believe I can use MergeContent -> PutHiveStreaming alone instead of MergeContent -> PutHive3Streaming.

I am getting the following, along with a permission denied error:

o.a.n.controller.tasks.ConnectableTask Administratively Yielding PutHive3Streaming[id=1666c555-0167-1000-0000-0000763cd449] due to uncaught Exception: java.lang.NullPointerException java.lang.NullPointerException: null

I have given 777 permissions, but I still get the permission denied error. The latest HDF version seems to be a headache for users.

Attachment: nifi.txt

-rw-r----- 1 nifi nifi 197M Jul 12 13:41 nifi-hive3-nar-1.7.1.nar
-rw-r----- 1 nifi nifi  15K Aug  7 15:26 nifi-hive-services-api-nar-1.7.0.3.2.0.0-520.nar
-rw-r----- 1 nifi nifi 106M Aug  7 15:26 nifi-hive-nar-1.7.0.3.2.0.0-520.nar
-rw-r----- 1 nifi nifi 260M Aug  7 15:27 nifi-hive3-nar-1.7.0.3.2.0.0-520.nar
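
The listing above contains two builds of nifi-hive3-nar (1.7.1 and 1.7.0.3.2.0.0-520). Whether that contributes to the NullPointerException is not established in this thread, but it is easy to check for. The following is a small sketch, assuming NAR files are named artifact-version.nar as in the listing; the function name is hypothetical and the filenames are copied from the listing.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <artifact>-<version>.nar, where the version starts with a digit.
NAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.nar$")


def find_duplicate_nars(filenames):
    """Map each NAR artifact present in more than one version to its versions."""
    versions = defaultdict(set)
    for name in filenames:
        match = NAR_NAME.match(name)
        if match:
            versions[match["artifact"]].add(match["version"])
    return {
        artifact: sorted(found)
        for artifact, found in versions.items()
        if len(found) > 1
    }


# Filenames taken from the directory listing in the post.
listing = [
    "nifi-hive3-nar-1.7.1.nar",
    "nifi-hive-services-api-nar-1.7.0.3.2.0.0-520.nar",
    "nifi-hive-nar-1.7.0.3.2.0.0-520.nar",
    "nifi-hive3-nar-1.7.0.3.2.0.0-520.nar",
]
duplicates = find_duplicate_nars(listing)
```

For a live install, the list of filenames could come from os.listdir on the NiFi lib directory instead of the hard-coded sample.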

Master Guru

If you are using HDF, you'll need HDF 3.2, as IIRC the Hive 3 processors were not available in HDF 3.1. If you upgrade to HDF 3.2, the Hive 3 processors will already be in there. In your flow above you list PutHiveStreaming feeding into PutHive3Streaming; you should use only one of them: PutHiveStreaming against Hive 1.x, and PutHive3Streaming against Hive 3.1+.
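
The version rule in this reply can be written down as a small lookup. This is only a sketch of the guidance given in the thread; the function name is hypothetical, and versions the thread does not mention (Hive 2.x, 3.0, or anything beyond 3.x) are deliberately rejected rather than guessed at.

```python
def choose_hive_streaming_processor(hive_version: str) -> str:
    """Pick the NiFi streaming processor for a Hive version, per this thread."""
    parts = hive_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major == 1:
        return "PutHiveStreaming"
    if major == 3 and minor >= 1:
        return "PutHive3Streaming"
    # The thread gives no guidance for other versions, so do not guess.
    raise ValueError(f"no recommendation in this thread for Hive {hive_version}")
```

For example, choose_hive_streaming_processor("3.1.0") selects PutHive3Streaming, which matches the HDP 3.x case discussed here.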