Problem with PutHiveStreaming in HDP 3.x
- Labels: Apache Hive, Apache NiFi
Created 11-14-2018 09:01 PM
Hello everybody,
I am currently testing a flow that uses the PutHiveStreaming processor in NiFi 1.7.x and am getting the error below. I am able to execute the same flow in another (lower) environment, but it does not work in the Prod environment and throws this error in nifi-app.log.
Kindly help: is this a problem with the latest HiveServer2 in HDP, or is it a NiFi-specific problem? Do I need to use PutHive3Streaming instead of PutHiveStreaming?
2018-11-14 06:26:23,553 ERROR [Timer-Driven Process Thread-5] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=e34c533f-04c8-37af-b523-02eee3925269] Error connecting to Hive endpoint: table performance_metrics at thrift://prodcluster:9083
2018-11-14 06:26:23,553 ERROR [Timer-Driven Process Thread-5] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=e34c533f-04c8-37af-b523-02eee3925269] Hive Streaming connect/write error, flow file will be penalized and routed to retry.
org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://prodcluster:9083', database='renamed.db', table='renamed', partitionVals=[santaclara] }
org.apache.nifi.processors.hive.PutHiveStreaming$ShouldRetryException: Hive Streaming connect/write error, flow file will be penalized and routed to retry.
org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://prodcluster:9083', database='renamed.db', table='renamed', partitionVals=[LLO1Y] }
org.apache.nifi.processors.hive.PutHiveStreaming$ShouldRetryException: Hive Streaming connect/write error, flow file will be penalized and routed to retry.
org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://prodcluster:9083', database='renamed.db', table='renamed', partitionVals=[santaclara] }
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordsError$1(PutHiveStreaming.java:630)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordError$2(PutHiveStreaming.java:648)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:148)
    at org.apache.nifi.processors.hive.PutHiveStreaming$1.process(PutHiveStreaming.java:839)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2211)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2179)
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:792)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:658)
    at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
    at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:658)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://prodcluster:9083', database='renamed.db', table='renamed', partitionVals=[] }
    at org.apache.nifi.util.hive.HiveWriter.newConnection(HiveWriter.java:249)
    at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:70)
    at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
    at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:1138)
    at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:1049)
    at org.apache.nifi.processors.hive.PutHiveStreaming.access$1000(PutHiveStreaming.java:113)
    at org.apache.nifi.processors.hive.PutHiveStreaming$1.lambda$process$2(PutHiveStreaming.java:842)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
    ... 18 common frames omitted
Created 11-14-2018 09:44 PM
If you are using HDP 3.x, then you are using Hive 3. Apache NiFi 1.7+ has Hive 3 versions of the Hive processors, so you will want to use PutHive3Streaming instead of PutHiveStreaming.
Having said that, the Apache NiFi distribution does not include the Hive 3 NAR by default (due to its size). HDF 3.2 NiFi does include the Hive 3 NAR, so perhaps try that instead of the Apache NiFi distribution. Otherwise you can download the Apache NiFi 1.7.1 Hive 3 NAR here.
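For reference, NiFi loads NARs from its library directory, so (assuming a standard standalone install) deploying the downloaded NAR is a matter of copying it there and restarting NiFi:

    $NIFI_HOME/lib/nifi-hive3-nar-1.7.1.nar    # copy the downloaded NAR here, then restart NiFi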
I recommend using HDF NiFi with HDP rather than Apache NiFi, as HDF bundles HDP-specific Hive JARs, whereas Apache NiFi uses Apache Hive JARs. Often the two work interchangeably, but there are times when HDP Hive is not compatible with the corresponding Apache Hive version, and in those cases you'll want the HDP Hive JARs, which are bundled with HDF NiFi.
Created 11-14-2018 09:56 PM
Sure, thanks for the reply @Matt Burgess. As of now I'm using HDP Hive along with HDF NiFi. However, I see that a Record Reader is included in PutHive3Streaming. What option do I need to use for PutHive3Streaming? This seems a little new to me. Do you have any sample documentation on configuring PutHive3Streaming?
The current version is HDF 3.2.
Created 11-14-2018 10:52 PM
If your incoming data is in Avro format with an embedded schema (i.e., what the Hive 1 processor PutHiveStreaming would require), then you can add a Controller Service of type "AvroReader". Configure it to Use Embedded Schema (as the Schema Access Strategy), then Apply and Enable it. Then go back to PutHive3Streaming and select that controller service as your Record Reader.
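For concreteness, a minimal configuration sketch (property names as of Apache NiFi 1.7; the metastore URI and database/table names are placeholders taken from the log above):

    AvroReader (Controller Service)
        Schema Access Strategy: Use Embedded Avro Schema

    PutHive3Streaming (Processor)
        Record Reader:      AvroReader (the controller service above)
        Hive Metastore URI: thrift://prodcluster:9083
        Database Name:      renamed
        Table Name:         renamed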
If you have data in other formats (JSON, XML, CSV, etc.), you'd need to specify the schema for that data somehow (it is not embedded in the file as it is with Avro). See this blog for more details on the Record Reader/Writer capabilities in NiFi. It may take a little while to get used to them (and to configure them), but they are very flexible, very powerful, and worth getting familiar with.
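To make "embedded schema" concrete: an Avro data file carries its own schema in the file header, which is exactly what the Use Embedded Schema strategy reads. A minimal Python sketch, assuming the fastavro package (the field names here are hypothetical, not the actual table's columns):

    from io import BytesIO
    import fastavro

    # Hypothetical schema -- the real one would mirror the Hive table's columns.
    schema = fastavro.parse_schema({
        "type": "record",
        "name": "PerformanceMetric",
        "fields": [
            {"name": "site",  "type": "string"},
            {"name": "value", "type": "double"},
        ],
    })

    buf = BytesIO()
    # fastavro.writer embeds the schema in the Avro container header, so any
    # reader (NiFi's AvroReader included) can recover it from the data alone.
    fastavro.writer(buf, schema, [{"site": "santaclara", "value": 1.0}])

For JSON, CSV, or XML there is no such header, so the same schema definition (the JSON object passed to parse_schema above) would instead be supplied to the reader explicitly, e.g. via a schema text property or a schema registry.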
Created 11-15-2018 08:32 PM
Thanks @Matt Burgess. I will definitely have a look and get used to it. I'm wondering how to identify the output of the processor. My flow, which works in 2.x, is ListFile -> FetchFile -> custom processor -> PartitionRecord -> MergeContent -> PutHiveStreaming.
As you suggested, I have added the 1.7.1 NAR file. My apologies, it is HDF 3.1, not HDF 3.2. So basically you are asking me to use this path, right: ListFile -> FetchFile -> custom processor -> PartitionRecord -> MergeContent -> PutHiveStreaming (AvroReader, embedded schema) -> PutHive3Streaming (AvroReader)? I believe I can use MergeContent -> PHS alone instead of MergeContent -> PH3S.
Now I'm getting: o.a.n.controller.tasks.ConnectableTask Administratively Yielding PutHive3Streaming[id=1666c555-0167-1000-0000-0000763cd449] due to uncaught Exception: java.lang.NullPointerException, plus a permission denied error. However, I have given 777 permissions and am still getting the permission denied error. The latest HDF version seems to be a headache for users. (Attachment: nifi.txt.) The NAR files in my lib directory:
- -rw-r----- 1 nifi nifi 197M Jul 12 13:41 nifi-hive3-nar-1.7.1.nar
- -rw-r----- 1 nifi nifi  15K Aug  7 15:26 nifi-hive-services-api-nar-1.7.0.3.2.0.0-520.nar
- -rw-r----- 1 nifi nifi 106M Aug  7 15:26 nifi-hive-nar-1.7.0.3.2.0.0-520.nar
- -rw-r----- 1 nifi nifi 260M Aug  7 15:27 nifi-hive3-nar-1.7.0.3.2.0.0-520.nar
Created 11-16-2018 06:37 PM
If you are using HDF, you'll need HDF 3.2; IIRC the Hive 3 processors were not available in HDF 3.1. If you upgrade to HDF 3.2, the Hive 3 processors will already be in there. In your flow above you have PutHiveStreaming feeding into PutHive3Streaming; you should only use PutHiveStreaming against Hive 1.x, and PutHive3Streaming against Hive 3.1+.
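In short, the pairing suggested in this thread:

    Processor           Works against
    PutHiveStreaming    Hive 1.x
    PutHive3Streaming   Hive 3.1+ (e.g. HDP 3.x)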
