Capture failures from puthivestreaming


Hi,

I am trying to capture failures when writing to a Hive table.

The scenario I am testing: I want to capture the data when Hive is down or my entire Hadoop cluster is down.

I route the retry and failure relationships of PutHiveStreaming to the local file system. I can see files written for retry, but none for failure.

It looks like it never fails. I saw a suggestion to retry three or four times and then treat that as a failure, but in my case, when Hive is down, it should fail at the first attempt.

In another scenario I tested folder permissions: I removed the permissions on the folder of the table PutHiveStreaming writes to. Even in this case it retries but never fails.

If I route retry back to PutHiveStreaming itself, can I configure it to retry three times and then fail?

Please suggest how to configure PutHiveStreaming to fail.

Regards,

~Sri


Re: Capture failures from puthivestreaming

Super Guru
@Srinatha Anantharaman

You can probably use a retry loop in this case.

Loop:

  1. Keep a counter attribute for the run.
  2. Increment the counter on each run.
  3. Once the counter reaches the limit, send the flowfile to failure using a RouteOnAttribute processor.
  4. Store the failed data in the local file system.

Refer to this link for Retry loop implementation.
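The routing decision in the loop above can be sketched in plain Python. This is only a simulation of the logic, not NiFi configuration: the attribute name `retry.count`, the limit of 3, and the function name `route_retry` are all assumptions chosen for illustration. In a real flow the increment would typically live in an UpdateAttribute processor and the comparison in RouteOnAttribute.

```python
# Simulation of a retry-loop decision; not NiFi code.
# "retry.count" and MAX_RETRIES are hypothetical names/values.
MAX_RETRIES = 3


def route_retry(attributes, max_retries=MAX_RETRIES):
    """Increment the retry counter and pick the next relationship.

    Returns a (relationship, updated_attributes) tuple, where
    relationship is "retry" or "failure".
    """
    # Steps 1 and 2: read the counter (default 0) and increment it.
    count = int(attributes.get("retry.count", 0)) + 1
    updated = {**attributes, "retry.count": str(count)}

    # Step 3: once the limit is reached, stop looping and route to failure.
    if count >= max_retries:
        return "failure", updated
    return "retry", updated


if __name__ == "__main__":
    attrs = {"filename": "example.avro"}  # hypothetical flowfile attributes
    relationship = "retry"
    while relationship == "retry":
        relationship, attrs = route_retry(attrs)
        print(relationship, attrs["retry.count"])
```

With these assumed values, a flowfile goes back around the loop twice and is routed to failure on its third pass, where step 4 (writing to the local file system) would take over.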


Re: Capture failures from puthivestreaming

Shu,

That is the workaround I am considering. My concern is that when Hive is down or there is no write permission, it will fail even if you retry 100 times, so I want it to fail at the first attempt. Unless the root cause is fixed, PutHiveStreaming will never succeed. Is this a bug in PutHiveStreaming, or have I not configured it properly?

Since failure is not working in the above scenarios, I am stopping the processor with an API call on the first occurrence of retry.

Thanks & Regards,
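For readers wondering what such an API call might look like, here is a minimal sketch that builds the request without sending it. It assumes the NiFi REST API exposes `PUT /nifi-api/processors/{id}/run-status` in the version in use (older versions may require updating the full processor entity instead), and that the current revision version has already been fetched from the processor. The base URL and the helper name `build_stop_request` are hypothetical.

```python
import json
import urllib.request


def build_stop_request(base_url, processor_id, revision_version):
    """Build (but do not send) a PUT request asking NiFi to stop a processor.

    Assumes the run-status endpoint is available; revision_version must
    match the processor's current revision or NiFi will reject the call.
    """
    body = {
        "revision": {"version": revision_version},
        "state": "STOPPED",
    }
    return urllib.request.Request(
        url=f"{base_url}/nifi-api/processors/{processor_id}/run-status",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical host and revision; substitute your own processor id.
    req = build_stop_request("http://localhost:8080", "my-processor-id", 0)
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

A secured NiFi instance would also need authentication (for example a bearer token header), which is left out of this sketch.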

~Sri


Re: Capture failures from puthivestreaming

Super Guru
@Srinatha Anantharaman

As per the PutHiveStreaming documentation:

[Screenshot: 93273-screen-shot-2018-11-15-at-64315-pm.png]

A flowfile is transferred to the failure relationship if the record could not be transmitted to Hive.


Re: Capture failures from puthivestreaming

Shu,

To test the above condition, I brought Hive down and at the same time tried to ingest data using PutHiveStreaming.

It throws the errors below in nifi-app.log, but the flowfile never goes to failure or retry.

2018-11-27 15:10:42,146 ERROR [Timer-Driven Process Thread-8] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=80198e2c-18b2-3722-b3be-4d97c2b7cf6c] org.apache.nifi.processors.hive.PutHiveStreaming$Lambda$928/1889725558@38ef0670 failed to process due to org.apache.nifi.processor.exception.ProcessException: Error writing [org.apache.nifi.processors.hive.PutHiveStreaming$HiveStreamingRecord@2939c3df] to Hive Streaming transaction due to java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient; rolling back session: org.apache.nifi.processor.exception.ProcessException: Error writing [org.apache.nifi.processors.hive.PutHiveStreaming$HiveStreamingRecord@2939c3df] to Hive Streaming transaction due to java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
org.apache.nifi.processor.exception.ProcessException: Error writing [org.apache.nifi.processors.hive.PutHiveStreaming$HiveStreamingRecord@2939c3df] to Hive Streaming transaction due to java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordsError$1(PutHiveStreaming.java:640)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordError$2(PutHiveStreaming.java:647)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:148)
    at org.apache.nifi.processors.hive.PutHiveStreaming$1.process(PutHiveStreaming.java:838)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2207)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2175)
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:791)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:657)
    at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
    at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:657)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1147)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:175)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:91)
    at org.apache.hive.hcatalog.common.HiveClientCache.getNonCachedHiveMetastoreClient(HiveClientCache.java:85)
    at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:546)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.getMetaStoreClient(HiveEndPoint.java:448)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:274)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:243)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:180)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:157)

Now I am handling failure and retry for PutHiveStreaming. I want to kill PutHiveStreaming as soon as a flowfile reaches failure/retry, but it never gets there.

Regards,

~Sri
