How to retrieve all records in HBase

Explorer

Hello everybody, I have a small problem with GetHBase: I manage to retrieve only the new records, whereas I need to retrieve all the records present in a table.

Thank you for your help.

Best Regards,

Koly

1 ACCEPTED SOLUTION

Master Guru
@Koly SALL

You can use the ScanHBase processor, introduced in NiFi 1.6; this processor does not store state.

(or)

Using the REST API, you can clear the state stored by the GetHBase processor before fetching all the records from the HBase table.

With this approach, we first stop the GetHBase processor, then clear its state, then start the GetHBase processor again.

Flow:

77409-flow.png

On the left-hand side of the flow screenshot, you have to do the following:

Step1: Stop GetHBase Processor

Refer to this link for how to stop the processor using the REST API, and use Chrome developer tools to see which API calls are made while stopping the processor.
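As a rough sketch of what the stop call can look like with curl: NiFi expects the processor's current revision version in an update, so the version is read first and then sent back together with the new state. The `NIFI_API` variable, the helper names, and the use of `jq` are assumptions for illustration, and an unsecured NiFi on localhost:8080 is assumed. The exact request your NiFi version sends is best confirmed in the browser's developer tools.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an unsecured NiFi at localhost:8080 and that jq is installed.
NIFI_API="${NIFI_API:-http://localhost:8080/nifi-api}"

# Build the JSON body for a state change (hypothetical helper).
# $1 = processor id, $2 = current revision version, $3 = STOPPED or RUNNING
state_payload() {
  printf '{"revision":{"version":%s},"component":{"id":"%s","state":"%s"}}' "$2" "$1" "$3"
}

# Read the current revision version, then PUT the requested state.
# $1 = processor id, $2 = STOPPED or RUNNING
set_processor_state() {
  local id="$1" state="$2" version
  version=$(curl -sf "$NIFI_API/processors/$id" | jq -r '.revision.version')
  [ -n "$version" ] || return 1
  curl -sf -X PUT -H 'Content-Type: application/json' \
       -d "$(state_payload "$id" "$version" "$state")" \
       "$NIFI_API/processors/$id" > /dev/null
}

# Usage: set_processor_state "<processor-id>" STOPPED
```

The same call with RUNNING in place of STOPPED should start the processor again.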

Step2: Clear the state of GetHBase processor

curl -X POST http://localhost:8080/nifi-api/processors/<processor-id>/state/clear-requests

(or)

Use the InvokeHTTP processor with the HTTP method set to POST to send the clear-state request.


Step3: Start GetHBase Processor

Refer to this link for how to start the processor using the REST API, and use Chrome developer tools to see which API calls are made while starting the processor.

Now, on the right-hand side of the flow, schedule the GetHBase processor to run once, i.e. use Timer driven as the scheduling strategy and set the run schedule to a large value such as 10000 min.

With this approach, we only have to schedule the left-hand side of the flow, because in step 3 we start the GetHBase processor, and that processor is scheduled to run only once.

Let us know if you face any issues.


5 REPLIES

Master Guru
@Koly SALL

Q: After starting GetHBase, must we stop this processor?

A: The left-hand side of the flow takes care of stopping the GetHBase processor: once you start that flow, its first step stops the GetHBase processor.

On the left-hand side of the flow, we schedule the processor using Cron (or) Timer driven:

  • In the first step, we trigger a shell script that stops the GetHBase processor.
  • In the second step, we clear the state of the GetHBase processor.
  • In the third step, we start the GetHBase processor.
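The three steps could be combined into one shell script along these lines. This is only a sketch: the function names, the `NIFI_API` variable, and the reliance on `jq` are assumptions, and it targets an unsecured NiFi on localhost:8080. Only the clear-requests URL comes from the answer itself; the stop and start requests should be checked against what your NiFi version actually sends.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an unsecured NiFi at localhost:8080 and that jq is installed.
NIFI_API="${NIFI_API:-http://localhost:8080/nifi-api}"

# PUT a new state, sending back the revision version NiFi currently reports.
# $1 = processor id, $2 = STOPPED or RUNNING (hypothetical helper)
put_state() {
  local id="$1" state="$2" version
  version=$(curl -sf "$NIFI_API/processors/$id" | jq -r '.revision.version')
  [ -n "$version" ] || return 1
  curl -sf -X PUT -H 'Content-Type: application/json' \
       -d "{\"revision\":{\"version\":$version},\"component\":{\"id\":\"$id\",\"state\":\"$state\"}}" \
       "$NIFI_API/processors/$id" > /dev/null
}

# Stop the processor, clear its stored state, then start it again.
reset_gethbase() {
  local id="$1"
  put_state "$id" STOPPED || return 1
  curl -sf -X POST "$NIFI_API/processors/$id/state/clear-requests" > /dev/null || return 1
  put_state "$id" RUNNING
}

# Usage: reset_gethbase "<processor-id>"
```

Each step returns early on failure, so the state is not cleared unless the stop request succeeded.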

Once we start the GetHBase processor, it runs according to its scheduling strategy.

Let's assume the GetHBase processor is scheduled to run every 5 minutes: it then runs every five minutes and checks whether any new records have been added to the HBase table.

What happens when we schedule the processor to run every 10000 min, i.e. ~167 hours (about 7 days)?

  • The processor runs once when we start it, and the next run is triggered only after 10000 min.
  • So the processor runs once in about 7 days; with this scheduling we make sure the processor does not run again and again.
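As a quick check of the arithmetic, 10000 minutes works out to roughly 167 hours, which is a little under 7 days. The helper name `schedule_span` below is hypothetical and only used for this illustration.

```shell
# Convert a run schedule given in minutes into hours and days.
# schedule_span is a hypothetical helper name used only for this illustration.
schedule_span() {
  awk -v m="$1" 'BEGIN { printf "%.1f hours, %.1f days\n", m / 60, m / 1440 }'
}

schedule_span 10000   # prints: 166.7 hours, 6.9 days
```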

Q: Is it imperative to schedule the processor at 10000 min?

A: Not that exact value; the point of this scheduling is that the processor runs only once every 10000 min.

In addition, if you want to make sure that all the records have been pulled by the GetHBase processor before stopping it again, check the ActiveThreadCount value of the GetHBase processor: only when ActiveThreadCount is 0 should you stop the processor --> clear the state --> start the processor again.
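One way to sketch the ActiveThreadCount check is to poll the processor until the count reaches 0. The JSON path `.status.aggregateSnapshot.activeThreadCount`, the `NIFI_API` and `POLL_SECONDS` variables, the function names, and the use of `jq` are assumptions here, so verify the field location against the response from your own NiFi instance.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an unsecured NiFi at localhost:8080 and that jq is installed.
NIFI_API="${NIFI_API:-http://localhost:8080/nifi-api}"

# Print the processor's current active thread count (JSON path is an assumption).
active_threads() {
  curl -sf "$NIFI_API/processors/$1" | jq -r '.status.aggregateSnapshot.activeThreadCount'
}

# Poll until the processor is idle; give up after $2 attempts (default 30).
wait_until_idle() {
  local id="$1" tries="${2:-30}" count
  while [ "$tries" -gt 0 ]; do
    count=$(active_threads "$id")
    [ "$count" = "0" ] && return 0
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-2}"
  done
  return 1
}

# Usage: wait_until_idle "<processor-id>" && echo "safe to stop and clear state"
```

Bounding the number of attempts keeps the script from hanging forever if the processor never becomes idle.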

New Contributor

@Shu_ashu I don't understand how ScanHBase would work as an alternative when it has an input requirement, i.e. it can't be used as a root node in a graph for gathering records from HBase indiscriminately. Would you agree?

It seems like the only viable solution then is to use the REST API as you've suggested.

Explorer

Hi @Shu,

Thanks a lot for the answer, I understood it.

Just one question please: after starting GetHBase, must we stop this processor?

Is it imperative to schedule the processor at 10000 min?

Best Regards,

Koly

Explorer

Hi @Shu

Thanks a lot, it's very good 🙂