Created 05-29-2018 10:42 AM
Hello everybody, I have a small problem with GetHBase: I can only retrieve the new records, whereas I need to retrieve all the records present in a table each time.
Thank you for your help.
Best Regards,
Koly
Created on 05-29-2018 11:35 AM - edited 08-17-2019 09:52 PM
You can use the ScanHBase processor, introduced in NiFi 1.6; this processor does not store state.
(or)
By using the REST API, you can clear the state stored by the GetHBase processor before fetching all the records from the HBase table.
With this approach we first stop the GetHBase processor, then clear its state, then start the GetHBase processor again.
Flow:
On the left-hand side of the flow (see the screenshot) you have to do the following:
Step1: Stop GetHBase Processor
Refer to this link for how to stop the processor using the REST API, and use Chrome developer tools to see which API calls NiFi makes while stopping the processor.
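For Step1, here is a minimal Python sketch of the stop call. It assumes an unsecured NiFi at http://localhost:8080 (the same host as the curl example in this answer) and assumes that `PUT /nifi-api/processors/<processor-id>` accepts the current revision plus the desired `state`; check the calls in Chrome developer tools against your NiFi version. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: unsecured NiFi on localhost, same base URL as the curl example.
NIFI_API = "http://localhost:8080/nifi-api"


def build_state_payload(processor_id, revision_version, state):
    """Build the JSON body asking NiFi to move a processor to `state`.

    `state` is "STOPPED" or "RUNNING". The revision version should be the one
    NiFi currently reports for the processor.
    """
    return {
        "revision": {"version": revision_version},
        "component": {"id": processor_id, "state": state},
    }


def get_processor(processor_id):
    """Fetch the processor entity, which carries the current revision."""
    with urllib.request.urlopen(f"{NIFI_API}/processors/{processor_id}") as resp:
        return json.load(resp)


def set_processor_state(processor_id, state):
    """Read the current revision, then PUT the requested state."""
    entity = get_processor(processor_id)
    payload = build_state_payload(
        processor_id, entity["revision"]["version"], state
    )
    request = urllib.request.Request(
        f"{NIFI_API}/processors/{processor_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Usage (needs a running NiFi and a real processor id):
# set_processor_state("<processor-id>", "STOPPED")
```

The same helper should work for starting the processor by passing "RUNNING" instead of "STOPPED".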
Step2: Clear the state of GetHBase processor
curl -X POST http://localhost:8080/nifi-api/processors/<processor-id>/state/clear-requests
(or)
Use an InvokeHTTP processor with the HTTP method set to POST to send the clear-state request.
Step3: Start GetHBase Processor
Refer to this link for how to start the processor using the REST API, and use Chrome developer tools to see which API calls NiFi makes while starting the processor.
Now, on the right-hand side of the flow, schedule the GetHBase processor to run once, i.e. use Timer driven as the scheduling strategy and set the run schedule to a very long interval such as 10000 min.
This way we only have to schedule the left-hand side of the flow: in Step3 it starts the GetHBase processor, and that processor is scheduled to run only once.
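The three steps above can be sketched as one small Python script. This is only an illustration under assumptions: an unsecured NiFi at http://localhost:8080, a `PUT /nifi-api/processors/<processor-id>` call that takes the current revision and the target `state`, and hypothetical helper names (`send`, `change_state`, `refetch_all`). The clear-state call is the same POST as the curl command in Step2.

```python
import json
import urllib.request

# Assumption: unsecured NiFi on localhost, as in the curl example.
NIFI_API = "http://localhost:8080/nifi-api"


def send(method, path, body=None):
    """Send one request to the NiFi REST API and return the parsed JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        NIFI_API + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def change_state(processor_id, state, send=send):
    """Read the current revision, then ask NiFi to set the processor state."""
    entity = send("GET", f"/processors/{processor_id}")
    body = {
        "revision": {"version": entity["revision"]["version"]},
        "component": {"id": processor_id, "state": state},
    }
    return send("PUT", f"/processors/{processor_id}", body)


def refetch_all(processor_id, send=send):
    """Stop GetHBase, clear its stored state, then start it again."""
    change_state(processor_id, "STOPPED", send)                        # Step1
    send("POST", f"/processors/{processor_id}/state/clear-requests")   # Step2
    change_state(processor_id, "RUNNING", send)                        # Step3


# Usage (needs a running NiFi and the id of the GetHBase processor):
# refetch_all("<processor-id>")
```

Passing `send` as a parameter keeps the sequence easy to try out with a stub before pointing it at a real NiFi instance.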
Let us know if you are facing any issues!
If the answer addressed your question, click the Accept button below to accept it. That would be a great help to community users looking for a quick solution to this kind of issue.
Created 05-29-2018 01:21 PM
Q: After starting GetHBase, must we stop this processor?
A: Stopping the GetHBase processor is handled by the left-hand side of the flow: once you start that flow, its first step stops the GetHBase processor.
On the left-hand side of the flow we schedule the processor using Cron (or) Timer driven.
Once we start the GetHBase processor, it runs according to its own scheduling strategy.
Let's assume the GetHBase processor is scheduled to run every 5 minutes: it will then run every five minutes and check whether any new records have been added to the HBase table.
Q: Is it imperative to schedule the processor at 10000 min?
A: What happens when we schedule the processor to run every 10000 min, i.e. roughly 167 hours or about 7 days?
With this scheduling strategy the processor runs once every 10000 min, so in practice it runs once each time the left-hand side of the flow starts it.
In addition, if you want to make sure that all the records have been pulled by the GetHBase processor before it is stopped again, check the ActiveThreadCount value of the GetHBase processor: only when ActiveThreadCount is 0 should you stop the processor --> clear the state --> start the processor again.
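A minimal Python sketch of that check follows. It assumes an unsecured NiFi at http://localhost:8080 and assumes the processor entity returned by `GET /nifi-api/processors/<processor-id>` exposes the count under `status.aggregateSnapshot.activeThreadCount`; verify the field path on your NiFi version. The helper names and the polling values are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: unsecured NiFi on localhost.
NIFI_API = "http://localhost:8080/nifi-api"


def active_thread_count(entity):
    """Read the active thread count from a processor entity (assumed field path)."""
    return entity["status"]["aggregateSnapshot"]["activeThreadCount"]


def fetch_processor(processor_id):
    """Fetch the processor entity from the NiFi REST API."""
    with urllib.request.urlopen(f"{NIFI_API}/processors/{processor_id}") as resp:
        return json.load(resp)


def wait_until_idle(processor_id, fetch=fetch_processor, poll_seconds=5, max_polls=60):
    """Poll until the processor reports 0 active threads.

    Returns True once it is idle, or False if it is still busy after max_polls.
    """
    for _ in range(max_polls):
        if active_thread_count(fetch(processor_id)) == 0:
            return True
        time.sleep(poll_seconds)
    return False


# Usage (needs a running NiFi and the id of the GetHBase processor):
# if wait_until_idle("<processor-id>"):
#     ...  # safe to stop the processor, clear the state, start it again
```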
Created 05-05-2020 11:45 AM
@Shu_ashu I don't understand how ScanHBase would work as an alternative when it has an input requirement, i.e. it can't be used as a root node in a graph for gathering records from HBase indiscriminately. Would you agree?
It seems the only viable solution, then, is to use the REST API as you've suggested.
Created 05-29-2018 12:20 PM
Hi @Shu,
Thanks a lot for the answer, I understood it.
Just one question please: after starting GetHBase, must we stop this processor?
Is it imperative to schedule the processor at 10000 min?
Best Regards,
Koly
Created 05-29-2018 02:32 PM
Hi @Shu
Thanks a lot, that's very good 🙂