How to retrieve all records from an HBase table
Labels: Apache HBase, Apache NiFi
Created ‎05-29-2018 10:42 AM
Hello everybody, I have a small problem with GetHBase: I can only retrieve the new records, whereas I need to retrieve all the records present in a table.
Thank you for your help.
Best Regards,
Koly
Created on ‎05-29-2018 11:35 AM - edited ‎08-17-2019 09:52 PM
You can use the ScanHBase processor, introduced in NiFi 1.6; this processor does not store state.
(or)
Using the REST API, you can clear the state stored by the GetHBase processor before fetching all the records from the HBase table.
With this approach, stop the GetHBase processor first, then clear its state, then start the GetHBase processor again.
Flow:
On the left-hand side of the flow screenshot, do the following:
Step1: Stop GetHBase Processor
Refer to this link for how to stop the processor using the REST API, and use Chrome developer tools to see which API calls are made when stopping the processor.
Step2: Clear the state of GetHBase processor
curl -X POST http://localhost:8080/nifi-api/processors/<processor-id>/state/clear-requests
(or)
Use an InvokeHTTP processor with the HTTP method set to POST to send the clear-state request.
Step3: Start GetHBase Processor
Refer to this link for how to start the processor using the REST API, and use Chrome developer tools to see which API calls are made when starting the processor.
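The three steps above can be sketched in Python as follows. This is only a minimal sketch under assumptions, not a tested recipe: the clear-requests URL is the one from the curl example, but changing the run state with a PUT to /processors/<processor-id> carrying the current revision version and a component state of STOPPED or RUNNING is my understanding of the NiFi REST API and should be checked against your NiFi version (for example with Chrome developer tools, as suggested). The function names are made up for illustration, and an unsecured NiFi on localhost:8080 is assumed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/nifi-api"  # same host/port as the curl example


def processor_url(base_url, processor_id):
    return f"{base_url.rstrip('/')}/processors/{processor_id}"


def run_state_request(base_url, processor_id, revision_version, state):
    """Build (method, url, body) to stop or start a processor.

    Assumption: NiFi accepts a PUT with the current revision version and
    the desired component state; verify this on your NiFi version.
    """
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError(f"unsupported state: {state}")
    body = {
        "revision": {"version": revision_version},
        "component": {"id": processor_id, "state": state},
    }
    return "PUT", processor_url(base_url, processor_id), body


def clear_state_request(base_url, processor_id):
    """Build the clear-state call shown in the curl example (no body)."""
    url = processor_url(base_url, processor_id) + "/state/clear-requests"
    return "POST", url, None


def send(method, url, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    req = urllib.request.Request(url, data=data, method=method, headers=headers)
    with urllib.request.urlopen(req) as resp:
        text = resp.read().decode("utf-8")
    return json.loads(text) if text else {}


def current_revision(base_url, processor_id, send=send):
    """Read the processor entity and return its revision version."""
    entity = send("GET", processor_url(base_url, processor_id))
    return entity["revision"]["version"]


def reset_and_restart(base_url, processor_id, send=send):
    """Step 1: stop, Step 2: clear state, Step 3: start."""
    version = current_revision(base_url, processor_id, send)
    send(*run_state_request(base_url, processor_id, version, "STOPPED"))
    send(*clear_state_request(base_url, processor_id))
    # The revision changes after the stop, so read it again before starting.
    version = current_revision(base_url, processor_id, send)
    send(*run_state_request(base_url, processor_id, version, "RUNNING"))
```

To use it, call `reset_and_restart(BASE_URL, "<processor-id>")` with the id of your GetHBase processor. The `send` parameter exists so the sequence can be tried against a stub before pointing it at a real NiFi instance.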
Now, on the right-hand side of the flow, schedule the GetHBase processor to run once, i.e. use Timer driven as the scheduling strategy and set the run schedule to 10000 min (or a similarly long interval).
With this approach, only the left-hand side flow needs to be scheduled: step 3 starts the GetHBase processor, and the processor is scheduled to run only once.
Let us know if you face any issues.
If the answer addressed your question, click the Accept button below to accept it; that helps other community users find solutions quickly for this kind of issue.
Created ‎05-29-2018 01:21 PM
Q: After starting GetHBase, must we stop this processor?
A: The left-hand side flow takes care of stopping the GetHBase processor: once you start that flow, its first step stops the GetHBase processor.
On the left-hand side of the flow, we schedule the processor using Cron (or) Timer driven:
- First step: trigger a shell script that stops the GetHBase processor.
- Second step: clear the state of the GetHBase processor.
- Third step: start the GetHBase processor.
Once the GetHBase processor is started, it runs according to its scheduling strategy.
For example, if the GetHBase processor is scheduled to run every 5 minutes, it runs every five minutes and checks whether any new records have been added to the HBase table.
What happens when we schedule the processor to run every 10000 min, i.e. ~167 hours (about 7 days)?
- The processor runs once when we start it, and the next run is triggered after 10000 min.
- So the processor runs only about once a week; with this schedule we make sure it does not run again and again.
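As a quick check of the run schedule arithmetic, here is a small sketch that converts a run schedule given in minutes into hours and days (the function name is made up for illustration). Note that 10000 minutes works out to roughly 167 hours, which is a little under 7 days.

```python
from datetime import timedelta


def schedule_interval(minutes):
    """Return the run schedule as (hours, days), rounded for readability."""
    interval = timedelta(minutes=minutes)
    hours = interval / timedelta(hours=1)
    days = interval / timedelta(days=1)
    return round(hours, 1), round(days, 2)


print(schedule_interval(10000))  # (166.7, 6.94)
```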
Q: Is it imperative to schedule the processor at 10000 min?
A: With this run schedule the processor runs once every 10000 min. The exact value is not important, as long as the interval is long enough that the processor does not run again before the left-hand side flow restarts it.
In addition, if you want to make sure that GetHBase has finished pulling all the records before it is stopped again, check the ActiveThreadCount value of the GetHBase processor: only when ActiveThreadCount is 0 should you stop the processor --> clear the state --> start the processor again.
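The ActiveThreadCount check can be sketched as a small polling loop. This is a sketch under assumptions: it expects the processor entity returned by a GET on /processors/<processor-id> to carry the count at status.aggregateSnapshot.activeThreadCount, which is my understanding of the NiFi REST response and should be confirmed on your version. The helper names and the `fetch_entity` callable are hypothetical; `fetch_entity` stands for whatever code you use to read the processor entity.

```python
import time


def active_thread_count(entity):
    """Read the active thread count from a processor entity.

    Assumption: the count lives at status.aggregateSnapshot.activeThreadCount.
    A missing key raises KeyError rather than being treated as idle.
    """
    return entity["status"]["aggregateSnapshot"]["activeThreadCount"]


def wait_until_idle(fetch_entity, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the processor has no active threads.

    Returns True once the count is 0, or False if it is still busy
    after max_polls checks.
    """
    for _ in range(max_polls):
        if active_thread_count(fetch_entity()) == 0:
            return True
        sleep(poll_seconds)
    return False
```

Only when `wait_until_idle(...)` returns True would the stop --> clear the state --> start sequence be run; if it returns False, the processor is still working and its state should not be cleared yet.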
Created ‎05-05-2020 11:45 AM
@Shu_ashu I don't understand how ScanHBase would work as an alternative when it has an input requirement i.e. it can't be used as a root node in a graph for gathering records from HBase indiscriminately. Would you agree?
It seems like the only viable solution, then, is to use the REST API as you've suggested.
Created ‎05-29-2018 12:20 PM
Hi @Shu,
Thanks a lot for the answer, I understood it.
Just one question, please: after starting GetHBase, must we stop this processor?
Is it imperative to schedule the processor at 10000 min?
Best Regards,
Koly
Created ‎05-29-2018 02:32 PM
Hi @Shu
Thank you very much, it's very good 🙂
