HBase filter query using PySpark
Labels: Apache HBase, Apache Spark
Created 01-30-2024 09:22 PM
I am trying to pull records from HBase by row key in PySpark using the following details,
but the result contains all the records (i.e., the filter is not applied). Can you please help me resolve this issue?
Created 01-30-2024 11:02 PM
@Taries, Welcome to our community! To help you get the best possible answer, I have tagged in our HBase and Spark experts @smdas @RangaReddy who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur, Community Manager
Created 01-31-2024 12:34 AM
Hi @Taries
You need to use the following two parameters to apply a filter while reading:
hbase.spark.query.timerange.start
hbase.spark.query.timerange.end
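For illustration, here is a minimal sketch of how these two parameters might be passed as reader options from PySpark. It assumes the Apache HBase Spark connector (`org.apache.hadoop.hbase.spark`) is on the classpath and that cell timestamps are epoch milliseconds; the table name and column mapping are hypothetical placeholders, not values from this thread.

```python
from datetime import datetime, timezone

# Data source name of the Apache HBase Spark connector (assumed to be installed).
HBASE_FORMAT = "org.apache.hadoop.hbase.spark"


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def timerange_options(table, columns_mapping, start, end):
    """Build reader options that restrict the scan to a cell-timestamp range."""
    return {
        "hbase.table": table,
        "hbase.columns.mapping": columns_mapping,
        "hbase.spark.query.timerange.start": str(to_epoch_millis(start)),
        "hbase.spark.query.timerange.end": str(to_epoch_millis(end)),
    }


def read_hbase(spark, options):
    """Load an HBase table as a DataFrame using the given options."""
    return spark.read.format(HBASE_FORMAT).options(**options).load()


# Hypothetical table and mapping, for illustration only.
options = timerange_options(
    table="my_table",
    columns_mapping="key STRING :key, value STRING cf:value",
    start=datetime(2024, 1, 30, tzinfo=timezone.utc),
    end=datetime(2024, 1, 31, tzinfo=timezone.utc),
)
# df = read_hbase(spark, options)  # requires a SparkSession and HBase access
```

Building the options in a plain function keeps the time-range conversion easy to check before any Spark job is submitted.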
Reference:
Created on 01-31-2024 05:29 AM - edited 01-31-2024 06:09 AM
With the options given above, only the time range start and end are applied; the row key passed along with them is not considered.
Created on 01-31-2024 06:57 AM - edited 01-31-2024 06:59 AM
Hi @Taries
As I mentioned previously, only the hbase.spark.query.timerange parameters can be used to filter data during the read. The hbase.spark.scan parameter cannot be used for this purpose.
To filter the data after reading, you can apply a Spark WHERE or filter clause with your desired conditions.
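As a rough sketch of the post-read filtering described above, the helper below keeps only the rows whose row-key column matches a given value. The column name `key` and the row-key value are hypothetical; they depend on how the row key is mapped in your own column mapping.

```python
def filter_by_rowkey(df, rowkey_col, rowkey):
    """Return only the rows whose row-key column equals the given value.

    `df` is expected to be a Spark DataFrame that was already loaded from
    HBase; the comparison builds a Column expression passed to filter().
    """
    return df.filter(df[rowkey_col] == rowkey)


# Usage, assuming `df` was loaded from HBase and the row key is mapped
# to a column named "key" (hypothetical name and value):
# result = filter_by_rowkey(df, "key", "row-001")
# result.show()
```

The same condition could also be written as `df.where(...)`, since `where` is an alias for `filter` on Spark DataFrames.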
Created 02-04-2024 07:46 AM
Hi @Taries
I hope you are doing well. Do you need any further help with this issue? If any of the solutions above helped in your case, please accept it as the solution. It will help others.
