Support Questions

How does a filter condition work on a Spark DataFrame?


Contributor

I have a table in HBase with 1 billion records. I want to filter the records based on a certain condition (by date).

For example:

  1. dataframe.filter(col("date") === todayDate)

Will the filter be applied only after all the records from the table are loaded into memory, or will I get just the filtered records?

1 REPLY

Re: How does a filter condition work on a Spark DataFrame?

Expert Contributor

@senthil kumar Spark will push down the predicate to the data source, HBase in this case, and keep only the resulting filtered DataFrame in memory. Spark does pushdown for most databases as well. It does not do this blindly, though: Spark assesses all the operations that will run on the DataFrame, builds an execution plan from them, and decides whether to push the filter down to the source or apply it in memory. For small tables, applying it in memory might make more sense.
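
You can check whether the pushdown actually happens for a given query by inspecting the physical plan with explain(): sources that support pushdown list the predicate under "PushedFilters" in the scan node. Below is a minimal Scala sketch of that check; it uses a hypothetical Parquet dataset in place of the HBase table (whether an HBase-backed DataFrame pushes filters down depends on the connector you use), and the path, column name, and date literal are assumptions for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object FilterPushdownCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("filter-pushdown-check")
          .master("local[*]")
          .getOrCreate()

        // Hypothetical Parquet dataset standing in for the HBase table.
        val df = spark.read.parquet("/tmp/events")

        val todayDate = "2024-01-01"
        val filtered = df.filter(col("date") === todayDate)

        // The physical plan shows whether the predicate reached the source:
        // look for PushedFilters: [IsNotNull(date), EqualTo(date,2024-01-01)]
        // in the scan node. If the condition only shows up as a separate
        // Filter operator, Spark is applying it in memory after the scan.
        filtered.explain(true)

        spark.stop()
      }
    }

If the plan shows the condition only as a post-scan Filter operator, all rows are read and then filtered in Spark; when it appears under PushedFilters, the source skips non-matching data before it ever reaches Spark.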