Read a random sample of data in Apache Spark from Phoenix table

Hi,

 

I want to read only a limited sample of rows from an Apache Phoenix table into a Spark DataFrame, without filtering on primary keys in the WHERE condition.

I tried using a 'limit' clause in the query, as shown in the code below.

 

Map<String, String> map = new HashMap<>();
map.put("url", url);
map.put("driver", "org.apache.phoenix.jdbc.PhoenixDriver");
map.put("query", "select * from table1 limit 100");

Dataset<Row> df = spark.read().format("jdbc").options(map).load();

 

This throws the following exception:

java.sql.SQLFeatureNotSupportedException: Wildcard in subqueries not supported. at org.apache.phoenix.compile.FromCompiler
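
A likely reason for this error: Spark's JDBC source does not send the "query" option as-is. According to the Spark documentation it wraps the query in a subquery with a generated alias, and it first probes the schema with a statement that selects nothing. The sketch below only rebuilds that string for illustration. The helper name wrapForSchemaProbe is hypothetical (it is not a Spark API), and the exact alias Spark generates may differ.

```java
// Illustrative only: mimics the documented way Spark's JDBC source wraps
// the "query" option before sending it to the database.
final class JdbcQueryWrapSketch {

    // Hypothetical helper, not part of Spark. The alias is an assumption;
    // Spark generates its own alias name internally.
    static String wrapForSchemaProbe(String userQuery, String alias) {
        // Outer "SELECT *" around the user query, with a predicate that
        // returns no rows, so only the result schema is resolved.
        return "SELECT * FROM (" + userQuery + ") " + alias + " WHERE 1=0";
    }

    public static void main(String[] args) {
        System.out.println(
            wrapForSchemaProbe("select * from table1 limit 100", "spark_gen_alias"));
    }
}
```

The user query therefore reaches Phoenix as a subquery containing a wildcard, which matches the "Wildcard in subqueries not supported" message. Listing the columns explicitly instead of using * might avoid this particular error, but that is untested here.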

 

Then I tried the limit() method of the DataFrame instead, as shown below.

map.put("query", "select * from table1");

Dataset<Row> df = spark.read().format("jdbc").options(map).load().limit(100);

 

In this case, Spark appears to read all the data into the DataFrame first and only then apply the limit. 'table1' has millions of rows, so I get a timeout exception:

 

org.apache.phoenix.exception.PhoenixIOException: callTimeout

 

So, I want to read a sample of a few records from the Phoenix table in Apache Spark such that the filtering happens on the server side.

Can anyone please help with this?

 

 

1 ACCEPTED SOLUTION

Hi @vaibhavgokhale 

 

Reading data from Phoenix through Spark JDBC is not the recommended approach [1]. Try the Phoenix Spark Connector API instead [2].

 

References:

1. https://phoenix.apache.org/phoenix_spark.html

2. https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/phoenix-access-data/topics/phoenix-understand...
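
As a rough sketch of what the connector route could look like, the code below builds the read options in the same Map style as the original question. It assumes the "table" and "zkUrl" option keys and the "phoenix" format name described on the Phoenix Spark page in [1]; older connector versions use "org.apache.phoenix.spark" as the format name, so check the version you have. The helper name phoenixOptions, the host phoenix-server:2181, and the column SOME_COL are placeholders, not values from this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: builds the options for a Phoenix Spark connector read.
final class PhoenixConnectorOptionsSketch {

    // Hypothetical helper. "table" and "zkUrl" are the option keys assumed
    // from the Phoenix Spark connector documentation.
    static Map<String, String> phoenixOptions(String table, String zkUrl) {
        Map<String, String> opts = new HashMap<>();
        opts.put("table", table);
        opts.put("zkUrl", zkUrl);
        return opts;
    }

    public static void main(String[] args) {
        // Placeholder table name and ZooKeeper quorum.
        Map<String, String> opts = phoenixOptions("TABLE1", "phoenix-server:2181");
        System.out.println(opts);

        // With Spark and the connector on the classpath, the read would
        // look roughly like this (not executed in this sketch):
        //   Dataset<Row> df = spark.read().format("phoenix").options(opts).load();
        //   Dataset<Row> sample = df.filter("SOME_COL = 'x'").limit(100);
    }
}
```

The connector is documented to push column pruning and filter predicates down to Phoenix, so a filter is what narrows the scan on the server side. Whether limit() on its own is pushed down is not something this sketch assumes.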

 

Hi @vaibhavgokhale 

 

Please accept the answer if you are satisfied with the solution above.