Created on 08-14-2023 06:40 AM - edited 08-14-2023 06:42 AM
Hi,
I want to read only a small sample from an Apache Phoenix table into a Spark DataFrame, without using primary keys in the WHERE condition.
I tried using a LIMIT clause in the query, as shown in the code below.
The following exception occurred:
java.sql.SQLFeatureNotSupportedException: Wildcard in subqueries not supported. at org.apache.phoenix.compile.FromCompiler
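The code for that attempt was essentially the following (simplified):
map.put("query", "select * from table1 limit 100"); // LIMIT in the SQL itself
Dataset<Row> df = spark.read().format("jdbc").options(map).load();
From what I understand, Spark's JDBC source wraps the "query" option in a subquery (SELECT ... FROM (<query>) spark_gen_alias) before sending it to the database, and Phoenix does not support a wildcard select inside a subquery, which is why FromCompiler rejects it.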
Then I tried the limit() method of the DataFrame, as shown below:
Map<String, String> map = new HashMap<>(); // JDBC options; url/driver entries omitted here
map.put("query", "select * from table1");
Dataset<Row> df = spark.read().format("jdbc").options(map).load().limit(100);
In this case, Spark first reads all the data into the DataFrame and only then applies the limit. The mentioned 'table1' has millions of rows, so I am getting a timeout exception:
org.apache.phoenix.exception.PhoenixIOException: callTimeout
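As far as I can tell, the limit stays on the Spark side; printing the plan is one way to check (my reading of the plan, not authoritative):
// Print the physical plan; in my run the limit appears as a Spark-side
// operator above the JDBC scan, i.e. it is not pushed into the query
// that is sent to Phoenix.
df.explain();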
So, I want to read a sample of a few records from a Phoenix table in Apache Spark such that the filtering happens on the server side.
Can anyone please help with this?
Created 08-17-2023 03:31 AM
Reading data from Phoenix through the generic Spark JDBC source is not the recommended approach. Try the Phoenix Spark Connector API instead [1].
Reference:
1. https://phoenix.apache.org/phoenix_spark.html
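For the connector, a minimal sketch in Java (option names differ between connector releases: older phoenix-spark builds take "zkUrl", while phoenix5-spark3 takes "jdbcUrl"; the host below is a placeholder):
Dataset<Row> df = spark.read()
        .format("phoenix")
        .option("table", "TABLE1")
        .option("zkUrl", "zkhost:2181") // placeholder ZooKeeper quorum
        .load();
// The connector pushes column pruning and filter predicates down to
// Phoenix, so a selective filter avoids a full-table read.
df.filter("COL1 = 'value'").show(100);
Whether limit() itself is pushed down depends on the connector version, so verify with explain(). If all you need is a small ad-hoc sample, a plain Phoenix JDBC query keeps the LIMIT entirely on the server side:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181"); // placeholder quorum
     Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT * FROM table1 LIMIT 100")) { // LIMIT runs in Phoenix
    while (rs.next()) {
        // process the sampled rows, or collect them and build a
        // DataFrame with spark.createDataFrame(...) if needed
    }
}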