About vaibhavgokhale

vaibhavgokhale · 08-14-2023

Hi, I want to read only a finite sample from an Apache Phoenix table into a Spark DataFrame without using primary keys in the WHERE condition. I tried using a 'limit' clause in the query, as shown in the code below.

    Map<String, String> map = new HashMap<>();
    map.put("url", url);
    map.put("driver", "org.apache.phoenix.jdbc.PhoenixDriver");
    map.put("query", "select * from table1 limit 100");
    Dataset<Row> df = spark.read().format("jdbc").options(map).load();

The following exception occurred:

    java.sql.SQLFeatureNotSupportedException: Wildcard in subqueries not supported.
    at org.apache.phoenix.compile.FromCompiler

I then tried the limit() method of the DataFrame instead, as shown below.

    map.put("query", "select * from table1");
    Dataset<Row> df = spark.read().format("jdbc").options(map).load().limit(100);

In this case, Spark appears to read all the data into the DataFrame first and only then apply the limit. 'table1' has millions of rows, so I get a timeout exception:

    org.apache.phoenix.exception.PhoenixIOException: callTimeout

So, I want to read a sample of a few records from a Phoenix table in Apache Spark such that the filtering happens on the server side. Can anyone please help with this?
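One possible direction, offered as an untested sketch rather than a confirmed fix: Spark's JDBC source wraps the "query" option in parentheses and uses it as a subquery in the FROM clause, so the "select *" inside it may be what Phoenix rejects as a wildcard in a subquery. Under that assumption, naming the columns explicitly while keeping the LIMIT in the query text would avoid the inner wildcard and leave the row limit on the Phoenix side. The helper class PhoenixSampleOptions and the column names ID and NAME are hypothetical and not part of the original post.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not from the original post): builds Spark JDBC options
 * for a bounded Phoenix read that avoids "select *" inside the query.
 */
final class PhoenixSampleOptions {

    private PhoenixSampleOptions() {}

    static Map<String, String> build(String url, String table, List<String> columns, int limit) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        // Explicit column list instead of a wildcard; the LIMIT stays in the SQL
        // text so that the database, not Spark, is asked to bound the result.
        String query = "select " + String.join(", ", columns)
                + " from " + table + " limit " + limit;

        Map<String, String> options = new HashMap<>();
        options.put("url", url);
        options.put("driver", "org.apache.phoenix.jdbc.PhoenixDriver");
        options.put("query", query);
        return options;
    }
}
```

With a SparkSession named spark, as in the post, the result would be passed along as spark.read().format("jdbc").options(PhoenixSampleOptions.build(url, "table1", List.of("ID", "NAME"), 100)).load(). Whether Phoenix accepts the wrapped query, and whether the first 100 rows count as a good enough sample, would still need to be checked against the actual cluster.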

vaibhavgokhale · 07-04-2023

Thanks @smruti for the quick response. This is working.

vaibhavgokhale · 07-03-2023

Hi, I am running a query on Hive through my Spark application using HiveWarehouseConnector. I want to use a particular YARN queue for the Tez job launched by HiveWarehouseConnector (custom queue configuration at the application level). I have tried the following two ways:

1. Using Spark conf and setting spark.hive.tez.queue.name = <queue name>
2. Setting the tez.queue.name parameter in the hiveserver2 URL, as suggested in the following thread: https://community.cloudera.com/t5/Support-Questions/Setting-yarn-queue-for-hive-with-beeline/td-p/161499

I am able to set the queue for beeline using the URL option. However, neither option works for HiveWarehouseConnector. Can anyone please help in this regard?
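For reference, the URL form described in option 2 can be sketched as follows. This only illustrates what was attempted; it is not the fix that was later confirmed as working, which is not recorded here. It assumes the usual HiveServer2 JDBC URL layout, in which Hive configuration entries follow a '?' and are separated by ';', with an optional '#' section of Hive variables at the end. The class name HiveUrlQueue, the host hs2-host, and the queue name etl are hypothetical.

```java
/**
 * Hypothetical helper (not from the original post): appends tez.queue.name
 * to the Hive configuration list of a HiveServer2 JDBC URL.
 */
final class HiveUrlQueue {

    private HiveUrlQueue() {}

    static String withTezQueue(String baseUrl, String queue) {
        // Keep any trailing "#hive_var_list" section at the end of the URL.
        int hash = baseUrl.indexOf('#');
        String head = hash >= 0 ? baseUrl.substring(0, hash) : baseUrl;
        String tail = hash >= 0 ? baseUrl.substring(hash) : "";

        // The Hive conf list starts at '?'; further entries are joined with ';'.
        String separator = head.contains("?") ? ";" : "?";
        return head + separator + "tez.queue.name=" + queue + tail;
    }
}
```

For example, HiveUrlQueue.withTezQueue("jdbc:hive2://hs2-host:10000/default", "etl") yields jdbc:hive2://hs2-host:10000/default?tez.queue.name=etl. How HiveWarehouseConnector picks up its HiveServer2 URL (for instance through a Spark property such as spark.sql.hive.hiveserver2.jdbc.url) is an assumption to verify against the connector's documentation for the installed version.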

Member Since	‎06-29-2023 09:37 AM
Last Visited	‎08-15-2023 02:28 PM
Posts	3

Cloudera Community

Read a random sample of data in Apache Spark from ...

Re: Configuring Tez queue for Hive Warehouse Conne...

Configuring Tez queue for Hive Warehouse Connector