08-06-2025 10:06 PM
Hi @Anshul_Gupta, please find the answers below.

This is the first time we are testing this use case. We are currently on CDP 7.1 with Spark 3.3.2.

Phoenix only supports creating views over a single table using simple SELECT * queries; complex queries, including joins, are not supported. This limitation is documented in the limitations section of the Phoenix documentation: https://phoenix.apache.org/views.html

Loading the two tables separately and performing the join in Spark could lead to performance issues, especially if the datasets are large. As I mentioned earlier, this is the first time we're testing this approach; a rough sketch of what it would look like is below.

The issue only occurs when the query is executed through Spark; when run directly in sqlline, the same query executes successfully.

Please let me know if you need any additional information. I'm happy to help.
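For reference, here is a minimal sketch of that fallback, assuming the stock Spark JDBC reader; the ZooKeeper quorum in the URL is a placeholder, and we have not benchmarked this on large tables.

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("PhoenixJoinInSpark")
  .master("local[*]")
  .getOrCreate()

// Placeholder connection details; substitute your own ZooKeeper quorum.
val url = "jdbc:phoenix:zk-host:2181:/hbase"
val props = new Properties()
props.setProperty("driver", "org.apache.phoenix.jdbc.PhoenixDriver")

// Single-table reads, which Phoenix serves without going through its JoinCompiler.
val emp = spark.read.jdbc(url, "CONNTEST.EMPLOYEEFILTERJOIN", props)
val dep = spark.read.jdbc(url, "CONNTEST.DEPARTMENTFILTERJOIN", props)

// The join itself now runs in Spark; both tables are pulled over JDBC first,
// so filter or partition the reads before joining if the tables are large.
val joined = emp.join(dep, emp("EMP_ID") === dep("EMP_ID"))
  .select(emp("EMP_ID"), emp("NAME"), emp("SALARY"),
          dep("DEPT_ID"), dep("DETP_NAME"))
joined.show(20)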
08-01-2025 03:40 AM
Hi, I'm attempting to run a simple join query between two tables in an Apache Phoenix database on a Spark cluster. However, when I try to read from the resulting DataFrame after executing the query, I encounter a NullPointerException.

Query:

(SELECT CONNTEST.EMPLOYEEFILTERJOIN.EMP_ID, CONNTEST.EMPLOYEEFILTERJOIN.NAME, CONNTEST.EMPLOYEEFILTERJOIN.SALARY, CONNTEST.DEPARTMENTFILTERJOIN.DEPT_ID, CONNTEST.DEPARTMENTFILTERJOIN.DETP_NAME FROM CONNTEST.EMPLOYEEFILTERJOIN, CONNTEST.DEPARTMENTFILTERJOIN WHERE CONNTEST.EMPLOYEEFILTERJOIN.EMP_ID = CONNTEST.DEPARTMENTFILTERJOIN.EMP_ID) query_alias

Below is the code I'm executing:

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("PhoenixWriteIssueSimulator")
  .master("local[*]")
  .config("spark.sql.catalogImplementation", "in-memory")
  .getOrCreate()

val jdbcProperties = new Properties()
jdbcProperties.setProperty("driver", "org.apache.phoenix.jdbc.PhoenixDriver")

// phoenixJdbcUrl and phoenixSourceQuery are defined earlier;
// phoenixSourceQuery is the join query shown above.
// I tried both the positional and the named-argument forms of spark.read.jdbc;
// both fail the same way.
val df = spark.read.jdbc(phoenixJdbcUrl, phoenixSourceQuery, jdbcProperties)
// val df = spark.read.jdbc(url = phoenixJdbcUrl, table = phoenixSourceQuery, properties = jdbcProperties)

df.show(20)

df.show(20) fails with the stack trace below:

25/08/01 15:34:51 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
at org.apache.phoenix.util.SchemaUtil.getSchemaNameFromFullName(SchemaUtil.java:713)
at org.apache.phoenix.schema.TableNotFoundException.<init>(TableNotFoundException.java:44)
at org.apache.phoenix.compile.FromCompiler$MultiTableColumnResolver.resolveTable(FromCompiler.java:1013)
at org.apache.phoenix.compile.JoinCompiler$ColumnRefParseNodeVisitor.visit(JoinCompiler.java:1399)
at org.apache.phoenix.compile.JoinCompiler$ColumnRefParseNodeVisitor.visit(JoinCompiler.java:1371)
at org.apache.phoenix.parse.ColumnParseNode.accept(ColumnParseNode.java:56)
at org.apache.phoenix.util.ParseNodeUtil.applyParseNodeVisitor(ParseNodeUtil.java:67)
at org.apache.phoenix.compile.JoinCompiler.compile(JoinCompiler.java:172)
at org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:248)
at org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:178)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:543)
at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:506)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:311)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:300)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:299)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:292)
at org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:190)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$14.apply(Executor.scala:459)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

I have also checked the schema by calling df.printSchema(); below is the output:

root
|-- CONNTEST.EMPLOYEEFILTERJOIN.EMP_ID: integer (nullable = true)
|-- CONNTEST.EMPLOYEEFILTERJOIN.NAME: string (nullable = true)
|-- CONNTEST.EMPLOYEEFILTERJOIN.SALARY: integer (nullable = true)
|-- CONNTEST.DEPARTMENTFILTERJOIN.DEPT_ID: integer (nullable = true)
|-- CONNTEST.DEPARTMENTFILTERJOIN.DETP_NAME: string (nullable = true)

After examining the Phoenix client jar (phoenix-client-hbase-2.2-5.1.1.7.1.7.0-551.jar), which is the official Phoenix JDBC driver, I found that when a query containing a join condition is executed, the driver calls the JoinCompiler.compile method. In this method, I noticed that the SELECT statement uses column names enclosed in double quotes. However, when the JoinCompiler$ColumnRefParseNodeVisitor.visit method processes these column nodes, both their table name and schema name are null. For example, a column name appears as ""CONNTEST.EMPLOYEEFILTERJOIN.EMP_ID"" (with two double quotes at the start and end), which causes a NullPointerException for these columns.

I suspected the issue might be caused by using fully qualified column names (schema.table.column) in the query, so I also tried this query:

(SELECT emp.EMP_ID, emp.NAME, emp.SALARY, dep.DEPT_ID, dep.DETP_NAME FROM CONNTEST.EMPLOYEEFILTERJOIN emp, CONNTEST.DEPARTMENTFILTERJOIN dep WHERE emp.EMP_ID = dep.EMP_ID) query_alias

However, the problem persists even with this query. Could someone help me identify the root cause of this issue and clarify whether it might be a bug in the Phoenix JDBC driver?
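Based on the printSchema() output above, Spark has recorded the fully qualified, dotted names as the column names, and its JDBC source re-selects columns by the names it recorded in the schema; that would explain why quoted dotted identifiers reach Phoenix. One variation I still plan to try (untested, so treat it as a guess) is aliasing every projected column in the subquery so the recorded names contain no dots:

// Untested variation: explicit AS aliases so the DataFrame schema holds
// plain, dot-free column names instead of schema.table.column strings.
val aliasedQuery =
  """(SELECT emp.EMP_ID AS EMP_ID,
    |        emp.NAME AS NAME,
    |        emp.SALARY AS SALARY,
    |        dep.DEPT_ID AS DEPT_ID,
    |        dep.DETP_NAME AS DETP_NAME
    |   FROM CONNTEST.EMPLOYEEFILTERJOIN emp,
    |        CONNTEST.DEPARTMENTFILTERJOIN dep
    |  WHERE emp.EMP_ID = dep.EMP_ID) query_alias""".stripMargin

// Reuses phoenixJdbcUrl and jdbcProperties from the snippet above.
val df2 = spark.read.jdbc(phoenixJdbcUrl, aliasedQuery, jdbcProperties)
df2.printSchema()  // hoping for plain EMP_ID, NAME, ... with no prefixes
df2.show(20)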
Labels:
- Apache Phoenix
- Apache Spark