Member since: 01-19-2018
Posts: 5
Kudos Received: 0
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1718 | 01-19-2018 10:06 AM |
09-05-2018 05:54 AM
Hi Josh Elser, thanks for your response. Yes, it looks like the same issue as reported in PHOENIX-4489.
08-29-2018 06:25 AM
Hi, I am using HBase 1.1 to store our data through Apache Phoenix 4.11, which provides the SQL interface for HBase. I am using Spark 2.1.1 to analyze the data stored in the HBase tables: I load those tables from HBase as DataFrames and run SQL queries on them with Spark SQL. I am using the Apache Spark plugin provided by Apache Phoenix to connect Spark with HBase (https://phoenix.apache.org/phoenix_spark.html). This is how I am loading the HBase tables:

```java
// Step 1: Register the main tables (4-5 tables) with Spark SQL
Map<String, String> map = new HashMap<>();
map.put("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure");
for (String tableName : tableNames) {
    map.put("table", tableName);
    logger.info("Registering table = " + tableName);
    logger.info("map = " + map);
    Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
    df.registerTempTable(tableName);
}
```
and then I am running a set of SQL queries like this:

```java
// Step 2: Run a set of SQL queries to filter the data, registering the
// intermediate results as temp tables and using them in the next query,
// saving the final result set to a CSV file, and removing all the
// intermediate temp tables at the end
List<String> tempTableList = new ArrayList<>();
selectResult = sparkSession.sql(selectQry);
selectResult.registerTempTable(tempTableName);
tempTableList.add(tempTableName);

// run further queries using these newly registered temp tables
// ...
selectResult = sparkSession.sql(selectQry);
selectResult.registerTempTable(tempTableName);
tempTableList.add(tempTableName);
// ...

// finally, save the filtered data from the DataFrame to CSV
selectResult.write().mode("overwrite").csv(outputFilePath);

// remove all the temp tables
for (String tableName : tempTableList) {
    sparkSession.sqlContext().dropTempTable(tableName);
}
```

"Step 2" is repeated multiple times. I notice that the number of open HBase connections increases with each iteration, which finally results in the job failing because ZooKeeper denies further connections. We increased maxClientCnxns in ZooKeeper to 2000, but the number of open connections grows beyond that as well. I have no idea why Spark is opening so many connections to HBase (ZooKeeper), and why it is not closing or reusing the old open connections. Please share if you have any info/ideas about this issue; that would be of great help. Thanks, Fairoz
Labels:
- Apache HBase
- Apache Phoenix
- Apache Spark
05-21-2018 05:34 AM
@Sampath Kumar Did you find the root cause and solution for this problem? I am facing the same issue. Please post the solution; it will help others. Thanks, Fairoz
01-19-2018 10:06 AM
Here is how I solved this problem; I am sharing it so that it will be helpful if someone else faces this issue. I load the two datasets separately, register each one as a temp table, and then I am able to run the join query using those two tables. Below is the sample code.

```java
String table1 = "TABLE_1";
Map<String, String> map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", table1);
Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df.registerTempTable(table1);

String table2 = "TABLE_2";
map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", table2);
Dataset<Row> df2 = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df2.registerTempTable(table2);

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 AS A JOIN TABLE_2 AS B ON A.COLUMN_1 = B.COLUMN_2 WHERE B.COLUMN_2 = 'XYZ'");
```
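If more than two tables are involved, the same pattern generalizes: register each Phoenix table once in a loop and join them all in a single Spark SQL statement. A short sketch under that assumption (TABLE_3 and its join column are hypothetical, added only for illustration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Register each Phoenix table as a Spark temp table, reusing one options map.
Map<String, String> options = new HashMap<>();
options.put("zkUrl", ZOOKEEPER_URL);
for (String t : Arrays.asList("TABLE_1", "TABLE_2", "TABLE_3")) {
    options.put("table", t);
    sparkSession.sqlContext()
        .load("org.apache.phoenix.spark", options)
        .registerTempTable(t);
}

// The join now runs as a single Spark SQL query across all registered tables.
Dataset<Row> result = sparkSession.sql(
    "SELECT * FROM TABLE_1 A "
  + "JOIN TABLE_2 B ON A.COLUMN_1 = B.COLUMN_2 "
  + "JOIN TABLE_3 C ON B.COLUMN_2 = C.COLUMN_3");
```

Note that the join itself executes in Spark's engine; as far as the connector documents, it pushes down per-table column pruning and filters, not the join.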
01-19-2018 07:16 AM
I want to connect to Apache Phoenix from Spark and run a join SQL query. The official Phoenix website gives an example of how to connect to Phoenix from Spark, but it takes a single Phoenix table name in the configuration. See the example below:

```java
Map<String, String> map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", "TABLE_1");
Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df.registerTempTable("TABLE_1");
Dataset<Row> selectResult = df.sparkSession().sql("SELECT * FROM TABLE_1 WHERE COLUMN_1 = 'ABC'");
```

In my Phoenix-HBase database I have two tables, TABLE_1 and TABLE_2, and one SQL query like this:

```sql
SELECT * FROM TABLE_1 AS A JOIN TABLE_2 AS B ON A.COLUMN_1 = B.COLUMN_2 WHERE B.COLUMN_2 = 'XYZ';
```

How can I run this query using the Phoenix-Spark connection? Thanks
Labels:
- Apache HBase
- Apache Phoenix
- Apache Spark