Created 11-03-2016 09:02 PM
I am testing some code that uses the Phoenix Spark plugin to read a Phoenix table with a namespace prefix in the table name (the table was created as a Phoenix table, not an HBase table), but it throws a TableNotFoundException.
The table is clearly there, because I can query it with plain Phoenix SQL through SQuirreL. In addition, querying it through Spark SQL works without any problem.
I am running on the HDP 2.5 platform, with Phoenix 4.7.0.2.5.0.0-1245.
The problem did not exist when I ran the same code on an HDP 2.4 cluster with Phoenix 4.4.
Nor does the problem occur on HDP 2.5 when I query a table without a namespace prefix in the table name.
The log is in the attached file: tableNoFound.txt
My testing code is listed below:
The weird thing is that if I run testSpark alone, it throws the above exception, but if I run testJdbc first, followed by testSpark, both of them work.
import java.util.HashMap;
import java.util.Map;

import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.SQLContext;

public class Application {
    static private Logger log = Logger.getLogger(Application.class);

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("NMS Tuning Engine");
        JavaSparkContext sc = new JavaSparkContext(conf);
        //testJdbc(sc);
        testSpark(sc);
    }

    // Reads the table through the phoenix-spark plugin; this is the path
    // that throws TableNotFoundException.
    static public void testSpark(JavaSparkContext sc) {
        //SparkContextBuilder.buildSparkContext("Simple Application", "local"); // One JVM can only have one Spark Context now
        Map<String, String> options = new HashMap<String, String>();
        SQLContext sqlContext = new SQLContext(sc);
        String tableStr = "\"ACME:ENDPOINT_STATUS\"";
        String dataSrcUrl = "jdbc:phoenix:luna-sdp-nms-01.davis.sensus.lab:2181:/hbase-unsecure";
        options.put("zkUrl", dataSrcUrl);
        options.put("table", tableStr);
        log.info("Phoenix DB URL: " + dataSrcUrl + " tableStr: " + tableStr);
        DataFrame df = null;
        try {
            df = sqlContext.read().format("org.apache.phoenix.spark").options(options).load();
            df.explain(true);
        } catch (Exception ex) {
            log.error("sql error: ", ex);
        }
        try {
            log.info("Count By phoenix spark plugin: " + df.count());
        } catch (Exception ex) {
            log.error("dataframe error: ", ex);
        }
    }

    // Reads the same table through Spark's generic JDBC data source; this
    // path works.
    static public void testJdbc(JavaSparkContext sc) {
        Map<String, String> options = new HashMap<String, String>();
        SQLContext sqlContext = new SQLContext(sc);
        if (sc == null || sqlContext == null || options == null) {
            log.info("NULL sc, sqlContext, or options");
        }
        String qry2 = "(Select ENDPOINT_ID, CITY from \"ACME:ENDPOINT_STATUS\" Where city = 'ACME City')";
        String dataSrcUrl = "jdbc:phoenix:luna-sdp-nms-01.davis.sensus.lab:2181:/hbase-unsecure";
        options.put("url", dataSrcUrl);
        options.put("dbtable", qry2);
        log.info("Phoenix DB URL: " + dataSrcUrl + "\nquery: " + qry2);
        DataFrame df = null;
        try {
            DataFrameReader dfRd = sqlContext.read().format("jdbc").options(options);
            if (dfRd == null) {
                log.error("NULL DataFrameReader Object dfRd in getEndPointDataByJdbc");
            }
            df = dfRd.load();
            df.explain(true);
        } catch (Exception ex) {
            log.error("sql error: ", ex);
        }
        try {
            log.info("Count By Jdbc: " + df.count());
        } catch (Exception ex) {
            log.error("dataframe error: ", ex);
        }
    }
}
By the way, here is what HBase shows when I list the tables:
hbase(main):031:0* list
TABLE
ACME:ENDPOINT_CONFIG
ACME:ENDPOINT_STATUS
LONG:ENDPOINTS
LONG:RADIOCHANNELS
LONG:REGIONINFORMATION
LONG:TGBSTATISTICS
SENSUS1:ENDPOINTS
SENSUS1:RADIOCHANNELS
SENSUS1:REGIONINFORMATION
SENSUS1:TGBSTATISTICS
SENSUS2:ENDPOINTS
SENSUS2:RADIOCHANNELS
SENSUS2:REGIONINFORMATION
SENSUS2:TGBSTATISTICS
SENSUS:ENDPOINTS
SENSUS:RADIOCHANNELS
SENSUS:REGIONINFORMATION
SENSUS:TGBSTATISTICS
SYSTEM.CATALOG
SYSTEM:CATALOG
SYSTEM:FUNCTION
SYSTEM:SEQUENCE
SYSTEM:STATS
TENANT
24 row(s) in 0.0090 seconds
=> ["ACME:ENDPOINT_CONFIG", "ACME:ENDPOINT_STATUS", "LONG:ENDPOINTS", "LONG:RADIOCHANNELS", "LONG:REGIONINFORMATION", "LONG:TGBSTATISTICS", "SENSUS1:ENDPOINTS", "SENSUS1:RADIOCHANNELS", "SENSUS1:REGIONINFORMATION", "SENSUS1:TGBSTATISTICS", "SENSUS2:ENDPOINTS", "SENSUS2:RADIOCHANNELS", "SENSUS2:REGIONINFORMATION", "SENSUS2:TGBSTATISTICS", "SENSUS:ENDPOINTS", "SENSUS:RADIOCHANNELS", "SENSUS:REGIONINFORMATION", "SENSUS:TGBSTATISTICS", "SYSTEM.CATALOG", "SYSTEM:CATALOG", "SYSTEM:FUNCTION", "SYSTEM:SEQUENCE", "SYSTEM:STATS", "TENANT"]
Created 11-03-2016 09:31 PM
Looks like you have enabled namespace mapping. Check that the HBase conf dir is in the classpath, and you need to use the ACME.xxxxx format for table names.
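For example, a minimal sketch of testSpark with that change (the dotted, unquoted table name is the assumption being suggested here; everything else is taken from your code):

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Sketch: same options as testSpark above, but referencing the table as
// schema.table instead of the quoted HBase name.
public class DottedNameTest {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("DottedNameTest"));
        SQLContext sqlContext = new SQLContext(sc);
        Map<String, String> options = new HashMap<String, String>();
        options.put("zkUrl", "jdbc:phoenix:luna-sdp-nms-01.davis.sensus.lab:2181:/hbase-unsecure");
        options.put("table", "ACME.ENDPOINT_STATUS"); // dot form, no double quotes
        DataFrame df = sqlContext.read().format("org.apache.phoenix.spark").options(options).load();
        System.out.println("count: " + df.count());
    }
}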
Created 11-03-2016 09:37 PM
Yes, namespace mapping is enabled, and the HBase conf dir is in the classpath.
The ACME.xxxxx format does not work either; in fact, even testJdbc complains with a TableNotFoundException when the ACME.xxxxx format is used for the table name.
Created 11-04-2016 05:08 AM
Try
String tableStr = "ACME.ENDPOINT_STATUS";
No double quotes, and a dot between the schema name and the table name. Hope that will work.
One more note: if you create the table using double quotes, put the schema and the table name in quotes separately, like
create table "ACME"."ENDPOINT_STATUS"
In that case you will need to quote the table name the way you do in your sample. It seems that there is a bug in the Phoenix code: if you quote them together, Phoenix creates a table with an empty schema and "schema.table" as the table name. You can check this by querying the SYSTEM.CATALOG table.
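For example, a minimal JDBC sketch of that check (the URL is the one from your test code; the LIKE filter is just for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: inspect how Phoenix registered the table in SYSTEM.CATALOG.
// A table created as "ACME"."ENDPOINT_STATUS" shows TABLE_SCHEM = 'ACME';
// one created by quoting the whole name together shows a NULL schema and
// the whole string as TABLE_NAME.
public class CatalogCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:phoenix:luna-sdp-nms-01.davis.sensus.lab:2181:/hbase-unsecure";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            ResultSet rs = stmt.executeQuery(
                "SELECT DISTINCT TABLE_SCHEM, TABLE_NAME FROM SYSTEM.CATALOG " +
                "WHERE TABLE_NAME LIKE '%ENDPOINT_STATUS%'");
            while (rs.next()) {
                System.out.println(rs.getString(1) + " | " + rs.getString(2));
            }
        }
    }
}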
Created 11-07-2016 05:08 PM
Thank you for the reply. After changing the code to create the table with
create table ACME.ENDPOINT_STATUS
the phoenix-spark plugin seems to work. I also found some weird behavior.
If I do both of the following:
create table ACME.ENDPOINT_STATUS
create table "ACME:ENDPOINT_STATUS"
Both tables show up in Phoenix: the first with schema ACME and table name ENDPOINT_STATUS, but the latter with no schema and ACME:ENDPOINT_STATUS as the table name.
However, in HBase I only see one table, ACME:ENDPOINT_STATUS. In addition, an upsert into the table ACME.ENDPOINT_STATUS shows up in the other table, and vice versa.
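The aliasing can be demonstrated with a quick JDBC check like the sketch below (the URL and the ENDPOINT_ID/CITY columns come from my test code; treating ENDPOINT_ID as the primary key and the upserted values are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

// Sketch: write through the dotted name, read back through the quoted
// colon name; both resolve to the same physical HBase table.
public class AliasCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:phoenix:luna-sdp-nms-01.davis.sensus.lab:2181:/hbase-unsecure";
        try (Connection conn = DriverManager.getConnection(url)) {
            conn.createStatement().executeUpdate(
                "UPSERT INTO ACME.ENDPOINT_STATUS (ENDPOINT_ID, CITY) VALUES ('ep-1', 'ACME City')");
            conn.commit();
            // The row written through ACME.ENDPOINT_STATUS is visible here too:
            ResultSet rs = conn.createStatement().executeQuery(
                "SELECT COUNT(*) FROM \"ACME:ENDPOINT_STATUS\" WHERE CITY = 'ACME City'");
            rs.next();
            System.out.println("rows: " + rs.getLong(1));
        }
    }
}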
Namespace mapping was introduced in Phoenix 4.8; did Hortonworks backport the feature into HDP 2.5 and Phoenix 4.7?
Created 11-07-2016 06:37 PM
The only way Phoenix supports defining a table with a schema is with a dot between them. Colons are used internally by HBase, and Phoenix does that transformation on its own. As I mentioned earlier, Phoenix has a bug: the schema and the table name should each have their own double quotes, otherwise the table is created incorrectly with an empty schema. Moreover, on the HBase layer this causes ambiguous physical table mapping, as you are seeing in your example. So my recommendation is to quote the schema and the table name separately, or just not use quotes at all.
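To summarize, a sketch of the two safe DDL forms (the column list here is hypothetical, and the URL is the one from your test code):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: the two safe ways to create a table with a schema, per the advice above.
public class CreateTableForms {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:phoenix:luna-sdp-nms-01.davis.sensus.lab:2181:/hbase-unsecure";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // Safe: no quotes at all (Phoenix upper-cases the names).
            stmt.execute("CREATE TABLE ACME.ENDPOINT_STATUS (ENDPOINT_ID VARCHAR PRIMARY KEY, CITY VARCHAR)");
            // Also safe: quote schema and table separately, e.g.
            // stmt.execute("CREATE TABLE \"ACME\".\"ENDPOINT_STATUS\" (...)");
            // Not safe: quoting them together, e.g. "ACME.ENDPOINT_STATUS",
            // creates a table with an empty schema and the whole string as its name.
        }
    }
}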
And you are right, the feature was backported from 4.8.
Created 11-08-2016 10:07 PM
That makes it very clear, thanks.