Which gives better performance for both writes and reads, Phoenix thick client or thin client?
Labels: Apache HBase, Apache Phoenix
Created 07-13-2016 04:34 AM
Hi, I'm using an Azure HDInsight HBase cluster. My use case requires heavy writes and reads. To interact with the HBase cluster I have the following options:
1) The Phoenix thick client, which connects using the ZooKeeper quorum.
2) The Phoenix thin client with Phoenix Query Servers. The problem here is that the Phoenix Query Servers run on the region nodes, and I'm not sure how to balance the load across all of them.
3) Azure HDInsight HBase exposes a REST endpoint for interacting with the Phoenix Query Servers (not sure if this acts as a load balancer for the PQS instances running on the region nodes): https://<DOMAIN>.azurehdinsight.net/hbasephoenix/. The problem here is that I'm unable to find API documentation for it.
Please suggest which option is the most performance-effective for both reads and writes. I'm fine with taking different paths for reads and writes if that gives higher performance. Performance is my primary requirement.
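For reference, a minimal sketch of the two connection styles I'm considering (quorum and PQS hostnames are placeholders, not my real cluster names):

```java
// Option 1: thick client via the ZooKeeper quorum.
// Option 2: thin client via a single Phoenix Query Server (default port 8765).
// Hostnames are placeholders; a serialization property may also be needed
// on the thin URL depending on the Phoenix version.
import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionOptions {
    public static void main(String[] args) throws Exception {
        try (Connection thick = DriverManager.getConnection(
                "jdbc:phoenix:zk1,zk2,zk3:2181:/hbase")) {
            System.out.println("thick client connected");
        }
        try (Connection thin = DriverManager.getConnection(
                "jdbc:phoenix:thin:url=http://pqs-host:8765")) {
            System.out.println("thin client connected");
        }
    }
}
```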
Created 07-13-2016 02:24 PM
The Phoenix Query Server API documentation can be found on the Apache Calcite website; PQS is essentially a rebranding of the Avatica server: https://calcite.apache.org/avatica/docs/protobuf_reference.html
The write performance of the thin driver with PQS is very close to that of the thick driver as long as the client uses the batch-oriented APIs to write data (https://calcite.apache.org/avatica/docs/protobuf_reference.html#prepareandexecutebatchrequest). These map to the executeBatch() calls on PreparedStatement. I have done performance evaluations which showed that, when using these API calls, thick and thin performance is roughly equivalent. Non-batch API calls through the thin driver were roughly 30% slower than the thick driver.
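A hedged sketch of what that looks like in JDBC (table and column names here are made up; the URL assumes a PQS instance on the default port 8765, and the same code works unchanged with a thick-driver URL):

```java
// Sketch: batched UPSERTs via JDBC. With the thin driver these calls map to
// Avatica's PrepareAndExecuteBatchRequest; the thick driver works identically.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchWriteExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:thin:url=http://pqs-host:8765")) {
            conn.setAutoCommit(false);
            // METRICS(ID, VAL) is a hypothetical table for illustration.
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO METRICS (ID, VAL) VALUES (?, ?)")) {
                for (int i = 0; i < 10000; i++) {
                    ps.setInt(1, i);
                    ps.setDouble(2, Math.random());
                    ps.addBatch();
                    if ((i + 1) % 1000 == 0) {
                        ps.executeBatch(); // one round trip per 1000 rows
                        conn.commit();
                    }
                }
                ps.executeBatch(); // flush any remainder
                conn.commit();
            }
        }
    }
}
```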
On the read side, I have not noticed any discernible difference in the performance between the two approaches.
Overall, the thick driver is likely to perform somewhat better than the thin driver because it does less work, but the gap is small. Depending on the physical deployment of your application, e.g., running multiple PQS instances on "beefier" server nodes rather than on lightweight "edge" nodes, the thin driver might give better overall performance.
Created 07-13-2016 06:23 AM
You can put a load balancer between the client and the region servers. I do not have benchmarks comparing reads/writes for the Phoenix thick and thin clients; I would assume they are comparable. It also depends on how easy your application is to implement and deploy. The Phoenix thick client uses a JDBC connection to talk to HBase, so the overhead should be minimal. If your application can tolerate some read delay, you can use one cluster for reads and another for writes, and set up replication between them.
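A minimal sketch of that read/write split, assuming replication is already configured between two hypothetical clusters (quorum hostnames and the EVENTS table are made up):

```java
// Sketch of the read/write split suggested above: writes go to the primary
// cluster, reads go to the replica (which may lag behind the primary).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReadWriteSplit {
    public static void main(String[] args) throws Exception {
        // Writes: primary cluster.
        try (Connection writeConn = DriverManager.getConnection(
                "jdbc:phoenix:write-zk1,write-zk2,write-zk3:2181:/hbase")) {
            try (PreparedStatement ps = writeConn.prepareStatement(
                    "UPSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)")) {
                ps.setInt(1, 1);
                ps.setString(2, "hello");
                ps.executeUpdate();
            }
            writeConn.commit();
        }

        // Reads: replica cluster.
        try (Connection readConn = DriverManager.getConnection(
                "jdbc:phoenix:read-zk1,read-zk2,read-zk3:2181:/hbase");
             PreparedStatement ps = readConn.prepareStatement(
                    "SELECT PAYLOAD FROM EVENTS WHERE ID = ?")) {
            ps.setInt(1, 1);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```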
Created 07-13-2016 08:04 AM
In general the normal (thick) client is faster unless your client machines are heavily CPU constrained. PQS is essentially a thrift proxy for the thick client, so you incur additional work and latency (especially for larger result sets, which the thick client reads in parallel from all regions but which have to be pulled through a single pipe from PQS).
If you can connect the clients directly to the region servers and the clients are not heavily CPU constrained, I would go with the normal client.
Created 07-13-2016 02:17 PM
Calling it a "thrift proxy" is a bit inaccurate (Apache Thrift is not in the picture at all).
Created 07-21-2016 11:19 AM
Sorry about that :-). Protobuf?
Created 07-21-2016 03:04 PM
🙂 no worries. Just wanted to avoid misinformation. Calling PQS a "proxy server" is definitely the best phrase I can come up with. It uses protobuf to accomplish this, but users don't really have to be aware that's happening (so I tend to not mention it unless explaining how it works).
Created 08-08-2016 05:53 AM
Hi @Josh Elser, I'm using Phoenix 4.4.0. I have decided to go with the thick client for writes. For reads I tried to use PQS, but that was not successful since my schema includes some ARRAY values, so even for reads I had to go with the thick client. However, the thick client takes some time to establish a connection (including a few retries every time). Kindly provide your input on this retry problem. Sample retry log:
Closing master protocol: MasterService
Closing zookeeper sessionid=0x1565678c9ac097f
Session: 0x1565678c9ac097f closed
EventThread shut down
Created 08-08-2016 02:19 PM
ARRAY support is a known deficiency: https://issues.apache.org/jira/browse/CALCITE-1050
I'm not sure what "retry problem" you're referring to. The Phoenix thick client will take a moment to connect initially (to ensure that the Phoenix system tables are properly created). The log statements you provided are not errors; they are a sign that Phoenix is doing this.
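If the retries you see come from opening a fresh connection for every operation (an assumption on my part), holding one connection and reusing it pays that startup cost only once. A rough sketch with a hypothetical EVENTS table:

```java
// Sketch (assumption: the perceived "retries" come from reconnecting per
// operation). One long-lived connection pays the system-table check once.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ReusedConnection {
    private final Connection conn;

    public ReusedConnection(String url) throws Exception {
        // Slow the first time in a JVM; fast afterwards.
        this.conn = DriverManager.getConnection(url);
    }

    public void write(int id, String payload) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)")) {
            ps.setInt(1, id);
            ps.setString(2, payload);
            ps.executeUpdate();
        }
        conn.commit();
    }

    public void close() throws Exception {
        conn.close();
    }
}
```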
Created 01-07-2017 12:00 PM
I find Phoenix writes are too slow. Can you help me? Thanks!
My post is here: https://community.hortonworks.com/questions/76862/phoenix-write-is-too-slow.html
