Created 07-13-2016 04:34 AM
Hi, I'm using an Azure HDInsight HBase cluster. My use case requires heavy writes and reads. To interact with the HBase cluster I have the following options:
1) Phoenix thick client, which connects using the ZooKeeper quorum.
2) Phoenix thin client with Phoenix Query Servers. The problem here is that the Phoenix Query Servers run on the region nodes, and I'm not sure how to balance the load across all of them.
3) Azure HDInsight HBase exposes a REST endpoint for the Phoenix Query Servers (not sure if this is a load balancer for the PQS instances running on the region nodes): https://<DOMAIN>.azurehdinsight.net/hbasephoenix/. The problem here is that I'm unable to find API documentation for it.
Please suggest which option would perform best for both reads and writes. I'm fine with using two different approaches for reads and writes if that gives higher performance. Performance is my primary requirement.
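For context, here is a minimal sketch of how I understand the first two options would be opened over JDBC (the host names and ZooKeeper quorum below are placeholders, not values from my cluster):

import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixClientOptions {
    public static void main(String[] args) throws Exception {
        // Option 1: thick client, connecting directly via the
        // ZooKeeper quorum (placeholder hosts).
        Connection thick = DriverManager.getConnection(
            "jdbc:phoenix:zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase");
        thick.close();

        // Option 2: thin client over HTTP to a Phoenix Query Server
        // (or to a load balancer fronting several PQS instances).
        Connection thin = DriverManager.getConnection(
            "jdbc:phoenix:thin:url=http://pqs.example.com:8765;serialization=PROTOBUF");
        thin.close();
    }
}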
Created 07-13-2016 02:24 PM
The Phoenix Query Server API documentation can be found on the Apache Calcite website. PQS is essentially a re-branding of the Avatica server. https://calcite.apache.org/avatica/docs/protobuf_reference.html
The write performance of the thin driver with PQS is very close to the performance of the thick driver, as long as the client uses the batch-oriented APIs to write data (https://calcite.apache.org/avatica/docs/protobuf_reference.html#prepareandexecutebatchrequest). These map to the executeBatch() API calls on PreparedStatement. I have done performance evaluations which showed that, when using these API calls, performance between thick and thin is roughly equivalent. Non-batch API calls through the thin driver were roughly 30% slower than the thick driver.
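As an illustration, a minimal sketch of a batched upsert through the thin driver (the PQS host, table, and column names here are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchUpsert {
    public static void main(String[] args) throws Exception {
        // Thin driver over PQS (placeholder host).
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:thin:url=http://pqs.example.com:8765;serialization=PROTOBUF")) {
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)");
            for (int i = 0; i < 1000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "value-" + i);
                ps.addBatch();   // queue rows client-side
            }
            ps.executeBatch();   // sent to PQS as a single batch request
            conn.commit();
        }
    }
}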
On the read side, I have not noticed any discernible difference in the performance between the two approaches.
Overall, the thick driver is likely going to perform better than the thin driver because it has to do less work, but the gap is small. Depending on the physical deployment of your application, e.g. deploying multiple PQS instances on "beefier" server nodes rather than lightweight "edge" nodes, the thin driver might result in better overall performance.
Created 07-13-2016 06:23 AM
You can use a load balancer between the client and the Phoenix Query Servers on the region nodes. I don't have benchmarks comparing reads/writes for the Phoenix thick and thin clients; I would assume they are comparable. It depends on how easy your application is to implement/deploy. The Phoenix thick client uses a JDBC connection to HBase, so the overhead should be minimal. If your application can tolerate read delay, you can use one cluster for reads and another for writes, and set up replication between them.
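A rough sketch of that read/write split, assuming one thick-client connection per cluster (the quorum addresses are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class ReadWriteSplit {
    public static void main(String[] args) throws Exception {
        // Writes go to one cluster; replication ships them to the other,
        // so reads may lag slightly behind the latest writes.
        Connection writeConn = DriverManager.getConnection(
            "jdbc:phoenix:write-zk1,write-zk2,write-zk3:2181:/hbase");
        Connection readConn = DriverManager.getConnection(
            "jdbc:phoenix:read-zk1,read-zk2,read-zk3:2181:/hbase");
        // ... use writeConn for UPSERTs, readConn for SELECTs ...
        writeConn.close();
        readConn.close();
    }
}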
Created 07-13-2016 08:04 AM
In general the normal client is faster unless your client machine is heavily CPU-constrained. PQS is essentially a thrift proxy for the thick client, so you have additional effort and latency (especially with larger result sets, which the thick client reads in parallel from all regions but which must be pulled through a single pipe from PQS).
If you can connect the clients directly to the region servers and the clients are not heavily CPU-constrained, I would go with the normal client.
Created 07-13-2016 02:17 PM
Calling it a "thrift proxy" is a bit inaccurate (Apache Thrift is not in the picture at all).
Created 07-21-2016 11:19 AM
Sorry about that :-). Protobuf?
Created 07-21-2016 03:04 PM
🙂 no worries. Just wanted to avoid misinformation. Calling PQS a "proxy server" is definitely the best phrase I can come up with. It uses protobuf to accomplish this, but users don't really have to be aware that's happening (so I tend to not mention it unless explaining how it works).
Created 08-08-2016 05:53 AM
Hi @Josh Elser, I'm using Phoenix 4.4.0. I have decided to go with the thick client for writes. For reads, I tried to use PQS, but it was not successful, since my schema includes some ARRAY values. So even for reads I had to go with the thick client. But the thick client takes some time to establish a connection (including a few retries every time). Kindly provide your input on this retry problem. Sample retry log:
Closing master protocol: MasterService
Closing zookeeper sessionid=0x1565678c9ac097f
Session: 0x1565678c9ac097f closed
EventThread shut down
Created 08-08-2016 02:19 PM
ARRAY support is a known deficiency: https://issues.apache.org/jira/browse/CALCITE-1050
I'm not sure what "retry problem" you're referring to. The Phoenix thick client takes a moment to connect initially (to ensure that the Phoenix system tables are properly created). The log statements you provided are not errors; they are a sign that Phoenix is doing this.
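If the delay you see is just that one-time bootstrap, a common mitigation (a sketch under that assumption, with a placeholder quorum) is to open a connection once at application startup, so later connections in the same JVM reuse the cached HBase connection and metadata:

import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixWarmup {
    private static final String URL =
        "jdbc:phoenix:zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase"; // placeholder

    public static void main(String[] args) throws Exception {
        // The first connection in the JVM pays the bootstrap cost
        // (ZooKeeper session, system table checks). Do it once at startup.
        try (Connection warmup = DriverManager.getConnection(URL)) {
            // no-op: connecting is enough to populate the client-side caches
        }
        // Subsequent connections in the same JVM are much faster.
        try (Connection conn = DriverManager.getConnection(URL)) {
            // ... run queries ...
        }
    }
}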
Created 01-07-2017 12:00 PM
I find Phoenix writes are too slow. Can you help me? Thanks!
My post is here https://community.hortonworks.com/questions/76862/phoenix-write-is-too-slow.html