Which gives better performance for both writes and reads, Phoenix thick client or thin client?
Labels: Apache HBase, Apache Phoenix
Created 07-13-2016 04:34 AM
Hi, I'm using an Azure HDInsight HBase cluster. My use case requires heavy writes and reads. To interact with the HBase cluster I have the following options:
1) The Phoenix thick client, which connects using the ZooKeeper quorum.
2) The Phoenix thin client with Phoenix Query Servers. The problem here is that the Phoenix Query Servers run on the region nodes, and I'm not sure how to balance the load across all of them.
3) Azure HDInsight HBase exposes a REST endpoint for interacting with the Phoenix Query Servers (not sure if this acts as a load balancer for the PQS instances running on the region nodes): https://<DOMAIN>.azurehdinsight.net/hbasephoenix/. The problem here is that I'm unable to find API documentation for it.
Please suggest which option is the most performance-effective for both reads and writes. I'm fine with taking different paths for reads and writes if that gives higher performance. Performance is my primary requirement.
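For reference, a minimal sketch of the two connection styles I'm considering (quorum and PQS hostnames are placeholders, not my real cluster names):

```java
// Option 1: thick client via the ZooKeeper quorum.
// Option 2: thin client via a single Phoenix Query Server (default port 8765).
// Hostnames are placeholders; a serialization property may also be needed
// on the thin URL depending on the Phoenix version.
import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionOptions {
    public static void main(String[] args) throws Exception {
        try (Connection thick = DriverManager.getConnection(
                "jdbc:phoenix:zk1,zk2,zk3:2181:/hbase")) {
            System.out.println("thick client connected");
        }
        try (Connection thin = DriverManager.getConnection(
                "jdbc:phoenix:thin:url=http://pqs-host:8765")) {
            System.out.println("thin client connected");
        }
    }
}
```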
Created 07-13-2016 02:24 PM
The Phoenix Query Server API documentation can be found on the Apache Calcite website; PQS is essentially a rebranding of the Avatica server: https://calcite.apache.org/avatica/docs/protobuf_reference.html
The write performance of the thin driver with PQS is very close to that of the thick driver as long as the client uses the batch-oriented APIs to write data (https://calcite.apache.org/avatica/docs/protobuf_reference.html#prepareandexecutebatchrequest). These map to the executeBatch() calls on PreparedStatement. I have done performance evaluations which showed that, when using these API calls, thick and thin performance is roughly equivalent. Non-batch API calls through the thin driver were roughly 30% slower than the thick driver.
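A hedged sketch of what that looks like in JDBC (table and column names here are made up; the URL assumes a PQS instance on the default port 8765, and the same code works unchanged with a thick-driver URL):

```java
// Sketch: batched UPSERTs via JDBC. With the thin driver these calls map to
// Avatica's PrepareAndExecuteBatchRequest; the thick driver works identically.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchWriteExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:thin:url=http://pqs-host:8765")) {
            conn.setAutoCommit(false);
            // METRICS(ID, VAL) is a hypothetical table for illustration.
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO METRICS (ID, VAL) VALUES (?, ?)")) {
                for (int i = 0; i < 10000; i++) {
                    ps.setInt(1, i);
                    ps.setDouble(2, Math.random());
                    ps.addBatch();
                    if ((i + 1) % 1000 == 0) {
                        ps.executeBatch(); // one round trip per 1000 rows
                        conn.commit();
                    }
                }
                ps.executeBatch(); // flush any remainder
                conn.commit();
            }
        }
    }
}
```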
On the read side, I have not noticed any discernible difference in the performance between the two approaches.
Overall, the thick driver is likely to perform somewhat better than the thin driver because it does less work, but the gap is small. Depending on the physical deployment of your application, e.g., running multiple PQS instances on "beefier" server nodes rather than on lightweight "edge" nodes, the thin driver might give better overall performance.
Created 07-13-2016 06:23 AM
You can put a load balancer between the client and the region servers. I do not have benchmarks comparing reads/writes for the Phoenix thick and thin clients; I would assume they are comparable. It also depends on how easy your application is to implement and deploy. The Phoenix thick client uses a JDBC connection to talk to HBase, so the overhead should be minimal. If your application can tolerate some read delay, you can use one cluster for reads and another for writes, and set up replication between them.
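A minimal sketch of that read/write split, assuming replication is already configured between two hypothetical clusters (quorum hostnames and the EVENTS table are made up):

```java
// Sketch of the read/write split suggested above: writes go to the primary
// cluster, reads go to the replica (which may lag behind the primary).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReadWriteSplit {
    public static void main(String[] args) throws Exception {
        // Writes: primary cluster.
        try (Connection writeConn = DriverManager.getConnection(
                "jdbc:phoenix:write-zk1,write-zk2,write-zk3:2181:/hbase")) {
            try (PreparedStatement ps = writeConn.prepareStatement(
                    "UPSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)")) {
                ps.setInt(1, 1);
                ps.setString(2, "hello");
                ps.executeUpdate();
            }
            writeConn.commit();
        }

        // Reads: replica cluster.
        try (Connection readConn = DriverManager.getConnection(
                "jdbc:phoenix:read-zk1,read-zk2,read-zk3:2181:/hbase");
             PreparedStatement ps = readConn.prepareStatement(
                    "SELECT PAYLOAD FROM EVENTS WHERE ID = ?")) {
            ps.setInt(1, 1);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```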
Created 07-13-2016 08:04 AM
In general the normal (thick) client is faster unless your client machines are heavily CPU constrained. PQS is essentially a thrift proxy for the thick client, so you incur additional work and latency (especially for larger result sets, which the thick client reads in parallel from all regions but which have to be pulled through a single pipe from PQS).
If you can connect the clients directly to the region servers and the clients are not heavily CPU constrained, I would go with the normal client.
Created 07-13-2016 02:17 PM
Calling it a "thrift proxy" is a bit inaccurate (Apache Thrift is not in the picture at all).
Created 07-21-2016 11:19 AM
Sorry about that :-). Protobuf?
Created 07-21-2016 03:04 PM
🙂 no worries. Just wanted to avoid misinformation. Calling PQS a "proxy server" is definitely the best phrase I can come up with. It uses protobuf to accomplish this, but users don't really have to be aware that's happening (so I tend to not mention it unless explaining how it works).
Created 08-08-2016 05:53 AM
Hi @Josh Elser, I'm using Phoenix 4.4.0. I have decided to go with the thick client for writes. For reads I tried to use PQS, but that was not successful since my schema includes some ARRAY values, so even for reads I had to go with the thick client. However, the thick client takes some time to establish a connection (including a few retries every time). Kindly provide your input on this retry problem. Sample retry log:
Closing master protocol: MasterService
Closing zookeeper sessionid=0x1565678c9ac097f
Session: 0x1565678c9ac097f closed
EventThread shut down
Created 08-08-2016 02:19 PM
ARRAY support is a known deficiency: https://issues.apache.org/jira/browse/CALCITE-1050
I'm not sure what "retry problem" you're referring to. The Phoenix thick client will take a moment to connect initially (to ensure that the Phoenix system tables are properly created). The log statements you provided are not errors; they are a sign that Phoenix is doing this.
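If the retries you see come from opening a fresh connection for every operation (an assumption on my part), holding one connection and reusing it pays that startup cost only once. A rough sketch with a hypothetical EVENTS table:

```java
// Sketch (assumption: the perceived "retries" come from reconnecting per
// operation). One long-lived connection pays the system-table check once.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ReusedConnection {
    private final Connection conn;

    public ReusedConnection(String url) throws Exception {
        // Slow the first time in a JVM; fast afterwards.
        this.conn = DriverManager.getConnection(url);
    }

    public void write(int id, String payload) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)")) {
            ps.setInt(1, id);
            ps.setString(2, payload);
            ps.executeUpdate();
        }
        conn.commit();
    }

    public void close() throws Exception {
        conn.close();
    }
}
```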
Created 01-07-2017 12:00 PM
I find Phoenix writes are too slow. Can you help me? Thanks!
My post is here: https://community.hortonworks.com/questions/76862/phoenix-write-is-too-slow.html
