Which gives better performance for both writes and reads, Phoenix thick client or thin client?

New Contributor

Hi, I'm using an Azure HDInsight HBase cluster, and my use case requires heavy writes and reads. To interact with the HBase cluster I have the following options:

1) Phoenix thick client, which connects using the ZooKeeper quorum.
2) Phoenix thin client with Phoenix Query Servers (PQS). The problem here is that the Phoenix Query Servers run on the region nodes, and I'm not sure how to balance the load across all of them.
3) Azure HDInsight HBase exposes a REST API to interact with the Phoenix Query Servers at https://<DOMAIN>.azurehdinsight.net/hbasephoenix/ (I'm not sure whether this is the load balancer for the Phoenix Query Servers running on the region nodes). The problem here is that I'm unable to find API documentation for it.

Please suggest which option would perform best for both reads and writes. I'm OK with using two different approaches for reads and writes if that gives higher performance. Performance is my primary requirement.

Accepted Solutions

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

The Phoenix Query Server API documentation can be found on the Apache Calcite website. PQS is essentially a branding of the Avatica Server. https://calcite.apache.org/avatica/docs/protobuf_reference.html

The write performance of the thin driver with PQS is very close to that of the thick driver, as long as the client uses the batch-oriented APIs to write data (https://calcite.apache.org/avatica/docs/protobuf_reference.html#prepareandexecutebatchrequest). These map to the executeBatch() API calls on PreparedStatement. I have done a performance evaluation which showed that, when using these API calls, performance between thick and thin is roughly equivalent. Non-batch API calls through the thin driver were roughly 30% slower than through the thick driver.

On the read side, I have not noticed any discernible difference in the performance between the two approaches.

Overall, the thick driver is likely to perform better than the thin driver because it has less work to do, but the gap is small. Depending on the physical deployment of your application, the thin driver might even give better overall performance, for example if you deploy multiple PQS instances on "beefier" server nodes rather than on lightweight "edge" nodes.
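
To make the batch-oriented write path described above concrete, here is a minimal sketch using only standard JDBC calls (addBatch() and executeBatch() on PreparedStatement). The class name, the helper names, the table and column names, and the batch size are all hypothetical and chosen for illustration; the same code should apply to either driver, since only the JDBC URL used to open the Connection differs.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; not part of Phoenix or Avatica.
public class BatchUpsertSketch {

    /** Builds "UPSERT INTO table (c1, c2) VALUES (?, ?)" for the given columns. */
    public static String upsertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    /**
     * Writes rows through addBatch()/executeBatch(), flushing and committing
     * every batchSize rows. Returns the number of executeBatch() calls made.
     * The caller owns the connection; auto-commit is switched off here.
     */
    public static int writeRows(Connection conn, String sql,
                                List<Object[]> rows, int batchSize) throws SQLException {
        int flushes = 0;
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    conn.commit();
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                conn.commit();
                flushes++;
            }
        }
        return flushes;
    }
}
```

A reasonable batch size depends on row width and cluster capacity, so it is worth measuring a few values rather than assuming one.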

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

New Contributor

You can use a load balancer between the client and the region servers. I do not have benchmarks comparing reads and writes for the Phoenix thick and thin clients; I would assume they are comparable. It depends on how easy it is to implement and deploy your application. The Phoenix thick client uses a JDBC connection to connect to HBase, so the overhead should be minimal. If your application can tolerate read delay, you can use one cluster for reads and another cluster for writes, and set up replication between them.
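
As a rough illustration of spreading thin-client connections across several PQS instances, the sketch below hands out thin-driver JDBC URLs in round-robin order. It is a client-side stand-in for a real load balancer, not a replacement for one. The class name and host names are hypothetical, port 8765 is the usual PQS default, and the znode in the thick-client URL is an assumption that varies by cluster. Because each JDBC connection is opened against a single URL, all requests on that connection go to the same PQS instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class; not part of Phoenix or Avatica.
public class PqsEndpoints {
    private final List<String> hosts;
    private final int port;
    private final AtomicInteger next = new AtomicInteger();

    public PqsEndpoints(List<String> hosts, int port) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one PQS host");
        }
        this.hosts = new ArrayList<>(hosts); // defensive copy
        this.port = port;
    }

    /** Returns the thin-driver JDBC URL for the next PQS host, round-robin. */
    public String nextThinUrl() {
        // floorMod keeps the index valid even if the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), hosts.size());
        return "jdbc:phoenix:thin:url=http://" + hosts.get(i) + ":" + port;
    }

    /** Builds a thick-driver JDBC URL from the ZooKeeper quorum, port and znode. */
    public static String thickUrl(String zkQuorum, int zkPort, String znode) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + znode;
    }
}
```

Each URL returned by nextThinUrl() would be passed to DriverManager.getConnection() with the Phoenix thin client jar on the classpath. If a network load balancer is used instead, it may need sticky sessions, since PQS keeps per-connection state on the server.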

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

In general the normal (thick) client is faster unless your client machine is heavily CPU constrained. PQS is essentially a thrift proxy for the thick client, so you have additional effort and latency (especially if you have larger result sets, which the thick client reads in parallel from all regions but which are sucked through a single pipe from PQS).

If you can connect the clients directly to the Region Servers and the clients are not heavily CPU constrained I would go with the normal client.

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

Calling it a "thrift proxy" is a bit inaccurate (Apache Thrift is not in the picture at all).

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

@Josh Elser

Sorry about that :-). Protobuf?

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

:) no worries. Just wanted to avoid misinformation. Calling PQS a "proxy server" is definitely the best phrase I can come up with. It uses protobuf to accomplish this, but users don't really have to be aware that's happening (so I tend to not mention it unless explaining how it works).

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

New Contributor

Hi @Josh Elser, I'm using Phoenix 4.4.0. I have decided to go with the thick client for writes. For read operations I tried to use PQS, but it was not successful because I have some ARRAY values in my schema. So even for reads I had to go with the thick client. However, the thick client takes some time to establish a connection (including a few retries every time). Kindly provide your inputs on this retry problem. Sample retry log:

Closing master protocol: MasterService
Closing zookeeper sessionid=0x1565678c9ac097f
Session: 0x1565678c9ac097f closed
EventThread shut down

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

ARRAY support is a known deficiency: https://issues.apache.org/jira/browse/CALCITE-1050

I'm not sure what "retry problem" you're referring to. The Phoenix thick client will take a moment to connect initially (to ensure that the Phoenix system tables are properly created). The statements you provided are not errors, but a sign that Phoenix is doing this.

Re: Which gives better performance for both writes and reads, Phoenix thick client or thin client?

New Contributor

I find that Phoenix writes are too slow. Can you help me? Thanks.

My post is here https://community.hortonworks.com/questions/76862/phoenix-write-is-too-slow.html
