Member since: 07-17-2019
Posts: 738
Kudos Received: 433
Solutions: 111

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3473 | 08-06-2019 07:09 PM |
| | 3669 | 07-19-2019 01:57 PM |
| | 5192 | 02-25-2019 04:47 PM |
| | 4663 | 10-11-2018 02:47 PM |
| | 1768 | 09-26-2018 02:49 PM |
07-13-2016
02:24 PM
1 Kudo
The Phoenix Query Server API documentation can be found on the Apache Calcite website. PQS is essentially a branding of the Avatica Server.
https://calcite.apache.org/avatica/docs/protobuf_reference.html

The write performance of the thin driver with PQS is very close to that of the thick driver, as long as the client uses the batch-oriented APIs to write data (https://calcite.apache.org/avatica/docs/protobuf_reference.html#prepareandexecutebatchrequest). These map to the executeBatch() calls on PreparedStatement. In a performance evaluation I ran, thick and thin were roughly equivalent when using these calls, while non-batch writes through the thin driver were roughly 30% slower than the thick driver. On the read side, I have not noticed any discernible difference between the two approaches. Overall, the thick driver is likely to perform somewhat better than the thin driver because it does less work, but the gap is small. Depending on the physical deployment of your application -- for example, running multiple PQS instances on "beefier" server nodes rather than lightweight "edge" nodes -- the thin driver might result in better overall performance.
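As a hypothetical sketch of the batch-oriented write path described above (the table name, column names, and batch size are illustrative assumptions, not from the original question):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustrative sketch of batched upserts, as used by the Phoenix thin driver.
// METRICS and its columns are made up for this example.
public class BatchUpsert {

    // How many executeBatch() round trips a given row count requires.
    static int flushes(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    static void writeRows(Connection conn, int rowCount, int batchSize)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO METRICS (ID, VAL) VALUES (?, ?)")) {
            for (int i = 0; i < rowCount; i++) {
                ps.setInt(1, i);
                ps.setLong(2, System.nanoTime());
                ps.addBatch();                 // buffered on the client
                if ((i + 1) % batchSize == 0) {
                    ps.executeBatch();         // one batched round trip to PQS
                }
            }
            ps.executeBatch();                 // flush any remainder
            conn.commit();
        }
    }
}
```

A thin-driver connection is typically opened with a URL of the form `jdbc:phoenix:thin:url=http://<pqs-host>:8765;serialization=PROTOBUF` (host and port here are placeholders for your deployment).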
07-13-2016
02:17 PM
Calling it a "thrift proxy" is a bit inaccurate (Apache Thrift is not in the picture at all).
07-13-2016
01:53 AM
The size of a table, in bytes, is not necessarily tied to the number of regions. For example, a configuration change might result in more or fewer regions for the same amount of data. I don't have a definitive explanation for why you saw the number of regions spike to 27; it might have just been transient. The number of regions likely increased from 5 to 17 because regions in this table split as part of the compaction. You can inspect the RegionServer and Master logs on your cluster for the given table to see whether its regions underwent any splits. There are many reasons the number of regions might increase -- it is hard to say definitively with the information provided so far. I would not worry about having 17 regions instead of only 5.
07-11-2016
08:37 PM
1 Kudo
I would highly recommend using Ambari to install your cluster to avoid future issues. It looks like the ZooKeeper nodes cannot communicate with one another. Are 10.0.1.103 and 10.0.1.105 the correct IP addresses? Can the node you copied the exception from reach the nodes at those IP addresses? Have you checked whether the other nodes report errors?
07-08-2016
05:24 PM
Thanks, @Joshua Adeleke. As in the other question linked by Srai, if you know the specific file(s) your job is reading, you can try the `hdfs debug recoverLease` command on those files. Normally, a lease on an HDFS file expires automatically if the writer goes away abnormally without closing the file. If you are sure no client is still trying to write the file, running recoverLease forces the NameNode to let this operation succeed.
07-08-2016
03:50 PM
Can you share the hdfs fsck command you ran? It definitely sounds like HDFS is not healthy.
07-07-2016
03:34 PM
Loading jars out of HDFS, as enabled by HBASE-1936, would be an alternative to copying the jars to the local filesystem on each node running HBase.
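As a sketch, the HDFS classloading enabled by HBASE-1936 is controlled through the hbase.dynamic.jars.dir property in hbase-site.xml; the path below is an example value, not a required one:

```xml
<!-- hbase-site.xml: an HDFS directory HBase scans for additional jars.
     The path is illustrative; by default it resolves under hbase.rootdir. -->
<property>
  <name>hbase.dynamic.jars.dir</name>
  <value>hdfs:///apps/hbase/lib</value>
</property>
```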
07-06-2016
07:35 PM
1 Kudo
The book will cover which properties to set in hbase-site.xml, which you can do via Ambari. However, it also depends on you copying the necessary jar(s) out to your cluster (/usr/hdp/current/hbase-client/lib should do the trick).
07-02-2016
05:35 PM
1 Kudo
It's just a "bug": the warning message needs to be suppressed. This will be fixed in the final shipped version of HDP 2.5; I assume the sandbox grabbed an earlier build that still has it.
07-01-2016
03:47 PM
60 values of CODNRBEENF per day, or in total? If you have 60 unique values of CODNRBEENF per day, leading with that column would be better; otherwise, leading with the date will probably serve you better over time. If you are also querying on CODINTERNO and CODTXF (together with FECHAOPRCNF and CODNRBEENF), then it makes sense to include them. Having four columns in the primary key constraint is not a problem.
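As an illustrative Phoenix DDL sketch (the table name, non-key column, and data types are assumptions; only the key column names come from the question), a four-column primary key leading with CODNRBEENF would look like:

```sql
-- Hypothetical schema; adjust names and types to your data.
CREATE TABLE IF NOT EXISTS OPERACIONES (
    CODNRBEENF  VARCHAR NOT NULL,   -- ~60 distinct values: candidate leading column
    FECHAOPRCNF DATE    NOT NULL,
    CODINTERNO  VARCHAR NOT NULL,
    CODTXF      VARCHAR NOT NULL,
    IMPORTE     DECIMAL(18, 2)      -- example non-key column
    CONSTRAINT PK PRIMARY KEY (CODNRBEENF, FECHAOPRCNF, CODINTERNO, CODTXF)
);
```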