Member since: 01-05-2016
Posts: 10
Kudos Received: 9
Solutions: 0
06-27-2016
12:17 PM
Hi @Minwoo Kang,
Yes, that solved the problem.
02-15-2016
07:55 PM
1 Kudo
Hi Guys,
Regarding the region split issue: it was because I was using the default policy, IncreasingToUpperBoundRegionSplitPolicy, instead of ConstantSizeRegionSplitPolicy. I believe that because my RSs had a few more regions than they were supposed to have, this was causing the unexpected splits.
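In case it helps someone, here is a minimal sketch (against the HBase 1.1 Java Admin API; the table name is a placeholder) of how an existing table could be switched to the constant-size policy. The same thing can also be set cluster-wide through the hbase.regionserver.region.split.policy property in hbase-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SetSplitPolicy {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableName table = TableName.valueOf("MY_TABLE"); // placeholder table name
            HTableDescriptor descriptor = admin.getTableDescriptor(table);
            // Split only when a store file exceeds hbase.hregion.max.filesize,
            // instead of the increasing-to-upper-bound behaviour.
            descriptor.setRegionSplitPolicyClassName(
                    "org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy");
            admin.modifyTable(table, descriptor);
        }
    }
}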
Cheers
Pedro
02-15-2016
09:09 AM
Hi @Artem Ervits,
Thank you. Yes, this is really a trial & error job :).
Cheers
02-15-2016
12:41 AM
2 Kudos
Hi @Artem Ervits,
I have 15 RSs (increasing to 20 soon), 150 regions (pre-split) and a 30GB max size per region. I have read many different opinions regarding the number of regions and region sizes, but in general I've read "fewer regions" and "smaller regions", and both are incompatible. In this case, I preferred to have somewhat larger regions, even knowing that I'm going to pay for that during major compactions. Do you see other issues with breaking the 10GB recommendation?
Updating configs does not happen very often, but before I switch off the balancer and perform a rolling restart again, I would like to hear the community's opinion because I don't want to mess up the cluster. Regarding data locality, I agree with you that going through replication is overkill. I'm going to do a slower rolling restart (e.g. 1 machine per hour); this might give the restarted RSs some time to regain data locality.
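What I have in mind is roughly the sketch below (HBase 1.1 Java Admin API; the table name is a placeholder, and the restarts themselves would still go through Ambari):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RollingRestartHelper {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // 1. Switch the balancer off so regions are not shuffled while servers bounce.
            boolean wasRunning = admin.setBalancerRunning(false, true);
            System.out.println("Balancer was running: " + wasRunning);

            // 2. Rolling-restart the region servers here (e.g. via Ambari),
            //    one machine at a time with a pause in between.

            // 3. Switch the balancer back on once every server is up again.
            admin.setBalancerRunning(true, true);

            // 4. A major compaction rewrites store files locally and helps
            //    the restarted servers regain data locality.
            admin.majorCompact(TableName.valueOf("MY_TABLE")); // placeholder table name
        }
    }
}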
Cheers
Thanks
02-14-2016
07:30 PM
1 Kudo
Hi @Artem Ervits,
Thanks for the reply. I ran a major compaction after the restart and, yes, data locality came back to normal, but I'm wondering if I'm doing something wrong and whether there is a way to keep data locality after restarting a RegionServer. For clusters running a real-time load, updating configurations can be a big deal.
Regarding the region split, I'm pretty sure that the regions are far away from the maximum allowed (hbase.hregion.max.filesize). My max region size is 30GB and they are at ~4.5GB compressed (7GB uncompressed) right now. This is my current hbase.hregion.max.filesize property:
<property>
<name>hbase.hregion.max.filesize</name>
<value>32212254720</value>
<source>hbase-site.xml</source>
</property>
Here is a snapshot of my region sizes. We are using a uniform distribution for our rowkey, so if my region size were larger than hbase.hregion.max.filesize I would expect to see all, or almost all, regions splitting; instead, only 3 regions out of 150 split. I believe I'm doing something wrong during the rolling restart of all region servers, or there are other conditions that can trigger a region split.
Cheers
Thank you Pedro
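For context on how regions can split well below hbase.hregion.max.filesize: a rough sketch of the threshold used by the default IncreasingToUpperBoundRegionSplitPolicy in HBase 1.x (the 128 MB flush size is an assumed default, not a value taken from this cluster):

public class SplitThresholdSketch {
    public static void main(String[] args) {
        long maxFileSize = 32212254720L;       // hbase.hregion.max.filesize (~30 GB)
        long flushSize = 128L * 1024 * 1024;   // hbase.hregion.memstore.flush.size (assumed default)

        // The policy caps the split size at maxFileSize, but starts much lower:
        // roughly 2 * flushSize * (regions of the table on that RegionServer)^3.
        for (int regionsOnRs = 1; regionsOnRs <= 6; regionsOnRs++) {
            long threshold = Math.min(maxFileSize,
                    2L * flushSize * regionsOnRs * regionsOnRs * regionsOnRs);
            System.out.printf("regions on RS = %d -> split at ~%.2f GB%n",
                    regionsOnRs, threshold / (1024.0 * 1024 * 1024));
        }
        // A server that hosts only a few regions of the table (for example one that
        // has just been restarted and is still picking regions back up) sits far
        // below the 30 GB cap, which could explain a handful of early splits.
    }
}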
02-14-2016
12:32 PM
3 Kudos
Hi,
I noticed that after performing a rolling restart, data locality for the entire cluster drops to 20%, which is bad; for real-time applications this can be a nightmare. I've read here that we should switch off the balancer before performing a manual rolling restart on HBase. However, I used the Ambari rolling restart and I didn't see any reference to the balancer in the documentation. Maybe the balancer is not the issue. What is the safest way to perform a rolling restart on all region servers while keeping data locality at least above 75%? Is there any option in Ambari to take care of that before a RS rolling restart?
Another issue I noticed is that some regions split during the rolling restart even though they are quite far from being full. Any insights?
Thank you,
Cheers
Pedro
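A possible way to watch locality per RegionServer from the client side, assuming this HBase release exposes RegionLoad#getDataLocality (the same figure the master UI reports), is a sketch like this:

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class LocalityCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            ClusterStatus status = admin.getClusterStatus();
            for (ServerName server : status.getServers()) {
                ServerLoad load = status.getLoad(server);
                double sum = 0;
                int regions = 0;
                for (Map.Entry<byte[], RegionLoad> entry : load.getRegionsLoad().entrySet()) {
                    sum += entry.getValue().getDataLocality(); // assumes this getter exists in this release
                    regions++;
                }
                double average = regions == 0 ? 0 : sum / regions;
                System.out.printf("%s: %d regions, avg locality %.0f%%%n",
                        server.getHostname(), regions, average * 100);
            }
        }
    }
}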
Labels:
- Apache Ambari
- Apache HBase
01-06-2016
12:12 PM
Hi @asinghal,
It worked perfectly.
Thanks
01-05-2016
05:21 PM
@Artem Ervits, Sure!
Thanks
01-05-2016
05:11 PM
Hi @Artem Ervits,
Thanks for the info. I was using the HA master for testing. Regarding the full restart, you are right. I followed Ambari, which asks for a restart of all "affected" components after any configuration change, and I clicked the button :). Does Ambari do a proper rolling restart in this case? I know it does when we click "Restart All Region Servers".
I have done full restarts with Ambari before, but this problem only started after I introduced local indexes. I need to dig into it a bit more.
Thanks
01-05-2016
01:17 PM
2 Kudos
Hi Guys,
I have been testing out Phoenix local indexes and I'm facing an issue after restarting the entire HBase cluster.
Scenario: I'm using Ambari 2.1.2 and HDP 2.3, with Phoenix 4.4 and HBase 1.1.1. My test cluster contains 10 machines and the main table contains 300 pre-split regions, which implies 300 regions in the local index table as well. To configure Phoenix I'm following this tutorial. When I start a fresh cluster everything is just fine: the local index is created and I can insert data and query it using the index. The problem comes when I need to restart the cluster to update some configurations; at that point I'm not able to restart the cluster anymore.
Most of the servers have exceptions like this one; it looks like they are getting into a state where some region servers are waiting for regions that are not yet available on other region servers (kind of a deadlock):
INFO [htable-pool7-t1] client.AsyncProcess: #5, table=_LOCAL_IDX_BIDDING_EVENTS, attempt=27/350 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region _LOCAL_IDX_BIDDING_EVENTS,57e4b17e4b17e4ac,1451943466164.253bdee3695b566545329fa3ac86d05e. is not online on ip-10-5-4-24.ec2.internal,16020,1451996088952
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:947)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1991)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
on ip-10-5-4-24.ec2.internal,16020,1451942002174, tracking started null, retrying after=20001ms, replay=1ops
INFO [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t1] client.AsyncProcess: #3, waiting for 2 actions to finish
INFO [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t2] client.AsyncProcess: #4, waiting for 2 actions to finish
While the servers are throwing these exceptions, I can see this message (I checked the size of this file and it is very small):
Description: Replaying edits from hdfs://.../recovered.edits/0000000000000464197
Status: Running pre-WAL-restore hook in coprocessors (since 48mins, 45sec ago)
Another interesting thing I noticed is the empty coprocessor list for the servers that are stuck. On the other hand, the HBase master goes down after logging some of these messages:
GeneralBulkAssigner: Failed bulking assigning N regions
Any help would be awesome 🙂
Thank you
Pedro
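One way to see which regions are actually stuck while the restart hangs is to dump the regions in transition from the master; a minimal sketch against the HBase 1.1 client API:

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.master.RegionState;

public class RegionsInTransition {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            ClusterStatus status = admin.getClusterStatus();
            // Regions stuck opening or closing show up here with their current state.
            Map<String, RegionState> inTransition = status.getRegionsInTransition();
            System.out.println(inTransition.size() + " regions in transition");
            for (RegionState state : inTransition.values()) {
                System.out.println(state.getRegion().getRegionNameAsString()
                        + " -> " + state.getState());
            }
        }
    }
}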
Labels:
- Apache HBase
- Apache Phoenix