Member since: 01-16-2018
Posts: 613
Kudos Received: 48
Solutions: 109
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 777 | 04-08-2025 06:48 AM |
| | 952 | 04-01-2025 07:20 AM |
| | 915 | 04-01-2025 07:15 AM |
| | 962 | 05-06-2024 06:09 AM |
| | 1500 | 05-06-2024 06:00 AM |
03-15-2021
01:41 AM
Hello @Chandresh, thanks for the update. The link you shared deals with improvements from the MemStore flush and eventual HFile compaction perspective. Currently, I am unfamiliar with the blockers impacting your environment. For example: if MemStore writes are being delayed, we can consider reviewing the flusher threads; if compaction is a concern (courtesy of too many HFiles), reviewing the compaction thread count would help; and if MemStore writes are being blocked owing to too many WALs, it's worth checking "hbase.hstore.flusher.count" and "hbase.regionserver.maxlogs". Most importantly, how is HDFS performance, and is there any hot-spotting? In short, evaluating read and write performance collectively would be a large scope for your team. I would recommend starting with either read or write, all tables or a specific table, all RegionServers or a specific RegionServer, and proceeding accordingly. - Smarak
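As a minimal sketch of reviewing the two write-path settings named above (the configuration path is an assumption for a typical package/parcel install; on a Cloudera Manager cluster the values are usually changed via the HBase safety valve rather than by editing the file):

```bash
# Print the current flusher thread count and WAL cap, if explicitly set;
# a missing property simply means the HBase default is in effect.
# /etc/hbase/conf is an assumed client-config location - adjust as needed.
grep -A1 -E "hbase.hstore.flusher.count|hbase.regionserver.maxlogs" \
  /etc/hbase/conf/hbase-site.xml
```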
03-14-2021
01:25 AM
Hello @Chandresh, thanks for using Cloudera Community. Based on the synopsis, your team is observing cluster-level impact when one RegionServer (RS) undergoes a GC cycle and becomes unavailable, and things return to normal after the concerned RS is restarted. There are two aspects here: (I) the RS undergoing a long GC cycle, and (II) the HBase cluster becoming unusable. Ideally, the HBase cluster won't become unusable if one RS is impacted. Having said that, if the RS is unresponsive, any query RPCs handled by the concerned RS would be delayed, so those query responses are delayed or time out; Phoenix connectivity to HBase itself should not be impacted. Please confirm whether your team's Phoenix queries are timing out or delayed while one RS is busy in a GC cycle (different from Phoenix being unable to connect to HBase at all). If the concerned RS is hosting "hbase:meta", a cluster-wide impact is feasible. As such, we need to focus on the RS undergoing GC for a longer duration to mitigate these scenarios. I have shared a blog via Link [1] on GC tuning for HBase. Additionally, check whether the RS GC cycles are causing ZooKeeper session timeouts, or whether the GC time stayed below the ZK timeout. - Smarak [1] https://blog.cloudera.com/tuning-java-garbage-collection-for-hbase/
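As a sketch of correlating GC pauses with the ZooKeeper session timeout (JDK 8 style flags and the log path are assumptions; the blog in Link [1] covers the actual collector tuning):

```bash
# Hypothetical hbase-env.sh addition: write GC logs for the RegionServer JVM so
# pause durations can be compared against zookeeper.session.timeout.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hbase/gc-regionserver.log"
```

If a single GC pause exceeds the ZooKeeper session timeout, the RS session expires and the RS aborts, which matches the behaviour described.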
03-14-2021
01:16 AM
Hello @Rjkoop, thanks for posting the update and confirming the question has been resolved. In short, the article requires setting the three configurations you specified ["hbase.security.exec.permission.checks", "hbase.security.access.early_out", "hfile.format.version"] along with enabling HBase Secure Authorization (mandatory for enabling HBase cell-level ACLs). Additionally, Link [1] documents the ACL functionality in detail. As the post is solved, I shall mark it likewise. - Smarak [1] https://hbase.apache.org/book.html#hbase.accesscontrol.configuration
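As a small smoke test once secure authorization and the three properties above are in place (the principal, table, and column family names here are examples, not taken from your environment):

```bash
# Grant a table-level permission and read it back; if the AccessController
# coprocessor is active after the restart, both commands succeed and the
# grant is listed.
echo "grant 'bobsmith', 'RW', 't1', 'f1'
user_permission 't1'" | hbase shell
```

Cell-level ACLs themselves are attached per mutation, as described in Link [1].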
03-13-2021
11:59 PM
Hello @WayneWang, hope you are doing well. I wish to follow up with you concerning the issue posted. Kindly let us know whether the concerned issue has been resolved and the steps followed, which would help us proceed accordingly on the post. - Smarak
03-13-2021
11:57 PM
Hello @CaptainJa, thanks for your update. Based on your review, audits from the "hadoop-acl" enforcer are delayed in appearing in the Ranger Audit UI, while other audits appear almost immediately. As far as I know, the audit framework from any service to Solr is the same, which points to the suspicion you raised, i.e. the "hadoop-acl" events are being buffered prior to being sent to Solr for indexing. Currently, I am unfamiliar with any configuration controlling this, yet I wish to confirm whether the HDFS audit logs or Infra-Solr logs are reporting any issues that may point to a cause. I was under the impression that Solr might be the bottleneck for the Ranger audit lag, yet the synopsis appears to impact the "hadoop-acl" audits alone. - Smarak
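As a hedged first check (the file name and paths below are assumptions based on a typical Ranger HDFS plugin layout; your cluster may differ), the plugin's local audit spool on the NameNode host is worth a look, since a growing spool would explain "hadoop-acl" events reaching Solr late:

```bash
# On the active NameNode host: inspect the Ranger HDFS plugin audit settings
# and any local file-spool directory they point at (both paths are assumptions).
grep -A1 -i filespool /etc/hadoop/conf/ranger-hdfs-audit.xml
ls -lh /var/log/hadoop/hdfs/audit/solr/spool 2>/dev/null
```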
03-13-2021
11:47 PM
Hello @Kenzan, thanks for using Cloudera Community. Based on the synopsis, your team has one RegionServer that is allocated no Regions, and deleting the RegionServer and adding it back afresh doesn't help either. The first screenshot of the log shows the RegionServer received a ZooKeeper session expiry, so it's likely the RegionServer experienced a ZooKeeper timeout or the Master didn't receive any heartbeat from it. As one RegionServer is impacted, review host-level concerns (CPU/memory) if the RegionServer is being aborted (likely unrelated to the zero-Region assignment). Coming to the zero-Region assignment, enable TRACE logging for the HMaster balancer thread, or briefly enable complete HMaster TRACE logging (HMaster UI > Log Level > "org.apache.hadoop.hbase" with TRACE for "Set Log Level"). This enables TRACE logging for the HMaster service and captures any balancer-related tracing, which would confirm why the AssignmentManager is skipping that RegionServer during Region assignment. Once the TRACE logging and a balancer run have been captured to confirm why Region balancing is being skipped, we can set the logging back to INFO. - Smarak
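If the UI route is inconvenient, the same log-level toggle can usually be done from the command line (the host name is a placeholder and 16010 is the default HMaster UI port; secured/SSL clusters may still need the UI):

```bash
# Raise the HBase logger on the HMaster to TRACE, capture a balancer run,
# then set it back to INFO. Uses the standard Hadoop daemonlog utility
# against the HMaster's logLevel servlet.
hadoop daemonlog -setlevel hmaster-host.example.com:16010 org.apache.hadoop.hbase TRACE
# ... wait for a balancer run and collect the HMaster log ...
hadoop daemonlog -setlevel hmaster-host.example.com:16010 org.apache.hadoop.hbase INFO
```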
03-13-2021
11:35 PM
Hello @Priyanka26, thanks for using Cloudera Community. Based on the synopsis, you have one Region of the "prod.timelineservice.entity" table in transition (RIT); you have tried the "assign" and "snapshot" commands, yet they time out; and you have raised queries on HBCK2 usage in contrast to HBCK1. Coming to the RIT, please confirm the Region state (CLOSING, FAILED_CLOSE, OPENING, FAILED_OPEN). Accordingly, review the reasoning for that Region state in the HMaster and RegionServer logs using the Region ID. As the YARN Timeline Service uses a single embedded RegionServer JVM, we only have to check the logs on one host. Once we confirm the reason for the RIT, we can discuss possible mitigation steps. For HBCK, the HBCK2 tool offers the functionality of HBCK1 minus the "-fix" command, partly because HBCK2 offers the "-fix" functionality as individual commands, as documented in Link [1]. With newer HBase releases, most of the commands listed in the link are available; it's likely the HBase version used by your team does not support all of the required HBCK2 commands. The link also lists the HBase versions compatible with each command. - Smarak [1] https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
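As an illustrative HBCK2 invocation for the RIT case (the jar path, version, and encoded Region name below are placeholders; the command needs an HBase release that supports it, per the compatibility table in Link [1]):

```bash
# Retry an explicit assignment for the stuck Region once the root cause is
# understood. Replace the hash with the encoded RegionName from the HMaster UI
# or hbase:meta, and point -j at the hbase-hbck2 jar available on your cluster.
hbase hbck -j /opt/hbase-operator-tools/hbase-hbck2-1.2.0.jar \
  assigns 1234567890abcdef1234567890abcdef
```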
03-01-2021
09:31 PM
Hello @CaptainJa, I wish to follow up with you on the concerned topic and see how things are for your team. If there are no further issues, kindly mark the post as solved. If there are queries, please share them. - Smarak
03-01-2021
09:26 PM
Hello @WayneWang, thanks for using Cloudera Community. The issue your team is facing is a table with one failed Region, i.e. 327a6ed6b00c95e63346f6d725147952. Your team has tried HBCK fix and repair (as this is HBase v1.x), yet the issue persists. We can try the following steps as well:

- Restart the Masters after running any HBCK fix command.
- As RITs are maintained in ZNodes, try removing the RIT ZNode and restarting the Master again (a sketch of this follows below).
- For the Region, please review the RegionServer on which the Region is in the FAILED state and confirm the reason for it as well.

- Smarak
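A minimal sketch of the ZNode clean-up for HBase 1.x (the parent znode /hbase and the child name "region-in-transition" are defaults and may differ in your deployment; the hash is the Region from this thread, and anything under ZooKeeper should be verified with "ls" before removal):

```bash
# List the regions-in-transition znodes, remove the one for the failed Region,
# then restart the active HMaster so it rebuilds its in-memory RIT state.
hbase zkcli ls /hbase/region-in-transition
hbase zkcli rmr /hbase/region-in-transition/327a6ed6b00c95e63346f6d725147952
```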
02-03-2021
01:16 AM
Hello @Aco, thanks for the reply. Would it be feasible for your team to uncheck "is_hbase_system_service_launch", restart the YARN Timeline Service, and confirm the status? - Smarak