About smdas

smdas · ‎05-02-2021

Hello @Priyanka26 We wish to follow up with your Team concerning the Post. If the issue is resolved, do mark the Post as Solved & share the Steps followed by your Team so that fellow Community Users can learn from your experience as well. Thanks, Smarak

smdas · ‎05-01-2021

Hello @bigdatanewbie Thanks for using Cloudera Community. Based on the Post, a Spring Boot Application fails to connect to HBase in a Kerberized Cluster. Looking at the Logs, we observe the RegionServer "fepp-cdhdn-d2.mycompany.com/172.29.233.141" isn't able to complete the RPC Request within the 60 Seconds Timeout. The failure persists across 3 retries, which causes the overall App failure. The fact that the Application identifies the RegionServer hosting the Regions of Table "hbasepoc:alan_test" indicates the Client is able to fetch the location of the Metadata (hbase:meta) Table's Region from ZooKeeper & connect with the RegionServer hosting the "hbase:meta" Region to pull the required Metadata information. Let's verify the Table "hbasepoc:alan_test" is Healthy by running an HBCK on the Table & using HBase Shell to perform the same Operation as performed by the Spring Boot Application. If the HBCK Report on the Table (obtained via "hbase hbck -details hbasepoc:alan_test") shows no Inconsistency & HBase Shell access to the Table with the same Operation completes successfully, reviewing the connectivity between the concerned Host (wherein the Spring Boot Application is running) & the HBase Setup, along with the RegionServer Logs, would be helpful. Additionally, we can try increasing the Timeout or Retries to confirm whether the Issue lies with a Delayed Response or any other Underlying issue. - Smarak
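For the Host connectivity review mentioned above, a minimal Python sketch could be run from the Host wherein the Spring Boot Application is running. The helper name can_connect is hypothetical, and the port is an assumption (16020 is the usual RegionServer RPC default; confirm it against your Setup). A successful TCP connect only shows the port is reachable; it doesn't validate Kerberos or the RPC Response time.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s.

    Hypothetical helper: it only checks that the port is reachable.
    """
    try:
        # create_connection resolves the hostname and applies the timeout
        # to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, can_connect("fepp-cdhdn-d2.mycompany.com", 16020) returning False from the Application Host while returning True from a Cluster Host would point at the Network or Firewall between the two.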

smdas · ‎05-01-2021

Hello @Satya_Singh Thanks for using Cloudera Community. Based on the Post, HBase Master isn't progressing on StartUp owing to "hbase:namespace" not being Online. Your Team tried to assign the Region via HBCK2 yet the issue persists. Additionally, the RegionServer on which the "hbase:namespace" Region is marked for assignment is listed under both Live & Dead. The approach used by your Team is correct with respect to using HBCK2. When your Team runs the HBCK2 assign command, it should return a PID before terminating. The Output would be like [10], assuming "10" is the PID. Look up the concerned PID on the HMaster UI & review its State. When an RS has both Live & Dead Status, review the StartUp Code of the Live & Dead RS entries. For Example, for the RS "hbase-test-rc-5.openstacklocal,16020,1469419333913", the StartUp Code is "1469419333913". We expect the Live RS StartUp Code to be recent & the Dead RS StartUp Code to be older, with the "hbase:namespace" Region being assigned to the Live RS. Use the Epoch Converter via Link [1] to convert the StartUp Code to a Human Readable Timestamp. If the PID State is fine with the Region being assigned to the RS with the Live StartUp Code, review the concerned RS Logs to confirm whether the Region Assignment has been received & whether there are any issues with opening the Region. Accordingly, proceed with the troubleshooting. Do review & let us know your observations. - Smarak [1] https://www.epochconverter.com/
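As an alternative to the Epoch Converter, the StartUp Code comparison can be sketched in Python. This assumes the RS name follows the "host,port,startcode" layout from the Example above & that the StartUp Code is an Epoch value in Milliseconds. The names ServerName, parse_server_name & newest are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, NamedTuple


class ServerName(NamedTuple):
    host: str
    port: int
    start_time: datetime  # StartUp Code converted to UTC


def parse_server_name(server_name: str) -> ServerName:
    """Split 'host,port,startcode' and convert the startcode (epoch millis) to UTC."""
    host, port, start_code = server_name.rsplit(",", 2)
    start_time = datetime.fromtimestamp(int(start_code) / 1000, tz=timezone.utc)
    return ServerName(host, int(port), start_time)


def newest(server_names: Iterable[str]) -> str:
    """Return the entry with the most recent StartUp Code (expected to be the Live RS)."""
    return max(server_names, key=lambda name: parse_server_name(name).start_time)
```

Passing the Live & Dead entries of the same Host to newest() should return the Live one. If the Region is being assigned to the other entry, the assignment targets a stale RS instance.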

smdas · ‎05-01-2021

Hello @nilk Thanks for using Cloudera Community. Based on the Post, your Team is experiencing frequent "Bad Health" alerting for various Services, which clears on Restart only to reappear after 1 Minute. It's likely the Health Tests are failing owing to a persistent reason rather than an intermittent cause. As you haven't shared the CM Version or CDH Version, I am sharing a Link [1] which covers the various Health Tests & their Causes. Each Alert has its own reasoning & we can review the Alerting on the CM UI while implementing the preventive measures. Link [2] shares a Post with similar context, wherein the Steps for each Service Health Alert would be different. Unfortunately, there is unlikely to be a Single Solution covering every Service Alert. As such, review the Alerting context for each Service & proceed accordingly. Do post here if you have any queries concerning the reasoning for any specific Alert. - Smarak [1] https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_ht.html [2] https://community.cloudera.com/t5/Support-Questions/Bad-Health-of-cloudera-Managger/m-p/37161/highlight/true#M37913

smdas · ‎03-26-2021

Hello @Keshu While I am not familiar enough with Nifi to provide an exact answer, HBase allows the VERSIONS attribute to be used to fetch multiple versions of each Cell in a Table. Assuming the Column Family of a Table has been configured with VERSIONS => 5, we can use the Command "scan 'Table_Name', {VERSIONS => 5}" to fetch up to 5 Versions of each Cell from the Table. You may be familiar with the above, yet I am posting the Update in case you can adjust the Nifi Processor to query the HBase Table as above (if the Nifi Processor offers such functionality). I shall allow the Nifi Gurus on the Community to get back to you on the exact requirement. - Smarak

smdas · ‎03-26-2021

Hello @vishal6196 It's been a while since the Post, yet as far as I recall, the App was writing to 1 CF only. In short, a WAL is used for each RegionServer & subsequently, the Writes arrive at the MemStore based on CF demarcation at Region Level. From WALReader, we confirmed the WAL has entries for 1 CF only, naturally indicating only the MemStore of the concerned CF would be populated. Additionally, I don't recall a Crash being observed. Are you facing similar concerns in your Environment? If Yes, kindly share the following details in a New Post: the MemStore Flush failure trace from the Logs, whether the Problem is Persistent, whether WALReader (Link [1]) shows Writes happening on all CFs of the Region & the Count of CFs of the Region. - Smarak [1] https://hbase.apache.org/book.html#hlog_tool.prettyprint

smdas · ‎03-25-2021

Hello @Priyanka26 Do let us know where you stand with the Current Post. I am aware from a Separate Post that you had issues with your Cluster, yet we wish to follow up to ensure the Post isn't left unattended from our side. Thanks, Smarak

smdas · ‎03-25-2021

Hi @Priyanka26 Thanks for the Update. In the 2nd Step, your Team mentioned creating the required NameSpace & Tables. Yet, I would suggest Bulk-Loading i.e. the CompleteBulkLoad Process, as simply copying the Hfiles won't likely work. Additionally, the existing Hfiles would have been part of Split/Compaction & I expect your Team would create the Tables with 1 Region. BulkLoad would gracefully handle such situations. For Customers facing issues like yours in the earlier HBase v2.x HDP Releases, we typically use BulkLoad. That said, your Team should upgrade to HDP v3.1.5 at minimum to avoid this issue in future. - Smarak

smdas · ‎03-22-2021

Hello @JB0000000000001 Once you are good with the Post & have no further queries, Kindly mark the Post as Solved to ensure we can mark the Post accordingly. - Smarak

smdas · ‎03-20-2021

Hello @JB0000000000001 Thank You for the kind words. I intended to reciprocate the level of detailing you posted in the Q, which was helpful for us. Based on your Update, a few pointers:

The GC Flags indicate your Team is using CMS & as you mentioned, a GC-induced JVMPause would be logged as you highlighted. Note that "No GCs detected" is observed when a JVMPause isn't GC induced. If your Team observes JVMPause tracing with "No GCs detected", the shared Community Post is a good pointer to evaluating Host Level concerns.

A 10GB RegionServer Heap is unlikely to require a 60 Seconds CleanUp, with 60 Seconds being the HBase-ZooKeeper Timeout, unless the 10GB is being filled frequently & we are having a continuous GC Cycle. In other words, if 1 FullGC is immediately followed by another FullGC (visible in the GC Logs), we wouldn't see 1 Large FullGC, yet the cumulative impact is worth reviewing & requires contemplating the Eden-TenuredGeneration sizing.

While having 3x as many Shards as RegionServers is good for Parallelism, the Write Pattern matters based on the RowKey. If the RowKey of the Table causes Writes to be concentrated into Regions on 1 RegionServer, the concerned RegionServer would be a Bottleneck. While the Writes are being carried out, reviewing the concerned Table View via the HMaster UI would offer insight into the same.

The *out file of the RegionServer would capture the OOME, yet review the previous *out file & not the current one. Additionally, the JVM Flags can be adjusted to include -XX:+HeapDumpOnOutOfMemoryError & the related parameters for the HeapDump path. Link [1] covers the same.

One pointer is to review the HMaster Logs for "Ephemeral" (prefer a Case Insensitive search) associated tracing. The HMaster logs when an Ephemeral ZNode is deleted; from there, review the Logs of the RegionServer whose Ephemeral ZNode was removed. This is based on the point wherein your Team is (maybe) reviewing the Spark Job Log for RegionServer unavailability & then tracing the RS Logs. The HMaster approach is more convenient & accurate.

Do keep us posted on how things go. - Smarak [1] Command-Line Options - Troubleshooting Guide for HotSpot VM (oracle.com)
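To separate GC induced JVMPause entries from the "No GCs detected" ones in a RegionServer Log, a minimal Python sketch could look like the following. It assumes the JvmPauseMonitor wording "Detected pause in JVM or host machine ... approximately <N>ms" followed on the next line by either "No GCs detected" or a "GC pool ..." detail; verify the wording against your own Logs, as it may differ across Versions. The function name classify_pauses is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed JvmPauseMonitor wording; adjust if your Logs differ.
PAUSE_RE = re.compile(r"Detected pause in JVM or host machine.*?approximately (\d+)ms")


def classify_pauses(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pause_ms, cause) pairs; cause is 'gc', 'non-gc' or 'unknown'."""
    results: List[Tuple[int, str]] = []
    pending_ms = None  # pause waiting for its detail line
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            if pending_ms is not None:
                # Previous pause had no detail line before this one.
                results.append((pending_ms, "unknown"))
            pending_ms = int(match.group(1))
            continue
        if pending_ms is None:
            continue
        if "No GCs detected" in line:
            results.append((pending_ms, "non-gc"))
        elif "GC pool" in line:
            results.append((pending_ms, "gc"))
        else:
            results.append((pending_ms, "unknown"))
        pending_ms = None
    if pending_ms is not None:
        results.append((pending_ms, "unknown"))
    return results
```

Feeding the Log lines of the RegionServer into classify_pauses() & filtering on "non-gc" gives the Pauses worth reviewing at Host Level, while long or back-to-back "gc" entries point to the Heap sizing discussion above.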

Last Visited	‎01-12-2026 06:15 AM

Member Since	‎01-16-2018 09:55 AM
Posts	613
Kudos received	48

Cloudera Community

Re: Timeout: PBJ session not going idle

Re: Impact of Upgrading EKS from 1.29 to 1.31 on C...

Re: Capture airflow run duration

Re: How to enable IAM for apache airflow

Re: Apache Airflow can not connect to mssql 2008

Re: Hbase namespace table in not online

Re: HBase Java client connection timeout

Re: hbase Master startup cannot progress

Re: After successfully installed cloudera manager,...

Re: how to retrieve all versions of rows from hbas...

Re: Hbase Flush/writes not working for one of the ...

Re: One region for "prod.timelineservice.entity" h...

Re: Hbase namespace table in not online

Re: How to know why hbase regionserver fails?

Re: How to know why hbase regionserver fails?