Member since: 01-16-2018
Posts: 366
Kudos Received: 23
Solutions: 47
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 94 | 04-14-2022 01:29 AM |
| 153 | 04-14-2022 01:12 AM |
| 94 | 04-14-2022 12:58 AM |
| 174 | 03-29-2022 01:13 AM |
| 192 | 03-29-2022 01:00 AM |
05-21-2021
03:43 AM
Hello @priyanshu_soni You can skip the "Timestamp" part, as it is inserted by HBase implicitly. I tried the same Query as you, excluding the Timestamp, & it was Successful:
hbase(main):018:0> put 'Table_X1','125','Cf1:CheckItem','{"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}'
Took 0.0077 seconds
hbase(main):019:0> scan 'Table_X1'
ROW COLUMN+CELL
125 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
1 row(s)
Took 0.0057 seconds
As you can see above, the "timestamp" field corresponds to the Epoch Timestamp of the time the Row was inserted. If you wish to specify the Timestamp explicitly, you can include an Epoch Timestamp as shown below:
hbase(main):020:0> put 'Table_X1','126','Cf1:CheckItem','{"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}',1621593487680
Took 0.0202 seconds
hbase(main):021:0> scan 'Table_X1'
ROW COLUMN+CELL
125 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
126 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
2 row(s)
Took 0.0071 seconds
Let us know if you have any issues with the Put Operation. - Smarak
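If needed, a Cell written with an explicit Timestamp can also be read back by that Timestamp. A minimal sketch, reusing the Table & Column from the example above (the shell prompt number is illustrative):
hbase(main):022:0> get 'Table_X1','126',{COLUMN => 'Cf1:CheckItem', TIMESTAMP => 1621593487680}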
05-21-2021
02:57 AM
1 Kudo
Hello @JB0000000000001 Thanks for using Cloudera Community. You wish to store a Table in Cache via a Read-Once Model. As you stated, RowCounter would count the Rows & skip caching the data as well. Even if you have a Block Cache configured with sufficient capacity to hold the Table's Data, its LRU implementation would likely cause the Objects to be evicted eventually. I haven't used any Cache implementation other than LRU, so I can't comment on those.
If you have a Set of Tables meeting such a requirement, We can use RegionServer Grouping for the concerned Table's Regions & ensure the BlockCache of the concerned RegionServer Group is used only for the Selected Tables' Regions, thereby reducing the impact of LRU eviction. Additionally, test using the "IN_MEMORY" Flag for the Table's Column Families, which tries to retain the Column Family's Data in Cache as long as it can, without any guarantee. Both are sketched below.
While an Old Blog, [1] offers a few Practices implemented by a Heavy-HBase-Using Customer of Cloudera, written from their Employees' experience. As you are already familiar with the various Heap Options, there may be no new Information for you, yet I am sharing it to close any loop.
Hope the above helps. Do let us know if your Team implemented any approach, so as to benefit the wider audience looking at a Similar Use-Case. - Smarak [1] http://blog.asquareb.com/blog/2014/11/21/leverage-hbase-cache-and-improve-read-performance/
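As a rough HBase Shell sketch of the two suggestions above (assuming the RSGroup feature/coprocessor is enabled on your version; the group name, server, Table & Column Family are placeholders, not from your setup):
hbase(main):001:0> add_rsgroup 'cache_group'
hbase(main):002:0> move_servers_rsgroup 'cache_group',['rs-host-1.example.com:16020']
hbase(main):003:0> move_tables_rsgroup 'cache_group',['my_table']
hbase(main):004:0> alter 'my_table', {NAME => 'cf1', IN_MEMORY => 'true'}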
05-21-2021
02:30 AM
Hello @jmag2304 Thanks for using Cloudera Community. Based on the Post, your Team is encountering "Resultset is Closed" after 10 minutes while running multiple queries to fetch data from various Tables one after another, with each Table having ~4M Rows. From [1] covering Phoenix Configuration, We observe there is 1 Parameter with a 10-Minute Default i.e. "phoenix.query.timeoutMs"; however, the same shouldn't impact a Session with multiple queries, & you have already increased it without any success. We wish to verify (a) whether you observed any concerning message in the Phoenix Query Server Logs while the concerned Exception is encountered, & (b) whether you have encountered such issues with the Phoenix Thick Client (as compared to the Phoenix Thin Client using Phoenix Query Server). Hi @Sunny93, Thanks for sharing your Team's experience concerning the issue. Kindly assist by confirming the Server-Side Parameter being referred to here. It would assist @jmag2304 & fellow Community Members with such issues. - Smarak [1] https://phoenix.apache.org/tuning.html
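For reference, "phoenix.query.timeoutMs" is typically raised on the Client side (hbase-site.xml on the Client / Phoenix Query Server classpath); a minimal sketch with an illustrative value, not a recommendation:
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>1800000</value>
</property>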
05-21-2021
02:17 AM
Hello @dcy Thanks for using Cloudera Community. Based on the Synopsis, the HBase Master isn't starting after you turned off the Computer & started HBase again. You haven't stated the Version of HBase, yet I suspect the WALs of the RegionServers involved have issues, causing the concerned problem. Verify whether the HDFS Fsck Report on the WAL & MasterProcWAL files is Healthy. When HBase starts, the WALs of the RegionServers are Split to be replayed & we suspect the WAL Files have issues, causing the concerned "Cannot Seek After EOF". As you mentioned the Setup is on a single Computer, try Sidelining the WAL Directory of the RegionServer(s) & MasterProcWALs to prevent any replay of WALs & any Master Procedures, followed by restarting the HBase Service. The Location of the WAL & MasterProcWAL Directories would be {hbase-rootdir}/WALs & {hbase-rootdir}/MasterProcWALs. Note that Sidelining the WALs has the possibility of Data Loss, if any WAL contains Data which isn't persisted to Disk yet. Kindly review & let us know if the above works. - Smarak
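A minimal sketch of the checks & sidelining described above, assuming "/hbase" as the hbase-rootdir (adjust the paths to your setup) & that HBase is stopped before moving anything:
hdfs fsck /hbase/WALs -files -blocks
hdfs fsck /hbase/MasterProcWALs -files -blocks
hdfs dfs -mv /hbase/WALs /hbase/WALs.sideline
hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.sideline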
05-21-2021
02:03 AM
Hello @bigdatanewbie Thanks for the Comment. As you stated, Port 16020 is the IPC Port for HBase. When a Client connects to HBase, the 1st Connection is made to the RegionServer holding the "hbase:meta" Region. After fetching the Metadata details from the concerned RegionServer, the Client connects with the required RegionServers for the Read/Write Operations being performed by the End-User. Such Communication happens on Port 16020 as well. As such, please review whether the concerned Scenario was applicable for all Traffic between the Client Host & the RegionServer Host on Port 16020, wherein the Traffic is recognised as "Unknown_TCP". As you mentioned, it's surprising the concerned issue hasn't surfaced before, as Palo Alto Networks Products are widely used; I suspect the Firewall may be configured to allow any Traffic on Port 16020, so the Type of Traffic isn't reviewed. As the concerned issue with your Client Connection to HBase is resolved, Kindly confirm if you have any further ask concerning the Post. If not, Kindly mark the Post as Resolved. Thanks for using Cloudera Community. - Smarak
05-11-2021
06:32 AM
Hello @bigdatanewbie Thanks for the response & for sharing the reasoning for the RPC Connection being timed out. Unfortunately, I am not familiar with the "unknown_tcp" Connection type; reviewing the Palo Alto Site for the concerned topic lists a few criteria under which a Connection can be termed "Unknown", e.g. if the Connection doesn't have enough Header info or didn't match any Known Application behavior. Link [1] is a KB from Palo Alto in the same context & discusses it, with the steps to review & mitigate the same (I am sure your Team has reviewed this KB). - Smarak [1] https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000Clc6CAC
05-10-2021
12:30 AM
Hello @sakitha Thanks for using Cloudera Community & we hope to assist you in your Big Data Learning. To your Queries, please find the required details below:
(I) When you run the Job in Client Mode (like Spark-Shell), the Driver runs on the Local Node from which the Job is executed, so the Driver Logs are printed in the Console itself. As you mentioned YARN Mode, the Application Master & the Executors are launched in NodeManagers. In Cluster Mode, the Driver is launched in the Application Master JVM & the Driver Logs are captured in the Application Master Logs.
(II) Yes, the 2 Directories specified by your Team refer to the Event Logs. You haven't mentioned whether you are using any Orchestration Tool (Ambari, CM). As such, the Log4j configuration needs to be edited to reflect the same. Link [1] refers to a Topic with a similar ask.
(III) In Spark on YARN Mode, there are 3 Sets of Logs: the Spark Event Logs from the Event Log Directory (the Source of Information for the Spark UI); the YARN Application Logs, which you can fetch via the CLI with the Application ID as shared via [2]; & the Logging Directory "/var/log", which holds the Service-based Logs like NodeManager, ResourceManager, DataNodes etc. If we suspect any Service Level issue impacting the Job, We can review the Service Logs within the concerned Directory.
Kindly review & let us know if your ask is answered. Else, do post your queries & we shall assist you. - Smarak [1] https://stackoverflow.com/questions/32001248/whats-the-difference-between-spark-eventlog-dir-and-spark-history-fs-logdirecto/33554588 [2] https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/data-operating-system/content/use_the_yarn_cli_to_view_logs_for_running_applications.html
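As an example of fetching the YARN Application Logs mentioned in (III) via the CLI (the Application ID is a placeholder):
yarn logs -applicationId application_1620000000000_0001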
05-02-2021
12:02 AM
Hello @Priyanka26 We wish to follow-up with your Team concerning the Post. If the issue is resolved, Do mark the Post as Solved & share the Steps followed by your Team to ensure our fellow Community Users can learn from your experience as well. Thanks, Smarak
05-01-2021
11:46 PM
Hello @bigdatanewbie Thanks for using Cloudera Community. Based on the Post, a Spring Boot Application fails to connect to HBase in a Kerberized Cluster. Looking at the Logs, We observe the RegionServer "fepp-cdhdn-d2.mycompany.com/172.29.233.141" isn't able to complete the RPC Request within the 60-Second Timeout. With 3 retries, the persistent failure causes the Overall App failure. The fact that the Application identifies the RegionServer hosting the Regions of Table "hbasepoc:alan_test" indicates the Client is able to fetch the location of the Metadata (hbase:meta) Table's Region from ZooKeeper & connect with the RegionServer hosting the "hbase:meta" Region to pull the required Metadata information. Let's verify the Table "hbasepoc:alan_test" is Healthy by running an HBCK on the Table & using the HBase Shell to perform the same Operation as the Spring Boot Application. If the HBCK Report on the Table (obtained via "hbase hbck -details hbasepoc:alan_test") shows no Inconsistency & HBase Shell access to the Table with the same Operation completes successfully, reviewing the connectivity of the concerned Host (wherein the Spring Boot Application is running) with the HBase Setup, along with the RegionServer Logs, would be helpful. Additionally, We can try increasing the Timeout or Retries to confirm whether the Issue lies with a Delayed Response or any other Underlying issue. - Smarak
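A hedged sketch of the suggested Health Check (Table name from the Post; the scan is just a sample read, adjust it to the Operation your App performs):
hbase hbck -details hbasepoc:alan_test
hbase shell
hbase(main):001:0> scan 'hbasepoc:alan_test', {LIMIT => 5}
For the Timeout/Retries, the usual Client-side knobs are "hbase.rpc.timeout" & "hbase.client.retries.number" in the Client's hbase-site.xml.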
05-01-2021
11:28 PM
Hello @Satya_Singh Thanks for using Cloudera Community. Based on the Post, the HBase Master isn't progressing on StartUp owing to "hbase:namespace" not being Online. Your Team tried to assign the Region via HBCK2, yet the issue persists. Additionally, the RegionServer on which the "hbase:namespace" Region is marked for assignment is listed under both Live & Dead. The approach used by your Team is correct with respect to using HBCK2. When your Team runs the HBCK2 assign command, it should return a PID before terminating; the Output would be like [10], assuming "10" is the PID. Review the concerned PID in the HMaster UI & check its State. When an RS has both Live & Dead Status, review the StartUp Code for the Live & Dead RS entries. For Example, for the RS "hbase-test-rc-5.openstacklocal,16020,1469419333913", the StartUp Code is "1469419333913". We expect the Live RS StartUp Code to be recent & the Dead RS StartUp Code to be Older, with the "hbase:namespace" Region being assigned to the Live RS. Use the Epoch Converter via Link [1] to convert the StartUp Code to a Human-Readable Timestamp. If the PID State is fine, with the Region being assigned to the RS with the Live StartUp Code, review the concerned RS Logs to confirm whether the Region Assignment has been received & whether there are any issues opening the Region. Accordingly, proceed with the troubleshooting. Do review & let us know your observations. - Smarak [1] https://www.epochconverter.com/
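For reference, the HBCK2 assign invocation looks roughly as below (the jar path/version & the encoded Region name are placeholders); it prints the PID, e.g. [10]:
hbase hbck -j /path/to/hbase-hbck2-1.x.x.jar assigns <encoded_region_name_of_hbase:namespace>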
05-01-2021
11:17 PM
Hello @nilk Thanks for using Cloudera Community. Based on the Post, your Team is experiencing frequent "Bad Health" alerting for various Services, which persists until a Restart only to reappear after 1 Minute. It's likely the Health Tests are failing owing to a persistent reason rather than an intermittent cause. As you haven't shared the CM Version or CDH Version, I am sharing Link [1], which covers the various Health Tests & their Causes. Each Alert has its own reasoning & we can review the Alerting on the CM UI while implementing the preventive measures. Link [2] shares a Post with a similar context, wherein the Steps for each Service Health Alert would be different. Unfortunately, there is likely no Single Solution for each Service Alerting. As such, review the Alerting context for each Service, do post here if you have any queries concerning the reasoning for any specific Alert, & proceed accordingly. - Smarak [1] https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_ht.html [2] https://community.cloudera.com/t5/Support-Questions/Bad-Health-of-cloudera-Managger/m-p/37161/highlight/true#M37913
03-26-2021
12:12 AM
Hello @Keshu While I am not familiar enough with NiFi to provide an exact answer, HBase allows VERSIONS to fetch all versions of a Cell. Assuming a Table has been configured with VERSIONS => 5, We can use the Command "scan 'Table_Name', {VERSIONS => 5}" to fetch all Versions from the Table. You may be familiar with the above already; sharing it in case you can adjust the NiFi Processor to query the HBase Table as above (if the NiFi Processor offers such functionality). I shall allow the NiFi Gurus on the Community to get back to you with the exact requirement. - Smarak
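A minimal HBase Shell sketch of setting & reading multiple versions (Table/CF/Row names are placeholders):
alter 'Table_Name', {NAME => 'cf1', VERSIONS => 5}
scan 'Table_Name', {VERSIONS => 5}
get 'Table_Name', 'row1', {COLUMN => 'cf1:col1', VERSIONS => 5}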
03-26-2021
12:05 AM
Hello @vishal6196 It's been a while on the Post, yet as far as I recall, the App was writing to 1 CF only. In short, a WAL is used per RegionServer & subsequently, the Writes arrive at the MemStores based on CF demarcation at the Region Level. From the WALReader, We confirmed the WAL had entries for 1 CF only, naturally indicating only the MemStore of the concerned CF would be populated. Additionally, I don't recall any Crash being observed. Are you facing similar concerns in your Environment? If Yes, Kindly share the following details in a New Post: the MemStore Flush failure trace from the Logs, whether the Problem is Persistent, whether the WALReader (Link [1]) shows Writes happening on all CFs of the Regions, & the Count of CFs of the Region. - Smarak [1] https://hbase.apache.org/book.html#hlog_tool.prettyprint
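For reference, the WAL reader from [1] can be invoked roughly as below on HBase 2.x (the WAL path is a placeholder; older releases may expose the tool differently):
hbase org.apache.hadoop.hbase.wal.WALPrettyPrinter /hbase/WALs/<regionserver-dir>/<wal-file>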
03-25-2021
11:57 PM
Hello @Priyanka26 Do let us know where you stand with the Current Post. I am aware you had issues with your Cluster courtesy of a Separate Post, yet we wish to follow-up to ensure the Post isn't left unattended from our side. Thanks, Smarak
03-25-2021
11:55 PM
Hi @Priyanka26 Thanks for the Update. In the 2nd Step, your Team mentioned creating the required NameSpace & Tables. Yet, I would suggest Bulk-Loading i.e. the CompleteBulkLoad Process, as simply copying the HFiles likely won't work. Additionally, the existing HFiles would be part of Splits/Compactions & ideally, I expect your Team would create the Tables with 1 Region; BulkLoad gracefully handles such situations (a sketch follows below). For Customer-facing issues like yours on earlier HBase v2.x HDP Releases, We typically use BulkLoad. Yet, pointing again to the fact that your Team should upgrade to HDP v3.1.5 at minimum to avoid this issue in future. - Smarak
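A hedged sketch of the CompleteBulkLoad step (the class name is the HDP-era MapReduce tool; the HFile directory & Table name are placeholders):
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs:///path/to/hfiles my_namespace:my_table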
03-22-2021
10:54 PM
1 Kudo
Hello @JB0000000000001 Once you are good with the Post & have no further queries, Kindly mark the Post as Solved so we can close it accordingly. - Smarak
03-20-2021
02:42 AM
Hello @JB0000000000001 Thank You for the kind words. I intended to reciprocate the level of detail you posted in the Q, which was helpful for us. Based on your Update, a few pointers:
1. The GC Flags indicate your Team is using CMS &, as you mentioned, the GC-induced JVMPauses would trace the logging as you highlighted. Note that "No GCs detected" is observed when a JVMPause isn't GC-induced. If your Team observes JVMPause tracing with "No GCs detected", the shared Community Post is a good pointer for evaluating Host Level concerns.
2. A 10GB RegionServer Heap is unlikely to require a 60-Second CleanUp (60 Seconds being the HBase-ZooKeeper Timeout), unless the 10GB is being filled frequently & we have continuous GC Cycles. In other words, if 1 FullGC is immediately followed by another FullGC (visible in the GC Logs), We wouldn't see 1 Large FullGC, yet the cumulative impact is worth reviewing & requires contemplation of the Eden-TenuredGeneration balance.
3. While 3x Sharding relative to the RegionServer count is good for Parallelism, the Write Pattern matters based on the RowKey. If the RowKey of the Table causes Writes to be concentrated into Regions on 1 RegionServer, the concerned RegionServer would be a Bottleneck. While Writes are being carried out, reviewing the concerned Table's View via the HMaster UI would offer insight into the same.
4. The *out file of the RegionServer would capture the OOME, yet the previous one & not the current one. Additionally, the JVM Flags can be adjusted to include +HeapDumpOnOutOfMemoryError & the related parameter for the HeapDump path; Link [1] covers the same & a sketch follows below.
5. One pointer is to review the HMaster Logs for "Ephemeral"-associated tracing (prefer a Case-Insensitive search). The HMaster traces when an Ephemeral ZNode is deleted; then, review the Logs of the RegionServer whose Ephemeral ZNode was removed. This is based on the point wherein your Team is (maybe) reviewing the Spark Job Log for RegionServer unavailability & then tracing the RS Logs; the HMaster approach is more convenient & accurate.
Do keep us posted on how things go. - Smarak [1] Command-Line Options - Troubleshooting Guide for HotSpot VM (oracle.com)
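For reference, the HeapDump-related JVM Flags from point 4 look like the below (the dump path is a placeholder), added to the RegionServer Java options:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hbase/heapdumps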
03-20-2021
01:30 AM
Hello @Priyanka26 Thanks for the Update. I haven't tried these Steps, yet they look fine on Paper. As you are taking a BackUp of the Data Directory, We would have the HFiles for any concerns as well. Do let us know how things go & most importantly, do Plan to Upgrade to HDP v3.1.5. - Smarak
03-19-2021
08:13 AM
Hello @Priyanka26 Thanks for the Update. The referred JARs aren't available for download. Unfortunately, I am not familiar with any means other than manual intervention (starting HBase on a new DataDir & Bulk-Loading from the previous DataDir being one of them). Such issues aren't present from HDP v3.1.5 onwards. If I find anything, I shall let you know; yet, it's highly unlikely we'll come across any easier Solution. - Smarak
03-19-2021
12:10 AM
Hello @WayneWang As we haven't received any further Update, We are closing the Post assuming the issue was handled by the Steps shared above [1]. In HBase v1.x, We have limited choices here; HBase v2.x uses a new AssignmentManager (Details in HBASE-12439), which assists in managing RITs without ZooKeeper involvement. Thanks for using Cloudera Community. - Smarak [1] https://community.cloudera.com/t5/Support-Questions/Hbase-How-to-fix-failed-regions/m-p/312263/highlight/true#M225066
03-19-2021
12:09 AM
Hello @SurajP Kindly confirm whether the issue is resolved. If Yes, please share the Steps for our fellow Community Members & mark the Post as resolved. If the issue persists, please share the ZooKeeper Logs. - Smarak
03-19-2021
12:01 AM
1 Kudo
Hello @JB0000000000001 Thanks for using Cloudera Community. First of all, I really appreciate your detailed analysis of the concerned issue. In short, a Spark Job writes a month's worth of data into HBase each month. Intermittently, the Spark Job fails on certain months & your Team observed ServerNotRunningYetException during the concerned period. The Primary issue appears to be a RegionServer being terminated (owing to certain reasons) & the Master re-assigning its Regions to other Active RegionServers. Any Region remains in Transition (RIT) until the Region is Closed on the Now-Down RegionServer + WAL Replay + the Region is Opened on the Newly-Assigned RegionServer. Typically, the WAL Replay may be taking time, causing the Spark Executor Task to be retried & to fail during the concerned period. Again, this is my assumption based on the facts laid out by you. So, what can be addressed primarily: 1. the RegionServer being terminated, 2. the Spark Executor Task failures.
For 2, We can definitely increase the Task failure threshold from the Default (4) to a Higher Value, to ensure the collective failures take longer than the Time taken for a Region to transition from 1 RS to another RS, including WAL Replay. As the Job is run once a month, the above Config Change appears to be an easy way out (see the sketch below).
For 1, I am assuming the RS are going down owing to exceeding their ZooKeeper Timeout during periods of Garbage Collection (likely Full GC, seeing the Memory Usage). As such, We have the following pointers:
- If a RegionServer is failing with Out-Of-Memory, the Process JVM *out file would capture the same.
- If a RegionServer is failing owing to a JVMPause from GC exceeding the ZooKeeper Timeout, We can check the Logs for the time taken by the JVMPause & the ZooKeeper Timeout. If we are seeing JVMPauses exceeding the ZooKeeper Timeout, increasing the ZooKeeper Timeout to a Value higher than the Highest JVMPause would help. The Negative of this Change is that a RegionServer failure would be detected later.
- If JVMPause is the Cause, review the GC Logs. Depending on CMS or G1 GC, the Logs offer a lot of detail on the JVMPause; proceed accordingly. Link [1] offers Great Info by Plumbr.
- Your MemStore is allocated 25% of the RegionServer Heap. You have written 2 Heap sizes (32GB & 20GB), so I am unsure which one applies. Yet, the above indicates the MemStore is likely being filled quickly. Your Team has an Off-Heap Bucket Cache yet still an extremely low value for the Write Cache (MemStore). I am assuming the same must be causing the YoungGen to be filled up quickly, thereby causing MinorGCs & subsequently FullGCs, which are causing the ZooKeeper Timeout. We can try giving a bit more MemStore Space. As your Team didn't complain of Performance, I am assuming the frequent MemStore Flushes aren't causing any HDFS impact.
- Check for any Hot-Spotting i.e. Writes being managed by a few Regions/RegionServers, thereby over-loading the concerned RegionServer.
Finally, has your Team considered using Bulk-Loading to bypass the Memory factor in Load Jobs like the one you are describing? - Smarak [1] https://plumbr.io/handbook/garbage-collection-algorithms-implementations
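A rough sketch of the two Config Changes discussed above (the values are illustrative only, not recommendations): raising the Spark Task failure threshold, & raising the HBase-ZooKeeper Timeout in hbase-site.xml.
spark-submit --conf spark.task.maxFailures=10 ...
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>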
03-18-2021
11:22 PM
Hello @Priyanka26 Thanks for the Update. I see you have posted the Q in a new Post & I have responded to the Q over there. Let us know where you stand with respect to the RIT for the "prod.timelineservice.entity" Table. Assuming you have solved the issue, Kindly share the Steps for our fellow Community Users. If the issue remains, please do share the details discussed in our response. - Smarak
03-18-2021
10:39 PM
Hello @Priyanka26 Thanks for using Cloudera Community. Based on the Post, your Team has the Namespace Region "0c72d4be7e562a2ec8a86c3ec830bdc5" blocking the Master StartUp initialization, & using HBCK2 is throwing a Kerberos Exception. In HDP v3.1.0, We have a Bug wherein the HBCK2 JAR can't be used with the available HBase-Client & HBase-Server JARs in a Secure Cluster. There is no issue with the way your Team is using HBCK2; owing to the Bug mentioned above, the HBCK2 JAR is throwing the concerned Exception. Without the modified HBase-Client & HBase-Server JARs, We can try to re-initialize the HBase Cluster, yet only if the same isn't a Production Cluster. - Smarak
03-16-2021
01:39 AM
Hello @Kenzan Thanks for the Update. Typically, such issues lie with Balancing & the TRACE Logging prints the finer details on the same. Ideally, We should have "StochasticLoadBalancer" as the Default, while "SimpleLoadBalancer" (set by your Team) extends BaseLoadBalancer. I have shared 2 Links documenting the 2 Balancers & their running configurations. As your issue has been resolved, Kindly mark the Post as Resolved to ensure we close the Post as well. Thanks again for being a Cloudera Community Member & contributing as well. - Smarak [1] https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.html [2] http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html
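For reference, the Balancer in use is controlled by "hbase.master.loadbalancer.class" in hbase-site.xml; a minimal sketch showing the default StochasticLoadBalancer from [2]:
<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer</value>
</property>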
03-16-2021
12:41 AM
Hello @sheshk11
Thanks for sharing your knowledge (Knowledge Article) on managing the DISABLING Table. As @tencentemr mentioned, it has been helpful. A few other details I wish to add:
1. Use the HBCK2 setTableState from Link [1] to perform the same on HBase v2.x. The advantage of using the same is that manual intervention is avoided, preventing any unintended HBase Metadata manipulation.
2. In certain cases, the Regions belonging to the Table would be in Transition as well. If we are Disabling the Table, it's best to review the RegionState for the Table as well. The HBCK2 setRegionState from Link [1] can assist here. A rough sketch of both invocations follows below.
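The jar path, Table, encoded Region name & target States below are placeholders; see [1] for the full usage:
hbase hbck -j /path/to/hbase-hbck2.jar setTableState my_namespace:my_table ENABLED
hbase hbck -j /path/to/hbase-hbck2.jar setRegionState <encoded_region_name> CLOSED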
As the Post is a KA, I shall mark the same as Resolved. Thank You for posting the same for assisting fellow Community Members.
- Smarak
[1] https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
03-16-2021
12:21 AM
Hello @novice_tester Thanks for using Cloudera Community. To your query, Flume has been replaced by CFM (Cloudera Flow Management). Link [1] covers the details around the various components deprecated in CDP. Link [2] covers additional details on the same by @TimothySpann. - Smarak [1] https://docs.cloudera.com/cdp-private-cloud/latest/release-guide/topics/cdpdc-rt-updated-cdh-components.html [2] https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html
03-15-2021
01:41 AM
Hello @Chandresh Thanks for the Update. The Link shared by you deals with improvements from the MemStore Flush & eventual HFile Compaction perspective. Currently, I am unfamiliar with the Blockers impacting your Environment. For Example: if MemStore Writes are being delayed, We can consider reviewing the Flusher Threads. Similarly, if Compaction is a concern (courtesy of Too-Many-HFiles), reviewing the Thread Count would help. Similarly, if MemStore Writes are being blocked owing to Too-Many-WALs, it's worth checking "hbase.hstore.flusher.count" & "hbase.regionserver.maxlogs". Most importantly, how's HDFS Performance & is there any Hot-Spotting? In short, evaluating Read & Write Performance collectively would be a large scope for your Team. I would recommend starting with either Read or Write, All Tables or a Specific Table, All RegionServers vs a Specific RegionServer, & proceeding accordingly. - Smarak
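If those two settings do become relevant, they live in hbase-site.xml; a minimal sketch with the usual default values (illustrative, not tuning advice):
<property>
  <name>hbase.hstore.flusher.count</name>
  <value>2</value>
</property>
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>32</value>
</property>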
03-14-2021
01:25 AM
Hello @Chandresh Thanks for using Cloudera Community. Based on the Synopsis, your Team is observing Cluster-Level impact when 1 RS undergoes a GC Cycle & becomes unavailable, with things returning to normal after the concerned RS is Restarted. There are 2 aspects here: (I) the RS undergoing a longer GC Cycle, (II) HBase Cluster Un-usability. Ideally, the HBase Cluster won't become Un-usable if 1 RS is impacted. Having said that, if the RS is Unresponsive, the Query RPCs handled on the concerned RS would be delayed, causing the Query responses to be delayed or to time out. Under no circumstances should Phoenix's accessibility to HBase be impacted. Please confirm whether your Team's Phoenix Queries are timing out or delayed when 1 RS is busy in a GC Cycle (different from Phoenix being unable to connect to HBase); if the concerned RS is hosting "hbase:meta", the same is feasible. As such, We need to focus on the RS undergoing GC for a longer duration to mitigate any possible scenarios. I have shared a Blog via Link [1] on GC Tuning for HBase. Additionally, check whether the RS GC Cycles are causing ZK Timeouts or whether the GC Time was less than the ZK Timeout. - Smarak [1] https://blog.cloudera.com/tuning-java-garbage-collection-for-hbase/
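To compare the RS GC behaviour against the ZK Timeout, GC logging can be enabled on the RegionServer JVM; a hedged example of the usual JDK 8 / CMS-era flags (the log path is a placeholder):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-regionserver.log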