Member since: 01-16-2018
Posts: 613
Kudos Received: 48
Solutions: 109

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1434 | 04-08-2025 06:48 AM |
| | 1703 | 04-01-2025 07:20 AM |
| | 1702 | 04-01-2025 07:15 AM |
| | 1358 | 05-06-2024 06:09 AM |
| | 2079 | 05-06-2024 06:00 AM |
05-21-2021
02:57 AM
1 Kudo
Hello @JB0000000000001 Thanks for using Cloudera Community. You wish to keep a table in cache via a read-once model. As you stated, RowCounter would count the rows and skip caching the data as well. Even with a block cache configured with enough capacity to hold the table's data, the LRU implementation would likely cause the blocks to be evicted over time. I haven't used any cache implementation other than LRU, so I can't comment on alternatives. If you have a set of tables with this requirement, you can use RegionServer grouping for the concerned tables' regions, ensuring the BlockCache of that RegionServer group serves only the selected tables' regions and thereby reducing the impact of LRU eviction. Also test the "IN_MEMORY" flag on the table's column families, which tries to keep the column family's data in cache as long as it can, without any guarantee. While [1] is an old blog post, it offers a few practices implemented by a heavy HBase-using customer of Cloudera, written from their employees' experience. As you are already familiar with the various heap options, there may be no new information in it for you, but I am sharing it to close any loop. Hope the above helps. Do let us know which approach your team implemented, to benefit the wider audience looking at a similar use case. - Smarak [1] http://blog.asquareb.com/blog/2014/11/21/leverage-hbase-cache-and-improve-read-performance/
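As a minimal sketch, the IN_MEMORY flag can be set per column family from the HBase shell; the table name `mytable` and column family `cf1` below are hypothetical placeholders, and `hbase shell -n` (non-interactive mode) assumes a reasonably recent HBase release:

```shell
# Mark column family 'cf1' of table 'mytable' as IN_MEMORY, giving its
# blocks higher retention priority in the LRU block cache (no guarantee
# the data stays cached under memory pressure).
echo "alter 'mytable', {NAME => 'cf1', IN_MEMORY => true}" | hbase shell -n

# Confirm the attribute took effect on the table descriptor.
echo "describe 'mytable'" | hbase shell -n
```

Note that IN_MEMORY only raises the eviction priority of the family's blocks; it does not pin them in memory.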
05-21-2021
02:17 AM
Hello @dcy Thanks for using Cloudera Community. Based on the synopsis, the HBase Master isn't starting after you turned off the computer and started HBase again. You haven't stated the HBase version, but I suspect the WALs of the RegionServers involved have issues, causing the problem. Verify whether the HDFS fsck report on the WAL and MasterProcWAL files is healthy. When HBase starts, the RegionServers' WALs are split to be replayed, and we suspect the WAL files are corrupt, causing the "Cannot seek after EOF" error. As you mentioned the setup runs on a single computer, try sidelining the RegionServer WAL directory and the MasterProcWALs directory to prevent any replay of WALs or Master procedures, then restart the HBase service. The locations of the WAL and MasterProcWAL directories are {hbase-rootdir}/WALs and {hbase-rootdir}/MasterProcWALs. Note that sidelining the WALs risks data loss if any WAL contains data that hasn't been persisted to disk yet. Kindly review and let us know if the above works. - Smarak
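A minimal sketch of the sidelining steps, assuming hbase.rootdir is `/hbase` on HDFS (adjust the paths to your own setup, and stop HBase before moving anything):

```shell
# Optionally check the WAL files for corruption first.
hdfs fsck /hbase/WALs -files -blocks
hdfs fsck /hbase/MasterProcWALs -files -blocks

# Sideline (rename) the WAL directories so nothing is replayed
# on the next start. This risks losing any un-flushed edits.
hdfs dfs -mv /hbase/WALs /hbase/WALs.sidelined
hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.sidelined

# Restart the HBase service afterwards.
```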
05-21-2021
02:03 AM
Hello @bigdatanewbie Thanks for the comment. As you stated, port 16020 is the IPC port for HBase. When a client connects to HBase, the first connection is made to the RegionServer holding the "hbase:meta" region. After fetching the metadata details from that RegionServer, the client connects to the required RegionServers for the read/write operations being performed by the end user. Such communication happens on port 16020 as well. As such, please review whether the concerned scenario applied to all traffic between the client host and the RegionServer host on port 16020, wherein the traffic is recognised as "unknown_tcp". As you mentioned, it's surprising the issue hasn't surfaced before, since Palo Alto Networks products are widely used; I suspect the firewall may have been configured to allow any traffic on port 16020, so the type of traffic was never inspected. As the issue with your client connection to HBase is resolved, kindly confirm whether you have any further questions concerning the post. If not, kindly mark the post as Resolved. Thanks for using Cloudera Community. - Smarak
05-11-2021
06:32 AM
Hello @bigdatanewbie Thanks for the response and for sharing the reasoning behind the RPC connection timing out. Unfortunately, I am not familiar with the "unknown_tcp" classification; reviewing the Palo Alto site on the topic reports a few criteria whereby a connection can be classified as "unknown": the connection doesn't carry enough header information, or it didn't match any known application behaviour. Link [1] is a KB article from Palo Alto in the same context, discussing the issue along with steps to review and mitigate it (I am sure your team has reviewed this KB). - Smarak [1] https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000Clc6CAC
05-10-2021
12:30 AM
Hello @sakitha Thanks for using Cloudera Community; we hope to assist you in your big-data learning. To your queries, please find the required details below:

(I) When you run the job in client mode (for example, spark-shell), the driver runs on the local node where the job is launched, so the driver logs are printed to the console itself. As you mentioned YARN mode, the ApplicationMaster and the executors are launched in NodeManagers. In cluster mode, the driver is launched inside the ApplicationMaster JVM, so the driver logs are captured in the ApplicationMaster logs.

(II) Yes, the two directories your team specified refer to the event logs. You haven't mentioned whether you are using an orchestration tool (Ambari, CM); without one, the log4j configuration needs to be edited directly. Link [1] refers to a topic with a similar question.

(III) In Spark-on-YARN mode, there are three sets of logs: (a) the Spark event logs in the event-log directory (the source of the Spark UI's information); (b) the YARN application logs, which you can fetch via the CLI with the application ID as shared in [2]; and (c) the service logs under "/var/log" for services such as the NodeManager, ResourceManager, and DataNodes. If we suspect a service-level issue is impacting the job, we can review the service logs in that directory.

Kindly review and let us know if your questions are answered. Else, do post your queries and we shall assist you. - Smarak [1] https://stackoverflow.com/questions/32001248/whats-the-difference-between-spark-eventlog-dir-and-spark-history-fs-logdirecto/33554588 [2] https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/data-operating-system/content/use_the_yarn_cli_to_view_logs_for_running_applications.html
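As a sketch of point (III), the YARN application logs can be pulled via the CLI; the application ID below is a placeholder for the one your job reports:

```shell
# Fetch the aggregated logs for a finished (or running) YARN application.
yarn logs -applicationId application_1234567890123_0001

# In cluster mode, the driver log is inside the ApplicationMaster
# container's stderr/stdout; recent Hadoop versions let you filter:
yarn logs -applicationId application_1234567890123_0001 -log_files stderr
```

The `-log_files` filter assumes a reasonably recent Hadoop release; on older versions, fetch the full output and search it instead.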
05-02-2021
12:02 AM
Hello @Priyanka26 We wish to follow up with your team concerning the post. If the issue is resolved, do mark the post as Solved and share the steps your team followed, so our fellow community users can learn from your experience as well. Thanks, Smarak
05-01-2021
11:46 PM
Hello @bigdatanewbie Thanks for using Cloudera Community. Based on the post, a Spring Boot application fails to connect to HBase in a Kerberized cluster. Looking at the logs, we observe the RegionServer "fepp-cdhdn-d2.mycompany.com/172.29.233.141" isn't able to complete the RPC request within the 60-second timeout. With three retries, the persistent failure causes the overall application failure. The fact that the application identifies the RegionServer hosting the regions of table "hbasepoc:alan_test" indicates the client is able to locate the "hbase:meta" region via ZooKeeper and connect to the RegionServer hosting the "hbase:meta" region to pull the required metadata. Let's verify that the table "hbasepoc:alan_test" is healthy by running an HBCK on it, and use the HBase shell to perform the same operation as the Spring Boot application. If the HBCK report on the table (obtained via "hbase hbck -details hbasepoc:alan_test") shows no inconsistency, and the same operation completes successfully from the HBase shell, then reviewing the connectivity between the host running the Spring Boot application and the HBase setup, along with the RegionServer logs, would be helpful. Additionally, we can try increasing the timeout or retry count to confirm whether the issue lies with a delayed response or some other underlying problem. - Smarak
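A sketch of the two checks above, using the table name from the post (the scan and its LIMIT are illustrative; use the same operation your application performs):

```shell
# 1) Check table consistency; inspect the report for inconsistencies.
hbase hbck -details hbasepoc:alan_test

# 2) Reproduce the client operation from the HBase shell on the same host
#    as the Spring Boot application, to compare behaviour.
echo "scan 'hbasepoc:alan_test', {LIMIT => 10}" | hbase shell -n
```

To test with a larger timeout or more retries, the client-side properties to raise in the application's HBase configuration are `hbase.rpc.timeout` and `hbase.client.retries.number`.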
03-26-2021
12:05 AM
Hello @vishal6196 It's been a while since this post, but as far as I recall, the app was writing to one column family (CF) only. In short, one WAL is used per RegionServer, and the writes then arrive at the MemStores, demarcated by CF at the region level. From the WAL reader, we confirmed the WAL had entries for one CF only, naturally indicating that only the MemStore of that CF would be populated. Additionally, I don't recall any crash being observed. Are you facing similar concerns in your environment? If yes, kindly share the following details in a new post: the MemStore flush failure trace from the logs; whether the problem is persistent; whether the WAL reader (link [1]) shows writes happening on all CFs of the regions; and the count of CFs in the region. - Smarak [1] https://hbase.apache.org/book.html#hlog_tool.prettyprint
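As a sketch of the WAL-reader check from link [1], the WAL pretty-printer can be invoked as below; the WAL path is a placeholder (look under {hbase-rootdir}/WALs for real files), and the printer's class name varies slightly between HBase versions:

```shell
# Dump WAL entries in readable form; each entry names its column family,
# so you can confirm which CFs are actually receiving writes.
hbase org.apache.hadoop.hbase.wal.WALPrettyPrinter \
    hdfs:///hbase/WALs/regionserver-host,16020,1600000000000/example-wal-file
```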
03-25-2021
11:55 PM
Hi @Priyanka26 Thanks for the update. In the second step, your team mentioned creating the required namespace and tables. However, I would suggest bulk-loading, i.e. the CompleteBulkLoad process, as simply copying the HFiles is unlikely to work. Additionally, the existing HFiles would have been shaped by splits and compactions, while your team would presumably create the new tables with one region each; BulkLoad handles such situations gracefully. For customer-facing issues like yours on earlier HBase 2.x HDP releases, we typically use BulkLoad. That said, your team should upgrade to HDP 3.1.5 at minimum to avoid this issue in the future. - Smarak
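As a sketch of the bulk-load step, assuming HBase 2.x (the HFile directory and table name are placeholders; on HBase 1.x the class lives in the `org.apache.hadoop.hbase.mapreduce` package instead):

```shell
# Bulk-load the copied HFiles into the pre-created table instead of
# placing them in the table directory by hand; the tool assigns each
# HFile to the correct region, splitting files across boundaries if needed.
hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles \
    hdfs:///tmp/hfiles-to-load my_namespace:my_table
```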
03-22-2021
10:54 PM
1 Kudo
Hello @JB0000000000001 Once you are good with the post and have no further queries, kindly mark the post as Solved so we can close it accordingly. - Smarak