Member since: 01-16-2018
Posts: 613
Kudos Received: 48
Solutions: 109
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 778 | 04-08-2025 06:48 AM |
| | 958 | 04-01-2025 07:20 AM |
| | 916 | 04-01-2025 07:15 AM |
| | 962 | 05-06-2024 06:09 AM |
| | 1504 | 05-06-2024 06:00 AM |
05-21-2021
03:51 AM
Hello @Satya_Singh Do let us know if your issue has been resolved. If Yes, please share the Mitigation Steps you followed, so other Community Users can benefit from your experience, & mark the Post as Resolved as well. - Smarak
05-21-2021
03:47 AM
1 Kudo
Hello @sakitha Kindly let us know if the queries in your Post have been answered. If No, do share your concerns. If Yes, please mark the Post as Resolved. Thanks, Smarak
05-21-2021
03:43 AM
Hello @priyanshu_soni You can skip the "Timestamp" part, as HBase inserts it implicitly. I tried the same Query as yours, excluding the Timestamp, & it was Successful:
hbase(main):018:0> put 'Table_X1','125','Cf1:CheckItem','{"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}'
Took 0.0077 seconds
hbase(main):019:0> scan 'Table_X1'
ROW COLUMN+CELL
125 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
1 row(s)
Took 0.0057 seconds
As you can see above, the "timestamp" field corresponds to the Epoch Timestamp of the Time the Row was inserted. If you wish to specify the Timestamp explicitly, you can include an EpochTime as shared below:
hbase(main):020:0> put 'Table_X1','126','Cf1:CheckItem','{"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}',1621593487680
Took 0.0202 seconds
hbase(main):021:0> scan 'Table_X1'
ROW COLUMN+CELL
125 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
126 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
2 row(s)
Took 0.0071 seconds
Let us know if you have any issues with the Put Operation.
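For completeness, a minimal sketch of reading the Cell back at a specific Timestamp, via the Shell's "get" with the TIMESTAMP option (reusing the Table_X1 example above):
hbase(main):022:0> get 'Table_X1', '126', {COLUMN => 'Cf1:CheckItem', TIMESTAMP => 1621593487680}
- Smarak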
05-21-2021
02:57 AM
1 Kudo
Hello @JB0000000000001 Thanks for using Cloudera Community. You wish to store a Table in Cache via a Read-Once Model. As you stated, RowCounter would count the Rows & skip caching the Data as well. Even if you have a Block Cache configured with sufficient capacity to hold the Table's Data, the LRU implementation would likely cause the cached Objects to be evicted. I haven't used any Cache implementation outside of LRU, so I can't comment on those.
If you have a Set of Tables meeting such a requirement, We can use RegionServer Grouping for the concerned Tables' Regions & ensure the BlockCache of the concerned RegionServer Group is used only for the Selected Tables' Regions, thereby reducing the impact of LRU Eviction. Also, test using the "IN_MEMORY" Flag for the Tables' Column Families, which tries to retain the Column Family's Data in Cache for as long as it can, without any guarantee (a minimal Shell sketch is shared below).
While [1] is an Old Blog, it offers a few Practices implemented by a Heavy-HBase-Using Customer of Cloudera, written from their Employees' experience. As you are already familiar with the various Heap Options, there may be no new Information for you, yet I am sharing it to close any loop.
Hope the above helps. Do let us know if your Team implemented any approach, so as to benefit the wider audience looking at a Similar Use-Case.
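As a minimal sketch of the IN_MEMORY Flag (the Table Name 'Table_Y' & Column Family 'cf1' below are hypothetical; substitute your own):
hbase(main):001:0> alter 'Table_Y', {NAME => 'cf1', IN_MEMORY => 'true'}
- Smarak [1] http://blog.asquareb.com/blog/2014/11/21/leverage-hbase-cache-and-improve-read-performance/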
05-21-2021
02:30 AM
Hello @jmag2304 Thanks for using Cloudera Community. Based on the Post, Your Team is encountering "Resultset is Closed" after 10 minutes while running multiple queries to fetch data from various Tables one after another, with each Table having ~4M Rows. From [1] covering Phoenix Configuration, We observe there is 1 Parameter with a 10-Minute Default, i.e. "phoenix.query.timeoutMs"; however, the same shouldn't impact a Session with multiple queries, & you have already increased it without any success (a sketch of where the Parameter is set is shared below).
We wish to verify: (I) whether you observed any concerning message in the Phoenix Query Server Logs when the concerned Exception is encountered, (II) whether you have encountered such issues with the Phoenix Thick Client (as compared to the Phoenix Thin Client using Phoenix Query Server).
Hi @Sunny93, Thanks for sharing your Team's experience concerning the issue. Kindly assist by confirming the Server-Side Parameter being referred to here. It would assist @jmag2304 & fellow Community Members facing such issues.
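For reference, a minimal sketch of raising "phoenix.query.timeoutMs" in hbase-site.xml on the Client Side (for the Thin Client, the Phoenix Query Server's hbase-site.xml would be the place to set it); the 30-Minute Value below is purely illustrative, not a recommendation:
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>1800000</value>
</property>
- Smarak [1] https://phoenix.apache.org/tuning.html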
05-21-2021
02:17 AM
Hello @dcy Thanks for using Cloudera Community. Based on the Synopsis, the Master isn't starting after you turned off the Computer & started HBase again. You haven't stated the Version of HBase, yet I suspect the WALs of the RegionServers involved have issues, causing the concerned problem. Verify whether the HDFS Fsck Report on the WAL & MasterProcWAL Files is Healthy. When HBase starts, the WALs of the RegionServers are Split to be replayed, & we suspect the WAL Files have issues, causing the concerned "Cannot Seek After EOF".
As you mentioned the Setup is on a single Computer, try Sidelining the WAL Directory of the RegionServer(s) & the MasterProcWALs to prevent any replay of WALs & any Master Procedures, followed by restarting the HBase Service. The Locations of the WAL & MasterProcWAL Directories would be {hbase-rootdir}/WALs & {hbase-rootdir}/MasterProcWALs (a minimal sketch is shared below). Note that Sidelining the WALs has the possibility of Data Loss, if any WAL contains Data which isn't persisted to Disk yet. Kindly review & let us know if the above works.
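As a minimal sketch of the Sidelining (assuming '/hbase' as the hbase-rootdir & the HBase Service being stopped first; adjust the Paths per your Setup):
hdfs dfs -mv /hbase/WALs /hbase/WALs.sidelined
hdfs dfs -mv /hbase/MasterProcWALs /hbase/MasterProcWALs.sidelined
- Smarak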
05-21-2021
02:03 AM
Hello @bigdatanewbie Thanks for the Comment. As you stated, Port 16020 is the IPC Port for HBase. When a Client connects to HBase, the 1st Connection is made to the RegionServer holding the "hbase:meta" Region. After fetching the Metadata details from the concerned RegionServer, the Client connects to the required RegionServers for the Read/Write Operations being performed by the End-User, & such Communication happens on Port 16020 as well (a quick way to identify the RegionServer holding "hbase:meta" is shared below). As such, please review whether the concerned Scenario applies to all Traffic between the Client Host & the RegionServer Hosts on Port 16020, wherein the Traffic is recognised as "Unknown_TCP". As you mentioned, it's surprising the concerned issue hasn't surfaced before, as Palo Alto Networks Products are widely used; I suspect the Firewall may be set to allow any Traffic on Port 16020, such that the Type of Traffic isn't reviewed. As the concerned issue with your Client Connection to HBase is resolved, kindly confirm if you have any further ask concerning the Post. If not, kindly mark the Post as Resolved. Thanks for using Cloudera Community.
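As a minimal sketch (assuming the default '/hbase' ZooKeeper Parent ZNode), the RegionServer holding "hbase:meta" can be identified via the HBase ZooKeeper CLI; the Output is partially binary, yet the Hostname is readable:
hbase zkcli get /hbase/meta-region-server
- Smarak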
05-11-2021
06:32 AM
Hello @bigdatanewbie Thanks for the response & for sharing the reasoning for the RPC Connection being timed out. Unfortunately, I am not familiar with the "unknown_tcp" classification; reviewing the Palo Alto Site for the concerned topic lists a few criteria wherein a Connection can be termed "Unknown", such as the Connection not having enough Header info or not matching any Known Application behavior. Link [1] is a KB from Palo Alto on the same context & discusses the steps to review & mitigate the same (I am sure your Team has reviewed this KB). - Smarak [1] https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA10g000000Clc6CAC
05-10-2021
12:30 AM
Hello @sakitha Thanks for using Cloudera Community & we hope to assist you in your Big Data Learning. To your Queries, please find the required details below:
(I) When you run a Job in Client Mode (like Spark-Shell), the Driver runs on the Local Node wherein the Job is being executed; as such, the Driver Logs are printed in the Console itself, while the Application Master & the Executors are launched in NodeManagers, as you mentioned for YARN Mode. In Cluster Mode, the Driver is launched in the Application Master JVM & the Driver Logs are captured in the Application Master Logs.
(II) Yes, the 2 Directories specified by your Team refer to the Event Logs. You haven't mentioned whether you are using any Orchestration Tool (Ambari, CM); as such, the Log4j Configuration needs to be edited to reflect the same. Link [1] refers to a Topic with a similar ask.
(III) In Spark-on-YARN Mode, there are 3 Sets of Logs: (a) the Spark Event Logs from the Event Log Directory (this is the Source of Information for the Spark UI), (b) the YARN Application Logs, which you can fetch via CLI with the Application ID as shared via [2] (a sample Command is shared below), & (c) the Logging Directory "/var/log", which holds the Service-based Logs like NodeManager, ResourceManager, DataNodes etc. If we suspect any Service-Level issue impacting the Job, we can review the Service Logs within the concerned Directory.
Kindly review & let us know if your ask is answered. Else, do post your queries & we shall assist you. [1] https://stackoverflow.com/questions/32001248/whats-the-difference-between-spark-eventlog-dir-and-spark-history-fs-logdirecto/33554588 [2] https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/data-operating-system/content/use_the_yarn_cli_to_view_logs_for_running_applications.html
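As a minimal sketch of the YARN CLI from [2] (the Application ID below is illustrative; substitute your own):
yarn logs -applicationId application_1620000000000_0001
- Smarak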
05-02-2021
12:02 AM
Hello @Priyanka26 We wish to follow up with your Team concerning the Post. If the issue is resolved, do mark the Post as Solved & share the Steps your Team followed, to ensure our fellow Community Users can learn from your experience as well. Thanks, Smarak