Member since: 01-16-2018
Posts: 553
Kudos Received: 37
Solutions: 91

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 141 | 03-10-2023 07:36 AM
 | 103 | 03-10-2023 07:17 AM
 | 112 | 02-28-2023 09:04 PM
 | 92 | 02-28-2023 08:53 PM
 | 92 | 02-28-2023 08:43 PM
07-28-2021
03:42 AM
Follow-up update to tag @KR_IQ @sppandita85BLR, as the Original Post is old.
07-28-2021
03:41 AM
Hello @a_gulshani Thanks for using Cloudera Community. Based on the Post, the RangerAudits Collection has issues, which is causing the "Error running solr query, please check solr configs. Could not find a healthy node to handle the request" Message. The Exception shared by you refers to "audit_logs_shard0_replica1", which isn't related to the RangerAudits Collection. The Ranger Audit UI relies on the RangerAudits Collection & if its Shards aren't available, we get the above Message. Open the Infra-Solr UI & verify the State of the RangerAudits Collection via Solr UI > Cloud > Graph. If the Shards associated with the RangerAudits Collection aren't Active, the Error is expected. Next, review the Infra-Solr Service Logs & confirm the reason for the Shard unavailability. There can be multiple reasons for Shard unavailability, hence the Logs would be the best place to review. For a quicker Solution, & if you are willing to lose the RangerAudits, you can delete the RangerAudits Collection & restart the RangerAdmin Service to ensure the Collection is created afresh; a rough sketch is shared below. Kindly review & let us know your observation. - Smarak
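As an illustration of the check & clean-up steps (the Infra-Solr Hostname, Port & the "ranger_audits" Collection name below are assumptions & may differ in your Environment; the Commands use the standard Solr Collections API & assume a Kerberized Setup, hence --negotiate):

# Check the State of the Collections & their Shards
curl --negotiate -u : "http://INFRA_SOLR_HOST:8886/solr/admin/collections?action=CLUSTERSTATUS&wt=json"

# Only if you can afford to lose the existing Audits: delete the Collection,
# then restart the RangerAdmin Service so the Collection is created afresh
curl --negotiate -u : "http://INFRA_SOLR_HOST:8886/solr/admin/collections?action=DELETE&name=ranger_audits"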
07-26-2021
01:15 AM
Hello @michalm_ Thanks for using Cloudera Community. While I haven't performed such a Task, I wish to check if you have reviewed the 3rd-Party Script via [1], which uses a User-Defined Time Threshold to find long-running Impala Queries &, optionally, kill them as well; a rough sketch of the underlying idea is shared below. Let us know if it helps. - Smarak [1] https://github.com/onefoursix/kill-long-running-impala-queries
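The Script appears to work against the Cloudera Manager API. As a minimal sketch of the idea (the Host, Port, API version, Cluster/Service names & the filter syntax below are assumptions from memory & should be verified against the API documentation of your CM version):

# List Impala Queries currently executing (filter syntax is illustrative)
curl -s -u admin:admin "http://CM_HOST:7180/api/v19/clusters/Cluster1/services/impala/impalaQueries?filter=(executing=true)"

# Cancel a Query exceeding your Time Threshold, using its queryId from the previous response
curl -s -u admin:admin -X POST "http://CM_HOST:7180/api/v19/clusters/Cluster1/services/impala/impalaQueries/QUERY_ID/cancel"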
07-26-2021
01:07 AM
Hello @Joe685 Thanks for using Cloudera Community. Your Question is with respect to using a Filter for visualizing the Datapoints. Your ask is clear, yet we wish to confirm the Platform. You mentioned "Public". Does that mean you are reviewing Data Visualization on CDP Public Cloud? If Yes, we wish to check if [1] fits your requirement. If not, any additional details on the Platform, as requested earlier, can assist us in reviewing internally & getting back to you accordingly. - Smarak [1] https://docs.cloudera.com/data-visualization/cloud/filter-shelf/topics/viz-filter-shelf-range-date.html
07-23-2021
04:20 AM
Hello @mahfooz-iiitian Please note that CDH Releases are End-Of-Support & CDP Releases don't support the Hortonworks Spark-HBase Connector; it has been superseded by the HBase Connector for Spark, as documented below. - Smarak [1] https://issues.apache.org/jira/browse/HBASE-25326 [2] https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_HBase_Connector.md [3] https://docs.cloudera.com/cdp-private-cloud/latest/data-migration/topics/cdp-data-migration-hbase-prepare-data-migration.html
07-23-2021
03:58 AM
Hello @Chandresh Hope you are doing well. We wish to confirm if you have identified the Cause of the issue. If Yes, kindly share the same to benefit our fellow Community Users as well. If no further assistance is required, please mark the Post as Solved. - Smarak
07-23-2021
03:41 AM
Hello @ryu As mentioned by @arunek95, we assume Phoenix is enabled for the Cluster. If not, kindly enable Phoenix & try the Command again. The Logging indicates HDP v2.6.1.0 with Phoenix v4.7. The Directory "/usr/lib/phoenix/" has the Phoenix Client & you mentioned the same Directory has the Phoenix Server Jar as well. Kindly verify the Permission on the JAR is correct & confirm via "jar -tvf" on the Phoenix Server Jar that the Class "MetaDataEndpointImpl" is included in the same. The Error indicates Phoenix is failing while creating the SYSTEM Tables (upon the 1st Connection to Phoenix). In our Internal Setup, we see the Phoenix-Server Jar is present in the HBase Lib Path as well, pointing to the Phoenix-Server Jar in the Phoenix Lib Path as a SymLink: /usr/hdp/<Version>/hbase/lib/phoenix-server.jar -> /usr/hdp/<Version>/phoenix/phoenix-server.jar Kindly ensure the Phoenix Server JAR is present in the HBase Lib Directory as well; a short sketch of both checks is shared below. Additionally, review the Master Logs to check for the Error Message at HBase Level as well. - Smarak
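For illustration (the paths below follow the HDP layout mentioned above; adjust <Version> & the Phoenix Jar name to your Installation):

# Confirm the coprocessor Class is packaged inside the Phoenix Server Jar
jar -tvf /usr/hdp/<Version>/phoenix/phoenix-server.jar | grep MetaDataEndpointImpl

# Ensure the HBase Lib Directory can see the Phoenix Server Jar (SymLink, as in our Internal Setup)
ln -s /usr/hdp/<Version>/phoenix/phoenix-server.jar /usr/hdp/<Version>/hbase/lib/phoenix-server.jar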
07-23-2021
03:08 AM
Hello @gurucgi This is an Old Post, yet we wish to check if the issue has been addressed by you. If Yes, please share the Steps to ensure fellow Community Users can benefit from your experience. Based on the Post, the SystemCatalogRegionObserver Class is reported as not being loaded. The concerned Error is being received for Phoenix v5.1.1. In Cloudera, we are shipping Phoenix v5.1.0 at the time of writing. I am not sure if you are being impacted by [1]. In short, the issue seems to require the Phoenix-Server Jar to be placed correctly. - Smarak [1] https://issues.apache.org/jira/browse/PHOENIX-6330
07-23-2021
02:39 AM
Hello @Satya_Singh Do let us know if your issue has been resolved. If Yes, Please share the Mitigation Steps followed by you to ensure other Community Users can benefit from your experience & mark the Post as Resolved as well. - Smarak
07-23-2021
02:38 AM
Hello @krishpuvi Thanks for using Cloudera Community. Based on the Post, you wish to run HBase Major Compaction on a particular queue to ensure the Compaction activity doesn't impact other resources in the Cluster. Please note that HBase Major Compaction doesn't use any YARN-managed resources & as such, we can't map Major Compaction to run on any YARN Queue. You can restrict Major Compaction to use less bandwidth via [1] & [2], which explain the "PressureAwareCompactionThroughputController" with Min & Max Speed, along with the Parameter to disable any Speed Limit via "NoLimitThroughputController". Additionally, you can disable automatic Compaction & run Compaction manually during Off-Business-Hours at Table/ColumnFamily Level as well; a rough sketch is shared below. - Smarak [1] https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/configuring-hbase/topics/hbase-limit-the-speed-of-compactions.html [2] https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/throttle/PressureAwareCompactionThroughputController.html
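As an illustration of the Table-level approach from the HBase Shell (Table & ColumnFamily names are illustrative; verify the attribute against your HBase version):

# Disable automatic Compactions on the Table
alter 'my_table', {COMPACTION_ENABLED => false}

# Trigger Major Compaction manually during Off-Business-Hours, at Table or ColumnFamily Level
major_compact 'my_table'
major_compact 'my_table', 'cf1'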
07-23-2021
02:31 AM
Hello @JB0000000000001 We wish to follow-up with you on the Post & confirm if you have any additional Observations to be shared with respect to studying or implementing HBase on Cloud Storage, or whether our response to your Post was helpful in understanding the possible Cloud Storage latencies. - Smarak
07-23-2021
02:27 AM
Hello @proble As we haven't received any response from your side, we shall be marking the Post as Resolved. Having said that, please do share your experience with handling the concerned issue & whether the details shared by us assisted you. In short, the HBase Table Creation is failing as the Master hasn't completed Initialisation. You have to use the HBCK2 Tool to assign the hbase:meta & hbase:namespace Regions (whichever isn't assigned as per the Master Logs). Link [1] covers our response to the Post on 2021/05/31 with the Log trace to be verified in the Master Logs & the Steps to be followed. We hope the Post was helpful to you & assisted in resolving the issue. - Smarak [1] https://community.cloudera.com/t5/Support-Questions/Error-while-creating-table-in-hbase/m-p/317429/highlight/true#M227161
07-23-2021
02:18 AM
Hello @Aman_Patel_CV Thanks for using Cloudera Community. Based on the Post, you would like to take HBase Table Backups in an incremental fashion to S3. Kindly use [1] for Incremental Backup to S3, which lists the Usage with S3 as the Backup Path as well. Additionally, you may use Snapshots, yet the Snapshot needs to be exported to S3 as covered via [2]. Link [3] also offers more details on the Backup Scenarios available with S3 & HBase; a rough sketch is shared below as well. Let us know if the shared references assist you. Please report if you have any issues & we shall get back to you with the required details. - Smarak [1] https://hbase.apache.org/book.html#br.terminology [2] https://hbase.apache.org/book.html#snapshots_s3 [3] https://hbase.apache.org/book.html#br.s3.backup.scenario
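For illustration (Bucket, Path, Table & Snapshot names are placeholders; the Backup feature must be enabled on the Cluster & the exact flags should be verified against [1]):

# Full Backup of a Table to S3, followed by Incremental Backups against the same destination
hbase backup create full s3a://my-bucket/hbase-backups -t my_table
hbase backup create incremental s3a://my-bucket/hbase-backups -t my_table

# Alternative: export an existing Snapshot to S3
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot my_snapshot -copy-to s3a://my-bucket/hbase-snapshots -mappers 4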
07-23-2021
02:05 AM
Hello @BabaHer Thanks for using Cloudera Community. It appears HBASE-25326 via [1] allows the HBase Connector to be used with Spark v3.0 & the Git Page via [2] by the HBASE-25326 Owner offers an Example as well. - Smarak [1] https://issues.apache.org/jira/browse/HBASE-25326 [2] https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_HBase_Connector.md
07-04-2021
05:06 AM
Hello @JB0000000000001 Thanks for using Cloudera Community. This is an Old Post, as such I am unsure whether you have already found the details being shared below. Having said that, [1] by the Cloudera HBase Team offers a few details on S3 Performance in the final paragraph. As mentioned in the concerned paragraph, BucketCache is Critical for Performance & AWS suggests the same in [2] under the "Operational considerations" Section. In short, there are definitely areas of Performance Degradation, yet there are a few LinkedIn SlideShare decks around Microsoft, Airbnb & Huawei HBase Usage on Cloud & they offer a lot of details into the Observations made & Optimisations performed. The Optimisations aren't always centered on HBase & pivot towards the Cloud Infrastructure as well. Such decks should offer additional insight to your Team. Being an Old Post, we wish to check if your Team has come across findings which can be shared with us. This would allow the Post to be used by fellow Community Users, who may be considering the Cloud Storage Latency aspect as well. - Smarak [1] https://blog.cloudera.com/how-hbase-in-cdp-can-leverage-amazons-s3/ [2] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html
07-04-2021
04:36 AM
Hello @proble Kindly confirm if you have resolved the issue posted. If Yes, Kindly share the Steps to ensure fellow Community Users can gain from your experience. If our Post assisted in any manner, Confirm the same as well. Thanks, Smarak
07-04-2021
04:29 AM
Hello @william266455 As we haven't heard from your side, we are marking the Post as Solved, as the Question wasn't associated with any Issue, rather a Generic discussion on Cloud Technology in 2021. We would like to hear from your side concerning Cloud Technology Usage & CDP Public Cloud in your daily work & if there are any queries concerning CDP Public Cloud which you would like us to confirm or answer. Thank You for using Cloudera Community. - Smarak
06-10-2021
11:39 PM
Hello @william266455 Hope you are doing well. We wish to follow-up concerning the Post & whether you may have further queries. If there is no further ask, kindly mark the Post as Solved. - Smarak
05-31-2021
11:48 AM
Hello @proble Based on the Image, the Master hasn't completed Initialisation. As such, we need to ensure the hbase:meta & hbase:namespace Tables' Regions are Onlined so the Master can complete Initialisation. You haven't stated the HBase Version & the Product Type (HDP, CDH, CDP). Having said that, kindly review [1], which explains the concerned issue, how to verify the same from the HBase Master Logs & the Mitigation Steps via the HBCK2 Tool. Link [2] from the same Page covers the Steps to obtain the HBCK2 Jar from Git & build the same; a rough sketch is shared below. Kindly review & let us know if you have any issues with the same. - Smarak [1] https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2#master-startup-cannot-progress-in-holding-pattern-until-region-onlined [2] https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2#obtaining-hbck2
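For illustration (the HBCK2 version & the Region names below are placeholders; 1588230740 is the well-known encoded Region name of hbase:meta, while the hbase:namespace encoded Region name must be taken from the Master Logs):

# Build the HBCK2 Jar from the hbase-operator-tools Git repository
git clone https://github.com/apache/hbase-operator-tools.git
cd hbase-operator-tools && mvn -DskipTests install

# Assign the unassigned Region(s) via HBCK2
hbase hbck -j hbase-hbck2/target/hbase-hbck2-<version>.jar assigns 1588230740 <namespace_region_encoded_name>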
05-31-2021
11:38 AM
2 Kudos
Hello @william266455 Thanks for using Cloudera Community. Your Post covers an extremely wide domain, yet I would like to share my perspective from working with Customers on CDP Public Cloud:

Of the 3 Cloud Vendors, AWS was the Pioneer to start with & as such, I wish to point to Link [1], which offers the 6 Major Advantages of Cloud Computing from an AWS WhitePaper. Listing them here, we have "Trading Capital Expenses With Operational Expenses", "Scale", "Stop Guessing Capacity", "Agility", "Go Global", "Avoid Running & Maintaining Data Centres". The PDF does an awesome job of explaining the Benefits & I would skip reiterating the same.

As you posted the Question in Cloudera Community, I would like to take a few minutes to share the Current On-Premise Model's Short-Comings from a Hadoop Ecosystem perspective (note that On-Premise has its Advantages & stating On-Premise is Defunct would be an Over-Statement): Associating "Big Data" with "Hadoop" isn't new (although the Focus should be moving from Data Size/Type to Analytics & Value from Data). The Hadoop Ecosystem offers a wide variety of Services & they fit Use-Cases with different needs. As the Services increase, so does the Data being processed & reviewed. As such, any Capacity Planning is hard to begin with. Be it Scaling Up or Scaling Out, managing a Linux or Windows Server isn't easy. A simple Task of adding Resources to an Instance isn't performed without Operational Experience & adding New Instances requires getting in touch with Multiple Parties (Admin, Billing, Vendor, Transport, Sponsor etc.). Other contexts like Agility, Monitoring, Scaling, Routing are Self-Explanatory.

Now, how Cloudera helps via Cloudera Data Platform (CDP) is as follows: by ensuring Users only bother about their Use-Case (running a Spark Job, Hive-Impala SQL, NoSQL HBase Analytics etc.), like the Users have been doing On-Premise, yet with the Flexibility of Cloud. The best part is a Customer can have 1 Setup on AWS & another Setup on Azure or GCP as well, yet the UI to perform your Hive/Spark/Kafka/HBase Jobs remains similar. When no Jobs are running, the Scale-Down happens implicitly & when Jobs are running, the Scale-Up happens without any User intervention. This ensures Users focus on their Job only & Cost is saved as well. Noisy Neighbours are avoided by ensuring the Jobs run via Containers with their own resources, preventing any poor Job from taking away Host-level resources & thereby affecting other Jobs.

As you tagged "Cloudera Data Engineering", I wish to share a bit of detail on the same. CDE allows you to run Spark Jobs (just like you run On-Premise) yet with the Flexibility of running them on Containers as well. And the best part is, End-Users don't have to know anything about the Kubernetes Cluster running their Spark Job.

In short, Cloud offers Flexibility & Agility which are hard to achieve with an On-Premise Setup. Yet, with Power comes Responsibility & here, AWS has a Shared-Responsibility Model as shared via [2]. Finally, I wish to share the Cloudera Public Cloud Offering via [3] & some awesome free Training Videos on CDP via [4]. The above Opinion is completely mine from my Experiences & I have barely scratched the Surface of the benefits of Cloud. I am happy to answer any specific query you may have from the above details. Also, note that each of the AWS Links or PDFs are Public Materials.
- Smarak [1] https://docs.aws.amazon.com/whitepapers/latest/aws-overview/aws-overview.pdf#six-advantages-of-cloud-computing [2] https://aws.amazon.com/compliance/shared-responsibility-model/?ref=wellarchitected [3] https://docs.cloudera.com/cdp/latest/index.html [4] https://www.cloudera.com/about/training.html#?fq=training%3Acomplimentary%2Ffree
05-26-2021
11:26 PM
1 Kudo
Hello @Priyanka26 As we haven't heard from your side, we shall summarise the Discussion in the Post to ensure the same benefits Users with similar experiences.

PROBLEM: In HDP v3.1.0, the HBase Namespace Region isn't assigned, thereby causing the following Message:

2021-03-17 20:29:54,614 WARN [Thread-18] master.HMaster: hbase:namespace,,1575575842296.0c72d4be7e562a2ec8a86c3ec830bdc5. is NOT online; state={0c72d4be7e562a2ec8a86c3ec830bdc5 state=OPEN, ts=1616010947554, server=itk-phx-prod-compute-6.datalake.phx,16020,1615483461273}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.

Your Team tried to use HBCK2 Assigns, yet the same fails with the following Error:

Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: java.io.IOException: Call to itk-phx-prod-master-2.datalake.phx/192.168.15.180:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name

DISCUSSION SUMMARY:

(I) In the Customer's HDP v3.1.0, we have a Bug wherein the HBCK2 JAR can't be used with the available HBase-Client & HBase-Server JARs in a Secure Cluster. There is no issue with the way your Team is using HBCK2; owing to the Bug mentioned above, the HBCK2 Jar is throwing the concerned Exception. Without the modified HBase-Client & HBase-Server JARs, we can try to re-initialize the HBase Cluster, yet only if the same isn't a Production Cluster.

(II) The referred JARs aren't available for download publicly. Unfortunately, I am not familiar with any means other than manual intervention (starting HBase on a new DataDir & Bulk-Loading from the previous DataDir being one of them). Such issues aren't present in HDP v3.1.5 onwards.

(III) Your Team decided to use the Bulk-Load approach to ensure HBase is initialised afresh. [1] shares the Steps used by your Team.

In short, do upgrade to HDP v3.1.5 (the same would be a Maintenance Upgrade from v3.1.0 to v3.1.5) as soon as possible. Until then, such issues require Bulk-Loading. The Bug causing the HBCK2 issue in a Kerberized Environment impacts HDP v3.0.0 through (and inclusive of) HDP v3.1.4 & is fixed in HDP v3.1.5. Thanks again for using Cloudera Community. - Smarak [1] https://community.cloudera.com/t5/Support-Questions/Hbase-namespace-table-in-not-online/m-p/313460/highlight/true#M225541
05-26-2021
11:08 PM
Hello @HadoopBD We hope the Steps shared by @sebastienleroy via the Community Link have worked for you. Additionally, sharing 2 Links which document how to monitor HBase with Prometheus; a minimal sketch of the common approach is shared below. - Smarak [1] https://godatadriven.com/blog/monitoring-hbase-with-prometheus/ [2] https://grafana.com/grafana/dashboards/12722
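For illustration (the Agent Jar path, Port 7000 & the Rules file below are assumptions; [1] describes the approach in detail):

# In hbase-env.sh, attach the Prometheus JMX Exporter javaagent to the RegionServer JVM
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7000:/opt/hbase_jmx_config.yaml"

# Then add a Prometheus scrape job targeting <RegionServer-Host>:7000 & import the Grafana Dashboard from [2]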
05-26-2021
10:52 PM
1 Kudo
Hello @JB0000000000001 Unfortunately, I didn't find any single Document explaining the HMaster UI Metrics collectively. Let me know if you come across any Metric which isn't clear. I shall review the same & share the required details. If I can't help, I shall ensure I get the required details from our Product Engineering Team to assist as well. - Smarak
05-26-2021
07:50 AM
1 Kudo
Hello @JB0000000000001 Thanks for the kind words. I certainly thought in the same direction: the RowCounter may not be explicitly caching the Data, yet it caches the HFiles' Metadata (Index, Bloom Filter), enough to likely improve Subsequent queries by rejecting HFiles that don't need to be processed, thereby reducing the scope of HFiles to be reviewed before returning the Output. Thanks again for using Cloudera Community & helping fellow Community Members by sharing your experiences around HBase. - Smarak
05-21-2021
01:51 PM
1 Kudo
Hello @JB0000000000001 Appreciate your detailed response. To some of your points:

To check the Impact of RowCounter on Caching, we created a Table on a Vanilla HBase Setup & executed a PE Write to insert 100,000 Rows, then Flushed & Restarted the Service to ensure there is no WAL-Split write going through the MemStore, thereby assisting Read Access. At that point, there were no Reads & Writes, with the Server Metrics showing 4 Blocks of 400KB Size in the Block Cache. After a RowCounter Job, we observed ~100,000 Reads with the Used Heap increased by ~150MB, yet the Block Cache Size remained similar. After a Scan Operation reading each of the 100,000 Rows, the Read Requests increased by 100,000, the Block Cache Size increased by ~100MB, the Block Count increased by ~1500 & the Hit Ratio further deteriorated owing to larger Misses. RowCounter likely doesn't offer Caching benefits, yet I have shared our Observations to ensure you can compare the same with your observations as well; a rough sketch of the Commands used is shared below.

Using RegionServer Grouping would still involve Eviction, yet the Competition & Quantity of Objects for the Cache would be reduced by reducing the Tables involved within the RS Grouping Scope. This might be a bit of an excessive approach, yet I thought of sharing the same in case you may wish to review.

Link [1] has the available MapReduce Programs & I couldn't find one which fits the Use-Case of accessing all Data once. A Full Table Scan is likely equivalent to running a Scan on the HBase Shell for the Table. Or, the Phoenix SQL approach may be considered by you, yet I believe writing a SELECT SQL using a Full Table Scan is equivalent to scanning the HBase Table via "scan".

Link [2] does mention In-Memory Column Families are the last to evict, yet there's no guarantee that an In-Memory Column Family would always remain in Memory. Also, I see you have "CACHE_INDEX_ON_WRITE" as True, so I assume you might already be extracting the most from your Memory, leaving aside the possibility of tuning the Caching Policies. While I haven't tested them, sharing Link [3] which compares the various Caching Policies for the Block Cache.

While I couldn't find any 100% guaranteed way to Cache the Objects, I believe you have implemented most such functionalities already as discussed until now. I fear Compression & Encoding may not be helpful as Data is always Decompressed in Memory, likely leaving the possibility of reviewing the Caching Policies. I may have missed other possibilities & would share if I come across any. - Smarak [1] https://hbase.apache.org/book.html#hbase.mapreduce.classpath [2] https://hbase.apache.org/book.html#block.cache.design [3] https://blogs.apache.org/hbase/entry/comparing_blockcache_deploys
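For reference, a rough sketch of the Commands behind the above Test (the Table name is illustrative & the PerformanceEvaluation/Shell flags may vary slightly by HBase version):

# Write 100,000 Rows via PerformanceEvaluation
hbase org.apache.hadoop.hbase.PerformanceEvaluation --table=TestTable --rows=100000 --nomapred sequentialWrite 1

# Flush the Table (followed by a Service Restart) so no MemStore content serves Reads
echo "flush 'TestTable'" | hbase shell -n

# RowCounter MapReduce Job, followed by a Full Scan, to compare Block Cache behaviour
hbase org.apache.hadoop.hbase.mapreduce.RowCounter TestTable
echo "scan 'TestTable'" | hbase shell -n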
05-21-2021
03:51 AM
Hello @Priyanka26 We wish to follow-up with your Team concerning the Post. If the issue is resolved, Do mark the Post as Solved & share the Steps followed by your Team to ensure our fellow Community Users can learn from your experience as well. Thanks, Smarak
05-21-2021
03:51 AM
Hello @Satya_Singh Do let us know if your issue has been resolved. If Yes, Please share the Mitigation Steps followed by you to ensure other Community Users can benefit from your experience & mark the Post as Resolved as well. - Smarak
05-21-2021
03:47 AM
1 Kudo
Hello @sakitha Kindly let us know if your queries posted in the Post have been answered by us. If No, Do share your concerns. If Yes, Please mark the Post as Resolved. Thanks, Smarak
05-21-2021
03:43 AM
Hello @priyanshu_soni You can skip the "Timestamp" part, as the same is inserted by HBase implicitly. I tried the same Query as you, excluding the Timestamp, & the same was Successful:

hbase(main):018:0> put 'Table_X1','125','Cf1:CheckItem','{"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}'
Took 0.0077 seconds
hbase(main):019:0> scan 'Table_X1'
ROW COLUMN+CELL
 125 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
1 row(s)
Took 0.0057 seconds

As you may see above, the "timestamp" field corresponds to the Epoch Timestamp of the time the Row was inserted. If you wish to specify the Timestamp explicitly, you can include an Epoch Time as shared below:

hbase(main):020:0> put 'Table_X1','126','Cf1:CheckItem','{"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}',1621593487680
Took 0.0202 seconds
hbase(main):021:0> scan 'Table_X1'
ROW COLUMN+CELL
 125 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
 126 column=Cf1:CheckItem, timestamp=1621593487680, value={"ID" : "1334134","Name" : "Apparel Fabric","Path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
2 row(s)
Took 0.0071 seconds

Let us know if you have any issues with the Put Operation. - Smarak