Member since
07-30-2020
219
Posts
45
Kudos Received
60
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 435 | 11-20-2024 11:11 PM
 | 488 | 09-26-2024 05:30 AM
 | 1084 | 10-26-2023 08:08 AM
 | 1852 | 09-13-2023 06:56 AM
 | 2129 | 08-25-2023 06:04 AM
12-29-2022
12:04 AM
1 Kudo
@mabilgen You have 143 million blocks in the cluster and the NN heap is 95GB; this is why the NN is not holding up. Going by roughly 1GB of heap per million blocks, 143 million blocks needs at least 150GB of heap to work smoothly, so with the current heap you would need to bring the total block count down to about 90 million for the NN to start working properly.
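If it helps to double-check, the live block count is exposed on the NameNode JMX endpoint; a minimal sketch, assuming the default HTTP port 9870 and a placeholder hostname:

```bash
# Query the active NameNode's JMX for the current block and file counts.
# <active-nn-host> and port 9870 are placeholders; older Hadoop 2 builds use 50070.
curl -s "http://<active-nn-host>:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" \
  | grep -E '"BlocksTotal"|"FilesTotal"'

# Rough sizing: ~1 GB of NN heap per 1 million blocks,
# so 143 million blocks needs roughly 143-150 GB of heap.
```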
12-28-2022
10:04 AM
1 Kudo
Hi @mabilgen, The main problem on this cluster is lack of RAM on the host, which is limited to 128GB. The Namenode on startup consumes its allocated heap of 98GB, which leaves about 30GB of memory for every other process running on this host. When those other processes use up the remaining 30GB, you see huge JVM pauses: the garbage collector tries to de-reference objects to free up memory, but this takes so long that the Namenode loses the Journalnode quorum and fails over.

As a rule of thumb, allocate 1GB of heap per 1 million blocks. So if there are more than 98 million blocks on this cluster, the current NN heap is not sufficient.

1) Lower the total block count on the cluster by deleting any unwanted files or old snapshots (see the sketch below).
2) If feasible, add more physical RAM to the host.

No amount of tuning will help in this situation, as the JVM pauses are too big to be managed by tuning. You would need to either clean up HDFS, add more RAM to the NN hosts, or move the Namenode to another node that has more RAM.
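For option 1, a rough sketch of the cleanup commands (all paths and snapshot names below are placeholders for illustration):

```bash
# List directories that have snapshots enabled, then the snapshots under one of them.
hdfs lsSnapshottableDir
hdfs dfs -ls /data/warehouse/.snapshot            # placeholder path

# Delete an old snapshot that is no longer needed (placeholder snapshot name).
hdfs dfs -deleteSnapshot /data/warehouse snap-2021-01-01

# Find the biggest space (and block) consumers to target unwanted files.
hdfs dfs -du -h / | sort -rh | head -20
```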
12-28-2022
06:17 AM
Hi @Sadique1 There is no specific option to decommission a single disk from a Datanode. The best option would be to decommission the Datanode, remove the disk, and then recommission it. https://community.cloudera.com/t5/Community-Articles/Decommission-and-Reconfigure-Data-Node-Disks/ta-p/248262 The downside of decommission and recommission is that all the blocks on the node need to be copied to other hosts, not just the blocks on the disk you are removing.

OR

Provided you have no files with replication factor 1 (fsck will tell you if any such files exist; see the sketch at the end of this post), you could remove the disk from one DN, restart it, and wait for the NN to recover the under-replicated blocks, then repeat for each node. Note: if there are files with replication factor 1 on those Datanode disks, you will end up with missing blocks.

Was your question answered? Please take some time to click on “Accept as Solution” below this post. If you find a reply useful, say thanks by clicking on the thumbs up button.
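In case it helps, a quick sketch for finding files with replication factor 1 before pulling any disk (the root path / is just an example; scope it to the directories you care about):

```bash
# The second column of 'hdfs dfs -ls' is the replication factor for files,
# so print any file that only has a single replica.
hdfs dfs -ls -R / | awk '$1 ~ /^-/ && $2 == 1 {print}'

# fsck also reports per-block replication; depending on the Hadoop version
# the token is "repl=" or "Live_repl=", so match both. No output means none at repl=1.
hdfs fsck / -files -blocks | grep -E '(Live_)?repl=1( |$)'
```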
12-27-2022
11:19 AM
Hi @younes You will need to use the CM API to get this done. Check if the below helps: https://community.cloudera.com/t5/Support-Questions/hadoop-amp-amp-eco-system-cmdline-start-stop/m-p/359732#M238177
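As a sketch of what a cluster-level start/stop looks like over the CM API (host, credentials, cluster name and the API version prefix below are placeholders; adjust them to your CM release):

```bash
# Stop the whole cluster through Cloudera Manager.
curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v41/clusters/<clusterName>/commands/stop"

# Start it back up.
curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v41/clusters/<clusterName>/commands/start"
```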
12-16-2022
11:58 AM
Hi @ditto You can use the CM API to start the services. To stop and start a CDH service, you could make use of CM API calls such as the below:

Start the service - POST /clusters/{clusterName}/services/{serviceName}/commands/start
Stop the service - POST /clusters/{clusterName}/services/{serviceName}/commands/stop

https://cloudera.github.io/cm_api/apidocs/v6/path__clusters_-clusterName-_commands_stop.html
https://cloudera.github.io/cm_api/apidocs/v6/path__clusters_-clusterName-_commands_start.html
https://cloudera.github.io/cm_api/apidocs/v6/index.html
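For example, invoked with curl (host, credentials, cluster/service names and the API version prefix are placeholders for your environment):

```bash
# Stop a single service, e.g. the HDFS service instance.
curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v41/clusters/<clusterName>/services/<serviceName>/commands/stop"

# Start it again.
curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v41/clusters/<clusterName>/services/<serviceName>/commands/start"

# Each POST returns a command id; poll it to see when the action finishes.
curl -u admin:admin "http://<cm-host>:7180/api/v41/commands/<commandId>"
```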
12-07-2022
10:31 PM
1 Kudo
Hi @jgabrey-1216863216, This has been fixed in CDP 7.1.7 SP1 CHF20 (p1063). You can refer to the doc below: https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/runtime-release-notes/topics/chf-pvcb-sp1-overview.html#ariaid-title2
12-06-2022
12:49 AM
Hi @hanumanth, As you can see, the spark job is trying to reach the zookeeper on localhost:

22/12/05 22:30:12 WARN zookeeper.ZKUtil: hconnection-0x5e29988e0x0, quorum=localhost:2181

We expect a zookeeper quorum of 3 or more ZKs under quorum=. This indicates that the node on which the spark job is running doesn't have an hbase-site.xml to point the job at hbase.zookeeper.quorum. Make sure an HBase Gateway role is deployed on the node from which you run the spark job, and also try running the job with spark-submit passing the config explicitly, e.g. "--files /etc/spark/conf/yarn-conf/hbase-site.xml".
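A minimal sketch of shipping the HBase client config with the job (the script name is a placeholder; the hbase-site.xml path is whichever one the HBase Gateway deployed on that node):

```bash
# Ship hbase-site.xml with the job so the executors resolve hbase.zookeeper.quorum
# from the real ZK quorum instead of falling back to localhost:2181.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /etc/spark/conf/yarn-conf/hbase-site.xml \
  your_hbase_job.py          # placeholder application
```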
11-23-2022
10:50 PM
1 Kudo
Hi @bgkim, HBase won't work well with its data folders set to an EC policy. EC does not implement hflush, which HBase requires to avoid losing data. Additionally, data locality is lost with EC, which makes HBase slower. https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/cdppvc-data-migration-opdb/topics/cdppvc-data-migration-hbase-unsupported-interfaces-features.html -- Was your question answered? Please take some time to click on “Accept as Solution” below this post. If you find a reply useful, say thanks by clicking on the thumbs up button.
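In case it helps, a quick way to check whether an EC policy is set on the HBase data directory (assuming the default /hbase root dir; adjust the path to your setup):

```bash
# Show the erasure coding policy (if any) on the HBase root directory.
hdfs ec -getPolicy -path /hbase

# Unsetting the policy only affects newly written data; existing EC files
# would still need to be rewritten/copied to become replicated.
hdfs ec -unsetPolicy -path /hbase
```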
11-23-2022
08:25 AM
1 Kudo
Hi @bhushan04ec041, Can you try the below and check:
1) Stop the Oozie service.
2) Select the "Delete" option from the drop-down menu. Under "Remove Service Dependencies", select "Configure Service Dependency" and uncheck "oozie".
3) Save the changes.
4) Delete Oozie.
11-17-2022
01:19 AM
@mike_bronson7 -- Was your question answered? Please take some time to click on “Accept as Solution” below this post.