Member since: 09-03-2020 · Posts: 46 · Kudos Received: 2 · Solutions: 0
05-06-2022
02:42 AM
@arunpoy On CDH/CDP, both timeout parameters (hbase.rpc.timeout and hbase.client.scanner.timeout.period) need to be added on both the server side and the client side, in the following places in the HBase configuration: HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml (server side), and HBase Client Advanced Configuration Snippet (Safety Valve) for hbase-site.xml (client side). Both timeout parameters must be present on the server and client side, and the RPC timeout (hbase.rpc.timeout) needs to be set a bit higher than the client scanner timeout (hbase.client.scanner.timeout.period).
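As an illustrative sketch, the fragment below shows what would go into both safety valves. The millisecond values are assumptions chosen only to show the relationship described above (RPC timeout higher than the scanner timeout); tune them for your workload:

```xml
<!-- Illustrative values (assumptions), in milliseconds.
     hbase.rpc.timeout is kept higher than the scanner timeout,
     as described above. Add to BOTH the service-side and
     client-side hbase-site.xml safety valves. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>180000</value>
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>120000</value>
</property>
```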
03-11-2022
01:41 AM
We can use the HBase Master UI to enable or disable TRACE/DEBUG/INFO logging without restarting any service. ➤ To enable TRACE/DEBUG logging for the Master server, follow these steps from the HBase Master UI. Step 1: HBase Master UI -> click on Log Level -> enter the logger name (org.apache.hadoop.ipc) in the second log box -> enter DEBUG in the Level box -> click the "Set Log Level" button. ➤ To enable TRACE/DEBUG logging for a specific RegionServer, follow these steps from the HBase Master UI. Step 1: HBase Master UI -> select the specific RegionServer -> click on Log Level -> enter the logger name (org.apache.hadoop.ipc) in the second log box -> enter DEBUG in the Level box -> click the "Set Log Level" button. ➤ Once the above steps are complete, verify whether the change took effect with the following step. Step 2: HBase Master UI -> select the specific RegionServer or Master server -> click on Log Level -> enter the logger name (org.apache.hadoop.ipc) in the first log box -> click the "Get Log Level" button, and verify the log level. Or ➤ hadoop daemonlog -setlevel <hbase regionserver host>:<port> org.apache.hadoop.hbase TRACE/DEBUG/INFO
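The Log Level page in the steps above simply issues a GET against the daemon's /logLevel servlet, so the same change can be scripted. A minimal sketch — the host name is a placeholder, and 16030 is assumed to be the RegionServer info port (the common default):

```shell
# Build the request that the "Set Log Level" UI page issues.
RS_HOST=regionserver1.example.com   # placeholder host (assumption)
RS_PORT=16030                       # assumed RegionServer info port
LOGGER=org.apache.hadoop.ipc
LEVEL=DEBUG
URL="http://${RS_HOST}:${RS_PORT}/logLevel?log=${LOGGER}&level=${LEVEL}"
echo "$URL"
# Apply it with:   curl "$URL"
# Or equivalently from the CLI:
#   hadoop daemonlog -setlevel "${RS_HOST}:${RS_PORT}" "$LOGGER" "$LEVEL"
```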
10-25-2021
01:30 AM
Hi @kras, thank you for writing back with your observation. Can you please check the below details as well? 1) When the RegionServer JVM reports high CPU, open the "top" command for the RegionServer PID. 2) Press "Shift+H" to switch to the thread view of the PID. This shows the threads within the RegionServer JVM along with their CPU usage. 3) Monitor the thread view and identify the thread with the highest CPU usage. 4) Take a thread dump (jstack) of the RegionServer PID and match it against the "top" thread view, finding the thread consuming the highest CPU. 5) Check the CPU usage of the other services hosted on the RegionServer host. The above process lets you identify the thread contributing to the CPU usage. Compare it with other RegionServers, and your team can make a conclusive call on the reason for the CPU utilization. However the logs are reviewed, narrowing the focus of the JVM review will assist in identifying the cause. Review the shared links for additional reference. Ref: https://www.infoworld.com/article/3336222/java-challengers-6-thread-behavior-in-the-jvm.html https://blogs.manageengine.com/application-performance-2/appmanager/2011/02/09/identify-java-code-co... https://blog.jamesdbloom.com/JVMInternals.html Thanks & Regards, Prathap Kumar.
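To match the hot thread from "top" (step 3) against the jstack dump (step 4), the decimal thread PID shown by top has to be converted to hexadecimal, because jstack reports it as the hex "nid" field. A small sketch — the thread PID 28467 is just an example value:

```shell
# 1) top -H -p <regionserver-pid>    # Shift+H thread view; note the hot thread's PID
# 2) Convert the decimal thread PID from top to hex (jstack prints nid=0x...):
TID=28467                            # example thread PID from top (assumption)
NID=$(printf '%x' "$TID")
echo "nid=0x$NID"                    # prints: nid=0x6f33
# 3) Locate that thread in the dump:
#      jstack <regionserver-pid> | grep -A 20 "nid=0x$NID"
```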
10-22-2021
10:55 PM
1) Check the RegionServer logs for "responseTooSlow", "operationTooSlow", or any other WARN/ERROR messages; please provide log snippets. 2) If we are seeing "responseTooSlow" on the RegionServers, check the DataNode logs for the underlying issue. 3) In the DataNode logs, check whether any of the following ERROR/WARN messages are present. "Slow BlockReceiver write data to disk cost" - indicates there was a delay in writing the block to the OS cache or disk. "Slow BlockReceiver write packet to mirror took" - indicates there was a delay in writing the block across the network. "Slow flushOrSync took" / "Slow manageWriterOsCache took" - indicates there was a delay in writing the block to the OS cache or disk. 4) If the above ERROR/WARN messages are present, engage the infra team and OS vendor team to fix the underlying hardware issues. There are many reasons this could happen, including OS/kernel bugs (update your system), swap, transparent huge pages, and pauses by a hypervisor causing high CPU usage; you need to figure out which one is causing the issue and fix it to overcome the problem.
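The DataNode log check in step 3 can be done with a single grep over the log file. A sketch against a synthetic log excerpt (the path and log lines below are made up for illustration):

```shell
# Synthetic DataNode log excerpt (illustrative only):
LOG=/tmp/datanode-sample.log
cat > "$LOG" <<'EOF'
2021-10-22 10:55:01 WARN datanode.DataNode: Slow BlockReceiver write data to disk cost: 3215ms
2021-10-22 10:55:02 INFO datanode.DataNode: Receiving block blk_12345
EOF
# Count the slow-I/O warnings described in step 3:
grep -cE 'Slow BlockReceiver write (data to disk|packet to mirror)|Slow (flushOrSync|manageWriterOsCache) took' "$LOG"
# prints: 1
```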
10-22-2021
10:50 PM
2 Kudos
There are many reasons this could happen, including OS/kernel bugs (update your system), swap, transparent huge pages, and pauses by a hypervisor causing high latency. 1) As we are seeing "responseTooSlow" on the RegionServers, check the DataNode logs for the underlying issue. 2) In the DataNode logs, check whether any of the following ERROR/WARN messages are present. "Slow BlockReceiver write data to disk cost" - indicates there was a delay in writing the block to the OS cache or disk. "Slow BlockReceiver write packet to mirror took" - indicates there was a delay in writing the block across the network. "Slow flushOrSync took" / "Slow manageWriterOsCache took" - indicates there was a delay in writing the block to the OS cache or disk. 3) If the above ERROR/WARN messages are present, engage the infra team and OS vendor team to fix the underlying hardware issues.
10-18-2021
10:59 AM
Usually, "Exception: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out" is caused by communication issues among Hadoop cluster nodes. To resolve this issue, check the following: a) Whether there are any communication problems among the Hadoop cluster nodes. b) Whether the SSL certificate of any DataNode has expired (if the Hadoop cluster is SSL-enabled). c) Whether SSL changes were made without restarting the services that use SSL; if so, the issue will occur, and the services in the cluster that use SSL need to be restarted.
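For check (b), certificate expiry can be tested directly with openssl. A sketch that uses a freshly generated self-signed certificate as a stand-in for a DataNode certificate (the path and subject are placeholders; on a real cluster, point -in at the actual certificate file):

```shell
# Generate a 1-day self-signed cert as a stand-in for a DataNode cert:
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=datanode.example" \
  -keyout /tmp/dn.key -out /tmp/dn.crt -days 1 2>/dev/null
# -checkend 0 exits 0 if the certificate is still valid right now:
openssl x509 -noout -checkend 0 -in /tmp/dn.crt \
  && echo "certificate still valid" || echo "certificate expired"
```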
07-02-2021
06:05 AM
The disk balancer is not available in HDP 2.x; it is available from HDP 3.x onwards. As a workaround, we can decommission the DataNode where the disks are unevenly utilized, clean up the DataNode, recommission the node, and then run the HDFS balancer again. Thanks, Prathap Kumar.
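The final rebalance step looks like the fragment below (cluster-only command, shown for illustration; the threshold value is an assumption):

```shell
# Rebalance block distribution across DataNodes after recommissioning.
# -threshold 10 allows each DataNode to deviate up to 10% from the
# cluster-average utilization (value chosen for illustration).
hdfs balancer -threshold 10
```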