Member since 11-27-2020 · 9 Posts · 0 Kudos Received · 0 Solutions
12-11-2020
08:16 PM
These are my cluster's Java heap memory values; kindly check them. @Raj@77 wrote: "Your Service Monitor is running out of Java Heap, hence the issue."
Where can I find the exact Java heap value the Service Monitor is using? Is there a specific requirement for the Service Monitor Java heap size, or is there some other issue?
/etc/default/cloudera-scm-server
Java options:
#
# Default value sets Java maximum heap size to 2GB, and Java maximum permanent
# generation size to 256MB.
#
export CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
Heap size: https://docs.cloudera.com/cloudera-manager/7.2.1/managing-clusters/topics/cm-configuring-memory-allo... (link is not working)
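One point worth noting: /etc/default/cloudera-scm-server sizes the Cloudera Manager Server's own JVM, not the Service Monitor's. The Service Monitor heap is set in the CM UI (Cloudera Management Service > Configuration > "Java Heap Size of Service Monitor"; the underlying property is, to my knowledge, firehose_heapsize). A diagnostic sketch for checking the heap the running Service Monitor process actually received (the process command line contains "firehose"; the sample command line below is illustrative):

```shell
# On the host running the Service Monitor role, pull -Xmx from its command line.
# (Prints nothing if run on a host without the role.)
ps -ef | grep -i '[f]irehose' | grep -o -e '-Xmx[^ ]*' || true

# Offline illustration of the same extraction on a sample command line:
cmdline="java -server -Xmx1G -XX:+HeapDumpOnOutOfMemoryError com.cloudera.cmon.firehose.Main"
echo "$cmdline" | grep -o -e '-Xmx[^ ]*'   # prints -Xmx1G
```

If the extracted -Xmx is at or near the heap-usage values the charts show, increasing the Service Monitor heap in the CM UI (rather than editing CMF_JAVA_OPTS) is the relevant knob.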
12-11-2020
06:46 AM
SERVICE_MONITOR_AGGREGATION_RUN_DURATION
The health test result for SERVICE_MONITOR_AGGREGATION_RUN_DURATION has become bad: The last metrics aggregation run duration is 33.9 second(s). Critical threshold: 30 second(s).
Service Monitor log file /var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-SERVICEMONITOR.log.out:
8:35:51.569 PM WARN Groups Potential performance problem: getGroups(user=hue) took 8024 milliseconds.
8:36:21.270 PM WARN JvmPauseMonitor Detected pause in JVM or host machine (e.g. a stop the world GC, or JVM not scheduled): paused approximately 18430ms: no GCs detected.
8:36:21.558 PM WARN EnterpriseService com.cloudera.cmf.PollingScmProxy: run duration exceeded desired period. Duration: 19230 ms. Desired period: 1000 ms.
8:36:22.253 PM INFO AggregatingTimeSeriesStore Run took PT33.916S which is over the slow run threshold of PT30S. 15690 metrics written for 28 entities. PT27.952S write time over 2 writes. Longest writes: PT27.876S,PT0.076S.
8:37:02.720 PM INFO LDBPartitionManager Updating partition=LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2020-12-10T17:48:01.580Z, startTime=2020-12-10T17:48:01.580Z, endTime=null, version=2, state=OPEN}. Setting endTime=2020-12-10T18:38:02.709Z

Metrics Aggregation Run Duration Thresholds:
critical: 30000.0, warning: 10000.0
At present we are using the default values in my cluster. Are any changes required for these values, or is there some other issue in my cluster?
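The health test above trips because a single aggregation run (33.9 s = 33916 ms) crossed the 30000 ms critical threshold. A minimal sketch of that comparison, using a hypothetical helper name and the default thresholds shown above:

```shell
# Map an aggregation run duration (milliseconds) to a health state,
# mirroring the thresholds above: warning 10000 ms, critical 30000 ms.
health_state() {
  ms=$1
  if [ "$ms" -ge 30000 ]; then echo BAD
  elif [ "$ms" -ge 10000 ]; then echo CONCERNING
  else echo GOOD
  fi
}

health_state 33916   # the 33.9 s run from the log; prints BAD
```

Raising the threshold (as discussed below, to 60000) only silences the alert; the long write times in the log suggest the underlying slowness is in the Service Monitor's storage or heap.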
Labels:
- Cloudera Manager
12-11-2020
06:08 AM
Thanks for your response. We have configured 60000; at present it is OK.
11-27-2020
11:06 PM
Like this, we sometimes get HDFS Canary Good and sometimes HDFS Canary Bad:
HDFS Canary Good
2 Still Concerning
Nov 27 12:15:53 PM
HDFS Canary Bad
Nov 27 12:15:08 PM
DataNode Health Concerning
Nov 27 11:58:47 AM
DataNode Health Bad
Nov 27 11:58:12 AM
DataNode Health Concerning
Nov 27 10:07:15 AM
DataNode Health Bad
Nov 27 10:07:00 AM
DataNode Health Concerning
Nov 27 9:29:35 AM
DataNode Health Bad
Nov 27 9:29:20 AM
DataNode Health Concerning
Nov 27 8:45:31 AM
DataNode Health Bad
Nov 27 8:45:06 AM
DataNode Health Concerning
Nov 26 10:03 PM
HDFS Canary Good
2 Still Bad
Nov 26 10:02:23 PM
DataNode Health Bad
Nov 26 10:02:18 PM
HDFS Canary Bad
Nov 26 10:01:42 PM
HDFS Canary Good
2 Still Concerning
Nov 26 8:01:53 PM
HDFS Canary Bad
Nov 26 8:01:03 PM
HDFS Canary Good
2 Still Concerning
Nov 26 6:16:18 PM
HDFS Canary Bad
Nov 26 6:15:38 PM
DataNode Health Concerning
Nov 26 4:45:01 PM
DataNode Health Bad

We are finding these logs in the Service Monitor:

12:06:35.706 PM INFO LDBPartitionManager
Expiring partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2020-11-24T10:05:30.100Z, startTime=2020-11-24T10:05:30.100Z, endTime=2020-11-24T10:55:30.100Z, version=2, state=CLOSED}
12:06:35.706 PM INFO LDBPartitionMetadataStore
Setting partition state=DELETING for partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2020-11-24T10:05:30.100Z, startTime=2020-11-24T10:05:30.100Z, endTime=2020-11-24T10:55:30.100Z, version=2, state=CLOSED}
12:06:35.717 PM INFO LDBPartitionManager
Couldn't close partition because it was already closed by another thread
12:06:35.718 PM INFO LDBPartitionMetadataStore
Deleting partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2020-11-24T10:05:30.100Z, startTime=2020-11-24T10:05:30.100Z, endTime=2020-11-24T10:55:30.100Z, version=2, state=CLOSED}
12:06:39.374 PM INFO LDBTimeSeriesRollupManager
Running the LDBTimeSeriesRollupManager at 2020-11-27T10:06:39.374Z, forMigratedData=false
12:11:39.374 PM INFO LDBTimeSeriesRollupManager
Running the LDBTimeSeriesRollupManager at 2020-11-27T10:11:39.374Z, forMigratedData=false
12:11:39.375 PM INFO LDBTimeSeriesRollupManager
Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2020-11-27T10:10:00.000Z
12:11:41.505 PM INFO LDBTimeSeriesRollupManager
Finished rollup: duration=PT2.130S, numStreamsChecked=54046, numStreamsRolledUp=18786
12:13:40.962 PM INFO LDBResourceManager
Closed: 0 partitions
12:14:57.535 PM INFO DataStreamer
Exception in createBlockOutputStream blk_1086073148_12332434
java.net.SocketTimeoutException: 13000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.27:47442 remote=/172.27.12:9866]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:537)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1762)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
12:14:57.536 PM WARN DataStreamer
Abandoning BP-1768670017-172.-1592847899660:blk_1086073148_12332434
12:14:57.543 PM WARN DataStreamer
Excluding datanode DatanodeInfoWithStorage[172.27.129.28:9866,DS-211016d1-2920-4748-ba83-46a493759fe3,DISK]
12:15:05.558 PM INFO DataStreamer
Exception in createBlockOutputStream blk_1086073149_12332435
java.net.SocketTimeoutException: 8000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.27.129.30:56202 remote=/172.27.129.29:9866]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:537)
at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1762)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
12:15:05.559 PM WARN DataStreamer
Abandoning BP-1768670017-172.27.0-1592847899660:blk_1086073149_12332435
12:15:05.568 PM WARN DataStreamer
Excluding datanode DatanodeInfoWithStorage[172.27.:9866,DS-5696ff0f-56d5-4dab-b0c3-5fbdde410da4,DISK]
12:15:05.573 PM WARN DataStreamer

These are my cluster's values; we think these values are the issue:
dfs.socket.timeout: 3000
dfs.datanode.socket.write.timeout: 3000
We found the following values on the internet. Is this the issue, or is there something else?
dfs.socket.timeout: 60000
dfs.datanode.socket.write.timeout: 480000
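For comparison, 3000 ms is far below the stock Hadoop defaults (60000 ms read timeout, 480000 ms write timeout), and the 13000 ms and 8000 ms timeouts in the stack traces above appear consistent with a 3000 ms base timeout plus Hadoop's per-datanode read-timeout extension of 5000 ms. If restoring the defaults, a sketch of the hdfs-site.xml fragment (note that in newer Hadoop releases dfs.socket.timeout is the deprecated alias of dfs.client.socket-timeout):

```xml
<!-- hdfs-site.xml: socket timeouts, in milliseconds.
     60000 and 480000 are the stock Hadoop defaults. -->
<property>
  <name>dfs.socket.timeout</name>
  <value>60000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>
</property>
```

In a Cloudera Manager cluster these would typically be set through the HDFS service configuration (or a safety valve) rather than by hand-editing hdfs-site.xml, so that CM redeploys them consistently.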
Labels:
- Cloudera Manager