
HDFS_CANARY_HEALTH has become bad: Canary test failed to write file in directory /temp/.cloudera_health_monitoring_canary_files.

Explorer

 

We are intermittently seeing the HDFS Canary health check flap between Good and Bad. For example:

HDFS Canary Good (2 Still Concerning) - Nov 27 12:15:53 PM
HDFS Canary Bad - Nov 27 12:15:08 PM
DataNode Health Concerning - Nov 27 11:58:47 AM
DataNode Health Bad - Nov 27 11:58:12 AM
DataNode Health Concerning - Nov 27 10:07:15 AM
DataNode Health Bad - Nov 27 10:07:00 AM
DataNode Health Concerning - Nov 27 9:29:35 AM
DataNode Health Bad - Nov 27 9:29:20 AM
DataNode Health Concerning - Nov 27 8:45:31 AM
DataNode Health Bad - Nov 27 8:45:06 AM
DataNode Health Concerning - Nov 26 10:03 PM
HDFS Canary Good (2 Still Bad) - Nov 26 10:02:23 PM
DataNode Health Bad - Nov 26 10:02:18 PM
HDFS Canary Bad - Nov 26 10:01:42 PM
HDFS Canary Good (2 Still Concerning) - Nov 26 8:01:53 PM
HDFS Canary Bad - Nov 26 8:01:03 PM
HDFS Canary Good (2 Still Concerning) - Nov 26 6:16:18 PM
HDFS Canary Bad - Nov 26 6:15:38 PM
DataNode Health Concerning - Nov 26 4:45:01 PM
DataNode Health Bad

We are seeing the following in the Service Monitor logs:
12:06:35.706 PM INFO    LDBPartitionManager 
Expiring partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2020-11-24T10:05:30.100Z, startTime=2020-11-24T10:05:30.100Z, endTime=2020-11-24T10:55:30.100Z, version=2, state=CLOSED}
12:06:35.706 PM INFO    LDBPartitionMetadataStore   
Setting partition state=DELETING for partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2020-11-24T10:05:30.100Z, startTime=2020-11-24T10:05:30.100Z, endTime=2020-11-24T10:55:30.100Z, version=2, state=CLOSED}
12:06:35.717 PM INFO    LDBPartitionManager 
Couldn't close partition because it was already closed by another thread
12:06:35.718 PM INFO    LDBPartitionMetadataStore   
Deleting partition LDBPartitionMetadataWrapper{tableName=stream, partitionName=stream_2020-11-24T10:05:30.100Z, startTime=2020-11-24T10:05:30.100Z, endTime=2020-11-24T10:55:30.100Z, version=2, state=CLOSED}
12:06:39.374 PM INFO    LDBTimeSeriesRollupManager  
Running the LDBTimeSeriesRollupManager at 2020-11-27T10:06:39.374Z, forMigratedData=false
12:11:39.374 PM INFO    LDBTimeSeriesRollupManager  
Running the LDBTimeSeriesRollupManager at 2020-11-27T10:11:39.374Z, forMigratedData=false
12:11:39.375 PM INFO    LDBTimeSeriesRollupManager  
Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2020-11-27T10:10:00.000Z
12:11:41.505 PM INFO    LDBTimeSeriesRollupManager  
Finished rollup: duration=PT2.130S, numStreamsChecked=54046, numStreamsRolledUp=18786
12:13:40.962 PM INFO    LDBResourceManager  
Closed: 0 partitions
12:14:57.535 PM INFO    DataStreamer    
Exception in createBlockOutputStream blk_1086073148_12332434
java.net.SocketTimeoutException: 13000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.27:47442 remote=/172.27.12:9866]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:537)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1762)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
12:14:57.536 PM WARN    DataStreamer    
Abandoning BP-1768670017-172.-1592847899660:blk_1086073148_12332434
12:14:57.543 PM WARN    DataStreamer    
Excluding datanode DatanodeInfoWithStorage[172.27.129.28:9866,DS-211016d1-2920-4748-ba83-46a493759fe3,DISK]
12:15:05.558 PM INFO    DataStreamer    
Exception in createBlockOutputStream blk_1086073149_12332435
java.net.SocketTimeoutException: 8000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.27.129.30:56202 remote=/172.27.129.29:9866]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:537)
    at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1762)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
12:15:05.559 PM WARN    DataStreamer    
Abandoning BP-1768670017-172.27.0-1592847899660:blk_1086073149_12332435
12:15:05.568 PM WARN    DataStreamer    
Excluding datanode DatanodeInfoWithStorage[172.27.:9866,DS-5696ff0f-56d5-4dab-b0c3-5fbdde410da4,DISK]
12:15:05.573 PM WARN    DataStreamer    
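To rule out the Service Monitor itself, a canary-style check can be approximated with a small HDFS client program that writes a file and deletes it again, using whatever client configuration the cluster provides. This is only a rough sketch, not Cloudera's actual canary implementation; the target path below is an illustrative assumption.

// Minimal sketch of a canary-style check: write a small file to HDFS and
// delete it again. Not Cloudera's actual canary; the path is illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CanaryStyleWrite {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath,
        // so the cluster's current socket timeout settings apply as-is.
        Configuration conf = new Configuration();

        Path testFile = new Path("/tmp/canary_style_test"); // illustrative path
        try (FileSystem fs = FileSystem.get(conf)) {
            try (FSDataOutputStream out = fs.create(testFile, true)) {
                out.writeBytes("canary-style test write");
            }
            boolean deleted = fs.delete(testFile, false);
            System.out.println("write OK, delete " + (deleted ? "OK" : "failed"));
        }
    }
}

If this fails with the same SocketTimeoutException, the problem is in the HDFS write path rather than in the monitoring service.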
These are our current cluster values; we suspect they are the cause:

dfs.socket.timeout : 3000
dfs.datanode.socket.write.timeout : 3000

On the internet we found the values below recommended instead. Is this the issue, or is it something else?

dfs.socket.timeout : 60000
dfs.datanode.socket.write.timeout : 480000
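For what it's worth, the 13000 ms and 8000 ms timeouts in the exceptions above look consistent with a 3000 ms base setting: from a reading of the HDFS client code (org.apache.hadoop.hdfs.DataStreamer), the read timeout used while setting up a write pipeline appears to be the client socket timeout plus roughly 5 seconds per DataNode in the pipeline. Treat that 5000 ms per-node extension as an assumption from that reading rather than a documented formula; a small sketch of the arithmetic:

// Sketch of how the DFS client appears to derive the read timeout used when
// creating a block output stream. The 5000 ms per-node extension is an
// assumption based on HdfsConstants.READ_TIMEOUT_EXTENSION in the Hadoop
// source; verify against the Hadoop version in use.
public class PipelineTimeoutEstimate {
    static final int READ_TIMEOUT_EXTENSION_MS = 5_000; // assumed per-node extension

    static int readTimeoutMs(int clientSocketTimeoutMs, int nodesInPipeline) {
        return clientSocketTimeoutMs + READ_TIMEOUT_EXTENSION_MS * nodesInPipeline;
    }

    public static void main(String[] args) {
        System.out.println(readTimeoutMs(3000, 2)); // 13000 -> "13000 millis timeout" above
        System.out.println(readTimeoutMs(3000, 1)); // 8000  -> "8000 millis timeout" above
    }
}

If that reading is correct, a 3000 ms base leaves very little headroom for a busy DataNode or a slow disk, which would explain why the canary flaps rather than failing consistently.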

1 ACCEPTED SOLUTION

Master Guru

@Raj77 I agree with your analysis; you can give it a try. In some cases I have even seen these values set very high, like below:

 

HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml

dfs.client.socket-timeout = 3000000
dfs.datanode.socket.write.timeout = 3000000
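In hdfs-site.xml terms (which is what the safety valve snippet ultimately injects), those two entries would look roughly like this; the 3000000 ms values are just the high-end example above, not a general recommendation:

<!-- Approximate hdfs-site.xml form of the safety valve entries above.
     3000000 ms is the high-end example from this reply, not a recommendation
     for every cluster. -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>3000000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>3000000</value>
</property>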

 


Cheers!
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.


Explorer

Thanks for your response. We have configured 60000, and at present it is OK.