Created 11-07-2024 03:38 PM
HDFS, YARN, Impala, and Kudu are currently in service.
If an I/O error occurs on one disk in one server during operation,
does the service go down?
Can I just leave it running until I replace the disk?
Created 11-07-2024 10:00 PM
If a disk error is detected, HDFS can mark the affected disk as failed and stop using it. HDFS DataNodes are designed to handle disk failures gracefully. If you have replication set up correctly, the data should remain accessible from replicas on other nodes, and the NameNode will re-replicate the affected blocks elsewhere, which temporarily adds load on the remaining disks and nodes.
The service itself (HDFS) will usually stay operational as long as there are other healthy disks and nodes available.
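One detail worth checking on your cluster: whether a DataNode keeps running after a volume failure depends on dfs.datanode.failed.volumes.tolerated. In Apache Hadoop the default is 0, which means any single volume failure stops that DataNode (the rest of HDFS keeps serving from replicas). Below is a minimal Python sketch of that rule. It is a simplified model for illustration, not the DataNode's actual code, and the function name is made up. It assumes a non-negative setting.

```python
def datanode_stays_up(total_volumes: int,
                      failed_volumes: int,
                      failed_volumes_tolerated: int = 0) -> bool:
    """Simplified model of dfs.datanode.failed.volumes.tolerated.

    Returns True if a DataNode with `total_volumes` data directories would
    keep serving after `failed_volumes` of them have failed.
    Assumption: the setting is non-negative (special values are ignored here).
    """
    if not 0 <= failed_volumes <= total_volumes:
        raise ValueError("failed_volumes must be between 0 and total_volumes")
    healthy = total_volumes - failed_volumes
    # At least one healthy volume must remain, and the number of failures
    # must not exceed the configured tolerance.
    return healthy > 0 and failed_volumes <= failed_volumes_tolerated


if __name__ == "__main__":
    # 12 data disks, 1 failed, default tolerance of 0: DataNode stops.
    print(datanode_stays_up(12, 1))
    # Same failure with tolerance raised to 1: DataNode keeps running.
    print(datanode_stays_up(12, 1, failed_volumes_tolerated=1))
```

So if you want a DataNode to ride out a single bad disk until the replacement arrives, the tolerance probably needs to be at least 1 on that role.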
++++++++++++++
YARN NodeManagers can handle disk failures by marking disks as unhealthy, provided the disk health checker is enabled. When a disk fails, the NodeManager excludes it from the list of usable local and log directories.
The NodeManager service itself will continue running as long as other disks are healthy.
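The threshold for "other disks are healthy" is governed by yarn.nodemanager.disk-health-checker.min-healthy-disks, the minimum fraction of healthy directories (default 0.25) that the node needs before it is considered healthy enough to launch new containers. A rough Python sketch of that check, as a simplified model and not the NodeManager's real implementation (the function name is hypothetical):

```python
def nodemanager_is_healthy(total_dirs: int,
                           healthy_dirs: int,
                           min_healthy_disks: float = 0.25) -> bool:
    """Simplified model of the NodeManager disk health threshold.

    `min_healthy_disks` mirrors
    yarn.nodemanager.disk-health-checker.min-healthy-disks: the minimum
    fraction of directories that must be healthy. The real check is applied
    to both the local dirs and the log dirs.
    """
    if total_dirs <= 0:
        raise ValueError("total_dirs must be positive")
    if not 0 <= healthy_dirs <= total_dirs:
        raise ValueError("healthy_dirs must be between 0 and total_dirs")
    return healthy_dirs / total_dirs >= min_healthy_disks


if __name__ == "__main__":
    # 12 local dirs, one disk lost: 11/12 is well above 0.25.
    print(nodemanager_is_healthy(12, 11))
    # 4 local dirs, all lost: node is reported unhealthy.
    print(nodemanager_is_healthy(4, 0))
```

With the default, a node with many disks tolerates one failed disk easily; the node only turns unhealthy once most of its directories are gone.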
++++++++++++++++
If Impala detects a disk I/O error, it will stop using that disk. The Impala Daemon will continue running, but queries that rely on data stored on the failed disk might fail until data can be accessed from another replica or node.
+++++++++++++
Kudu Tablet Servers monitor disk health, and if a disk fails, Kudu can mark it as failed and continue operating if there are other healthy disks. However, if the failure impacts multiple disks or replicas, it can lead to data availability issues.
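The "multiple disks or replicas" caveat follows from how Kudu replicates tablets: it uses Raft consensus, so a tablet stays available only while a majority of its replicas are healthy. The arithmetic can be sketched in Python as below. This is only the majority rule, not Kudu code, and the function names are invented for the example.

```python
def tolerated_replica_failures(replication_factor: int) -> int:
    """How many replicas of a tablet can be lost while a majority remains."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (replication_factor - 1) // 2


def tablet_is_available(replication_factor: int, healthy_replicas: int) -> bool:
    """True if the healthy replicas still form a majority."""
    majority = replication_factor // 2 + 1
    return healthy_replicas >= majority


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(rf, tolerated_replica_failures(rf))
```

With the usual replication factor of 3, one lost replica is tolerated, but a second failure on another server hosting the same tablet before re-replication finishes makes that tablet unavailable.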
++++++++++++
You can usually keep the services running if only a single disk fails and if replication is properly configured. However, it’s best to replace the failed disk promptly to avoid further risk. In HDFS and Kudu especially, losing additional disks could risk data loss or availability issues.
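While waiting for the replacement, it can help to confirm which data directories are actually affected. The following is a rough Python sketch of a write/read probe. It is not how the Hadoop services check disks, a read served from the OS page cache can hide a failing read path, and the example paths are placeholders that you would replace with your own data directories.

```python
import os
import tempfile


def probe_directory(path: str) -> bool:
    """Return True if a small file can be written, synced and read back in `path`."""
    payload = b"disk-probe"
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".probe-") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the write down to the device
            f.seek(0)
            return f.read() == payload
    except OSError:
        # Covers I/O errors, read-only remounts, missing mount points, etc.
        return False


def failed_directories(paths):
    """Return the subset of `paths` that failed the probe."""
    return [p for p in paths if not probe_directory(p)]


if __name__ == "__main__":
    # Placeholder paths; substitute the directories configured on your hosts.
    print(failed_directories(["/data/1", "/data/2"]))
```

The kernel log and SMART data remain the more reliable evidence; this only shows whether a directory currently accepts writes.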
Created 11-07-2024 09:29 PM
@flashone, Welcome to our community! To help you get the best possible answer, I have tagged in our experts @Kartik_Agarwal @RAGHUY who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur
Created 11-11-2024 02:35 AM
Thanks ~ Good Answer~