Does the service stop if a disk I/O error occurs during operation, or does it keep running?
Labels: Cloudera Manager
Created 11-07-2024 03:38 PM
HDFS, YARN, Impala, and Kudu are currently in service.
If an I/O error occurs on one disk in one server during operation, does the service go down?
Should I just leave it running until I replace the disk?
Created 11-07-2024 10:00 PM
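If a disk error is detected, the DataNode can mark the affected volume as failed and stop using it. HDFS DataNodes are designed to handle disk failures gracefully, but whether the DataNode process itself keeps running depends on dfs.datanode.failed.volumes.tolerated: with the Hadoop default of 0, any single volume failure shuts the DataNode down, while a value of 1 or more lets it keep serving from its remaining disks. Either way, if replication is set up correctly, the data remains accessible from replicas on other nodes, and the NameNode re-replicates the affected blocks to other DataNodes to restore the target replication factor.
The service itself (HDFS) will usually stay operational as long as there are other healthy disks and nodes available.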
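To make the HDFS behaviour above concrete, here is a simplified Python model. It is only a sketch, not Hadoop's code: the class and function names are hypothetical, failed_volumes_tolerated stands in for dfs.datanode.failed.volumes.tolerated, and the block bookkeeping only approximates what the NameNode tracks.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    volumes: int
    failed_volumes: int = 0
    # Hypothetical stand-in for dfs.datanode.failed.volumes.tolerated;
    # the Hadoop default is 0, i.e. any volume failure stops the DataNode.
    failed_volumes_tolerated: int = 0

    def is_running(self) -> bool:
        # Simplified rule: stop when no healthy volume is left, or when
        # failures exceed the tolerated count.
        if self.failed_volumes >= self.volumes:
            return False
        return self.failed_volumes <= self.failed_volumes_tolerated


def live_replicas(replicas: list[tuple[DataNode, bool]]) -> int:
    """Count readable replicas of one block; entries are (datanode, on_failed_volume)."""
    return sum(1 for dn, on_failed in replicas if dn.is_running() and not on_failed)


def block_status(replicas: list[tuple[DataNode, bool]], replication_factor: int = 3) -> str:
    live = live_replicas(replicas)
    if live == 0:
        return "missing"           # unreadable until a replica comes back
    if live < replication_factor:
        return "under-replicated"  # NameNode schedules re-replication
    return "healthy"


if __name__ == "__main__":
    strict = DataNode(volumes=12, failed_volumes=1)  # tolerated = 0
    lenient = DataNode(volumes=12, failed_volumes=1, failed_volumes_tolerated=1)
    healthy = DataNode(volumes=12)
    print(strict.is_running(), lenient.is_running())  # False True
    # One replica sat on the failed disk; two healthy copies remain.
    print(block_status([(lenient, True), (healthy, False), (healthy, False)]))  # under-replicated
```

The point of the model is that a single failed disk normally leaves a block "under-replicated" rather than "missing", which is why HDFS as a whole stays up even when one DataNode stops.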
++++++++++++++
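YARN NodeManagers can handle disk failures by marking disks as unhealthy when disk health monitoring is enabled (yarn.nodemanager.disk-health-checker.enable, on by default). When a disk fails, the NodeManager excludes its directories from the list of usable local and log directories.
The NodeManager service itself keeps running as long as enough disks remain healthy. If the fraction of healthy directories drops below yarn.nodemanager.disk-health-checker.min-healthy-disks (default 0.25), the node is reported unhealthy and the ResourceManager stops scheduling new containers on it.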
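A minimal Python sketch of that health rule, assuming the simplified behaviour described above: each directory list (local dirs and log dirs) must keep at least min-healthy-disks of its entries healthy. The function names and paths are hypothetical; MIN_HEALTHY_DISKS mirrors the default of yarn.nodemanager.disk-health-checker.min-healthy-disks.

```python
# Hypothetical stand-in for yarn.nodemanager.disk-health-checker.min-healthy-disks (default 0.25).
MIN_HEALTHY_DISKS = 0.25


def usable_dirs(dirs: dict[str, bool]) -> list[str]:
    """Directories the NodeManager keeps using; True marks a healthy dir."""
    return [path for path, ok in dirs.items() if ok]


def node_is_healthy(local_dirs: dict[str, bool], log_dirs: dict[str, bool],
                    min_healthy: float = MIN_HEALTHY_DISKS) -> bool:
    """Simplified check: each dir list must keep at least min_healthy of its dirs."""
    for dirs in (local_dirs, log_dirs):
        if not dirs or len(usable_dirs(dirs)) / len(dirs) < min_healthy:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical 12-disk worker where /data3 returns I/O errors.
    local = {f"/data{i}/yarn/nm": i != 3 for i in range(1, 13)}
    logs = {f"/data{i}/yarn/container-logs": i != 3 for i in range(1, 13)}
    print(len(usable_dirs(local)))       # 11
    print(node_is_healthy(local, logs))  # True
    # Only 2 of 12 disks left: 2/12 < 0.25, so the node is reported unhealthy.
    mostly_failed = {f"/data{i}/yarn/nm": i <= 2 for i in range(1, 13)}
    print(node_is_healthy(mostly_failed, logs))  # False
```

With one bad disk out of twelve the node stays well above the threshold, so containers keep being scheduled on the remaining directories.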
++++++++++++++++
If Impala detects a disk I/O error, it will stop using that disk. The Impala daemon (impalad) will keep running, but queries that rely on data stored on the failed disk might fail until the data can be read from another replica or node.
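The replica fallback mentioned above can be sketched in a few lines of Python. This is a hypothetical model, not Impala's or the HDFS client's code: it assumes a read can fall back to any other replica of the block, and the function name, exception, and "host:/mount" location strings are illustrative only.

```python
class BlockUnavailable(Exception):
    """No replica of the block could be read; the query touching it would fail."""


def read_block(block_id: str, replicas: list[str], failed_locations: set[str]) -> str:
    """Return the replica location a read would be served from.

    replicas: hypothetical "host:/mount" strings holding copies of the block.
    failed_locations: replica locations currently returning I/O errors.
    """
    for location in replicas:
        if location not in failed_locations:
            return location  # fall back to the first readable replica
    raise BlockUnavailable(f"no readable replica for {block_id}")


if __name__ == "__main__":
    replicas = ["node1:/data3", "node2:/data1", "node3:/data7"]
    print(read_block("blk_1001", replicas, {"node1:/data3"}))  # node2:/data1
    try:
        read_block("blk_1002", ["node1:/data3"], {"node1:/data3"})
    except BlockUnavailable as err:
        print(err)  # no readable replica for blk_1002
```

A query only fails in the second case, where every copy of the data it needs is on failed storage.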
+++++++++++++
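Kudu tablet servers monitor disk health, and if a disk fails, Kudu can mark it as failed and continue operating on the remaining healthy disks. This applies to data directories: tablets with data on the failed directory are shut down and re-replicated to other tablet servers. If the failed disk holds the tablet server's write-ahead log (WAL) or metadata directory, the tablet server will crash instead. And if the failure impacts multiple disks or several replicas of the same tablet, it can lead to data availability issues.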
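Here is a rough Python model of that distinction between data-directory failures and WAL/metadata-disk failures. It is a sketch under stated assumptions, not Kudu's implementation: the TabletServer class, its fields, and the paths are hypothetical, and real Kudu determines directory roles from its own flags (such as --fs_wal_dir, --fs_data_dirs, and --fs_metadata_dir).

```python
from dataclasses import dataclass, field


@dataclass
class TabletServer:
    wal_dir: str
    metadata_dir: str
    data_dirs: list[str]
    # tablet id -> data directories holding that tablet's blocks
    tablets: dict[str, set[str]] = field(default_factory=dict)
    running: bool = True

    def on_disk_failure(self, failed_dir: str) -> set[str]:
        """Return the tablets that would need re-replication to other servers."""
        if failed_dir in (self.wal_dir, self.metadata_dir):
            # WAL or metadata disk lost: the whole tablet server goes down,
            # so every tablet it hosted must be re-replicated elsewhere.
            self.running = False
            lost = set(self.tablets)
            self.tablets.clear()
            return lost
        # Data-dir failure: stop using the dir and fail only the tablets
        # with blocks on it; the server keeps serving the rest.
        self.data_dirs = [d for d in self.data_dirs if d != failed_dir]
        lost = {t for t, dirs in self.tablets.items() if failed_dir in dirs}
        for t in lost:
            del self.tablets[t]
        return lost


if __name__ == "__main__":
    ts = TabletServer(
        wal_dir="/wal", metadata_dir="/meta",
        data_dirs=["/data1", "/data2", "/data3"],
        tablets={"t1": {"/data2"}, "t2": {"/data2", "/data3"}, "t3": {"/data3"}},
    )
    print(sorted(ts.on_disk_failure("/data2")), ts.running)  # ['t1', 't2'] True
    print(sorted(ts.tablets))                                # ['t3']
    print(sorted(ts.on_disk_failure("/wal")), ts.running)    # ['t3'] False
```

The model shows why a single data-disk failure is usually survivable while a WAL-disk failure takes the whole tablet server out.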
++++++++++++
You can usually keep the services running if only a single disk fails and replication is properly configured. However, it’s best to replace the failed disk promptly: until re-replication finishes, data that had a copy on that disk has fewer replicas, and in HDFS and Kudu especially, losing additional disks could lead to data loss or availability issues.
Created 11-07-2024 09:29 PM
@flashone, welcome to our community! To help you get the best possible answer, I have tagged our experts @Kartik_Agarwal @RAGHUY, who may be able to assist you further.
Please feel free to provide any additional information or details about your query. We hope you will find a satisfactory solution to your question.
Regards,
Vidya Sargur, Community Manager
Created 11-11-2024 02:35 AM
Thanks ~ Good Answer~
