Support Questions

flashone · 11-07-2024

HDFS, YARN, Impala, and Kudu are currently in service.

If an I/O error occurs on one disk in one server during operation,
does the service go down?
Or can I just leave it running until I replace the disk?

RAGHUY · 11-07-2024

@flashone

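If a disk error is detected, HDFS can mark the affected disk (volume) as failed and stop using it, provided dfs.datanode.failed.volumes.tolerated is set above the Apache Hadoop default of 0; at that default, a single failed volume causes the DataNode to shut down. HDFS DataNodes are otherwise designed to handle disk failures gracefully. If you have replication set up correctly, the data should remain accessible from replicas on other nodes, and the NameNode will re-replicate the affected blocks elsewhere to restore the replication factor.

The service itself (HDFS) will usually stay operational as long as there are other healthy disks and nodes available.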
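As a rough illustration of how the HDFS side depends on configuration, here is a minimal Python sketch that reads dfs.datanode.failed.volumes.tolerated from an hdfs-site.xml string and checks whether a DataNode would be expected to keep serving after a given number of volume failures. The property name is real; the helper names, the sample XML, and the simplified "failures must not exceed the tolerance" model are assumptions for illustration only, and negative values of the setting are not modeled.

```python
import xml.etree.ElementTree as ET

# Real HDFS property name. The Apache Hadoop default is 0, which means
# any single volume failure makes the DataNode stop offering service.
PROP = "dfs.datanode.failed.volumes.tolerated"


def failed_volumes_tolerated(hdfs_site_xml: str, default: int = 0) -> int:
    """Return the configured tolerance, or the default if the property is absent."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == PROP:
            return int(prop.findtext("value", str(default)).strip())
    return default


def datanode_stays_up(failed_volumes: int, tolerated: int) -> bool:
    """Simplified model (assumption): the DataNode keeps serving while the
    number of failed volumes does not exceed the configured tolerance."""
    return failed_volumes <= tolerated


# Hypothetical sample configuration, not taken from any real cluster.
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    tolerated = failed_volumes_tolerated(SAMPLE)
    print("tolerated:", tolerated)
    print("stays up after 1 failed disk:", datanode_stays_up(1, tolerated))
    print("stays up after 2 failed disks:", datanode_stays_up(2, tolerated))
```

If the setting is managed through Cloudera Manager, check the value there rather than in a hand-edited file, since the deployed hdfs-site.xml is generated.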

++++++++++++++

YARN NodeManagers can handle disk failures by marking disks as unhealthy, if they are configured to monitor disk health. When a disk fails, the NodeManager excludes that disk from its list of usable directories.

The NodeManager service itself will continue running as long as other disks are healthy.
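To make "as long as other disks are healthy" a bit more concrete, the sketch below models the NodeManager check in Python. It assumes the behaviour documented for yarn.nodemanager.disk-health-checker.min-healthy-disks (default 0.25): the node is treated as healthy for launching new containers while the fraction of good directories stays at or above that value. The function name is hypothetical, and the real check is applied separately to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.

```python
def node_is_healthy(total_dirs: int, failed_dirs: int,
                    min_healthy_fraction: float = 0.25) -> bool:
    """Simplified model (assumption) of the NodeManager disk health rule.

    min_healthy_fraction mirrors
    yarn.nodemanager.disk-health-checker.min-healthy-disks (default 0.25).
    """
    if total_dirs <= 0:
        raise ValueError("total_dirs must be positive")
    if not 0 <= failed_dirs <= total_dirs:
        raise ValueError("failed_dirs must be between 0 and total_dirs")
    healthy_fraction = (total_dirs - failed_dirs) / total_dirs
    return healthy_fraction >= min_healthy_fraction


if __name__ == "__main__":
    # A node with 4 local dirs losing disks one at a time.
    for failed in range(5):
        print(failed, "failed ->", node_is_healthy(4, failed))
```

With four directories and the default threshold, this model keeps the node healthy until every directory has failed, which matches the idea that a single bad disk normally does not take the NodeManager out of service.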

++++++++++++++++

If Impala detects a disk I/O error, it will stop using that disk. The Impala Daemon will continue running, but queries that rely on data stored on the failed disk might fail until data can be accessed from another replica or node.

+++++++++++++

Kudu Tablet Servers monitor disk health, and if a disk fails, Kudu can mark it as failed and continue operating if there are other healthy disks. However, if the failure impacts multiple disks or replicas, it can lead to data availability issues.
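The point about multiple replicas comes from Kudu's use of Raft consensus: a tablet with N replicas stays available while a majority of them are healthy, so it can lose up to (N-1)/2 replicas, rounded down. The arithmetic can be sketched as follows; the function names are hypothetical and the model ignores re-replication that may happen between failures.

```python
def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas a Raft-replicated tablet can lose and still keep a majority."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (replication_factor - 1) // 2


def tablet_available(replication_factor: int, failed_replicas: int) -> bool:
    """True while the surviving replicas still form a majority."""
    return failed_replicas <= tolerated_replica_failures(replication_factor)


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print("replication factor", rf,
              "tolerates", tolerated_replica_failures(rf), "failed replica(s)")
```

So with the common replication factor of 3, one failed disk affecting one replica of a tablet is tolerated, but a second failure hitting another replica of the same tablet before recovery would make that tablet unavailable.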

++++++++++++

You can usually keep the services running if only a single disk fails and if replication is properly configured. However, it’s best to replace the failed disk promptly to avoid further risk. In HDFS and Kudu especially, losing additional disks could risk data loss or availability issues.

VidyaSargur · 11-07-2024

@flashone, Welcome to our community! To help you get the best possible answer, I have tagged in our experts @Kartik_Agarwal @RAGHUY who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.

Regards,

Vidya Sargur,
Community Manager

flashone · 11-11-2024

Thanks ~ Good Answer~