Support Questions

karthik_07a · ‎05-11-2022

Hi

We run a Cloudera-based Hadoop cluster (CDH 5.11, not Hortonworks) and recently added 5 new Impala daemon nodes (Impala version 2.8). After adding the new nodes, things looked fine, but after 7-8 hours we started getting the errors below when the Impala coordinator tries to connect to the newly added nodes. Please help us resolve this, as we are blocked in production.

Error : Sender timed out waiting for receiver fragment instance

Detailed error :

I0506 20:16:55.660058 72446 coordinator.cc:1417] CancelFragmentInstances() query_id=d6447c0a5ed591c4:47ac776800000000, tried to cancel 35 fragment instances

I0506 20:16:55.663775 72446 coordinator.cc:756] Query id=d6447c0a5ed591c4:47ac776800000000 failed because fragment id=d6447c0a5ed591c4:47ac776800000006 on host=hadoop-slave22.use1.data.ripple.com:22000 failed.

I0506 20:16:55.664430 72093 coordinator.cc:1060] All fragment instances finished due to one or more errors.

Sender timed out waiting for receiver fragment instance: d6447c0a5ed591c4:47ac77680000001b

I0506 20:16:55.664465 72093 coordinator.cc:1031] Finalizing query: d6447c0a5ed591c4:47ac776800000000

I0506 20:16:55.664490 72093 coordinator.cc:1044] Removing staging directory: hdfs://USEast/user/hive/warehouse/qualitydb/Parquet/meta_parsed_exchanges/_impala_insert_staging/d6447c0a5ed591c4_47ac776800000000/

I0506 20:16:55.666432 72030 data-stream-mgr.cc:226] DeregisterRecvr(): fragment_instance_id=d6447c0a5ed591c4:47ac776800000015, node=1

I0506 20:16:55.666458 72029 data-stream-mgr.cc:226] DeregisterRecvr(): fragment_instance_id=d6447c0a5ed591c4:47ac77680000001b, node=1

I0506 20:16:55.668644 72030 fragment-mgr.cc:99] PlanFragment completed. instance_id=d6447c0a5ed591c4:47ac776800000015

I0506 20:16:55.669879 72029 fragment-mgr.cc:99] PlanFragment completed. instance_id=d6447c0a5ed591c4:47ac77680000001b

I0506 20:16:55.955592 64026 impala-beeswax-server.cc:230] close(): query_id=d6447c0a5ed591c4:47ac776800000000

I0506 20:16:55.955613 64026 impala-server.cc:906] UnregisterQuery(): query_id=d6447c0a5ed591c4:47ac776800000000

I0506 20:16:55.955619 64026 impala-server.cc:994] Cancel(): query_id=d6447c0a5ed591c4:47ac776800000000

quanlong · ‎05-11-2022

CDH 5.11 and Impala 2.8 are pretty old. You should try the latest versions.

In the logs, there is one failed fragment instance and one fragment instance that timed out.

The failed one is on host hadoop-slave22.use1.data.ripple.com. Check the impalad logs on that host for more details.
The one that timed out has instance_id=d6447c0a5ed591c4:47ac77680000001b. The first part (d6447c0a5ed591c4) is the same for every fragment instance of this query. The last part (47ac77680000001b) identifies the fragment instance. Check the earlier logs to see which host this instance was scheduled on, then check the impalad logs on that host. This error is often due to network saturation. Later Impala versions have more RPC improvements, e.g. KRPC.
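To make the log search above concrete, here is a rough Python sketch that splits an ID into its two parts and pulls out the log lines that mention a given fragment instance. The helper names (split_id, lines_mentioning, hosts_in) are made up for this example and are not part of Impala. The host=NAME:PORT pattern is only assumed from the coordinator line in the question, so other log lines may need a different pattern.

```python
import re

# IDs in the logs above look like <16 hex digits>:<16 hex digits>.
ID_RE = re.compile(r"\b([0-9a-f]{16}):([0-9a-f]{16})\b")
# Assumption: hosts appear as host=NAME:PORT, as in the coordinator.cc:756 line.
HOST_RE = re.compile(r"host=(\S+?):(\d+)")

def split_id(impala_id):
    """Split 'first:last' into its two hex parts."""
    m = ID_RE.fullmatch(impala_id)
    if m is None:
        raise ValueError("not a query/instance id: %r" % impala_id)
    return m.group(1), m.group(2)

def lines_mentioning(lines, instance_id):
    """Return the log lines that contain the given instance id."""
    return [line for line in lines if instance_id in line]

def hosts_in(line):
    """Return (host, port) pairs found in one log line."""
    return [(h, int(p)) for h, p in HOST_RE.findall(line)]

# Three lines copied from the log excerpt in the question.
SAMPLE = [
    "I0506 20:16:55.663775 72446 coordinator.cc:756] Query id=d6447c0a5ed591c4:47ac776800000000 failed because fragment id=d6447c0a5ed591c4:47ac776800000006 on host=hadoop-slave22.use1.data.ripple.com:22000 failed.",
    "Sender timed out waiting for receiver fragment instance: d6447c0a5ed591c4:47ac77680000001b",
    "I0506 20:16:55.666458 72029 data-stream-mgr.cc:226] DeregisterRecvr(): fragment_instance_id=d6447c0a5ed591c4:47ac77680000001b, node=1",
]

if __name__ == "__main__":
    timed_out = "d6447c0a5ed591c4:47ac77680000001b"
    print(split_id(timed_out))
    for line in lines_mentioning(SAMPLE, timed_out):
        print(line)
    print(hosts_in(SAMPLE[0]))
```

In practice you would feed it the full impalad log from the coordinator (for example by reading the file line by line) and look at the earlier lines that mention the instance id to find the host it was scheduled on.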

Cloudera Community

Impalad error : Sender timed out waiting for receiver fragment instance