Support Questions

ask_bill_brooks · 11-13-2019

We have been having an issue with Impala in our environment where connections between impala daemons sometimes get refused, which throws an error and causes queries to fail.
This issue has been happening intermittently in the cluster at random times during the day.

Actual error messages:

1) Couldn't open transport for <node_address>:22000 (connect() failed: Connection refused)
Failed to create thread SenderThread(1:1) in category DataStreamSender:boost::thread_resource_error: Resource temporarily unavailable
Sender timed out waiting for receiver fragment instance: <query_id>, dest node

robbiez · 11-13-2019

I can see two different types of errors in your messages. "Connection refused" usually means nothing was listening on port 22000 on the peer node, so I'd check whether the impala daemon on the peer node (<node_address>) had stopped at that time. The "Resource temporarily unavailable" error is most likely related to thread resource limits: the impala daemon couldn't create a new thread because of insufficient resources, so it threw this error [1]. I suggest having a look at IMPALA-5605, which should be helpful.

[1] https://github.com/apache/impala/blob/53ef115e8e5cac231ef948f8670106c348d197fe/be/src/util/thread.cc...
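To tell the two connection failures in this thread apart from a client host, a small probe can help. This is a minimal sketch, assuming Python 3 on a host that can reach the cluster; `probe_port` is a hypothetical helper, not part of Impala or CM. It tries one TCP connection to the impalad backend port (22000 here) and reports whether it was accepted, refused (nothing listening, as with "Connection refused") or timed out (no answer in time, as with "Connection timed out").

```python
import socket


def probe_port(host: str, port: int = 22000, timeout: float = 5.0) -> str:
    """Try one TCP connection and classify the outcome (hypothetical helper).

    "open"    - the connection was accepted
    "refused" - the peer answered with a reset (nothing listening on the port),
                like "connect() failed: Connection refused"
    "timeout" - no answer within `timeout` seconds,
                like "connect() failed: Connection timed out"
    Other OSErrors (DNS failure, unreachable network) are returned as text.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `probe_port("<node_address>")`. Keep in mind that a probe run after the fact only shows the current state of the daemon, not its state when the query failed.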

saihadoop · 11-14-2019

Thanks for the reply.

I will check that.

Also, could you check the following errors and let me know why jobs are getting cancelled?

Query Status: ExecQueryFInstances rpc query_id=2e4fe80a7382061f:3ef80d1500000000 failed: RPC client failed to connect: Couldn't open transport for p1i-hdp-srv06.lnt.com:22000 (connect() failed: Connection timed out)
RPC Error: Client for p1i-hdp-srv07.lnt.com:22000 hit an unexpected exception: No more data to read., type: N6apache6thrift9transport19TTransportExceptionE, rpc: N6impala19TTransmitDataResultE, send: done

robbiez · 11-14-2019

The first error means that connect() failed because the peer impala daemon (p1i-hdp-srv06.lnt.com) didn't accept the connection in time. You can look at the charts in Cloudera Manager (CM) to check the CPU usage and the number of threads of the impala daemon on that node around that time. As with "Resource temporarily unavailable", this error could also be related to CPU load or thread resource limits.
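To check the thread side directly on the peer node, in addition to the CM charts, one option is to compare impalad's current thread count with its process limit. This is a minimal sketch, assuming a Linux host with /proc and Python 3; `thread_count`, `max_processes` and `kernel_threads_max` are hypothetical helper names. Hitting the per-user "Max processes" limit (RLIMIT_NPROC) is one common reason thread creation fails with EAGAIN ("Resource temporarily unavailable"), but not the only one, and that limit counts all threads owned by the same user, so the impalad count alone is a lower bound.

```python
def thread_count(pid) -> int:
    """Number of threads in process `pid` (an int PID or "self"), from /proc (Linux)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise ValueError(f"no 'Threads:' line for pid {pid}")


def max_processes(pid) -> str:
    """Soft RLIMIT_NPROC ("Max processes") of `pid`: a number or "unlimited".

    The limit is enforced per user: all threads owned by the same user count
    against it, so new threads fail with EAGAIN once that total is reached.
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max processes"):
                return line.split()[2]  # column 3 is the soft limit
    raise ValueError(f"no 'Max processes' line for pid {pid}")


def kernel_threads_max() -> int:
    """System-wide thread ceiling (kernel.threads-max)."""
    with open("/proc/sys/kernel/threads-max") as f:
        return int(f.read())
```

For example, with the impalad PID from `pgrep -f impalad`, compare `thread_count(pid)` against `max_processes(pid)` and `kernel_threads_max()`, ideally while the error is occurring.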

The second error means the connection was lost. You can review the impala daemon logs on p1i-hdp-srv07.lnt.com to look for the reason.
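When reviewing the impalad logs on p1i-hdp-srv07.lnt.com, it can help to narrow them to warnings and errors around the failure time. This is a minimal sketch, assuming the log lines use the default glog prefix (`Lmmdd hh:mm:ss.uuuuuu threadid file:line]`, where L is I, W, E or F); `suspicious_lines` is a hypothetical helper, and continuation lines of multi-line messages are skipped.

```python
import re
from typing import Iterable, Iterator

# Default glog prefix: severity letter, mmdd, hh:mm:ss.micros, thread id, file:line]
GLOG_PREFIX = re.compile(r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2})\.\d{6}\s+\d+\s+\S+\] ")
SEVERITY_RANK = {"I": 0, "W": 1, "E": 2, "F": 3}


def suspicious_lines(lines: Iterable[str], mmdd: str, start: str, end: str,
                     min_severity: str = "W") -> Iterator[str]:
    """Yield log lines on day `mmdd` between `start` and `end` (HH:MM:SS,
    inclusive) whose severity is at least `min_severity`."""
    for line in lines:
        match = GLOG_PREFIX.match(line)
        if match is None:
            continue  # not a glog header line (e.g. a stack-trace continuation)
        severity, day, hms = match.groups()
        # Zero-padded HH:MM:SS strings compare correctly as plain text.
        if (day == mmdd and start <= hms <= end
                and SEVERITY_RANK[severity] >= SEVERITY_RANK[min_severity]):
            yield line.rstrip("\n")
```

For example, open the impalad INFO log (glog usually names it `impalad.INFO`) and pass the file object with a window such as `"1114", "10:12:00", "10:14:00"`; the INFO file also contains the W/E lines, so one pass covers all severities.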

saihadoop · 11-14-2019

Thank you again.

I also see jobs getting cancelled. Can you tell me why?

Query ID: 3348f74b129b0dae:1666447600000000
User: pslc_mnr_bu
Database: default
Coordinator: p1i-hdp-srv11.lnt.com
Query Type: QUERY
Query State: EXCEPTION
Start Time: Nov 14, 2019 10:12:22 AM
End Time: Nov 14, 2019 10:13:02 AM
Duration: 39.6s
Rows Produced: 0
Admission Result: Admitted immediately
Admission Wait Time: 0ms
Aggregate Peak Memory Usage: 260.5 MiB
Bytes Streamed: 46.9 MiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: psvc_mpr_bi
Estimated per Node Peak Memory: 2.3 GiB
File Formats: PARQUET/SNAPPY
HDFS Average Scan Range: 356.3 KiB
HDFS Bytes Read: 2.0 GiB
HDFS Bytes Read From Cache: 0 B
HDFS Bytes Read From Cache Percentage: 0
HDFS Local Bytes Read: 1.0 GiB
HDFS Local Bytes Read Percentage: 50
HDFS Remote Bytes Read: 1.0 GiB
HDFS Remote Bytes Read Percentage: 50
HDFS Scanner Average Read Throughput: 412.7 MiB/s
HDFS Short Circuit Bytes Read: 1.0 GiB
HDFS Short Circuit Bytes Read Percentage: 50
Impala Version: impalad version 3.0.0-cdh6.0.1 RELEASE (build 9a74a5053de5f7b8dd983802e6d75e58d31472db)
Memory Accrual: 479,253,548 byte seconds
Memory Spilled: 0 B
Node with Peak Memory Usage: p1i-hdp-srv03.lnt.com:22000
Number of Backends: 13
Number of Query Fragments Instances: 485
Out of Memory: false
Per Node Peak Memory Usage: 220.5 MiB
Planning Wait Time: 4.55s
Planning Wait Time Percentage: 11
Query Status: Cancelled
Session ID: 3041e4d70d3697bc:efc9a3fe12cfa1a2
Session Type: HIVESERVER2
Statistics Corrupt: false
Statistics Missing: false
Threads: CPU Time: 3.6m
Threads: CPU Time Percentage: 9
Threads: Network Receive Wait Time: 13.5m
Threads: Network Receive Wait Time Percentage: 33
Threads: Network Send Wait Time: 41.99s
Threads: Network Send Wait Time Percentage: 2
Threads: Storage Wait Time: 23.5m
Threads: Storage Wait Time Percentage: 57
Threads: Total Time: 41.3m

Thanks

robbiez · 11-14-2019

You are welcome.

The query was cancelled because of an exception, but the query info you posted contains no details about it. You can download the text query profile from CM. If you still can't see the details in the query profile, grep for the query ID 3348f74b129b0dae:1666447600000000 in the impala INFO log files on the coordinator, p1i-hdp-srv11.lnt.com. You should be able to see which fragment instance hit the exception. Then grep for that instance ID in the impala INFO log files on the host where the instance was running to find the cause.
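The two-step search described above can be sketched in Python for when grep output gets long. This is a minimal sketch under stated assumptions: `lines_mentioning` and `candidate_ids` are hypothetical helpers, and the only thing assumed about ID format is the `16 hex digits:16 hex digits` form visible in this thread. Step one collects the coordinator log lines that mention the query ID; step two lists the other IDs on those lines (session and fragment instance IDs share this format), which you can then search for on the other hosts.

```python
import re
from typing import Iterable, List, Set

# Impala IDs print as two 16-digit hex halves, e.g. 2e4fe80a7382061f:3ef80d1500000000
ID_PATTERN = re.compile(r"\b[0-9a-f]{16}:[0-9a-f]{16}\b")


def lines_mentioning(lines: Iterable[str], needle: str) -> List[str]:
    """Return the log lines that contain `needle` (a plain substring match, like grep -F)."""
    return [line.rstrip("\n") for line in lines if needle in line]


def candidate_ids(lines: Iterable[str], query_id: str) -> Set[str]:
    """Collect IDs seen on the given lines, excluding the query ID itself."""
    found: Set[str] = set()
    for line in lines:
        found.update(ID_PATTERN.findall(line))
    found.discard(query_id)
    return found
```

On the coordinator, pass the opened INFO log and the query ID to `lines_mentioning`, feed the result to `candidate_ids`, and then run `lines_mentioning` with each candidate against the INFO logs of the host that ran it.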

Cloudera Community


Impala jobs fail