
Error communicating with impalad: TSocket read 0 bytes

Contributor

Hi,

When I query a table from impala-shell (select * from internet_usages limit 1), I sometimes get this error:

Error communicating with impalad: TSocket read 0 bytes
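For illustration, the failing session looks roughly like this (the host name below is a placeholder):

$ impala-shell -i impalad-host:21000
[impalad-host:21000] > select * from internet_usages limit 1;
Error communicating with impalad: TSocket read 0 bytes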


After some time (half an hour or more), the query runs properly.

Sometimes I restart Impala, but it does not always help; in fact, I am not sure it helps at all.

The Postgres database and pgpool appear to work without problems the whole time.

Below are the Impala logs for a query that finishes with an error and for one that finishes properly (the two runs start to differ right after the "analyze query" line).

Best regards,
Andrzej

When I get the error, the following messages appear in the Impala logs:

I1123 12:41:34.452467 12473 impala-beeswax-server.cc:175] query(): query=select * from internet_usages limit 2
I1123 12:41:34.452606 12473 impala-beeswax-server.cc:519] query: Query {
01: query (string) = "select * from internet_usages limit 2",
03: configuration (list) = list<string>[0] {
},
04: hadoop_user (string) = "etl_user",
}
I1123 12:41:34.453100 12473 RequestPoolService.java:277] resolveRequestPool(pool=, user=etl_user): resolved_pool=root.etl_user, has_access=true
I1123 12:41:34.453505 12473 RequestPoolService.java:375] getPoolConfig(pool=root.etl_user): max_mem_resources=-1, max_requests=-1, max_queued=200, queue_timeout_ms=0, default_query_options=
I1123 12:41:34.453732 12473 impala-beeswax-server.cc:549] TClientRequest.queryOptions: TQueryOptions {
01: abort_on_error (bool) = false,
02: max_errors (i32) = 0,
03: disable_codegen (bool) = false,
04: batch_size (i32) = 0,
05: num_nodes (i32) = 0,
06: max_scan_range_length (i64) = 0,
07: num_scanner_threads (i32) = 0,
08: max_io_buffers (i32) = 0,
09: allow_unsupported_formats (bool) = false,
10: default_order_by_limit (i64) = -1,
11: debug_action (string) = "",
12: mem_limit (i64) = 0,
13: abort_on_default_limit_exceeded (bool) = false,
15: hbase_caching (i32) = 0,
16: hbase_cache_blocks (bool) = false,
17: parquet_file_size (i64) = 0,
18: explain_level (i32) = 1,
19: sync_ddl (bool) = false,
23: disable_cached_reads (bool) = false,
24: disable_outermost_topn (bool) = false,
25: rm_initial_mem (i64) = 0,
26: query_timeout_s (i32) = 0,
28: appx_count_distinct (bool) = false,
29: disable_unsafe_spills (bool) = false,
31: exec_single_node_rows_threshold (i32) = 100,
32: optimize_partition_key_scans (bool) = false,
33: replica_preference (i32) = 0,
34: schedule_random_replica (bool) = false,
35: scan_node_codegen_threshold (i64) = 1800000,
36: disable_streaming_preaggregations (bool) = false,
37: runtime_filter_mode (i32) = 2,
38: runtime_bloom_filter_size (i32) = 1048576,
39: runtime_filter_wait_time_ms (i32) = 0,
40: disable_row_runtime_filtering (bool) = false,
41: max_num_runtime_filters (i32) = 10,
42: parquet_annotate_strings_utf8 (bool) = false,
43: parquet_fallback_schema_resolution (i32) = 0,
44: mt_num_cores (i32) = 1,
45: s3_skip_insert_staging (bool) = true,
46: runtime_filter_min_size (i32) = 1048576,
47: runtime_filter_max_size (i32) = 16777216,
48: prefetch_mode (i32) = 1,
49: strict_mode (bool) = false,
}
I1123 12:41:34.457437 12473 Frontend.java:878] analyze query select * from internet_usages limit 2
I1123 12:41:34.459012 12473 Frontend.java:822] Requesting prioritized load of table(s): default.internet_usages
I1123 12:43:32.765156 30575 webserver.cc:365] Webserver: error reading: Resource temporarily unavailable
I1123 12:43:34.472626 12473 Frontend.java:897] Missing tables were not received in 120000ms. Load request will be retried.
I1123 12:43:34.475009 12473 Frontend.java:822] Requesting prioritized load of table(s): default.internet_usages
I1123 12:45:14.323235 13498 webserver.cc:365] Webserver: error reading: Resource temporarily unavailable
I1123 12:45:14.323235 30574 webserver.cc:365] Webserver: error reading: Resource temporarily unavailable
I1123 12:45:14.323247 13497 webserver.cc:365] Webserver: error reading: Resource temporarily unavailable
I1123 12:45:14.324121 30878 webserver.cc:365] Webserver: error reading: Resource temporarily unavailable
I1123 12:45:34.488405 12473 Frontend.java:897] Missing tables were not received in 120000ms. Load request will be retried.
I1123 12:45:34.490772 12473 Frontend.java:822] Requesting prioritized load of table(s): default.internet_usages
I1123 12:47:34.504266 12473 Frontend.java:897] Missing tables were not received in 120000ms. Load request will be retried.
I1123 12:47:34.506804 12473 Frontend.java:822] Requesting prioritized load of table(s): default.internet_usages


When the query runs properly, I get the following Impala log:

I1123 12:04:30.549031 12472 impala-beeswax-server.cc:175] query(): query=select * from internet_usages limit 1
I1123 12:04:30.549149 12472 impala-beeswax-server.cc:519] query: Query {
01: query (string) = "select * from internet_usages limit 1",
03: configuration (list) = list<string>[0] {
},
04: hadoop_user (string) = "etl_user",
}
I1123 12:04:30.551203 12472 RequestPoolService.java:277] resolveRequestPool(pool=, user=etl_user): resolved_pool=root.etl_user, has_access=true
I1123 12:04:30.551853 12472 RequestPoolService.java:375] getPoolConfig(pool=root.etl_user): max_mem_resources=-1, max_requests=-1, max_queued=200, queue_timeout_ms=0, default_query_options=
I1123 12:04:30.552089 12472 impala-beeswax-server.cc:549] TClientRequest.queryOptions: TQueryOptions {
01: abort_on_error (bool) = false,
02: max_errors (i32) = 0,
03: disable_codegen (bool) = false,
04: batch_size (i32) = 0,
05: num_nodes (i32) = 0,
06: max_scan_range_length (i64) = 0,
07: num_scanner_threads (i32) = 0,
08: max_io_buffers (i32) = 0,
09: allow_unsupported_formats (bool) = false,
10: default_order_by_limit (i64) = -1,
11: debug_action (string) = "",
12: mem_limit (i64) = 0,
13: abort_on_default_limit_exceeded (bool) = false,
15: hbase_caching (i32) = 0,
16: hbase_cache_blocks (bool) = false,
17: parquet_file_size (i64) = 0,
18: explain_level (i32) = 1,
19: sync_ddl (bool) = false,
23: disable_cached_reads (bool) = false,
24: disable_outermost_topn (bool) = false,
25: rm_initial_mem (i64) = 0,
26: query_timeout_s (i32) = 0,
28: appx_count_distinct (bool) = false,
29: disable_unsafe_spills (bool) = false,
31: exec_single_node_rows_threshold (i32) = 100,
32: optimize_partition_key_scans (bool) = false,
33: replica_preference (i32) = 0,
34: schedule_random_replica (bool) = false,
35: scan_node_codegen_threshold (i64) = 1800000,
36: disable_streaming_preaggregations (bool) = false,
37: runtime_filter_mode (i32) = 2,
38: runtime_bloom_filter_size (i32) = 1048576,
39: runtime_filter_wait_time_ms (i32) = 0,
40: disable_row_runtime_filtering (bool) = false,
41: max_num_runtime_filters (i32) = 10,
42: parquet_annotate_strings_utf8 (bool) = false,
43: parquet_fallback_schema_resolution (i32) = 0,
44: mt_num_cores (i32) = 1,
45: s3_skip_insert_staging (bool) = true,
46: runtime_filter_min_size (i32) = 1048576,
47: runtime_filter_max_size (i32) = 16777216,
48: prefetch_mode (i32) = 1,
49: strict_mode (bool) = false,
}
I1123 12:04:30.556161 12472 Frontend.java:878] analyze query select * from internet_usages limit 1
I1123 12:04:30.560166 12472 Frontend.java:956] create plan
I1123 12:04:31.458932 12472 HdfsScanNode.java:365] collecting partitions for table internet_usages
I1123 12:04:31.488257 12472 HdfsScanNode.java:422] cardinality_=3488886438 sel=1.0
I1123 12:04:31.488364 12472 HdfsScanNode.java:429] computeStats HdfsScan: cardinality_=3488886438
I1123 12:04:31.488493 12472 HdfsScanNode.java:485] computeNumNodes totalRanges=1160171 localRanges=6 remoteRanges=0 localHostSet.size=6 clusterNodes=6
I1123 12:04:31.488543 12472 HdfsScanNode.java:432] computeStats HdfsScan: #nodes=6
I1123 12:04:31.488672 12472 HdfsScanNode.java:365] collecting partitions for table internet_usages
I1123 12:04:31.516621 12472 HdfsScanNode.java:422] cardinality_=3488886438 sel=1.0
I1123 12:04:31.516707 12472 HdfsScanNode.java:429] computeStats HdfsScan: cardinality_=1
I1123 12:04:31.516811 12472 HdfsScanNode.java:485] computeNumNodes totalRanges=1160171 localRanges=6 remoteRanges=0 localHostSet.size=6 clusterNodes=6
I1123 12:04:31.516849 12472 HdfsScanNode.java:432] computeStats HdfsScan: #nodes=6
I1123 12:04:31.517246 12472 Planner.java:137] desctbl: tuples:
TupleDescriptor{id=0, name=BaseTableRef default.internet_usages, tbl=default.internet_usages, byte_size=72, is_materialized=true, slots=[SlotDescriptor{id=0, path=default.internet_usages.id, type=BIGINT, materialized=true, byteSize=8, byteOffset=8, nullable=true, nullIndicatorByte=0, nullIndicatorBit=2, slotIdx=2, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=3594411008, numNulls_=-1}}, SlotDescriptor{id=1, path=default.internet_usages.purchase_id, type=BIGINT, materialized=true, byteSize=8, byteOffset=16, nullable=true, nullIndicatorByte=0, nullIndicatorBit=3, slotIdx=3, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=14064324, numNulls_=-1}}, SlotDescriptor{id=2, path=default.internet_usages.current_usage, type=BIGINT, materialized=true, byteSize=8, byteOffset=24, nullable=true, nullIndicatorByte=0, nullIndicatorBit=4, slotIdx=4, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=118118152, numNulls_=-1}}, SlotDescriptor{id=3, path=default.internet_usages.created_at, type=TIMESTAMP, materialized=true, byteSize=16, byteOffset=40, nullable=true, nullIndicatorByte=0, nullIndicatorBit=6, slotIdx=6, stats=ColumnStats{avgSerializedSize_=16.0, maxSize_=16, numDistinct_=3523511296, numNulls_=-1}}, SlotDescriptor{id=4, path=default.internet_usages.updated_at, type=TIMESTAMP, materialized=true, byteSize=16, byteOffset=56, nullable=true, nullIndicatorByte=0, nullIndicatorBit=7, slotIdx=7, stats=ColumnStats{avgSerializedSize_=16.0, maxSize_=16, numDistinct_=3523511296, numNulls_=-1}}, SlotDescriptor{id=5, path=default.internet_usages.enabled_service_span_id, type=BIGINT, materialized=true, byteSize=8, byteOffset=32, nullable=true, nullIndicatorByte=0, nullIndicatorBit=5, slotIdx=5, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=349686, numNulls_=-1}}, SlotDescriptor{id=6, path=default.internet_usages.purchase_id_mod_100_part, type=TINYINT, materialized=true, byteSize=1, byteOffset=1, nullable=true, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=1.0, maxSize_=1, numDistinct_=100, numNulls_=0}}, SlotDescriptor{id=7, path=default.internet_usages.month_number_part, type=TINYINT, materialized=true, byteSize=1, byteOffset=2, nullable=true, nullIndicatorByte=0, nullIndicatorBit=1, slotIdx=1, stats=ColumnStats{avgSerializedSize_=1.0, maxSize_=1, numDistinct_=12, numNulls_=0}}]}
I1123 12:04:31.517395 12472 Planner.java:138] resultexprs: SlotRef{path=id, type=BIGINT, id=0} SlotRef{path=purchase_id, type=BIGINT, id=1} SlotRef{path=current_usage, type=BIGINT, id=2} SlotRef{path=created_at, type=TIMESTAMP, id=3} SlotRef{path=updated_at, type=TIMESTAMP, id=4} SlotRef{path=enabled_service_span_id, type=BIGINT, id=5} SlotRef{path=purchase_id_mod_100_part, type=TINYINT, id=6} SlotRef{path=month_number_part, type=TINYINT, id=7}
I1123 12:04:31.517426 12472 Planner.java:139] finalize plan fragments
I1123 12:04:31.518618 12472 Frontend.java:999] get scan range locations
I1123 12:04:31.518740 12472 Planner.java:339] Estimated per-host peak memory requirement: 0
I1123 12:04:31.518777 12472 Planner.java:340] Estimated per-host virtual cores requirement: 0
I1123 12:04:31.522639 12472 Frontend.java:1077] create result set metadata
I1123 12:04:31.522734 12472 JniFrontend.java:151] Estimated Per-Host Requirements: Memory=0B VCores=0

00:SCAN HDFS [default.internet_usages]
partitions=1200/1200 files=1160171 size=131.63GB
table stats: 3488886438 rows total
column stats: all
limit: 1
hosts=6 per-host-mem=unavailable
tuple-ids=0 row-size=66B cardinality=1
I1123 12:04:35.428925 12472 RequestPoolService.java:277] resolveRequestPool(pool=, user=etl_user): resolved_pool=root.etl_user, has_access=true
I1123 12:04:36.630676 12472 RequestPoolService.java:375] getPoolConfig(pool=root.etl_user): max_mem_resources=-1, max_requests=-1, max_queued=200, queue_timeout_ms=0, default_query_options=
I1123 12:04:36.630795 12472 admission-controller.cc:439] Schedule for id=bd4414f7d03e553f:e25876e000000000 in pool_name=root.etl_user cluster_mem_needed=4.00 GB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B
I1123 12:04:36.630810 12472 admission-controller.cc:444] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0)
I1123 12:04:36.630816 12472 admission-controller.cc:450] Admitted query id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:36.630884 12472 coordinator.cc:434] Exec() query_id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:37.247411 12472 plan-fragment-executor.cc:98] Prepare(): query_id=bd4414f7d03e553f:e25876e000000000 instance_id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:37.401240 12472 plan-fragment-executor.cc:215] descriptor table for fragment=bd4414f7d03e553f:e25876e000000000
tuples:
Tuple(id=0 size=72 slots=[Slot(id=0 type=BIGINT col_path=[2] offset=8 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=1 type=BIGINT col_path=[3] offset=16 null=(offset=0 mask=8) slot_idx=3 field_idx=-1), Slot(id=2 type=BIGINT col_path=[4] offset=24 null=(offset=0 mask=10) slot_idx=4 field_idx=-1), Slot(id=3 type=TIMESTAMP col_path=[5] offset=40 null=(offset=0 mask=40) slot_idx=6 field_idx=-1), Slot(id=4 type=TIMESTAMP col_path=[6] offset=56 null=(offset=0 mask=80) slot_idx=7 field_idx=-1), Slot(id=5 type=BIGINT col_path=[7] offset=32 null=(offset=0 mask=20) slot_idx=5 field_idx=-1), Slot(id=6 type=TINYINT col_path=[0] offset=1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=7 type=TINYINT col_path=[1] offset=2 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[])
I1123 12:04:37.401443 12472 runtime-state.cc:215] Error from query bd4414f7d03e553f:e25876e000000000: Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata.
I1123 12:04:40.887898 12472 status.cc:114] Disabled by query option.
@ 0x84eab9 (unknown)
@ 0xc7fef5 (unknown)
@ 0xc63c3a (unknown)
@ 0xdfb5c0 (unknown)
@ 0xde8859 (unknown)
@ 0xb268b5 (unknown)
@ 0xb29344 (unknown)
@ 0xad48f6 (unknown)
@ 0xada168 (unknown)
@ 0xb197ac (unknown)
@ 0xd560e5 (unknown)
@ 0xd593e4 (unknown)
@ 0x81c3bc (unknown)
@ 0x1b514fb (unknown)
@ 0x1b39029 (unknown)
@ 0xa0efc9 (unknown)
@ 0xa0fa22 (unknown)
@ 0xbf6069 (unknown)
@ 0xbf6a04 (unknown)
@ 0xe5cd2a (unknown)
@ 0x7f80961b3dc5 start_thread
@ 0x7f8095ee121d __clone
I1123 12:04:41.080906 20576 plan-fragment-executor.cc:327] Open(): instance_id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:41.185772 12472 impala-beeswax-server.cc:310] get_results_metadata(): query_id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:43.239852 12472 plan-fragment-executor.cc:485] Finished executing fragment query_id=bd4414f7d03e553f:e25876e000000000 instance_id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:43.241802 12472 plan-fragment-executor.cc:485] Finished executing fragment query_id=bd4414f7d03e553f:e25876e000000000 instance_id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:43.241833 12472 data-stream-mgr.cc:258] cancelling all streams for fragment=bd4414f7d03e553f:e25876e000000000
I1123 12:04:43.241852 12472 coordinator.cc:1078] All backends finished successfully.
I1123 12:04:43.246424 12472 impala-beeswax-server.cc:352] close(): query_id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:43.246454 12472 impala-server.cc:920] UnregisterQuery(): query_id=bd4414f7d03e553f:e25876e000000000
I1123 12:04:43.246461 12472 impala-server.cc:1006] Cancel(): query_id=bd4414f7d03e553f:e25876e000000000


When I get the error, the last messages in the Hive metastore logs are:

2017-11-23 12:24:33,230 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-92]: ugi=impala      ip=10.113.8.167 cmd=source:10.113.8.167 get_partitions_by_names : db=default tbl=internet_usages

2017-11-23 12:24:52,951 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-4-thread-92]: </PERFLOG method=get_partitions_by_names start=1511439873230 end=1511439892951 duration=19721 from=org.apache.hadoop.hive.metastore.RetryingHMSHandler threadId=92 retryCount=0 error=false>


1 ACCEPTED SOLUTION

Contributor

The problem was probably caused by a table with over 1 million small files in HDFS. When the invalidate metadata command was run in Impala, the metadata refresh took over 15 minutes. After compacting the table, the problem disappeared.
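For anyone hitting the same thing, a minimal sketch of the compaction in impala-shell, assuming a straight Parquet rewrite is acceptable (the table name internet_usages_compacted is a placeholder, and the exact swap/rename steps depend on your setup):

-- rewrite the ~1.16M small files into far fewer, larger Parquet files
create table internet_usages_compacted
  partitioned by (purchase_id_mod_100_part, month_number_part)
  stored as parquet
as select * from internet_usages;

-- refresh table and column stats so the planner sees the new layout
compute stats internet_usages_compacted;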


2 REPLIES

Champion

@Szczechla

A similar issue (a "Resource temporarily unavailable" error message from the Impala shell) has been discussed at the link below. It may help you.

Note: your log shows the issue is due to "Resource temporarily unavailable".

https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Resource-temporarily-unavailable-error...
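A quick way to see how often impalad hits that condition, assuming the default CDH log location (the path is an assumption, adjust it to your install):

$ grep -c 'Resource temporarily unavailable' /var/log/impalad/impalad.INFO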

Contributor

The problem was probably caused by a table with over 1 million small files in HDFS. When the invalidate metadata command was run in Impala, the metadata refresh took over 15 minutes. After compacting the table, the problem disappeared.