07-25-2017
11:00 PM
I've explored the idle query timeout and idle session timeout before and am using them in a prod cluster as well. Those timeouts are not related to idle connections: the idle session timeout takes care of the session only, but the connection stays established and is still counted among the active connections in use. Reading about the SocketTimeout option in the JDBC driver, the documentation says it takes care of idle connections, so I tried it out, but it doesn't seem to work. Most of the users here run SQL Workbench through the JDBC driver, open a new tab for every new query, and don't close their tabs for the whole day. I assumed SocketTimeout would solve this problem.
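For reference, this is roughly how SocketTimeout gets wired into the connection string. A minimal sketch only: the host, port, and timeout value below are placeholders, not from my setup, and per the Cloudera driver docs the value is in seconds with 0 disabling the timeout, which is worth verifying for your driver version.

```java
// Hypothetical sketch of building an Impala JDBC URL with SocketTimeout.
// Host/port/timeout are placeholder values, not from the original post.
public class ImpalaUrlBuilder {

    static String buildUrl(String host, int port, int socketTimeoutSecs) {
        // SocketTimeout is a driver connection property appended to the URL;
        // the Cloudera Impala JDBC driver documents it as seconds, 0 = off.
        return "jdbc:impala://" + host + ":" + port
                + ";SocketTimeout=" + socketTimeoutSecs;
    }

    public static void main(String[] args) {
        // With the driver on the classpath, this URL would be passed to
        // DriverManager.getConnection(...).
        System.out.println(buildUrl("impalad-1.example.com", 21050, 300));
    }
}
```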
07-25-2017
05:33 AM
Hi,

I came across the SocketTimeout option in the JDBC driver and tried it out to see if it helps take care of idle connections. It is not working as expected; even the default value has no effect. I am using the free ImpalaJDBC_2.5.24.1043 driver on CDH 5.7.0. Has anyone else using the SocketTimeout option come across this issue?

The reason for looking into this option: I ran into a situation where a client program waited close to 2 hours to get a connection from Impala. There was no exception in the logs or on the client side; it just kept waiting. During this period, the total front-end connections in use did not even reach 50% of the total front-end connections (from the Cloudera Manager graph). Restarting the impalads recovered the situation. So I started exploring connection timeout options, hoping control would return as soon as a maximum threshold is reached while fetching a connection, instead of waiting indefinitely. While doing this, I came across the SocketTimeout option above and tried it out, since clearing idle connections was also a requirement for me.

Thanks, Mani
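On the "waiting indefinitely for a connection" part specifically: the standard JDBC knob for bounding how long getConnection() may block is the login timeout on DriverManager. A sketch only; whether the Impala driver honours this setting is an assumption worth testing against your cluster.

```java
public class LoginTimeoutDemo {
    public static void main(String[] args) {
        // Standard JDBC login timeout, in seconds. Bounds how long
        // DriverManager.getConnection(...) may block, provided the
        // driver respects it -- verify for the Impala driver.
        java.sql.DriverManager.setLoginTimeout(120);
        System.out.println("login timeout: "
                + java.sql.DriverManager.getLoginTimeout() + "s");
    }
}
```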
Labels:
Apache Impala
07-25-2016
06:01 AM
Hello everyone,

I am running a cluster with 2 impalads, each with a memory capacity of 60 GB. I am using the CDH 5.7 package and managing the cluster with CM. I came across a query failing with the error below:

Memory limit exceeded
Cannot perform hash join at node with id 2. Repartitioning did not reduce the size of a spilled partition. Repartitioning level 2. Number of rows 53969358.

Query:

SELECT `dim_experiment`.`experiment_name` AS `experiment_name`
FROM `gwynniebee_bi`.`fact_recommendation_events` `fact_recommendatio`
LEFT OUTER JOIN `gwynniebee_bi`.`dim_experiment` `dim_experiment` ON (`fact_recommendatio`.`experiment_key` = `dim_experiment`.`experiment_key`)
GROUP BY 1

Profile:
----------------
Estimated Per-Host Requirements: Memory=2.05GB VCores=3
WARNING: The following tables are missing relevant table and/or column statistics.
gwynniebee_bi.fact_recommendation_events
09:MERGING-EXCHANGE [UNPARTITIONED]
| order by: dim_experiment.experiment_name ASC
| hosts=1 per-host-mem=unavailable
| tuple-ids=3 row-size=38B cardinality=41
|
04:SORT
| order by: dim_experiment.experiment_name ASC
| hosts=1 per-host-mem=16.00MB
| tuple-ids=3 row-size=38B cardinality=41
|
08:AGGREGATE [FINALIZE]
| group by: dim_experiment.experiment_name
| hosts=1 per-host-mem=10.00MB
| tuple-ids=2 row-size=38B cardinality=41
|
07:EXCHANGE [HASH(dim_experiment.experiment_name)]
| hosts=1 per-host-mem=0B
| tuple-ids=2 row-size=38B cardinality=41
|
03:AGGREGATE [STREAMING]
| group by: dim_experiment.experiment_name
| hosts=1 per-host-mem=10.00MB
| tuple-ids=2 row-size=38B cardinality=41
|
02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
| hash predicates: dim_experiment.experiment_key = fact_recommendatio.experiment_key
| runtime filters: RF000 <- fact_recommendatio.experiment_key
| hosts=1 per-host-mem=2.00GB
| tuple-ids=1N,0 row-size=46B cardinality=unavailable
|
|--06:EXCHANGE [HASH(fact_recommendatio.experiment_key)]
| | hosts=2 per-host-mem=0B
| | tuple-ids=0 row-size=4B cardinality=unavailable
| |
| 00:SCAN HDFS [gwynniebee_bi.fact_recommendation_events fact_recommendatio, RANDOM]
| partitions=1/1 files=90 size=3.96GB
| table stats: unavailable
| column stats: unavailable
| hosts=2 per-host-mem=56.00MB
| tuple-ids=0 row-size=4B cardinality=unavailable
|
05:EXCHANGE [HASH(dim_experiment.experiment_key)]
| hosts=1 per-host-mem=0B
| tuple-ids=1 row-size=42B cardinality=78
|
01:SCAN HDFS [gwynniebee_bi.dim_experiment dim_experiment, RANDOM]
partitions=1/1 files=1 size=10.56KB
runtime filters: RF000 -> dim_experiment.experiment_key
table stats: 78 rows total
column stats: all
hosts=1 per-host-mem=32.00MB
   tuple-ids=1 row-size=42B cardinality=78

I was trying to correlate this issue with available memory, memory in use, etc. using CM metrics like tcmalloc_bytes_in_use and mem_rss. Around the same time, I see very low memory usage for the impalad process. The graph below shows these metrics around the time the query hit the memory error. Am I missing any other metric to look at? Please share your thoughts.

Thanks, Mani
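One thing worth noting from the profile itself: the WARNING says gwynniebee_bi.fact_recommendation_events has no table or column statistics, so the planner cannot size the hash join's build side (cardinality=unavailable throughout that subtree). A common first remedy is COMPUTE STATS on that table. A minimal sketch below; the statement is standard Impala SQL, but the idea of running it via JDBC and any connection details are my assumptions, not from the post.

```java
// Sketch: the COMPUTE STATS statement the profile's WARNING suggests.
// The planner uses the resulting row counts to choose join sides and
// estimate memory, instead of planning blind.
public class FixMissingStats {

    static final String COMPUTE_STATS =
            "COMPUTE STATS gwynniebee_bi.fact_recommendation_events";

    public static void main(String[] args) {
        // In practice this would be executed via impala-shell or a JDBC
        // Statement against a URL like jdbc:impala://<impalad-host>:21050.
        System.out.println(COMPUTE_STATS + ";");
    }
}
```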