10-14-2020
02:26 AM
Thanks for your input. We are running our stage cluster in "non-production mode" with the embedded PostgreSQL database, which is also used by the two Hive Metastore servers. The PostgreSQL database is hosted on the same node as the other Cloudera services. When this host freezes, the Impala INSERT queries freeze as well. We were surprised to see that there is apparently no timeout between the Hive Metastore servers and their backing database (PostgreSQL), and no error either. This probably also happens with an external PostgreSQL or MySQL database, although we have not tested that. I wonder whether a newer CDH version would solve this. We are currently looking into upgrading and would very much like to do so for other reasons as well.
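In case it is useful for anyone debugging the same symptom: one quick check is whether the metastore's backing PostgreSQL still answers within a bounded time. Below is a minimal diagnostic sketch, assuming psycopg2 is available; the host, database name and credentials are placeholders, not our actual configuration.

# Minimal diagnostic sketch (assumption: psycopg2 is installed; host, db name
# and credentials are placeholders, not our real configuration).
import sys
import psycopg2

def probe_metastore_db(host, dbname, user, password, timeout_s=5):
    """Try to reach the metastore's backing PostgreSQL within timeout_s seconds."""
    try:
        conn = psycopg2.connect(
            host=host,
            dbname=dbname,
            user=user,
            password=password,
            connect_timeout=timeout_s,  # fail fast instead of hanging indefinitely
        )
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            cur.fetchone()
        conn.close()
        return True
    except psycopg2.OperationalError as exc:
        print(f"metastore DB not reachable within {timeout_s}s: {exc}", file=sys.stderr)
        return False

if __name__ == "__main__":
    ok = probe_metastore_db("cm-host.example.com", "metastore", "hive", "secret")
    sys.exit(0 if ok else 1)

This only tells you whether new connections succeed quickly; an already-established but frozen connection can still hang, which matches the missing-timeout behaviour we saw.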
10-07-2020
11:17 AM
We noticed that when the Cloudera management node in our cluster is frozen, our Impala DML queries no longer finish and are simply stuck. We did not expect that. This is reproducible by freezing the management node with lxc-freeze, but we have seen the same behaviour when the node is unavailable for other reasons. We assumed Impala only needs the statestore, catalogd and the Hive Metastore to be happy, but for some reason INSERT statements simply get stuck without any indication of what might be going on. SELECT queries keep running fine. We can't see anything in the logs of catalogd, statestore or the Hive Metastore. All of those services and all impalad instances run on different nodes from the one being frozen. Is there a hidden dependency from Impala on any of those services that could prevent DMLs from finishing?

These are the only services running on the frozen node:
- Cloudera Management Service Activity Monitor
- Cloudera Management Service Alert Publisher
- Cloudera Management Service Event Server
- Cloudera Management Service Host Monitor
- Cloudera Management Service Service Monitor

Our cluster looks something like this:

node H1:
  Cloudera Management Service Activity Monitor
  Cloudera Management Service Alert Publisher
  Cloudera Management Service Event Server
  Cloudera Management Service Host Monitor
  Cloudera Management Service Service Monitor

node H2 - H<n>:
  Catalog Service
  Statestore
  HiveServer2
  HDFS JournalNode
  HDFS NameService
  ...

node H<n> - H<m>:
  impalad
  HDFS Datanode
  ...

Comparing the impalad output for the same INSERT between the frozen and unfrozen state, the difference seems to be in the cleanup phase.

Normal insert (management node not frozen):

I1007 20:00:59.769767 49261 admission-controller.cc:440] Schedule for id=a94b13644229aa92:dd551bc800000000 in pool_name=root.impala-default cluster_mem_needed=6.00 GB PoolConfig:
I1007 20:00:59.769858 49261 admission-controller.cc:451] Admitted query id=a94b13644229aa92:dd551bc800000000
I1007 20:00:59.769892 49261 coordinator.cc:441] Exec() query_id=a94b13644229aa92:dd551bc800000000 stmt=INSERT INTO `sb_sandbox`.`continues_insert_test` VALUES (0)
I1007 20:00:59.770063 49261 coordinator.cc:592] starting 1 fragment instances for query a94b13644229aa92:dd551bc800000000
I1007 20:00:59.771142 17639 fragment-mgr.cc:40] ExecPlanFragment() instance_id=a94b13644229aa92:dd551bc800000000 coord=h9.yieldlab.lan:22000
I1007 20:00:59.771531 48326 plan-fragment-executor.cc:119] Prepare(): query_id=a94b13644229aa92:dd551bc800000000 instance_id=a94b13644229aa92:dd551bc800000000
I1007 20:00:59.772028 49261 coordinator.cc:630] started 1 fragment instances for query a94b13644229aa92:dd551bc800000000
I1007 20:00:59.772033 48326 plan-fragment-executor.cc:175] descriptor table for fragment=a94b13644229aa92:dd551bc800000000
I1007 20:00:59.772241 48326 plan-fragment-executor.cc:300] Open(): instance_id=a94b13644229aa92:dd551bc800000000
I1007 20:00:59.772488 49261 impala-server.cc:895] Query a94b13644229aa92:dd551bc800000000 has timeout of 2m
I1007 20:00:59.955390 18884 coordinator.cc:1536] Fragment instance completed: id=a94b13644229aa92:dd551bc800000000 host=h9.yieldlab.lan:22000 remaining=0
I1007 20:00:59.955576 48328 coordinator.cc:1031] Finalizing query: a94b13644229aa92:dd551bc800000000
I1007 20:00:59.955627 48326 fragment-mgr.cc:99] PlanFragment completed. instance_id=a94b13644229aa92:dd551bc800000000
.b/sb_sandbox/continues_insert_test/_impala_insert_staging/a94b13644229aa92_dd551bc800000000/
I1007 20:01:00.379140 49261 impala-hs2-server.cc:679] CloseOperation(): query_id=a94b13644229aa92:dd551bc800000000
I1007 20:01:00.379181 49261 impala-server.cc:906] UnregisterQuery(): query_id=a94b13644229aa92:dd551bc800000000
I1007 20:01:00.379195 49261 impala-server.cc:992] Cancel(): query_id=a94b13644229aa92:dd551bc800000000
I1007 20:01:00.379209 49261 coordinator.cc:1351] Cancel() query_id=a94b13644229aa92:dd551bc800000000
I1007 20:01:00.379233 49261 coordinator.cc:1417] CancelFragmentInstances() query_id=a94b13644229aa92:dd551bc800000000, tried to cancel 0 fragment instances

Frozen version (the INSERT never finishes and just hangs):

I1007 20:03:32.718174 49317 admission-controller.cc:440] Schedule for id=8541dc1ac1e3542e:e991eaf600000000 in pool_name=root.impala-default cluster_mem_needed=6.00 GB PoolConfig:
I1007 20:03:32.718283 49317 admission-controller.cc:451] Admitted query id=8541dc1ac1e3542e:e991eaf600000000
I1007 20:03:32.718322 49317 coordinator.cc:441] Exec() query_id=8541dc1ac1e3542e:e991eaf600000000 stmt=INSERT INTO `sb_sandbox`.`continues_insert_test` VALUES (0)
I1007 20:03:32.718523 49317 coordinator.cc:592] starting 1 fragment instances for query 8541dc1ac1e3542e:e991eaf600000000
I1007 20:03:32.719136 18513 fragment-mgr.cc:40] ExecPlanFragment() instance_id=8541dc1ac1e3542e:e991eaf600000000 coord=h9.yieldlab.lan:22000
I1007 20:03:32.719440 50128 plan-fragment-executor.cc:119] Prepare(): query_id=8541dc1ac1e3542e:e991eaf600000000 instance_id=8541dc1ac1e3542e:e991eaf600000000
I1007 20:03:32.719676 50128 plan-fragment-executor.cc:175] descriptor table for fragment=8541dc1ac1e3542e:e991eaf600000000
I1007 20:03:32.719667 49317 coordinator.cc:630] started 1 fragment instances for query 8541dc1ac1e3542e:e991eaf600000000
I1007 20:03:32.719853 50128 plan-fragment-executor.cc:300] Open(): instance_id=8541dc1ac1e3542e:e991eaf600000000
I1007 20:03:32.720300 49317 impala-server.cc:895] Query 8541dc1ac1e3542e:e991eaf600000000 has timeout of 2m
I1007 20:03:32.847476 11346 coordinator.cc:1536] Fragment instance completed: id=8541dc1ac1e3542e:e991eaf600000000 host=h9.yieldlab.lan:22000 remaining=0
I1007 20:03:32.847577 50130 coordinator.cc:1031] Finalizing query: 8541dc1ac1e3542e:e991eaf600000000
I1007 20:03:32.847705 50128 fragment-mgr.cc:99] PlanFragment completed. instance_id=8541dc1ac1e3542e:e991eaf600000000
.b/sb_sandbox/continues_insert_test/_impala_insert_staging/8541dc1ac1e3542e_e991eaf600000000/

In the frozen case the CloseOperation()/UnregisterQuery() lines never show up; the query appears to stay in finalization and never returns.

Impala 2.7, CDH 5.10. If you have any idea what could cause this, we would be happy to hear about it.
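For completeness, the repro boils down to a loop of single-row INSERTs against any impalad: when H1 is frozen, the loop stalls inside an execute() call at the finalization step. Below is a minimal sketch of that idea, assuming the impyla client and a placeholder coordinator host; it is not our exact script.

# Minimal repro sketch (assumptions: impyla is installed, the impalad host is a
# placeholder, and the target table from the logs above already exists).
import time
from impala.dbapi import connect

HOST = "impalad.example.com"   # any impalad / coordinator node
PORT = 21050                   # default HiveServer2 port of impalad

def main():
    conn = connect(host=HOST, port=PORT)
    cur = conn.cursor()
    i = 0
    while True:
        start = time.monotonic()
        # With the management node frozen, this call blocks here indefinitely
        # (the insert never reaches CloseOperation/UnregisterQuery).
        cur.execute(f"INSERT INTO sb_sandbox.continues_insert_test VALUES ({i})")
        print(f"insert {i} took {time.monotonic() - start:.1f}s")
        i += 1
        time.sleep(1)

if __name__ == "__main__":
    main()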
Labels:
- Apache Impala
- Cloudera Manager
01-23-2020
06:04 AM
Thank you for that explanation.
01-21-2020
06:27 AM
We are still using CDH 5.10 and Impala 2.7. There is a startup option `inc_stats_size_limit_bytes`, described in the "Maximum Serialized Stats Size" section of https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_perf_stats.html (identical to the 5.10 docs), but the description does not really explain what the setting means or how to predict when it becomes a problem:

> The inc_stats_size_limit_bytes limit is set as a safety check, to prevent Impala from hitting the maximum limit for the table metadata. Note that this limit is only one part of the entire table's metadata all of which together must be below 2 GB.

With pretty big tables and a lot of live partitions we had to increase catalogd memory to cope with the amount of metadata. Now, in addition, individual large tables fail to load during COMPUTE INCREMENTAL STATS because they hit inc_stats_size_limit_bytes, which is far smaller than the catalogd memory. But what does that mean, and how can we calculate or predict it? We didn't find any relevant metrics in https://docs.cloudera.com/documentation/enterprise/5-10-x/topics/cm_metrics_impala_catalog_server.html, nor can we estimate the restriction imposed by this setting from anything in https://docs.huihoo.com/cloudera/The-Impala-Cookbook.pdf (which works just fine for estimating the expected catalogd memory).

In short:
- What does inc_stats_size_limit_bytes actually mean?
- Can we predict / calculate a suitable value of inc_stats_size_limit_bytes for our tables? (See the rough sizing sketch below.)
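The only sizing guidance we are aware of is the rough figure in the Impala documentation that incremental stats cost on the order of 400 bytes of metadata per column per partition. If that figure applies, you can at least ballpark whether a table is likely to run into the limit. A back-of-the-envelope sketch, where the 400-byte figure, the limit value and the table shapes are assumptions rather than measurements:

# Back-of-the-envelope sketch. Assumptions: incremental stats cost roughly
# 400 bytes per column per partition (rule-of-thumb figure from the Impala
# docs), and the table shapes and limit below are made-up examples.
BYTES_PER_COLUMN_PER_PARTITION = 400

def estimated_incremental_stats_bytes(num_partitions, num_columns):
    """Rough estimate of the serialized incremental-stats size for one table."""
    return num_partitions * num_columns * BYTES_PER_COLUMN_PER_PARTITION

tables = {
    "small_table": (1_000, 50),    # (partitions, columns) -- hypothetical
    "large_table": (50_000, 200),  # hypothetical wide, heavily partitioned table
}

limit_bytes = 200 * 1024 * 1024    # example value for inc_stats_size_limit_bytes

for name, (parts, cols) in tables.items():
    est = estimated_incremental_stats_bytes(parts, cols)
    flag = "OVER limit" if est > limit_bytes else "ok"
    print(f"{name}: ~{est / (1024 * 1024):.1f} MiB of incremental stats ({flag})")

If this estimate is in the right ballpark for your tables, it would also explain why a single very wide, heavily partitioned table can exceed the limit long before catalogd memory as a whole becomes the bottleneck.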
Labels:
- Apache Impala