CDH 5.8, one node. Query: insert into table1_parquet select * from table1_csv;
table1_csv is an external table on a 4.5 GB CSV file.
Memory Limit Exceeded
Process: memory limit exceeded. Limit=4.00 GB Consumption=4.00 GB
  RequestPool=root.cloudera: Consumption=3.89 GB
    Query(5943a24543a11be7:ca86ab5f752abc8f) Limit: Consumption=3.22 GB
      Fragment 5943a24543a11be7:ca86ab5f752abc90: Consumption=29.76 MB
        EXCHANGE_NODE (id=2): Consumption=0
        DataStreamRecvr: Consumption=29.56 MB
      Block Manager: Limit=3.20 GB Consumption=3.20 GB
    Query(45434c5ab9d0a65b:d720c795d8bb1db3) Limit: Consumption=568.01 MB
      Fragment 45434c5ab9d0a65b:d720c795d8bb1db4: Consumption=8.00 KB
        EXCHANGE_NODE (id=2): Consumption=0
        DataStreamRecvr: Consumption=0
      Block Manager: Limit=3.20 GB Consumption=568.00 MB
    Query(aa463b578023672f:c132c7ffba81d496) Limit: Limit=96.00 GB Consumption=110.98 MB
      Fragment aa463b578023672f:c132c7ffba81d497: Consumption=110.98 MB
        HDFS_SCAN_NODE (id=0): Consumption=30.04 MB
        HdfsTableSink: Consumption=80.93 MB
      Block Manager: Limit=3.20 GB Consumption=0
WARNING: The following tables are missing relevant table and/or column statistics.
default.table1_csv
Memory Limit Exceeded
Query(aa463b578023672f:c132c7ffba81d496) Limit: Limit=96.00 GB Consumption=38.05 MB
  Fragment aa463b578023672f:c132c7ffba81d497: Consumption=38.05 MB
    HDFS_SCAN_NODE (id=0): Consumption=38.04 MB
  Block Manager: Limit=3.20 GB Consumption=0
Is it expected that Impala tries to load the entire table1_csv file into memory?
It looks like query 5943a24543a11be7:ca86ab5f752abc8f is consuming most of the memory (3.22 GB of the 4.00 GB process limit), not the insert itself. That's odd, because 5943a24543a11be7:ca86ab5f752abc8f looks like it has been partially cancelled.
It's probably worth checking if that query is still running.
Otherwise, it's possible that you are hitting https://issues.cloudera.org/browse/IMPALA-3633, where "zombie" fragments sit around consuming resources. That was fixed in Impala 2.7 and backported to other minor versions.
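To confirm which query holds the memory, the per-query totals in the error text can be ranked mechanically. Below is a minimal Python sketch, assuming the dump keeps the "Query(<id>) Limit: ... Consumption=<n> <unit>" layout shown in the question. The names to_bytes, rank_queries and SAMPLE are made up for illustration and are not part of Impala.

```python
import re

# Multipliers for the units that appear in the dump.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

# Matches "Query(<id>) Limit: [Limit=<n> <unit> ]Consumption=<n>[ <unit>]".
# Assumption: this layout is taken from the error text in the question only.
_QUERY_RE = re.compile(
    r"Query\((?P<id>[0-9a-f]+:[0-9a-f]+)\)\s+Limit:\s+"
    r"(?:Limit=[\d.]+(?: [KMG]?B\b)?\s+)?"
    r"Consumption=(?P<value>[\d.]+)(?: (?P<unit>[KMG]?B)\b)?"
)


def to_bytes(value, unit):
    """Convert a size such as ("3.22", "GB") to bytes; no unit means bytes."""
    return float(value) * _UNITS[unit or "B"]


def rank_queries(dump):
    """Return (query_id, bytes) pairs, largest consumer first.

    A query can appear more than once in the dump, so keep its peak value.
    """
    peak = {}
    for m in _QUERY_RE.finditer(dump):
        used = to_bytes(m.group("value"), m.group("unit"))
        peak[m.group("id")] = max(used, peak.get(m.group("id"), 0.0))
    return sorted(peak.items(), key=lambda kv: kv[1], reverse=True)


# Excerpt of the error text from the question (query lines only).
SAMPLE = """
Query(5943a24543a11be7:ca86ab5f752abc8f) Limit: Consumption=3.22 GB
Query(45434c5ab9d0a65b:d720c795d8bb1db3) Limit: Consumption=568.01 MB
Query(aa463b578023672f:c132c7ffba81d496) Limit: Limit=96.00 GB Consumption=110.98 MB
Query(aa463b578023672f:c132c7ffba81d496) Limit: Limit=96.00 GB Consumption=38.05 MB
"""

if __name__ == "__main__":
    for query_id, used in rank_queries(SAMPLE):
        print(f"{query_id}  {used / 1024 ** 3:.2f} GB")
```

The query at the top of the list is the one to look for among the running queries. The insert (aa463b578023672f:c132c7ffba81d496) ranks last here, which fits the reading that it is not the one exhausting the process limit.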