01-20-2018
08:17 PM
I've attached the OOM heapdump analysis. It looks like the memory is being used by the ORC writers, but I would appreciate other opinions. There also seems to be a lot of memory used by org.apache.hadoop.hive.serde2.io.HiveDecimalWritable, which has around 96 million instances. I'm not a Java expert, so this is guesswork on my part right now. I wonder if we can use some of these figures to reverse engineer a formula? Thanks, Steve
Attachments: heapdump-system-overview.zip, heapdump-top-components.zip
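As a rough cross-check on those heapdump figures (the retained size per HiveDecimalWritable is purely a guess on my part; it wraps a decimal value, so somewhere around 50-150 bytes per instance seems plausible depending on the Hive version), 96 million instances alone could account for a large slice of an 8 GB heap:

# Rough cross-check of the heapdump figures. The retained size per
# HiveDecimalWritable instance is a guess, so try a range of values.
instances = 96_000_000
for bytes_per_instance in (50, 100, 150):
    gib = instances * bytes_per_instance / 2**30
    print(f"{bytes_per_instance} bytes/object -> ~{gib:.1f} GiB")
# 50 bytes/object  -> ~4.5 GiB
# 100 bytes/object -> ~8.9 GiB
# 150 bytes/object -> ~13.4 GiB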
01-19-2018
09:27 PM
Thanks for the reply. I realise my original post is short on the specific details which may help, so here goes. This is a single-node test system whose configuration is as follows:

llap_headroom_space = 24576
hive.llap.daemon.yarn.container.mb = 11264
llap_heap_size = 8192
hive.llap.io.memory.size = 2048
num_llap_nodes_for_llap_daemons = 1
hive.llap.daemon.num.executors = 1

The table being inserted into is an ORC table with the default orc.compress.size; it has 168 columns, nearly all of DECIMAL(38,x) datatype, and is partitioned on 3 columns. The source table for the insert-select operation is Avro with 165 columns, all of string datatype. The dynamic partition insert would generate 3019 partitions if it were to succeed. I know this is a lot of partitions and wouldn't expect it to fit inside an 8G LLAP daemon, but the point of the question is to be able to predict, as accurately as possible, how much memory this insert-select operation will take given the information above. I've obtained a heapdump using -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/hadoop/yarn/log/heapdump on the LLAP daemon and am running it through jhat. I will post the results when I have them. Thanks, Steve
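As a rough sanity check on those numbers (this is only a heuristic I'm assuming, not an official Hive formula): if an unsorted dynamic-partition insert keeps one ORC writer open per partition, and each writer buffers very roughly one compression chunk per column, then with the default 256 KB orc.compress.size:

# Back-of-the-envelope only; the "one 256 KB buffer per column per open
# writer" assumption is a guess, and ORC's memory manager will try to
# shrink buffers, so treat this as an order-of-magnitude figure.
partitions = 3019
columns = 168
compress_size = 256 * 1024              # default orc.compress.size in bytes
estimate = partitions * columns * compress_size
print(f"~{estimate / 2**30:.0f} GiB")   # ~124 GiB

Even as an order-of-magnitude figure, that is far beyond an 8 GB heap, which would match what I'm seeing.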
01-19-2018
02:32 PM
1 Kudo
I have a situation where I have configured my LLAP JVM to have a certain heap size (LLAP Daemon Heap Size (MB)). Let's say this is 8192, although the exact size doesn't really matter. When I run an insert-select operation into a partitioned table and watch the JVM with JConsole, I can see the used heap memory run to exhaustion, and then the query fails with a java.lang.OutOfMemoryError: Java heap space error, as expected.

I'm aware that we can keep increasing the LLAP Daemon Heap Size (MB) value until we reach a value that allows the insert-select operation to complete, but this isn't scientific or desirable. What I would like to be able to do is determine in advance how much memory the insert-select operation will require. I've been unable to find any information on this, but I figure it must be determined by factors such as the number of table columns, the number of partitions being created, the ORC stripe size, etc.

Can someone share a formula that can be used to determine the memory required for such an insert-select operation into a partitioned table, if indeed such a formula exists?
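For illustration, the sort of formula I'm imagining would look something like the sketch below (the structure and the constants are guesses on my part, not something I've found documented):

# Sketch of a possible estimation formula: heap needed by the ORC writers
# alone, assuming one open writer per partition and roughly one buffered
# compression chunk per column stream. All of these assumptions are guesses.
def estimate_orc_writer_memory(open_partitions, columns,
                               compress_size=256 * 1024, streams_per_column=2):
    return open_partitions * columns * streams_per_column * compress_size

# e.g. 1000 open partitions x 100 columns -> ~48.8 GiB
print(estimate_orc_writer_memory(1000, 100) / 2**30)

Something like this would only cover the writer buffers, of course; whatever the rows themselves occupy in memory during the insert would come on top.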