Memory required for insert-select operation in LLAP

Explorer

I have a situation where I have configured my LLAP JVM with a certain heap size (LLAP Daemon Heap Size (MB)). Let's say this is 8192, though the exact size doesn't really matter.

Now, when I run an insert-select operation into a partitioned table and watch the JVM with JConsole, I can see the used heap memory run to exhaustion, and the query then fails with java.lang.OutOfMemoryError: Java heap space, as expected.

I'm aware that we can simply keep increasing the LLAP Daemon Heap Size (MB) value until we reach a value that allows the insert-select operation to complete, but this isn't scientific or desirable.

What I would like is to be able to determine in advance how much memory the insert-select operation will require. I've been unable to find any information on this, but I figure it must be determined by factors such as the number of table columns, the number of partitions being created, the ORC stripe size, and so on.

Can someone share a formula that can be used to determine the memory required for such an insert-select operation into a partitioned table, if indeed such a formula exists?
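
To make the shape of what I'm after concrete, here is pure guesswork on my part in Java; the inputs are the factors I listed above, and every name, weighting, and number is my own invention rather than anything taken from Hive:

// Hypothetical only: the kind of formula I'm hoping someone can correct.
public class LlapInsertMemoryGuess {

    static long guessBytes(int columns, int openPartitions,
                           long orcBufferBytes, long perWriterOverheadBytes) {
        // Guess: each open partition has an ORC writer buffering something
        // per column, plus a fixed per-writer overhead.
        return (long) openPartitions
                * ((long) columns * orcBufferBytes + perWriterOverheadBytes);
    }

    public static void main(String[] args) {
        // Made-up numbers, purely to show the shape of the calculation.
        long bytes = guessBytes(100, 1000, 256L * 1024, 1L << 20);
        System.out.printf("Guessed requirement: %.1f GiB%n",
                bytes / (1024.0 * 1024 * 1024));
    }
}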



Explorer

Thanks for the reply. I realise my original post is short on specific details which may help, so here goes.

This is a single node test system whose configuration is as follows:

llap_headroom_space = 24576
hive.llap.daemon.yarn.container.mb = 11264
llap_heap_size = 8192
hive.llap.io.memory.size = 2048
num_llap_nodes_for_llap_daemons = 1
hive.llap.daemon.num.executors = 1

The table being inserted into is an ORC table with the default orc.compress.size; it has 168 columns, nearly all of DECIMAL(38,x) data type, and is partitioned on 3 columns.
The source table for the insert-select operation is Avro with 165 columns, all of string data type.
The dynamic partition insert would generate 3019 partitions if it were to succeed. I know this is a lot of partitions and wouldn't expect it to fit inside an 8G LLAP daemon, but the point of the question is to predict, as accurately as possible, how much memory this insert-select operation will take given the information above.
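
As a rough sanity check, here is a back-of-envelope calculation with those numbers. It assumes, perhaps naively, that every open partition writer buffers one orc.compress.size block (256 KB by default) per column:

// Back-of-envelope check with the numbers above. The assumption that each
// open ORC writer buffers one compression block per column is a guess.
public class WriterBufferEstimate {
    public static void main(String[] args) {
        int columns = 168;               // columns in the ORC target table
        int partitions = 3019;           // dynamic partitions the insert creates
        long compressSize = 256L * 1024; // orc.compress.size default (256 KB)

        long bytes = (long) partitions * columns * compressSize;
        System.out.printf("~%.0f GiB if every partition writer is open at once%n",
                bytes / (1024.0 * 1024 * 1024));
        // Prints ~124 GiB -- far beyond an 8 GB heap, which would explain the OOM.
    }
}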

I've obtained a heapdump using -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/hadoop/yarn/log/heapdump on the LLAP daemon and am running it through jhat. I will post the results when I have them.

Thanks,
Steve

Explorer

I've attached the OOM heapdump analysis. It looks like the memory is being used by the ORC writers, but I would appreciate other opinions. There also seems to be a lot of memory used by org.apache.hadoop.hive.serde2.io.HiveDecimalWritable, which accounts for some 96 million objects. I'm not a Java expert, so this is guesswork on my part right now.

I wonder if we can use some of these figures to reverse engineer a formula?
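
To put that object count in perspective, here is some rough arithmetic; the 40 bytes per instance is an assumed shallow size, not something I've measured:

// Rough footprint guess for the HiveDecimalWritable population in the dump.
public class DecimalWritableFootprint {
    public static void main(String[] args) {
        long instances = 96_000_000L; // object count reported by the heap analysis
        long assumedBytesEach = 40L;  // assumed shallow size per object -- a guess

        double gib = instances * assumedBytesEach / (1024.0 * 1024 * 1024);
        System.out.printf("~%.1f GiB from HiveDecimalWritable alone%n", gib);
        // Prints ~3.6 GiB -- a large share of an 8 GB heap on its own.
    }
}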

Thanks,
Steve

heapdump-system-overview.zip

heapdump-top-components.zip

Contributor

Hopefully you have solved this by now, but for others:

The configuration must satisfy:
hive.llap.daemon.yarn.container.mb >= llap_heap_size + hive.llap.io.memory.size + llap_headroom_space
which is not the case in the configuration above.
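
A minimal sketch of that check, plugging in the figures posted earlier in the thread (the variable names are mine):

// Check the LLAP sizing constraint against the values posted above.
public class LlapSizingCheck {
    public static void main(String[] args) {
        long containerMb = 11264; // hive.llap.daemon.yarn.container.mb
        long heapMb = 8192;       // llap_heap_size
        long ioMb = 2048;         // hive.llap.io.memory.size
        long headroomMb = 24576;  // llap_headroom_space

        long requiredMb = heapMb + ioMb + headroomMb; // 34816
        System.out.printf("container=%d MB, required=%d MB, ok=%b%n",
                containerMb, requiredMb, containerMb >= requiredMb);
        // Prints ok=false: the container is far smaller than heap + io + headroom.
    }
}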