
Memory required for insert-select operation in LLAP

Explorer

I have a situation where I have configured my LLAP JVM to have a certain heap size (LLAP Daemon Heap Size (MB)). Let's say this is 8192 but the size doesn't really matter.

Now, when I run an insert-select operation into a partitioned table and watch the JVM with JConsole, I can see the used heap memory run to exhaustion, and then the query fails with a java.lang.OutOfMemoryError: Java heap space error, as expected.

I'm aware that we can just keep increasing the LLAP Daemon Heap Size (MB) value until we reach a value that allows the insert-select operation to complete, but this isn't scientific or desirable.

What I would like to be able to do is determine in advance how much memory the insert-select operation will require. I've been unable to find any information on this, but I figure it must be determined by factors such as the number of table columns, the number of partitions being created, the ORC stripe size, etc.

Can someone share a formula that can be used to determine the memory required for such an insert-select operation into a partitioned table, if indeed such a formula exists?

1 ACCEPTED SOLUTION

Rising Star

The general recommended heap allocation for LLAP is ~4 GB of memory per executor, so it depends on how many executors you have. You can see this, with lots of details, at

https://community.hortonworks.com/articles/149486/llap-sizing-and-setup.html

Sometimes, depending on query size and type, it's possible to get away with as little as 1-3 GB per executor; I'm not sure how many executors you are using, so it all depends.

If the values are somewhere in this range, it would be good to get a heap dump on OOM and see what's taking up all the memory. There are two main possible causes: something in the processing pipeline, or something in ORC writing (ORC may keep lots of buffers open when writing many files in parallel). The former is supposed to run in 4 GB/executor, so if that's the case it would be a bug and we'd like to see the top users from the heap dump 🙂 For ORC, there is unfortunately no good formula that I know of (cc @owen), but if you write more files (lots of partitions, lots of buckets) you need more memory. Hard to tell without knowing the details.
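
As a very rough illustration of that scaling only, and not an official formula: writer memory grows roughly with the number of open writers times the number of columns times orc.compress.size (256 KB by default), before the ORC memory manager starts shrinking buffers. The helper below and its one-compression-buffer-per-column assumption are purely illustrative.

# Illustrative only: a crude upper bound on ORC writer buffer memory,
# assuming each open writer holds one full compression buffer per column.
# ORC's memory manager shrinks buffers under pressure, so real usage is lower.
def estimate_orc_writer_memory_bytes(open_writers, columns, compress_buffer_bytes=256 * 1024):
    return open_writers * columns * compress_buffer_bytes

# e.g. 100 dynamic partitions x 50 columns x 256 KB buffers ~= 1.2 GiB
print(estimate_orc_writer_memory_bytes(100, 50) / 1024 ** 3)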


4 REPLIES

Explorer

Thanks for the reply. I realise my original post is short on specific details that may help, so here goes.

This is a single node test system whose configuration is as follows:

llap_headroom_space = 24576
hive.llap.daemon.yarn.container.mb = 11264
llap_heap_size = 8192
hive.llap.io.memory.size = 2048
num_llap_nodes_for_llap_daemons = 1
hive.llap.daemon.num.executors = 1

The table being inserted into is an ORC table with the default orc.compress.size; it has 168 columns, nearly all of DECIMAL(38,x) datatype, and is partitioned on 3 columns.
The source table for the insert-select operation is an Avro table with 165 columns, all of string datatype.
The dynamic partition insert would generate 3019 partitions if it were to succeed. I know this is a lot of partitions and wouldn't expect this to fit inside an 8 GB LLAP daemon, but the point of the question is to be able to predict, as accurately as possible, how much memory this insert-select operation will take given the information above.
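
Plugging those figures into the rough open-writers x columns x orc.compress.size sketch above gives a sense of the scale, assuming (unrealistically) that all partition writers stay open with full 256 KB buffers; ORC scales buffers down under pressure, so the real number is lower, but the gap versus an 8 GB heap is clear.

# Worst-case scale for this insert-select (assumption: all 3019 partition
# writers open at once, one full 256 KB compression buffer per column).
partitions = 3019
columns = 168
compress_buffer_bytes = 256 * 1024  # default orc.compress.size
worst_case_bytes = partitions * columns * compress_buffer_bytes
print(f"{worst_case_bytes / 1024 ** 3:.0f} GiB")  # ~124 GiB vs. an 8 GiB heap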

I've obtained a heapdump using -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/hadoop/yarn/log/heapdump on the LLAP daemon and am running it through jhat. I will post the results when I have them.

Thanks,
Steve

Explorer

I've attached the OOM heap dump analysis. It looks like the memory is being used by the ORC writers, but I would appreciate other opinions. There also seems to be a lot of memory used by org.apache.hadoop.hive.serde2.io.HiveDecimalWritable, which has some 96 million instances. I'm not a Java expert, so this is guesswork on my part right now.

I wonder if we can use some of these figures to reverse engineer a formula?
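
As a very rough sanity check on those 96 million HiveDecimalWritable instances, assuming ~48 bytes per object (a guess on my part, not a figure from the dump; the real footprint depends on the Hive version and on whatever each writable retains):

# Guesswork sizing: 96 million HiveDecimalWritable objects at an assumed
# ~48 bytes each (object header plus fields); the true cost may be higher
# if each writable retains additional objects such as a BigDecimal.
objects = 96_000_000
assumed_bytes_per_object = 48
print(f"{objects * assumed_bytes_per_object / 1024 ** 3:.1f} GiB")  # ~4.3 GiB

Even at that conservative guess, these writables alone would account for roughly half of an 8 GB heap.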

Thanks,
Steve

heapdump-system-overview.zip

heapdump-top-components.zip

Contributor

Hopefully you will have solved this by now, but for others:

The configuration must satisfy:
hive.llap.daemon.yarn.container.mb >= llap_heap_size + hive.llap.io.memory.size + llap_headroom_space
which is not the case above.
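
As a worked check against the configuration posted earlier in the thread (values in MB; the variable names below just mirror the properties listed there):

# Values from the configuration posted above, all in MB.
llap_heap_size = 8192
hive_llap_io_memory_size = 2048
llap_headroom_space = 24576
container_mb = 11264  # hive.llap.daemon.yarn.container.mb
required = llap_heap_size + hive_llap_io_memory_size + llap_headroom_space
print(required, "<=", container_mb, "?", required <= container_mb)
# 34816 <= 11264 ? False -- the container cannot hold heap + cache + headroom

Either the container needs to be larger or the headroom looks mis-set; 24576 MB of headroom against an 8192 MB heap is unusually large.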