Created on 05-01-2018 08:56 PM - edited 09-16-2022 06:09 AM
When I created a Hive table as a select from another table, which has approximately 100 GB of data and is stored via the MongoDB storage handler, I got a "GC overhead limit exceeded" error. My query is
CREATE TABLE traffic AS SELECT * FROM test2;
and the error that I got is shown below.
2018-05-01 05:09:56,153 FATAL [RMCommunicator Allocator] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[RMCommunicator Allocator,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Collections.singletonIterator(Collections.java:3300)
    at java.util.Collections$SingletonSet.iterator(Collections.java:3332)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.selectBestAttempt(TaskImpl.java:544)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.getProgress(TaskImpl.java:449)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.computeProgress(JobImpl.java:907)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getProgress(JobImpl.java:891)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.getApplicationProgress(RMCommunicator.java:142)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:196)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:764)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:261)
    at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:282)
    at java.lang.Thread.run(Thread.java:745)
My assumption is that the data I am inserting into the Hive table is too large and that this is what causes the error, but I can't figure out how to solve the issue. I also tried a LIMIT query
CREATE TABLE traffic AS SELECT * FROM test2 LIMIT 1000;
but it also returned the same error.
Created on 05-02-2018 09:28 AM - edited 05-02-2018 09:31 AM
Are you using the Beeline client tool?
Did you try increasing the heap via the property below?
HADOOP_CLIENT_OPTS
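If the job is launched from the Hive CLI, the client-side heap can be raised by exporting that variable before starting the shell. A minimal sketch (the 2 GB value is only an illustration, size it to the memory actually available on the client host):

export HADOOP_CLIENT_OPTS="-Xmx2048m"
hive -e "CREATE TABLE traffic AS SELECT * FROM test2;"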
Just curious to know the following:
What file format are you using?
Is there any compression?
Are table statistics being collected?
Is the table partitioned or bucketed?
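All four of those can be checked in one go by describing the source table, for example (using the table name from the question):

DESCRIBE FORMATTED test2;

The output lists the SerDe/storage handler and input/output formats, the table parameters (including any collected statistics), the Compressed flag, and any partition or bucket columns.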