New Contributor
Posts: 2
Registered: 10-08-2014

Hive java.lang.OutOfMemoryError without any further detail like Heap space, PermGen space, etc.

Hello All,

Our scenario is this: we have a Parquet-formatted table with 200 columns, and the total data size is around 127 GB.

Now we are running the query below:

select a.*,
       rank() OVER (PARTITION BY evnm, sessionid, substr(evts, 1, 10),
                                 evexpncnm, evtypnm, evprtycd, prodcd, chnnm
                    ORDER BY sessionid) as sessionidint
from table a
where datepart='20150502';

Even a plain GROUP BY query fails with the same error: select col1, col2, col3, col4 from table group by col1, col2, col3, col4;

Whether we use rank() over or not, if we simply GROUP BY all the columns mentioned above (evexpncnm, evtypnm, evprtycd, prodcd, chnnm, sessionid), the Hive query consistently fails. It fails whether we select one year of data or just one day (40 small files for a single day). The table is Parquet, Snappy compressed.

The stack trace is shown below. To my surprise, the same query (just the GROUP BY, without rank) works fine in Impala despite Impala's memory limitations. Not sure what exactly the problem is here; the Java OutOfMemoryError does not even say whether it is a heap space issue or something else.

 

2015-06-04 05:33:11,237 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2015-06-04 05:33:11,240 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError
	at sun.misc.Unsafe.allocateMemory(Native Method)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
	at parquet.hadoop.codec.SnappyDecompressor.setInput(SnappyDecompressor.java:99)
	at parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:43)
	at java.io.DataInputStream.readFully(DataInputStream.java:195)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:201)
	at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:521)
	at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:493)
	at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:546)
	at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:339)
	at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:63)
	at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:58)
	at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:267)
	at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:131)
	at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:96)
	at parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:136)
	at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:96)
	at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:127)
	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:196)
	at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:144)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:69)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:244)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)

44 mappers are generated for this query; only 6 complete, after which all the remaining mappers are either killed or fail.

Any idea what the problem could be? The other important settings are below:

 

mapred.child.java.opts = -Djava.net.preferIPv4Stack=true -Xmx2147483648   (i.e. a 2 GB heap)
mapred.child.ulimit = 4194304
io.sort.mb = 768
io.sort.factor = 64
mapred.cluster.reduce.memory.mb = -1

 

Cloudera Employee
Posts: 13
Registered: 10-16-2013

Re: Hive java.lang.OutOfMemoryError without any further detail like Heap space, PermGen space, etc.

Keep in mind that when you use Hive, it runs on MapReduce, and MR has its own configuration settings.

 

From the stack trace it's the mapper that's running out of memory. If the cluster is using YARN, you can increase the memory via the following configs:

mapreduce.map.memory.mb (default 1 GB)

mapreduce.map.java.opts.max.heap (default 800 MB)

 

 

Try doubling the values above as a start. Make sure the maximum container size is at least as large as the value of mapreduce.map.memory.mb.
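For example (a minimal sketch only, not a verified fix: the 2048 MB container and -Xmx1638m heap are illustrative values following the common guideline of keeping the JVM heap at roughly 80% of the container size), the settings can be overridden per Hive session before re-running the failing query:

-- map task container size (illustrative value)
set mapreduce.map.memory.mb=2048;
-- map task JVM heap, roughly 80% of the container (illustrative value)
set mapreduce.map.java.opts=-Xmx1638m;

mapreduce.map.java.opts is the standard MR2 JVM-options property; if I recall correctly, it is what the mapreduce.map.java.opts.max.heap setting in Cloudera Manager ultimately feeds. The same increase can also be made permanent through the MapReduce configuration in Cloudera Manager.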

 

Hope this helps.

New Contributor
Posts: 1
Registered: 05-23-2019

Re: Hive java.lang.OutOfMemoryError without any further detail like Heap space, PermGen space, etc.

I ran into a similar issue. I believe the problem is that when you use rank() over a partition, the rank is calculated within a single partition, so if the entire contents of that partition can't fit in memory on a single node, you'll run out of memory.
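If you want to confirm that a single oversized partition is the cause, a quick check (just a sketch, reusing the columns and the datepart filter from the original query in this thread) is to count the rows per PARTITION BY key and look at the largest groups:

-- count the rows that fall into each rank() partition for one day of data
select evnm, sessionid, substr(evts, 1, 10) as evdt,
       evexpncnm, evtypnm, evprtycd, prodcd, chnnm,
       count(*) as rows_in_group
from table a
where datepart='20150502'
group by evnm, sessionid, substr(evts, 1, 10),
         evexpncnm, evtypnm, evprtycd, prodcd, chnnm
order by rows_in_group desc
limit 20;

If a handful of key combinations account for most of the rows, either refine the PARTITION BY columns or raise the task memory as suggested earlier in the thread.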