Created 01-11-2019 06:18 AM
Hello all,
I'm trying to configure HiveServer2 to use Spark, and it works fine with small files. But with a large file (~1.5 GB) it crashes with "GC overhead limit exceeded".
My flow is simple:
1. Load the data from a text file into table_text (the text file is ~1.5 GB).
SQL: load data local inpath 'home/abc.txt' into table table_text;
2. Select the data from table_text and insert it into table_orc (this is the step that crashes).
SQL: insert into table table_orc select id, time, data, path, size from table_text;
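For context, here is roughly what the two tables look like. This is a hypothetical sketch, since I didn't paste the real DDL: the column types and the text delimiter are assumptions, only the column names come from the query above.
create table table_text (
  id bigint, time string, data string, path string, size bigint  -- column types are assumptions
) row format delimited fields terminated by '\t'                 -- delimiter is an assumption
stored as textfile;
create table table_orc (
  id bigint, time string, data string, path string, size bigint
) stored as orc;                                                 -- same columns, stored as ORC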
I guess Spark has to load all the data from table_text and hold it in memory before inserting into table_orc. From what I've read, Spark can be configured so that partitions that don't fit in memory are spilled to disk and read back when needed (RDD persistence).
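For what it's worth, Hive on Spark accepts spark.* properties set from the session itself, so the memory split can be adjusted per session. A minimal sketch, assuming Spark 2.x property names (the values are illustrative, not recommendations, and Hive on Spark manages RDD persistence internally, so there is no persist() call you can make from SQL):
set spark.memory.fraction=0.6;        -- fraction of the heap shared by execution and storage
set spark.memory.storageFraction=0.5; -- part of that fraction protected for cached/persisted data
insert into table table_orc select id, time, data, path, size from table_text;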
My environment:
Ubuntu 16.04
Hive version: 2.3.0
Free memory when launching the query: 4 GB
My config in hive-site.xml:
<property><name>hive.execution.engine</name><value>spark</value></property>
<property><name>spark.master</name><value>local[*]</value></property>
<property><name>spark.eventLog.enabled</name><value>true</value></property>
<property><name>spark.driver.memory</name><value>12G</value></property>
<property><name>spark.executor.memory</name><value>12G</value></property>
<property><name>spark.serializer</name><value>org.apache.spark.serializer.KryoSerializer</value></property>
<property><name>spark.yarn.jars</name><value>/home/cpu60020-local/Documents/Setup/Java/server/spark/jars/*</value></property>
<property><name>spark.eventLog.enabled</name><value>false</value></property>
<property><name>spark.eventLog.dir</name><value>/home/cpu60020-local/Documents/Setup/Hive/apache-hive-2.3.0-bin/log/</value></property>
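To double-check which values a session actually resolved, echoing a property from the Hive shell or beeline prints it back, e.g.:
set spark.master;        -- should print spark.master=local[*]
set spark.driver.memory; -- should print spark.driver.memory=12G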
Please tell me if you have any suggestions. Thanks all!
Created 01-11-2019 09:42 AM
After increasing the heap size in hive-env.sh to 4 GB, it works perfectly without the OOM. It looks like with spark.master set to local[*], Spark runs inside the Hive JVM itself, so the spark.driver.memory / spark.executor.memory values in hive-site.xml never take effect; the heap of the Hive process is what matters, and that is what HADOOP_HEAPSIZE controls.
export HADOOP_HEAPSIZE=4096
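To verify it took effect after restarting HiveServer2, you can look for the -Xmx flag on the running Java process (assuming the standard Hadoop launcher scripts, which pass HADOOP_HEAPSIZE through as -Xmx in megabytes):
ps -ef | grep -i hiveserver2   # the java command line should now contain -Xmx4096m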