
OOM | Not able to query spark temporary table

New Contributor

I have 4.5 million records in a Hive table.

My requirement is to cache this table as a temporary table through the Spark Thrift Server (via beeline), so that Tableau can query the temporary table and generate reports.

I have a 4-node cluster; each node has 50 GB RAM and 25 vCores. I'm using HDP 2.3 with Spark 1.4.1.

Issue:

-----

I'm able to cache the table in less than a minute and get the correct count from the temp table. But when I try to execute a select query on a single column (using beeline, in the same Spark sqlContext), I hit an OOM error.

I tried the configurations below without any luck (a quick capacity check follows the list):

  • 1) sudo ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.bind.host=10.74.129.175 --hiveconf hive.server2.thrift.port=10002 --master yarn-client --driver-memory 35g --driver-cores 25 --num-executors 4 --executor-memory 35g --executor-cores 25
  • $SPARK_HOME/bin/beeline> cache table temp1 as select * from hive_table;
  • set the following in spark-defaults.conf:
  • spark.driver.maxResultSize 20g
  • spark.kryoserializer.buffer.max 2000mb
  • spark.rdd.compress true
  • spark.speculation true
  • 2) sudo ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.bind.host=10.74.129.175 --hiveconf hive.server2.thrift.port=10002 --master yarn-client --driver-memory 35g --driver-cores 5 --num-executors 11 --executor-memory 35g --executor-cores 5
  • $SPARK_HOME/bin/beeline> cache table temp1 as select * from hive_table;
  • set the following in spark-defaults.conf (same as above):
  • spark.driver.maxResultSize 20g
  • spark.kryoserializer.buffer.max 2000mb
  • spark.rdd.compress true
  • spark.speculation true
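As a quick sanity check, the two attempts can be compared against the cluster's total memory. A minimal sketch in Python, using only the node counts and memory figures from this post (nothing else is assumed):

    # Compare requested executor memory against cluster capacity.
    # All figures come from the post above.
    CLUSTER_NODES = 4
    RAM_PER_NODE_GB = 50
    cluster_ram_gb = CLUSTER_NODES * RAM_PER_NODE_GB   # 200 GB total

    attempts = {
        "attempt 1": {"num_executors": 4,  "executor_memory_gb": 35},
        "attempt 2": {"num_executors": 11, "executor_memory_gb": 35},
    }

    for name, cfg in attempts.items():
        requested_gb = cfg["num_executors"] * cfg["executor_memory_gb"]
        print(f"{name}: {requested_gb} GB requested, {cluster_ram_gb} GB in cluster")
    # attempt 1: 140 GB requested, 200 GB in cluster
    # attempt 2: 385 GB requested, 200 GB in cluster

Note that attempt 2 asks for more executor memory than the cluster physically has, before counting the 35 GB driver or YARN container overhead, so YARN cannot grant all eleven containers as requested.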

As per my understanding, I have enough RAM on the driver machine and should be able to bring the result of the select to the driver.
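For scale, here is a back-of-envelope sketch of the result size a one-column select would push through the driver; the per-value byte count is purely an assumption, while the row count and limits are taken from the post:

    # Back-of-envelope: serialized size of a one-column result over all rows.
    ROWS = 4_500_000            # 4.5 million records in the Hive table
    BYTES_PER_VALUE = 64        # ASSUMPTION: average serialized bytes per value

    result_gb = ROWS * BYTES_PER_VALUE / 1024**3
    print(f"~{result_gb:.2f} GB for one column")               # ~0.27 GB

    DRIVER_MEMORY_GB = 35       # --driver-memory 35g
    MAX_RESULT_SIZE_GB = 20     # spark.driver.maxResultSize 20g
    print(result_gb < MAX_RESULT_SIZE_GB < DRIVER_MEMORY_GB)   # True

Under that assumption the raw column data is far below both limits, which is consistent with the expectation stated above.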


Re: OOM | Not able to query spark temporary table

What data size do you expect the temp table to hold? With the default configuration, only 54% of an executor's memory allocation is used to hold cached data. See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_spark-guide/content/ch_tuning-spark.html#...
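The 54% figure follows from Spark 1.x's legacy (static) memory model, which applies to Spark 1.4.1: the cache gets the executor heap times spark.storage.memoryFraction (default 0.6) times spark.storage.safetyFraction (default 0.9). A minimal sketch of the arithmetic:

    # Spark 1.x static memory model: heap available for cached data.
    # 0.6 * 0.9 = 0.54, i.e. the 54% mentioned above.
    def storage_capacity_gb(executor_memory_gb,
                            memory_fraction=0.6,    # spark.storage.memoryFraction
                            safety_fraction=0.9):   # spark.storage.safetyFraction
        """Approximate heap available for caching on one executor."""
        return executor_memory_gb * memory_fraction * safety_fraction

    print(f"~{storage_capacity_gb(35):.1f} GB per executor")   # ~18.9 GB for 35g

In practice the Spark UI's Storage Memory column reports somewhat less than this, since it is computed from the actual JVM heap rather than the nominal -Xmx value, so summing that column (as suggested in the reply below) is the more reliable check.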


Re: OOM | Not able to query spark temporary table

New Contributor

Size of the temp table is 8 GB.

Re: OOM | Not able to query spark temporary table

@Amit Kumar Agarwal

Can you post how big your table is? Please open the Spark UI (in your case the Application Master UI) and click on Executors; there is a column named Storage Memory per executor. If you sum the storage memory across all executors, you will know how large a table you can cache with your Spark executors.
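A sketch of that check; the per-executor figures below are placeholders, not values from the screenshot, so substitute whatever your own Executors tab reports:

    # Sum the "Storage Memory" column from the Spark UI's Executors tab.
    # PLACEHOLDER values -- replace with what your UI actually shows.
    storage_memory_gb = {
        "executor 1": 16.5,
        "executor 2": 16.5,
        "executor 3": 16.5,
        "executor 4": 16.5,
    }

    total_gb = sum(storage_memory_gb.values())
    print(f"Can cache up to ~{total_gb:.0f} GB of serialized data")  # ~66 GB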

Re: OOM | Not able to query spark temporary table

New Contributor

Please refer to the attachment.

5104-executorssc.png

Re: OOM | Not able to query spark temporary table

Looking at the executor memory available, you can only store about 66 GB of table data (after serialization) at a time.