I have 4.5 million records in a Hive table.
My requirement is to cache this table as a temporary table through the Spark Thrift Server (via beeline), so that Tableau can query the temporary table and generate reports.
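For context, a minimal sketch of what that caching step might look like from beeline; the table names and the Thrift Server host/port here are hypothetical placeholders, not taken from my actual setup:

```sql
-- Connect beeline to the Spark Thrift Server first, e.g.:
--   beeline -u jdbc:hive2://thrift-server-host:10015

-- Cache the Hive table as an in-memory table (names are illustrative)
CACHE TABLE my_table_cached AS SELECT * FROM my_hive_table;

-- Tableau then queries the cached copy
SELECT COUNT(*) FROM my_table_cached;
```

The `CACHE TABLE ... AS SELECT` form materializes the result in executor storage memory, which is why the count afterwards returns quickly.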
I have a 4-node cluster; each node has 50 GB of RAM and 25 vCores. I'm using HDP 2.3 with Spark 1.4.1.
I'm able to cache the table in less than a minute and get the correct count from the temp table. But when I try to execute a select query on a single column (using beeline, against the same Spark sqlContext), I hit an OOM error.
I tried the configurations below without any luck:
As per my understanding, the driver machine has enough RAM, so I should be able to bring the result of the select back to the driver.
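For anyone hitting the same symptom (cache succeeds, full-column select OOMs): the settings below are a sketch of the knobs that usually matter when a select pulls a large result back through the Thrift Server driver. The values are illustrative, not recommendations, and `spark.sql.thriftServer.incrementalCollect` may not be available in every Spark build, so check your version before relying on it:

```shell
# Hypothetical Thrift Server start options; sizes are placeholders
./sbin/start-thriftserver.sh \
  --master yarn-client \
  --driver-memory 16g \
  --conf spark.driver.maxResultSize=8g \
  --conf spark.sql.thriftServer.incrementalCollect=true
```

`spark.driver.maxResultSize` caps the total size of serialized results collected to the driver, and incremental collect (where supported) streams partitions back one at a time instead of collecting the whole result set into driver memory at once.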
What data size do you expect the temp table to hold? With the default configuration, only 54% of an executor's memory allocation is available to hold cached data. See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_spark-guide/content/ch_tuning-spark.html#...
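That 54% figure follows from the Spark 1.x legacy memory model defaults: `spark.storage.memoryFraction` = 0.6 and `spark.storage.safetyFraction` = 0.9, so 0.6 × 0.9 = 0.54 of the executor heap is usable for cached data. A quick back-of-the-envelope check (the heap sizes are illustrative):

```python
# Spark 1.x legacy memory model defaults
MEMORY_FRACTION = 0.6   # spark.storage.memoryFraction
SAFETY_FRACTION = 0.9   # spark.storage.safetyFraction

def storage_memory_gb(executor_heap_gb):
    """Approximate memory available for cached data, in GB."""
    return executor_heap_gb * MEMORY_FRACTION * SAFETY_FRACTION

# A 10 GB executor heap leaves about 5.4 GB for cached data
print(round(storage_memory_gb(10), 2))  # → 5.4
```

So the total cacheable size is roughly 54% of the summed executor heaps, not the raw RAM per node.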
Can you post how big your table is? Please open the Spark UI (in your case, the Application Master UI) and click on Executors; there is a Storage Memory column per executor. If you sum the storage memory across all executors, you will know how large a table you can cache with your Spark executors.