08-01-2016 02:37 AM
How much memory do you have, and how much of it is assigned to Spark? Do you have logging enabled so you can check the logs and the history UI? Turn off everything else you can. For debugging, run through the Spark shell; Zeppelin adds overhead and takes a decent amount of YARN resources and RAM. Run on Spark 1.6 / HDP 2.4.2 if you can, and allocate as much memory as possible. Spark is an all-memory beast.

import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

val sparkConf = new SparkConf()
sparkConf.set("spark.cores.max", "16") // all the cores you can
sparkConf.set("spark.serializer", classOf[KryoSerializer].getName) // Kryo is faster and more compact than Java serialization
sparkConf.set("spark.sql.tungsten.enabled", "true") // Tungsten memory management
sparkConf.set("spark.eventLog.enabled", "true") // needed for the history UI
sparkConf.set("spark.app.id", "YourID")
sparkConf.set("spark.io.compression.codec", "snappy") // compress shuffle and internal data
sparkConf.set("spark.rdd.compress", "true") // compress cached RDD partitions
I like to maximize my resources and performance.
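If you stick with Kryo, registering the classes you actually serialize can shrink the serialized data further, since Kryo then avoids writing full class names with each object. A minimal sketch; MyRecord here is a hypothetical stand-in for your own application types:

// Hypothetical record type standing in for your own classes
case class MyRecord(id: Long, name: String)

// Register with the same sparkConf built above
sparkConf.registerKryoClasses(Array(classOf[MyRecord]))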
07-28-2016 02:20 AM · 5 Kudos
@Carles San Agustin You need to increase your OS ulimit. Most likely you have some tables with many partitions and multiple processes accessing them. You will need to change the ulimit on all nodes and restart your servers, which requires downtime. It is good practice to set it upfront, estimating how the cluster will be used with regard to file descriptors. See this: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ref-729d1fb0-6d1b-459f-a18a-b5eba4540ab5.1.html Also section 1.2.8 here. I cannot tell you the magic number for your case; it depends on what you do and what resources the servers can provide, but I have seen ulimit set anywhere from tens of thousands to hundreds of thousands. The minimum requirement for installing Hortonworks Data Platform is 10,000. Try various numbers. If this response helps, please vote/accept it as the best answer.
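As a rough sketch of what that looks like on a typical Linux node (the 65536 value is only an illustration; pick a limit that fits your workload, and apply it on every node):

# Check the current open-file limit for the current user
ulimit -n

# Raise it persistently in /etc/security/limits.conf
# (example values only; restart the affected services afterwards)
*  soft  nofile  65536
*  hard  nofile  65536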