08-01-2016 02:37 AM
How much memory do you have, and how much of it is assigned to Spark? Do you have logging enabled so you can check the logs and the history UI? Turn off everything else you can. For debugging, run through the Spark shell; Zeppelin adds overhead and takes a decent amount of YARN resources and RAM. Run on Spark 1.6 / HDP 2.4.2 if you can, and allocate as much memory as possible. Spark is an all-memory beast.

import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

val sparkConf = new SparkConf()
sparkConf.set("spark.cores.max", "16") // all the cores you can
sparkConf.set("spark.serializer", classOf[KryoSerializer].getName) // Kryo is faster and more compact than Java serialization
sparkConf.set("spark.sql.tungsten.enabled", "true") // Tungsten memory management
sparkConf.set("spark.eventLog.enabled", "true") // needed for the history UI
sparkConf.set("spark.app.id", "YourID")
sparkConf.set("spark.io.compression.codec", "snappy") // compress shuffle and internal data
sparkConf.set("spark.rdd.compress", "true") // compress cached RDD partitions
I like to maximize my resources and performance.
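If you stick with Kryo, registering the classes you actually serialize can shrink the serialized data further, since Kryo then avoids writing full class names with each object. A minimal sketch; MyRecord here is a hypothetical stand-in for your own application types:

// Hypothetical record type standing in for your own classes
case class MyRecord(id: Long, name: String)

// Register with the same sparkConf built above
sparkConf.registerKryoClasses(Array(classOf[MyRecord]))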
07-28-2016 02:20 AM · 5 Kudos
@Carles San Agustin You need to increase your OS ulimit. Most likely you have some tables with many partitions and multiple processes accessing them. You will need to change the ulimit on all nodes and restart your servers, which requires downtime. It is good practice to set it upfront, estimating how the cluster will be used with regard to file descriptors. See this: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/ref-729d1fb0-6d1b-459f-a18a-b5eba4540ab5.1.html Also section 1.2.8 here. I cannot tell you the magic number for your case; it depends on what you do and what resources the servers can provide, but I have seen ulimit set anywhere from tens of thousands to hundreds of thousands. The minimum requirement for installing Hortonworks Data Platform is 10,000. Try various numbers. If this response helps, please vote/accept it as the best answer.
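As a rough sketch of what that looks like on a typical Linux node (the 65536 value is only an illustration; pick a limit that fits your workload, and apply it on every node):

# Check the current open-file limit for the current user
ulimit -n

# Raise it persistently in /etc/security/limits.conf
# (example values only; restart the affected services afterwards)
*  soft  nofile  65536
*  hard  nofile  65536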