<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Need Spark Thrift Server Design because STS hang after started about 2 hours in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Need-Spark-Thrift-Server-Design-because-STS-hang-after/m-p/187014#M149116</link>
    <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/48295/anobido.html" nodeid="48295"&gt;@anobi do&lt;/A&gt;&lt;/P&gt;&lt;P&gt;For Spark driver memory, see this link: &lt;A href="https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-driver.html" target="_blank"&gt;https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-driver.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;When you call collect or take, the result is sent to the driver, and the driver will throw an error if that result is larger than its free memory. Hence driver memory is kept large to account for that when you have big datasets. The default, however, is only 1G or 2G, because the driver mainly schedules tasks in cooperation with YARN, while the operations are performed on the executors themselves (which actually hold the data, can cache it, and process it).&lt;/P&gt;&lt;P&gt;As you increase the number of sessions, the STS daemon memory should be increased too, because the daemon has to keep listening for and handling those sessions.&lt;/P&gt;&lt;P&gt;My thrift server process was started like this:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;hive 27597 13 Nov15 ?00:49:53 /usr/lib/jvm/java-1.8.0/bin/java -Dhdp.version=2.6.1.0-129 -cp /usr/hdp/current/spark2-thriftserver/conf/:/usr/hdp/current/spark2-thriftserver/jars/*:/usr/hdp/current/hadoop-client/conf/ -Xmx6000m org.apache.spark.deploy.SparkSubmit --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server spark-internal&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Note that the -Xmx here corresponds to the thrift daemon memory rather than the driver memory; driver memory is taken from &lt;I&gt;spark2-thriftserver/conf/spark-thrift-sparkconf.conf&lt;/I&gt;, which is internally a symbolic link to the copy inside /etc.&lt;/P&gt;&lt;P&gt;If you don't override it there, the default is picked up, so please have spark.executor.memory and spark.driver.memory defined there.&lt;/P&gt;&lt;P&gt;Can you log in to your node, run ps -eaf | grep thrift, and paste the output here?&lt;/P&gt;&lt;P&gt;Did you set &lt;EM&gt;SPARK_DAEMON_MEMORY=6000m&lt;/EM&gt; as I had asked?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Are you using HDP/Ambari?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;If yes, please set it directly here as shown:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/43609-screen-shot-2017-11-16-at-104601-am.png"&gt;screen-shot-2017-11-16-at-104601-am.png&lt;/A&gt;&lt;/P&gt;&lt;P&gt;And set the thrift-server parameters here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/43610-screen-shot-2017-11-16-at-104834-am.png"&gt;screen-shot-2017-11-16-at-104834-am.png&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Just for example.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;If you're not using HDP/Ambari,&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;set &lt;EM&gt;SPARK_DAEMON_MEMORY&lt;/EM&gt; in spark-env.sh and the thrift parameters in /etc/spark2/conf/spark-thrift-sparkconf.conf, then start the thrift-server, for example:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;spark.driver.cores 1&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;spark.driver.memory 40G&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;spark.executor.cores 1&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;spark.executor.instances 13&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;spark.executor.memory 40G&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Alternatively, you can pass the thrift parameters dynamically, as mentioned in the IBM link I sent.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;You can cross-check your configuration in the Environment tab when you open your application in the Spark History Server.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I couldn't find a document explaining the thrift-server in detail either.&lt;/P&gt;&lt;P&gt;Please confirm that you've done the above and cross-check the environment in the Spark UI.&lt;/P&gt;</description>
    <pubDate>Thu, 16 Nov 2017 13:36:04 GMT</pubDate>
    <dc:creator>tsharma</dc:creator>
    <dc:date>2017-11-16T13:36:04Z</dc:date>
  </channel>
</rss>
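<!-- The answer above distinguishes the thrift daemon heap (the -Xmx flag, set via SPARK_DAEMON_MEMORY) from spark.driver.memory, which comes from spark-thrift-sparkconf.conf. A minimal shell sketch of that check; the ps line below is the illustrative example quoted in the post, not output from a live system. On a real node you would feed it from ps -eaf | grep thrift instead. -->

```shell
# Illustrative ps line for the STS daemon (values copied from the post
# above; on a live node use: ps -eaf | grep thrift).
ps_line='hive 27597 13 Nov15 ? 00:49:53 /usr/lib/jvm/java-1.8.0/bin/java -Xmx6000m org.apache.spark.deploy.SparkSubmit --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal'

# Extract the -Xmx value: this is the thrift DAEMON heap
# (SPARK_DAEMON_MEMORY), not spark.driver.memory, which lives in
# spark-thrift-sparkconf.conf.
daemon_heap=$(printf '%s\n' "$ps_line" | grep -o '\-Xmx[0-9]*[mMgG]' | sed 's/^-Xmx//')
echo "STS daemon heap: $daemon_heap"
```

<!-- If the extracted heap is small while many sessions are open, that matches the hang symptom described: raise SPARK_DAEMON_MEMORY rather than spark.driver.memory. -->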