Member since 07-10-2017 | 68 Posts | 30 Kudos Received | 5 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 4194 | 02-20-2018 11:18 AM |
 | 3424 | 09-20-2017 02:59 PM |
 | 18063 | 09-19-2017 02:22 PM |
 | 3626 | 08-03-2017 10:34 AM |
 | 2302 | 07-28-2017 10:01 AM |
11-16-2017
05:36 AM
1 Kudo
@anobi do For Spark driver memory, see this link: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-driver.html

Also, when you do a collect or take, the result comes to the driver, and the driver will throw an error if the result is larger than its free memory. Hence driver memory is kept large to account for that if you have big datasets. However, the default is set to 1G or 2G because the driver mainly schedules tasks, working with YARN, while the operations themselves are performed on the executors (which actually hold the data and can cache and process it).

When you increase the number of sessions, STS daemon memory should increase too, because the daemon has to keep listening for and handling sessions. My Thrift Server process was started like this:

hive 27597 13 Nov15 ? 00:49:53 /usr/lib/jvm/java-1.8.0/bin/java -Dhdp.version=2.6.1.0-129 -cp /usr/hdp/current/spark2-thriftserver/conf/:/usr/hdp/current/spark2-thriftserver/jars/*:/usr/hdp/current/hadoop-client/conf/ -Xmx6000m org.apache.spark.deploy.SparkSubmit --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server spark-internal

Note that the -Xmx here corresponds to Thrift daemon memory rather than driver memory. Driver memory is taken from spark2-thriftserver/conf/spark-thrift-sparkconf.conf, which is a symbolic link to one inside /etc. If you don't override it there, it just picks the default. So please define spark.executor.memory and spark.driver.memory there.

Can you log in to your node, run ps -eaf | grep thrift and paste the output here? I had asked you to set SPARK_DAEMON_MEMORY=6000m. Have you done that?

Are you using HDP/Ambari? If yes, please set it directly here, as shown: screen-shot-2017-11-16-at-104601-am.png
And set the Thrift Server parameters here (just as an example): screen-shot-2017-11-16-at-104834-am.png

If you're not using HDP/Ambari, set SPARK_DAEMON_MEMORY in spark-env.sh, set the Thrift parameters in /etc/spark2/conf/spark-thrift-sparkconf.conf, and start the Thrift Server:

spark.driver.cores 1
spark.driver.memory 40G
spark.executor.cores 1
spark.executor.instances 13
spark.executor.memory 40G

Or you can pass the Thrift parameters dynamically, as mentioned in the IBM link I sent. You can cross-check your configuration in the Environment tab when you open your application in the Spark History Server. Even I couldn't find a document explaining the Thrift Server in detail. Please confirm that you've done the above and cross-check the environment in the Spark UI.
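To double-check that the memory settings really are defined in the properties file before restarting, something like the following could help. It is only a minimal sketch in plain Python: the function names and the sample text are made up for illustration, and it assumes the whitespace-separated "key value" form shown above (it does not handle the key=value form that properties files also allow).

```python
# Hypothetical helper names; not part of Spark or Ambari.
REQUIRED_KEYS = ("spark.driver.memory", "spark.executor.memory")


def parse_spark_conf(text):
    """Parse 'key value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace only, so values keep inner spaces.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


def missing_keys(settings, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not settings.get(key)]


# Sample text mirroring the values from the post (illustrative only).
SAMPLE_CONF = """\
spark.driver.cores 1
spark.driver.memory 40G
spark.executor.cores 1
spark.executor.instances 13
spark.executor.memory 40G
"""

if __name__ == "__main__":
    conf = parse_spark_conf(SAMPLE_CONF)
    print("missing:", missing_keys(conf))
```

In practice you would read the text from /etc/spark2/conf/spark-thrift-sparkconf.conf instead of the sample string; an empty "missing" list means both memory settings are present.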
11-14-2017
05:20 PM
Yes, this takes effect in cluster mode too, and it dictates the memory for the Spark History Server and STS daemons. Are you using HDP? If yes, you should be able to set it via Ambari; otherwise set it directly in spark-env.sh. Please do try this.
11-14-2017
09:49 AM
2 Kudos
@anobi do Spark Thrift Server is just a gateway for submitting applications to Spark, so the standard Spark configurations apply directly. Please see the links below; I found them very useful.

https://developer.ibm.com/hadoop/2016/08/22/how-to-run-queries-on-spark-sql-using-jdbc-via-thrift-server/
https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
Main properties: https://spark.apache.org/docs/latest/configuration.html

Also, STS honors this configuration file: /etc/spark2/conf/spark-thrift-sparkconf.conf. So set your spark.executor.memory, spark.driver.memory, spark.executor.cores and spark.executor.instances there. Thank you
11-14-2017
05:40 AM
1 Kudo
Hi, is your Thrift Server crashing, saying there is no JVM heap? This may be related to the STS daemon itself rather than to the driver and executors. Please try increasing the daemon memory in spark-env.sh (this isn't the memory for the driver/executors; it's for the Spark daemons: History Server and STS). It is 1 GB by default. Increase it to 4-6 GB.

# Memory for Master, Worker and history server (default: 1024MB)
export SPARK_DAEMON_MEMORY=6000m

Thank you
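To confirm the new value took effect after a restart, one option is to look for the heap flag on the running process's command line. The sketch below is plain Python with made-up helper names, and it rests on an assumption: that the daemon memory shows up as a -Xmx flag with a k/m/g/t suffix in the ps output. It does not handle a bare byte count.

```python
import re

# Multipliers to MiB for the suffixes this sketch understands.
_UNITS_IN_MIB = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def memory_to_mib(value):
    """Convert strings such as '6000m' or '4g' to MiB (simplified parsing)."""
    match = re.fullmatch(r"(\d+)([kmgt])b?", value.strip().lower())
    if match is None:
        raise ValueError("unsupported memory string: %r" % value)
    number, unit = match.groups()
    return int(number) * _UNITS_IN_MIB[unit]


def max_heap_from_cmdline(cmdline):
    """Return the -Xmx value found in a java command line, or None."""
    match = re.search(r"-Xmx(\d+[kKmMgGtT])\b", cmdline)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Shortened, illustrative command line; paste your own ps output here.
    line = "java -cp /some/conf/ -Xmx6000m org.apache.spark.deploy.SparkSubmit"
    heap = max_heap_from_cmdline(line)
    if heap is not None:
        print(heap, "=", memory_to_mib(heap), "MiB (default is 1024 MiB)")
```

If the reported heap is still 1024 MiB after the change, the setting was probably not picked up by the process you restarted.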
11-14-2017
05:29 AM
Hi Swaapnika, I've tried using Flume for that and had no issues. For Python, take a look at this repository: https://github.com/edenhill/librdkafka (the C library that the confluent-kafka-python client is built on). This is the most exhaustive one, I guess.
11-10-2017
09:49 AM
It is explained in detail here: https://cwiki.apache.org/confluence/display/Hive/Permission+Inheritance+in+Hive
11-10-2017
09:13 AM
1 Kudo
If your text always starts and ends with a ", then you can probably use the transformations below:

text.map(lambda x: (1, x)).reduceByKey(lambda x, y: ' '.join([x, y])).map(lambda x: x[1][1:-1]).flatMap(lambda x: x.split('" "')).collect()

Here text is an RDD read from a file that contains the records "The csv file is about to be loaded into Phoenix" and "another line to parse", wrapped across lines like this:

['"The csv', 'file is about', 'to be loaded into', 'Phoenix"', '"another line', 'to parse"']

While loading, the lines are split on \n. The code joins them back into a single line, strips the leading and trailing ", and splits on " " (quote, space, quote), so you get a list with the portions between successive quotes.
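The same join-and-split idea can be tried without a cluster. This is only a sketch in plain Python, with a made-up function name and the sample lines from above. It uses [1:-1] to drop exactly one quote from each end, assuming the lines carry no trailing characters. A plain list keeps the lines in order, whereas a reduce over an RDD may not when the data spans several partitions.

```python
def split_quoted_records(lines):
    """Rejoin wrapped lines and split them into the quoted records."""
    # Undo the line wrapping: everything becomes one long string.
    joined = " ".join(lines)
    # Drop the very first and very last double quote.
    stripped = joined[1:-1]
    # Records are separated by: closing quote, space, opening quote.
    return stripped.split('" "')


SAMPLE_LINES = [
    '"The csv',
    'file is about',
    'to be loaded into',
    'Phoenix"',
    '"another line',
    'to parse"',
]

if __name__ == "__main__":
    for record in split_quoted_records(SAMPLE_LINES):
        print(record)
```

For the sample lines this yields two records: "The csv file is about to be loaded into Phoenix" and "another line to parse", each without its surrounding quotes.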
11-01-2017
04:29 AM
1 Kudo
Which user are you starting spark-history-server.sh as? Do a su spark before launching the shell script. I think you're starting it as the root user, so it's saying that root doesn't have access to that folder. Since you've given spark ownership, the spark user should be able to access it. If you must start as root, then give root access to that directory.
10-25-2017
04:13 AM
3 Kudos
I assume you're on Spark 2? SparkSession encapsulates SparkConf, SparkContext and SQLContext within itself, so you don't have to create them explicitly. Also, as of Spark 2.0, SparkSession merges SQLContext and HiveContext into one object. When building a session object, for example:

val spark = SparkSession
  .builder()
  .appName("SparkSessionZipsExample")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

.enableHiveSupport() provides the HiveContext functions. So you're able to use the catalog functions, because .enableHiveSupport() gives Spark connectivity to the Hive metastore: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SparkSession.Builder.html#enableHiveSupport()

You'll get more clarity by reading this: https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
10-13-2017
06:58 AM
Also, did you create the Kerberos database? If not, create it:

krb5_newrealm

Do check your /etc/krb5.conf again.