Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 973 | 09-11-2019 10:19 AM
 | 6464 | 11-26-2018 07:04 PM
 | 1310 | 11-14-2018 12:10 PM
 | 2644 | 11-14-2018 12:09 PM
 | 1955 | 11-12-2018 01:19 PM
11-05-2018
12:25 PM
@Muhammad Umar Each executor can have 1 or more threads to perform parallel computation. In YARN master mode the default is 1. The number of threads can be increased with the command-line parameter --executor-cores.

a) If I have specified 8 num_executors for an application and I don't set executor-cores, will each executor use all the cores? In YARN master mode the default is 1, so each executor will use only 1 core by default.

b) As each node has 8 cores, if I specify executor_cores = 4, does that mean the cores used by each executor will not exceed 4, while the total cores per node are 8? Assignment of cores is static, not dynamic, and it remains the same for the duration of the application. If you set executor cores to 4, each executor will start and run with 4 cores/threads to perform parallel computation.

c) What are the criteria for specifying executor_cores for a Spark application? Increasing executor cores affects performance. You need to take into consideration the number of virtual cores available on each node, and my recommendation is not to go above 4 in most cases.

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
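For example, a spark-submit invocation that sets both the executor count and cores explicitly might look like this sketch (the memory setting, class name, and application jar are placeholders):

spark-submit --master yarn \
  --num-executors 8 \
  --executor-cores 4 \
  --executor-memory 4G \
  --class com.example.MyApp myapp.jar

With these values, each of the 8 executors runs up to 4 tasks in parallel, matching scenario (b) above.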
10-26-2018
12:49 PM
@Nikhil Have you checked the Executors tab in the Spark UI? Does that help? The RM UI also displays the total memory per application. HTH
10-18-2018
07:29 PM
1 Kudo
@Aditya Sirna I reproduced the problem. It turns out the issue was caused by the metastore location URI pointing to one of the nodes only. To change this, first list the current filesystem roots:

hive --config /etc/hive/conf/conf.server --service metatool -listFSRoot

The above will list the locations, so you can spot the ones you need to change. Then issue:

hive --config /etc/hive/conf/conf.server --service metatool -updateLocation <new-location> <old-location>

For example:

hive --config /etc/hive/conf/conf.server --service metatool -updateLocation hdfs://c14/apps/spark/warehouse hdfs://c14-node2.squadron-labs.com:8020/apps/spark/warehouse

HTH
10-18-2018
03:02 PM
Could you try setting the nameservice ID for the following properties?

spark.history.fs.logDirectory=hdfs://<name_service_id>/spark2-history/
spark.eventLog.dir=hdfs://<name_service_id>/spark2-history/
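If you are not sure of the nameservice ID on your cluster, it can usually be read from the HDFS client configuration, for example:

hdfs getconf -confKey dfs.nameservices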
10-18-2018
01:18 PM
@Aditya Sirna Did you try to copy or symlink /etc/hadoop/conf/core-site.xml into /etc/spark2/conf/ ? If not, please give it a try and let us know how it goes. HTH
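For example, a symlink could be created like this (adjust the paths if your layout differs):

ln -s /etc/hadoop/conf/core-site.xml /etc/spark2/conf/core-site.xml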
10-12-2018
07:49 PM
1 Kudo
@Shantanu Sharma You should not copy the hive-site.xml from the Hive conf directory for Spark. Spark uses a smaller and rather simple hive-site.xml:

cat /etc/spark2/conf/hive-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/spark</value>
</property>
<property>
<name>hive.metastore.client.connect.retry.delay</name>
<value>5</value>
</property>
<property>
<name>hive.metastore.client.socket.timeout</name>
<value>1800</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive-thrift-fqdn:9083</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10002</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10016</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
</configuration>

The above is an example; just make sure you change the values according to your environment (e.g. hive.metastore.uris). HTH
10-09-2018
02:22 PM
@Jakub Igla The answer is that it depends on your use case. What I hear customers say, when they use doAs=true, is that they like to have auditing at the HDFS operation level. But at the same time, as you well said, column-level authorization isn't complete when doAs=true, since users have access to the underlying data. It's a give and take, and personally I've seen both approaches used in production. HTH
09-24-2018
12:25 PM
4 Kudos
@subhash parise From HDP 3.0 onwards, Spark has its own separate catalog. This is the reason you don't see any Hive databases. To work with Hive databases you should use the HiveWarehouseConnector. Link to documentation: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_configure_a_spark_hive_connection.html HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
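For illustration, one common way to launch with the connector is to pass its assembly jar and the HiveServer2 JDBC URL; the jar path below is the usual HDP 3 location and the URL is a placeholder, so adjust both for your cluster:

spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="<jdbc url taken from ambari hive conf>"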
09-24-2018
12:21 PM
1 Kudo
@subhash parise Have you tried passing the configuration using --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc url taken from ambari hive conf" ? HTH
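For context, a full launch command might look like this sketch (the URL, class, and application jar are placeholders):

spark-submit --conf spark.sql.hive.hiveserver2.jdbc.url="<jdbc url taken from ambari hive conf>" \
  --class com.example.MyApp myapp.jar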
09-12-2018
01:46 PM
@Michael Bronson In the Spark UI you can go to the Executors tab, where there is a column with GC time. Also, with the configurations I shared above, the GC details will be printed as part of the log output. You can review those with any tool, such as http://gceasy.io/ HTH
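For illustration only (these are common JVM GC-logging flags for Java 8, not necessarily the exact configuration referenced above), GC details can be written to the executor logs with something like:

--conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

The resulting executor logs can then be downloaded and analyzed with a tool such as http://gceasy.io/.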