Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1349 | 09-11-2019 10:19 AM
 | 8205 | 11-26-2018 07:04 PM
 | 1887 | 11-14-2018 12:10 PM
 | 3894 | 11-14-2018 12:09 PM
 | 2599 | 11-12-2018 01:19 PM
10-26-2018
12:49 PM
@Nikhil Have you checked the Executor tab in the Spark UI, does this help? The RM UI also displays the total memory per application. HTH
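If it helps, per-executor memory can also be listed from the driver in spark-shell (a sketch, not part of the original reply; the figures are the block manager's storage memory, in bytes):
// maximum and remaining storage memory for each executor
sc.getExecutorMemoryStatus.foreach { case (executor, (maxMem, remainingMem)) =>
  println(s"$executor: max=$maxMem remaining=$remainingMem")
}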
10-18-2018
07:29 PM
1 Kudo
@Aditya Sirna I reproduced the problem. It turns out the issue was caused by the metastore location URI pointing to only one of the nodes. To change this you need to run:
hive --config /etc/hive/conf/conf.server --service metatool -listFSRoot
The above lists the current FS roots, so you can spot the locations that need to change. Then issue:
hive --config /etc/hive/conf/conf.server --service metatool -updateLocation <new-location> <old-location>
For example:
hive --config /etc/hive/conf/conf.server --service metatool -updateLocation hdfs://c14/apps/spark/warehouse hdfs://c14-node2.squadron-labs.com:8020/apps/spark/warehouse
HTH
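As a quick sanity check (not part of the original reply), you can re-run the listing after the update to confirm the warehouse root now points at the nameservice rather than a single NameNode:
# the warehouse entries should now show the nameservice URI (e.g. hdfs://c14/...) instead of a node-specific one
hive --config /etc/hive/conf/conf.server --service metatool -listFSRoot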
10-18-2018
03:02 PM
Could you try setting the nameservice ID for the following properties:
spark.history.fs.logDirectory=hdfs://<name_service_id>/spark2-history/
spark.eventLog.dir=hdfs://<name_service_id>/spark2-history/
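For instance, in spark-defaults.conf for the Spark2 History Server (a sketch; "mycluster" is a placeholder for your actual HDFS HA nameservice ID):
# both properties should point at the same HA-aware event-log path
spark.history.fs.logDirectory=hdfs://mycluster/spark2-history/
spark.eventLog.dir=hdfs://mycluster/spark2-history/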
10-18-2018
01:18 PM
@Aditya Sirna Did you try to copy or symlink the /etc/hadoop/conf/core-site.xml into /etc/spark2/conf/ ? If not, please give it a try and let us know how it goes. HTH
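For example, the symlink variant would look like this (a sketch; adjust paths to your layout):
# make the HDFS client configuration visible to Spark2
ln -s /etc/hadoop/conf/core-site.xml /etc/spark2/conf/core-site.xml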
10-12-2018
07:49 PM
1 Kudo
@Shantanu Sharma You should not copy the hive-site.xml from the Hive conf directory for Spark. Spark uses a smaller and rather simpler hive-site.xml, for example:
cat /etc/spark2/conf/hive-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/spark</value>
</property>
<property>
<name>hive.metastore.client.connect.retry.delay</name>
<value>5</value>
</property>
<property>
<name>hive.metastore.client.socket.timeout</name>
<value>1800</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive-thrift-fqdn:9083</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10002</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10016</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
</configuration>
The above is an example; just make sure you change the values according to your environment (in particular hive.metastore.uris). HTH
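As a quick check (a sketch, not part of the original reply), once this hive-site.xml is in place under /etc/spark2/conf you can verify from spark-shell that Spark reaches the metastore:
// should list the databases registered in the Hive metastore
spark.sql("show databases").show()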
10-09-2018
02:22 PM
@Jakub Igla The answer is: it depends on your use case. Customers who use doAs=true generally want auditing at the HDFS operation level. But at the same time, as you well said, column-level authorization isn't complete when doAs=true, since users still have access to the underlying data. It's a trade-off, and I've personally seen both approaches used in production. HTH
09-24-2018
12:25 PM
4 Kudos
@subhash parise From HDP 3.0 onwards, Spark has its own separate catalog. This is the reason why you don't see any Hive databases. To work with Hive databases you should use the HiveWarehouseConnector. Link to documentation: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_configure_a_spark_hive_connection.html HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
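A minimal spark-shell sketch of the HiveWarehouseConnector usage covered in the linked documentation (it assumes the connector jar and the LLAP/JDBC settings are already configured):
// build a HiveWarehouseSession on top of the existing SparkSession
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
// queries go through the connector, so the Hive databases are visible
hive.showDatabases().show()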
09-24-2018
12:21 PM
1 Kudo
@subhash parise Have you tried passing the configuration using --conf spark.sql.hive.hiveserver2.jdbc.url="<jdbc url taken from the Ambari Hive configs>"? HTH
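For example, when starting spark-shell (a sketch; the URL below is a placeholder, copy the real value from Ambari > Hive > Configs):
spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<host>:<port>/"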
09-12-2018
01:46 PM
@Michael Bronson In the Spark UI you can go to the Executors tab, where there is a column with GC time. Also, with the configurations I shared above, the GC details will be printed as part of the log output. You can review those using a tool like http://gceasy.io/ HTH
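The executor GC-logging configuration referred to above is not shown in this excerpt; a typical Java 8 variant looks like this sketch (standard JVM flags, not necessarily the exact ones from the earlier reply):
# print GC details into the executor stdout/stderr logs
spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  <rest of the submit arguments>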
09-12-2018
01:44 PM
@Daniel Müller As I thought, the PushedFilters are empty. I checked the spark.sql.orc.filterPushdown implementation details, and it looks like LIMIT is not supported. You can read more in the inline comments here: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
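For reference, this is how the PushedFilters can be inspected from spark-shell (a sketch; the path and column are placeholders):
// enable ORC predicate pushdown and look for "PushedFilters: [...]" in the physical plan
spark.conf.set("spark.sql.orc.filterPushdown", "true")
spark.read.orc("/path/to/orc/table").filter("id > 100").explain()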