Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1349 | 09-11-2019 10:19 AM
 | 8205 | 11-26-2018 07:04 PM
 | 1887 | 11-14-2018 12:10 PM
 | 3894 | 11-14-2018 12:09 PM
 | 2599 | 11-12-2018 01:19 PM
10-26-2018
12:49 PM
@Nikhil Have you checked the Executor tab in the Spark UI, does this help? The RM UI also displays the total memory per application. HTH
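If it helps, per-executor memory can also be listed from the driver in spark-shell (a sketch, not part of the original reply; the figures are the block manager's storage memory, in bytes):
// maximum and remaining storage memory for each executor
sc.getExecutorMemoryStatus.foreach { case (executor, (maxMem, remainingMem)) =>
  println(s"$executor: max=$maxMem remaining=$remainingMem")
}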
10-18-2018
07:29 PM
1 Kudo
@Aditya Sirna I reproduced the problem. It turns out the issue was caused by the metastore location URI pointing to only one of the nodes. To change this you need to run:
hive --config /etc/hive/conf/conf.server --service metatool -listFSRoot
The above lists the current FS roots, so you can spot the locations that need to change. Then issue:
hive --config /etc/hive/conf/conf.server --service metatool -updateLocation <new-location> <old-location>
For example:
hive --config /etc/hive/conf/conf.server --service metatool -updateLocation hdfs://c14/apps/spark/warehouse hdfs://c14-node2.squadron-labs.com:8020/apps/spark/warehouse
HTH
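As a quick sanity check (not part of the original reply), you can re-run the listing after the update to confirm the warehouse root now points at the nameservice rather than a single NameNode:
# the warehouse entries should now show the nameservice URI (e.g. hdfs://c14/...) instead of a node-specific one
hive --config /etc/hive/conf/conf.server --service metatool -listFSRoot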
10-18-2018
03:02 PM
Could you try setting the nameservice ID for the following properties:
spark.history.fs.logDirectory=hdfs://<name_service_id>/spark2-history/
spark.eventLog.dir=hdfs://<name_service_id>/spark2-history/
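For instance, in spark-defaults.conf for the Spark2 History Server (a sketch; "mycluster" is a placeholder for your actual HDFS HA nameservice ID):
# both properties should point at the same HA-aware event-log path
spark.history.fs.logDirectory=hdfs://mycluster/spark2-history/
spark.eventLog.dir=hdfs://mycluster/spark2-history/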
10-18-2018
01:18 PM
@Aditya Sirna Did you try to copy or symlink the /etc/hadoop/conf/core-site.xml into /etc/spark2/conf/ ? If not, please give it a try and let us know how it goes. HTH
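For example, the symlink variant would look like this (a sketch; adjust paths to your layout):
# make the HDFS client configuration visible to Spark2
ln -s /etc/hadoop/conf/core-site.xml /etc/spark2/conf/core-site.xml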
10-12-2018
07:49 PM
1 Kudo
@Shantanu Sharma You should not copy the hive-site.xml from the Hive conf directory for Spark. Spark uses a smaller and rather simpler hive-site.xml, for example:
cat /etc/spark2/conf/hive-site.xml
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/spark</value>
</property>
<property>
<name>hive.metastore.client.connect.retry.delay</name>
<value>5</value>
</property>
<property>
<name>hive.metastore.client.socket.timeout</name>
<value>1800</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive-thrift-fqdn:9083</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10002</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10016</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
</configuration>
The above is an example; just make sure you change the values according to your environment (in particular hive.metastore.uris). HTH
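As a quick check (a sketch, not part of the original reply), once this hive-site.xml is in place under /etc/spark2/conf you can verify from spark-shell that Spark reaches the metastore:
// should list the databases registered in the Hive metastore
spark.sql("show databases").show()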
10-09-2018
02:22 PM
@Jakub Igla The answer is: it depends on your use case. Customers who use doAs=true generally want auditing at the HDFS operation level. But at the same time, as you well said, column-level authorization isn't complete when doAs=true, since users still have access to the underlying data. It's a trade-off, and I've personally seen both approaches used in production. HTH
09-24-2018
12:25 PM
4 Kudos
@subhash parise From HDP 3.0 onwards, Spark has its own separate catalog. This is the reason why you don't see any Hive databases. To work with Hive databases you should use the HiveWarehouseConnector. Link to documentation: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_configure_a_spark_hive_connection.html HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
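A minimal spark-shell sketch of the HiveWarehouseConnector usage covered in the linked documentation (it assumes the connector jar and the LLAP/JDBC settings are already configured):
// build a HiveWarehouseSession on top of the existing SparkSession
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
// queries go through the connector, so the Hive databases are visible
hive.showDatabases().show()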
09-24-2018
12:21 PM
1 Kudo
@subhash parise Have you tried passing the configuration using --conf spark.sql.hive.hiveserver2.jdbc.url="<jdbc url taken from the Ambari Hive configs>"? HTH
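For example, when starting spark-shell (a sketch; the URL below is a placeholder, copy the real value from Ambari > Hive > Configs):
spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<host>:<port>/"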
09-12-2018
01:46 PM
@Michael Bronson In the Spark UI you can go to the Executors tab, where there is a column with GC time. Also, with the configurations I shared above, the GC details will be printed as part of the log output. You can review those using a tool like http://gceasy.io/ HTH
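The executor GC-logging configuration referred to above is not shown in this excerpt; a typical Java 8 variant looks like this sketch (standard JVM flags, not necessarily the exact ones from the earlier reply):
# print GC details into the executor stdout/stderr logs
spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  <rest of the submit arguments>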
09-12-2018
01:44 PM
@Daniel Müller As I thought, the PushedFilters are empty. I checked the spark.sql.orc.filterPushdown implementation details, and it looks like LIMIT is not supported. You can read more in the inline comments here: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
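For reference, this is how the PushedFilters can be inspected from spark-shell (a sketch; the path and column are placeholders):
// enable ORC predicate pushdown and look for "PushedFilters: [...]" in the physical plan
spark.conf.set("spark.sql.orc.filterPushdown", "true")
spark.read.orc("/path/to/orc/table").filter("id > 100").explain()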