Member since: 03-18-2016
Posts: 4
Kudos Received: 2
Solutions: 0
01-11-2018 07:23 PM
Thanks @Dongjoon Hyun for your reply. We'll hopefully be upgrading to HDP 2.6.3 in the near future and will be able to take advantage of the new speed improvements.
01-09-2018 07:30 PM (1 Kudo)
Hello,

I have a simple query of an ORC table which selects a relatively small number of rows from a 10 billion row table. The query is of this form:

select * from <table> where <col>=<value>

On Hive using Tez it runs in a few seconds. However, using Spark SQL it takes about 5 minutes. Based on everything I see, it sure seems like Spark is sweeping through the entire table. I've even set spark.sql.orc.filterPushdown=true, but it doesn't help.

Is it reasonable to expect that Spark SQL's performance should be close to that of Hive's? I'm running HDP 2.6.0.3 using Spark 2.1.0.

Thanks,
Jerrell
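
For anyone trying to reproduce this, here is a minimal PySpark sketch of how the setting mentioned above could be applied and the physical plan inspected. It is only an illustration under assumptions: the helper name `explain_point_lookup` is hypothetical, and `spark.sql.hive.convertMetastoreOrc` is not from the post; it is included on the assumption that the table is read through the Hive metastore, where Spark 2.x may not use its native ORC reader unless that option is enabled.

```python
# Settings to apply before running the lookup.
# The first key is the one from the post; the second is an assumption
# (it asks Spark to read metastore ORC tables with its own ORC data source).
ORC_SETTINGS = {
    "spark.sql.orc.filterPushdown": "true",
    "spark.sql.hive.convertMetastoreOrc": "true",
}


def explain_point_lookup(spark, table, column, value):
    """Apply ORC_SETTINGS, build `select * from table where column = value`
    through the DataFrame API, and print the query plans.

    `spark` is an existing SparkSession; the function returns the DataFrame
    without triggering the scan itself.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql.functions import col

    for key, setting in ORC_SETTINGS.items():
        spark.conf.set(key, setting)

    df = spark.table(table).where(col(column) == value)
    # Prints the logical and physical plans to stdout.
    df.explain(True)
    return df
```

If the predicate is being pushed down, the scan node in the printed physical plan would normally list it under `PushedFilters`; if the scan is a Hive table scan with no such entry, that is consistent with the full-table sweep described above.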
Labels: Apache Hive, Apache Spark
03-18-2016 11:32 PM (1 Kudo)
I'm attempting to run a Spark job via YARN using Gremlin (graph traversal language). However, the Application Master dies with a "bad substitution" error. I can see in the error message that ${hdp.version} isn't being resolved.

According to various sources online, I should be able to set the following property when I submit my job to fix the issue:

spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485

It sure seems like this should work, but it doesn't. Can anybody help?
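
As a sketch of what passing that property looks like in practice, the snippet below builds the corresponding `--conf` arguments for a `spark-submit` style launcher. The helper names are hypothetical, and setting the same option on `spark.driver.extraJavaOptions` is an assumption (it is often suggested alongside the AM option, but it is not in the post). With Gremlin the same key/value pairs would presumably go into whatever properties file the job is launched from instead of a command line.

```python
def hdp_version_conf(hdp_version):
    """Return Spark properties that define hdp.version as a JVM system property."""
    java_opt = "-Dhdp.version={}".format(hdp_version)
    return {
        # Property quoted in the post.
        "spark.yarn.am.extraJavaOptions": java_opt,
        # Assumption: the driver JVM may need the same system property.
        "spark.driver.extraJavaOptions": java_opt,
    }


def as_spark_submit_args(conf):
    """Flatten a property dict into ['--conf', 'key=value', ...] in key order."""
    args = []
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args


if __name__ == "__main__":
    # Version string taken from the post.
    print(" ".join(as_spark_submit_args(hdp_version_conf("2.3.4.0-3485"))))
```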
Labels: Apache Spark