Member since: 10-09-2015
Posts: 76
Kudos Received: 33
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4950 | 03-09-2017 09:08 PM |
| | 5279 | 02-23-2017 08:01 AM |
| | 1705 | 02-21-2017 03:04 AM |
| | 2075 | 02-16-2017 08:00 AM |
| | 1089 | 01-26-2017 06:32 PM |
01-25-2018
07:13 PM
If you have HDP 2.6.3, then you should be able to find the Spark 2.2 build of spark-llap available under /usr/hdp/current/. Perhaps you are pulling an older version of shc via --packages, and that's not compatible with Spark 2.2.
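For example, something along these lines — note the assembly jar name below is illustrative, so check the actual file name on your cluster:

```shell
# Locate the Spark 2.2 build of spark-llap shipped with HDP 2.6.3
ls /usr/hdp/current/spark-llap/

# Use the HDP-bundled jar via --jars instead of pulling shc via --packages
spark-shell --jars /usr/hdp/current/spark-llap/spark-llap-assembly.jar
```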
04-24-2017
08:47 PM
/etc/hive/conf/hive-site.xml is the config for the Hive service itself and is managed via Ambari through the Hive service config page. /usr/hdp/current/spark-client/conf/hive-site.xml actually points to /etc/spark/conf/hive-site.xml. That is the minimal Hive config that Spark needs to access Hive, and it is managed via Ambari through the Spark service config page. Ambari correctly configures this hive-site for Kerberos. Depending on your version of HDP, you may not have the correct support in Ambari for configuring Livy.
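To see which copy a Spark client actually reads, you can follow the symlink and compare the two files (a quick check, assuming the HDP-style paths described above):

```shell
# The Spark client conf is a symlink into /etc/spark/conf
readlink -f /usr/hdp/current/spark-client/conf/hive-site.xml

# Compare Spark's minimal copy against the full Hive service config
diff /etc/spark/conf/hive-site.xml /etc/hive/conf/hive-site.xml | head
```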
03-12-2017
01:51 AM
Is this still needed even after the native Oozie Spark action in HDP 2.5?
03-09-2017
09:08 PM
1 Kudo
Apache Spark has traditionally worked sub-optimally with ORC because ORC used to live inside Apache Hive, and Apache Spark depends on a very old Hive release, 1.2.1, from mid-2015. We are working out how best to update Apache Spark's version of ORC: either by upgrading Spark's dependency to the latest Apache Hive, or by taking the ORC dependency from the new Apache ORC project.
03-08-2017
06:54 PM
1 Kudo
There is a built-in jar for JDBC support. No need for Simba.
03-08-2017
06:51 PM
2 Kudos
You are probably missing hbase-site.xml or the Phoenix conf on the classpath, so it cannot find the ZooKeeper info for HBase/Phoenix.
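One common fix is to ship hbase-site.xml at submit time (paths and jar name below follow the usual HDP layout but are assumptions; the main class and app jar are placeholders):

```shell
# hbase-site.xml carries the ZooKeeper quorum the HBase/Phoenix client needs,
# so ship it to the driver and executors along with the Phoenix client jar.
spark-submit \
  --files /etc/hbase/conf/hbase-site.xml \
  --jars /usr/hdp/current/phoenix-client/phoenix-client.jar \
  --class com.example.MyApp my-app.jar
```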
02-23-2017
08:01 AM
What exception/error happens in code 2? Just curious.

foreachRDD is the prescribed method for writing to external systems, so you should be using foreachRDD. The outer loop executes on the driver and the inner loop on the executors; executors run on remote machines in the cluster. However, in the code above it is not clear how dynamoConnection is available to the executors, since such network connections are usually not serializable. Or is the following line inadvertently missing from snippet 1?

val dynamoConnection = setupDynamoClientConnection()

If yes, then the slowness could stem from repeatedly creating a dynamoClientConnection for each record. The recommended pattern is to use foreachPartition() to create the connection once per partition, and then write each partition's records using that connection. For more info, search for foreachPartition in http://spark.apache.org/docs/latest/streaming-programming-guide.html
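A minimal sketch of that pattern, assuming a hypothetical setupDynamoClientConnection() helper with save/close methods (names are illustrative, not a real API):

```scala
// One connection per partition, not per record. The body of
// foreachPartition runs on the executor, so the non-serializable
// client is created there and never shipped from the driver.
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    val dynamoConnection = setupDynamoClientConnection() // hypothetical helper
    partition.foreach { record =>
      dynamoConnection.save(record) // hypothetical write call
    }
    dynamoConnection.close()
  }
}
```

This amortizes the connection setup cost over all records in a partition, instead of paying it once per record.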
02-21-2017
03:04 AM
1 Kudo
That log4j file only affects the service daemons (like the Spark History Server) and anything you run on the client machines. For executors/drivers that run on YARN machines, the log4j file has to be shipped to them using the "--files" option during job submit and then referenced via the "-Dlog4j.configuration" JVM argument.
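A hedged example of that submit-time wiring (the properties file name, main class, and app jar are placeholders):

```shell
# Ship a custom log4j.properties to every YARN container, then point both
# the driver and executor JVMs at it. The name after --files is the name
# the -Dlog4j.configuration argument must reference.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files my-log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=my-log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=my-log4j.properties" \
  --class com.example.MyApp my-app.jar
```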
02-16-2017
08:00 AM
1 Kudo
Ambari should give you the option of which HDP 2.5.x version to install; choosing a higher version gives a higher Apache version. E.g. HDP 2.5.3 ships Apache Spark 2.0.1, and the next HDP 2.5.4+ release will ship 2.0.2. HDP 2.6 (not released yet) will have Apache Spark 2.1; you can try a tech preview of that on HDC.