Member since: 10-09-2015
Posts: 76
Kudos Received: 33
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3537 | 03-09-2017 09:08 PM |
| | 3717 | 02-23-2017 08:01 AM |
| | 1017 | 02-21-2017 03:04 AM |
| | 1026 | 02-16-2017 08:00 AM |
| | 647 | 01-26-2017 06:32 PM |
01-25-2018
07:13 PM
If you have HDP 2.6.3, you should be able to find the Spark 2.2 version of spark-llap under /usr/hdp/current/. Perhaps you are pulling in an older version of SHC via --packages, and that is not compatible with Spark 2.2.
... View more
04-24-2017
08:47 PM
/etc/hive/conf/hive-site.xml is the config for the Hive service itself and is managed via Ambari through the Hive service config page. /usr/hdp/current/spark-client/conf/hive-site.xml actually points to /etc/spark/conf/hive-site.xml. This is the minimal Hive config that Spark needs to access Hive, and it is managed via Ambari through the Spark service config page. Ambari correctly configures this hive-site.xml for Kerberos. Depending on your version of HDP, you may not have the correct support in Ambari for configuring Livy.
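As a quick sanity check (a minimal sketch, Spark 1.6 style; the query is only illustrative), you can verify in spark-shell that Spark picks up its hive-site.xml and can reach the metastore:

```scala
// Run in spark-shell, where `sc` is provided automatically.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("SHOW DATABASES").show()  // lists metastore databases if the config is picked up
```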
... View more
03-17-2017
06:59 PM
When connecting via beeline did that use HiveServer2 or SparkThriftServer?
... View more
03-12-2017
01:51 AM
Is this needed even after the HDP 2.5 native Oozie Spark action?
... View more
03-09-2017
09:08 PM
1 Kudo
Apache Spark has traditionally worked sub-optimally with ORC because ORC used to live inside Apache Hive, and Apache Spark depends on a very old Hive release, 1.2.1 from mid-2015. We are working on figuring out how best to update Apache Spark's version of ORC, either by upgrading Apache Spark's dependency to the latest Apache Hive or by taking the ORC dependency from the new Apache ORC project.
... View more
03-08-2017
06:54 PM
1 Kudo
For JDBC, there is a built-in jar that provides the support. No need for Simba.
... View more
03-08-2017
06:51 PM
2 Kudos
You are probably missing hbase-site.xml or the Phoenix conf in the classpath, so it cannot find the ZooKeeper info for HBase/Phoenix.
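A quick diagnostic sketch (a hypothetical check, not from the original thread): if hbase-site.xml is actually on the classpath, the ZooKeeper quorum it defines will be visible; otherwise HBase falls back to the default of localhost.

```scala
// Prints the ZooKeeper quorum HBase will use; "localhost" usually means
// hbase-site.xml was not found on the classpath.
import org.apache.hadoop.hbase.HBaseConfiguration

val hbaseConf = HBaseConfiguration.create()
println(hbaseConf.get("hbase.zookeeper.quorum"))
```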
... View more
02-23-2017
08:01 AM
What exception/error happens in code 2? Just curious. foreachRDD is the prescribed method for writing to external systems, so you should be using foreachRDD. The outer loop executes on the driver and the inner loop on the executors; executors run on remote machines in the cluster. However, in the code above it is not clear how dynamoConnection is available to the executors, since such network connections are usually not serializable. Or is the following line inadvertently missing from snippet 1? val dynamoConnection = setupDynamoClientConnection() If yes, then the slowness could stem from repeatedly creating a dynamoClientConnection for each record. The recommended pattern is to use foreachPartition() to create the connection once per partition and then iterate over that partition's records using the same connection (see the sketch below). For more info, please search for foreachPartition in http://spark.apache.org/docs/latest/streaming-programming-guide.html
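Here is a minimal sketch of that pattern, assuming setupDynamoClientConnection() is the helper from your snippet and `dstream` is your input DStream (both names are taken from or implied by the question, not a complete implementation):

```scala
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // One connection per partition instead of one per record.
    val dynamoConnection = setupDynamoClientConnection()
    partition.foreach { record =>
      // write `record` to DynamoDB using dynamoConnection
    }
    // release/close dynamoConnection here if your client requires it
  }
}
```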
... View more
02-21-2017
03:04 AM
1 Kudo
That log4j file only affects the service daemons like the Spark history server and anything you run on the client machines. For executors/drivers that run on YARN machines, the log4j file has to be passed to them using the "--files" option during job submit and then referenced via the JVM argument "-Dlog4j.configuration". See here for examples.
... View more
02-16-2017
08:00 AM
1 Kudo
Ambari should give you the option of which HDP 2.5.x version to install. Choosing a higher version gives a higher Apache Spark version. E.g. HDP 2.5.3 will give Apache Spark 2.0.1, and the next HDP 2.5.4+ release will give 2.0.2. HDP 2.6 (not released yet) will have Apache Spark 2.1; you can try a tech preview of that on HDC.
... View more
02-15-2017
03:01 AM
A full stack trace would help in understanding which interaction is resulting in this. If IDE-based code is being used, then you could try not using the spark-assembly jar that is present on HDFS and instead use the local spark-assembly jar from the Spark build being compiled against. This can be done by overriding the spark.yarn.jar config. It could be that the compile dependency of Spark in your IDE is different from the runtime dependency on HDFS. Another possibility is a Scala version mismatch.
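For example (a minimal sketch; the path, app name, and Spark/Hadoop versions are illustrative), the override can be set on the SparkConf used to create the context:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Point spark.yarn.jar at the assembly from the Spark build you compile against,
// instead of the assembly staged on HDFS.
val conf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName("MyApp")
  .set("spark.yarn.jar", "file:///path/to/spark-assembly-1.6.2-hadoop2.7.3.jar")
val sc = new SparkContext(conf)
```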
... View more
02-07-2017
06:43 PM
Yes. For Spark 1.6 it is GA in HDP 2.5.3. The documentation is available from the Github site for a given SHC release tag. That is the source of truth.
... View more
02-01-2017
10:01 PM
1 Kudo
--files will add it to the working directory of the YARN app master and containers, which means that those files (and not jars) will be in the classpath of the app master and containers. But in client-mode jobs the main driver code runs on the client machine, so these --files are not available to the driver. SPARK_CLASSPATH adds these files to the driver classpath. It's an env var, so one could set the following: export SPARK_CLASSPATH=/a/b/c/hbase-site.xml:/d/e/f/hive-site.xml Note that it will warn that SPARK_CLASSPATH is deprecated and cannot be used concurrently with the --driver-class-path option. More information can be found here: https://github.com/hortonworks-spark/shc
... View more
01-30-2017
03:49 AM
2 Kudos
Unfortunately, that kind of functionality does not exist for Spark Streaming. Spark Streaming runs as a standard YARN job, and YARN commands can be used to start, stop (kill), and re-submit a job. A properly written Spark Streaming job should be able to support at-least-once or exactly-once semantics through this lifecycle. But other than that there is no UI or other automation support for it. Zeppelin is designed for interactive analysis, and running Spark Streaming via Zeppelin is not recommended (other than demos for presentations).
... View more
01-26-2017
06:32 PM
This is not available in any distribution, since it's a package and can be used independently. The latest 1.6 release is https://github.com/hortonworks-spark/shc/tree/v1.0.1-1.6 You can build that with the HBase version that matches your environment.
... View more
01-23-2017
07:54 PM
The flow seems right. That's a good use case for Livy, assuming it goes YourApp -> Livy -> Spark and back. You will need to look at the Livy client logs or the Livy logs for session id 339. It seems like the client is asking for a session (Livy Spark job) that does not exist anymore; it could have failed to start, or been abandoned or lost.
... View more
01-23-2017
07:45 PM
1 Kudo
SHC does not have a notion of listing tables in HBase; it works on the table catalog provided to the data source in the program. Hive will also not list HBase tables because they are not present in the metastore. There is some rudimentary way to add HBase external tables in Hive, but I don't think that is really used (I could be wrong). To list HBase tables, currently the only reliable way would be to use the HBase APIs inside the Spark program, as sketched below.
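A minimal sketch of that approach, assuming hbase-site.xml is on the classpath (this is the plain HBase 1.x client API, not SHC itself):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

// List all HBase tables using the client Admin API.
val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
val admin = connection.getAdmin
admin.listTableNames().foreach(t => println(t.getNameAsString))
admin.close()
connection.close()
```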
... View more
01-23-2017
01:33 AM
Hive and HiveContext in Spark can only show the tables that are registered in the Hive metastore, and HBase tables are usually not there because the schemas of most HBase tables are not easily defined in the metastore. To read HBase tables from Spark using the DataFrame API, please consider the Spark HBase Connector (SHC).
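A minimal sketch of reading an HBase table with SHC in spark-shell (the table name, column family, and columns in the catalog are illustrative; see the SHC documentation for the catalog format):

```scala
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// JSON catalog mapping an HBase table to DataFrame columns (illustrative values).
val catalog = s"""{
  "table":{"namespace":"default", "name":"my_table"},
  "rowkey":"key",
  "columns":{
    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf1", "col":"col1", "type":"string"}
  }
}"""

// `sqlContext` is provided by spark-shell in Spark 1.6.
val df = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
df.show()
```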
... View more
01-23-2017
01:30 AM
1 Kudo
In HDP 2.5, the Zeppelin JDBC interpreter can run one query per paragraph. There is a limit of 10 queries that can run simultaneously overall.
... View more
01-17-2017
08:33 PM
First of all, which Spark version are you using? Apache Spark 2.0 has support for automatically acquiring HBase security tokens for the job and all its executors. Apache Spark 1.6 does not have that feature, but in HDP Spark 1.6 we have backported it, so it can also acquire HBase tokens for jobs. The tokens are acquired automatically if 1) security is enabled, 2) hbase-site.xml is present on the client classpath, and 3) that hbase-site.xml has Kerberos security configured. Then HBase tokens for the HBase master specified in that hbase-site.xml are acquired and used in the job. In order to obtain the tokens, the Spark client needs to use HBase code, so specific HBase jars need to be present in the client classpath. This is documented on the SHC GitHub page; search for "secure" on that page. To access HBase inside the Spark jobs, the job obviously needs the HBase jars to be present for the driver and/or executors; that would be part of your existing job submission for non-secure clusters, which I assume already works. If this job is going to be long-running and run beyond the token expiry time (typically 7 days), then you need to submit the Spark job with the --keytab and --principal options so that Spark can use that keytab to re-acquire tokens before the current ones expire.
... View more
01-17-2017
08:23 PM
If this does not work for you, please open a feature request by creating an issue on the GitHub project for SHC. /cc @wyang
... View more
01-17-2017
08:20 PM
Ideally, just before that OWN failure log, there should be an exception or error message about some task for the vertex with id 1484566407737_0004_1. That could give more info. Even if more info is not there, you will be able to find the task attempt that actually failed. That task attempt can show you which machine and YARN container it ran on. Sometimes the logs don't have the error because it was logged to stderr. In that case, the stderr from the container's YARN logs may show the error.
... View more
01-11-2017
09:09 PM
+1. That's what I mentioned in my last comment below. Copying here so everyone gets the context quickly. Ranger KMS could be the issue because it causes problems for getting the HDFS delegation token. If the Z or L user needs to get an HDFS delegation token, then they also need to be superusers for Ranger. You are better off trying with a non-Ranger cluster or adding them to the Ranger superusers, which is different from the core-site superusers.
... View more
01-11-2017
08:29 PM
The AM percent property in YARN is relevant if the cluster has idle resources but an AM is still not being started for the application. On the YARN UI you will see available capacity but the AM not being started. E.g. the cluster has 100GB capacity and is using only 50GB. If you want to run X apps concurrently and each AM needs M GB of resources (per config), then you need X*M GB of capacity for AMs, and this can be used to determine the AM percent as a fraction of the total cluster capacity. On the other hand, if the cluster does not have any capacity at that time (as seen in the YARN UI), then changing the AM percent may not help; the cluster does not have capacity to obtain a container slot for the AM. E.g. the cluster has 100GB capacity and is already using 100GB. In this case you will have to wait for capacity to free up.
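A back-of-the-envelope sketch of that calculation (the numbers are purely illustrative):

```scala
// 100 GB total cluster memory, 10 concurrent apps, 2 GB per AM
// => AMs need 20 GB, i.e. an AM percent of at least 0.2.
val totalClusterGb = 100.0
val concurrentApps = 10
val amGbEach = 2.0
val amPercent = (concurrentApps * amGbEach) / totalClusterGb
println(s"AM percent should be at least $amPercent")  // 0.2
```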
... View more
01-04-2017
03:01 AM
3 Kudos
Ranger KMS could be the issue because it causes problems for getting the HDFS delegation token. If the Z or L user needs to get an HDFS delegation token, then they also need to be superusers for Ranger. You are better off trying with a non-Ranger cluster or adding them to the Ranger superusers, which is different from the core-site superusers.
... View more
01-03-2017
10:01 PM
Why are we changing it in the Zeppelin env? Can this be changed in the Spark interpreter configs? /cc @prabhjyot singh
... View more
01-03-2017
09:56 PM
3 Kudos
If you are trying to authenticate user FOO via LDAP on Zeppelin and then use Zeppelin to launch a %livy.spark notebook as user FOO, then you are using Livy impersonation (this is different from Zeppelin's own impersonation, which is only recommended for the shell interpreter, not the Livy interpreter). User FOO should also exist in the Hadoop cluster because the jobs will eventually run as that user. HDP 2.5.3 should already have all the configs set up for you. It's a bug that livy.spark.master in Zeppelin is not yarn-cluster. Next, Livy should be using the Livy keytab and Zeppelin should be using the Zeppelin keytab. The Zeppelin user needs to be configured as a livy.superuser in the Livy config. The Livy user should be configured as a proxy user in core-site.xml so that YARN/HDFS allow it to impersonate other users (in this case hadoopadmin) when submitting Spark jobs. If that Zeppelin->Livy connection fails, then you will see an exception in Zeppelin and logs in Livy. If that succeeds, then Livy will try to submit the job; if that fails, you will see the exception in the Livy logs. From the exception in your last comment, it appears that the Livy user is not configured as a proxy user properly in core-site.xml. You can check that in the Hadoop configs and may have to restart the affected services if you change it. In HDP 2.5.3 this should already be done for you during Livy installation via Ambari.
... View more
01-03-2017
09:41 PM
Alicia, please see my answer above on Oct 24. If you are running Spark on YARN, you will have to go through the YARN RM UI to get to the Spark UI for a running job; the link to the YARN UI is available from the Ambari YARN service. For a completed job, you will need to go through the Spark History Server; the link to the Spark History Server is available from the Ambari Spark service.
... View more