Member since: 09-09-2016
Posts: 31
Kudos Received: 5
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4372 | 01-10-2018 02:25 PM |
09-05-2018 08:39 PM
Thanks, Andrew. I thought that was probably the answer; I was hoping there was a workaround.
01-25-2018 08:17 PM
Hi Eric, could you use this one instead, http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/shc/shc/1.1.0.2.6.3.13-5/ (which is for Spark 2.2)? Updated packages: http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/shc/shc/ Thanks.
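For reference, a minimal sketch of how that Spark 2.2 build of SHC could be pulled in at submit time. The Maven coordinates below are inferred from the repository path (com/hortonworks/shc/shc/1.1.0.2.6.3.13-5) and may need adjusting; the application jar name is hypothetical.

```shell
# Sketch: resolving the Spark 2.2 build of SHC at submit time.
# Coordinates inferred from the repo path above -- verify before use.
spark-submit \
  --repositories http://repo.hortonworks.com/content/groups/public \
  --packages com.hortonworks.shc:shc:1.1.0.2.6.3.13-5 \
  your-app.jar
```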
12-12-2017 04:46 PM
I do have HADOOP_USER_CLASSPATH_FIRST set to true. How do I find where the Hadoop classpath is? In the hadoop-env file it's just set as HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}
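One way to answer that question: the `hadoop` CLI has a `classpath` subcommand that prints the fully expanded classpath, so you can see where HADOOP_CLASSPATH entries land. A sketch, assuming a working Hadoop client install:

```shell
# Print the effective, fully expanded Hadoop classpath.
hadoop classpath

# With HADOOP_USER_CLASSPATH_FIRST=true, entries from HADOOP_CLASSPATH
# should appear before the bundled Hadoop jars in that output.
echo "$HADOOP_CLASSPATH"
```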
06-08-2017 02:18 PM
Development environment: IntelliJ 2017.1.4 / JDK 1.8.40 / Scala 2.10.5 / Spark 1.6.2 / Hive 1.2.1
VM: VirtualBox, 5 VMs (ambari, master, slave1, slave2, slave3)
The reason is that the jar version is wrong. I use HDP 2.4.3.0-227. I modified the following and solved it.
================ modify .bash_profile ================
add: export SPARK_HOME=/usr/hdp/current/spark-client
================ modify pom.xml ================
<repositories>
<repository>
<id>HDP</id>
<name>HDP Releases</name>
<url>http://repo.hortonworks.com/content/groups/public</url>
<!--url>http://repo.hortonworks.com/content/repositories/releases/</url-->
<layout>default</layout>
<releases>
<enabled>true</enabled>
<updatePolicy>always</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>false</enabled>
<updatePolicy>never</updatePolicy>
<checksumPolicy>fail</checksumPolicy>
</snapshots>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-yarn_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.10.4</version>
</dependency>
<!-- Test -->
</dependencies>
====================================================
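After pointing the build at the HDP repository, a quick way to check that the HDP-versioned artifacts actually resolved is Maven's dependency tree. A sketch, run from the project directory:

```shell
# Show only the Spark artifacts in the resolved dependency tree;
# the versions listed should carry the HDP suffix (1.6.2.2.4.3.0-227).
mvn dependency:tree -Dincludes=org.apache.spark
```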
01-27-2017 09:40 PM
Thank you, Binu. I was thinking that was probably the answer, but I was hoping there was a way to get Hive to work for me. Now, off to figure out HBase...
03-24-2017 03:26 PM
@Eric Hanson I don't have an official opinion on this. It really depends on the available resources. If the cluster is really large, it may be beneficial to put the KDC on its own VM; but for a small cluster (<15 hosts), that may be overkill, and the least utilized host may be sufficient for the KDC. That said, the workload could be spread out by placing one or more slave KDCs around the cluster. There is also the option of separating the kadmin and krb5kdc processes onto different hosts, though this is more for security concerns than for performance or resource concerns.

One thing to keep in mind: for Ambari server versions 2.5.0 and below, the cluster appears to perform an abnormal number of kinits. This is currently being looked into. So far, it is unclear whether this is a bug, expected behavior, or something in between. The effect of this issue on a small cluster is minimal and not noticeable over a short period of time. On a large cluster (say 900 nodes), the Kerberos log files tend to get large quickly. Performance of the KDC on such a cluster, even when the KDC shares a host with Hadoop services, does not appear to be affected; the main issue is merely log file size. However, if an issue is found and fixed, fewer kinits couldn't hurt. 🙂
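For reference, slave KDCs are exposed to clients simply as additional `kdc` entries in the realm stanza of krb5.conf; clients fail over between them. A minimal sketch, with hypothetical host names:

```ini
[realms]
 EXAMPLE.COM = {
  # Master KDC first; clients fall back to the slaves if it is unreachable.
  kdc = kdc-master.example.com
  kdc = kdc-slave1.example.com
  # kadmin only runs against the master, so admin_server stays singular.
  admin_server = kdc-master.example.com
 }
```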