Member since: 09-09-2016
Posts: 31
Kudos Received: 5
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4372 | 01-10-2018 02:25 PM |
09-05-2018 08:39 PM
Thanks, Andrew. I thought that was probably the answer; I was hoping there was a workaround.
01-25-2018 08:17 PM
Hi Eric, could you use this one instead, http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/shc/shc/1.1.0.2.6.3.13-5/ (which is for Spark 2.2)? Updated packages: http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/shc/shc/ Thanks.
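For reference, a minimal sketch of how that Spark 2.2 build of SHC could be pulled in at submit time. The Maven coordinates below are inferred from the repository path (com/hortonworks/shc/shc/1.1.0.2.6.3.13-5) and may need adjusting; the application jar name is hypothetical.

```shell
# Sketch: resolving the Spark 2.2 build of SHC at submit time.
# Coordinates inferred from the repo path above -- verify before use.
spark-submit \
  --repositories http://repo.hortonworks.com/content/groups/public \
  --packages com.hortonworks.shc:shc:1.1.0.2.6.3.13-5 \
  your-app.jar
```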
12-12-2017 04:46 PM
I do have HADOOP_USER_CLASSPATH_FIRST set to true. How do I find where the Hadoop classpath is? In the hadoop-env file it's just set as HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}
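One way to answer that question: the `hadoop` CLI has a `classpath` subcommand that prints the fully expanded classpath, so you can see where HADOOP_CLASSPATH entries land. A sketch, assuming a working Hadoop client install:

```shell
# Print the effective, fully expanded Hadoop classpath.
hadoop classpath

# With HADOOP_USER_CLASSPATH_FIRST=true, entries from HADOOP_CLASSPATH
# should appear before the bundled Hadoop jars in that output.
echo "$HADOOP_CLASSPATH"
```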
06-08-2017 02:18 PM
Development environment: IntelliJ 2017.1.4 / JDK 1.8.40 / Scala 2.10.5 / Spark 1.6.2 / Hive 1.2.1
VM: VirtualBox, 5 VMs (ambari, master, slave1, slave2, slave3)
The reason is that the jar version is wrong. I use HDP 2.4.3.0-227. I modified the following and solved it.
================ modify .bash_profile ================
add: export SPARK_HOME=/usr/hdp/current/spark-client
================ modify pom.xml ================
<repositories>
<repository>
<id>HDP</id>
<name>HDP Releases</name>
<url>http://repo.hortonworks.com/content/groups/public</url>
<!--url>http://repo.hortonworks.com/content/repositories/releases/</url-->
<layout>default</layout>
<releases>
<enabled>true</enabled>
<updatePolicy>always</updatePolicy>
<checksumPolicy>warn</checksumPolicy>
</releases>
<snapshots>
<enabled>false</enabled>
<updatePolicy>never</updatePolicy>
<checksumPolicy>fail</checksumPolicy>
</snapshots>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-yarn_2.10</artifactId>
<version>1.6.2.2.4.3.0-227</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.10.4</version>
</dependency>
<!-- Test -->
</dependencies>
====================================================
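After pointing the build at the HDP repository, a quick way to check that the HDP-versioned artifacts actually resolved is Maven's dependency tree. A sketch, run from the project directory:

```shell
# Show only the Spark artifacts in the resolved dependency tree;
# the versions listed should carry the HDP suffix (1.6.2.2.4.3.0-227).
mvn dependency:tree -Dincludes=org.apache.spark
```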
01-27-2017 09:40 PM
Thank you, Binu. I was thinking that was probably the answer, but I was hoping there was a way to get Hive to work for me. Now, off to figure out HBase...
03-24-2017 03:26 PM
@Eric Hanson I don't have an official opinion on this. It really depends on the available resources. If the cluster is really large, it may be beneficial to put the KDC on its own VM; but for a small cluster (<15 hosts), that may be overkill, and the least utilized host may be sufficient for the KDC. That said, the workload could be spread out by placing one or more slave KDCs around the cluster. There is also the option of separating the kadmin and krb5kdc processes onto different hosts, though this is more for security concerns than for performance or resource concerns.

One thing to keep in mind: for Ambari server versions 2.5.0 and below, the cluster appears to perform an abnormal number of kinits. This is currently being looked into. So far, it is unclear whether this is a bug, expected behavior, or something in between. The effect of this issue on a small cluster is minimal and not noticeable over a short period of time. On a large cluster (say 900 nodes), the Kerberos log files tend to get large quickly. Performance of the KDC on such a cluster, even when the KDC shares a host with Hadoop services, does not appear to be affected; the main issue is merely log file size. However, if an issue is found and fixed, fewer kinits couldn't hurt. 🙂
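For reference, slave KDCs are exposed to clients simply as additional `kdc` entries in the realm stanza of krb5.conf; clients fail over between them. A minimal sketch, with hypothetical host names:

```ini
[realms]
 EXAMPLE.COM = {
  # Master KDC first; clients fall back to the slaves if it is unreachable.
  kdc = kdc-master.example.com
  kdc = kdc-slave1.example.com
  # kadmin only runs against the master, so admin_server stays singular.
  admin_server = kdc-master.example.com
 }
```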