Member since: 11-08-2016
Posts: 32
Kudos Received: 7
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
| 1079 | 11-09-2016 08:58 AM |
04-20-2017 05:51 AM
Hi @Jay Zhou, can you be a bit more specific about what you changed? What exactly did you do with this line: val hiveSqlContext = new org.apache.spark.sql.hive.HiveContext(sc)? I have a similar problem where I get the error "WARN Hive: Failed to access metastore. This class should not accessed in runtime.", but only when I run the job via Oozie. When I use spark-submit the code works, so I guess the dependencies are right. Do you have any idea what can cause this?
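For context, this is roughly how I wire it up in my own job. This is only a minimal sketch (object name, app name and the query are made up for illustration), using the Spark 1.6 API discussed in this thread:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveContextSketch {
  def main(args: Array[String]): Unit = {
    // master is supplied by spark-submit / the Oozie Spark action, not hard-coded here
    val conf = new SparkConf().setAppName("HiveContextSketch")
    val sc = new SparkContext(conf)
    val hiveSqlContext = new HiveContext(sc)
    // this only succeeds when hive-site.xml and the metastore jars are on the classpath
    hiveSqlContext.sql("show databases").show()
    sc.stop()
  }
}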
03-23-2017 06:39 AM
Hi @Vinod Bonthu, your answer really helped, but unfortunately it wasn't the complete solution. As stated in my error log, I had to add some jars via spark-submit --jars, and I also added my hive-site with spark-submit --files /usr/hdp/current/spark-client/conf/hive-site.xml. With those two changes and removing .setMaster("local[2]") it worked! Thanks for your help!
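For anyone who finds this later, a rough sketch of what my code and submit command look like now. The exact jar list came from the error message in my log, so the --jars value below is only a placeholder, and the HiveContext line is illustrative:

// Submit roughly like this (placeholder in angle brackets):
//   spark-submit --class com.test.spark.Test \
//     --master yarn --deploy-mode cluster \
//     --jars <datanucleus and hive jars listed in the error message> \
//     --files /usr/hdp/current/spark-client/conf/hive-site.xml \
//     hdfs://HDP25/test.jar
package com.test.spark

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object Test {
  def main(args: Array[String]): Unit = {
    // no .setMaster("local[2]") any more – the master comes from spark-submit
    val conf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc) // illustrative; my original test only printed
    println("Hello World!")
    sc.stop()
  }
}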
03-22-2017 11:21 AM
Yes I did. The result is -rwxr-xr-x 3 USER USER 12252 2017-03-17 10:49 /path-to-jar/my.jar and the output when I run the spark-submit command is stated above.
03-22-2017 10:33 AM
Hi @Aditya Deshpande,
17/03/22 11:27:53 INFO HiveContext: Initializing execution hive, version 1.2.1
17/03/22 11:27:53 INFO ClientWrapper: Inspected Hadoop version: 2.7.3.2.5.0.0-1245
17/03/22 11:27:53 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.5.0.0-1245
17/03/22 11:27:53 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/03/22 11:27:53 INFO ObjectStore: ObjectStore, initialize called
17/03/22 11:27:53 WARN HiveMetaStore: Retrying creating default database after error: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
...
...
That is the first error I find in the stderr logs. Another one is:
17/03/22 11:27:53 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/03/22 11:27:53 INFO ObjectStore: ObjectStore, initialize called
17/03/22 11:27:53 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
...
...
Can you make use of this?
03-22-2017 06:41 AM
Hi @Vinod Bonthu, thanks for the quick guide, but I have a follow-up question. It seems my pom is fine, since I can package it with Maven. However, when I run it via spark-submit I get the error stated in my original post. But when I run one of the Spark examples with the command below, it works. So why is that? I have the feeling that something is still not right, and as a first step I just want to run example code packaged by myself via spark-submit. Afterwards I will try an Oozie workflow. Thanks again!
spark-submit --class org.apache.spark.examples.SOMEEXAMPLE --master yarn-cluster hdfs://path-to-spark-examples.jar
03-21-2017 05:20 PM
Hi pbarna, oh I am sorry. Yes, I could run mvn package without any errors (at the beginning some dependencies were missing, but I fixed that). The error in my original post occurs when I try to run my packaged jar on HDP. So I do not need any cluster- or Hortonworks-specific things in my pom, right?
03-21-2017 02:59 PM
Hi folks, I would like to make a minimal example packaged with Maven based on Scala code (like a hello world) and run it on an HDP 2.5 sandbox. What do I need to specify in my pom.xml? So far I have this:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.test.spark</groupId>
<artifactId>Test</artifactId>
<version>0.0.1</version>
<name>${project.artifactId}</name>
<description>Simple test app</description>
<inceptionYear>2017</inceptionYear>
<!-- change from 1.6 to 1.7 depending on Java version -->
<properties>
<maven.compiler.source>1.6</maven.compiler.source>
<maven.compiler.target>1.6</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.11.5</scala.version>
<scala.compat.version>2.11</scala.compat.version>
<spark.version>1.6.1</spark.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<!-- Spark dependency -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.compat.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<!-- Spark sql dependency -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.compat.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<!-- Spark hive dependency -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.compat.version}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<!-- Create JAR with all dependencies -->
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
<plugin>
<!-- see http://davidb.github.com/scala-maven-plugin -->
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.2</version>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<scalaCompatVersion>${scala.compat.version}</scalaCompatVersion>
</configuration>
<executions>
<execution>
<phase>compile</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
And my Scala code is this:
package com.test.spark
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql
import org.apache.commons.lang
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.rdd.RDD
object Test {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Test").setMaster("local[2]")
val spark = new SparkContext(conf)
println( "Hello World!" )
}
}
I run the code with spark-submit --class com.test.spark.Test --master yarn --deploy-mode cluster hdfs://HDP25/test.jar
Unfortunately it does not run. 😞 See the image attached. Am I missing something? Can you please help me to get a minimal example running? Thanks and kind regards
Labels: Apache Spark
02-27-2017 10:15 AM
Hi Sunile, thanks for your answer. We think we will store our initial model and then all alter scripts. But all alter scripts will also be folded into the initial model in case a complete re-deployment is wanted. To view a logical model, a tool will be used which can reverse engineer the DDL. We will try to establish a workflow in that fashion and hope it works.
02-14-2017 09:38 AM
Hi guys,
I am looking for best practices for using Hive, especially for database modeling, software development and, if possible, version control. At the moment I struggle at the point where the logical world meets the code.

I have found tools which assist with modeling databases in Hive, e.g. Embarcadero (a Hortonworks partner?). So I could model my databases there and generate DDL scripts, I guess. For version control I can add those scripts to Git or something else. But what happens if many users want to work on the logical model? How do you handle such problems? Jumping back and forth between database model versions is only possible with Git versioning, not with the tool alone, or is it?

All other scripts regarding the Hive databases and tables (ingestion and so on) live in a Git repository. So they are under perfect version control, but if something changes, many adaptations have to be made (at least one in the config file and maybe in insert statements etc.). What I am missing in the code world is a nice view of databases and tables, or even an entity-relationship diagram, but that is not the main interest.

What is, in your opinion, a good way to tackle these problems? I mean, someone like Facebook does not want to manage all tables and databases via a Hive View or solely based on code, or do they? How do you keep the overview in the big data world? Any help is really appreciated! Thanks in advance! Kind regards, Ken
Labels: Apache Hive
02-13-2017 08:02 AM
1 Kudo
Hi Timothy, okay, I had a closer look into it. To me it looks like Apache NiFi (Hortonworks DataFlow) is more or less a tool for piping your data from a non-Hadoop system (RDBMS, IoT, ...) into Hadoop. Thereafter, another tool is needed to manage the data; here, Apache Falcon has its strengths. Airflow, Luigi and Azkaban are solutions for broader scheduling tasks and need more effort to be installed next to your cluster. Quickly dipping my toe into scheduling with Spark, I didn't come up with many resources. Last but not least, Oozie (e.g. managed via Hue) seems like the easiest fit to manage all kinds of workflows (Sqoop, Hive, Shell, Spark, ...) within a cluster. It gives me dependencies between single actions, whereas dependencies between single coordinators are missing. In my humble opinion this functionality can be added with flag files, as sketched below. I think Oozie is still the best fit, although it is cumbersome to handle via XML files. Of course there is also the Eclipse plugin to visualize workflows and create them. Feel free to correct my views. Thanks!
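To illustrate the flag-file idea, here is a hedged sketch (path and object name are made up): the last action of one workflow writes an empty marker file to HDFS, and the downstream coordinator declares that file as its done-flag or dataset dependency.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object WriteFlagFile {
  def main(args: Array[String]): Unit = {
    // e.g. args(0) = "/data/flags/ingest/2017-02-13/_SUCCESS"
    val fs = FileSystem.get(new Configuration())
    val flag = new Path(args(0))
    // an empty file is enough; the waiting coordinator only checks for its existence
    fs.create(flag).close()
  }
}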