About jiiiiken88

jiiiiken88 · ‎04-20-2017

Hi @Jay Zhou Can you be a bit more specific what you have changed? What did you exactly do with this line? val hiveSqlContext = new org.apache.spark.sql.hive.HiveContext(sc) I have a similar problem where I get an error WARN Hive: Failed to access metastore. This class should not accessed in runtime. but this is only when I run the job via Oozie. When I use spark submit the code works so I guess the dependencies are right. Do you have any idea what can cause this?

jiiiiken88 · ‎03-23-2017

Hi @Vinod Bonthu, your answer really helped but unfortunately it wasn't the complete solution. As statd in my error log I had to add some jars via the spark-submit --jars command and I also added my hive site with spark-submit --files /usr/hdp/current/spark-client/conf/hive-site.xml With those two changes and removing .setMaster("local[2]") it worked! Thanks for your help!

jiiiiken88 · ‎03-22-2017

Yes I did. The reult is -rwxr-xr-x 3 USER USER 12252 2017-03-17 10:49 /path-to-jar/my.jar And the ouput when I run the spark-commit command is stated above.

jiiiiken88 · ‎03-22-2017

Hi @Aditya Deshpande, 17/03/22 11:27:53 INFO HiveContext: Initializing execution hive, version 1.2.1 17/03/22 11:27:53 INFO ClientWrapper: Inspected Hadoop version: 2.7.3.2.5.0.0-1245 17/03/22 11:27:53 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.5.0.0-1245 17/03/22 11:27:53 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 17/03/22 11:27:53 INFO ObjectStore: ObjectStore, initialize called 17/03/22 11:27:53 WARN HiveMetaStore: Retrying creating default database after error: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found. javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found. at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) ... ... That is the first error I find in the logs in stderr. Another one is 17/03/22 11:27:53 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 17/03/22 11:27:53 INFO ObjectStore: ObjectStore, initialize called 17/03/22 11:27:53 WARN Hive: Failed to access metastore. This class should not accessed in runtime. org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166) ... ... Can you make use of this?

jiiiiken88 · ‎03-22-2017

Hi @Vinod Bonthu, Thanks gor the quick guide but I have a follow up question. So it seems my pom is fine since I can package it with maven. Although when I run it via "spark-submit" I get the error stated in my original post. But when I run one of the spark examples by executing below code works. So why is that? I have the feeling that still something is not right and as a forst step I just want to run an example code packaged by myself via spark-submit. Afterwards I will try an oozie workflow. Thanks again! spark-submit org.apache.spark.exaples.SOMEEXAMPlE --master yarn-cluster hdfs://path-to-spark-examples.jar

jiiiiken88 · ‎03-21-2017

Hi pbarna, Oh I am sorry. Yes I could run mvn package without any errors (at the beginninh some dependencies were missing but I fixed that). The error in my original post is when I try to run my paavked jar on HDP. So I do not need any cluster or Hortonworks specific things in my pom, right?

jiiiiken88 · ‎03-21-2017

Hi folks, I would like to make a minimal example packed with Maven based on Scala code (like a hello world) and run it on a HDP2.5 sandbox. What do I need to specify in my pom.xml? So far I have this: <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>com.test.spark</groupId> <artifactId>Test</artifactId> <version>0.0.1</version> <name>${project.artifactId}</name> <description>Simple test app</description> <inceptionYear>2017</inceptionYear>  <properties> <maven.compiler.source>1.6</maven.compiler.source> <maven.compiler.target>1.6</maven.compiler.target> <encoding>UTF-8</encoding> <scala.version>2.11.5</scala.version> <scala.compat.version>2.11</scala.compat.version> <spark.version>1.6.1</spark.version> </properties> <dependencies> <dependency> <groupId>org.scala-lang</groupId> <artifactId>scala-library</artifactId> <version>${scala.version}</version> </dependency>  <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_${scala.compat.version}</artifactId> <version>${spark.version}</version> </dependency>  <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql_${scala.compat.version}</artifactId> <version>${spark.version}</version> </dependency>  <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-hive_${scala.compat.version}</artifactId> <version>${spark.version}</version> </dependency> </dependencies> <build> <sourceDirectory>src/main/scala</sourceDirectory>  <plugins> <plugin> <artifactId>maven-assembly-plugin</artifactId> <version>3.0.0</version> <executions> <execution> <phase>package</phase> <goals> <goal>single</goal> </goals> </execution> </executions> <configuration> <descriptorRefs> <descriptorRef>jar-with-dependencies</descriptorRef> </descriptorRefs> </configuration> </plugin> <plugin>  <groupId>net.alchim31.maven</groupId> <artifactId>scala-maven-plugin</artifactId> <version>3.2.2</version> <configuration> <scalaVersion>${scala.version}</scalaVersion> <scalaCompatVersion>${scala.compat.version}</scalaCompatVersion> </configuration> <executions> <execution> <phase>compile</phase> <goals> <goal>compile</goal> </goals> </execution> </executions> </plugin> </plugins> </build> </project> And my scala code is this: package com.test.spark import org.apache.spark.SparkContext import org.apache.spark.SparkConf import org.apache.spark.sql import org.apache.commons.lang import org.apache.spark.sql.SQLContext import org.apache.spark.sql.hive.HiveContext import org.apache.spark.rdd.RDD object Test { def main(args: Array[String]) { val conf = new SparkConf().setAppName("Test") .setMaster("local[2]") val spark = new SparkContext(conf) println( "Hello World!" ) } } I run the code with spark-submit --class com.test.spark.Test --master yarn --deploy-mode cluster hdfs://HDP25/test.jar Unfortunately it does not run. 😞 See the image attached. Am I missing something? Can you please help me to get a minimal example running? Thanks and kind regards

jiiiiken88 · ‎02-27-2017

Hi Sunile, Thanks for your answer. We think we will store our initial model and then all alter scripts. But all alter scripts will be included in the initial model in cas a complete re-deployment is wanted. To view a logical model a tool will be used which can reverse engineer the ddl. We try to establish a workflow in that fashion and hope that works.

jiiiiken88 · ‎02-14-2017

Hi guys, I am looking for a best practice using Hive especially for database modeling, software development and if possible a version control. At the moment I struggle at a certain point where the logical world meets the code. I have found tools which assist modelling databases in Hive, e.g. Embaracadero (Hortonworks Partner?). So I could model my databases there and create DDL scripts, I guess. To get version control I can add those scripts to git or something else. What happens if many users want to work on the logical model? How do you handle such problems? Jumping back and forth between database model versions is only possible with git versioning not only by the tool, or is it? All other scripts regarding the hive databases and tables (ingestion and so on) live in a git repository. So they are under perfect version control but if something changes many adaptions have to be made (at least one in the config file and maybe in Insert statements etc.). What I am missing in the code world is a nice view of databases and tablss even a entity-relation-diagramm but that's not of main interest. What is in your opinion a good way to tackle these problems? I mean someone like Facebook does not want to manage all tables and databases via a Hive View or solely based on code, or do they? How to keep the oversight in the big data world? Any help is really appreciated! Thanks in advance! Kind regards, Ken

jiiiiken88 · ‎02-13-2017

Hi Timothy, Okay I had a closer look into it. For me it looks like ApacheNiFi (Hortonworks' DataFlow) is more or less a tool piping your data from a non Hadoopsystem (RDMBS, IoT,...) into Hadoop. Thereafter, an other tool is needed to manage data. Here, Apache Falcon has its strength. Airflow, Luigi, Azkaban are solutions for broader scheduling tasks and need more effort to be installed (next) to your cluster. Quickly dipping my toe into scheduling with Spark I didn't come up with many resources. Last but not least Oozie (e.g. managed via Hue) seems like the easiest fit to manage all kind of workflows (Sqoop, Hive, Shell, Spark,...) within a cluster. Of course, I have dependencies between single action whereas dependencies between single coordinators is missing. In my humble opinion this funcitonality can be added with flagfiles. I think, Oozie is still the best fit although it is cumbersome to handle via xml files. Of course there is the Eclipse plugin to visualize workflows and create them as well. Feel free to correct my views. Thanks!

Online	Offline
Last Visited	‎06-19-2017 11:48 AM

Member Since	‎11-08-2016 02:47 PM
Last Visited	‎06-19-2017 11:48 AM
Posts	32
Kudos received	7

Cloudera Community

Re: Oozie Workflow with Sqoop Action to Oracle Dat...

Re: Spark job failed when new HiveContext object

Re: Minimal executable jar based on Scala code pac...

Re: Minimal executable jar based on Scala code pac...

Re: Minimal executable jar based on Scala code pac...

Re: Minimal executable jar based on Scala code pac...

Re: Minimal executable jar based on Scala code pac...

Minimal executable jar based on Scala code packed ...

Re: Database modeling, software development, and v...

Database modeling, software development, and versi...

Re: Apache Airflow