Member since: 04-23-2017
Posts: 21
Kudos Received: 1
Solutions: 0
10-24-2017
10:15 AM
@nkumar, I suspect the problem lies with hive-site.xml, because what I observe is that Spark creates its own Derby DB instance rather than referencing the pre-existing metastore Derby DB. Is there something I should do regarding the classpath?
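As a quick check (a hedged sketch of my own, not something from the thread), the following prints where hive-site.xml resolves on the application classpath; in Spark 1.6, if it cannot be found there, the HiveContext falls back to spinning up a local embedded Derby metastore:

// Hedged sketch: verify hive-site.xml is visible on the classpath.
// A null result means Spark/HiveConf cannot see it and will create
// its own embedded Derby metastore instead of using the existing one.
java.net.URL hiveSite = Thread.currentThread()
        .getContextClassLoader()
        .getResource("hive-site.xml");
System.out.println("hive-site.xml resolved to: " + hiveSite);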
10-23-2017
10:23 PM
@nkumar, attached is the log from the metastore: hivemetastorelog.txt
10-23-2017
03:00 PM
Hi, I am using Spark 1.6 in my current HDP setup, and I have a task to work with Hive tables using Spark in Java. I have noticed that I am able to connect to my database "TCGA" from spark-shell:
scala> sqlContext.sql("show tables in TCGA")
res0: org.apache.spark.sql.DataFrame = [tableName: string, isTemporary: boolean]
scala> sqlContext.sql("show tables in TCGA").show
17/10/22 21:02:11 INFO SparkContext: Starting job: show at <console>:26
17/10/22 21:02:17 INFO DAGScheduler: Got job 0 (show at <console>:26) with 1 output partitions
17/10/22 21:02:17 INFO DAGScheduler: Final stage: ResultStage 0 (show at <console>:26)
17/10/22 21:02:17 INFO DAGScheduler: Parents of final stage: List()
17/10/22 21:02:18 INFO DAGScheduler: Missing parents: List()
17/10/22 21:02:18 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at show at <console>:26), which has no missing parents
17/10/22 21:02:23 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 511.1 MB)
17/10/22 21:02:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1197.0 B, free 511.1 MB)
17/10/22 21:02:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45108 (size: 1197.0 B, free: 511.1 MB)
17/10/22 21:02:25 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1008
17/10/22 21:02:28 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at show at <console>:26)
17/10/22 21:02:28 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/22 21:02:34 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 3156 bytes)
17/10/22 21:02:35 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/22 21:02:40 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2013 bytes result sent to driver
17/10/22 21:02:40 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7863 ms on localhost (1/1)
17/10/22 21:02:41 INFO DAGScheduler: ResultStage 0 (show at <console>:26) finished in 12.361 s
17/10/22 21:02:41 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/22 21:02:42 INFO DAGScheduler: Job 0 finished: show at <console>:26, took 30.668589 s
+--------------------+-----------+
| tableName|isTemporary|
+--------------------+-----------+
| cbioportal_new| false|
| cbioportal_new_feed| false|
|cbioportal_new_in...| false|
|cbioportal_new_pr...| false|
|cbioportal_new_valid| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
| test| false|
| test_feed| false|
| test_invalid| false|
| test_profile| false|
| test_valid| false|
+--------------------+-----------+
However, when I try the same setup in Java, I am shown an empty list of tables in my database TCGA.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/10/23 00:28:28 INFO SparkContext: Running Spark version 1.6.3
17/10/23 00:28:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/23 00:28:30 INFO SecurityManager: Changing view acls to: root
17/10/23 00:28:30 INFO SecurityManager: Changing modify acls to: root
17/10/23 00:28:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/10/23 00:28:32 INFO Utils: Successfully started service 'sparkDriver' on port 43887.
17/10/23 00:28:32 INFO Slf4jLogger: Slf4jLogger started
17/10/23 00:28:33 INFO Remoting: Starting remoting
17/10/23 00:28:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:33809]
17/10/23 00:28:33 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 33809.
17/10/23 00:28:33 INFO SparkEnv: Registering MapOutputTracker
17/10/23 00:28:34 INFO SparkEnv: Registering BlockManagerMaster
17/10/23 00:28:34 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-a9fe7e0c-5862-4293-b14c-c218d0a85121
17/10/23 00:28:34 INFO MemoryStore: MemoryStore started with capacity 1579.1 MB
17/10/23 00:28:34 INFO SparkEnv: Registering OutputCommitCoordinator
17/10/23 00:28:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/10/23 00:28:34 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
17/10/23 00:28:34 INFO Executor: Starting executor ID driver on host localhost
17/10/23 00:28:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40068.
17/10/23 00:28:34 INFO NettyBlockTransferService: Server created on 40068
17/10/23 00:28:34 INFO BlockManagerMaster: Trying to register BlockManager
17/10/23 00:28:34 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40068 with 1579.1 MB RAM, BlockManagerId(driver, localhost, 40068)
17/10/23 00:28:34 INFO BlockManagerMaster: Registered BlockManager
17/10/23 00:28:37 INFO HiveContext: Initializing execution hive, version 1.2.1
17/10/23 00:28:37 INFO ClientWrapper: Inspected Hadoop version: 2.5.1
17/10/23 00:28:37 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.5.1
17/10/23 00:28:38 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/10/23 00:28:38 INFO ObjectStore: ObjectStore, initialize called
17/10/23 00:28:39 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/10/23 00:28:39 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/10/23 00:28:45 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/10/23 00:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:02 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/10/23 00:29:02 INFO ObjectStore: Initialized ObjectStore
17/10/23 00:29:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/10/23 00:29:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/10/23 00:29:07 INFO HiveMetaStore: Added admin role in metastore
17/10/23 00:29:07 INFO HiveMetaStore: Added public role in metastore
17/10/23 00:29:09 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/10/23 00:29:13 INFO HiveMetaStore: 0: get_all_databases
17/10/23 00:29:13 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
17/10/23 00:29:14 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/10/23 00:29:14 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/10/23 00:29:14 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:19 INFO SessionState: Created local directory: /tmp/1361a727-483a-424a-8936-ee5979fb5a02_resources
17/10/23 00:29:19 INFO SessionState: Created HDFS directory: /tmp/hive/root/1361a727-483a-424a-8936-ee5979fb5a02
17/10/23 00:29:19 INFO SessionState: Created local directory: /tmp/root/1361a727-483a-424a-8936-ee5979fb5a02
17/10/23 00:29:19 INFO SessionState: Created HDFS directory: /tmp/hive/root/1361a727-483a-424a-8936-ee5979fb5a02/_tmp_space.db
17/10/23 00:29:20 INFO HiveContext: default warehouse location is /user/hive/warehouse
17/10/23 00:29:20 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/10/23 00:29:20 INFO ClientWrapper: Inspected Hadoop version: 2.5.1
17/10/23 00:29:33 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.5.1
17/10/23 00:29:38 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/10/23 00:29:39 INFO ObjectStore: ObjectStore, initialize called
17/10/23 00:29:39 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/10/23 00:29:39 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/10/23 00:29:44 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/10/23 00:29:50 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:50 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
17/10/23 00:29:51 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/10/23 00:29:51 INFO ObjectStore: Initialized ObjectStore
17/10/23 00:30:00 INFO HiveMetaStore: Added admin role in metastore
17/10/23 00:30:00 INFO HiveMetaStore: Added public role in metastore
17/10/23 00:30:03 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/10/23 00:30:12 INFO HiveMetaStore: 0: get_all_databases
17/10/23 00:30:12 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
17/10/23 00:30:13 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/10/23 00:30:13 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/10/23 00:30:13 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:30:22 INFO SessionState: Created local directory: /tmp/bc93d037-7940-4e83-bc24-02d93dff54bf_resources
17/10/23 00:30:22 INFO SessionState: Created HDFS directory: /tmp/hive/root/bc93d037-7940-4e83-bc24-02d93dff54bf
17/10/23 00:30:22 INFO SessionState: Created local directory: /tmp/root/bc93d037-7940-4e83-bc24-02d93dff54bf
17/10/23 00:30:22 INFO SessionState: Created HDFS directory: /tmp/hive/root/bc93d037-7940-4e83-bc24-02d93dff54bf/_tmp_space.db
17/10/23 00:30:41 INFO HiveMetaStore: 0: get_tables: db=TCGA pat=.*
17/10/23 00:30:41 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_tables: db=TCGA pat=.*
17/10/23 00:31:08 INFO SparkContext: Starting job: show at MavenMainHbase.java:46
17/10/23 00:31:14 INFO DAGScheduler: Got job 0 (show at MavenMainHbase.java:46) with 1 output partitions
17/10/23 00:31:14 INFO DAGScheduler: Final stage: ResultStage 0 (show at MavenMainHbase.java:46)
17/10/23 00:31:14 INFO DAGScheduler: Parents of final stage: List()
17/10/23 00:31:14 INFO DAGScheduler: Missing parents: List()
17/10/23 00:31:17 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at show at MavenMainHbase.java:46), which has no missing parents
17/10/23 00:31:31 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1824.0 B, free 1579.1 MB)
17/10/23 00:31:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1175.0 B, free 1579.1 MB)
17/10/23 00:31:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40068 (size: 1175.0 B, free: 1579.1 MB)
17/10/23 00:31:31 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/23 00:31:33 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at show at MavenMainHbase.java:46)
17/10/23 00:31:33 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/23 00:31:37 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2105 bytes)
17/10/23 00:31:38 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/23 00:31:38 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 940 bytes result sent to driver
17/10/23 00:31:39 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1815 ms on localhost (1/1)
17/10/23 00:31:39 INFO DAGScheduler: ResultStage 0 (show at MavenMainHbase.java:46) finished in 5.187 s
17/10/23 00:31:39 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/23 00:31:40 INFO DAGScheduler: Job 0 finished: show at MavenMainHbase.java:46, took 32.074415 s
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+
Here is the sample Java code that I used to get the above result:
SparkConf conf = new SparkConf()
        .setAppName("SparkHive")
        .setMaster("local")
        .setSparkHome("/usr/hdp/2.5.6.0-40/spark/")
        .set("HADOOP_CONF_DIR", "/usr/hdp/2.5.6.0-40/hive/conf/")
        .set("spark.driver.extraClassPath", "/usr/hdp/2.5.6.0-40/hive/conf");
conf.set("spark.sql.hive.thriftServer.singleSession", "true");
SparkContext sc = new SparkContext(conf);
HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc);
hiveContext.setConf("hive.metastore.uris", "thrift://sandbox.kylo.io:9083");
hiveContext.setConf("spark.sql.warehouse.dir", "/user/hive/warehouse");
DataFrame df = hiveContext.sql("show tables in TCGA");
df.show();
And here is my pom.xml:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-core -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.4.0-HBase-1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-hbase-handler -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-hbase-handler</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.thrift/libfb303 -->
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient -->
<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient-osgi</artifactId>
<version>4.3-beta2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-contrib -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-contrib</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.4.4</version>
</dependency>
</dependencies>
I think this has to do with the code being unable to find hive-site.xml, so I supplied all the classpaths I could think of in SparkConf, but no luck yet. What other configurations do I have to set?
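For what it's worth, here is a minimal sketch of a workaround (my own illustration, assuming the metastore really is listening at thrift://sandbox.kylo.io:9083 as in the code above). The Derby lines in the log suggest the metastore client is initialized while the HiveContext is being constructed, so calling setConf("hive.metastore.uris", ...) afterwards may come too late; setting the URI on the Hadoop configuration before building the HiveContext, or shipping hive-site.xml on the classpath (e.g. src/main/resources in a Maven project), should steer Spark to the existing metastore instead of a fresh Derby instance:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class ShowTables {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkHive").setMaster("local");
        SparkContext sc = new SparkContext(conf);
        // Point at the existing metastore BEFORE the HiveContext is built;
        // otherwise Spark 1.6 silently falls back to a local Derby metastore.
        sc.hadoopConfiguration().set("hive.metastore.uris",
                "thrift://sandbox.kylo.io:9083");
        HiveContext hiveContext = new HiveContext(sc);
        DataFrame df = hiveContext.sql("show tables in TCGA");
        df.show();
    }
}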
Labels:
- Apache Hive
- Apache Spark
09-07-2017
11:54 AM
@Constantin Stanca I am using Zeppelin with the Spark interpreter to access the Hive-HBase table. I get the following error while running the query:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.hbase.HBaseStorageHandler
I have even added the required jars to the Spark interpreter's classpath, but the issue still persists. Could you please let me know what the problem could be?
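For reference, this error typically indicates that the hive-hbase-handler and HBase client jars are not visible to the Spark driver and executors. A minimal sketch of passing them explicitly via spark.jars (the jar paths here are illustrative and vary by HDP version):

SparkConf conf = new SparkConf()
        .setAppName("HiveHBaseQuery")
        // Hypothetical jar locations: look them up under /usr/hdp/<version>/.
        .set("spark.jars",
             "/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar,"
           + "/usr/hdp/current/hbase-client/lib/hbase-client.jar");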
08-11-2017
03:23 PM
@Josh Elser I made the change as you suggested and also added nifi to the sudoers as root. But now I am getting a JNI error. I tried again even after chown'ing the folder where Java is stored to nifi:nifi, but I still get the same JNI error. Where am I going wrong? Thanks
08-10-2017
03:35 PM
1 Kudo
Hello, I am using NiFi in the HDP sandbox to execute a Java jar. This jar contains a simple HBase client that creates a table based on an argument supplied from the NiFi instance. I used the ExecuteStreamCommand processor to call this jar with the properties shown, but it throws an error. I tried again after changing the Working Directory to root/IdeaProjects/mavenhbasetest/target/, and I now get a different error. FYI, I tried testing the jar in the shell and it works fine. A simplified sketch of the jar's entry point is included below. Could someone help me understand where I am going wrong? Thanks and regards, Jasim
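For context, the entry point looks roughly like this (a simplified sketch; the class name and column family are illustrative, and the table name comes in as the first command-line argument from ExecuteStreamCommand):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateHBaseTable {
    public static void main(String[] args) throws Exception {
        // args[0] is the table name passed by NiFi's ExecuteStreamCommand.
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf(args[0]));
        table.addFamily(new HColumnDescriptor("cf"));
        admin.createTable(table);
        admin.close();
    }
}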
Labels:
- Apache HBase
- Apache NiFi