Member since: 04-23-2017
Posts: 21
Kudos Received: 1
Solutions: 0
10-12-2018
04:16 PM
Hello all, I am in a situation where I would like to store data as monthly CSV files on an SFTP server using a SQL query. For instance, my query for January 2018 is:
select fooId, bar from FooBar where query_date >= 20180101 and query_date < 20180201
I would like to store the result as 20180101_FooBar.csv on my SFTP server. The files for the other months follow the same process with a different query_date interval. An important consideration: I have to store fooId as an MD5 hash string. How may I automate this flow in NiFi? Roughly, the flow that I foresee is: ExecuteSQL (but I am not sure how to parameterize the counter for query_date) -> ConvertAvroToJSON -> EvaluateJsonPath (to extract the fooId) -> HashContent -> MergeContent. Please advise me on how I may take this forward. Thanks
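For illustration only (in the flow itself, HashContent would do the hashing), here is a minimal standalone Java sketch of the kind of MD5 hex string I mean for fooId; the class and method names are just placeholders:
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FooIdHash {
    // Returns the 32-character lowercase MD5 hex digest of the given fooId.
    static String md5Hex(String fooId) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(fooId.getBytes(StandardCharsets.UTF_8));
        // Pad to 32 hex characters so leading zeros are not dropped.
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(md5Hex("12345")); // prints 827ccb0eea8a706c4c34a16891f84e7b
    }
}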
10-24-2017
10:15 AM
@nkumar, I have doubts about hive-site.xml, because what I observe is that it creates its own DB instance rather than referencing the pre-existing Derby metastore DB. Is there something I should do regarding the classpath?
10-23-2017
10:23 PM
hivemetastorelog.txt @nkumar, attached is the log from the metastore.
10-23-2017
03:00 PM
Hi, I am using Spark 1.6 for my current setup in HDP. I have a task to work with Hive tables using Spark in Java. I have noticed that I am able to connect to my database "TCGA" in spark-shell:
scala> sqlContext.sql("show tables in TCGA")
res0: org.apache.spark.sql.DataFrame = [tableName: string, isTemporary: boolean]
scala> sqlContext.sql("show tables in TCGA").show
17/10/22 21:02:11 INFO SparkContext: Starting job: show at <console>:26
17/10/22 21:02:17 INFO DAGScheduler: Got job 0 (show at <console>:26) with 1 output partitions
17/10/22 21:02:17 INFO DAGScheduler: Final stage: ResultStage 0 (show at <console>:26)
17/10/22 21:02:17 INFO DAGScheduler: Parents of final stage: List()
17/10/22 21:02:18 INFO DAGScheduler: Missing parents: List()
17/10/22 21:02:18 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at show at <console>:26), which has no missing parents
17/10/22 21:02:23 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 511.1 MB)
17/10/22 21:02:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1197.0 B, free 511.1 MB)
17/10/22 21:02:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45108 (size: 1197.0 B, free: 511.1 MB)
17/10/22 21:02:25 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1008
17/10/22 21:02:28 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at show at <console>:26)
17/10/22 21:02:28 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/22 21:02:34 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 3156 bytes)
17/10/22 21:02:35 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/22 21:02:40 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2013 bytes result sent to driver
17/10/22 21:02:40 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7863 ms on localhost (1/1)
17/10/22 21:02:41 INFO DAGScheduler: ResultStage 0 (show at <console>:26) finished in 12.361 s
17/10/22 21:02:41 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/22 21:02:42 INFO DAGScheduler: Job 0 finished: show at <console>:26, took 30.668589 s
+--------------------+-----------+
| tableName|isTemporary|
+--------------------+-----------+
| cbioportal_new| false|
| cbioportal_new_feed| false|
|cbioportal_new_in...| false|
|cbioportal_new_pr...| false|
|cbioportal_new_valid| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
| test| false|
| test_feed| false|
| test_invalid| false|
| test_profile| false|
| test_valid| false|
+--------------------+-----------+
Whereas, when I try the same setup in Java, I am shown an empty list of tables in my database TCGA:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/10/23 00:28:28 INFO SparkContext: Running Spark version 1.6.3
17/10/23 00:28:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/23 00:28:30 INFO SecurityManager: Changing view acls to: root
17/10/23 00:28:30 INFO SecurityManager: Changing modify acls to: root
17/10/23 00:28:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/10/23 00:28:32 INFO Utils: Successfully started service 'sparkDriver' on port 43887.
17/10/23 00:28:32 INFO Slf4jLogger: Slf4jLogger started
17/10/23 00:28:33 INFO Remoting: Starting remoting
17/10/23 00:28:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:33809]
17/10/23 00:28:33 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 33809.
17/10/23 00:28:33 INFO SparkEnv: Registering MapOutputTracker
17/10/23 00:28:34 INFO SparkEnv: Registering BlockManagerMaster
17/10/23 00:28:34 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-a9fe7e0c-5862-4293-b14c-c218d0a85121
17/10/23 00:28:34 INFO MemoryStore: MemoryStore started with capacity 1579.1 MB
17/10/23 00:28:34 INFO SparkEnv: Registering OutputCommitCoordinator
17/10/23 00:28:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/10/23 00:28:34 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
17/10/23 00:28:34 INFO Executor: Starting executor ID driver on host localhost
17/10/23 00:28:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40068.
17/10/23 00:28:34 INFO NettyBlockTransferService: Server created on 40068
17/10/23 00:28:34 INFO BlockManagerMaster: Trying to register BlockManager
17/10/23 00:28:34 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40068 with 1579.1 MB RAM, BlockManagerId(driver, localhost, 40068)
17/10/23 00:28:34 INFO BlockManagerMaster: Registered BlockManager
17/10/23 00:28:37 INFO HiveContext: Initializing execution hive, version 1.2.1
17/10/23 00:28:37 INFO ClientWrapper: Inspected Hadoop version: 2.5.1
17/10/23 00:28:37 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.5.1
17/10/23 00:28:38 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/10/23 00:28:38 INFO ObjectStore: ObjectStore, initialize called
17/10/23 00:28:39 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/10/23 00:28:39 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/10/23 00:28:45 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/10/23 00:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:02 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/10/23 00:29:02 INFO ObjectStore: Initialized ObjectStore
17/10/23 00:29:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/10/23 00:29:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/10/23 00:29:07 INFO HiveMetaStore: Added admin role in metastore
17/10/23 00:29:07 INFO HiveMetaStore: Added public role in metastore
17/10/23 00:29:09 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/10/23 00:29:13 INFO HiveMetaStore: 0: get_all_databases
17/10/23 00:29:13 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
17/10/23 00:29:14 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/10/23 00:29:14 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/10/23 00:29:14 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:19 INFO SessionState: Created local directory: /tmp/1361a727-483a-424a-8936-ee5979fb5a02_resources
17/10/23 00:29:19 INFO SessionState: Created HDFS directory: /tmp/hive/root/1361a727-483a-424a-8936-ee5979fb5a02
17/10/23 00:29:19 INFO SessionState: Created local directory: /tmp/root/1361a727-483a-424a-8936-ee5979fb5a02
17/10/23 00:29:19 INFO SessionState: Created HDFS directory: /tmp/hive/root/1361a727-483a-424a-8936-ee5979fb5a02/_tmp_space.db
17/10/23 00:29:20 INFO HiveContext: default warehouse location is /user/hive/warehouse
17/10/23 00:29:20 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/10/23 00:29:20 INFO ClientWrapper: Inspected Hadoop version: 2.5.1
17/10/23 00:29:33 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.5.1
17/10/23 00:29:38 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/10/23 00:29:39 INFO ObjectStore: ObjectStore, initialize called
17/10/23 00:29:39 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/10/23 00:29:39 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/10/23 00:29:44 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/10/23 00:29:50 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:50 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
17/10/23 00:29:51 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/10/23 00:29:51 INFO ObjectStore: Initialized ObjectStore
17/10/23 00:30:00 INFO HiveMetaStore: Added admin role in metastore
17/10/23 00:30:00 INFO HiveMetaStore: Added public role in metastore
17/10/23 00:30:03 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/10/23 00:30:12 INFO HiveMetaStore: 0: get_all_databases
17/10/23 00:30:12 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
17/10/23 00:30:13 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/10/23 00:30:13 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/10/23 00:30:13 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:30:22 INFO SessionState: Created local directory: /tmp/bc93d037-7940-4e83-bc24-02d93dff54bf_resources
17/10/23 00:30:22 INFO SessionState: Created HDFS directory: /tmp/hive/root/bc93d037-7940-4e83-bc24-02d93dff54bf
17/10/23 00:30:22 INFO SessionState: Created local directory: /tmp/root/bc93d037-7940-4e83-bc24-02d93dff54bf
17/10/23 00:30:22 INFO SessionState: Created HDFS directory: /tmp/hive/root/bc93d037-7940-4e83-bc24-02d93dff54bf/_tmp_space.db
17/10/23 00:30:41 INFO HiveMetaStore: 0: get_tables: db=TCGA pat=.*
17/10/23 00:30:41 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_tables: db=TCGA pat=.*
17/10/23 00:31:08 INFO SparkContext: Starting job: show at MavenMainHbase.java:46
17/10/23 00:31:14 INFO DAGScheduler: Got job 0 (show at MavenMainHbase.java:46) with 1 output partitions
17/10/23 00:31:14 INFO DAGScheduler: Final stage: ResultStage 0 (show at MavenMainHbase.java:46)
17/10/23 00:31:14 INFO DAGScheduler: Parents of final stage: List()
17/10/23 00:31:14 INFO DAGScheduler: Missing parents: List()
17/10/23 00:31:17 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at show at MavenMainHbase.java:46), which has no missing parents
17/10/23 00:31:31 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1824.0 B, free 1579.1 MB)
17/10/23 00:31:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1175.0 B, free 1579.1 MB)
17/10/23 00:31:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40068 (size: 1175.0 B, free: 1579.1 MB)
17/10/23 00:31:31 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/23 00:31:33 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at show at MavenMainHbase.java:46)
17/10/23 00:31:33 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/23 00:31:37 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2105 bytes)
17/10/23 00:31:38 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/23 00:31:38 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 940 bytes result sent to driver
17/10/23 00:31:39 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1815 ms on localhost (1/1)
17/10/23 00:31:39 INFO DAGScheduler: ResultStage 0 (show at MavenMainHbase.java:46) finished in 5.187 s
17/10/23 00:31:39 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/23 00:31:40 INFO DAGScheduler: Job 0 finished: show at MavenMainHbase.java:46, took 32.074415 s
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+
Here is the sample Java code that I used to get the above result:
SparkConf conf = new SparkConf()
        .setAppName("SparkHive")
        .setMaster("local")
        .setSparkHome("/usr/hdp/2.5.6.0-40/spark/")
        .set("HADOOP_CONF_DIR", "/usr/hdp/2.5.6.0-40/hive/conf/")
        .set("spark.driver.extraClassPath", "/usr/hdp/2.5.6.0-40/hive/conf");
conf.set("spark.sql.hive.thriftServer.singleSession", "true");
SparkContext sc = new SparkContext(conf);
HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc);
hiveContext.setConf("hive.metastore.uris", "thrift://sandbox.kylo.io:9083");
hiveContext.setConf("spark.sql.warehouse.dir", "/user/hive/warehouse");
DataFrame df = hiveContext.sql("show tables in TCGA");
df.show();
And here is my pom.xml:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-core -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.4.0-HBase-1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-hbase-handler -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-hbase-handler</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.thrift/libfb303 -->
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient -->
<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient-osgi</artifactId>
<version>4.3-beta2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-contrib -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-contrib</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.4.4</version>
</dependency>
</dependencies>
I think this has to do with the code being unable to find hive-site.xml, so I set every classpath I could think of in SparkConf to make it work, but no luck yet. What other configurations do I have to set?
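For reference, here is a minimal sketch of the variant I am considering next, under the assumption that the real fix is simply getting the cluster's hive-site.xml (the one pointing at thrift://sandbox.kylo.io:9083) onto the application classpath, e.g. by copying it into src/main/resources, rather than passing conf paths through SparkConf. I have not confirmed this yet:
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class ShowTcgaTables {
    public static void main(String[] args) {
        // Assumption: hive-site.xml with hive.metastore.uris set to the sandbox
        // metastore has been copied into src/main/resources, so HiveContext picks
        // it up from the classpath and does not create a local Derby metastore.
        SparkConf conf = new SparkConf().setAppName("SparkHive").setMaster("local");
        SparkContext sc = new SparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc);

        DataFrame df = hiveContext.sql("show tables in TCGA");
        df.show();
    }
}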
09-07-2017
11:54 AM
@Constantin Stanca I am using Zeppelin with the Spark interpreter to access the Hive-HBase table. I get the following error while running the query: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.hbase.HBaseStorageHandler
I have even added the required jars to the Spark interpreter's classpath, but the issue still persists. Could you please let me know what the problem could be?
09-04-2017
04:46 PM
zeppelin-hive-hbase-handler.png
Hello, I am using Zeppelin to analyze a table that I created with the Hive-HBase handler. However, when I execute a query, it fails to load the storage handler org.apache.hadoop.hive.hbase.HBaseStorageHandler. I tried it with HiveContext, as someone pointed out in one of the threads, but the result is the same. What would be a workaround for this? Thanks
[Update] I also tried updating hive-site.xml in the Hive conf folder with the following property, but the problem is still the same. Just to verify, I also cross-checked in Beeline and the Hive CLI, and the query works perfectly fine in both. Could someone possibly help me with this issue in Zeppelin?
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/hdp/2.5.3.0-37/hive/lib/zookeeper-3.4.6.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hive/lib/hive-hbase-handler-1.2.1000.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hive/lib/guava-14.0.1.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-client-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-common-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-protocol-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-server-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-shell-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-thrift-1.1.2.2.5.3.0-37.jar</value>
</property>
08-25-2017
01:28 PM
Hi all. I am writing a Java wrapper to push data to HBase based on field values from a Hive table. I call this wrapper iteratively on each field value from Hive. I am able to push the data to HBase, but I receive the following exception on each put() call. Is it about increasing the log flush interval? What seems to be the issue?
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time,
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:208)
at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1689)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:208)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1449)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1040)
Following is the wrapper:
try {
    Put p = new Put(Bytes.toBytes(rowkey));
    p.add(Bytes.toBytes(family + "1"), Bytes.toBytes(qualifier), Bytes.toBytes(value.get(0)));
    tablename.put(p);
    p.add((family + "2").getBytes(), qualifier.getBytes(), value.get(1).getBytes());
    tablename.put(p);
    p.add((family + "3").getBytes(), qualifier.getBytes(), value.get(2).getBytes());
    tablename.put(p);
}
catch (IOException e) {
    e.printStackTrace();
}
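For comparison, here is a minimal self-contained sketch of the write path I had in mind, under the assumption that building the Put completely and submitting it once per row is acceptable; this is only an illustration, not a confirmed fix for the IOException above:
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteSketch {
    // Writes one row with three cells (one per column family) in a single put() call.
    static void writeRow(Table table, String rowkey, String family,
                         String qualifier, List<String> value) throws IOException {
        Put p = new Put(Bytes.toBytes(rowkey));
        // Add all three cells to the same Put, then submit it once.
        p.addColumn(Bytes.toBytes(family + "1"), Bytes.toBytes(qualifier), Bytes.toBytes(value.get(0)));
        p.addColumn(Bytes.toBytes(family + "2"), Bytes.toBytes(qualifier), Bytes.toBytes(value.get(1)));
        p.addColumn(Bytes.toBytes(family + "3"), Bytes.toBytes(qualifier), Bytes.toBytes(value.get(2)));
        table.put(p);
    }
}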
08-18-2017
06:24 PM
Hello, I am using the following table, which I intend to overwrite. This is the schema of the table I plan to write from. The following are the INSERT OVERWRITE queries I tried, but they lead to an error. As you can see, I just want to insert data into only 10 columns of my bigger table (with the key being introduced as a UUID) and leave the other fields empty. What could be a workaround for this problem using a Hive query? Thanks
- Tags:
- Data Processing
- Hive
08-16-2017
12:33 PM
It worked. I added the following dependency:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient-osgi</artifactId>
<version>4.3-beta2</version>
</dependency>
08-16-2017
12:10 PM
Hello, I am creating a Java wrapper for my thesis work with Hive-HBase integration. So far the HBase Java API has worked well, but when I tried to connect via Hive JDBC, it raised a "ServiceUnavailableRetryStrategy" error. The following are the dependencies in my pom.xml:
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.1.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-core -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.4.0-HBase-1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-hbase-handler -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-hbase-handler</artifactId>
<version>1.2.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.thrift/libfb303 -->
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient -->
<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>
</dependencies>
The following is the part of the simple code where I am just testing the Hive JDBC connection:
private static String driverName = "org.apache.hive.jdbc.HiveDriver";
try {
Class.forName(driverName);
}
catch (ClassNotFoundException e) {
e.printStackTrace();
System.exit(1);
}
Connection con1= DriverManager.getConnection("jdbc:hive2://localhost:10000/firehose", "hive"
,"hive");
System.out.println("hello");
Statement stmt = con1.createStatement();
stmt.executeQuery("CREATE TABLE IF NOT EXISTS employee (eid int, name String);");
String tableName = "testHiveDriverTable";
stmt.execute("drop table if exists " + tableName);
stmt.execute("create table " + tableName + " (key int, value string)");
// show tables
// String sql = "show tables '" + tableName + "'";
String sql = ("show tables");
ResultSet res = stmt.executeQuery(sql);
if (res.next()) {
System.out.println(res.getString(1));
}
When I execute my program, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/http/client/ServiceUnavailableRetryStrategy
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at hbasepingtest.MavenMainHbase.connectHive(MavenMainHbase.java:63)
at hbasepingtest.MavenMainHbase.main(MavenMainHbase.java:48)
Caused by: java.lang.ClassNotFoundException: org.apache.http.client.ServiceUnavailableRetryStrategy
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
To cross-check, I connected with Beeline using the same JDBC URL and database, and it logs in fine. Am I missing some other dependency in my pom file?
08-11-2017
03:23 PM
@Josh Elser I made the change as you said and also added nifi to sudoers as root. But now I am getting a JNI error. I tried again even after chown'ing the folder where Java is stored to nifi:nifi, but I still get the same JNI error. Where am I going wrong? Thanks
08-10-2017
03:35 PM
1 Kudo
Hello, I am using NiFi in the HDP sandbox to execute a Java jar. The jar contains a simple HBase client that creates a table based on an argument supplied from the NiFi instance. I used the ExecuteStreamCommand processor to call this jar with the properties shown, but it throws an error as follows: I tried again after changing the Working Directory to root/IdeaProjects/mavenhbasetest/target/ and now get the following error: FYI, I tested the jar in the shell and it works fine. Could someone help me see where I am going wrong? Thanks and regards, Jasim
08-04-2017
11:12 AM
@Jay SenSharma could you please elaborate on how to manually install that package? Attached are: the yum command on yum.repos.d, yum whatprovides on Desktop, and the yum repo.
08-03-2017
05:07 PM
Hi, attached is the network setup of my sandbox: network-1.png, network-2.png. Thank you.
08-03-2017
05:05 PM
Hi, while trying to deploy the Ambari VNC service, I got an error as follows:
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/VNCSERVER/package/scripts/master.py", line 132, in <module>
Master().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/VNCSERVER/package/scripts/master.py", line 31, in install
Execute('yum groupinstall -y Desktop >> '+params.log_location)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'yum groupinstall -y Desktop >> /var/log/vnc-stack.log' returned 1. Warning: group Desktop does not exist.
Maybe run: yum groups mark install (see man yum)
Error: No packages in any requested group available to install or update
stdout:
2017-08-03 15:41:53,481 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.3.0-37
2017-08-03 15:41:53,483 - Checking if need to create versioned conf dir /etc/hadoop/2.5.3.0-37/0
2017-08-03 15:41:53,485 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.3.0-37', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2017-08-03 15:41:53,506 - call returned (1, '/etc/hadoop/2.5.3.0-37/0 exist already', '')
2017-08-03 15:41:53,506 - checked_call[('ambari-python-wrap', u'/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.3.0-37', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2017-08-03 15:41:53,525 - checked_call returned (0, '')
2017-08-03 15:41:53,526 - Ensuring that hadoop has the correct symlink structure
2017-08-03 15:41:53,526 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-08-03 15:41:53,527 - Group['livy'] {}
2017-08-03 15:41:53,528 - Group['spark'] {}
2017-08-03 15:41:53,528 - Group['zeppelin'] {}
2017-08-03 15:41:53,529 - Group['hadoop'] {}
2017-08-03 15:41:53,529 - Group['users'] {}
2017-08-03 15:41:53,529 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,530 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,530 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2017-08-03 15:41:53,530 - User['zeppelin'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,531 - User['livy'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,531 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,532 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2017-08-03 15:41:53,532 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,533 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,533 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,534 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,534 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,535 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,535 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2017-08-03 15:41:53,537 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2017-08-03 15:41:53,549 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2017-08-03 15:41:53,549 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2017-08-03 15:41:53,550 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2017-08-03 15:41:53,551 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2017-08-03 15:41:53,558 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2017-08-03 15:41:53,558 - Group['hdfs'] {}
2017-08-03 15:41:53,558 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': [u'hadoop', u'hdfs']}
2017-08-03 15:41:53,559 - FS Type:
2017-08-03 15:41:53,559 - Directory['/etc/hadoop'] {'mode': 0755}
2017-08-03 15:41:53,572 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2017-08-03 15:41:53,572 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2017-08-03 15:41:53,589 - Initializing 2 repositories
2017-08-03 15:41:53,590 - Repository['HDP-2.5'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.3.0', 'action': ['create'], 'components': [u'HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP', 'mirror_list': None}
2017-08-03 15:41:53,596 - File['/etc/yum.repos.d/HDP.repo'] {'content': '[HDP-2.5]\nname=HDP-2.5\nbaseurl=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.3.0\n\npath=/\nenabled=1\ngpgcheck=0'}
2017-08-03 15:41:53,596 - Repository['HDP-UTILS-1.1.0.21'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7', 'action': ['create'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2017-08-03 15:41:53,599 - File['/etc/yum.repos.d/HDP-UTILS.repo'] {'content': '[HDP-UTILS-1.1.0.21]\nname=HDP-UTILS-1.1.0.21\nbaseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7\n\npath=/\nenabled=1\ngpgcheck=0'}
2017-08-03 15:41:53,599 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-08-03 15:41:53,661 - Skipping installation of existing package unzip
2017-08-03 15:41:53,662 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-08-03 15:41:53,670 - Skipping installation of existing package curl
2017-08-03 15:41:53,670 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-08-03 15:41:53,678 - Skipping installation of existing package hdp-select
2017-08-03 15:41:53,868 - Execute['echo "installing Desktop" >> /var/log/vnc-stack.log'] {}
2017-08-03 15:41:53,872 - Execute['yum groupinstall -y Desktop >> /var/log/vnc-stack.log'] {}
Command failed after 1 tries
What could be a possible solution for installing the 'Desktop' package group? Thank you.
08-03-2017
03:20 PM
Hi @Geoffrey, I used the sandbox's ifconfig IP in /etc/hosts as shown in the attachment. The result is still the same: I am unable to access it.
08-03-2017
02:29 PM
ambari.png zeppelinui.png log-files-of-zeppelin.png
Hi all, for my task I would like to use Zeppelin from my Hortonworks sandbox. I installed the Zeppelin service via Ambari successfully, but when I try to launch the UI, it does not respond. What could be the issue? I checked the logs, changed the ports in Zeppelin's env files, and telnetted to port 9996 with the hostname mentioned, but still no success. I have attached the snapshots.
06-14-2017
04:03 PM
Hello,
I would like to build a pipeline in NiFi that runs a given shell script (firehose-get-latest-2.zip) and executes it according to the user's input parameters, for example: firehose_get analyses latest; firehose_get -tasks mutsig gistic analyses latest brca ucec; firehose_get -tasks mut analyses latest prad. I know of the ExecuteProcess processor; however, I am not sure exactly how to use it, and I could not find any helpful examples anywhere. Thank you.
06-02-2017
12:38 PM
Hi, I did check and found that both ports, i.e. the default Zeppelin port and 9096, are free. I restarted the service and it still does not work.
06-02-2017
12:07 PM
Hello,
I am using Kylo, a data lake management solution; specifically, the pre-configured Kylo instance under the HDP sandbox.
In the sandbox I installed the Zeppelin service through Ambari; however, the web UI is not loading. I have changed the default port number to 9096 through Zeppelin's configuration in Ambari, but the result is still the same.
Attached is issue-var-log-zeppelin.png, a snapshot of the log under the /var/log/zeppelin directory.
Where am I possibly going wrong?
Thanks and regards,
Jasim
04-23-2017
05:12 PM
Hello all, I am using Apache NiFi for my Master's thesis work on integrating genomic data sources. I would like to, for example, ingest this API response into MongoDB. As you can see, the native response data is in a flat format; could someone please guide me on how to go about it?
Thanks and regards,
Jasim