Member since: 04-23-2017
Posts: 21
Kudos Received: 1
Solutions: 0
10-12-2018
04:16 PM
Hello all, I am in a situation where I would like to store data as monthly CSV files on an SFTP server using a SQL query. For instance, my query for January 2018 is:
select fooId, bar from FooBar where query_date >= 20180101 and query_date < 20180201
I would like to store the result as 20180101_FooBar.csv on my SFTP server. The files for the other months follow the same process with a different query_date interval. An important consideration: I have to store fooId as an MD5 hash string. How may I automate this flow in NiFi? Roughly, the flow that I foresee is: ExecuteSQL (but I am not sure how to parameterize the counter for query_date) -> ConvertAvroToJSON -> EvaluateJsonPath (to extract the fooId) -> HashContent -> MergeContent. Please advise me on how I may take this forward. Thanks
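For illustration only (in the flow itself, HashContent would do the hashing), here is a minimal standalone Java sketch of the kind of MD5 hex string I mean for fooId; the class and method names are just placeholders:
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FooIdHash {
    // Returns the 32-character lowercase MD5 hex digest of the given fooId.
    static String md5Hex(String fooId) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(fooId.getBytes(StandardCharsets.UTF_8));
        // Pad to 32 hex characters so leading zeros are not dropped.
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(md5Hex("12345")); // prints 827ccb0eea8a706c4c34a16891f84e7b
    }
}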
10-24-2017
10:15 AM
@nkumar, I have doubts about hive-site.xml, because what I observe is that it creates its own DB instance rather than referencing the pre-existing Derby metastore DB. Is there something I should do regarding the classpath?
10-23-2017
10:23 PM
hivemetastorelog.txt @nkumar, attached is the log from the metastore.
10-23-2017
03:00 PM
Hi, I am using Spark 1.6 for my current setup in HDP. I have a task to work with Hive tables using Spark in Java. I have noticed that I am able to connect to my database "TCGA" in spark-shell:
scala> sqlContext.sql("show tables in TCGA")
res0: org.apache.spark.sql.DataFrame = [tableName: string, isTemporary: boolean]
scala> sqlContext.sql("show tables in TCGA").show
17/10/22 21:02:11 INFO SparkContext: Starting job: show at <console>:26
17/10/22 21:02:17 INFO DAGScheduler: Got job 0 (show at <console>:26) with 1 output partitions
17/10/22 21:02:17 INFO DAGScheduler: Final stage: ResultStage 0 (show at <console>:26)
17/10/22 21:02:17 INFO DAGScheduler: Parents of final stage: List()
17/10/22 21:02:18 INFO DAGScheduler: Missing parents: List()
17/10/22 21:02:18 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at show at <console>:26), which has no missing parents
17/10/22 21:02:23 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 511.1 MB)
17/10/22 21:02:23 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1197.0 B, free 511.1 MB)
17/10/22 21:02:23 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:45108 (size: 1197.0 B, free: 511.1 MB)
17/10/22 21:02:25 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1008
17/10/22 21:02:28 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at show at <console>:26)
17/10/22 21:02:28 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/22 21:02:34 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 3156 bytes)
17/10/22 21:02:35 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/22 21:02:40 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2013 bytes result sent to driver
17/10/22 21:02:40 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7863 ms on localhost (1/1)
17/10/22 21:02:41 INFO DAGScheduler: ResultStage 0 (show at <console>:26) finished in 12.361 s
17/10/22 21:02:41 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/22 21:02:42 INFO DAGScheduler: Job 0 finished: show at <console>:26, took 30.668589 s
+--------------------+-----------+
| tableName|isTemporary|
+--------------------+-----------+
| cbioportal_new| false|
| cbioportal_new_feed| false|
|cbioportal_new_in...| false|
|cbioportal_new_pr...| false|
|cbioportal_new_valid| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
|firebrowse_simple...| false|
| test| false|
| test_feed| false|
| test_invalid| false|
| test_profile| false|
| test_valid| false|
+--------------------+-----------+
Whereas, when I try the same setup in Java, I am shown an empty list of tables in my database TCGA:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/10/23 00:28:28 INFO SparkContext: Running Spark version 1.6.3
17/10/23 00:28:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/23 00:28:30 INFO SecurityManager: Changing view acls to: root
17/10/23 00:28:30 INFO SecurityManager: Changing modify acls to: root
17/10/23 00:28:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/10/23 00:28:32 INFO Utils: Successfully started service 'sparkDriver' on port 43887.
17/10/23 00:28:32 INFO Slf4jLogger: Slf4jLogger started
17/10/23 00:28:33 INFO Remoting: Starting remoting
17/10/23 00:28:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:33809]
17/10/23 00:28:33 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 33809.
17/10/23 00:28:33 INFO SparkEnv: Registering MapOutputTracker
17/10/23 00:28:34 INFO SparkEnv: Registering BlockManagerMaster
17/10/23 00:28:34 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-a9fe7e0c-5862-4293-b14c-c218d0a85121
17/10/23 00:28:34 INFO MemoryStore: MemoryStore started with capacity 1579.1 MB
17/10/23 00:28:34 INFO SparkEnv: Registering OutputCommitCoordinator
17/10/23 00:28:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/10/23 00:28:34 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
17/10/23 00:28:34 INFO Executor: Starting executor ID driver on host localhost
17/10/23 00:28:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40068.
17/10/23 00:28:34 INFO NettyBlockTransferService: Server created on 40068
17/10/23 00:28:34 INFO BlockManagerMaster: Trying to register BlockManager
17/10/23 00:28:34 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40068 with 1579.1 MB RAM, BlockManagerId(driver, localhost, 40068)
17/10/23 00:28:34 INFO BlockManagerMaster: Registered BlockManager
17/10/23 00:28:37 INFO HiveContext: Initializing execution hive, version 1.2.1
17/10/23 00:28:37 INFO ClientWrapper: Inspected Hadoop version: 2.5.1
17/10/23 00:28:37 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.5.1
17/10/23 00:28:38 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/10/23 00:28:38 INFO ObjectStore: ObjectStore, initialize called
17/10/23 00:28:39 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/10/23 00:28:39 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/10/23 00:28:45 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/10/23 00:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:00 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:02 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/10/23 00:29:02 INFO ObjectStore: Initialized ObjectStore
17/10/23 00:29:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/10/23 00:29:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/10/23 00:29:07 INFO HiveMetaStore: Added admin role in metastore
17/10/23 00:29:07 INFO HiveMetaStore: Added public role in metastore
17/10/23 00:29:09 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/10/23 00:29:13 INFO HiveMetaStore: 0: get_all_databases
17/10/23 00:29:13 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
17/10/23 00:29:14 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/10/23 00:29:14 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/10/23 00:29:14 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:19 INFO SessionState: Created local directory: /tmp/1361a727-483a-424a-8936-ee5979fb5a02_resources
17/10/23 00:29:19 INFO SessionState: Created HDFS directory: /tmp/hive/root/1361a727-483a-424a-8936-ee5979fb5a02
17/10/23 00:29:19 INFO SessionState: Created local directory: /tmp/root/1361a727-483a-424a-8936-ee5979fb5a02
17/10/23 00:29:19 INFO SessionState: Created HDFS directory: /tmp/hive/root/1361a727-483a-424a-8936-ee5979fb5a02/_tmp_space.db
17/10/23 00:29:20 INFO HiveContext: default warehouse location is /user/hive/warehouse
17/10/23 00:29:20 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/10/23 00:29:20 INFO ClientWrapper: Inspected Hadoop version: 2.5.1
17/10/23 00:29:33 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.5.1
17/10/23 00:29:38 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/10/23 00:29:39 INFO ObjectStore: ObjectStore, initialize called
17/10/23 00:29:39 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/10/23 00:29:39 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/10/23 00:29:44 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/10/23 00:29:50 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:50 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:29:51 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
17/10/23 00:29:51 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/10/23 00:29:51 INFO ObjectStore: Initialized ObjectStore
17/10/23 00:30:00 INFO HiveMetaStore: Added admin role in metastore
17/10/23 00:30:00 INFO HiveMetaStore: Added public role in metastore
17/10/23 00:30:03 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/10/23 00:30:12 INFO HiveMetaStore: 0: get_all_databases
17/10/23 00:30:12 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases
17/10/23 00:30:13 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/10/23 00:30:13 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/10/23 00:30:13 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/10/23 00:30:22 INFO SessionState: Created local directory: /tmp/bc93d037-7940-4e83-bc24-02d93dff54bf_resources
17/10/23 00:30:22 INFO SessionState: Created HDFS directory: /tmp/hive/root/bc93d037-7940-4e83-bc24-02d93dff54bf
17/10/23 00:30:22 INFO SessionState: Created local directory: /tmp/root/bc93d037-7940-4e83-bc24-02d93dff54bf
17/10/23 00:30:22 INFO SessionState: Created HDFS directory: /tmp/hive/root/bc93d037-7940-4e83-bc24-02d93dff54bf/_tmp_space.db
17/10/23 00:30:41 INFO HiveMetaStore: 0: get_tables: db=TCGA pat=.*
17/10/23 00:30:41 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_tables: db=TCGA pat=.*
17/10/23 00:31:08 INFO SparkContext: Starting job: show at MavenMainHbase.java:46
17/10/23 00:31:14 INFO DAGScheduler: Got job 0 (show at MavenMainHbase.java:46) with 1 output partitions
17/10/23 00:31:14 INFO DAGScheduler: Final stage: ResultStage 0 (show at MavenMainHbase.java:46)
17/10/23 00:31:14 INFO DAGScheduler: Parents of final stage: List()
17/10/23 00:31:14 INFO DAGScheduler: Missing parents: List()
17/10/23 00:31:17 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at show at MavenMainHbase.java:46), which has no missing parents
17/10/23 00:31:31 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1824.0 B, free 1579.1 MB)
17/10/23 00:31:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1175.0 B, free 1579.1 MB)
17/10/23 00:31:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40068 (size: 1175.0 B, free: 1579.1 MB)
17/10/23 00:31:31 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/23 00:31:33 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at show at MavenMainHbase.java:46)
17/10/23 00:31:33 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/23 00:31:37 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2105 bytes)
17/10/23 00:31:38 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/23 00:31:38 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 940 bytes result sent to driver
17/10/23 00:31:39 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1815 ms on localhost (1/1)
17/10/23 00:31:39 INFO DAGScheduler: ResultStage 0 (show at MavenMainHbase.java:46) finished in 5.187 s
17/10/23 00:31:39 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/23 00:31:40 INFO DAGScheduler: Job 0 finished: show at MavenMainHbase.java:46, took 32.074415 s
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+
Here is the sample Java code that I used to get the above result:
SparkConf conf = new SparkConf()
        .setAppName("SparkHive")
        .setMaster("local")
        .setSparkHome("/usr/hdp/2.5.6.0-40/spark/")
        .set("HADOOP_CONF_DIR", "/usr/hdp/2.5.6.0-40/hive/conf/")
        .set("spark.driver.extraClassPath", "/usr/hdp/2.5.6.0-40/hive/conf");
conf.set("spark.sql.hive.thriftServer.singleSession", "true");
SparkContext sc = new SparkContext(conf);
HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc);
hiveContext.setConf("hive.metastore.uris", "thrift://sandbox.kylo.io:9083");
hiveContext.setConf("spark.sql.warehouse.dir", "/user/hive/warehouse");
DataFrame df = hiveContext.sql("show tables in TCGA");
df.show();
And here is my pom.xml:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-core -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.4.0-HBase-1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-hbase-handler -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-hbase-handler</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.thrift/libfb303 -->
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient -->
<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient-osgi</artifactId>
<version>4.3-beta2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-contrib -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-contrib</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.4.4</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.4.4</version>
</dependency>
</dependencies>
I think this has to do with the code being unable to find hive-site.xml, so I set every classpath I could think of in SparkConf to make it work, but no luck yet. What other configurations do I have to set?
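For reference, here is a minimal sketch of the variant I am considering next, under the assumption that the real fix is simply getting the cluster's hive-site.xml (the one pointing at thrift://sandbox.kylo.io:9083) onto the application classpath, e.g. by copying it into src/main/resources, rather than passing conf paths through SparkConf. I have not confirmed this yet:
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class ShowTcgaTables {
    public static void main(String[] args) {
        // Assumption: hive-site.xml with hive.metastore.uris set to the sandbox
        // metastore has been copied into src/main/resources, so HiveContext picks
        // it up from the classpath and does not create a local Derby metastore.
        SparkConf conf = new SparkConf().setAppName("SparkHive").setMaster("local");
        SparkContext sc = new SparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc);

        DataFrame df = hiveContext.sql("show tables in TCGA");
        df.show();
    }
}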
09-07-2017
11:54 AM
@Constantin Stanca I am using Zeppelin with the Spark interpreter to access the Hive-HBase table. I get the following error while running the query: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.hbase.HBaseStorageHandler
I have even added the required jars to the Spark interpreter's classpath, but the issue still persists. Could you please let me know what the problem could be?
09-04-2017
04:46 PM
zeppelin-hive-hbase-handler.png
Hello, I am using Zeppelin to analyze a table that I created with the Hive-HBase handler. However, when I execute a query, it fails to load the storage handler org.apache.hadoop.hive.hbase.HBaseStorageHandler. I tried it with HiveContext, as someone pointed out in one of the threads, but the result is the same. What would be a workaround for this? Thanks
[Update] I also tried updating hive-site.xml in the Hive conf folder with the following property, but the problem is still the same. Just to verify, I also cross-checked in Beeline and the Hive CLI, and the query works perfectly fine in both. Could someone possibly help me with this issue in Zeppelin?
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/hdp/2.5.3.0-37/hive/lib/zookeeper-3.4.6.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hive/lib/hive-hbase-handler-1.2.1000.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hive/lib/guava-14.0.1.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-client-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-common-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-protocol-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-server-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-shell-1.1.2.2.5.3.0-37.jar,file:///usr/hdp/2.5.3.0-37/hbase/lib/hbase-thrift-1.1.2.2.5.3.0-37.jar</value>
</property>
08-25-2017
01:28 PM
Hi all. I am writing a Java wrapper to push data to HBase based on field values from a Hive table. I call this wrapper iteratively on each field value from Hive. I am able to push the data to HBase, but I receive the following exception on each put() call. Is it about increasing the log flush interval? What seems to be the issue?
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time,
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:208)
at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1689)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:208)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1449)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1040)
Following is the wrapper:
try {
    Put p = new Put(Bytes.toBytes(rowkey));
    p.add(Bytes.toBytes(family + "1"), Bytes.toBytes(qualifier), Bytes.toBytes(value.get(0)));
    tablename.put(p);
    p.add((family + "2").getBytes(), qualifier.getBytes(), value.get(1).getBytes());
    tablename.put(p);
    p.add((family + "3").getBytes(), qualifier.getBytes(), value.get(2).getBytes());
    tablename.put(p);
}
catch (IOException e) {
    e.printStackTrace();
}
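For comparison, here is a minimal self-contained sketch of the write path I had in mind, under the assumption that building the Put completely and submitting it once per row is acceptable; this is only an illustration, not a confirmed fix for the IOException above:
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteSketch {
    // Writes one row with three cells (one per column family) in a single put() call.
    static void writeRow(Table table, String rowkey, String family,
                         String qualifier, List<String> value) throws IOException {
        Put p = new Put(Bytes.toBytes(rowkey));
        // Add all three cells to the same Put, then submit it once.
        p.addColumn(Bytes.toBytes(family + "1"), Bytes.toBytes(qualifier), Bytes.toBytes(value.get(0)));
        p.addColumn(Bytes.toBytes(family + "2"), Bytes.toBytes(qualifier), Bytes.toBytes(value.get(1)));
        p.addColumn(Bytes.toBytes(family + "3"), Bytes.toBytes(qualifier), Bytes.toBytes(value.get(2)));
        table.put(p);
    }
}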
08-18-2017
06:24 PM
Hello, I am using the following table, which I intend to overwrite. This is the schema of the table I plan to write from. The following are the INSERT OVERWRITE queries I tried, but they lead to an error. As you can see, I just want to insert data into only 10 columns of my bigger table (with the key being introduced as a UUID) and leave the other fields empty. What could be a workaround for this problem using a Hive query? Thanks
- Tags:
- Data Processing
- Hive
08-16-2017
12:33 PM
It worked. I added the following dependency:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient-osgi</artifactId>
<version>4.3-beta2</version>
</dependency>
08-16-2017
12:10 PM
Hello, I am creating a Java wrapper for my thesis work with Hive-HBase integration. So far the HBase Java API has worked well, but when I tried to connect via Hive JDBC, it raised a "ServiceUnavailableRetryStrategy" error. The following are the dependencies in my pom.xml:
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.1.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-core -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.4.0-HBase-1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-hbase-handler -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-hbase-handler</artifactId>
<version>1.2.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.thrift/libfb303 -->
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.0</version>
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient -->
<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>
</dependencies>
The following is the part of the simple code where I am just testing the Hive JDBC connection:
private static String driverName = "org.apache.hive.jdbc.HiveDriver";
try {
Class.forName(driverName);
}
catch (ClassNotFoundException e) {
e.printStackTrace();
System.exit(1);
}
Connection con1= DriverManager.getConnection("jdbc:hive2://localhost:10000/firehose", "hive"
,"hive");
System.out.println("hello");
Statement stmt = con1.createStatement();
stmt.executeQuery("CREATE TABLE IF NOT EXISTS employee (eid int, name String);");
String tableName = "testHiveDriverTable";
stmt.execute("drop table if exists " + tableName);
stmt.execute("create table " + tableName + " (key int, value string)");
// show tables
// String sql = "show tables '" + tableName + "'";
String sql = ("show tables");
ResultSet res = stmt.executeQuery(sql);
if (res.next()) {
System.out.println(res.getString(1));
}
When I execute my program, I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/http/client/ServiceUnavailableRetryStrategy
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at hbasepingtest.MavenMainHbase.connectHive(MavenMainHbase.java:63)
at hbasepingtest.MavenMainHbase.main(MavenMainHbase.java:48)
Caused by: java.lang.ClassNotFoundException: org.apache.http.client.ServiceUnavailableRetryStrategy
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
To cross-check, I connected with Beeline using the same JDBC URL and database, and it logs in fine. Am I missing some other dependency in my pom file?
08-11-2017
03:23 PM
@Josh Elser I made the change as you said and also added nifi to sudoers as root. But now I am getting a JNI error. I tried again even after chown'ing the folder where Java is stored to nifi:nifi, but I still get the same JNI error. Where am I going wrong? Thanks
08-10-2017
03:35 PM
1 Kudo
Hello, I am using NiFi in the HDP sandbox to execute a Java jar. The jar contains a simple HBase client that creates a table based on an argument supplied from the NiFi instance. I used the ExecuteStreamCommand processor to call this jar with the properties shown, but it throws an error as follows: I tried again after changing the Working Directory to root/IdeaProjects/mavenhbasetest/target/ and now get the following error: FYI, I tested the jar in the shell and it works fine. Could someone help me see where I am going wrong? Thanks and regards, Jasim
08-04-2017
11:12 AM
@Jay SenSharma could you please elaborate on how to manually install that package? Attached are: the yum command on yum.repos.d, yum whatprovides on Desktop, and the yum repo.
08-03-2017
05:07 PM
Hi, attached is the network setup of my sandbox: network-1.png, network-2.png. Thank you.
08-03-2017
05:05 PM
Hi, while trying to deploy the Ambari VNC service, I got an error as follows:
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/VNCSERVER/package/scripts/master.py", line 132, in <module>
Master().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.5/services/VNCSERVER/package/scripts/master.py", line 31, in install
Execute('yum groupinstall -y Desktop >> '+params.log_location)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'yum groupinstall -y Desktop >> /var/log/vnc-stack.log' returned 1. Warning: group Desktop does not exist.
Maybe run: yum groups mark install (see man yum)
Error: No packages in any requested group available to install or update
stdout:
2017-08-03 15:41:53,481 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.3.0-37
2017-08-03 15:41:53,483 - Checking if need to create versioned conf dir /etc/hadoop/2.5.3.0-37/0
2017-08-03 15:41:53,485 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.3.0-37', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2017-08-03 15:41:53,506 - call returned (1, '/etc/hadoop/2.5.3.0-37/0 exist already', '')
2017-08-03 15:41:53,506 - checked_call[('ambari-python-wrap', u'/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.3.0-37', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2017-08-03 15:41:53,525 - checked_call returned (0, '')
2017-08-03 15:41:53,526 - Ensuring that hadoop has the correct symlink structure
2017-08-03 15:41:53,526 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-08-03 15:41:53,527 - Group['livy'] {}
2017-08-03 15:41:53,528 - Group['spark'] {}
2017-08-03 15:41:53,528 - Group['zeppelin'] {}
2017-08-03 15:41:53,529 - Group['hadoop'] {}
2017-08-03 15:41:53,529 - Group['users'] {}
2017-08-03 15:41:53,529 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,530 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,530 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2017-08-03 15:41:53,530 - User['zeppelin'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,531 - User['livy'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,531 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,532 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'users']}
2017-08-03 15:41:53,532 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,533 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,533 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,534 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,534 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,535 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': [u'hadoop']}
2017-08-03 15:41:53,535 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2017-08-03 15:41:53,537 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2017-08-03 15:41:53,549 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2017-08-03 15:41:53,549 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2017-08-03 15:41:53,550 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2017-08-03 15:41:53,551 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2017-08-03 15:41:53,558 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2017-08-03 15:41:53,558 - Group['hdfs'] {}
2017-08-03 15:41:53,558 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': [u'hadoop', u'hdfs']}
2017-08-03 15:41:53,559 - FS Type:
2017-08-03 15:41:53,559 - Directory['/etc/hadoop'] {'mode': 0755}
2017-08-03 15:41:53,572 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2017-08-03 15:41:53,572 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2017-08-03 15:41:53,589 - Initializing 2 repositories
2017-08-03 15:41:53,590 - Repository['HDP-2.5'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.3.0', 'action': ['create'], 'components': [u'HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP', 'mirror_list': None}
2017-08-03 15:41:53,596 - File['/etc/yum.repos.d/HDP.repo'] {'content': '[HDP-2.5]\nname=HDP-2.5\nbaseurl=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.3.0\n\npath=/\nenabled=1\ngpgcheck=0'}
2017-08-03 15:41:53,596 - Repository['HDP-UTILS-1.1.0.21'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7', 'action': ['create'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2017-08-03 15:41:53,599 - File['/etc/yum.repos.d/HDP-UTILS.repo'] {'content': '[HDP-UTILS-1.1.0.21]\nname=HDP-UTILS-1.1.0.21\nbaseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7\n\npath=/\nenabled=1\ngpgcheck=0'}
2017-08-03 15:41:53,599 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-08-03 15:41:53,661 - Skipping installation of existing package unzip
2017-08-03 15:41:53,662 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-08-03 15:41:53,670 - Skipping installation of existing package curl
2017-08-03 15:41:53,670 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-08-03 15:41:53,678 - Skipping installation of existing package hdp-select
2017-08-03 15:41:53,868 - Execute['echo "installing Desktop" >> /var/log/vnc-stack.log'] {}
2017-08-03 15:41:53,872 - Execute['yum groupinstall -y Desktop >> /var/log/vnc-stack.log'] {}
Command failed after 1 tries
What could be a possible solution for installing the 'Desktop' package group? Thank you.
08-03-2017
03:20 PM
Hi @Geoffrey, I used the sandbox's ifconfig IP in /etc/hosts as shown in the attachment. The result is still the same: I am unable to access it.
08-03-2017
02:29 PM
ambari.png zeppelinui.png log-files-of-zeppelin.png
Hi all, for my task I would like to use Zeppelin from my Hortonworks sandbox. I installed the Zeppelin service via Ambari successfully, but when I try to launch the UI, it does not respond. What could be the issue? I checked the logs, changed the ports in Zeppelin's env files, and telnetted to port 9996 with the hostname mentioned, but still no success. I have attached the snapshots.
06-14-2017
04:03 PM
Hello,
I would like to build a pipeline in NiFi that runs a given shell script (firehose-get-latest-2.zip) and executes it according to the user's input parameters, for example: firehose_get analyses latest; firehose_get -tasks mutsig gistic analyses latest brca ucec; firehose_get -tasks mut analyses latest prad. I know of the ExecuteProcess processor; however, I am not sure exactly how to use it, and I could not find any helpful examples anywhere. Thank you.
06-02-2017
12:38 PM
Hi, I did check and found that both ports, i.e. the default Zeppelin port and 9096, are free. I restarted the service and it still does not work.
06-02-2017
12:07 PM
Hello,
I am using Kylo, a data lake management solution; specifically, the pre-configured Kylo instance under the HDP sandbox.
In the sandbox I installed the Zeppelin service through Ambari; however, the web UI is not loading. I have changed the default port number to 9096 through Zeppelin's configuration in Ambari, but the result is still the same.
Attached is issue-var-log-zeppelin.png, a snapshot of the log under the /var/log/zeppelin directory.
Where am I possibly going wrong?
Thanks and regards,
Jasim
04-23-2017
05:12 PM
Hello all, I am using Apache NiFi for my Master's thesis work on integrating genomic data sources. I would like to, for example, ingest this API response into MongoDB. As you can see, the native response data is in a flat format; could someone please guide me on how to go about it?
Thanks and regards,
Jasim