Member since: 03-25-2016
Posts: 5
Kudos Received: 6
Solutions: 0
12-27-2017
02:11 PM
2 Kudos
With the Hadoop 3.0 release we are excited about its new storage feature based on erasure coding. When will it be available in HDP?
10-21-2017
09:55 AM
Same problem here. Is it fixed in HDP 2.6?
04-18-2016
02:05 PM
1 Kudo
When can I expect an HDP release with Storm 1.0.0 included?
03-29-2016
02:12 PM
1 Kudo
I'm using HDP v2.3.4 with the included Storm version 0.10.0. My topology reads data from Kafka and writes it into Hive using the Kafka Spout and the Hive Bolt respectively. I'm running a simple configuration with one worker, one spout, and one bolt. After a short period of time the topology completely stops; nothing happens at all. Hive bolt configuration:
HiveOptions hiveOptions = new HiveOptions(sourceMetastoreUrl, databaseName, hiveTableName, mapper)
        .withTxnsPerBatch(txnsPerBatch)             // 15000
        .withBatchSize(batchSize)                   // 100
        .withTickTupleInterval(tickTupleInterval)   // 15
        .withHeartBeatInterval(heartBeatInterval)   // 60
        .withCallTimeout(callTimeout);              // 0
HiveBolt hiveBolt = new HiveBolt(hiveOptions);
That's what I see in the logs:
2016-03-29 16:27:09.331 o.a.s.h.c.HiveWriter [DEBUG] Committing Txn id 38945255 to {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160321] }
2016-03-29 16:27:09.347 o.a.s.h.c.HiveWriter [DEBUG] Switching to next Txn for {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160321] }
2016-03-29 16:27:09.354 o.a.s.h.c.HiveWriter [DEBUG] Committing Txn id 38945355 to {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160324] }
2016-03-29 16:27:09.355 o.a.s.h.b.HiveBolt [DEBUG] Start sending heartbeat on all writers
2016-03-29 16:27:09.370 o.a.s.h.c.HiveWriter [INFO] Sending heartbeat on batch TxnIds=[38944844...38944943] on endPoint = {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160326] }
2016-03-29 16:27:09.370 o.a.s.h.c.HiveWriter [DEBUG] Switching to next Txn for {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160324] }
2016-03-29 16:27:10.578 o.a.s.h.c.HiveWriter [INFO] Sending heartbeat on batch TxnIds=[38944944...38945043] on endPoint = {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160325] }
2016-03-29 16:27:10.670 o.a.s.h.c.HiveWriter [INFO] Sending heartbeat on batch TxnIds=[38945044...38945143] on endPoint = {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160327] }
2016-03-29 16:27:10.765 o.a.s.h.c.HiveWriter [INFO] Sending heartbeat on batch TxnIds=[38945144...38945243] on endPoint = {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160322] }
2016-03-29 16:27:10.858 o.a.s.h.c.HiveWriter [INFO] Sending heartbeat on batch TxnIds=[38945244...38945343] on endPoint = {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160321] }
2016-03-29 16:37:10.625 o.a.s.h.c.HiveWriter [INFO] Sending heartbeat on batch TxnIds=[38945344...38945443] on endPoint = {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160324] }
2016-03-29 16:37:10.626 b.s.d.executor [ERROR]
org.apache.storm.hive.common.HiveWriter$TxnFailure: Failed switching to next Txn in TxnBatch TxnIds=[38945344...38945443] on endPoint = {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160324] }
at org.apache.storm.hive.common.HiveWriter.flush(HiveWriter.java:138) ~[stormjar.jar:?]
at org.apache.storm.hive.bolt.HiveBolt.flushAllWriters(HiveBolt.java:229) ~[stormjar.jar:?]
at org.apache.storm.hive.bolt.HiveBolt.execute(HiveBolt.java:130) [stormjar.jar:?]
at backtype.storm.daemon.executor$fn__3697$tuple_action_fn__3699.invoke(executor.clj:670) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
at backtype.storm.daemon.executor$mk_task_receiver$fn__3620.invoke(executor.clj:426) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
at backtype.storm.disruptor$clojure_handler$reify__3196.onEvent(disruptor.clj:58) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
at backtype.storm.daemon.executor$fn__3697$fn__3710$fn__3761.invoke(executor.clj:808) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
at backtype.storm.util$async_loop$fn__544.invoke(util.clj:475) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
Caused by: org.apache.hive.hcatalog.streaming.TransactionError: Unable to acquire lock on {metaStoreUri='thrift://sorm-master02.msk.mts.ru:9083', database='default', table='cdr1', partitionVals=[20160324] }
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:578) ~[stormjar.jar:?]
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransaction(HiveEndPoint.java:547) ~[stormjar.jar:?]
at org.apache.storm.hive.common.HiveWriter.nextTxn(HiveWriter.java:336) ~[stormjar.jar:?]
at org.apache.storm.hive.common.HiveWriter.flush(HiveWriter.java:133) ~[stormjar.jar:?]
... 12 more
Caused by: org.apache.hadoop.hive.metastore.api.TxnAbortedException: Transaction txnid:38945356 already aborted
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$lock_result$lock_resultStandardScheme.read(ThriftHiveMetastore.java) ~[stormjar.jar:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$lock_result$lock_resultStandardScheme.read(ThriftHiveMetastore.java) ~[stormjar.jar:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$lock_result.read(ThriftHiveMetastore.java) ~[stormjar.jar:?]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) ~[stormjar.jar:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:3906) ~[stormjar.jar:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:3893) ~[stormjar.jar:?]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:1869) ~[stormjar.jar:?]
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152) ~[stormjar.jar:?]
at com.sun.proxy.$Proxy23.lock(Unknown Source) ~[?:?]
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:573) ~[stormjar.jar:?]
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransaction(HiveEndPoint.java:547) ~[stormjar.jar:?]
at org.apache.storm.hive.common.HiveWriter.nextTxn(HiveWriter.java:336) ~[stormjar.jar:?]
at org.apache.storm.hive.common.HiveWriter.flush(HiveWriter.java:133) ~[stormjar.jar:?]
... 12 more
The topology was doing nothing for 10 minutes. It seems everything hangs waiting for a response from Hive, but I don't see why. Here are a few tests that may help understand what's going on:
- When I run the topology with callTimeout set to 10, it starts failing with CallTimeoutException.
- When I turn off the heartbeat (heartBeatInterval = -1), everything works fine (at least it doesn't fail within an hour; I didn't check a longer period).
The problem appeared after upgrading from HDP 2.3.0 to 2.3.4. The new version includes the STORM-1030 fix for the Hive bolt, which is probably what causes this effect on my topology.
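For completeness, here is a minimal sketch of the workaround configuration from the second test above (heartbeat disabled). It reuses the same placeholder variable names as the snippet at the top of this post and is only what worked in my short test, not a proper fix:
// Workaround sketch: identical to the original setup except that the heartbeat
// is disabled (heartBeatInterval = -1), which is the configuration that kept the
// topology running in my tests instead of hanging on the heartbeat/flush path.
HiveOptions workaroundOptions = new HiveOptions(sourceMetastoreUrl, databaseName, hiveTableName, mapper)
        .withTxnsPerBatch(txnsPerBatch)             // 15000
        .withBatchSize(batchSize)                   // 100
        .withTickTupleInterval(tickTupleInterval)   // 15
        .withHeartBeatInterval(-1)                  // heartbeat disabled
        .withCallTimeout(callTimeout);              // 0
HiveBolt workaroundBolt = new HiveBolt(workaroundOptions);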
03-25-2016
07:01 PM
2 Kudos
I'm using Hortonworks Data Platform version 2.3.4 with the included Storm version 0.10. After deploying a topology on the cluster, it takes too much time to distribute the code over the nodes. The cluster has a 10 Gb/s network, and downloading the jar shouldn't take this long. Here are the logs:
Nimbus:
2016-02-25 17:10:13.040 b.s.d.nimbus [INFO] Uploading file from client to /opt/hadoop/storm/nimbus/inbox/stormjar-7352b097-8829-4268-a81f-29d820c0f311.jar
2016-02-25 17:10:14.915 b.s.d.nimbus [INFO] Finished uploading file from client: /opt/hadoop/storm/nimbus/inbox/stormjar-7352b097-8829-4268-a81f-29d820c0f311.jar
Supervisor:
2016-02-25 17:10:15.198 b.s.d.supervisor [INFO] Downloading code for storm id CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414 from /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
2016-02-25 17:10:15.199 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 17:10:15.205 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 17:10:17.405 b.s.c.LocalFileSystemCodeDistributor [INFO] Attempting to download meta file /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormjar.jar from remote sorm-master02.msk.mts.ru:6627
2016-02-25 17:10:17.406 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 17:13:29.002 b.s.c.LocalFileSystemCodeDistributor [INFO] Attempting to download meta file /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormconf.ser from remote sorm-master02.msk.mts.ru:6627
2016-02-25 17:13:29.003 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 17:13:31.148 b.s.c.LocalFileSystemCodeDistributor [INFO] Attempting to download meta file /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414/stormcode.ser from remote sorm-master02.msk.mts.ru:6627
2016-02-25 17:13:31.148 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2016-02-25 17:13:34.124 b.s.d.supervisor [INFO] Finished downloading code for storm id CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414 from /opt/hadoop/storm/nimbus/stormdist/CdrPerformanceTestTopology-phoenix-bolt-4-2-1456409414
Nimbus downloads the jar in about a second, so why does it take the supervisor 3 minutes? P.S. With HDP version 2.3.0 I had no problems with this.